10X AI (Issue #7): Meta's MusicGen, Google's Releases, AI Design Tools, and Extra Limbs
Plus function calling for GPT models, Meta's I-JEPA, and using Bing or ChatGPT to explain foreign idioms, proverbs, and other abstract concepts.
Happy Sunday, friends!
Welcome back to 10X AI: a weekly look at beginner-focused AI news, tools, and tips.
Let’s dive right in.
🗞️AI news
Here are this week’s AI developments.
1. Meta joins the text-to-music game
On the text-to-music front, I’ve already looked at Riffusion and Google’s MusicLM.
Now, Meta has entered the scene with its open-source MusicGen model, capable of generating tracks from text prompts.
What’s cool about MusicGen is that it also lets you supply a melody alongside the text prompt (optional), which the model will use for inspiration.
Here’s what I got for “upbeat funk track” mixed with the Super Mario theme song:
Or how about Super Mario + “hardcore hip-hop”?
There’s a public MusicGen demo that generates 12-second tracks. Enjoy!
2. A bunch of Google AI releases (US only)
Google’s AI train continues to chug along, with several consumer-facing releases this week.
Google Lens can now help identify skin conditions. Self-diagnosis, yay! (But seriously, you should probably still consult a doctor.)
Google Shopping added a “virtual try-on” tool that can show how different clothes might look on a variety of models.
The Search Generative Experience (SGE) got better at assisting you with shopping and travel decisions.
Google Maps received “glanceable directions” (directions and ETA shown directly on your lock screen) and Immersive View now features 500 new landmarks.
All of the above are only available in English and exclusively in the US for now.
3. Meta’s I-JEPA
But we’re not quite done with Meta yet!
The company has also open-sourced their new vision model called I-JEPA (or “Image-Based Joint Embedding Predictive Architecture” to its friends).
I-JEPA is a big deal because it is the first such AI model that learns by creating an internal model of the outside world, comparing abstract representations of images rather than the pixels themselves.
All of this makes I-JEPA more accurate, robust, and efficient.
The more tech-savvy among you can access the code and model checkpoints here.
4. GPT function calling: a game changer?
I normally keep the “News” segment focused on developments and releases that are instantly tangible or have a demo the average user can immediately check out.
But I had to make an exception for this one.
OpenAI just announced a new feature called “function calling” for GPT-4 and GPT-3.5.
In short, developers can now “teach” GPT specific functions of their tool and have GPT reliably call on those functions using the JSON format. ChatGPT plugins previously worked in a similar way but these interactions were much less reliable.
What function calling means is that GPT can now become a more efficient “tool user,” relying on third-party software to make up for its own limitations and weaknesses.
I expect we’ll see a whole slew of tools in the coming weeks that take advantage of function calling to bring GPT models to new levels of usefulness.
🛠️AI tools
This week, I’ve pulled together a few tools for visuals and graphic design.
5. Recraft
With Recraft, you can create and modify images in many styles and formats on an endless canvas:
For instance, you can make a chicken portrait in a cartoon style:
Then select the top of the chicken’s head…
…and ask for realistic Devil horns:
Excellent:
Recraft is especially useful to designers as images can be vectorized and exported as SVG and Lottie for further processing.
6. Sivi
Rather than dealing with standalone images like Recraft, Sivi is all about creating marketing collateral like banners and social media posts for a variety of channels.
You explain the main concept you need in words:
If you’re into magic spellcasting like some people out there, you might ask for an Instagram post about your new spellcasting eBook:
After about one minute, Sivi will spit out 16 design variations in your chosen format, complete with relevant placeholder copy:
From here, you can modify the design with your own text, customizations, etc.
You get a few free credits to test out the features and see if Sivi is for you.
7. ClipDrop
In the last edition of 10X AI, I mentioned ClipDrop’s new “Uncrop” and “Reimagine XL” features. But the site has plenty of other tools to choose from:
CrlipDrop has solutions for image upscaling, background removal/replacement, text manipulation, and more.
Now that it’s owned by Stability AI, we can expect ClipDrop to continuously release new features driven by generative AI in the future.
Their rather generous free plan lets you try all of the tools.
8. Lunacy
Lunacy claims to pull together all the features you’d find in Adobe XD, Figma, Sketch, and others into an all-in-one professional design app:
Unlike the rest of today’s tools, Lunacy requires a download instead of running in a browser. Here’s how the Windows app dashboard looks on my end:
For a casual user like me, this is quite an overwhelming amount of features and options to get used to. But seasoned graphic designers will probably feel right at home. It certainly doesn’t hurt that Lunacy seems to be 100% free.
💡AI tips
Here’s this week’s tip.
9. Ask Bing or ChatGPT to explain foreign idioms
Google Translate is very useful, but it struggles with idioms and other concepts that can’t be translated literally.
Here’s its attempt to translate a Danish expression “at have is i maven”:
Bing and ChatGPT, on the other hand, can actually understand the broader context. You can therefore use them to understand the indirect meaning of idioms, proverbs, and other not-so-easily-translatable expressions:
Not only does Bing explain the concept clearly and accurately, it even provides us with a relatable real world example. Neat!
🤦♂️10. AI fail of the week
When it comes to handstands, having an extra set of hands is handy.
Well done. The music creation stuff is moving really fast! I had a friend who described his experience maybe a month and a half ago, and it seems like you can do a TON more, much more easily right now. This revolution is less about being able to do things we were unable to before, and much more about doing things we could have done in 10 hours in 10 minutes.