10X AI (Issue #7): Meta's MusicGen, Google's Releases, AI Design Tools, and Extra Limbs

Plus function calling for GPT models, Meta's I-JEPA, and using Bing or ChatGPT to explain foreign idioms, proverbs, and other abstract concepts.

Daniel Nest

Jun 18, 2023

Happy Sunday, friends!

Welcome back to 10X AI: a weekly look at beginner-focused AI news, tools, and tips.

Let’s dive right in.

🗞️AI news

Here are this week’s AI developments.

1. Meta joins the text-to-music game

On the text-to-music front, I’ve already looked at Riffusion and Google’s MusicLM.

Now, Meta has entered the scene with its open-source MusicGen model, capable of generating tracks from text prompts.

What’s cool about MusicGen is that it also lets you supply a melody alongside the text prompt (optional), which the model will use for inspiration.

Here’s what I got for “upbeat funk track” mixed with the Super Mario theme song:

Or how about Super Mario + “hardcore hip-hop”?

There’s a public MusicGen demo that generates 12-second tracks. Enjoy!

2. A bunch of Google AI releases (US only)

Google’s AI train continues to chug along, with several consumer-facing releases this week.

Google Lens can now help identify skin conditions. Self-diagnosis, yay! (But seriously, you should probably still consult a doctor.)
Google Shopping added a “virtual try-on” tool that can show how different clothes might look on a variety of models.
The Search Generative Experience (SGE) got better at assisting you with shopping and travel decisions.
Google Maps received “glanceable directions” (directions and ETA shown directly on your lock screen) and Immersive View now features 500 new landmarks.

All of the above are only available in English and exclusively in the US for now.

3. Meta’s I-JEPA

But we’re not quite done with Meta yet!

The company has also open-sourced its new vision model, I-JEPA (or “Image-Based Joint Embedding Predictive Architecture” to its friends).

I-JEPA is a big deal because Meta bills it as the first AI model of its kind: it learns by creating an internal model of the outside world, comparing abstract representations of images rather than the pixels themselves.

All of this makes I-JEPA more accurate, robust, and efficient.

The more tech-savvy among you can access the code and model checkpoints here.

4. GPT function calling: a game changer?

I normally keep the “News” segment focused on developments and releases that are instantly tangible or have a demo the average user can immediately check out.

But I had to make an exception for this one.

OpenAI just announced a new feature called “function calling” for GPT-4 and GPT-3.5.

In short, developers can now describe their tool’s functions to GPT, and the model will reliably respond with a JSON object containing the arguments needed to call them. ChatGPT plugins previously worked in a similar way, but those interactions were much less reliable.

What function calling means is that GPT can now become a more efficient “tool user,” relying on third-party software to make up for its own limitations and weaknesses.
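For the developers among you, here’s a minimal sketch of that loop in Python. It doesn’t contact OpenAI at all: the `get_weather` function, its made-up temperature data, and the `simulated_function_call` reply are all hypothetical stand-ins I’ve invented for illustration. What it does mirror is the general shape of the feature: you describe a function with a JSON Schema, the model answers with a function name plus a JSON string of arguments, and your own code parses that and runs the real function.

```python
import json


# Hypothetical tool function, invented for this example.
def get_weather(city: str, unit: str = "celsius") -> dict:
    fake_temperatures = {"Copenhagen": 18, "Boston": 22}  # made-up data
    return {"city": city, "temperature": fake_temperatures.get(city), "unit": unit}


# The description you would send to the model alongside the chat messages.
# The "parameters" field is a JSON Schema object describing the arguments.
FUNCTIONS = [
    {
        "name": "get_weather",
        "description": "Get the current temperature for a city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["city"],
        },
    }
]

# Maps the names the model may ask for to real Python functions.
AVAILABLE_FUNCTIONS = {"get_weather": get_weather}


def dispatch(function_call: dict) -> dict:
    """Run the function the model asked for, using its JSON arguments."""
    name = function_call["name"]
    if name not in AVAILABLE_FUNCTIONS:
        raise ValueError(f"Model requested an unknown function: {name}")
    arguments = json.loads(function_call["arguments"])  # arguments arrive as a JSON string
    return AVAILABLE_FUNCTIONS[name](**arguments)


# Simulated model reply (an assumption, not real API output).
simulated_function_call = {
    "name": "get_weather",
    "arguments": '{"city": "Copenhagen"}',
}

result = dispatch(simulated_function_call)
print(result)
```

In a real app, you’d send `result` back to the model as a follow-up message so it can phrase the final answer in plain language.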

I expect we’ll see a whole slew of tools in the coming weeks that take advantage of function calling to bring GPT models to new levels of usefulness.

🛠️AI tools

This week, I’ve pulled together a few tools for visuals and graphic design.

5. Recraft

With Recraft, you can create and modify images in many styles and formats on an endless canvas:

For instance, you can make a chicken portrait in a cartoon style:

Asking for a "Chicken portrait" in Recraft

Then select the top of the chicken’s head…

Selecting the top of a chicken's head in Recraft

…and ask for realistic Devil horns:

Excellent:

Chicken with Devil horns in Recraft — What?! You’ve seen more realistic Devil horns?

Recraft is especially useful to designers as images can be vectorized and exported as SVG and Lottie for further processing.

Check out Recraft

6. Sivi

Rather than dealing with standalone images like Recraft, Sivi is all about creating marketing collateral like banners and social media posts for a variety of channels.

You explain the main concept you need in words:

If you’re into magic spellcasting like some people out there, you might ask for an Instagram post about your new spellcasting eBook:

Asking Sivi for an Instagram post about a magic spellcasting eBook

After about one minute, Sivi will spit out 16 design variations in your chosen format, complete with relevant placeholder copy:

Sivi design options for a magic spellcasting eBook

From here, you can modify the design with your own text, customizations, etc.

You get a few free credits to test out the features and see if Sivi is for you.

Check out Sivi

7. ClipDrop

In the last edition of 10X AI, I mentioned ClipDrop’s new “Uncrop” and “Reimagine XL” features. But the site has plenty of other tools to choose from:

Grid with nine different image manipulation tools in ClipDrop

ClipDrop has solutions for image upscaling, background removal/replacement, text manipulation, and more.

Now that it’s owned by Stability AI, we can expect ClipDrop to keep releasing new features driven by generative AI.

Their rather generous free plan lets you try all of the tools.

Check out ClipDrop

8. Lunacy

Lunacy claims to pull together all the features you’d find in Adobe XD, Figma, Sketch, and others into an all-in-one professional design app:

Unlike the rest of today’s tools, Lunacy requires a download instead of running in a browser. Here’s how the Windows app dashboard looks on my end:

For a casual user like me, this is quite an overwhelming number of features and options to get used to. But seasoned graphic designers will probably feel right at home. It certainly doesn’t hurt that Lunacy seems to be 100% free.

Check out Lunacy

💡AI tips

Here’s this week’s tip.

9. Ask Bing or ChatGPT to explain foreign idioms

Google Translate is very useful, but it struggles with idioms and other concepts that can’t be translated literally.

Here’s its attempt to translate the Danish expression “at have is i maven”:

Google Translate translating "at have is i maven" literally

Bing and ChatGPT, on the other hand, can actually understand the broader context. You can therefore use them to understand the indirect meaning of idioms, proverbs, and other not-so-easily-translatable expressions:

Bing explaining the Danish concept of "at have is i maven" accurately with examples

Not only does Bing explain the concept clearly and accurately, it even provides us with a relatable real-world example. Neat!

🤦‍♂️10. AI fail of the week

When it comes to handstands, having an extra set of hands is handy.

Three people doing handstands in the park. They each have two sets of arms.

Liked the post? Help me grow Why Try AI by sharing it with others!

Andrew Smith

Well done. The music creation stuff is moving really fast! I had a friend who described his experience maybe a month and a half ago, and it seems like you can do a TON more, much more easily right now. This revolution is less about being able to do things we were unable to before, and much more about doing things we could have done in 10 hours in 10 minutes.
