10X AI (Issue #41): Google Genie, Mistral Large, Ideogram 1.0, and a Cloud-Brain Squirrel
Sunday Showdown #1: DALL-E 3 and Imagen 2 create logos. Which one is better?
Happy Sunday, friends!
Welcome back to 10X AI: a weekly look at generative AI news for the average user.
New structure. New paid segment.
I know many of you like the news round-up and want it to continue.
So don’t worry, the “AI News” section isn’t going away.
At the same time, I wanted to go the extra mile for my paid subscribers.
That’s why, from now on, 10X AI will consist of:
AI news (free): The usual round-up of AI news but with shorter write-ups and curated links to external sources for more information.
AI fail (free): The usual AI image or video mishap for your entertainment.
Sunday Showdown (paid subscribers only): A new segment, where I pit similar AI tools against each other in solving a specific real-world task.
Sunday tip (paid subscribers only): A hands-on tip for using AI. Where possible, the tip is related to the “Sunday Showdown” participants.
I’m excited about the new format, and I hope my paid subscribers find it a nice addition.
Let’s get to it!
🗞️ AI news
Here are this week’s AI developments.
1. Google’s Genie creates playable worlds out of images
What’s up?
Google introduced Genie, a new foundation world model trained on Internet videos. It can turn a static image, such as a photo or a hand-drawn sketch, into a playable world.
Why should I care?
The training was done without any action labels. Genie learned on its own by working through a large dataset of public videos, primarily 2D platformer games and robotics. Google believes this approach can work for training AI models in any domain.
Where can I learn more?
Read the official post from Google.
Study the research paper (PDF).
Watch this walkthrough by AI Explained:
2. Mistral Large is catching up to GPT-4
What’s up?
Mistral AI released a new flagship model called Mistral Large.
Why should I care?
Mistral Large approaches GPT-4-level performance on the MMLU benchmark and gets high scores on math, reasoning, and coding tasks. It’s fluent in several languages and achieves high scores on multilingual benchmarks. Finally, Mistral Large is natively capable of function calling, which makes it easier for developers to integrate it with their applications.
Where can I learn more?
Read the official announcement.
Try it for free in Mistral’s new chat interface (requires sign-up).
Watch this coding test by Mervin Praison:
3. Ideogram 1.0 is even better at text
What’s up?
Ideogram just released its first full-version model: 1.0.
Why should I care?
When Ideogram 0.1 came out in August 2023, it was the first text-to-image model that could reliably spell (DALL-E 3, Midjourney V6, and Stable Diffusion 3 have since caught up). Now, the makers of Ideogram claim that 1.0 has lower text rendering error rates than DALL-E 3 and is preferred over both DALL-E 3 and Midjourney V6 by human evaluators.
Where can I learn more?
Read the official blog post.
Check out the official Twitter / X thread.
Try 1.0 for free at ideogram.ai.
4. Playground v2.5 has the best aesthetics?
What’s up?
After releasing v2.0 last December, Playground is ready with the next iteration of its text-to-image model: v2.5.
Why should I care?
Playground v2.5 shows a “surprisingly significant increase in aesthetic quality.” In internal evaluations, testers preferred it over most existing text-to-image models (although some have challenged the approach used to measure this).
Where can I learn more?
Read the official announcement post.
Check out the announcement Tweet from Suhail (founder).
Try v2.5 for free at Playground.com.
5. MusicFX lets you become a DJ…sort of?
What’s up?
MusicFX, the winner of my recent Battle of the Bands, now has a “DJ mode.”
Why should I care?
DJ mode lets you fuse different beats, instruments, and sound effects, and the live track adapts seamlessly to your changes on the fly. You can add your own prompts and adjust the strength of each. It’s free to test if you’re in the US.
Where can I learn more?
Read this Tweet from Google DeepMind researcher Adam Roberts.
Try the “DJ mode” for yourself in the AI Test Kitchen.
Check out my quick test:
6. Pika characters can talk now
What’s up?
Pika just released a new feature called Lip Sync in Early Access for Pro users.
Why should I care?
You can make lip-synced videos directly on Pika. Previously, this required working with separate third-party tools to first generate the voice(s) and then animate the characters’ mouths. Now, thanks to a partnership with ElevenLabs, Pika can do this natively.
Where can I learn more?
Read the official announcement Tweet.
If you have a Pro account, try it for yourself.
Watch this hands-on guide from Curious Refuge:
7. Alibaba’s Emote Portrait Alive is uncannily good
What’s up?
Emote Portrait Alive (EMO) can animate static images by using voice clips as input.
Why should I care?
Not only do the resulting animated characters lip sync perfectly, but their head movements and facial expressions also match the emotion behind the voice. It’s nuts. You can effectively make characters sing and act. EMO is in the research phase for now.
Where can I learn more?
Visit the project page.
Read the research paper.
Check out this video with examples:
8. Adobe branches out into AI audio
What’s up?
Adobe Research demoed Project Music GenAI Control (catchy!) that lets people co-create music tracks with AI.
Why should I care?
The tool lets you generate a track from a text prompt (like other text-to-music models) and then make edits to it in the same interface. Users can extend clips, create loops, increase the intensity of a track, adjust tempo, remix sections, and more.
Where can I learn more?
Read the announcement blog post from Adobe.
9. New apps for Inflection’s Pi and Brave’s Leo
What’s up?
Inflection’s Pi chatbot now has a desktop app. Brave’s Leo AI assistant now has an Android app.
Why should I care?
The two companies aren’t related to each other, but each released an app that gives users a new way to access its assistant. Pi can now live on your desktop, only a click away. Leo can finally be installed as an app by Android users.
Where can I learn more?
Check out the Pi announcement on Twitter / X.
Download the Pi app for Mac or Windows.
Check out the Leo announcement post.
Download the Leo app.
🤦‍♂️ 10. AI fail of the week
In the process of getting to this, I had to first suffer through this:
⚔️ Sunday Showdown (Issue #1) - DALL-E 3 vs. Imagen 2: Which makes the best logo?
Hey, remember this Super Bowl ad that shows people proudly telling their doubters “Watch me!” and then outsourcing all the work to Microsoft Copilot?
I’d like to focus on the 45-second mark, where Copilot creates a logo for a truck repair garage called “Mike’s.”
Today, I want to see how two free, widely available tools (plus a bonus paid contestant) handle a similar task in the real world.
Let’s take a look: