10X AI (Issue #43): Consistent Characters in MJ, Yelp in Perplexity, and an Owl Viking
Sunday Showdown #3: ElevenLabs' "Sound Effects" vs. Meta's "Audiobox": Which one generates more realistic sounds?
Happy Sunday, friends!
Welcome back to 10X AI: a weekly look at generative AI covering the following:
AI News + AI Fail (free): I highlight nine noteworthy news stories of the week and share an AI photo or video fail for your entertainment.
Sunday Showdown + AI Tip (paid): I pit AI tools against each other in solving a specific real-world task and share a hands-on tip for working with AI.
Paid subscribers get instant access to every Sunday Showdown and other bonus perks.
Heads up: I’ll be traveling with the family for Easter next week, so I’ll likely skip an issue of 10X AI.
Let’s get to it!
🗞️ AI news
Here are this week’s AI developments.
1. Midjourney can do consistent characters at last
What’s up?
Midjourney introduced a new --cref parameter that lets you recreate the same character in multiple images from the URL of a single reference image.
Why should I care?
Character consistency is kind of the Holy Grail for many who work with image models. Being able to reuse a character across different images opens up new possibilities within visual storytelling, virtual photoshoots, and more.
Where can I learn more?
Check out Midjourney’s official announcement thread on Twitter / X.
Read my recent deep dive:
2. Pika Labs now also does sound effects
What’s up?
Pika Labs now lets users generate sounds for their video clips directly in the tool.
Why should I care?
Pika Labs makes it into 10X AI for the third week in a row. The company appears dead set on building a one-stop shop for video creation, including editing features, lip sync, and as of today, sound generation. Having the complete toolkit in one place is a major convenience for creators. Pika Labs is well on the way to becoming a go-to tool.
Where can I learn more?
Check out Pika’s official announcement on Twitter / X.
If you’re a paid user, you can already test the feature on Pika.art.
3. Claude 3 Haiku rolls out to the public
What’s up?
The smallest Claude 3 model, Haiku, can now be accessed via API and at Claude.ai by paying customers.
Why should I care?
While it’s the least impressive Claude 3 sibling, Haiku confidently outperforms other LLMs in its class, including GPT-3.5 and Gemini 1.0 Pro. Also, it’s the cheapest Claude 3 version to run, and its speed makes it invaluable for situations where short response times matter, like customer service.
Where can I learn more?
Read the official announcement post.
Check out the announcement thread on Twitter / X.
Watch this customer service use case demo by Anthropic:
4. Pi, everywhere, all at once
What’s up?
Inflection has rolled Pi out to Rakuten Viber, meaning it’s now available on 13 different platforms.
Why should I care?
While other companies are trying to one-up each other on LLM benchmarks, Inflection is focused on building a friendly chatbot that’s always a click away, accessible on whatever platform you’re using.
Where can I learn more?
Read the official announcement by Inflection.
Get Pi for Viber.
5. Perplexity integrates maps and Yelp reviews
What’s up?
Perplexity now shows maps and Yelp reviews for results involving local searches.
Why should I care?
Perplexity keeps moving toward richer, real-time answers. Now, users can instantly access Yelp reviews for businesses, relevant photos, location maps, and more. This should help Perplexity position itself as a viable alternative to Google for local results.
Where can I learn more?
Check out the official announcement thread on Twitter / X.
Follow the discussion under the Reddit announcement post.
Read the deep-dive article by The Verge.
6. Amazon can generate product listings from URLs
What’s up?
Amazon sellers will soon be able to create a full listing from a single URL.
Why should I care?
If you sell on Amazon but have your products listed on a separate direct-to-consumer site, you’ll soon be able to bring them to Amazon with a single click. Just provide a URL to your existing product, and Amazon’s AI will generate the entire Amazon listing for you. (You can edit it as much as needed afterward.) The rollout is starting now, and Amazon expects US sellers to have the functionality within several weeks.
Where can I learn more?
Read the official announcement post.
7. Devin is an autonomous AI software engineer
What’s up?
Cognition AI announced a new AI agent, Devin, that can code independently.
Why should I care?
Autonomous AI agents were all the rage about a year ago, but they didn’t quite work. Devin isn’t flawless either, but it can solve tasks on its own by creating a plan, writing the code, troubleshooting issues, and more. Working unassisted, Devin resolved almost 14% of the tasks in SWE-bench, a benchmark built from real-world GitHub issues. (In comparison, the next-in-line challenger, Claude 2, could only resolve about 5% with assistance.)
Where can I learn more?
Read the official announcement post (with many demo examples).
Watch this intro:
8. VLOGGER: image-to-animated-avatar research
What’s up?
Google’s VLOGGER can turn a single input image into an animated avatar.
Why should I care?
VLOGGER works a lot like EMO from Alibaba, using an input voice clip to animate a static reference image. But VLOGGER also offers the ability to dub videos into different languages, edit specific elements in a given video, and more. Like EMO, VLOGGER is just at the research stage for now.
Where can I learn more?
Read the research paper (PDF).
Visit the VLOGGER project page with many examples.
9. Meta’s EVE can edit videos using text commands
What’s up?
Emu Video Edit (EVE) achieves state-of-the-art performance on Text Guided Video Editing (TGVE) benchmarks.
Why should I care?
EVE would let people make precise changes to an existing video using simple text prompts without sacrificing output quality. It does this by editing each video frame while keeping the frames coherent with one another. EVE is also just in the research phase at this time.
Where can I learn more?
Read the research paper.
Check out the EVE project page with many examples.
🤦‍♂️ 10. AI fail of the week
“Excuse me, sir, can you tell me how I can get to AAAAAAAAAAAAHHH!!!”
For some strange reason, Substack doesn’t allow free subscribers to comment on posts with paid sections, but I am always open to your feedback. You can message me here:
⚔️ Sunday Showdown (Issue #3) - “Sound Effects” vs. “Audiobox”: Which is more realistic?
As Pika’s demo above shows, a sound effect can really elevate your story or video.
But Pika isn’t the only player in this game.
For instance, Meta AI’s Audiobox has been out since last December. And this week, I got early access to ElevenLabs’ AI Sound Effects.
So I wanted to see which of the two sounds more realistic.
Let’s find out!