10X AI (Issue #44): Gemini 1.5 Pro For All, Adobe Structure Reference, and Human Cocoons
Sunday Showdown #4: ChatGPT vs. Pi: Which one is funnier?
Happy Sunday, friends!
Welcome back to 10X AI: a weekly look at generative AI that covers the following:
AI News + AI Fail (free): I highlight nine noteworthy news stories of the week and share an AI photo or video fail for your entertainment.
Sunday Showdown + AI Tip (paid): I pit AI tools against each other in solving a specific real-world task and share a hands-on tip for working with AI.
Remember: Paid subscribers get instant access to every Sunday Showdown + AI Tip and other bonus perks.
This is a catch-up edition, so I’m highlighting items from the past two weeks.
Let’s get to it.
🗞️ AI news
Here are the AI developments from the past two weeks.
1. You can now try Google Gemini 1.5 Pro for free
What’s up?
Without much fanfare, Google made its current best model, Gemini 1.5 Pro, available to everyone for free via Google AI Studio.
Why should I care?
Anyone living in a supported country can now test Gemini 1.5 Pro. Try it with uploaded images, videos, and documents, or simply paste text into its massive 1M-token context window for analysis. AI Studio was a bit buggy with uploaded files in my testing, but it appears to work flawlessly with text-only interactions.
Where can I learn more?
Read this Reddit thread.
Check out Google AI Studio for yourself.
Watch this test drive by 1littlecoder:
2. Use images as a Structure Reference in Adobe Firefly
What’s up?
Adobe added a Structure Reference feature to Firefly, reminiscent of ControlNet in Stable Diffusion.
Why should I care?
Structure Reference gives you unprecedented control over the composition of your image. Firefly will mimic the layout and structure of the input image while following your text prompt, so you can, for example, pose subjects using model references or use a sketch to define the look of your images.
Where can I learn more?
Read the official announcement post.
Try Structure Reference yourself at Adobe Firefly.
Watch this intro by photoshopCAFE:
3. Hume’s EVI can chat naturally
What’s up?
Hume AI released a demo of its Empathic Voice Interface (EVI) that can chat with you in real time.
Why should I care?
AI voice chats are nothing new (ChatGPT can do it). What makes EVI special is its uncanny ability to pick up on emotional cues, adjust its responses based on continued interactions, and handle interruptions much better than any voice AI assistant I’ve tried so far. EVI’s demo voice is rather stilted and synthetic, but it can truly hold a dialogue.
Where can I learn more?
Read the official announcement post.
4. Remix images endlessly with Freepik Reimagine
What’s up?
Freepik Company has a new feature called Reimagine that instantly makes multiple variations of any uploaded image (the ones above are of my kawaii sofa).
Why should I care?
Reimagine works like Midjourney’s Vary (Subtle) / (Strong) command. But unlike Midjourney, Reimagine:
Is free to use (with a daily limit and a reduced feature set)
Works with any uploaded image (Midjourney only remixes its own generations)
Generates variations in real time
Where can I learn more?
Read the official announcement post.
5. Character.AI makes AI voices available for all
What’s up?
Character.AI made its Character Voice feature free for all users.
Why should I care?
Character Voice used to be available only to paid members. Now everyone can use it. Pick a voice for your characters from a library or clone your own by recording a short clip. Character Voice appears to work only in the official mobile app, though, so you’ll need to download that first.
Where can I learn more?
Read the official announcement post.
Find answers in the Character Voice FAQ.
6. Runway (lip)syncs up with Pika Labs
What’s up?
Runway released a lip sync feature, several weeks after Pika Labs did the same.
Why should I care?
Runway is generally considered the go-to AI video tool (despite its lackluster performance in my October 2023 test). For a few weeks, Pika had something Runway didn’t: the ability to make characters talk using a voice clip. Now, users in Runway’s Creative Partners Program can do this directly within Runway. (I’d expect this to be rolled out to all users soon.)
Where can I learn more?
Read and listen to this test by Tom’s Guide.
Consult the Lip Sync FAQ.
7. Two new launches from Stability AI
What’s up?
Stability AI released a 3D generation model called Stable Video 3D (SV3D) and an instruction-tuned coding model called Stable Code Instruct 3B.
Why should I care?
Both models offer state-of-the-art performance in their respective categories against competitors of comparable size. Stable Video 3D “advances the field of 3D technology, delivering greatly improved quality and multi-view”, while Stable Code Instruct 3B “outperforms larger size models such as CodeLlama 7B Instruct and performs comparably with StarChat 15B in software engineering-related tasks.”
Where can I learn more?
Read the official announcement post about Stable Video 3D.
Read the official announcement post about Stable Code Instruct 3B.
8. DBRX: The new king of open-source LLMs?
What’s up?
Databricks released a new state-of-the-art open LLM called DBRX.
Why should I care?
With new large language models coming out weekly, it’s hard to get excited about yet another one. But according to Databricks’ own benchmarks, DBRX is both faster and more capable than established open-source LLMs such as Mixtral and Llama 2. Being open source also means that others are free to build and iterate upon DBRX. It’s available as a base model (DBRX Base) and a fine-tuned version (DBRX Instruct).
Where can I learn more?
Chat with DBRX Instruct on Hugging Face.
Download the weights for DBRX Base or DBRX Instruct.
9. Bezi AI: a better way to work with 3D designs
What’s up?
Bezi AI offers a collaborative space and 3D generation for designers working on apps or games.
Why should I care?
Text-to-3D isn’t exactly new (see, e.g., Meshy or Luma Labs). But Bezi AI is more than a single-asset generator. It’s a complete suite of tools that lets you create new 3D objects from scratch, place many of them into a scene, and collaborate in real time on larger projects. In my limited testing, the quality and precision of generated 3D assets are great.
Where can I learn more?
Read the official announcement blog post.
Try Bezi AI for free on the website (you get 10 free credits upon signup).
🤦‍♂️ 10. AI fail of the week
What I was going for. The existential horror I saw before I got there:
Got anything on your mind?
Sadly, Substack doesn’t allow free subscribers to comment on posts with paid sections, but I am always open to your feedback.
You can message me directly:
⚔️ Sunday Showdown (Issue #4) - ChatGPT-4 vs. Pi: Which one tells better jokes?
In my ongoing AI Jest Daily experiment, I explore whether ChatGPT with DALL-E 3 can deliver passable visual humor.
But what about simply telling jokes?
How does ChatGPT—which is often seen as generic and stilted—fare against the friendly and chatty Pi?
Let’s find out!
The verdict may surprise you.
(Especially if you’re expecting the opposite result. That’s how surprises work.)