Sunday Rundown #48: Firefly Image 3 Is Here (And So Is “Clone-Mom”)
Sunday Showdown #8: Firefly Image 3 vs. DALL-E 3: Which one is better at prompt adherence and text rendering?
Happy Sunday, friends!
Welcome back to the Sunday Rundown (formerly “10X AI”)—a weekly look at generative AI that covers the following:
Sunday Rundown + AI Fail (free): I share this week’s AI news and a fail for your entertainment.
Sunday Showdown + AI Tip (paid): I pit AI tools against each other in solving a specific real-world task and share a hands-on tip for working with AI.
Let’s get to it.
🗞️ AI news
The news section looks different this week, so let me explain:
What’s changing?
Instead of an unstructured list, I now have four news categories:
AI releases: New products, features, or models that you can try right away.
AI research: New AI demos and previews that aren’t yet available to the public.
AI resources: New sites, tools, or videos that teach you about AI.
AI random: Other notable stories (if any) that don’t fit in the above categories.
Why?
Two reasons:
Arbitrarily picking exactly 9 news items for the “10X AI” gimmick felt unnecessarily constraining. (This also explains the new “Sunday Rundown” name.)
I wanted a more organized and skimmable overview to help you quickly get the gist and choose which stories to follow more closely.
I hope you like the new format. Please vote in the poll below to let me know!
👩‍💻 AI Releases
New stuff you can try right now:
Adobe released the next iteration of its AI image model: Firefly Image 3. It produces higher-quality images and is better at prompt adherence and text rendering.
(Read the announcement post. | Try making images on firefly.adobe.com.)
Microsoft launched Phi-3, a new family of small language models. Phi-3-mini, Phi-3-small, and Phi-3-medium outperform similarly sized competitor models.
(Read the announcement post. | Try the models on Hugging Face.)
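If you'd rather poke at Phi-3-mini from Python than in the browser, here's a minimal sketch using the Hugging Face transformers library. The model ID "microsoft/Phi-3-mini-4k-instruct" and the trust_remote_code flag are my assumptions based on the Hugging Face listing, so double-check the model card before running it:

```python
# Minimal sketch: chatting with Phi-3-mini via Hugging Face transformers.
# The model ID below is an assumption -- confirm it on the Phi-3 model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-mini-4k-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",      # pick a dtype suited to your hardware
    device_map="auto",       # use a GPU if one is available
    trust_remote_code=True,  # Phi-3 may ship custom modeling code
)

# Format the conversation with the model's chat template, then generate.
messages = [{"role": "user", "content": "Summarize prompt adherence in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```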
Apple introduced an open-source family of language models called OpenELM. They are intended to run on mobile devices and come in four sizes: 270M, 450M, 1.1B, and 3B parameters.
(Read the research paper. | Get the source code on GitHub. | Get the model weights on Hugging Face.)
MyShell upgraded its voice-cloning model, OpenVoice, to V2. I wasn't impressed with V1, but V2 sounds far more convincing.
(Read about the release on Hugging Face. | Get the source code on GitHub. | Try the demo on Hugging Face or MyShell Chat.)
Snowflake released Arctic, a “top-tier enterprise-grade” LLM. It’s a cost-effective model that excels at enterprise tasks like SQL generation.
(Read the announcement post. | Download the model from GitHub. | Try the demo on Hugging Face.)
Blockade Labs launched version 3 of its Skybox AI model. Skybox AI generates immersive 360-degree worlds from text prompts, and version 3 can do this in 8K resolution.
(Follow the announcement thread on Twitter / X. | Try Skybox for yourself on skybox.blockadelabs.com.)
Midjourney introduced a “--sref random” parameter to help you explore styles. It appends a random style reference number to your prompt, changing the look of the generated images. If you find a style you like, you can reuse its --sref number in future prompts (see the example below).
(Read the announcement (requires Discord). | See examples by @aliejules on Twitter / X.)
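To make that concrete, here’s a hypothetical example: prompting “a watercolor fox in a misty forest --sref random” renders the image in a randomly chosen style, and the number Midjourney picked appears in place of “random” in the finished job’s prompt (something like “--sref 762351”, a made-up value for illustration). Copy that number into your next prompt to keep the same look across images.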
🔬 AI Research
Cool stuff you might get to try one day:
Adobe introduced VideoGigaGAN, which upsamples videos by 8x (read more and see impressive examples)
📖 AI Resources
Helpful stuff that teaches you about AI:
“Google AI Essentials” - learn AI skills from Google’s experts
“What Is an AI Anyway?” - TED talk by Mustafa Suleyman [VIDEO]
🤦‍♂️ AI fail of the week
I wanted an image for the “Hush Little Baby” post, not existential nightmares
✅ This week’s poll:
💬 Anything to share?
Sadly, Substack doesn’t allow free subscribers to comment on posts with paid sections, but I am always open to your feedback. You can message me here:
⚔️ Sunday Showdown (Issue #8) - Firefly Image 3 vs. DALL-E 3: Which one is better at spelling and following instructions?
DALL-E 3 was the first text-to-image model to become scarily good at following detailed prompts in addition to rendering text. (Since then, Midjourney V6 has mostly caught up.)
Now, Adobe is claiming that its new Firefly Image 3 offers “astonishingly rich detail and prompt accuracy” as well as “clear text displays.”
And what better way to put that claim to the test than to pit Firefly Image 3 head-to-head against the OG of prompt accuracy, DALL-E 3?
Let’s see how the two handle my test prompts…