Which AI Image Model Is the Best Speller? Let’s Find Out!

I test 7 image models to find those that can actually write.

Daniel Nest

Nov 14, 2024

Hey, remember back when image models used to write complete nonsense when trying to spell?

Oh, you don’t?

Here, take a look at my “The Real Spaghetti Western” from the Midjourney V4 days:

A Western style saloon in a town, made out of spaghetti. Midjourney V4. Source: My Reddit post.

Fun was had by all:

But things started to change with the launch of DALL-E 3.

Suddenly, we had a model that could write short sentences that weren’t gibberish.

I was impressed enough to pen a post about using DALL-E 3 to make cartoons:

DALL-E 3: The Cartoonist (October 5, 2023)

I then took my own medicine and switched to DALL-E 3 for my AI Jest Daily project.

A bit later, Ideogram came out, positioning itself as the “can spell” model.

Even Midjourney eventually started to catch up, introducing the ability to accurately write short words:

Midjourney Version 6: Look Ma, Text! (December 28, 2023)

Now, one year later, image models that can spell are no longer a novelty.

Most people expect to be able to get at least some text inside an AI image.

So today, I’m testing the current crop of image models to find out which one is the best at writing text.

🪧 The contestants

I picked 7 participants for today’s challenge:

DALL-E 3 by OpenAI (via Microsoft Designer)
FLUX1.1 [pro] by Black Forest Labs (via Glif)
Ideogram 2.0 by Ideogram (via Ideogram)
Imagen 3 by Google (via Image FX)
Midjourney 6.1 by Midjourney (via Midjourney)
Recraft V3 by Recraft (via Recraft)
Stable Diffusion 3.5 Large by Stability AI (via Hugging Face)

This isn’t an exhaustive list of all text-to-image models. It’s notably missing Meta’s Emu, Adobe’s Firefly Image 3, and standalone sites like Leonardo and Playground.

I intentionally narrowed the field to models that market themselves as capable of spelling. I think you’ll forgive me for not including ones that write like this:

🧪 The test

Our challengers will compete to create the following three images of increasing difficulty.

🟢 Easy mode

For this one, they’ll need to spell just one short word. Here’s the prompt:

Vintage 1950s poster for a diner with the word "Hungry?"

🟡 Medium mode

Ramping things up, we’ll try for a 7-word speech bubble. Prompt:

Cartoon illustration: Starbucks coffee shop. A frowning customer holds a coffee cup, saying to the barista, “Even AI can spell better than you!” The barista stands behind the counter, next to coffee machines, stacked cups, and ingredients.

🔴 Hard mode

Finally, for a real challenge, we’ll have our models try to handle even longer text:

A woman with a shocked expression stands in the middle of a library. She's holding a sign that says "Wow! Can you believe this AI model is so good at writing text?”

To keep a level playing field, I’ll be doing the following:

All images will be square. [1]
Each model will make 4 images, and I’ll only pick their best result. [2]
Where applicable, I’ll keep the default settings for each model.
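
If you'd like to run a similar test yourself, the "best of four" rule is easy to sketch in Python. This is only an illustration: it assumes you type out what each image says by hand, the helper names `normalize` and `passes_round` are made up, and ignoring case and punctuation is an assumed level of leniency rather than a rule of this test.

```python
import string


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace (assumed leniency)."""
    # Treat curly quotes and apostrophes as punctuation too.
    drop = string.punctuation + "“”‘’"
    cleaned = text.translate(str.maketrans("", "", drop))
    return " ".join(cleaned.lower().split())


def passes_round(target: str, transcriptions: list[str]) -> bool:
    """A model passes if at least one of its images spells the target correctly."""
    wanted = normalize(target)
    return any(normalize(t) == wanted for t in transcriptions)


# Hypothetical hand-typed transcriptions of one model's four images.
attempts = ["HUNGRY!", "Hungry!", "HUNGRY!", "Hungry!"]
print(passes_round("Hungry?", attempts))  # True: punctuation is ignored
```

A stricter test would simply skip the punctuation step in `normalize`.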

I expect everyone to nail the “Easy” mode.

Midjourney (and perhaps others) will very likely fail as early as “Medium” mode.

If we’re lucky, we might have at least one model pass the “Hard” mode.

Let’s see if I’m right!

📝 The results

So how did our contestants fare?

🟢 Easy mode

Here are the images:

DALL-E

What?! How’s this a fail right out of the gate? And from the original “good speller” DALL-E 3, no less! This is the best out of four, too:

Vintage 1950s poster for a diner with the word "Hungry?" - by DALL-E 3 (4 image grid) — “I bunniste yinGgry all the time, man. All the time!”

Four takes, four fails. Come on, DALL-E 3!

FLUX

Nailed it!

I’ll forgive the “?” vs. “!” mix-up, especially since FLUX got the spelling consistently right in all four images.

Ideogram

Great!

Imagen

Excellent!

Midjourney

Not particularly inspired, but the text is correct!

Recraft

I sure am.

Stable Diffusion

Nice!

Side-by-side roundup

Vintage 1950s poster for a diner with the word "Hungry?" Left to right: DALL-E, FLUX, Ideogram, Imagen, Midjourney, Recraft, Stable Diffusion.

I’m still trying to get over the shock of DALL-E 3 stumbling on the easiest test.

But at least everyone else passed!

Scoreboard

DALL-E: 0
All other models: 1

🟡 Medium mode

Let’s turn up the heat!

DALL-E

Cartoon illustration: Starbucks coffee shop. A frowning customer holds a coffee cup, saying to the barista, “Even AI can spell better than you!” The barista stands behind the counter, next to coffee machines, stacked cups, and ingredients. by DALL-E

Can it, DALL-E 3? Can it really?

FLUX

Why yes. Yes, it can.

Ideogram

Perfect!

Imagen

I expected a speech bubble, but I’ll take it, Imagen!

Midjourney

Oof, so close, Midjourney. But no.

Recraft

Excellent. Bonus points for getting all the scene details right as well.

Stable Diffusion

You sure turned this into a dark existential crisis kind of scene, Stable Diffusion, but you did spell everything correctly.

Side-by-side roundup

Scoreboard

DALL-E: 0
FLUX: 2
Ideogram: 2
Imagen: 2
Midjourney: 1
Recraft: 2
Stable Diffusion: 2

🔴 Hard mode

This is it, AI. The final boss. Ready? Fight!

DALL-E

A woman with a shocked expression stands in the middle of a library. She's holding a sign that says "Wow! Can you believe this AI model is so good at writing text?” by DALL-E

It’s so bat indeed. So bat.

FLUX

Well played, FLUX.

Ideogram

Amazing. Extra points for the handwritten note. Nice touch, Ideogram.

Imagen

No!

Midjourney

You did what you could, Midjourney. You really did.

Recraft

The text works. Not sure about the odd font changes, but it’s a pass.

Stable Diffusion

Oh man, so close! “So” being the missing word here.
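
A slip like this is easy to miss by eye, and a word-level diff can pinpoint it. Here's a minimal sketch using Python's standard `difflib` module. The `actual` string is an assumption about what the sign said (the prompt text minus the dropped word), and `word_diff` is a hypothetical helper.

```python
import difflib


def word_diff(expected: str, actual: str) -> list[str]:
    """Return human-readable differences between two texts, word by word."""
    exp_words, act_words = expected.split(), actual.split()
    matcher = difflib.SequenceMatcher(a=exp_words, b=act_words, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "delete":
            changes.append(f"missing: {' '.join(exp_words[i1:i2])}")
        elif tag == "insert":
            changes.append(f"extra: {' '.join(act_words[j1:j2])}")
        elif tag == "replace":
            changes.append(
                f"wrong: {' '.join(act_words[j1:j2])} "
                f"(expected {' '.join(exp_words[i1:i2])})"
            )
    return changes


expected = "Wow! Can you believe this AI model is so good at writing text?"
# Assumed transcription: identical to the prompt except for the dropped word.
actual = "Wow! Can you believe this AI model is good at writing text?"
print(word_diff(expected, actual))  # ['missing: so']
```

An empty list means the text matches word for word.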

Side-by-side roundup

Scoreboard

DALL-E: 0
FLUX: 3
Ideogram: 3
Imagen: 2
Midjourney: 1
Recraft: 3
Stable Diffusion: 2
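
For anyone keeping their own tally, the scoreboard boils down to one point per passed round. Here's a minimal sketch: the pass/fail values are copied from the three rounds above, while the `results` layout and the variable names are purely illustrative.

```python
# One entry per round: the set of models that passed it.
results = {
    "easy": {"FLUX", "Ideogram", "Imagen", "Midjourney", "Recraft", "Stable Diffusion"},
    "medium": {"FLUX", "Ideogram", "Imagen", "Recraft", "Stable Diffusion"},
    "hard": {"FLUX", "Ideogram", "Recraft"},
}
models = ["DALL-E", "FLUX", "Ideogram", "Imagen", "Midjourney", "Recraft", "Stable Diffusion"]

# One point per passed round.
scores = {m: sum(m in passed for passed in results.values()) for m in models}

# Highest score first; ties keep the alphabetical order of `models`.
for model, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{model}: {score}")

# Everyone tied for the top score.
top = max(scores.values())
finalists = [m for m in models if scores[m] == top]
print("Finalists:", ", ".join(finalists))
```

Adding another round is just one more entry in `results`.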

☠️ “Sudden death” mode (bonus round)

I’ll be honest: I didn’t expect as many as three models to survive this far.

To award the title of “Best Speller,” I’ll have to put FLUX, Ideogram, and Recraft through the ultimate stress test.

Here’s the prompt:

Yellow scrolling text against a black background: "A long time ago in a galaxy far, far away, three models fought a spelling battle to the death. Only one would win."

Let’s see those images, contestants!

FLUX

Yellow scrolling text against a black background: "A long time ago in a galaxy far, far away, three models fought a spelling battle to the death. Only one would win." by FLUX

Almost! And now I want to live in a “gelaxy,” too.

Ideogram

Even the best of us have an off day, Ideogram. Don’t be too hard on yourself.

Recraft

I’ll. Be. Damned. You did it. You really did it. Recraft, you crazy son of a bitch!

Side-by-side roundup

🏆 The verdict

Ladies and gentlemen, I believe we’re ready to crown our champion.

🌟 God tier: Recraft

Recraft V3 is the undisputed leader of today’s spelling test.

While I’m not always a fan of its aesthetic, Recraft is undeniably capable of handling pretty long strings of error-free text.

So if you intend to add lots of text inside your AI-generated images, Recraft is a safe bet.

🥇 Tier #1: FLUX & Ideogram

Both FLUX1.1 [pro] and Ideogram 2.0 are solid spellers and only lost out to Recraft by a tiny margin.

Also, in my opinion, both of them typically create better-looking images than Recraft. This makes them better all-around models.

As such, my recommendation from last month still stands.

🥈 Tier #2: Imagen & Stable Diffusion

In our middle tier, with 2 points each, are Imagen 3 and Stable Diffusion 3.5 Large.

To me, Imagen is better aesthetically [3], plus Stable Diffusion doesn’t always stay as faithful to the overall prompt.

So Imagen is the better pick of the two.

🥉 Tier #3: DALL-E & Midjourney

I’m hardly surprised to find Midjourney V6.1 here.

As much as I love Midjourney, spelling has always been its Achilles' heel.

But I’m shocked to see DALL-E 3 fail so spectacularly, scoring no points at all.

I must say, that’s the biggest disappointment of today’s experiment.

🫵 Over to you…

Have you been pushing any image models to their spelling limits? What’s been your experience? Are you aware of another AI model that can handle text well?

Leave a comment or drop me an email at whytryai@substack.com.

Related posts:

7 Text-To-Image AI Models: Tested (December 14, 2023)

10 of My Most Popular AI Image Series (+Prompts) (March 21, 2024)

[1] Some tools, such as Imagen 3 in Image FX, can only create square images.

[2] I chose four because some tools, such as Midjourney and Ideogram, create a four-image grid per prompt by default.

[3] And I loved Imagen’s diner ad quite a bit.

