Which AI Image Model Is the Best Speller? Let’s Find Out!
I test 7 image models to find those that can actually write.
Hey, remember back when image models used to write complete nonsense when trying to spell?
Oh, you don’t?
Here, take a look at my “The Real Spaghetti Western” from the Midjourney V4 days:
Fun was had by all:
But things started to change with the launch of DALL-E 3.
Suddenly, we had a model that could write short sentences that weren’t gibberish.
I was impressed enough to pen a post about using DALL-E 3 to make cartoons:
I then took my own medicine and switched to DALL-E 3 for my AI Jest Daily project.
A bit later, Ideogram came out, positioning itself as the “can spell” model.
Even Midjourney eventually started to catch up, introducing the ability to accurately write short words:
Now, one year later, image models that can spell are no longer a novelty.
Most people expect to be able to get at least some text inside an AI image.
So today, I’m testing the current crop of image models to find out which one is the best at writing text.
🪧 The contestants
I picked 7 participants for today’s challenge:
DALL-E 3 by OpenAI (via Microsoft Designer)
FLUX1.1 [pro] by Black Forest Labs (via Glif)
Ideogram 2.0 by Ideogram (via Ideogram)
Imagen 3 by Google (via Image FX)
Midjourney 6.1 by Midjourney (via Midjourney)
Recraft V3 by Recraft (via Recraft)
Stable Diffusion 3.5 Large by Stability AI (via Hugging Face)
This isn’t an exhaustive list of all text-to-image models. It’s notably missing Meta’s Emu, Adobe’s Firefly Image 3, and standalone sites like Leonardo and Playground.
I intentionally narrowed the field to models that market themselves as capable of spelling. I think you’ll forgive me for not including ones that write like this:
🧪 The test
Our challengers will compete to create the following three images of increasing difficulty.
🟢 Easy mode
For this one, they’ll need to spell just one short word. Here’s the prompt:
Vintage 1950s poster for a diner with the word "Hungry?"
🟡 Medium mode
Ramping things up, we’ll try for a 7-word speech bubble. Prompt:
Cartoon illustration: Starbucks coffee shop. A frowning customer holds a coffee cup, saying to the barista, “Even AI can spell better than you!” The barista stands behind the counter, next to coffee machines, stacked cups, and ingredients.
🔴 Hard mode
Finally, for a real challenge, we’ll have our models try to handle even longer text:
A woman with a shocked expression stands in the middle of a library. She's holding a sign that says "Wow! Can you believe this AI model is so good at writing text?”
To keep a level playing field, I’ll be doing the following:
All images will be square.[1]
Each model will make 4 images, and I’ll only pick their best result.[2]
Where applicable, I’ll keep the default settings for each model.
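If you'd like to replicate this at home, here's a minimal sketch of how the "best of 4" pass/fail call could be automated. It assumes you type out by hand what each image actually says, and that a round counts as a pass when at least one of the four transcriptions matches the target word for word, ignoring case and punctuation. The function names `words` and `passes` and the sample transcriptions are made up for illustration.

```python
import re


def words(text):
    """Lowercase the text and keep only its words, dropping punctuation."""
    # Treat curly apostrophes like straight ones so "She’s" equals "She's".
    normalized = text.lower().replace("’", "'")
    return re.findall(r"[a-z0-9']+", normalized)


def passes(target, transcriptions):
    """Best-of-N rule: pass if any transcription matches the target word for word."""
    return any(words(t) == words(target) for t in transcriptions)


# Hypothetical transcriptions of four images for the "Easy" prompt.
easy_target = "Hungry?"
print(passes(easy_target, ["Hungy?", "Hungrry", "Hungry!", "Hugry?"]))
print(passes(easy_target, ["Hungy?", "Hungrry", "Hunry", "Hugry?"]))
```

The first call passes because "Hungry!" differs from the target only in punctuation. The second fails because none of the four attempts spells the word correctly. A stricter judge could compare the raw strings instead.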
I expect everyone to nail the “Easy” mode.
Midjourney (and perhaps others) will very likely fail as early as “Medium” mode.
If we’re lucky, we might have at least one model pass the “Hard” mode.
Let’s see if I’m right!
📝 The results
So how did our contestants fare?
🟢 Easy mode
Here are the images:
DALL-E
What?! How’s this a fail right out of the gate? And from the original “good speller” DALL-E 3, no less! This is the best out of four, too:
Four takes, four fails. Come on, DALL-E 3!
FLUX
Nailed it!
I’ll forgive the “?” vs. “!” mix-up, especially since FLUX got the spelling consistently right in all four images.
Ideogram
Great!
Imagen
Excellent!
Midjourney
Not particularly inspired, but the text is correct!
Recraft
I sure am.
Stable Diffusion
Nice!
Side-by-side roundup
I’m still trying to get over the shock of DALL-E 3 stumbling on the easiest test.
But at least everyone else passed!
Scoreboard
DALL-E: 0
All other models: 1
🟡 Medium mode
Let’s turn up the heat!
DALL-E
Can it, DALL-E 3? Can it really?
FLUX
Why yes. Yes, it can.
Ideogram
Perfect!
Imagen
I expected a speech bubble, but I’ll take it, Imagen!
Midjourney
Oof, so close, Midjourney. But no.
Recraft
Excellent. Bonus points for getting all the scene details right as well.
Stable Diffusion
You sure turned this into a dark existential crisis kind of scene, Stable Diffusion, but you did spell everything correctly.
Side-by-side roundup
Scoreboard
DALL-E: 0
FLUX: 2
Ideogram: 2
Imagen: 2
Midjourney: 1
Recraft: 2
Stable Diffusion: 2
🔴 Hard mode
This is it, AI. The final boss. Ready? Fight!
DALL-E
It’s so bat indeed. So bat.
FLUX
Well played, FLUX.
Ideogram
Amazing. Extra points for the handwritten note. Nice touch, Ideogram.
Imagen
No!
Midjourney
You did what you could, Midjourney. You really did.
Recraft
The text works. Not sure about the odd font changes, but it’s a pass.
Stable Diffusion
Oh man, so close! “So” being the missing word here.
Side-by-side roundup
Scoreboard
DALL-E: 0
FLUX: 3
Ideogram: 3
Imagen: 2
Midjourney: 1
Recraft: 3
Stable Diffusion: 2
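For anyone who wants to double-check the arithmetic, here's a small sketch that rebuilds the scoreboard from the per-round outcomes. The pass/fail values are transcribed from the three rounds in this post (one point per passed round). The names `RESULTS` and `tally` are just illustrative.

```python
# Pass/fail outcome per round, transcribed from the results in this post.
# True means the model's best-of-four image spelled the text correctly.
RESULTS = {
    "DALL-E":           {"easy": False, "medium": False, "hard": False},
    "FLUX":             {"easy": True,  "medium": True,  "hard": True},
    "Ideogram":         {"easy": True,  "medium": True,  "hard": True},
    "Imagen":           {"easy": True,  "medium": True,  "hard": False},
    "Midjourney":       {"easy": True,  "medium": False, "hard": False},
    "Recraft":          {"easy": True,  "medium": True,  "hard": True},
    "Stable Diffusion": {"easy": True,  "medium": True,  "hard": False},
}


def tally(results):
    """Award one point per passed round and return the total per model."""
    return {model: sum(rounds.values()) for model, rounds in results.items()}


scores = tally(RESULTS)
for model, score in scores.items():
    print(f"{model}: {score}")

# Models tied for the top score are the ones that need a tiebreaker.
top_score = max(scores.values())
finalists = [model for model, score in scores.items() if score == top_score]
print("Tied at the top:", ", ".join(finalists))
```

Running it leaves FLUX, Ideogram, and Recraft tied on three points each.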
☠️ “Sudden death” mode (bonus round)
I’ll be honest: I didn’t expect as many as three models to survive this far.
To award the title of “Best Speller,” I’ll have to put FLUX, Ideogram, and Recraft through the ultimate stress test.
Here’s the prompt:
Yellow scrolling text against a black background: "A long time ago in a galaxy far, far away, three models fought a spelling battle to the death. Only one would win."
Let’s see those images, contestants!
FLUX
Almost! And now I want to live in a “gelaxy,” too.
Ideogram
Even the best of us have an off day, Ideogram. Don’t be too hard on yourself.
Recraft
I’ll. Be. Damned. You did it. You really did it. Recraft, you crazy son of a bitch!
Side-by-side roundup
🏆 The verdict
Ladies and gentlemen, I believe we’re ready to crown our champion.
🌟 God tier: Recraft
Recraft V3 is the undisputed leader of today’s spelling test.
While I’m not always a fan of its aesthetic, Recraft is undeniably capable of handling pretty long strings of error-free text.
So if you intend to add lots of text inside your AI-generated images, Recraft is a safe bet.
🥇 Tier #1: FLUX & Ideogram
Both FLUX1.1 [pro] and Ideogram 2.0 are solid spellers and only lost out to Recraft by a tiny margin.
Also, in my opinion, both of them typically create better-looking images than Recraft. This makes them better all-around models.
As such, my recommendation from last month still stands.
🥈 Tier #2: Imagen & Stable Diffusion
At our middle tier, with 2 points each, are Imagen 3 and Stable Diffusion 3.5 Large.
To me, Imagen is better aesthetically,[3] plus Stable Diffusion doesn’t always stay as faithful to the overall prompt.
So Imagen is the better pick of the two.
🥉 Tier #3: DALL-E & Midjourney
I’m hardly surprised to find Midjourney V6.1 here.
As much as I love Midjourney, spelling has always been its Achilles' heel.
But I’m shocked to see DALL-E 3 fail so spectacularly, scoring no points at all.
I must say, that’s the biggest disappointment of today’s experiment.
🫵 Over to you…
Have you been pushing any image models to their spelling limits? What’s been your experience? Are you aware of another AI model that can handle text well?
Leave a comment or drop me an email at whytryai@substack.com.
[1] Some tools, such as Imagen 3 in Image FX, can only create square images.
[2] Midjourney and Ideogram, for example, create a four-image grid per prompt by default.
[3] And I loved Imagen’s diner ad quite a bit.
I found Ideogram's handwriting in Hard Mode downright spooky. It's perfect. Consistent size, color and flow. I'll never trust pictures of handwriting again!
That was really fun; I wonder if there is some tech debt with OG models like DALL-E and Midjourney that keeps them spelling losers.