I found Ideogram's handwriting in Hard Mode downright spooky. It's perfect. Consistent size, color and flow. I'll never trust pictures of handwriting again!
Right? I was also a bit taken aback. If it's any consolation, the other 3 images had regular "typed out" text, so the handwriting seems to be somewhat of a fluke. Or a "reverie," in the terminology of Westworld. Wait, that only makes it more spooky. Nevermind.
That was really fun; I wonder if there's some tech debt with OG models like DALL-E and Midjourney that keeps them spelling losers.
In the case of Midjourney, it's a simple matter of the architecture and training data. The team has been focusing on aesthetic aspects rather than spelling. I'm sure they'll get better in future models.
As for DALL-E 3, Andrew Smith and I just found out that it does quite a bit better on the simple "Hungry?" prompt specifically if you use it inside ChatGPT vs. Microsoft Designer. I'm not quite sure why that is, but it's not inconceivable that they're different iterations of DALL-E 3 and that the ChatGPT one is more up to date since it comes directly from the source (OpenAI).
Glad you liked the post!
I still do all my images with Microsoft Bing/Copilot. The iPad version still generates 4 images and lets you apply styles/sizes. The new version is more tightly integrated with Designer, so I'm sure that's where they're going; there's a competitive angle with Designer/Canva. The new Copilot app is optimized for conversation, not images, and I don't like it. I'm not ready for an AI friend yet.
Microsoft Bing/Copilot also uses DALL-E 3 under the hood. The question is whether it's yet another iteration compared to ChatGPT. The plot thickens.
Love, love, love this post. This is a real issue for any of us who try to get these tools to include text in images, and this is such a great reference point. I'm definitely taking note to try out Ideogram and Recraft and use Flux more.
The comments on the results are pure gold, hilarious
Happy you liked it and found it useful, Patrick!
Yeah, I'd say Ideogram has lately become my go-to model when I'm not using Midjourney. And today's results only strengthen the case. Recraft's a great speller, but it does awfully with things like fingers, which most other image models have figured out by now.
Keep me posted after you've used the models and share your thoughts. Maybe you'll have a different experience!
Oh great, another tip - I will focus on trying out Ideogram and Flux some more. One side note on this, sort of apples to oranges - I find that Claude is flawless at spelling words correctly when it's creating images as part of a project / artifact. Have you noticed that too?
But Claude doesn't really generate images per se in the same way diffusion models do, right? It just creates diagrams/tables/charts etc., so the text is basically direct input rather than pixel-by-pixel generation as with diffusion (image) models. So I'd expect it to be flawless by default.
Unless you're thinking of something else?
Haha, that's too funny!
As expected, DALL-E is really too outdated!
I've been using DALL-E 3 pretty regularly for AI Jest Daily cartoons (successfully), so I was completely taken aback by its inability to get "Hungry" right.
Poor old DALL-E 3. It ain't what it used to be.
This was interesting and useful, thanks. I've been using mostly DALL-E and Flux, so now I know what to do when DALL-E fumbles text pics. ApPeciATTTe iT!
You got it, Phil - Ideogram is generally my go-to "all-round" recommendation for a great image model that's good at following instructions and looks nice. And you get 10 free generations (40 images) per day.
yOU aRe WelcOMe!
Thanks again, I'm off to explore Ideogram.
Well done, Daniel and Recraft! Holy crap, that's really good.
I still use DALL-E almost every day (just the built-in GPT generator). Any chance it works better in conjunction with Omni? Like, does it work better when used from ChatGPT itself? The words are still tough to get right, but nothing as terrible as this!
I doubt it'd make a difference. All the large language model does inside ChatGPT is create the prompt itself. The image is still rendered by the same DALL-E 3 model as the one used in Microsoft Designer.
One way you could test it is by taking my prompt from here and asking ChatGPT for an image with *that exact prompt* and explicitly telling it to not make any changes or additions.
But I'm also quite puzzled by "Hungry" being such a trip-up word for DALL-E 3. I've been using the Microsoft Designer version for the last 100 or so issues of AI Jest Daily. And yes, it takes some rerolls, but nothing as bad as four strikes out of four for a single word.
I just tried the prompt you used. I pasted exactly this:
Vintage 1950s poster for a diner with the word "Hungry?"
Let's take things over to chat, where things get serious.
(that's for people reading this who might want to consider upgrading)