I found Ideogram's handwriting in Hard Mode downright spooky. It's perfect. Consistent size, color and flow. I'll never trust pictures of handwriting again!
Right? I was also a bit taken aback. If it's any consolation, the other 3 images had regular "typed out" text, so the handwriting seems to be somewhat of a fluke. Or a "reverie," in the terminology of Westworld. Wait, that only makes it more spooky. Nevermind.
That was really fun; I wonder if there's some tech debt with OG models like DALL-E and Midjourney that keeps them spelling losers.
In the case of Midjourney, it's a simple matter of the architecture and training data. The team has been focusing on aesthetic aspects rather than spelling. I'm sure they'll get better in future models.
As for DALL-E 3, Andrew Smith and I just found out that it does quite a bit better on the simple "Hungry?" prompt specifically if you use it inside ChatGPT vs. Microsoft Designer. I'm not quite sure why that is, but it's not inconceivable that they're different iterations of DALL-E 3 and that the ChatGPT one is more up to date since it comes directly from the source (OpenAI).
Glad you liked the post!
I still do all my images with Microsoft Bing/Copilot. The iPad version still generates 4 images and lets you apply styles/sizes. The new version is more tightly integrated with Designer, so I'm sure that's where they're going; there's a competitive angle with Designer/Canva. The new Copilot app is optimized for conversation, not images, and I don't like it. I'm not ready for an AI friend yet.
Microsoft Bing/Copilot also uses DALL-E 3 under the hood. The question is whether it's yet another iteration compared to ChatGPT. The plot thickens.
Love, love, love this post. This is a real issue for any of us who try to get these tools to include text in images, and this is such a great reference point. I'm definitely taking note to try out Ideogram and Recraft and use Flux more.
The comments on the results are pure gold, hilarious
Happy you liked it and found it useful, Patrick!
Yeah, I'd say Ideogram has lately become my go-to model when I'm not using Midjourney. And today's results only strengthen the case. Recraft's a great speller, but it does awfully with things like fingers, which most other image models have figured out by now.
Keep me posted after you've used the models and share your thoughts. Maybe you'll have a different experience!
Oh great, another tip - I will focus on trying out Ideogram and Flux some more. One side note on this, sort of apples to oranges - I find that Claude is flawless at spelling words correctly when it's creating images as part of a project / artifact. Have you noticed that too?
But Claude doesn't really generate images per se in the same way diffusion models do, right? It just creates diagrams/tables/charts etc., so the text is basically direct input rather than pixel-by-pixel generation as with diffusion (image) models. So I'd expect it to be flawless by default.
Unless you're thinking of something else?
Haha, that's too funny!
As expected, DALL-E is really too outdated!
I've been using DALL-E 3 pretty regularly for AI Jest Daily cartoons (successfully), so I was completely taken aback by its inability to get "Hungry" right.
Poor old DALL-E 3. It ain't what it used to be.
This was interesting and useful, thanks. I've been using mostly DALL-E and Flux, so now I know what to do when DALL-E fumbles text pics. ApPeciATTTe iT!
You got it, Phil - Ideogram is generally my go-to "all-round" recommendation for a great image model that's good at following instructions and looks nice. And you get 10 free generations (40 images) per day.
yOU aRe WelcOMe!
Thanks again, I'm off to explore Ideogram.
Well done, Daniel and Recraft! Holy crap, that's really good.
I still use DALL-E almost every day (just the built-in GPT generator). Any chance it works better in conjunction with Omni? Like, does it work better when used from ChatGPT itself? The words are still tough to get right, but nothing as terrible as this!
I doubt it'd make a difference. All the large language model does inside ChatGPT is create the prompt itself. The image is still rendered by the same DALL-E 3 model as the one used in Microsoft Designer.
One way you could test it is by taking my prompt from here and asking ChatGPT for an image with *that exact prompt* and explicitly telling it to not make any changes or additions.
But I'm also quite puzzled by "Hungry" being such a trip-up word for DALL-E 3. I've been using the Microsoft Designer version for the last 100 or so issues of AI Jest Daily. And yes, it takes some rerolls, but nothing as bad as four strikes out of four for a single word.
I just tried the prompt you used. I pasted exactly this:
Vintage 1950s poster for a diner with the word "Hungry?"
Let's take things over to chat, where things get serious.
(that's for people reading this who might want to consider upgrading)