Now that Google’s version is out, the pressure is on OpenAI to catch up.
The landscape is changing fast.
We may soon wave goodbye to the era of separate features stitched into unholy amalgams. Instead, we’ll have truly omnimodal models handling everything on their own.
…yet I still keep seeing people use both of these constantly, simply copy-pasting random-descriptor-filled prompts at scale without reflection.
When you give users a model that intuitively knows what they want and can accurately create it on demand, they’ll flock to this model at the expense of alternatives.
Perhaps nothing illustrates this better than the launch of a new, state-of-the-art image model, which by cruel fate happened just before the OpenAI announcement.2
I ran a few tests and am impressed by Reve’s quality, instruction following, and text rendering.3
Reve is also free to use, so go ahead and try it over at preview.reve.art.
But here’s the thing: While text-to-image nerds like myself will happily try new sites and geek out about marginal improvements in diffusion models, most regular users will want something that “just works” inside a tool they’re already using.
And that’s exactly what the new 4o image creation in ChatGPT does.
My guess?
ChatGPT’s newfound drawing skills will open the floodgates for chat-based image generation by mainstream audiences.
I doubt it will wipe out existing text-to-image prompting methods overnight, but it’ll certainly shift the conversation toward a more intuitive way of doing things.
In fact, I won’t be surprised if we eventually look back at text-to-image prompt input boxes as relics of a bygone era.
Am I wrong?
Why Try AI is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.
🫵 Over to you…
Am I too quick to dismiss current image models and their interfaces? Can chat-based image creation and text-to-image prompt boxes coexist? Do they serve different purposes and appeal to different groups of people?
🔗 Share it to help others discover this newsletter.
🗩 Comment below—I love hearing your opinions.
Hot Takes are occasional timely posts that focus on fast-moving news and releases, in addition to my regular Thursday and Sunday columns.
If Hot Takes aren’t your cup of tea, simply go to your account at www.whytryai.com/account and toggle the “Notifications” settings accordingly.
Man. From one perspective, there has never been a better time to be a consumer! All these trillion-dollar companies are fighting to make tools that are gonna make my life loads better (although they may also make the world loads worse in the meantime).
I thought we'd have more than 2 weeks for me to report back on those Gemini images, but I'm pretty sure I'll be using 4o for the foreseeable future.
This makes me think about the convergence talk we had before. Whenever a rival firm invents something, it's just a matter of time before everyone benefits. That "matter of time" is very uncertain, but in the past the times have been very long, whereas they seem to be getting shorter. In other words, the overwhelming majority of us humans now benefit from a tide that lifts all boats much, much faster.
As people tend to gravitate towards solutions with less friction, it seems any dedicated app (image, audio, video, etc.) will have a tough time going forward if one interface can do it all.