Man. From one perspective, there has never been a better time to be a consumer! All these trillion-dollar companies are fighting to make tools that are gonna make my life loads better (although they may also make the world loads worse in the meantime).
I thought we'd have more than 2 weeks for me to report back on those Gemini images, but I'm pretty sure I'll be using 4o for the foreseeable future.
This makes me think about the convergence talk we had before. Whenever a rival firm invents something, it's just a matter of time before everyone benefits. That "matter of time" is very uncertain, but in the past the lag has been very long, whereas it seems to be getting shorter. In other words, the overwhelming majority of us humans now benefit from a tide that lifts all boats much, much faster.
When they announced this yesterday, I immediately thought of you. You already use ChatGPT for most of your images, except it's been powered by the rather obsolete DALL-E 3 model up to now.
I'm curious to see what you think of the new aesthetic because it might actually take away from some of that animated style you're used to.
At the same time, the ability for nuanced edits is a major shift. So I'm curious to hear what you think once you've played with it.
Also, technically OpenAI had this capability before Google, but as expected, Google's Gemini 2.0 launch probably pushed the release timeline forward quite a bit!
Good point on the updates not necessarily working w/my current style. I bet I get thrown a curve for the first few days, then adjust. Today I used a GIF (you saw) and a really short post for me, so I didn't mess w/images (rare!), but I'm pretty confident I will produce a few tomorrow.
And you're right to point out the back-and-forth corporate battle of release timing. That's a really interesting phenomenon.
Tip: If you find that the new default aesthetic isn't close to what you're used to, try grabbing one of your favorite earlier images, uploading it, and asking ChatGPT to mimic the style when making your new image. It should be really great at that.
Good tip! I'll try that if I hate everything.
As people tend to gravitate towards solutions with less friction, it seems any dedicated app - image, audio, video, etc. - will have a tough time going forward if one interface can do it all.
Yup, that's my main take here, too.
Not only is 4o image generation very competitive in terms of quality, but it actually understands the context of your request, unlike somewhat "dumb" traditional image models that just draw what you tell them.
And if I can have this integrated into a tool that also handles my data analysis, coding, and chats and can create images related to those....well....
Sora also has these new image features. It has 3 aspect-ratio options (3:2, 1:1, 2:3) and can generate up to 4 pics each time. It has replaced my ImageFX workflow now.
By the way, the aspect ratio has a heavy impact on the output image. I get a lot of satisfaction out of trying different ratios.
Yup, I've been playing around a lot with it on Sora.com - also fun to see what everyone else is making (once you filter through the same obvious memes).
That's a cool observation: have you found any pattern in how each aspect ratio affects the same prompt?
It can brilliantly choose the best angle for the prompt. For example, in a wrestling scene, 3:2 would give a side view, while 2:3 would give a front view.
Nice!
And that makes a lot of sense - context awareness is what separates native image generation by 4o from regular diffusion models.
The other day, I asked for an image of Harry Potter set in Soviet Russia, inspired by my past Midjourney series, and it knew enough to generate a nametag for Harry that said "G. Potter," but in Cyrillic.
Small correction: it still uses prompts, and you can see one if you right-click while it's generating the image and select "copy" or "select text". You'll get something like this:
{ "prompt": "A cute, happy little robot with a friendly face, small size, big expressive eyes, and a metallic but soft-looking body. The robot is standing in a cheerful pose, maybe waving or giving a thumbs-up. The background is light and simple to highlight the robot.", "size": "1024x1024" }
I'm pretty blown away by how well it works. Spent a while playing with it this morning. Really next level imo. That said, at this point I think it might be technically better but have worse "taste" - the aesthetic seems very samey and flat. But I expect that will be improved rapidly enough too.
Interesting! I wasn't able to reproduce this in ChatGPT on my end - are you able to grab a screenshot?
Also, my point was less about claiming that self-prompting doesn't happen (I'm not 100% sure how 4o handles this under the hood just yet) and more about pointing out how one would spot a DALL-E 3-generated image in ChatGPT (it has an "i" icon after generation) versus a 4o-generated image (which doesn't).
But I agree on both counts:
1. It works insanely well. I tried doing what they did on the live stream and taking pictures of the real world and turning those into cartoons, sketches, etc. - the level of detail it notices and renders is insane.
2. Yeah, it has a bit of a vanilla vibe - definitely when compared to something as opinionated as Midjourney, which has its own beautification aesthetic by default. Having said that, you can easily upload a reference style that you like and force 4o to mimic it, nudging it out of its default "samey" look.
Crazy time to be alive!
It seems it's possible on mobile app but not on my laptop. I can't share images in comments, but I'll message you some screenshots.
Yeah, I need to explore different ways of using it some more. I reckon there will be more of an art to prompting it well, especially now that we can prompt it using multiple images and whole conversations.
Thanks, I'll try it on my mobile app and see if I can replicate it. (Only tried the browser version so far.)
And I actually think the prompting part might be even less significant now than it was before. You can quite literally tell ChatGPT to come up with 10 alternative art styles and render your image as those. Adding your own reference images just makes it even more direct. Would love to hear if you end up stumbling upon some neat tricks in your experiments.
Turns out you can't send images over message either, but I've shared them as a note here - https://substack.com/@josephrahi/note/c-103633354 :)
AI Jest could write itself now
Indeed - in fact, it can easily handle multi-panel cartoons now. That's like AI Jest times [however many panels you make].
Substack needs AI AGENT Jest. I bet you could automate just about everything now, including publishing. Even a little QA filter with triggers for Daniel intervention. So much material floating around. I bet subscriber growth would be nuggledyfuts
Then I can have an AI Jest Editor Agent to monitor the AI Jest Cartoonist agents and keep them in line. It's a publishing business that runs itself!
You must do it, if only for the good of the AI Economy.