Great job on these. I am still trying to get one of these tools to accurately generate images of hands with all five fingers, in a single close-up, just changing the nail polish color and skin tone. Any thoughts on the best tool for this?
Well, Midjourney is actually pretty consistent at making anatomically correct hands now.
But keeping the same exact image while changing skin/nail colors isn't a built-in feature. I just tried doing this, and even if you keep the seed the same but change the nail color, the images are different.
Then again, if all you're after is keeping the exact same hand, I'm sure there are third-party tools that let you select nails and change them. You could even consider giving Generative Fill in Adobe Firefly a go for that!
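If you want to reproduce that seed experiment yourself, the prompt syntax looks roughly like this (the seed value and descriptions are just made-up examples):

```
/imagine extreme close-up of a hand with five fingers, glossy red nail polish, light skin tone --seed 1234
/imagine extreme close-up of a hand with five fingers, emerald green nail polish, deep skin tone --seed 1234
```

Even with an identical --seed, the second prompt usually comes back with a different pose and framing, which is exactly the drift I mentioned.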
The consistent character GPT seems far simpler, but I'm not qualified to do a head-to-head comparison with Midjourney. All these tools have their limitations, and I'm looking forward to exploring Midjourney as an alternative to ChatGPT once they make the website fully available.
I'm learning to upload an image I want to work with and ask ChatGPT to describe it. This tells me what language ChatGPT is using for that image. Then I'll make a template including ChatGPT's own description for additional image generations. This seems to really help keep characters consistent, though again, perfection is typically not an option. Most of the time it's close enough.
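For anyone who wants to script that workflow, here's a rough Python sketch using the OpenAI API; the model names, the template wording, and the image URL are placeholder assumptions rather than anything ChatGPT prescribes:

```python
# Rough sketch of the "describe, then regenerate" workflow via the OpenAI API.
# Model names, the template text, and the image URL are placeholder assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

REFERENCE_IMAGE_URL = "https://example.com/reference-character.png"  # hypothetical

# Step 1: ask the vision model to describe the reference image in its own words.
description = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this character in detail: face, hair, clothing, style."},
            {"type": "image_url", "image_url": {"url": REFERENCE_IMAGE_URL}},
        ],
    }],
).choices[0].message.content

# Step 2: reuse that description verbatim as the template for new generations.
prompt = f"{description}\n\nNow show the same character riding a bicycle through a rainy city."
result = client.images.generate(model="dall-e-3", prompt=prompt, size="1024x1024")
print(result.data[0].url)
```

The point is simply that step 2 reuses the model's own description word for word, which is what seems to keep the character recognizable across generations.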
Yeah the big win here is that Midjourney is much better than DALL-E 3 for photographic imagery, so being able to use it for consistent character generation is huge for virtual photoshoots, etc.
And the process you've described is very similar to this tip I've shared back in the day: https://www.whytryai.com/p/10x-ai-22-canva-magic-studio-linkedin-zoom#%C2%A7mimic-styles-without-relying-on-artists-names
It's about using image recognition to recreate+describe the image, so you can reuse that style.
The Midjourney website should go live relatively soon for everyone, as they've now lowered the entry requirement to 1K images generated (used to be 10K). So my guess is a couple of months at most, if not sooner.
ChatGPT/DALL-E is experiencing a meltdown today, so Midjourney is sounding ever more interesting. I'm guessing I'm not going to be able to get serious work done with just one image generator. We'll see...
Thanks as always for all your useful information!
Oh wow, this is going to unlock a ton of new use cases for people just trying to make things for themselves. Any insight into how they're able to do this technically? I'm very curious about what changes they made to the model to make this work.
Midjourney doesn't really share much about its inner workings, unfortunately. Even during office hours, the focus tends to be on what's coming and what's being worked on, but not the "how."
But it appears they've found a way to consistently isolate styles, aesthetic details, and other specifics in their own images. That's what makes the style tuners possible (you select your preferred pictures so they can be blended into a unique, reusable style). It's also what's behind --sref (style isolation), and now they seem to be applying a similar approach to isolating subjects.
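For context, here's roughly what that looks like at the prompt level; the URLs and weight values below are made-up examples, and the subject-isolation part presumably maps to the --cref/--cw parameters:

```
/imagine a woman reading in a sunlit cafe --sref https://example.com/style-board.png --sw 200
/imagine the same woman hiking a mountain trail --cref https://example.com/character-ref.png --cw 80
```

As I understand it, --sw controls how strongly the style reference is applied, while --cw controls how much of the referenced character (face only vs. face, hair, and clothing) carries over.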
If you do stumble upon a technical explanation, I'd love to hear it! (And I will likewise let you know if I see anything.)
Interesting, how do you feel about Midjourney still operating on Discord? (Feels kind of amateurish to me)
And have you tried Artflow? It’s been doing this for a while (though it has some of the same problems and uses a freemium model)
Yeah I've never been a fan of the Discord interface, but I've gotten quite used to it over time.
Then again, I've had access to the Alpha website since December and have been using that exclusively. Wrote about that here (https://www.whytryai.com/p/10x-ai-32-midjourney-alpha-google-imagen-2).
I use Discord for the tutorials to make sure others can follow.
But Midjourney has since lowered the website access threshold to 1K generated images, and they're gearing up for a mass rollout soon. So I'd guess that within a few months, everyone who wants the website will be able to use it!
I did check out Artflow back in the day. There are many great alternative UIs like Leonardo, Playground, etc. I'm so used to Midjourney and DALL-E 3 in ChatGPT Plus that I haven't actively been looking for additional interfaces to use.
Are you a frequent Artflow user?
Gotcha gotcha, my image generation needs are pretty few and far between, so I mostly use DALL-E
I’m more just waiting for a company to be able to render fully animated stories with consistent characters over long runtimes, which is why I mentioned Artflow
Obviously we’re not there yet, but Sora, it seems, gets close
Yeah, Sora is pure witchcraft, and Mira Murati said in a recent interview that she expects a public release this year, possibly within a few months.
I heard it takes ten minutes to render, but that’s totally worth it for the quality, especially if it’s bundled into the GPT subscription
Yeah, 10 minutes for a 1-minute magical video based purely on a text prompt is nothing. And if recent history is anything to go by, we should expect the speed to improve quickly and the costs to come down as well after it launches.
Completely agree
I can imagine what you're gonna do with Claude's API using ASCII characters. You certainly have a talent! Keep rocking!
ASCII art, the final art frontier!
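If anyone wants to play with the Claude-plus-ASCII idea, here's a minimal sketch using the Anthropic Python SDK; the model name and the prompt are just assumptions for illustration:

```python
# Minimal sketch: asking Claude for ASCII art via the Anthropic Python SDK.
# The model name and prompt wording are assumptions, not a prescribed setup.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

message = client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": "Draw a small rocket ship as ASCII art, no wider than 40 characters.",
    }],
)

print(message.content[0].text)
```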
I know I'm different than most, but I'm usually not happy until I've chained together three different apps anyway.
Why settle for less when you can have more?!