The "Secret Sauce" Behind DALL-E 3: How Is It…

Nov 2, 2023

I look at the main takeaways from OpenAI's "Improving Image Generation with Better Captions" research paper.

6 Comments

Jun 30, 2024

My DALL-E picture of animals dancing had a four-and-a-half-legged giraffe and a monstrous hedgehog. I also find its text writing still leaves a LOT to be desired. Having read your paper twice now, I have tried things like spacing the letters out so that they become individual images... it just leads to spaced-out garble. Still thoroughly enjoying my "art" though, and love your newsletter. More expert prompts to try, please...

Daniel Nest

Jul 1, 2024

That sounds odd - are you using DALL-E 3 inside of ChatGPT or another place?

I know you're a NightCafe user and they do offer the DALL-E 3 model, but I don't know how similar it is to the original. You can also try using DALL-E 3 in Microsoft Copilot (https://copilot.microsoft.com/) by asking it to draw something.

As for text in images, it works for shorter messages of up to 6-7 words and then the quality degrades quite a bit. I use it daily for the "AI Jest Daily" cartoons: https://daijest.substack.com/

Charlie Guo

Nov 2, 2023

This is an excellent analysis! I hadn't seen the DALL-E 3 paper but I'm going to take a look at it later this week.

My main challenges with DALL-E 3 are 1) I'm pretty bad at image prompting, and 2) I don't love a lot of the default image styles that come out. For better or worse, I do like the heavily stylized and/or photorealistic Midjourney output. Is that something you've been able to replicate via better DALL-E prompts?

Daniel Nest

Nov 3, 2023

Yeah it's worth a look for sure!

To your points:

1) That's the beauty of having DALL-E 3 in ChatGPT. You don't have to know a thing about prompting at all, just a vague idea of what you'd like to see. Then ChatGPT spits out images, you tell it what to fix, etc. until you have what you need.

If you haven't already, I recommend checking out Ethan Mollick's latest piece where he actually showcases this brief iterative process with DALL-E 3: https://www.oneusefulthing.org/p/working-with-ai-two-paths-to-prompting

2) Yup, Midjourney definitely still shines, especially when it comes to photographic images. (That's another comparison you can see in Ethan's article above.)

Like you, I still think MJ is the best model in terms of image quality, so I'm excited to see if the MJ team manages to use OpenAI's learnings to also perfect its prompt adherence.

Also, Midjourney just released a new feature called "Style Tuner" that basically lets you personalize the look of any prompt imaginable based on your preferences (https://docs.midjourney.com/docs/style-tuner). I'll be showing more of it in the upcoming 10X AI, but I encourage you to check it out if you're paying for MJ.

Andrew Smith

Nov 2, 2023

"Fortunately, OpenAI has access to an obscure little language model called GPT-4."

Why was I not informed?!?

I have been an enormous beneficiary of Dall-E 3. Holy crap is the art I'm making better, and I suspect it's only upward from here. I'll look forward to continuing to hear about these little updates from you!

Daniel Nest

Nov 3, 2023

It's pretty clear that things will only get better from here on out. As I wrote, most of the weaknesses of DALL-E 3 are fixable, and I'm sure OpenAI will iterate on those in the future!

Why Try AI
