DALL-E 3: The Cartoonist
DALL-E 3 can create single-panel cartoons with surprising ease. It even does speech bubbles with text. Try it yourself and let your imagination flow.
Happy Thursday, you rapscallions!
The tip from my last 10X AI post was about accessing a free preview of DALL-E 3 through the Bing Image Creator.
Since then, I’ve used the Bing version of DALL-E 3[1] quite a bit to figure out how it compares to what’s currently out there.
My tentative conclusion: For photographic images, Midjourney still has a very slight edge.
Here are a few comparisons:
Fashion photo of a woman in a red dress
Street photo of an old man sitting on a bench
Wildlife photo of giraffes eating leaves
Now, DALL-E 3 images are really good. Especially the old man.
Yet they do sometimes end up having a certain plasticky, photoshopped vibe. (View the woman’s portrait in full resolution and you’ll see what I mean.)
But I discovered an unexpected area where DALL-E 3 truly shines: Single-panel cartoons. On this front, it beats both Midjourney and Stable Diffusion by miles.
Here’s what I mean.
Why DALL-E 3 is perfect for cartoons
There are a few things that make DALL-E 3 especially well-suited for cartoons.
Prompt adherence: DALL-E 3 follows instructions incredibly well. Much better than all the current alternative models. You can describe a scene in great detail and usually find that every element is accurately reflected.
Text rendering: DALL-E 3 is not only good at writing intelligible text, but it also knows how to attribute the words to the right speaker via a speech bubble. (This part might take several rerolls, but it’s definitely doable.)
Cartoony style: In the absence of additional instructions, DALL-E 3 defaults to a somewhat goofy, cartoonish vibe. As pointed out in a recent comment, vanilla DALL-E 3 “looks like a silly Disney poster.” It just so happens that, for this specific purpose, that’s exactly what we need.
What all of the above means is that you can consistently create complete cartoons[2] in one go by simply describing them to DALL-E 3 in natural language.
DALL-E 3 cartoon showcase
To test this, I asked ChatGPT to come up with a few one-panel cartoon concepts, then had DALL-E 3 (Bing version) render these.
Here’s a selection of my descriptions and the resulting unedited images:
Cartoon illustration: A sad suitcase saying to a purse during a romantic dinner date, "I’ve got baggage."
Cartoon illustration: Frustrated T-rex struggles to play the harp. He says "I'm a one-octave band!"
Cartoon illustration: A skydiver checking his phone mid-air, saying, "There’s no app for this?"
Cartoon illustration: A woman is eating a meal made up of literal "Likes" and "Shares." She says, "I'm on a social media diet"
Cartoon illustration: A group of aliens are taking pictures of a fire hydrant. One exclaims: "Earth architecture is just wow"
Cartoon illustration: A dog in a yoga class, standing on all fours and facing down, saying "I call this one 'Downward me'"
DALL-E 3 is so good at these that I’ve changed the way I do AI cartoons for my silly side project: AI Jest Daily. (It’s an experiment to trace the evolution of AI’s comedic abilities by having various AI tools generate images and the accompanying jokes or funny captions.)
Speaking of 100% AI-generated projects: ’s latest post includes a video where the script, images, voices, etc. are done entirely by AI. I was happy to contribute the black-and-white line drawings (made in Midjourney) to the film. Check out Tita Costa’s posts for some great AI imagery as well.
Try making your own AI cartoons (for free)
The cool part?
At the moment, you can replicate the above without any paid tools. All you need is a free ChatGPT account and a free Microsoft account.
Here’s the entire process:
Head to chat.openai.com and start a chat using the free GPT-3.5 model
Use the following prompt (or tweak for your own needs):
Please help me generate 10 one-sentence descriptions for a funny single-panel cartoon that contains a speech bubble with around 5 words. Use the following template: “Cartoon illustration: [Your cartoon description]”
Pick your favorite cartoon idea(s)
Head to bing.com/create and log in with your Microsoft account
Copy-paste ChatGPT’s cartoon description as the prompt and click “Create”
Pick your favorite cartoon
That’s it!
Rinse. Repeat.
If you get some especially awesome results, feel free to share them!
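If you run this loop often, the copy-paste steps above can be sketched in a few lines of Python. This is only a rough helper under my own assumptions: the function names (`build_idea_prompt`, `extract_descriptions`) are made up for illustration, and it doesn't call ChatGPT or Bing for you. It just builds the idea prompt and pulls the "Cartoon illustration: ..." lines out of whatever reply you paste in.

```python
import re

# Prefix taken from the prompt template used in this post.
TEMPLATE_PREFIX = "Cartoon illustration:"


def build_idea_prompt(n_ideas: int = 10, bubble_words: int = 5) -> str:
    """Build the idea-generation prompt to paste into ChatGPT (hypothetical helper)."""
    return (
        f"Please help me generate {n_ideas} one-sentence descriptions for a funny "
        f"single-panel cartoon that contains a speech bubble with around "
        f"{bubble_words} words. Use the following template: "
        f'"{TEMPLATE_PREFIX} [Your cartoon description]"'
    )


def extract_descriptions(reply: str) -> list[str]:
    """Return the lines of a pasted ChatGPT reply that follow the template.

    Assumes one description per line, optionally preceded by list
    numbering or a bullet (for example "1. " or "- ").
    """
    descriptions = []
    for line in reply.splitlines():
        # Drop leading list markers such as "1.", "2)", "-", "*" or "•".
        cleaned = re.sub(r"^\s*(?:\d+[.)]|[-*•])\s*", "", line).strip()
        if cleaned.startswith(TEMPLATE_PREFIX):
            descriptions.append(cleaned)
    return descriptions


if __name__ == "__main__":
    print(build_idea_prompt())
    sample_reply = (
        "Sure! Here are some ideas:\n"
        '1. Cartoon illustration: A sad suitcase saying to a purse, "I\'ve got baggage."\n'
        '2. Cartoon illustration: A skydiver checking his phone mid-air.\n'
    )
    for description in extract_descriptions(sample_reply):
        print(description)  # Paste each of these into bing.com/create.
```

Each extracted line is ready to paste into the Bing Image Creator as-is. Picking the funniest one is still on you.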
DALL-E 3 and the end of “prompt engineering”
Back in April, I ranted against AI gurus who try to convince you that “prompt engineering”[3] is some exclusive knowledge bestowed upon the chosen few.
My take has always been that “prompting” is really about becoming better at communication in general. Here’s a quote:
Because both ChatGPT and text-to-image tools are so great at understanding natural language, being good at prompting really comes down to being good at expressing what you want with words.
That’s even more true now!
With DALL-E 3, we’re moving even further away from “prompt engineering” and “splatterprompting” to a future where you can have a regular conversation with AI to get results. Simply talk to AI as you would to a human artist, and you’ll be fine.
Over to you…
Have you experimented with the free preview of DALL-E 3 yet? Did you discover other things it’s especially useful for? Can you think of any more interesting applications for a model that is this good at following instructions?
Send me an email at whytryai@substack.com or leave a comment.
[1] As mentioned, the Bing Image Creator version may well differ from the imminent official ChatGPT launch. I’ll definitely take DALL-E 3 for a spin when it’s finally released to ChatGPT Plus users.
[2] Yes, I am very proud of that quintuple alliteration. Thank you for asking.
[3] Understanding certain intricacies of how large language models process input might help you get more effective output on the first try, but you can always get there gradually through a back-and-forth conversation with e.g. ChatGPT.