DALL-E 3: The Cartoonist

DALL-E 3 can create single-panel cartoons with surprising ease. It even does speech bubbles with text. Try it yourself and let your imagination flow.

Oct 05, 2023

Happy Thursday, you rapscallions!

The tip from my last 10X AI post was about accessing a free preview of DALL-E 3 through the Bing Image Creator.

Since then, I’ve used the Bing version of DALL-E 3 quite a bit to figure out how it compares to what’s currently out there.

My tentative conclusion: For photographic images, Midjourney still has a very slight edge.

Here are a few comparisons:

Fashion photo of a woman in a red dress

Street photo of an old man sitting on a bench

Midjourney (left) vs. DALL-E 3 (right)

Wildlife photo of giraffes eating leaves

Midjourney (left) vs. DALL-E 3 (right)

Now, DALL-E 3 images are really good. Especially the old man.

Yet they do sometimes end up having a certain plasticky, photoshopped vibe. (View the woman’s portrait in full resolution and you’ll see what I mean.)

But I discovered an unexpected area where DALL-E 3 truly shines: Single-panel cartoons. On this front, it beats both Midjourney and Stable Diffusion by miles.

Here’s what I mean.

Why DALL-E 3 is perfect for cartoons

There are a few things that make DALL-E 3 especially well-suited for cartoons.

Prompt adherence: DALL-E 3 follows instructions incredibly well. Much better than all the current alternative models. You can describe a scene in great detail and usually find that every element is accurately reflected.
Text rendering: DALL-E 3 is not only good at writing intelligible text, but it also knows how to attribute the words to the right speaker via a speech bubble. (This part might take several rerolls, but it’s definitely doable.)
Cartoony style: In the absence of additional instructions, DALL-E 3 defaults to a somewhat goofy, cartoonish vibe. As Rationaltail pointed out in a recent comment, vanilla DALL-E 3 “looks like a silly Disney poster.” It just so happens that, for this specific purpose, that’s exactly what we need.

What all of the above means is that you can consistently create complete cartoons in one go by simply describing them to DALL-E 3 in natural language.


DALL-E 3 cartoon showcase

To test this, I asked ChatGPT to come up with a few one-panel cartoon concepts, then had DALL-E 3 (Bing version) render these.

Here’s a selection of my descriptions and the resulting unedited images:

Cartoon illustration: A sad suitcase saying to a purse during a romantic dinner date, "I’ve got baggage."

Cartoon illustration: Frustrated T-rex struggles to play the harp. He says "I'm a one-octave band!"

Cartoon illustration: A skydiver checking his phone mid-air, saying, "There’s no app for this?"

Cartoon illustration: A woman is eating a meal made up of literal "Likes" and "Shares." She says, "I'm on a social media diet"

Cartoon illustration: A group of aliens are taking pictures of a fire hydrant. One exclaims: "Earth architecture is just wow"

Cartoon illustration: A dog in a yoga class, standing on all fours and facing down, saying "I call this one 'Downward me'"

DALL-E 3 is so good at these that I’ve changed the way I do AI cartoons for my silly side project: AI Jest Daily. (It’s an experiment to trace the evolution of AI’s comedic abilities by having various AI tools generate images and the accompanying jokes or funny captions.)

Speaking of 100% AI-generated projects:

Tita Costa’s latest post includes a video where the script, images, voices, etc. are done entirely by AI. I was happy to contribute the black-and-white line drawings (made in Midjourney) to the film. Check out Tita Costa’s posts for some great AI imagery as well.

Try making your own AI cartoons (for free)

The cool part?

At the moment, you can replicate the above without any paid tools. All you need is a free ChatGPT account and a free Microsoft account.

Here’s the entire process:

Head to chat.openai.com and start a chat using the free GPT-3.5 model
Use the following prompt (or tweak for your own needs):
Please help me generate 10 one-sentence descriptions for a funny single-panel cartoon that contains a speech bubble with around 5 words. Use the following template: “Cartoon illustration: [Your cartoon description]”
Pick your favorite cartoon idea(s)
Head to bing.com/create and log in with your Microsoft account
Copy-paste ChatGPT’s cartoon description as the prompt and click “Create”
Pick your favorite cartoon

That’s it!

Rinse. Repeat.

If you get some especially awesome results, feel free to share them!
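If you'd rather script the text side of the process, here's a minimal Python sketch. It only builds the idea-generation prompt from the template above and pulls the "Cartoon illustration: ..." lines out of a reply you've pasted in. It doesn't call ChatGPT or Bing. The function names (`build_idea_prompt`, `extract_cartoon_prompts`) and the sample reply are my own illustrative assumptions, not part of any official tool.

```python
# Hypothetical helpers for the manual workflow above: no API calls are made.
PREFIX = "Cartoon illustration:"

PROMPT_TEMPLATE = (
    "Please help me generate {count} one-sentence descriptions for a funny "
    "single-panel cartoon that contains a speech bubble with around "
    "{bubble_words} words. Use the following template: "
    "“Cartoon illustration: [Your cartoon description]”"
)


def build_idea_prompt(count=10, bubble_words=5):
    """Fill in the idea-generation prompt to paste into ChatGPT."""
    return PROMPT_TEMPLATE.format(count=count, bubble_words=bubble_words)


def extract_cartoon_prompts(reply):
    """Return every line of a pasted reply that contains a cartoon description.

    Anything before the "Cartoon illustration:" prefix (list numbers,
    bullets) is dropped so each result can be pasted straight into
    the image generator.
    """
    prompts = []
    for line in reply.splitlines():
        start = line.find(PREFIX)
        if start != -1:
            prompts.append(line[start:].strip())
    return prompts


if __name__ == "__main__":
    print(build_idea_prompt())

    # A made-up reply in the shape ChatGPT tends to return.
    sample_reply = (
        "Sure! Here are some ideas:\n"
        "1. Cartoon illustration: A skydiver checking his phone mid-air, "
        "saying, \"There's no app for this?\"\n"
        "2. Cartoon illustration: A frustrated T-rex struggles to play the harp.\n"
        "Let me know if you want more!"
    )
    for idea in extract_cartoon_prompts(sample_reply):
        print(idea)
```

Each extracted line is ready to paste into bing.com/create as-is. Picking a favorite is still on you.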

DALL-E 3 and the end of “prompt engineering”

Back in April, I ranted against AI gurus who try to convince you that “prompt engineering” is some exclusive knowledge bestowed upon the chosen few.

My take has always been that “prompting” is really about becoming better at communication in general. Here’s a quote:

Because both ChatGPT and text-to-image tools are so great at understanding natural language, being good at prompting really comes down to being good at expressing what you want with words.

That’s even more true now!

With DALL-E 3, we’re moving even further away from “prompt engineering” and “splatterprompting” to a future where you can have a regular conversation with AI to get results. Simply talk to AI as you would to a human artist, and you’ll be fine.

Over to you…

Have you experimented with the free preview of DALL-E 3 yet? Did you discover other things it’s especially useful for? Can you think of any more interesting applications for a model that is this good at following instructions?

Send me an email at whytryai@substack.com or leave a comment.

Why Try AI
