AI Can Be Funny. But It Needs Your Help.
Observations after more than a year of making joke cartoons with AI.
In the month of June 2023, a unique confluence of two seemingly unrelated events would make Daniel—who, for reasons not entirely clear and highly inconvenient for this already needlessly long-winded sentence, was referring to himself in the third person—consider starting a new project.
Two things happened in June 2023:
AI chatbots like Bing started receiving image recognition capabilities.
I discovered that Midjourney could render images reminiscent of the iconic daily cartoon in The New Yorker.
This made me wonder whether AI chatbots with vision capabilities could successfully write joke captions for AI-generated images.
The only way to find out, I decided, was to start yet another Substack publication.
“Surely,” I said, “the public sentiment toward AI-generated content will forever remain rosy and positive and never turn against generative AI.”
On that naive note, AI Jest Daily was born.
My reasons for starting it were two-fold.
First, I was curious to see how a purely AI-driven Substack would fare.
Second, I wanted to try to answer the question we seem to love asking over and over and over again: Just how funny can AI be?
Now, 430+ issues later, I’d like to share my thoughts on what it’s like to try extracting semi-passable humor out of AI on a daily basis.
Let’s roll!
How AI Jest Daily changed over time
As generative AI evolved, so did AI Jest Daily.
As of now, AI Jest Daily has gone through two distinct eras.
Era 1: Cartoon-driven jokes
For the first 113 issues, I followed this process:
Ask Midjourney to create a black-and-white cartoon à la The New Yorker.
Upload my favorite of the resulting images to Bing.[1]
Ask Bing to come up with funny captions based on the image.
Select the winning caption and three runners-up (with Bing’s input).
I was essentially recreating The New Yorker’s caption contest with AI.
The result would look something like this:
During this time, a thought piece on AI humor by another writer featured a few AI Jest Daily cartoons.
The main characteristic of this era was that I’d always start with the cartoon, and then try to get AI to make jokes about it.
This continued until a little text-to-image model called DALL-E 3 came along…
Era 2: Joke-driven cartoons
At the time, DALL-E 3 set a new standard for prompt adherence and text rendering.
Shortly after it came out, I realized that DALL-E 3 could:
Create cartoon-style images that closely followed detailed prompts.
Render short speech bubbles with accurate text within those images.
So my concept for AI Jest Daily shifted from leading with the cartoon itself to leading with the joke.
As of this writing, my typical daily process looks like this:
Ask ChatGPT to brainstorm cartoon ideas on a given topic.
Refine those ideas through back-and-forth with ChatGPT.
Pick the final cartoon and ask DALL-E 3[2] to make it.
(I even have a custom GPT called “ToonSmith” for this.)
How my role changed over time
My original plan was to outsource the entire process to AI.
The idea was for Bing / ChatGPT to write the jokes, evaluate them, suggest the best ones, etc. without any involvement from my side.
I’d be a mere curator of images and the one who pressed the “Publish” button like a trained lab rat.
But I soon realized that it’s quite hard to completely extricate yourself from the process, especially after the initial pool of low-hanging jokes got exhausted.
Over time, I found myself increasingly “nudging” AI in the right direction while still letting it write the final joke.
Today, I am essentially the editor-in-chief of AI Jest Daily, with ChatGPT being the head writer who submits ideas for my approval and receives refinement suggestions.
I still try to stop myself from feeding ChatGPT the exact setup or punchline, but I’m far more hands-on than I’d expected to be when I started.
Now that AI and I have churned out several hundred cartoons, am I ready to definitively answer the “Can AI be funny?” question?
Well, I could certainly try.
What types of humor is AI best at?
Let me start with the classic caveat: Humor is very personal and subjective.[3]
I might find dad jokes amusing, but sociopaths like you might hate them because you’re dead inside.
So instead of giving a blanket answer as to ChatGPT’s comedy chops, I’ll share my observations on what types of humor AI is best suited for.
To help with this, I’ll use the “11 Funny Filters” popularized by Scott Dikkers, co-founder of The Onion.[4]
Here they are, with The Onion headlines as examples:
Now let me share my thoughts on large language models and types of comedy.
AI excels at…
Analogy
Parody
Wordplay
Wordplay is incredibly easy for large language models. After all, assembling tokens into words and sentences is what they do.
Playing with different meanings of a term for comedic effect is at the core of wordplay humor, and LLMs know every meaning of every word.
In fact, if you ask an AI chatbot for “a joke” without specifying it further, you’re very likely to get a bunch of regurgitated puns.
LLMs are also great at identifying similarities between different concepts. They’re connection-making machines, which is why analogy is another type of humor that comes easily to them.
Finally, LLMs are great at mimicking the tone of voice and style of a given author, celebrity, etc., which makes them a natural fit for parody.
One of the first things ChatGPT was ever asked to do was explain how to remove a peanut butter sandwich from a VCR in the style of the King James Bible, and it delivered:
So if you’re after clear-cut, on-the-nose humor, large language models are the way to go.
AI can handle…
Character
Reference
Character and reference have similar roots to parody and analogy, but they require more nuance and subtlety to execute effectively.
Their goal is to give a knowing nod to a relatable trope or exhibit the traits of a well-established character stereotype without being too explicit about it.
Unfortunately, subtlety isn’t AI’s strong suit, as we’ll see in a moment.
Still, because LLMs can recognize popular references and character traits, they can usually manage these types of humor with a little help in toning things down.
The good news ends here.
Let’s now look at the areas where large language models tend to suck.
AI struggles with…
Irony
Hyperbole
Madcap
Meta-humor
Misplaced focus
Shock
Yup, that’s more than half the categories.
AI tends to fail at most of them, but for different reasons that I’ll unpack here.
First, let’s take irony, meta-humor, and misplaced focus.
What unifies these otherwise different comedy filters is the need for subtlety.
Misplaced focus only works if you don’t yell: “Look at me! I’m ignoring the obvious elephant in the room while focusing on this inconsequential mouse! Isn’t that hilarious?!”
And so on.
But here’s the thing: Overexplaining is LLMs’ entire shtick.
They’re trained with the explicit purpose of being useful assistants who communicate concepts clearly and exhaustively.
God forbid they leave something unsaid and force you to read between the lines!
Take today’s AI Jest Daily cartoon, inspired by a 100% real news headline (and combining the wordplay and reference filters):
At the risk of sounding like an LLM for a split second, this cartoon relies on drawing a parallel between our favorite meth cook Walter White in Breaking Bad and the common expression “breaking bread,” which wouldn’t be out of place in a food bank, while piggybacking on the suspiciously convenient real-world setup.
Whether or not the joke is your cup of tea, what’s supposed to make it work for the reader is the “Aha” moment when they put two and two together.
It rewards the reader for making the connection on their own, as jokes often do.
But when I asked ChatGPT to riff on the idea, it simply couldn’t stop spelling out the punchline. Here are the worst offenders:
ChatGPT is that one friend who elbows you in the ribs after every joke, saying “Get it?! GET IT?!”
We get it, ChatGPT!
We f-u-c-k-i-n-g get it.[5]
Now let’s take madcap and hyperbole.
These two filters are the opposite of subtle. They often work by painting a vivid picture that’s exaggerated, incongruent, or wacky.
So given that LLMs can’t be subtle, you’d think madcap and hyperbole would be a cakewalk.
But you’d be wrong.
With both madcap and hyperbole, there’s an underlying intent and purpose behind their over-the-top setup. There’s a method to the madness.
Hyperbole requires a solid understanding of the shared experience you’re shining a spotlight on.
Madcap needs at least some grounding in reality to be funny. It takes more than stringing random words together and calling it a day.
You can’t just say “banana frisbee donkey fart burgers” and expect anyone over the age of four to laugh. (Shame on you if you did! Shame!)
But LLMs are notorious for lacking a coherent world model, which is one of the many things people like Gary Marcus criticize them for.
As such, their attempts at madcap and hyperbole often end up being donkey-fart burgers. This was one of my observations when I pitted ChatGPT-4 against Pi in a joke-telling contest.
Finally, there’s shock.
Now, shock humor can get quite nuanced, but it can also just be a matter of throwing a couple of “shits” and “fucks” into a sentence…if you’re a lazy hack.
Here, mainstream LLMs aren’t limited by their inherent reasoning abilities.
Instead, what gets in the way is the fact that they’re deliberately trained to be respectful, inoffensive, and uncontroversial.
Now, it’s of course possible to work around this specific limitation.
For one, you can get your hands on an uncensored LLM that doesn’t have the same constraints as mass-market models.
Hell, even vanilla ChatGPT can be nudged out of its polite mode with a bit of prodding, like when I got it to gently roast a picture of me:
But the point is that LLMs will never default to shock comedy by themselves.
Observations & tips
Because this post wasn’t long enough, I wanted to share a few practical observations.
Maybe they’ll be helpful if you ever try making AI cartoons of your own.
1. Break the process down into steps
Instead of asking AI to come up with fleshed-out cartoon ideas in one go, here’s the approach that now works for me after plenty of trial and error:
Ask ChatGPT to list relatable tropes about my chosen topic.
Pick a trope I like and ask for ways to make it funny.
Choose a promising angle and ask ChatGPT to brainstorm single-panel cartoon ideas.
Iterate on an idea with ChatGPT until I’m satisfied, then ask for a visual description of the cartoon.
Ask for alternative speech bubbles to land on the best one.
Request the final cartoon from DALL-E 3 or Ideogram.
I find that this approach generally results in better cartoons, as I get the chance to steer ChatGPT in the most promising direction at every turn.
It’s a bit like chain-of-thought prompting, except you’re working through the steps together with ChatGPT instead of prompting it to do it solo.
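If you’d like the step-by-step flow spelled out, here’s a minimal Python sketch of it. It doesn’t call any real chatbot API: `ask` is a hypothetical stand-in for sending a prompt to ChatGPT and getting candidate answers back, and `pick` is a hypothetical stand-in for the human choosing the most promising option. The prompt wording in `STEPS` is illustrative only, and the back-and-forth iteration is collapsed into a single pick per step.

```python
from typing import Callable, List

# Illustrative prompt templates mirroring the steps above (hypothetical wording).
STEPS = [
    "List relatable tropes about {topic}.",
    "Suggest ways to make this trope funny: {choice}",
    "Brainstorm single-panel cartoon ideas for this angle: {choice}",
    "Write a visual description of this cartoon: {choice}",
    "Suggest alternative speech bubbles for this cartoon: {choice}",
]


def run_pipeline(
    topic: str,
    ask: Callable[[str], List[str]],
    pick: Callable[[List[str]], str],
) -> List[str]:
    """Walk through the steps, letting a human steer the model at every turn.

    `ask` stands in for the chatbot (prompt in, candidate answers out).
    `pick` stands in for the human editor choosing one candidate.
    Returns the choice made at each step, in order.
    """
    choices: List[str] = []
    choice = ""
    for template in STEPS:
        # Each prompt builds on whatever the human picked in the previous step.
        prompt = template.format(topic=topic, choice=choice)
        options = ask(prompt)
        choice = pick(options)
        choices.append(choice)
    return choices


if __name__ == "__main__":
    # Toy stand-ins: the "model" returns two options, the "human" takes the first.
    demo = run_pipeline(
        "cats",
        ask=lambda prompt: [f"idea A for: {prompt}", f"idea B for: {prompt}"],
        pick=lambda options: options[0],
    )
    for step, picked in enumerate(demo, start=1):
        print(step, picked)
```

The last two choices (the visual description and the chosen speech bubble) are what you’d then hand over to DALL-E 3 or Ideogram yourself; the sketch stops before that step.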
2. More constraints = better jokes
Just like us, large language models get more creative when nudged out of their comfort zone.[6]
Ask ChatGPT for “jokes about cats,” and it’ll happily stick to its favorite cringy wordplay:
But give it a few constraints, and you’ll get more interesting results:
Are these top-tier jokes? Not quite.
But several of them have potential and certainly give you more to work with than the purr-fectly generic originals.
3. Speech bubbles are fickle beasts
While both DALL-E 3 and Ideogram can reliably render text, you’ll still run into many oddities, like:
Speech bubbles assigned to the wrong character (even when the prompt correctly identifies the speaker)
Missing speech bubbles
Misspelled or dropped words
Words bleeding from speech bubbles into random text in the image
To improve your chances, I find that the following things help:
Limiting the speech bubble to about 6-7 words.
Having one speaking character per cartoon.
Describing the speaker and the speech bubble early on in the prompt.
Trying different ways to phrase the same joke if a specific combination of letters consistently fails.
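Here’s a rough Python sketch of how the first three rules of thumb could be baked into an image prompt template. The `build_image_prompt` helper, its prompt wording, and the 7-word cap are hypothetical choices based on the tips above, not anything DALL-E 3 or Ideogram actually enforces.

```python
# Rule of thumb from the tips above; neither DALL-E 3 nor Ideogram enforces it.
MAX_BUBBLE_WORDS = 7


def build_image_prompt(speaker: str, bubble: str, scene: str) -> str:
    """Build a cartoon prompt for a single speaking character (hypothetical helper).

    Taking exactly one `speaker` keeps the cartoon to one speaking character.
    The speaker and the speech bubble text go at the start of the prompt,
    and the rest of the scene description follows.
    """
    word_count = len(bubble.split())
    if word_count > MAX_BUBBLE_WORDS:
        raise ValueError(
            f"Speech bubble has {word_count} words; "
            f"try trimming it to {MAX_BUBBLE_WORDS} or fewer."
        )
    return (
        f"Single-panel black-and-white cartoon. {speaker} says, in a speech bubble: "
        f'"{bubble}". {scene}'
    )


print(build_image_prompt("A smug cat", "I meant to knock that over", "A living room with a shattered vase."))
```

If a particular wording keeps coming out garbled, you’d still rephrase the bubble by hand and try again; a template only improves your odds.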
But there’s no silver bullet, which is why you should...
4. Be ready to re-roll…a lot
I typically go through dozens of images to get to the final AI Jest Daily cartoon.
And that’s after settling on the joke and finalizing the text prompt for the image.
Here’s the discarded DALL-E 3 grid for today’s “Breaking Bread” cartoon:
Here’s the Ideogram grid for that same cartoon:
Even when a cartoon is almost good enough, I often end up having to make additional tweaks in Microsoft Paint (because I’ve got mad skills, you see).
So don’t get discouraged if DALL-E 3 doesn’t get it right on the first, second, or 37th try.
Adjust your prompt, roll the dice, and keep at it.
5. Profanity is possible
While ChatGPT itself will typically refuse to swear, you can sneak profanity into the cartoon thanks to the way DALL-E 3 handles text in an image.
For some reason, when the “fuck” in question is part of a speech bubble, DALL-E 3 will be happy to just render it.
There you have it, folks.
Now go out there and make me fucking proud!
🫵 Over to you…
What’s your take? Have you tried asking LLMs for jokes? And did they land?
Do you even give a shit?
Leave a comment or shoot me an email at whytryai@gmail.com.
[1] I switched to using ChatGPT Plus after it got vision capabilities.
[2] And occasionally ideogram.ai.
[3] Unless it comes from me, in which case it’s universally hilarious.
[5] Incidentally, this is one of the rare cartoons where I ended up feeding the exact speech bubble to ChatGPT.
[6] There’s a reason creative writing prompts are narrow and specific, and not simply “Write a good story.”