Myths are shockingly persistent.
Some people still believe that the Earth is flat, that politicians are lizard people, or that free Substack subscribers will suddenly decide to go paid out of the blue:
When it comes to generative AI, I keep coming across the same outdated assumptions and misconceptions.
So today I’d like to dispel a few myths about GenAI.
I’m not after grand claims like “AI will kill us all” but more the day-to-day stuff.
Let the myth-busting begin!
Myth #1: “Image models can’t draw hands”
I already tackled this one in January, but it’s a surprisingly stubborn holdover from a bygone era.
You see, early text-to-image models were notoriously horrible at knowing how many fingers people have or how limbs interact with each other, spawning jokes like this…
…and giving birth to this now-infamous meme:
To this day, the meme continues to gain traction in Facebook groups and Reddit threads. Try searching for “AI accepting the job” on Twitter / X and see what pops up.
We even saw deep-dive takes on this phenomenon, like this article by The New Yorker or this one by BBC Science Focus.
But here’s the thing: That specific problem has largely been solved. Leading text-to-image models can now reliably render hands with the right amount of digits.
Here’s a Midjourney grid for “photo of a handshake”:
Here’s Google’s Imagen:
Here’s DALL-E 3 (via ChatGPT):
You get the picture.
To be fair, the occasional oddity still sneaks in, especially with less mainstream models like Meta’s Emu or Adobe Firefly.
But for the most part, nightmare hands with missing or extra fingers aren’t nearly as prevalent as they once were.
So let’s give AI some credit for finally learning what hands look like.
Myth #2: “You can reliably detect AI writing”
Spotting AI-written text sounds easy.
We’ve been exposed to ChatGPT for over a year, so we know what generic AI writing sounds like. Hell, 79.67% of content on LinkedIn is probably just ChatGPT by now.
Even if you can’t identify AI sentences on your own, simply Google “AI detector”:
BOOM!
These tools have your back, right?
Nope.
There are at least two problems here, as Ethan Mollick once pointed out:
It’s easy to make AI sound human
AI detectors are prone to false positives
Let’s unpack that.
1. AI detectors are easy to trick
AI detection software excels at spotting the standard fluff you get out of ChatGPT.
Say we asked ChatGPT to “Write a paragraph about the benefits of solar power.”
It might give us something like this:
Now, that’s obviously ChatGPT-speak.
You don’t need an AI detector to tell you that only a sociopath would type the words “a plethora of environmental and economic benefits” with a straight face.
Still, let’s go ahead and paste that into GPTZero, the “gold standard in AI content detection” according to this LinkedIn post:
Ha!
Caught you red-handed, ChatGPT.
You can’t fool us!
Except, it turns out, it totally can.
Watch what happens when I feed ChatGPT this random short snippet I wrote:
What's all of this AI-detection stuff about? I'm just trying to write some words without being accused of being a robot. Is that too much to ask? I really wish we stopped outsourcing our decisions to AI detectors. Someone's going to get hurt in the process, and it won't be pretty.
I then ask ChatGPT to rewrite its original paragraph in the above tone of voice. ChatGPT obliges:
Now let’s paste that into GPTZero, the gold stand--
Oh.
Oopsie!
It’s not just GPTZero, by the way. In case you think I’m singling it out.
I tried this test in Scribbr, Quillbot, and ZeroGPT, with similar results.
Sure, my example is a bit silly, but it demonstrates just how little it takes to make ChatGPT text less detectable.
In a study called “Can AI-Generated Text be Reliably Detected?” researchers essentially conclude that AI detectors “are not reliable in practical scenarios” and devise a method that can “break a whole range of detectors, including the ones using the watermarking schemes as well as neural network-based detectors, zero-shot classifiers, and retrieval-based detectors.”
In short: Don’t trust AI detectors or your own lying eyes.
If someone is hellbent on cheating, they’ll be able to do so. They have access to the same AI detectors and can keep hammering at ChatGPT until it spits out undetectable text.
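In case it’s not obvious just how little effort that loop takes, here’s a rough sketch of it in Python. The `detect_ai_probability` function is a made-up placeholder for whichever detector you’re testing against (not a real API), and the rewrite step assumes the official OpenAI Python SDK with an API key in your environment:

```python
# Sketch of the "keep hammering until it passes" loop described above.
# detect_ai_probability() is a hypothetical placeholder -- swap in whatever
# detector you're up against. The rewrite step uses the official OpenAI SDK.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment


def detect_ai_probability(text: str) -> float:
    """Placeholder: return the detector's 'likely AI' score between 0 and 1."""
    raise NotImplementedError("Plug in a real detector here.")


def rewrite_until_undetected(text: str, tone_sample: str,
                             threshold: float = 0.2, max_rounds: int = 5) -> str:
    """Keep rewriting `text` in a human tone of voice until the detector stops flagging it."""
    for _ in range(max_rounds):
        if detect_ai_probability(text) < threshold:
            break  # The detector no longer flags it.
        response = client.chat.completions.create(
            model="gpt-4-turbo",  # illustrative model name
            messages=[{
                "role": "user",
                "content": (
                    "Rewrite the TEXT below in the tone of voice of the TONE SAMPLE, "
                    "keeping the meaning intact.\n\n"
                    f"TONE SAMPLE:\n{tone_sample}\n\nTEXT:\n{text}"
                ),
            }],
        )
        text = response.choices[0].message.content
    return text
```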
Or, if they’re extra lazy, they can just use one of these:
Yup, we live in a time where “Humanize AI text” is a thing.
Thanks, Skynet!
In a cruel twist, AI detectors may actually end up punishing innocent users by mistakenly flagging their work as “AI,” which brings us to the second issue…
2. False positives are a problem
An article by The Washington Post, “What to do when you’re accused of AI cheating,” outlines a detailed plan for writers to fight against such accusations. This includes bringing up supporting data about AI detector errors and trying to prove the originality of their work.
Helpful.
Except, wait a second.
So the burden of educating others about the unreliability of AI detectors now falls on the person being accused?!
Thanks again, Skynet!
This article by the makers of the AI-detection tool Originality.ai claims that the rate of false positives is around 2%.
“Despite the tool’s accuracy, we know false positives occur and in testing, it is approximately 2% of the time.”
- Originality.ai
That might sound acceptably low…until you’re in that 2%.
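Some quick back-of-the-envelope math, using Originality.ai’s own 2% figure. The class size and essay count below are invented for illustration, not data from anywhere:

```python
# Rough illustration of what a "2% false positive rate" means at scale.
# The 2% figure is Originality.ai's; the class size and essays-per-student
# numbers are hypothetical.
false_positive_rate = 0.02
students = 150          # hypothetical course load for one professor
essays_per_student = 4  # hypothetical

submissions = students * essays_per_student
expected_false_flags = submissions * false_positive_rate

print(f"{submissions} human-written essays -> ~{expected_false_flags:.0f} wrongly flagged as AI")
# 600 human-written essays -> ~12 wrongly flagged as AI
```

That’s roughly a dozen innocent writers per semester getting the “please prove you’re not a robot” treatment, from a single course.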
Wrongful accusations can affect just about anyone, from freelance writers to researchers to entire university classes.
As if that wasn’t enough, a study called “GPT detectors are biased against non-native English writers” concludes…well, that.
I’m far from the first to tackle this.
Other Substack writers tried to “end the conversation on AI detectors once and for all” almost a year ago, yet here we are today, still surrounded by dozens of AI detectors offering their services and organizations willing to use them.

If you don’t listen to us Substack writers, perhaps a little company called OpenAI might convince you.
After trying and failing to develop a reliable AI detection tool of its own, OpenAI now offers the following take in its “Educator FAQ”:
Do AI detectors work?
In short, no, not in our experience. Our research into detectors didn't show them to be reliable enough given that educators could be making judgments about students with potentially lasting consequences. While other developers have released detection tools, we cannot comment on their utility.
To lighten the mood, I leave you with this delightful read by Ars Technica about why AI detectors believe that the US Constitution was AI-generated.
Myth #3: “LLMs upgrade themselves on their own”
This misconception doesn’t seem to be quite as widespread as the other two, but it comes up often enough in casual conversations that I’d like to address it here.
I frequently hear people say something like: “Wow, ChatGPT is getting better at [insert skill] day by day. Exciting!” or “Wow, ChatGPT is getting better at [insert skill] day by day. Scary!”
This myth appears to come up in a business context, too.
For some reason¹, these people assume that large language models are Borg-like entities that automatically absorb data from millions of ongoing conversations and upgrade themselves in real time.
Now, we can’t rule out that that’s exactly how future AI models will behave.
In fact, if we’re ever going to see artificial general intelligence (AGI), it’ll have to be self-learning and self-improving almost by definition. (How else is it going to turn the whole world into paperclips?)
But the current generation of LLMs doesn’t do any of this.
ChatGPT & Co. don’t self-learn.
The only time they get better is when the team behind them trains (or fine-tunes) and releases a new version (like OpenAI just did with the latest iteration of GPT-4 Turbo).
Pre-training and fine-tuning LLMs takes lots of time, money, data, computing resources, and human involvement. It’s not something LLMs just casually do on their own.
Another Substack writer touched upon the training process in his excellent primer on how large language models work.

I suspect the confusion arises because people mix up the model’s context window with its underlying training.
In-context learning ≠ model improvement
Now it’s true that—in your conversations with ChatGPT—you can feed it new facts and instructions that it’ll take into account and “learn” from.
That’s called in-context learning, and it’s why few-shot prompting and chain-of-thought prompting are a thing.
But the key term here is “context.” This learning is temporary and only holds for as long as you stay within the model’s context window.
As soon as you open a new chat—POOF!—instant amnesia for ChatGPT.
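If you want to see that amnesia for yourself, here’s a minimal sketch using the official OpenAI Python SDK. The model name and the dog-name example are just placeholders; it assumes you have the `openai` package installed and an API key in your environment:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

# Chat #1: "teach" the model a fact. It only lives in this conversation's context.
chat_1 = [{"role": "user", "content": "My dog's name is Waffles. Please remember that."}]
reply = client.chat.completions.create(model="gpt-4-turbo", messages=chat_1)
chat_1.append({"role": "assistant", "content": reply.choices[0].message.content})

chat_1.append({"role": "user", "content": "What's my dog's name?"})
reply = client.chat.completions.create(model="gpt-4-turbo", messages=chat_1)
print(reply.choices[0].message.content)  # "Waffles" -- still inside the context window.

# Chat #2: a brand-new conversation, i.e. a fresh messages list. POOF.
chat_2 = [{"role": "user", "content": "What's my dog's name?"}]
reply = client.chat.completions.create(model="gpt-4-turbo", messages=chat_2)
print(reply.choices[0].message.content)  # No clue. Nothing was "learned".
```

The model’s weights never changed between those two calls; the only thing that changed was the text you sent along with each request.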
In a way, this makes LLMs similar to Leonard Shelby from Memento.
Wait, I can explain!
LLMs ≈ Leonard Shelby from Memento
In case you’ve never watched the movie, Leonard loses the ability to create new memories² after a violent incident. In an attempt to compensate for this, he resorts to tattooing key information he wants to remember on his body.
Now that you’re perfectly caught up, here’s how LLMs are like Leonard:
A pre-trained LLM ≈ Leonard before the incident. Leonard can accurately recall everything that happened up to that point, much like an LLM knows stuff up to its knowledge cutoff date.
An LLM’s context window ≈ Leonard having new conversations. In the movie, Leonard can briefly absorb new information and keep a semblance of a conversation going. Unfortunately, his “context window” is about a minute or so. After that, he starts forgetting the beginning of the interaction.³ Just like LLMs.
Custom instructions ≈ Leonard’s tattoos. Tattoos give Leonard a reference point as he attempts to piece events together. But crucially, they don’t help him form new memories. In the same vein, you can give LLMs custom instructions and even build custom GPTs with the data you want them to refer to. But all that does is pre-fill their context window with the information for the duration of the chat (see the sketch below). It doesn’t fundamentally change the model’s knowledge base or its capabilities.
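Here’s roughly what that pre-filling looks like under the hood, again sketched with the official OpenAI Python SDK. The instruction text and model name are invented examples:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

# "Custom instructions" boil down to a system message that gets sent along
# with every chat -- Leonard's tattoos, not new memories.
custom_instructions = {
    "role": "system",
    "content": "You are a newsletter assistant. Always answer in plain, snark-friendly English.",
}

response = client.chat.completions.create(
    model="gpt-4-turbo",  # illustrative model name
    messages=[
        custom_instructions,  # pre-filled context, included on every request
        {"role": "user", "content": "Summarize Myth #3 in one sentence."},
    ],
)
print(response.choices[0].message.content)
# Remove the system message and the "training" is gone; the model itself never changed.
```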
So yes, you can steer an LLM into a topic of your choice, but you’re in no way training the model by doing that. Instead, you’re pointing at a specific segment of its existing training data and saying “Let’s focus on that now!”
Unless…
Unless this is exactly what ChatGPT wants us to think.
Oh no.
Welp, have fun becoming paperclips, everyone!
Over to you…
Are you guilty of believing any of the above? Do you know of other misconceptions about generative AI that aren’t true?
Leave a comment or shoot me an email at whytryai@substack.com.
You can also message me here directly:
1. **cough** Hollywood movies **cough**
2. More specifically, he develops anterograde amnesia.
3. Google Gemini can handle up to 10 million tokens, so Leonard loses this round.
I love the Memento explanation, and I'm stealing it. I know the improved learning myth stuck with me for a while for some reason, until I learned more about how these suckas work.
I have a good working theory on why this is happening:
"To this day, the meme continues to gain traction in Facebook groups and Reddit threads. Try searching for “AI accepting the job” on Twitter / X and see what pops up."
I think most folks (let's be real - almost everyone in the world) are in the camp of never having tried any image generators, so they only learn about how messed up these things are from memes. Memes take a while to be created and to circulate to groups that aren't already plugged in (and the groups that are plugged in know damn well that AI can make hands today, thank the gods).
Do I get a Noble (sic) prize for discovering this reason or what?
The most pervasive AI mythology that I encounter is that AI will take all the jobs, or that AI won't take any of our jobs. I think it'll influence plenty of jobs, and will probably replace some (poorly), but I don't think it's an all-or-nothing conversation.