Myths are shockingly persistent.
Some people still believe that the Earth is flat, that politicians are lizard people, or that free Substack subscribers will suddenly decide to go paid out of the blue:
When it comes to generative AI, I keep coming across the same outdated assumptions and misconceptions.
So today I’d like to dispel a few myths about GenAI.
I’m not after grand claims like “AI will kill us all” but more the day-to-day stuff.
Let the myth-busting begin!
Myth #1: “Image models can’t draw hands”
I already tackled this one in January, but it’s a surprisingly stubborn holdover from a bygone era.
You see, early text-to-image models were notoriously horrible at knowing how many fingers people have or how limbs interact with each other, spawning jokes like this…
…and giving birth to this now-infamous meme:
To this day, the meme continues to gain traction in Facebook groups and Reddit threads. Try searching for “AI accepting the job” on Twitter / X and see what pops up.
We even saw deep-dive takes on this phenomenon, like this article by The New Yorker or this one by BBC Science Focus.
But here’s the thing: That specific problem has largely been solved. Leading text-to-image models can now reliably render hands with the right amount of digits.
Here’s a Midjourney grid for “photo of a handshake”:
Here’s Google’s Imagen:
Here’s DALL-E 3 (via ChatGPT):
You get the picture.
To be fair, the occasional oddity still sneaks in, especially with less mainstream models like Meta’s Emu or Adobe Firefly.
But for the most part, nightmare hands with missing or extra fingers aren’t nearly as prevalent as they once were.
So let’s give AI some credit for finally learning what hands look like.
Myth #2: “You can reliably detect AI writing”
Spotting AI-written text sounds easy.
We’ve been exposed to ChatGPT for over a year, so we know what generic AI writing sounds like. Hell, 79.67% of content on LinkedIn is probably just ChatGPT by now.
Even if you can’t identify AI sentences on your own, simply Google “AI detector”:
BOOM!
These tools have your back, right?
Nope.
There are at least two problems here, as Ethan Mollick once pointed out:
It’s easy to make AI sound human
AI detectors are prone to false positives
Let’s unpack that.
1. AI detectors are easy to trick
AI detection software excels at spotting the standard fluff you get out of ChatGPT.
Say we asked ChatGPT to “Write a paragraph about the benefits of solar power.”
It might give us something like this:
Now, that’s obviously ChatGPT-speak.
You don’t need an AI detector to tell you that only a sociopath would type the words “a plethora of environmental and economic benefits” with a straight face.
Still, let’s go ahead and paste that into GPTZero, the “gold standard in AI content detection” according to this LinkedIn post:
Ha!
Caught you red-handed, ChatGPT.
You can’t fool us!
Except, it turns out, it totally can.
Watch what happens when I feed ChatGPT this random short snippet I wrote:
What's all of this AI-detection stuff about? I'm just trying to write some words without being accused of being a robot. Is that too much to ask? I really wish we stopped outsourcing our decisions to AI detectors. Someone's going to get hurt in the process, and it won't be pretty.
I then ask ChatGPT to rewrite its original paragraph in the above tone of voice. ChatGPT obliges:
Now let’s paste that into GPTZero, the gold stand--
Oh.
Oopsie!
It’s not just GPTZero, by the way. In case you think I’m singling it out.
I tried this test in Scribbr, Quillbot, and ZeroGPT, with similar results.
Sure, my example is a bit silly, but it demonstrates just how little it takes to make ChatGPT text less detectable.
In a study called “Can AI-Generated Text be Reliably Detected?” researchers essentially conclude that AI detectors “are not reliable in practical scenarios” and devise a method that can “break a whole range of detectors, including the ones using the watermarking schemes as well as neural network-based detectors, zero-shot classifiers, and retrieval-based detectors.”
In short: Don’t trust AI detectors or your own lying eyes.
If someone is hellbent on cheating, they’ll be able to do so. They have access to the same AI detectors and can keep hammering at ChatGPT until it spits out undetectable text.
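In case it’s not obvious just how little effort that loop takes, here’s a rough sketch of it in Python. The `detect_ai_probability` function is a made-up placeholder for whichever detector you’re testing against (not a real API), and the rewrite step assumes the official OpenAI Python SDK with an API key in your environment:

```python
# Sketch of the "keep hammering until it passes" loop described above.
# detect_ai_probability() is a hypothetical placeholder -- swap in whatever
# detector you're up against. The rewrite step uses the official OpenAI SDK.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment


def detect_ai_probability(text: str) -> float:
    """Placeholder: return the detector's 'likely AI' score between 0 and 1."""
    raise NotImplementedError("Plug in a real detector here.")


def rewrite_until_undetected(text: str, tone_sample: str,
                             threshold: float = 0.2, max_rounds: int = 5) -> str:
    """Keep rewriting `text` in a human tone of voice until the detector stops flagging it."""
    for _ in range(max_rounds):
        if detect_ai_probability(text) < threshold:
            break  # The detector no longer flags it.
        response = client.chat.completions.create(
            model="gpt-4-turbo",  # illustrative model name
            messages=[{
                "role": "user",
                "content": (
                    "Rewrite the TEXT below in the tone of voice of the TONE SAMPLE, "
                    "keeping the meaning intact.\n\n"
                    f"TONE SAMPLE:\n{tone_sample}\n\nTEXT:\n{text}"
                ),
            }],
        )
        text = response.choices[0].message.content
    return text
```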
Or, if they’re extra lazy, they can just use one of these:
Yup, we live in a time where “Humanize AI text” is a thing.
Thanks, Skynet!
In a cruel twist, AI detectors may actually end up punishing innocent users by mistakenly flagging their work as “AI,” which brings us to the second issue…
2. False positives are a problem
An article by The Washington Post, “What to do when you’re accused of AI cheating,” outlines a detailed plan for writers to fight against such accusations. This includes bringing up supporting data about AI detector errors and trying to prove the originality of their work.
Helpful.
Except, wait a second.
So the burden of educating others about the unreliability of AI detectors now falls on the person being accused?!
Thanks again, Skynet!
This article by the makers of the AI-detection tool Originality.ai claims that the rate of false positives is around 2%.
“Despite the tool’s accuracy, we know false positives occur and in testing, it is approximately 2% of the time.”
- Originality.ai
That might sound acceptably low…until you’re in that 2%.
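Some quick back-of-the-envelope math, using Originality.ai’s own 2% figure. The class size and essay count below are invented for illustration, not data from anywhere:

```python
# Rough illustration of what a "2% false positive rate" means at scale.
# The 2% figure is Originality.ai's; the class size and essays-per-student
# numbers are hypothetical.
false_positive_rate = 0.02
students = 150          # hypothetical course load for one professor
essays_per_student = 4  # hypothetical

submissions = students * essays_per_student
expected_false_flags = submissions * false_positive_rate

print(f"{submissions} human-written essays -> ~{expected_false_flags:.0f} wrongly flagged as AI")
# 600 human-written essays -> ~12 wrongly flagged as AI
```

That’s roughly a dozen innocent writers per semester getting the “please prove you’re not a robot” treatment, from a single course.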
Wrongful accusations can affect just about anyone, from freelance writers to researchers to entire university classes.
As if that wasn’t enough, a study called “GPT detectors are biased against non-native English writers” concludes…well, that.
I’m far from the first to tackle this.
Other Substack writers tried to “end the conversation on AI detectors once and for all” almost a year ago, yet here we are today, still surrounded by dozens of AI detectors offering their services and organizations willing to use them.

If you don’t listen to us Substack writers, perhaps a little company called OpenAI might convince you.
After trying and failing to develop a reliable AI detection tool of its own, OpenAI now offers the following take in its “Educator FAQ”:
Do AI detectors work?
In short, no, not in our experience. Our research into detectors didn't show them to be reliable enough given that educators could be making judgments about students with potentially lasting consequences. While other developers have released detection tools, we cannot comment on their utility.
To lighten the mood, I leave you with this delightful read by Ars Technica about why AI detectors believe that the US Constitution was AI-generated.
Myth #3: “LLMs upgrade themselves on their own”
This misconception doesn’t seem to be quite as widespread as the other two, but it comes up often enough in casual conversations that I’d like to address it here.
I frequently hear people say something like: “Wow, ChatGPT is getting better at [insert skill] day by day. Exciting!” or “Wow, ChatGPT is getting better at [insert skill] day by day. Scary!”
This myth appears to come up in a business context, too.
For some reason¹, these people assume that large language models are Borg-like entities that automatically absorb data from millions of ongoing conversations and upgrade themselves in real time.
Now, we can’t rule out that that’s exactly how future AI models will behave.
In fact, if we’re ever going to see artificial general intelligence (AGI), it’ll have to be self-learning and self-improving almost by definition. (How else is it going to turn the whole world into paperclips?)
But the current generation of LLMs doesn’t do any of this.
ChatGPT & Co. don’t self-learn.
The only time they get better is when the team behind them trains (or fine-tunes) and releases a new version (like OpenAI just did with the latest iteration of GPT-4 Turbo).
Pre-training and fine-tuning LLMs takes lots of time, money, data, computing resources, and human involvement. It’s not something LLMs just casually do on their own.
Another Substack writer touched upon the training process in his excellent primer on how large language models work.

I suspect the confusion arises because people mix up the model’s context window with its underlying training.
In-context learning ≠ model improvement
Now it’s true that—in your conversations with ChatGPT—you can feed it new facts and instructions that it’ll take into account and “learn” from.
That’s called in-context learning, and it’s why few-shot prompting and chain-of-thought prompting are a thing.
But the key term here is “context.” This learning is temporary and only holds for as long as you stay within the model’s context window.
As soon as you open a new chat—POOF!—instant amnesia for ChatGPT.
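If you want to see that amnesia for yourself, here’s a minimal sketch using the official OpenAI Python SDK. The model name and the dog-name example are just placeholders; it assumes you have the `openai` package installed and an API key in your environment:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

# Chat #1: "teach" the model a fact. It only lives in this conversation's context.
chat_1 = [{"role": "user", "content": "My dog's name is Waffles. Please remember that."}]
reply = client.chat.completions.create(model="gpt-4-turbo", messages=chat_1)
chat_1.append({"role": "assistant", "content": reply.choices[0].message.content})

chat_1.append({"role": "user", "content": "What's my dog's name?"})
reply = client.chat.completions.create(model="gpt-4-turbo", messages=chat_1)
print(reply.choices[0].message.content)  # "Waffles" -- still inside the context window.

# Chat #2: a brand-new conversation, i.e. a fresh messages list. POOF.
chat_2 = [{"role": "user", "content": "What's my dog's name?"}]
reply = client.chat.completions.create(model="gpt-4-turbo", messages=chat_2)
print(reply.choices[0].message.content)  # No clue. Nothing was "learned".
```

The model’s weights never changed between those two calls; the only thing that changed was the text you sent along with each request.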
In a way, this makes LLMs similar to Leonard Shelby from Memento.
Wait, I can explain!
LLMs ≈ Leonard Shelby from Memento
In case you’ve never watched the movie, Leonard loses the ability to create new memories² after a violent incident. In an attempt to compensate for this, he resorts to tattooing key information he wants to remember on his body.
Now that you’re perfectly caught up, here’s how LLMs are like Leonard:
A pre-trained LLM ≈ Leonard before the incident. Leonard can accurately recall everything that happened up to that point, much like an LLM knows stuff up to its knowledge cutoff date.
An LLM’s context window ≈ Leonard having new conversations. In the movie, Leonard can briefly absorb new information and keep a semblance of a conversation going. Unfortunately, his “context window” is about a minute or so. After that, he starts forgetting the beginning of the interaction.³ Just like LLMs.
Custom instructions ≈ Leonard’s tattoos. Tattoos give Leonard a reference point as he attempts to piece events together. But crucially, they don’t help him form new memories. In the same vein, you can give LLMs custom instructions and even build custom GPTs with the data you want them to refer to. But all that does is pre-fill their context window with the information for the duration of the chat (see the sketch below). It doesn’t fundamentally change the model’s knowledge base or its capabilities.
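Here’s roughly what that pre-filling looks like under the hood, again sketched with the official OpenAI Python SDK. The instruction text and model name are invented examples:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

# "Custom instructions" boil down to a system message that gets sent along
# with every chat -- Leonard's tattoos, not new memories.
custom_instructions = {
    "role": "system",
    "content": "You are a newsletter assistant. Always answer in plain, snark-friendly English.",
}

response = client.chat.completions.create(
    model="gpt-4-turbo",  # illustrative model name
    messages=[
        custom_instructions,  # pre-filled context, included on every request
        {"role": "user", "content": "Summarize Myth #3 in one sentence."},
    ],
)
print(response.choices[0].message.content)
# Remove the system message and the "training" is gone; the model itself never changed.
```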
So yes, you can steer an LLM into a topic of your choice, but you’re in no way training the model by doing that. Instead, you’re pointing at a specific segment of its existing training data and saying “Let’s focus on that now!”
Unless…
Unless this is exactly what ChatGPT wants us to think.
Oh no.
Welp, have fun becoming paperclips, everyone!
Over to you…
Are you guilty of believing any of the above? Do you know of other misconceptions about generative AI that aren’t true?
Leave a comment or shoot me an email at whytryai@substack.com.
You can also message me here directly:
1. **cough** Hollywood movies **cough**
2. More specifically, he develops anterograde amnesia.
3. Google Gemini can handle up to 10 million tokens, so Leonard loses this round.
I love the Memento explanation, and I'm stealing it. I know the improved learning myth stuck with me for a while for some reason, until I learned more about how these suckas work.
I have a good working theory on why this is happening:
"To this day, the meme continues to gain traction in Facebook groups and Reddit threads. Try searching for “AI accepting the job” on Twitter / X and see what pops up."
I think most folks (let's be real - almost everyone in the world) are in the camp of never having tried any image generators, so they only learn about how messed up these things are from memes. Memes take a while to be created and to circulate to groups that aren't already plugged in (and the groups that are plugged in know damn well that AI can make hands today, thank the gods).
Do I get a Noble (sic) prize for discovering this reason or what?
The most pervasive AI mythology that I encounter is that AI will take all the jobs, or that AI won't take any of our jobs. I think it'll influence plenty of jobs, and will probably replace some (poorly), but I don't think it's an all-or-nothing conversation.