I keep seeing posts about this kind of thing getting people's hopes up, so let's address this myth.

What's an "AI detector"?

We're talking about these tools that advertise the ability to accurately detect things like deep-fake videos or text generated by LLMs (like ChatGPT), etc. We are NOT talking about voluntary watermarking that companies like OpenAI might choose to add in the future.

What does "effective" mean?

I mean something with a high level of accuracy: both highly sensitive (a low false-negative rate) and highly specific (a low false-positive rate). "High" would probably mean at least 95%, though the exact bar is ultimately subjective.

Why should the accuracy bar be so high? Isn't anything better than a coin flip good enough?

If you're going to definitively label something as "fake" or "real", you had better be damn sure about it, because the consequences of a wrong label are worse than having no label at all. You're either telling people to trust a fake they might otherwise have been skeptical of, or you're slandering something real. Either way you're spreading misinformation, which is worse than just saying "I'm not sure".
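To put numbers on that, here's a quick back-of-the-envelope check (the 95% figures and the 1-in-10 prevalence are illustrative assumptions, not measurements of any real tool). Even a detector that clears the 95% bar on both sensitivity and specificity is wrong about a third of the time it labels something "fake" when only 10% of what it sees is actually fake:

```python
# Sanity check: even a detector with 95% sensitivity and 95% specificity
# produces many false accusations when most content is genuine.
# (All numbers are illustrative assumptions.)

def positive_predictive_value(sensitivity, specificity, fake_rate):
    """Probability that content flagged as 'fake' really is fake."""
    true_positives = sensitivity * fake_rate
    false_positives = (1 - specificity) * (1 - fake_rate)
    return true_positives / (true_positives + false_positives)

# Suppose only 1 in 10 pieces of content is actually AI-generated.
print(positive_predictive_value(0.95, 0.95, 0.10))  # ~0.68
```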

Why can't a good AI detector be built?

To understand this part, you need to know a little about how these neural networks are created in the first place. Generative Adversarial Networks (GANs) are a strategy often employed to train models that generate content. A GAN consists of two neural networks: one that generates content resembling the existing content, and one that tries to tell generated content apart from the existing content. These networks learn in tandem; each time one gets better, the other gets better too.
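Here's a minimal sketch of that training loop in PyTorch, on a toy task (learning to mimic samples from a 1-D Gaussian). The network sizes, learning rates, and data distribution are all illustrative assumptions; real content generators are vastly larger, but the two-network structure is the same:

```python
# Minimal GAN sketch: a generator and a detector trained in tandem.
import torch
import torch.nn as nn

def real_samples(n):
    # Stand-in for "existing content": samples from N(4, 1.25)
    return torch.randn(n, 1) * 1.25 + 4.0

generator = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
detector = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))

g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
d_opt = torch.optim.Adam(detector.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for step in range(5000):
    # 1) Train the detector to tell real from generated content.
    real = real_samples(64)
    fake = generator(torch.randn(64, 8)).detach()
    d_loss = loss_fn(detector(real), torch.ones(64, 1)) + \
             loss_fn(detector(fake), torch.zeros(64, 1))
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # 2) Train the generator to fool the current detector.
    fake = generator(torch.randn(64, 8))
    g_loss = loss_fn(detector(fake), torch.ones(64, 1))
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()

# At equilibrium the detector approaches 50/50 guessing on generated data.
```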

What this means is that building a content generator and building a fake-content detector are effectively two sides of the same coin. Improvements to one can always be translated, directly and automatically, into improvements to the other. The upshot is that the generator will keep improving until the detector is fooled about 50% of the time.

Note that not all of these models are trained exactly this way, but the point is that anything CAN be trained this way: even if a GAN wasn't originally used, any improvement in detection can always be translated directly into improved generation that beats that detection. This isn't an ordinary "arms race", because the turnaround time is so fast that there's no chance of staying ahead of the curve... the generators will always win.
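To make that translation concrete: if you can query (or better, backpropagate through) someone's detector, its verdicts become a training signal for your generator. In this sketch, `third_party_detector` is a hypothetical stand-in for any published detection model, not a recipe from any particular lab:

```python
# Sketch: bolting someone else's detector onto your generator as a loss.
import torch
import torch.nn as nn

generator = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
# Hypothetical stand-in for a published detection model.
third_party_detector = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))
third_party_detector.requires_grad_(False)  # we don't train it, just exploit it

opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for step in range(1000):
    fake = generator(torch.randn(64, 8))
    # Push the generator toward outputs the detector labels "real" (1).
    loss = loss_fn(third_party_detector(fake), torch.ones(64, 1))
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Even if the detector is a black box with no gradients, the same idea works with slower gradient-free optimization: treat its score as a reward and tune the generator against it.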

Why do these "AI detectors" keep getting advertised if they don't work?

  1. People are afraid of being saturated by fake content, and the media is taking advantage of that fear to sell snake oil.
  2. Every generator network comes with its own free detector network that doesn't really work all that well (~50% accuracy), because it was used to train the generator in the first place, so these detectors are ubiquitous in AI labs. That means the people who own the detectors are the SAME PEOPLE who created the problem in the first place, and they want to make sure you come back to them for the solution as well.
[–] itsnotlupus 54 points 1 year ago (1 children)

There is story after story of students getting shafted by gullible teachers who took one of these AI detectors at face value and decided their students were cheating based solely on its output.

And somehow those teachers are not getting the message that they're relying on snake oil to harm their students. They certainly won't see this post, and there just isn't enough mainstream pushback explaining that AI detectors are entirely inappropriate tools to decide whether to punish a student.

[–] Amazed 9 points 1 year ago (2 children)

Do you have suggestions for more appropriate tools? And what might "punishment" look like?

[–] itsnotlupus 16 points 1 year ago (1 children)

More appropriate tools to detect AI generated text you mean?

It's not a thing. I don't think it will ever be a thing. Certainly not reliably, and never as a 100% certainty tool.

The punishment for a teacher deciding you cheated on a test or an assignment? I don't know, but I imagine it sucks. Best case, you'd probably be at risk of failing the class and potentially the grade/semester. Worst case you might get expelled for being a filthy cheater. Because an unreliable tool said so and an unreliable teacher chose to believe it.

If you're asking what teachers should do to defend against AI-generated content, I'm afraid I don't have an answer. It's akin to giving students math homework but demanding that they not use calculators. That might have been reasonable before calculators existed, but not anymore, so teachers no longer expect that rule to make sense and don't impose it on students.

[–] [email protected] 9 points 1 year ago (1 children)
[–] Decoy321 11 points 1 year ago (1 children)

Imagine someone bringing back old school pen and paper.

There'd be riots.

[–] [email protected] 14 points 1 year ago

In schools and universities, these are still widespread. Ditto in-person proctoring rather than remote, which some IT certifications rely on. If you think cloud certs are annoying, try Red Hat.

[–] IamtheMorgz 1 points 1 year ago

Personally I think we're looking at it wrong. ChatGPT is a thing now, so teach it as a tool. Instead of "write me a five-page paper about Shakespeare," it's "here's a five-page paper on Shakespeare - figure out what's wrong with it, edit it, check its sources," etc. Because that's the stuff ChatGPT can't do, and those are the skills that will be valuable in the future.

We can check whether students know the material via tests (including their ability to write). But we should be teaching the new tool too, not trying to work around it. Imagine if your teacher today said all your research had to be done without the internet (in the library, with paper books only). You'd be rightfully pissed, because in the real world you have the internet to help you do research, and that tool should be available to you as a student.

Just my two cents. I used ChatGPT to help me write some stuff for work for the first time a couple of weeks ago, and I'd say it only got me about halfway to where I needed to be. Just like the ability to Google things doesn't mean we no longer need to know how to research (checking sources, compiling information), ChatGPT doesn't mean we no longer need writing skills. It just shifts them a bit. Most tools throughout history have done that.