Technology

OpenAI just admitted it can't identify AI-generated text. That's bad for the internet and it could be really bad for AI models. (www.businessinsider.com)

submitted 2 years ago by L4s to c/technology

In January, OpenAI launched a system for identifying AI-generated text. This month, the company scrapped it.

[–] [email protected] 1 points 2 years ago (1 children)

They already do. Where do you think the training corpus comes from? The real world. It's curated by humans and then fed to the ML system.

The problem is that the real world now has a bunch of text generated by AI. And it has been well studied that feeding that back into training will destroy your model (because the networks would then effectively be trained to predict their own output, which just doesn't make sense).

So humans still need to filter that stuff out of the training corpus. But we can't detect which texts are real and which ones are fake. And neither can a machine. So there's no way to do this properly.

The data almost always comes from the real world, except now the real world also contains "harmful" (to AI) data that we can't figure out how to find and remove.
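
To make the "trained to predict their own output" point concrete, here is a toy sketch, not how any real language model is trained. It assumes the "model" is just a Gaussian fitted to a handful of numbers, and that each new generation is trained only on samples drawn from the previous fit. The function name `simulate_collapse` and its parameters are made up for illustration.

```python
import random
import statistics


def simulate_collapse(generations=200, sample_size=10, seed=0):
    """Repeatedly refit a Gaussian to samples drawn from the previous fit.

    Returns the fitted standard deviation of each generation.
    """
    rng = random.Random(seed)
    # Generation 0: "real world" data drawn from a standard normal.
    data = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
    spreads = []
    for _ in range(generations):
        # "Train": estimate mean and spread from the current corpus.
        mu = statistics.fmean(data)
        sigma = statistics.pstdev(data)
        spreads.append(sigma)
        # "Generate": the next corpus consists purely of model output.
        data = [rng.gauss(mu, sigma) for _ in range(sample_size)]
    return spreads


if __name__ == "__main__":
    spreads = simulate_collapse()
    print(f"first fit: {spreads[0]:.3f}, last fit: {spreads[-1]:.3g}")
```

With a small sample per generation, the fitted spread tends to shrink over the generations until the model produces almost identical values, which is a very simplified analogue of the diversity loss people worry about. Mixing fresh real-world data into each generation slows this down in the toy setting, which is why being able to tell real from generated text matters.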

[–] volodymyr 1 points 2 years ago

There are still people in between, building training data from their real-world experiences. Now the digital world may become overwhelmed with AI creations, so training may lead to model collapse. So what if we give AI access to cameras, microphones, all that, and even let it articulate them? It would also need to be adventurous, searching for spaces away from other AI work. There is lots of data out there which is not created by AI, although at some point it might become so as well. I am leaving aside, for the moment, the obvious dangers of this approach.