this post was submitted on 26 Jul 2023
868 points (96.5% liked)
Technology
59199 readers
3846 users here now
This is a most excellent place for technology news and articles.
Our Rules
- Follow the lemmy.world rules.
- Only tech related content.
- Be excellent to each another!
- Mod approved content bots can post up to 10 articles per day.
- Threads asking for personal tech support may be deleted.
- Politics threads may be removed.
- No memes allowed as posts, OK to post as comments.
- Only approved bots from the list below, to ask if your bot can be added please contact us.
- Check for duplicates before posting, duplicates may be removed
Approved Bots
founded 1 year ago
MODERATORS
you are viewing a single comment's thread
view the rest of the comments
view the rest of the comments
There is already a business model for compensating authors: it is called buying the book. If the AI trainers are pirating books, then yeah - sue them.
There are plagiarism and copyright laws to protect the output of these tools: if the output is infringing, then sue them. However, if the output of an AI would not be considered infringing for a human, then it isn’t infringement.
When you sell a book, you don’t get to control how that book is used. You can’t tell me that I can’t quote your book (within fair use restrictions). You can’t tell me that I can’t refer to your book in a blog post. You can’t dictate who may and may not read a book. You can’t tell me that I can’t give a book to a friend. Or an enemy. Or an anarchist.
Folks, this isn’t a new problem, and it doesn’t need new laws.
This is demonstrably wrong. You cannot buy a book, and then go use it to print your own copies for sale. You cannot use it as a script for a commercial movie. You cannot go publish a sequel to it.
Now please just try to tell me that AI training is specifically covered by fair use and satire case law. Spoiler: you can’t.
This is a novel (pun intended) problem space and deserves to be discussed and decided, like everything else. So yeah, your cavalier dismissal is cavalierly dismissed.
It's specifically distribution of the work or derivatives that copyright prevents.
So you could make an argument that an LLM that's memorized the book and can reproduce (parts of) it upon request is infringing. But one that's merely trained on the book, but hasn't memorized it, should be fine.
But by their very nature the LLM simply redistribute the material they’ve been trained on. They may disguise it assiduously, but there is no person at the center of the thing adding creative stokes. It’s copyrighted material in, copyrighted material out, so the plaintiffs allege.
They don't redistribute. They learn information about the material they've been trained on - not there natural itself*, and can use it to generate material they've never seen.