Technology

"Did you realize that we live in a reality where SciHub is illegal, and OpenAI is not?" (fosstodon.org)

submitted 1 year ago by [email protected] to c/technology

[–] givesomefucks 14 points 1 year ago* (last edited 1 year ago) (1 children)

Using it to train is a grey area if you paid for the works. If you didn't, it's still illegal.

What it does is output copyrighted works, which is copyright infringement. That is the legal issue. It's very easy to prompt it into giving the full text of copyrighted works they never even paid to look at, let alone to give to other people.

"AI" can't even handle switching synonyms to make it technically different, like a college kid cheating on an essay.

[–] [email protected] 6 points 1 year ago

Their argument is that the copying into their training database is "research". This would be a legal fair use of unauthorised copying. However, normally with research you make a prototype, and that prototype is distinctly different from the final commercial product. With LLMs, the prototype is the finished commercial product and they keep adding to it, so it isn't normal fair use.

When a court considers fair use, the first step is the type of use. The exemptions are education, research, news, comment, or criticism. Next, they consider the nature of the use, in particular whether it is commercial. Calling their copying "research" is a bit of a stretch - it's not like they're writing academic papers and making their data publicly available for review by other scientists - and their use is absolutely commercial. However, it needs to go before a judge for a decision, and it's very difficult for anyone to show a cause of action, if only because all the copying is done secretly behind closed doors.

The output of the AI itself is a bit more difficult. The database ChatGPT runs off of does not include the whole works it learned from - it's in the training database where all the copying occurs. However, ChatGPT and other LLMs can sometimes still manage to reproduce the original works, and arguably this should be an offense. If a human being reads a book and later writes a story that replicates significant parts of it, they would be guilty of plagiarism and copyright infringement, regardless of whether they genuinely believed they were coming up with original ideas.