this post was submitted on 10 Jul 2023
421 points (94.7% liked)

Technology


This is the official technology community of Lemmy.ml for all news related to creation and use of technology, and to facilitate civil, meaningful discussion around it.


[–] [email protected] 4 points 1 year ago (3 children)

I was under the impression that there was no real definitive way to tell what ChatGPT or similar AIs use for their training. Am I wrong?

[–] NevermindNoMind 18 points 1 year ago (2 children)

Yes, it's in the lawsuit and another article I read. OpenAI said they used a specific dataset, and the makers of that dataset said they used some online open libraries which have full texts of books. That's the primary basis of the lawsuit. They also argue that if you ask ChatGPT for a summary of their books, it will spit one out, which they claim is misuse of their copyrighted work. That claim sounds dicey to me. Wikipedia and all manner of websites summarize books, so I'm not following how ChatGPT doing it is different. But I'm an idiot so who cares what I think.

[–] [email protected] 7 points 1 year ago* (last edited 1 year ago)

Remember, the human who wrote a summary had to legally obtain a copy of the source material first too. It should be no different when training an AI model. There's a whole new can of worms here, though, since the summary was written by another person and that person holds the copyright to that summary (unless it reproduces a substantial amount of the original material, of course). But an AI model is not "creating" a new, copyrightable work. It has to be trained on the entire source material and algorithmically creates a summary directly from that. Because there's nothing 'new' being created, I can see why it could be claimed that a summary from an AI model should be considered a derivative work. But honestly, it's starting to border on the question of whether or not what AI models can do is considered 'creative thinking'. Shit's getting wild.
