this post was submitted on 14 Jul 2023
122 points (93.0% liked)

Technology

This is the official technology community of Lemmy.ml for all news related to creation and use of technology, and to facilitate civil, meaningful discussion around it.

Shit in -> shit out 📤

[email protected] 9 points 1 year ago

From the article:

Knowing means that the search for a watermark that identifies AI-generated content (and that's infallible) has now become a much more important - and lucrative - endeavor, and that the responsibility for labeling AI-generated data has now become a much more serious requirement.

Simply wanting such a thing to exist isn't going to magically make it happen. I seriously doubt that any such "watermark" (I think they meant "fingerprint" since it'd need to work even if not deliberately added) can be found.

I suspect the actual solution is to curate the quality of the input data, regardless of whether it's AI-generated or not. The problem with autophagy is the loss of rare inputs, so try to ensure those inputs are found and included in the training data. It's probably fine to have some AI-generated content in the training data in addition to the real stuff. Indeed, as long as the AI-generated content is subject to the same sort of selective pressure as the real content, it's probably good to have.
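A toy simulation can make the "loss of rare inputs" point concrete. This is a minimal sketch under loudly simplified assumptions, not the article's experiment: the "model" is just the category frequencies of its training set, "generation" is sampling from those frequencies, and the data set (one common category plus 50 hypothetical rare ones) is made up for illustration. It compares training each generation only on the previous model's output with training on that output plus the original real data.

```python
import random
from collections import Counter


def next_generation(data, n, rng):
    """Toy 'train then generate' step (an assumption, not a real model):
    the model is the empirical category frequencies of `data`, and
    generation draws n samples from those frequencies."""
    counts = Counter(data)
    categories = list(counts)
    weights = [counts[c] for c in categories]
    return rng.choices(categories, weights=weights, k=n)


def simulate(real, generations, keep_real, seed=0):
    """Return the number of distinct categories in the training set at each generation."""
    rng = random.Random(seed)
    data = list(real)
    distinct = [len(set(data))]
    for _ in range(generations):
        synthetic = next_generation(data, len(real), rng)
        # keep_real=False: train only on the previous model's output (pure autophagy).
        # keep_real=True: mix the synthetic output with the original real data,
        # so every rare item stays in the training set.
        data = synthetic + list(real) if keep_real else synthetic
        distinct.append(len(set(data)))
    return distinct


# Hypothetical "real" data: one common category plus 50 rare ones seen once each.
real = ["common"] * 950 + [f"rare_{i}" for i in range(50)]

print("synthetic only:  ", simulate(real, 20, keep_real=False))
print("synthetic + real:", simulate(real, 20, keep_real=True))
```

In the synthetic-only loop the number of distinct categories can never go up: a category absent from one generation's data gets zero weight in the next model. With the real data mixed back in, all 51 categories survive every generation. The toy says nothing about human curation of the synthetic part; it only illustrates why keeping the rare real inputs matters.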

[email protected] 0 points 1 year ago

We'd need to test and see if AI-generated content that is curated by human quality assurance still causes MADness.

My suspicion is that would only slow down the degradation of the outputs, rather than stop it completely.

[email protected] 3 points 1 year ago

I wasn't proposing only using curated AI-generated content. If the problem is the loss of "rare data" from the edges, then adding some AI-generated data to a data set that still includes that rare data shouldn't be a problem.

The article doesn't say that AI-generated data is somehow "infectious", just that the data set becomes more and more limited with each cycle since rare information gets lost each time.
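A rough back-of-the-envelope version of "rare information gets lost each time", assuming (as a simplification, not something the article states) that each generation trains on n independent samples from the previous model, and that an item the previous model produces with probability p is lost if it never shows up in those samples:

$$P(\text{item lost in one generation}) = (1 - p)^n \approx e^{-np}$$

For example, with p = 1/1000 and n = 1000 this gives (0.999)^1000 ≈ e^{-1} ≈ 0.37. Once an item is lost, the next model assigns it zero probability, so it cannot come back on its own; the losses accumulate across cycles unless real data containing those items is put back into the training set.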