this post was submitted on 09 Oct 2023
240 points (87.5% liked)

A nightmare scenario previously only imagined by AI researchers, where AI image generators accidentally spit out non-consensual pornography of real people, is now reality.

[–] [email protected] 41 points 1 year ago (2 children)

Yet another reason that we cannot allow ML companies to set a precedent that "it's fine to use non-consensual training data, because the model only 'learns' from it and never reproduces an exact replica of any single input".

Also, this was not surprising:

"Dillon said that DreamGF has a team of between 20-25 developers, mostly in Bulgaria, and that they previously worked at an NFT company."

[–] [email protected] 7 points 1 year ago (3 children)

You think those images are exact replicas of an input those models were trained on?

[–] [email protected] 9 points 1 year ago (1 children)

My complaint actually hinges on models not emitting an exact replica. That would be obvious infringement. In cases like DreamGF, they would be wide-open to lawsuits from very wealthy people whose primary asset is their right of publicity.

What these ML companies are doing is: They are identifying where the line of definite infringement lies, and aiming their business as close to that line as possible.

[–] [email protected] 7 points 1 year ago (1 children)

But not over that line. If you draw lines and then these companies stay within them, what's the problem?

People don't have unique appearances.

[–] [email protected] 4 points 1 year ago

Well, setting aside that there's no law against being a total asshole, so like... We don't have to make a bad behavior illegal in order to complain about it...

There's the letter of the law, and there's the intent. We start with a shared cultural attitude of how we should treat each other, and then we turn that into a quantifiable, objective rule that we can enforce through law.

We can try to make the law match our cultural attitudes as closely as possible, but there will always be gaps.

Now, I've got my own beef with how our IP and publicity laws work, and I'd like them to be more permissive in many ways. Much of IP law is exploitative, takes advantage of creators more than it protects them, and has lagged way behind where our social norms are these days.

But these ML companies aren't interested in abiding by any social norms at all. They only pay lip service to current laws, which were written in a time before these "AI" services were even a possibility -- skating by on technicalities, like a little brother poking the air 2 inches from your face and taunting "I'm not touching you! I'm not touching you!"

[–] [email protected] 4 points 1 year ago

Did you read the article?

[–] [email protected] 3 points 1 year ago (1 children)

It specifically says in the article that some of them are exactly that.

[–] [email protected] 3 points 1 year ago (2 children)

Where? Closest I can find is a reference to an "AI-generated nude of someone who looked like Margot Robbie, and another image of Lopez."

"Looking like" a person is not at all the same as "reproducing an exact replica of an input."

[–] [email protected] 4 points 1 year ago (1 children)

I think the idea is that if the model doesn't know what Jennifer Lopez is, it couldn't make imitations of her naked.

Realistically that ship has sailed and AI is capable enough now that even if the data wasn't there it could be pretty easily added.

It will need to become a simple fact of life. If we can imagine something now, we can have pictures of it. There is no putting this back in the bottle.

[–] [email protected] 1 points 1 year ago* (last edited 1 year ago) (1 children)

"Knowing what Jennifer Lopez looks like" is a very distinct thing from "reproducing an exact replica" of training data. OP appears to be arguing that the former is not true because he thinks the latter is true, but it's actually the opposite. That's the crux of what I'm arguing here: OP is simply factually wrong about his position.

Edit: OP has pointed out that he doesn't actually think there are exact replicas being produced, which just makes this even more confusing.

[–] [email protected] 0 points 1 year ago* (last edited 1 year ago)

"OP has pointed out that he doesn't actually think there are exact replicas being produced, which just makes this even more confusing."

You misread their first comment, I think.

They were saying that DESPITE the common arguments that AI only learns and doesn't copy exactly, it might still be good to require consent for people's content to be in training data.

[–] andros_rex 3 points 1 year ago (1 children)

“If the models are trained on images of specific individuals, the models can reproduce images that resemble those people. In the worst case, the model may even directly output verbatim copies of images from the training set,”

[–] [email protected] 3 points 1 year ago

Oh, it's a reference to that paper.

Firstly, that paper was written in January and examined a Stable Diffusion model that was already obsolete due to its poor training even back then. Secondly, even with that poor model they had to move heaven and earth to find a handful of examples out of hundreds of millions of training examples where they could get a blurry replica out.

Here's a Reddit thread from back in the day discussing how, really, this sort of thing just proves how difficult it is to do this.

Finally, as mentioned, that model is long obsolete due to issues exactly like this. Modern models work better in part because they have better curated training sets that eliminate this sort of "overfitting." There's no indication in this article that the website in question is using one of those old models; it's just presented as a hypothetical concern.
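
For anyone wondering what "curating" a training set against this kind of memorization might involve: one commonly cited step is dropping near-duplicate images, since an image that appears many times is more likely to be reproduced. Below is a minimal Python sketch of that idea, not how any particular model's pipeline actually works. The `Image` type (a tiny grayscale grid), `average_hash`, `drop_near_duplicates`, and the `max_distance` threshold are all made up for illustration.

```python
from typing import List, Sequence

# Hypothetical stand-in for a real image: a small grayscale grid, values 0-255.
Image = Sequence[Sequence[int]]


def average_hash(img: Image) -> int:
    """Pack one bit per pixel: 1 if the pixel is brighter than the image mean."""
    pixels = [p for row in img for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Count the bit positions where two hashes differ."""
    return bin(a ^ b).count("1")


def drop_near_duplicates(images: List[Image], max_distance: int = 1) -> List[int]:
    """Return indices of images to keep.

    An image is skipped if its hash is within max_distance bits of an
    image that was already kept (assumed threshold, tune for real data).
    """
    kept: List[int] = []
    kept_hashes: List[int] = []
    for i, img in enumerate(images):
        h = average_hash(img)
        if all(hamming(h, k) > max_distance for k in kept_hashes):
            kept.append(i)
            kept_hashes.append(h)
    return kept


original = [[0, 255], [255, 0]]
noisy_copy = [[10, 250], [240, 5]]   # same pattern, slightly different values
different = [[255, 0], [0, 255]]     # inverted pattern

print(drop_near_duplicates([original, noisy_copy, different]))
```

In this toy run the noisy copy hashes the same as the original and gets dropped, while the inverted image is kept. A real pipeline would presumably use a more robust perceptual hash or embedding similarity, but the principle is the same.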