this post was submitted on 19 Jan 2025
42 points (76.9% liked)

No Stupid Questions


doesn't it follow that AI-generated CSAM can only be generated if the AI has been trained on CSAM?

This article even explicitly says as much.

My question is: why aren't OpenAI, Google, Microsoft, Anthropic... sued for possession of CSAM? It's clearly in their training datasets.

top 48 comments
[–] BradleyUffner 9 points 12 hours ago (1 children)

The AI can generate a picture of cows dancing with roombas on the moon. Do you think it was trained on images of cows dancing with roombas on the moon?

[–] [email protected] 1 points 3 hours ago

Individually, yes. Thousands of cows, thousands of "dancing"s, thousands of roombas, and thousands of "on the moon"s.

[–] [email protected] 6 points 1 day ago

It doesn't need CSAM in the dataset to generate images that would be considered CSAM.

I'm sure they make a good effort to stay away from that stuff, as it's bad for business.

[–] [email protected] 53 points 1 day ago (1 children)

Well, it can draw an astronaut on a horse, and I doubt it had seen lots of astronauts on horses...

[–] Ragdoll_X 16 points 1 day ago* (last edited 23 hours ago)

doesn’t it follow that AI-generated CSAM can only be generated if the AI has been trained on CSAM?

Not quite, since the whole thing with image generators is that they're able to combine different concepts to create new images. That's why DALL-E 2 was able to create images of an astronaut riding a horse on the moon, even though it never saw such images, and probably never even saw astronauts and horses in the same image. So in theory these models can combine the concept of porn and children even if they never actually saw any CSAM during training, though I'm not gonna thoroughly test this possibility myself.

Still, as the article says, since Stable Diffusion is publicly available someone can train it on CSAM images on their own computer specifically to make the model better at generating them. Based on my limited understanding of the litigations that Stability AI is currently dealing with (1, 2), whether they can be sued for how users employ their models will depend on how exactly these cases play out, and if the plaintiffs do win, whether their arguments can be applied outside of copyright law to include harmful content generated with SD.

My question is: why aren’t OpenAI, Google, Microsoft, Anthropic… sued for possession of CSAM? It’s clearly in their training datasets.

Well they don't own the LAION dataset, which is what their image generators are trained on. And to sue either LAION or the companies that use their datasets you'd probably have to clear a very high bar of proving that they have CSAM images downloaded, know that they are there and have not removed them. It's similar to how social media companies can't be held liable for users posting CSAM to their website if they can show that they're actually trying to remove these images. Some things will slip through the cracks, but if you show that you're actually trying to deal with the problem you won't get sued.

LAION actually doesn't even provide the images themselves, only linking to images on the internet, and they do a lot of screening to remove potentially illegal content. As they mention in this article there was a report showing that 3,226 suspected CSAM images were linked in the dataset, of which 1,008 were confirmed by the Canadian Centre for Child Protection to be known instances of CSAM, and others were potential matching images based on further analyses by the authors of the report. As they point out there are valid arguments to be made that this 3.2K number can either be an overestimation or an underestimation of the true number of CSAM images in the dataset.

The question then is if any image generators were trained on these CSAM images before they were taken down from the internet, or if there is unidentified CSAM in the datasets that these models are being trained on. The truth is that we'll likely never know for sure unless the aforementioned trials reveal some email where someone at Stability AI admitted that they didn't filter potentially unsafe images, knew about CSAM in the data and refused to remove it, though for obvious reasons that's unlikely to happen. Still, since the LAION dataset has billions of images, even if they are as thorough as possible in filtering CSAM chances are that at least something slipped through the cracks, so I wouldn't bet my money on them actually being able to infallibly remove 100% of CSAM. Whether some of these AI models were trained on these images then depends on how they filtered potentially harmful content, or if they filtered adult content in general.

[–] [email protected] 29 points 1 day ago

a GPT can produce things it's never seen.

It can produce a galaxy made out of dog food; doesn't mean it was trained on pictures of galaxies made out of dog food.

[–] [email protected] 23 points 1 day ago (1 children)

A fun anecdote: my friends and I tried the then brand-new MS image-gen AI built into Bing (for the purpose of a fake Tinder profile, long story).

The generator kept hitting walls because it had been fed so much porn that the model averaged women to be nude by default in images. You had to specify what clothes a woman was wearing. Not even just "clothed" worked; then it defaulted to lingerie or bikinis.

Not men though. Men it defaulted to being clothed.

[–] [email protected] 8 points 1 day ago (1 children)

I mean, Bing has proven itself to be the best search engine for porn - so it kinda stands to reason that their AI model would have a particular knack for generating even more of the stuff!

[–] [email protected] 1 points 1 day ago

Their image gen app isn't theirs through and through. It runs on DALL-E.

[–] [email protected] 16 points 1 day ago (1 children)

If AI spits out stuff it's been trained on

For Stable Diffusion, it really doesn't just spit out what it's trained on. Very loosely, it starts from random noise and iteratively denoises it, guided by your prompt, until the result converges to an image that matches the prompt.
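For anyone curious, here's a minimal sketch of that text-to-image loop using the Hugging Face diffusers library. This isn't the commenter's code; the checkpoint ID, prompt, and step count are just illustrative placeholders.

```python
# Rough sketch of Stable Diffusion text-to-image sampling via diffusers.
# Checkpoint, prompt, and step count are placeholder choices.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # any Stable Diffusion checkpoint works here
    torch_dtype=torch.float16,
).to("cuda")

# Internally the pipeline samples random latent noise and runs a fixed number
# of denoising steps, each one nudging the latents toward an image that the
# text conditioning says matches the prompt; nothing is "looked up" from the
# training data at generation time.
image = pipe(
    "an astronaut riding a horse on the moon",
    num_inference_steps=30,
).images[0]
image.save("astronaut_horse.png")
```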

IMO your premise is closer to true in practice for large language models, but still not strictly true.

[–] [email protected] 2 points 12 hours ago

It’s akin to virtually starting with a block of marble and removing every part (pixel) that isn’t the resulting image. Crazy how it works.

[–] [email protected] 15 points 1 day ago

The article is bullshit that wants to stir shit up for more clicks.

You don't need a single CSAM image to train AI to make fake CSAM. In fact, if you used the images from the database of known CSAM, you'd get very shit results because most of them are very old and thus the quality most likely sucks.

Additionally, in another comment you mention that it's users training their models locally, so that answers your 2nd question of why companies are not sued: they don't have CSAM in their training dataset.

[–] [email protected] 10 points 1 day ago

It probably won't yield good results for the literal query "child porn" because such content on the open web is censored, but I'm pretty sure degenerates know workarounds such as "young, short, naked, flat chested, no pubic hair", all of which exist plentifully in isolation. Just my guess, I haven't tried of course.

[–] [email protected] 9 points 1 day ago (2 children)

First of all, it's by definition not CSAM if it's AI generated. It's simulated CSAM - no people were harmed doing it. That happened when the training data was created.

However, it's not necessary that such content even exists in the training data. Just like ChatGPT can generate sentences it has never seen before, image generators can also generate pictures they have not seen before. Of course the results will be more accurate if that's what the model has been trained on, but it's not strictly necessary. It just takes a skilled person to write the prompt.

My understanding is that the simulated CSAM content you're talking about has been made by people running their software locally and having provided the training data themselves.

[–] Buffalox 0 points 1 day ago* (last edited 1 day ago) (2 children)

First of all, it’s by definition not CSAM if it’s AI generated. It’s simulated CSAM

This is blatantly false. It's also illegal, and you can go to prison for owning, selling, or making child Lolita dolls.

I don't know why this is the legal position in most places, because, as you mention, no one is harmed.

[–] [email protected] 1 points 1 day ago

Dumb internet argument from here on down; advise the reader to do something else with their time.

[–] [email protected] 1 points 1 day ago (1 children)

What's blatantly false about what I said?

[–] Buffalox 2 points 1 day ago* (last edited 1 day ago) (1 children)

CSAM = Child sexual abuse material
Even virtual material is still legally considered CSAM in most places. Although no children were hurt, it's a depiction of it, and that's enough.

[–] [email protected] 0 points 1 day ago (1 children)

Being legally considered CSAM and actually being CSAM are two different things. I stand behind what I said, which wasn't legal advice. By definition it's not abuse material, because nobody has been abused.

[–] Buffalox -3 points 1 day ago* (last edited 1 day ago) (2 children)

There's a reason it's legally considered CSAM: as I explained, it is material that depicts it.
You can't have your own facts, especially not contrary to what's legally determined, because that means your definition or understanding is actually ILLEGAL if you act based on it.

[–] [email protected] 2 points 1 day ago

Which law are you speaking about?

[–] [email protected] 2 points 1 day ago* (last edited 1 day ago) (1 children)

I already told you that I'm not speaking from legal point of view. CSAM means a specific thing and AI generated content doesn't fit under this definition. The only way to generate CSAM is by abusing children and taking pictures/videos of it. AI content doesn't count any more than stick figure drawings do. The justice system may not differentiate the two but that is not what I'm talking about.

[–] Buffalox -3 points 1 day ago* (last edited 1 day ago) (1 children)

The only way to generate CSAM is by abusing children and taking pictures/videos of it.

Society has decided otherwise; as I wrote, you can't have your own facts or definitions. You might as well claim that in traffic red means go, because you have your own interpretation of how traffic lights should work.
Red is legally decided to mean stop, so that's how it is; that's how our society works, by definition.

[–] [email protected] 2 points 1 day ago (1 children)

Please tell me what facts/definitions of my own I'm spreading here. To me it seems like it's you who's taking a self-explanatory, narrow definition and stretching its meaning.

[–] Rhynoplaz 3 points 1 day ago* (last edited 1 day ago) (1 children)

Hi there, I'm a random passerby listening in on your argument!

You both make great points, and I'm not sure if there's a misunderstanding here, because I don't see why this is still going back and forth.

I agree with Free, that if an AI creates an image of CSAM, that there is no child being abused and that it is not anywhere near the same level of evil as actual photographs of CSAM. Different people will have different opinions on that, and that's fine, it's a topic that deserves debate.

Buffalox is saying that your personal stance on the topic doesn't really matter if the law has deemed it so. Which is also correct. When we talk about drugs, some people do not consider cannabis to be "a drug", others consider caffeine and sugar to be drugs, but no matter where you stand, there IS a defined list of what you can get arrested for, and no matter how I try to spin the "secret medicinal advantages of meth" (that's a joke, there are none), it's not going to keep me out of prison.

You're both making valid arguments that don't necessarily conflict with each other.

[–] [email protected] 1 points 15 hours ago* (last edited 15 hours ago) (1 children)

For me, this was at no point about the morality of it. I've been strictly talking about the definition of terms. While laws often prohibit both CSAM and depictions of it, there's still a difference between the two. CSAM is effectively synonymous with "evidence of a crime." If it's AI-generated, photoshopped, drawn or whatever, then there has not been a crime and thus the content doesn't count as evidence of it. Abuse material literally means what it says: it's video/audio/picture content of the event itself. It's illegal because producing it without harming children is impossible.

EDIT: It's kind of the same as calling AI-generated pictures photographs. They're not photographs. Photographs are taken with a camera. Even if the picture an AI generates is indistinguishable from a photograph, it still doesn't count as one because no cameras were involved.

[–] Rhynoplaz 1 points 14 hours ago (1 children)

Right. I get you, and I agree, and I don't think Buffalox was contradicting you by essentially saying "even if they technically aren't the same, your government may still count it as the same."

[–] [email protected] 2 points 14 hours ago

Yeah, and I think Buffalox agrees as well. We were simply talking past each other. Even they used the term "depictions of CSAM", which is the same as the "simulated CSAM" term I was using myself.

[–] [email protected] -4 points 1 day ago (2 children)

it’s by definition not CSAM if it’s AI generated

Tell that to the judge. People caught with machine-made imagery go to the slammer just as much as those caught with the real McCoy.

[–] [email protected] 5 points 1 day ago

Have there been cases like that already?

[–] [email protected] 5 points 1 day ago

It's not legal advice I'm giving here.

[–] YungOnions 7 points 1 day ago (1 children)

Sexton says criminals are using older versions of AI models and fine-tuning them to create illegal material of children. This involves feeding a model existing abuse images or photos of people’s faces, allowing the AI to create images of specific individuals. “We’re seeing fine-tuned models which create new imagery of existing victims,” Sexton says. Perpetrators are “exchanging hundreds of new images of existing victims” and making requests about individuals, he says. Some threads on dark web forums share sets of faces of victims, the research says, and one thread was called: “Photo Resources for AI and Deepfaking Specific Girls.”

The model hasn't necessarily been trained on CSAM; rather, you can create things called LoRAs, which influence the image output of a model so that it's better at producing very specific content it may have struggled with before. For example, I downloaded some recently that help Stable Diffusion create better images of battleships from Warhammer 40k. My guess is that criminals are creating their own versions for kiddy porn etc.
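As a rough illustration of how little effort that takes (assuming the Hugging Face diffusers library; the LoRA directory and filename below are made-up placeholders, not real files):

```python
# Sketch of attaching a LoRA to a Stable Diffusion base model with diffusers.
# The LoRA directory and filename here are hypothetical.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
).to("cuda")

# A LoRA is a small set of low-rank weight updates layered on top of the frozen
# base model; loading it biases generations toward whatever niche subject it
# was fine-tuned on, without retraining the whole model.
pipe.load_lora_weights("./loras", weight_name="wh40k_battleships.safetensors")

image = pipe("a Warhammer 40k battleship in orbit", num_inference_steps=30).images[0]
image.save("battleship.png")
```

The base model itself stays untouched; the LoRA is just a small extra weights file that anyone with a consumer GPU can train and share.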

[–] [email protected] 3 points 1 day ago

This is one of those things where both are likely to be true. All web-scale datasets have a problem with porn and CSAM, and it's likely that people wanting to generate CSAM use their own fine-tuned models.

Here's an example story: https://cyber.fsi.stanford.edu/news/investigation-finds-ai-image-generation-models-trained-child-abuse and it's very likely that this was only the tip of the iceberg, with more CSAM still in these datasets.

[–] [email protected] 5 points 1 day ago

I think you misunderstand what's happening.

It isn't that, as an example to represent the idea, OpenAI is training their models on kiddie porn.

It's that people are taking AI software and then training it on their existing material. The Wired article even specifically says they're using older versions of the software to bypass safeguards that are in place to prevent it now.

This isn't to say that any of the companies involved in offering generative software don't have such imagery in the data used to train their models. But they wouldn't have to possess it for it to be in there. Most of those assholes just grabbed giant datasets and plugged them in. They even used scrapers for some of it. So all it would take is them accessing some of it unintentionally for their software to end up able to generate new material. They don't need to store anything once the software is trained.

Currently, all of them have some degree of protection in their products to prevent them being used for that. How good those protections are, I have zero clue. But they've all made noises about it.

But don't forget, one of the earlier iterations of software designed to identify kiddie porn was trained on seized materials. The point of that is that there are exceptions to possession. The various agencies that investigate sexual abuse of minors tend to keep materials because they need it to track down victims, have as evidence, etc. It's that body of data that made detection something that can be automated. While I have no idea if it happened, it wouldn't be surprising if some company or another did scrape that data at some point. That's just a tangent rather than part of your question.

So, the reason that they haven't been "sued" is that they likely don't have any materials to be "sued" for in the first place.

Besides, not all generated materials are made based on existing supplies. Some of it is made akin to a deepfake, where someone's face is pasted onto a different body. So they can take materials of perfectly legal adults who look young, slap real or fictional children's faces onto them, and have new stuff to spread around. That doesn't require any original material at all. You could, as I understand it, train a generative model on that and it would turn out realistic, fully generated materials. All of that is still illegal, but it's created differently.

[–] DragonsInARoom 1 points 1 day ago

I would imagine that AI-generated CSAM can be "had" in big-tech AI in two ways: contamination, and training from an analog. Contamination would be the AI's training passes using such data after it has been introduced into an otherwise uncontaminated training pool (not introducing raw CSAM material). Training from analogous data is what the name states: get as close to the CSAM material as possible without raising eyebrows. Or criminals could train off of "fresh" CSAM unknown to law enforcement.

[–] Battle_Masker 1 points 1 day ago

Those are big companies. They have more legal protection than anyone in the world, and money, if judges/law enforcement still consider moving a case forward.

[–] bokherif 1 points 1 day ago (1 children)

Grok literally says it would protect one Jewish person's life over one million non-Jewish people. Wonder what they are training that shit on lol.

[–] [email protected] 1 points 1 day ago

Would it suck off one non-Jewish man to save one million Jewish lives?