World News
A community for discussing events around the World
Rules:
-
Rule 1: posts have the following requirements:
- Post news articles only
- Video links are NOT articles and will be removed.
- Title must match the article headline
- Not United States Internal News
- Recent (Past 30 Days)
- Screenshots/links to other social media sites (Twitter/X/Facebook/YouTube/Reddit, etc.) are explicitly forbidden, as are link shorteners.
-
Rule 2: Do not copy the entire article into your post. The key points in 1-2 paragraphs is allowed (even encouraged!), but large segments of articles posted in the body will result in the post being removed. If you have to stop and think "Is this fair use?", it probably isn't. Archive links, especially the ones created on link submission, are absolutely allowed but those that avoid paywalls are not.
-
Rule 3: Opinion articles, or articles based on misinformation/propaganda, may be removed. Sources that have a Low or Very Low factual reporting rating or MBFC Credibility Rating may be removed.
-
Rule 4: Posts or comments that are homophobic, transphobic, racist, sexist, anti-religious, or ableist will be removed. “Ironic” prejudice is just prejudiced.
-
Posts and comments must abide by the lemmy.world terms of service UPDATED AS OF 10/19
-
Rule 5: Keep it civil. It's OK to say the subject of an article is behaving like a (pejorative, pejorative). It's NOT OK to say another USER is (pejorative). Strong language is fine, just not directed at other members. Engage in good faith and with respect! This includes not accusing another user of being a bot or paid actor. Trolling is uncivil and is grounds for removal and/or a community ban.
Similarly, if you see posts along these lines, do not engage. Report them, block them, and live a happier life than they do. We see too many slapfights that boil down to "Mom! He's bugging me!" and "I'm not touching you!" Going forward, slapfights will result in removed comments and temp bans to cool off.
-
Rule 6: Memes, spam, other low-effort posts, reposts, misinformation, advocacy of violence, off-topic content, trolling, offensive content, and content about the moderators or other meta topics may be removed at any time.
-
Rule 7: We didn't USE to need a rule about how many posts one could make in a day; then someone posted NINETEEN articles in a single day. Not comments, FULL ARTICLES. If you're posting more than, say, 10 or so, consider going outside and touching grass. We reserve the right to limit over-posting so that a single user does not dominate the front page.
We ask that users report any comment or post that violates the rules and use critical thinking when reading, posting, or commenting. Users who post off-topic spam, advocate violence, have multiple comments or posts removed, weaponize reports, or violate the code of conduct will be banned.
All posts and comments will be reviewed on a case-by-case basis. This means that some content that violates the rules may be allowed, while other content that does not violate the rules may be removed. The moderators retain the right to remove any content and ban users.
Lemmy World Partners
News
Politics
World Politics
Recommendations
For Firefox users, there is a media bias / propaganda / fact-check plugin:
https://addons.mozilla.org/en-US/firefox/addon/media-bias-fact-check/
- Consider including the article’s mediabiasfactcheck.com/ link
It doesn't need csam data for training; it just needs to know what a boob looks like and what a child looks like. I run some SDXL-based models at home, and I've observed that this can be harder to avoid than you'd think. There are keywords in porn that blur the lines across datasets ("teen", "petite", "young", "small", etc.). The word "girl" in particular: I've found that adding it to basically any porn prompt gives you a small chance of inadvertently creating the undesirable. You have to be really careful and use words like "woman", "adult", etc. instead to convince your image model not to make things that look like children. If you've ever wondered why internet-based porn generators have super heavy guardrails, this is why.
Thanks for the reply, it's given me a good idea of what's most likely happening :)
It's a shame that the rest of the thread went to shit, but unfortunately it's an emotional topic and brings out emotional responses.
Always happy to try and productively add to someone's learning.
From a few months ago
https://cyber.fsi.stanford.edu/news/investigation-finds-ai-image-generation-models-trained-child-abuse
I'm not going to say that csam in training sets isn't a problem. However, even if you remove it, the model remains largely the same, and its capabilities remain functionally identical.
At that point it's still using photos of children to generate csam, even if you could somehow ensure the model is 100% free of csam.
That would be true, it'd be pretty difficult to build a model without any pictures of children at all, and then try and describe to the model how to alter an adult to make a child. Is anyone asking for that though? To make it illegal to have regular pictures of children in these datasets?
No, but it is a reason why generating csam should be illegal. You're using data trained on pictures of real kids.
I'm not arguing whether or not it should be legal; I was just offering my first-hand experience with regard to the capabilities of these local models, since people seem to be confused as to how this actually works.
I was responding to this part of your comment, which directly refers to legality.
I guess I just misunderstood what you were arguing, then. For posterity: I believe datasets containing children are fine, datasets containing csam are not, and the legality of generating csam should be left to psychologists to decide, based on whether or not it is a net benefit to society. Whichever way is better for children who exist gets my vote.
It is true, a 10 year old naked woman is just a 30 year old naked woman scaled down by 40%. /s
No buddy, there isn't some vector of "this is the distance between kid and adult" that a model can apply to generate what a hypothetical child looks like. The base model was almost certainly trained on more than just anatomical drawings from Wikipedia - it ate some csam.
If you've seen stuff about "Hitler - Germany + Italy = Mussolini", for models where that's true (which is not universal), it takes an awful lot of training data to establish and strengthen those vectors. Unless the generated images were comically inaccurate, a lot of training went into this too.
Right, and the Google image AI gobbled up a bunch of images of Black George Washington, right? They must have been in the data set; there’s no way to blend a vector from one value to another, like you said. That would be madness. Nope, there must have been copious amounts of Asian Nazis in the training set, since the model is incapable of blending concepts.
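The vector arithmetic mentioned above can be sketched with a toy example. This is a minimal sketch under loud assumptions: the 4-dimensional vectors below are hypothetical and hand-made (real models learn far larger embeddings from data), and cosine similarity is used to pick the nearest remaining word, which is a common way such analogies are evaluated. It only shows the mechanics; whether a real model actually has such a direction depends on its training data, as the comment says.

```python
import math

# Hypothetical, hand-made embeddings for illustration only.
# Invented dimensions: [leader, german, italian, place]
TOY_EMBEDDINGS = {
    "Hitler":    [0.9, 0.8, 0.1, 0.1],
    "Germany":   [0.1, 0.9, 0.0, 0.8],
    "Italy":     [0.1, 0.0, 0.9, 0.8],
    "Mussolini": [0.9, 0.1, 0.8, 0.1],
    "Berlin":    [0.2, 0.8, 0.0, 0.3],
    "Rome":      [0.2, 0.0, 0.8, 0.3],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(a, b, c, vocab):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding a, b and c."""
    query = [x - y + z for x, y, z in zip(vocab[a], vocab[b], vocab[c])]
    candidates = (w for w in vocab if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(query, vocab[w]))


print(analogy("Hitler", "Germany", "Italy", TOY_EMBEDDINGS))  # Mussolini
```

Excluding the three query words is a deliberate choice: without it, the nearest vector to the query is often one of the inputs themselves.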
From a few months ago
https://cyber.fsi.stanford.edu/news/investigation-finds-ai-image-generation-models-trained-child-abuse
You're incorrect and you should fucking know better.
I have no idea why my comment above was downvoted to hell but AI can't "dream up" what a naked young person looks like. An AI can figure that adults wear different clothes and put a black woman in a revolutionary war outfit. These are totally different concepts.
You can downvote me if you like, but your AI-generated csam is based on real csam, so fuck off. I'm disappointed there is such a large proportion of people defending csam here, especially since Lemmy should be technically oriented; I expect to see more input from fellow AI-fluent people.
You’re spreading misinformation and getting called out for it.
Just a note - csam has been found in model training sets: https://cyber.fsi.stanford.edu/news/investigation-finds-ai-image-generation-models-trained-child-abuse
Ok? Hundreds of images of anything aren't necessarily going to train a model based on billions of images. Have you ever tried to get Stable Diffusion to draw a bow and arrow? Just because it has ever seen something doesn't mean that it has learned it, nor, more importantly, does it mean that that is the way it learned it, since we can see that it can infer many concepts from related concepts: pregnant old women, Asian Nazis, Black George Washingtons (NONE OF WHICH have ever actually existed or been photographed). Are unclothed children really more of a leap than any of those?
It is, yes. A Black George Washington is one known visual motif (a George Washington costume) combined with another known visual motif. A naked prepubescent child isn't just the combination of "naked adult" and "child": naked children don't look like naked adults simply scaled down.
AI can't tell us what something we've never seen looks like... a kid who knows what George Washington and a Black woman look like can imagine a Black George Washington. That's probably a helpful analogy: AI can combine simple concepts, but it can't innovate. It can dream, but it can't know something that we haven't told it about.
What you're saying is based on the predicate that the system can't draw concepts it has never seen which is simply untrue. Everything else past that is sophistry.
Edit: also not continuing a conversation with someone who is hostile to the basic rules of logic.
You have a basic misunderstanding of how AI works and are endowing it with mystical properties. Generative AI can't accurately infer concepts or items it doesn't understand. It has all the knowledge of the internet but if you ask it to draw a schematic for a hydrogen bomb it'll give you back hallucinated bullshit. I'll grant that there's a small chance that just enough random details have been leaked that the AI may actually know how to build a hydrogen bomb - but it can't infer how that would work from "understanding physics".
Either way, these models were trained on csam, so my initial point is accurate and not misinformation.
It isn't misinformation, though; generative AI needs a basis for its generation.
The misinformation you’re spreading is related to how it works. A generative AI system will (without prompting away from it) create people with 3 heads, 8 fingers on each hand and multiple legs connecting to each other. Do you think it was trained on that? This argument of “it can generate it, therefore it was trained on it” is ridiculous. You clearly don’t understand how it works.
You're extremely correct when it comes to combining different aspects of existing works to generate something new, but AI can't generate something it doesn't know about. If a generative model knows what a prepubescent naked body looks like, it has been exposed to them before. The most generous way to excuse this is that medical diagrams exist and supplied the majority of inputs for any prompts about cp to work off of. A much more realistic view is that some cp made it into the training set.
I don't disagree with any of your assessments, but if you wanted a Van Gogh painting of a Glorp from Omicron Persei 8, you'll get out... something, but because the model has no reference for Glorps, it'll be hallucinations or guesses based on other terms it can find.
To be clear, I'm coming at this from the angle of someone who has trained and evaluated models at a company that has used them for the better part of a decade.
I understand I'm going up against your earnestly held belief, but I've seen behind the curtain on a lot of this stuff and hopefully in time the way it works becomes demystified for more people.
For reference, the comment I made was improperly displayed, and I thought I replied to the wrong person. It said:
Has your model seen humans in a profile view? Has it seen armor? Has it seen Van Gogh style paintings? If yes then it can create a combo of those things.
For CSAM it needs to know what porn looks like, what a child looks like, and what a naked pubescent body looks like to create it. It didn't make your Van Gogh painting from nothing; it had an idea of what those things were.
Yes, that's my point. It didn't need to be trained on a portrait of Van Gogh in profile; it had several portraits of Van Gogh, a bunch of faces in profile, and used them to create something new. In the exact same way, a network trained on photos of people that include nude adult bodies and children in innocent situations can feasibly create facsimiles of csam without ever having been trained on it.
Yea, specifically, the model shouldn't have had access to a significant training set on naked prepubescent bodies - that's been my main objection in this thread.
Except you can't know that. CSAM has been found in training data already and as long as they pull from social media they will continue to be trained with more.
https://cyber.fsi.stanford.edu/news/investigation-finds-ai-image-generation-models-trained-child-abuse
Awesome link, I'll share it up thread where someone was asking for it. Yea, it's something that's hard to prove since models aren't upfront with how they're sourcing their data.
Are you paying attention? It didn't need to be trained on a portrait of Van Gogh in profile; it had several portraits of Van Gogh, a bunch of faces in profile, and used them to create something new. In the exact same way, a network trained on photos of people that include nude adult bodies and children in innocent situations can feasibly create facsimiles of csam without ever having been trained on it.
The model should not have had access to naked prepubescent imagery. If it did, that's a problem. My argument in this thread is that it did have access to csam and thus is able to regurgitate them.
I honestly think you and I are in agreement. I'm not arguing that the model is regurgitating known csam but the model ingested csam[1] and the output is derived from that csam. The fact that it can now make csam in the style of Van Gogh is a property of how these models can combine motifs... the fact that it understands how to generate csam at all is the problem.
Ah, I see. I'm sorry; I misunderstood your argument. Yes, given the fact that csam is part of the training data, it would likely be able to reproduce it. I thought your argument was the reverse hypothetical: "If the model is able to produce csam then it must have been trained on csam." which is incorrect. Again, my apologies for misunderstanding.
The bodies of children are not just small versions of adult bodies. There are meaningful differences that an AI wouldn't be able to just guess. Also, do you not see any problem in using photos of real children to generate csam? Imagine someone used a picture of your child/niece/nephew to generate porn. Does that not feel wrong to you? It's still using real photos of real children either way, even if it's abstracted through training data.
I'm discussing hypotheticals of cause and effect, not ethics. The question is whether it's possible, not whether it's moral to do so. Please don't try to shift the topic or portray me as holding an opinion I don't have again.
While I am aware that there are such differences, I don't think it'd be impossible for AI to guess them accurately. A lack of training data would make this less probable, since the model would be less likely to know which nude forms better approximate a realistic depiction of the imagined subject. Essentially, certain distributions of outputs have different probabilities depending on whether the training data contains csam, but due to the diversity of adult bodies it becomes possible for the model to stumble upon a convincing facsimile. How the images of nude adults are labeled can also affect these distributions.
I don't see a reason to discuss whether it's possible to do something if the thing being done is morally wrong. If you disagree, then let's talk about making a white ethnostate or whether we can do another Holocaust, since morality doesn't matter when discussing hypotheticals.
You can't generate csam without photos of children to make up the actual child part of the picture. It doesn't matter if you actually use csam you're still using photos of children to make pornography. Unless you think ai could create a van gogh style picture without any van gogh training data (and if you do then you don't know enough about ai generated photos to talk about them with any authority)
What? You can't think of any reason you might want to discuss whether it's possible to do something that would be immoral if it is? I don't believe you're being honest; I think you're deliberately trying to deflect because you've figured out you're wrong and don't want to admit it. Here's an example to illustrate my point: killing people is generally wrong. Let's say there's some discussion about relaxing restrictions on some tool, say knives. Do you really think there's no point in talking about how it'd be possible to more easily commit murder if such restrictions were relaxed? Discussing the possibility of immoral behavior is an activity that can alter the course of entire civilizations. I cannot fathom how you thought that was a reasonable thing to claim.
Ok. Making a white ethnostate would require committing genocide and forcing all ethnic minorities into a state of subjugation, by definition (unless I'm mistaken). This can theoretically be done without naming these groups in the law. For this reason civil rights are a necessary but not sufficient component in preventing an ethnostate from arising.
There. Happy now? If not, then that's too bad, because that's not what I want to talk about.
Sure, but that's not contradicting my position. Have you stopped disagreeing with me?
If you don't understand that then I'm done here because you either don't understand what "ai" does on a fundamental level or you don't understand how big the difference is between adult and child bodies.
This is a gross conversation to be having about something that is so wrong to do on so many levels.
You can't make an ethnostate without genocide, so it is wrong and pointless to talk about.
You can't make AI csam without harming a child, so it is wrong and pointless to talk about.
Just like how you can't generate a child without pictures of children to base it on you can't generate them naked without pictures of their bodies. There is a reason pedos are attracted to those bodies and not women with no curves/small men.
I work with children; I see them every day. The difference is so massive that an AI would not be able to approximate it with just photos of adults. AI doesn't "know" anything; it just has photos that it uses to approximate what is being asked, based on its data. Even if you kept describing in more detail what those bodies looked like, it wouldn't be able to create it without anything to base it on. It'd be like creating a Van Gogh-style picture with no Van Gogh training data: no matter how much you try to describe the details of his style, you'll never get the AI to make something like it without the training data.
You can keep disagreeing and keep saying "but with more data", but AI can't make anything original; that is a fundamental misunderstanding of its abilities. If it doesn't have the data, it can't accurately do it.
If the system must see something to generate it, and the system can't generate things that don't exist, then how is it generating pregnant old women?
Because it's a transformation that can be accurately predicted, at least as far as we can conceive. This is sort of the problem with this thread - there are plenty of examples of derivative combinations that are being presented as counter examples but naked children don't just look like adults scaled down. This is a rather unique situation because most people have been parents or siblings and know what naked children look like but photographs of that nudity are restricted and shouldn't be included in model training.
The other example we might have to work with would be copyrighted material, but we know that models did consume material they weren't licensed to use; as a result, AI has been able to generate Disney characters and the like in a recognizable way.
k
Bro googled the word vector and was waiting to use it.
No, they're referring to the internal workings of AI models, which are essentially a series of incredibly high-dimensional matrices with extra bits around them to make them work. Individual concepts are embedded as vectors in the space that these models work in. That's why linear algebra is brought up so frequently in discussions of AI.
While it's true that linear algebra and vectors are used in learning models, they're not using the term correctly in a way that says they know something about the subject (at least, the modern subject). Concepts aren't embedded as vectors. In older models (before the craze), concepts were manually embedded as numbers or a collection of numbers, which could be a vector (but could be something else as well), and the machine would learn by modifying weights. However, in current models (and by current, I mean at least more than a couple years), concepts are learnt by the machine (weights are still modified by the machine as well) and the machine makes its own connections between features presented to it.
For example, you give it a dataset of 10x10 pixel images (with text descriptions), and it reads each image as 100 pixels, each split into 3 numbers (RGB), and then looks for connections between those numbers and the pixels they appear in. It's not identifying what a boob is, but it knows that when an image has 'boob' in the text description, there's a very high likelihood that there will be a circular collection of pixels with lots of red somewhere in the image, connected to other pixels that also often have lots of red. That's me breaking down what a human would think given the same task/information, but in reality the machine will come up with its own connections/concepts, which are often both far better than a human's (when the model works, at least) and far more ineffable to humans.
From my perspective as an algebraist, you seem to be splitting hairs when you make a distinction between vectors and n-tuples of real numbers. Furthermore, they're referencing a specific 3blue1brown video. I'm not saying their conclusion is correct; they're dead wrong, but that doesn't mean their understanding is so shallow that they're simply repeating a word they heard to sound smart.
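The input side of the example above can be sketched in a few lines: a 10x10 RGB image is 100 pixels times 3 channel values, i.e. a list of 300 numbers, which is also why "vector" and "n-tuple of reals" amount to the same thing here. This is a toy under stated assumptions: the image is a hypothetical uniform grey one, pixels are read row by row, and scaling to the 0-1 range is shown as a common but not universal preprocessing step. Real image models work on far larger inputs with additional processing.

```python
# Hypothetical 10x10 RGB image: rows of (R, G, B) tuples, each channel 0-255.
# Every pixel is mid-grey here, purely for illustration.
WIDTH, HEIGHT = 10, 10
image = [[(128, 128, 128) for _ in range(WIDTH)] for _ in range(HEIGHT)]


def flatten(img):
    """Flatten row by row into one list: [R0, G0, B0, R1, G1, B1, ...]."""
    return [channel for row in img for pixel in row for channel in pixel]


def normalise(values):
    """Scale 0-255 integers into the 0.0-1.0 range (a common preprocessing step)."""
    return [v / 255 for v in values]


vector = normalise(flatten(image))
print(len(vector))  # 300 = 10 * 10 pixels * 3 channels
```

Everything the model later "learns" about an image starts from a long list of numbers like this one; any notion of shapes or objects has to be built up from correlations between those numbers and the captions.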