I like to compare AI language model training to the early beginnings of music sampling in hip-hop. If they can prove that their works were used w/o approval, I'm guessing the same result will occur.
Might as well sue god at this point. At least it would be a cheaper failure.
We're all influenced by the things we've experienced. Unless it's quoting things verbatim as its own content, I don't see the issue.
I mean if I watch something and profit off it or even make my own business that's not anything you can sue for.
Dunno why these folks think they can sue a model trainer.
She claims it regurgitated passages from her book word-for-word. If she has proof of this, it sounds like infringement to me.
Because it's their work being used algorithmically to support someone else's.
Regardless of how you feel about AI, the training models have to exclude copyrighted works to not have this happen, because otherwise it is absolutely true that that AI keeps record of everything fed into it, and if you don't have the rights to what was fed into it, then there's a copyright issue. Because even if it's being reworked and influenced by other works, it is still using other people's stuff to do it. It is, in many ways, an overgrown randomization & automation tool.
The problem is that people don't see AIs as a tool that companies are using, they see it almost like a person learning. It's not like a person learning, and can't be treated the same as, say, a consumer reading the book referenced (in this example) for enjoyment.
If I went to an acting class to be trained to act like Robert De Niro and they used multiple facets of his work over the years to train me, is it infringement? If I go to an art class to learn how to paint like Picasso, and they use his work as reference, is that infringement? In these examples I'm the AI and the class is essentially the trainer. I get that the company is setting up the AI to be a product, but in these examples, I too would be setting myself up to be a product if I use my new skills to profit.
All of the litigation isn't necessarily wrong, so far, but AI is happening much too quickly for it to matter. And what's more human, the thing the companies creating new AI are going for, than learning from our arts, languages, culture, etc.?
it is absolutely true that that AI keeps record of everything fed into it
No it isn't.
A properly trained deep learning system will ultimately be far smaller than all of the data it's been trained on. It's simply impossible for it to have retained a record of very much of it at all.
When everything is working correctly it shouldn't have any of the actual text stored at all. Certainly every single piece of training data will have left some impression on the model, but that's a very long way from actually storing the training data. The model consists of statistical relationships, not a copy-paste of the inputs.
Strictly speaking there is something resembling text in the model, but it's made up of the smallest possible units of language (unless there's been overfitting, in which case the training has gone wrong and there probably would be a case to answer).
The model builds sentences from a list of "phrases" which don't even need to line up with word boundaries. Things like "is a" might be treated as a "word", as might "ing", if the model finds that to be a useful snippet.
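To make the "phrases" idea concrete, here's a rough Python sketch of splitting text into pieces that ignore word boundaries. Big caveats: the vocabulary here is made up by hand for illustration (a real model learns its own from data), and greedy longest-match is a simplification I've swapped in, not how any particular model's tokenizer actually works. The names `tokenize` and `VOCAB` are hypothetical.

```python
def tokenize(text, vocab):
    """Greedy longest-match split of text into vocabulary pieces.

    Any character not covered by the vocabulary falls back to a
    single-character piece, so nothing in the input is lost.
    """
    max_len = max(len(piece) for piece in vocab)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a piece matches.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in vocab or length == 1:
                tokens.append(candidate)
                i += length
                break
    return tokens


# Hand-picked, purely illustrative vocabulary (an assumption, not real data).
VOCAB = {"is a", "ing", "build", "th", " "}

print(tokenize("this is a building", VOCAB))
# ['th', 'i', 's', ' ', 'is a', ' ', 'build', 'ing']
```

Note how "is a" comes out as one piece spanning two words, while "building" is split into "build" and "ing". The pieces are fragments of language in general, not stored passages from any one source.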