this post was submitted on 17 Jul 2024

194 points (97.1% liked)

Fuck AI


"We did it, Patrick! We made a technological breakthrough!"

A place for all those who loathe AI to discuss things, post articles, and ridicule the AI hype. Proud supporter of working people. And proud booer of SXSW 2024.

founded 7 months ago


Nvidia, Apple, and others allegedly trained AI using 173,000 YouTube videos — professional creators frustrated by latest AI training scandal: Report (www.tomshardware.com)

submitted 3 months ago by [email protected] to c/fuck_ai

29 comments

Some of the world's wealthiest companies, including Apple and Nvidia, are among countless parties who allegedly trained their AI using scraped YouTube videos as training data. The YouTube transcripts were reportedly accumulated through means that violate YouTube's Terms of Service, and the news has some creators seeing red. The story was first reported in a joint investigation by Proof News and Wired.

While major AI companies and producers often keep their AI training data secret, heavyweights like Apple, Nvidia, and Salesforce have revealed their use of "The Pile", an 800GB training dataset created by EleutherAI, and the YouTube Subtitles dataset within it. The YouTube Subtitles training data is made up of 173,536 plaintext YouTube transcripts scraped from the site, including 12,000+ videos that have been removed since the dataset's creation in 2020.

Affected parties whose work was purportedly scraped for the training data include education channels like Crash Course (1,862 videos taken for training) and Philosophy Tube (146 videos taken), YouTube megastars like MrBeast (two videos) and Pewdiepie (337 videos), and TechTubers like Marques Brownlee (seven videos) and Linus Tech Tips (90 videos). Proof News created a tool you can use to survey the entirety of the YouTube videos allegedly used without consent.


[email protected] 7 points 3 months ago

Dude learns how to do plumbing from plumbing channels, then makes his own shittier video series on how to do plumbing, made out of clips from the plumbing channel that he didn't have the rights to.

Fixed that for you

[email protected] -9 points 3 months ago (last edited 3 months ago)

made out of clips he didn't have the rights

See, this is where you're showing your ignorance of how current AI functions.

Yes, it's possible the AI could go on to make shittier videos with its new knowledge, as could the novice plumber in my example.

But the AI isn't copying clips of any videos.

It's not a repository that simply recalls the videos, pictures, or words it was exposed to.

LLMs do not model the world - Sean Carroll

[email protected] 5 points 3 months ago (last edited 3 months ago)

It generates new content based on patterns it has acquired from training data. The fact that you can't readily trace or attribute output to specific parts of the training data does not make it permissible for a human to cause the LLM to train on that data without permission of the rights holder, or in violation of the content provider's ToS.

I fear you are getting stuck nitpicking my analogy, which was a bit simplified.

[email protected] 1 points 3 months ago (last edited 3 months ago)

does not make it permissible for a human to cause the LLM to train on that data without permission of the rights holder

Says who? These videos are out there for people (or things) to see.

If someone was playing some videos to train their dog to respond to a noise, what business is that of the rights holder?

Show me where the ToS, as it stood over a year ago, says you're not allowed to train an AI on the video.

Rights holders can't control what people use the video for. They can control when and how it's delivered, but not who's actually watching it.

[email protected] 1 points 3 months ago

Says who? These videos are out there for people (or things) to see.

What an awful troll you are. You conveniently didn't quote the remainder of the sentence so you could nitpick part of my response out of context.

Read the "Permissions and Restrictions" section of the YouTube terms of service.

[email protected] 1 points 3 months ago

Even now, it says nothing about letting AI watch them.