TechTakes

Big brain tech dude got yet another clueless take over at HackerNews etc? Here's the place to vent. Orange site, VC foolishness, all welcome.

This is not debate club. Unless it’s amusing debate.

For actually-good tech, you want our NotAwfulTech community

founded 2 years ago

Facebook "Secretly Trained Its AI on a Notorious Piracy Database, Newly Unredacted Court Docs Reveal" (awful.systems)

submitted 1 day ago by [email protected] to c/[email protected]

Kate Knibbs reports in Wired magazine:

Against the company’s wishes, a court unredacted information alleging that Meta used Library Genesis (LibGen), a notorious so-called shadow library of pirated books that originated in Russia, to help train its generative AI language models. [...] In his order, Chhabria referenced an internal quote from a Meta employee, included in the documents, in which they speculated, “If there is media coverage suggesting we have used a dataset we know to be pirated, such as LibGen, this may undermine our negotiating position with regulators on these issues.” [...] These newly unredacted documents reveal exchanges between Meta employees unearthed in the discovery process, like a Meta engineer telling a colleague that they hesitated to access LibGen data because “torrenting from a [Meta-owned] corporate laptop doesn’t feel right 😃”. They also allege that internal discussions about using LibGen data were escalated to Meta CEO Mark Zuckerberg (referred to as "MZ" in the memo handed over during discovery) and that Meta's AI team was "approved to use" the pirated material.

[–] [email protected] 1 points 1 hour ago

When I said “libgen is great because information should be free!” this isn’t what I meant… jeez

[–] [email protected] 5 points 14 hours ago

no way, that's illegal!

[–] [email protected] 27 points 22 hours ago (1 children)

Did they seed at least?

[–] [email protected] 17 points 20 hours ago (1 children)

it's facebook, they probably issued a takedown request for all their logged peers

[–] [email protected] 5 points 13 hours ago (1 children)

The pivot-to-ai writeup is out, they did seed! I assume it's documented then.

Multinational corporations can act ethically after all.

[–] [email protected] 2 points 4 hours ago

Multinational corporations can act ethically after all.

I wouldn't go that far

[–] [email protected] 5 points 20 hours ago* (last edited 18 hours ago)

So as LibGen is blocked here in .nl by various providers (mine calls it thepiratebay for some reason), I look forward to all their LLMs being blocked.

[–] [email protected] 25 points 1 day ago (1 children)

Nice! Now simply fine them to pay significant royalty to every author in there, say, a millicent per word of everything they've generated before they get caught.

[–] JeeBaiChow 12 points 1 day ago (1 children)

We should just start a meme movement that makes up an imaginary yet believable fact, like the lemmings jumping off a cliff thing, wait for the AIs to repeat it and lobby for royalties. Do one for each of the major AI platforms: OpenAI, Reddit, Meta, Apple, Google, etc. We would eventually find out which public forums are training which bots.

[–] [email protected] 3 points 18 hours ago (1 children)

Doesn't even have to be believable, LLMs do not care.

[–] JeeBaiChow 1 points 4 hours ago

And yet these are the things the investment bankers expect to take us to the next level lol

[–] JeeBaiChow 14 points 1 day ago

I used to think they'd just train on every Facebook account that was 'deleted', i.e. removed from the public eye. This feels much worse.