Free Open-Source Artificial Intelligence

Welcome to Free Open-Source Artificial Intelligence!

We are a community dedicated to advancing the availability of, and access to:

Free Open Source Artificial Intelligence (F.O.S.A.I.)

founded 2 years ago

MODERATORS

Blaed

fosai

AI training data has a price tag that only Big Tech can afford (techcrunch.com)

submitted 6 months ago by [email protected] to c/fosai

[–] grue 14 points 6 months ago

All this data was "crowdsourced" -- i.e., stolen -- from the public in the first place. As far as I'm concerned, they owe us and have no room to complain if we steal it right back.

[–] [email protected] 2 points 6 months ago* (last edited 6 months ago)

I recently listened to this German-language podcast episode about the social cost of AI and what life is like for a few clickworkers in Africa: Das Wissen | SWR: Clickworker – Ausgebeutet für künstliche Intelligenz ("Clickworkers: Exploited for Artificial Intelligence")

[–] [email protected] 1 points 6 months ago (3 children)

Based on the post title alone, I call bull: I could buy enough storage and pirate enough books to create an AI, using copyrighted material as the training data. Yes, it would be an absolutely horrible AI, since I don't have a clue what I'd be doing, but it's possible.

[–] [email protected] 7 points 6 months ago

Then go ahead and buy 2000 Nvidia cards.

The training data is important, but currently the bottleneck is computing power. Buying so many chips and having them run full blast 24/7 costs a lot of money.

[–] Audalin 4 points 6 months ago

You can get your hands on books3 or any other dataset that was exposed to the public at some point, but large companies have private human-filtered high-quality datasets that perform better. You're unlikely to have the resources to do the same.

[–] General_Effort 1 points 6 months ago

It's not clear if this is piracy. In the US, it's obviously an ongoing fight. Basically, what you describe is "books3", put together with scripts by Aaron Swartz.

It's legal in Japan, if the purpose is only AI training and not enjoyment. I'm not sure if there are issues regarding DRM or such.

In the EU, the dataset and resulting model would be illegal. Any business offering the model would be in hot water, but I think internal use would be fine.