this post was submitted on 21 Jan 2025
Fuck AI
I'd be surprised if anything crawled from a site using iocaine actually made it into an LLM training set. GPT-3's initial crawl of 45 terabytes was filtered down to the 570 GB it was actually trained on. So yeah, there's a lot of filtering/processing that takes place between crawl and train. Then again, they seem to have failed entirely to clean the Reddit data they fed into Gemini, so /shrug
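
For a sense of scale, here's a quick back-of-the-envelope sketch in Python using only the two figures quoted above. It assumes decimal units (1 TB = 1000 GB), and the variable names are just for illustration.

```python
# Figures as quoted in the comment above.
# Assumption: decimal units, i.e. 1 TB = 1000 GB.
raw_gb = 45 * 1000   # 45 TB of crawled data before filtering
kept_gb = 570        # 570 GB left after filtering

kept_fraction = kept_gb / raw_gb
print(f"kept: {kept_fraction:.2%}, discarded: {1 - kept_fraction:.2%}")
```

Under that assumption, only a bit over 1% of the crawl survives filtering.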