Great article. Other discussions on AI training keep raising the point that data collected from social media now might be poisoned by all the new chatbots and can't inherently be trusted, and that RLHF will need to be used, making training that much more expensive and difficult. The final line of this article puts the problem of data poisoning into full perspective.
I never thought about it like that, but you're right, the data quality matters. I saw a discussion on another board about how all of the Reddit data that we use in our searches might become extremely valuable, since the majority of it was written by genuine humans.
Of course, obligatory fuck u/spez for his handling of what we all created, but there's no reason we can't do it again here.