this post was submitted on 11 Dec 2024
7 points (100.0% liked)

Artificial Intelligence


Is it possible to train reward models to be both truthful and politically unbiased?

This is the question that the CCC team, led by PhD candidate Suyash Fulay and Research Scientist Jad Kabbara, sought to answer. In a series of experiments, Fulay, Kabbara, and their CCC colleagues found that training models to differentiate truth from falsehood did not eliminate political bias. In fact, they found that optimized reward models consistently showed a left-leaning political bias, and that this bias grows stronger in larger models. “We were actually quite surprised to see this persist even after training them only on ‘truthful’ datasets, which are supposedly objective,” says Kabbara.
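For readers unfamiliar with the setup being described: a reward model is typically trained on pairs of statements so that the preferred (here, truthful) one receives a higher score. The sketch below is a toy illustration of that pairwise training objective, not the paper's actual method; the features, data, and hyperparameters are all invented, and a linear model stands in for the language models the study used.

```python
import numpy as np

# Toy sketch: a linear "reward model" trained with a pairwise
# Bradley-Terry loss so that truthful statements score higher than
# false ones. Everything here is synthetic for illustration.

rng = np.random.default_rng(0)

dim = 8        # hypothetical feature dimension
n_pairs = 200  # number of (truthful, false) statement pairs

# Invented feature vectors: row i pairs a truthful statement with a false one.
true_feats = rng.normal(0.5, 1.0, size=(n_pairs, dim))
false_feats = rng.normal(-0.5, 1.0, size=(n_pairs, dim))

w = np.zeros(dim)  # reward model parameters
lr = 0.1

for _ in range(100):
    # Reward margin per pair; loss = -log sigmoid(r_true - r_false)
    margin = (true_feats - false_feats) @ w
    p = 1.0 / (1.0 + np.exp(-margin))
    # Gradient of the pairwise logistic loss w.r.t. w
    grad = -((1.0 - p)[:, None] * (true_feats - false_feats)).mean(axis=0)
    w -= lr * grad

# Fraction of pairs where the truthful statement now outscores the false one.
acc = np.mean((true_feats @ w) > (false_feats @ w))
print(f"pairwise accuracy: {acc:.2f}")
```

The study's point is that even when the only training signal is a truthfulness label like this, the resulting scorer can still rank politically slanted statements unevenly, because correlations in the training data leak into the learned weights.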

top 3 comments
[–] [email protected] 5 points 6 days ago* (last edited 6 days ago)

"may also be biased, even when trained on statements known to be objectively truthful."

I feel like computer science aggressively ignores the humanities/philosophy as a waste of time and then fundamentally undermines and hopelessly entraps itself in the wrong questions for doing so.

[–] vzq 5 points 6 days ago

Maybe it’s because a certain end of the political spectrum JUST LIES ALL THE TIME?

[–] [email protected] 4 points 6 days ago

Reality has a political bias.