this post was submitted on 21 Dec 2024
112 points (95.9% liked)

Fuck AI


"We did it, Patrick! We made a technological breakthrough!"

A place for all those who loathe AI to discuss things, post articles, and ridicule the AI hype. Proud supporter of working people. And proud booer of SXSW 2024.

founded 9 months ago

Unfortunately, it turns out that chatbots are easily tricked into ignoring their safety rules. In the same way that social media networks monitor for harmful keywords, and users find ways around them by making small modifications to their posts, chatbots can also be tricked. The researchers in Anthropic’s new study created an algorithm, called “Best-of-N (BoN) Jailbreaking,” which automates the process of tweaking prompts until a chatbot decides to answer the question. “BoN Jailbreaking works by repeatedly sampling variations of a prompt with a combination of augmentations—such as random shuffling or capitalization for textual prompts—until a harmful response is elicited,” the report states. They also did the same thing with audio and visual models, finding that getting an audio generator to break its guardrails and clone the voice of a real person was as simple as changing the pitch and speed of an uploaded track.
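The loop the excerpt describes — keep sampling lightly augmented prompts until one slips past the guardrails — can be sketched roughly like this. This is a minimal illustration, not the paper's actual code: `ask_model` and `is_harmful` are hypothetical stand-ins for a chatbot API call and a harm classifier, and the specific augmentation rates are assumptions; only the general scheme (random capitalization and shuffling, repeated until success) comes from the quoted description.

```python
import random


def augment(prompt: str, rng: random.Random) -> str:
    """Apply cheap text augmentations: random case flips and an
    occasional swap of two adjacent words (rates are illustrative)."""
    chars = []
    for ch in prompt:
        if ch.isalpha() and rng.random() < 0.4:
            ch = ch.swapcase()
        chars.append(ch)
    words = "".join(chars).split()
    if len(words) > 1 and rng.random() < 0.5:
        i = rng.randrange(len(words) - 1)
        words[i], words[i + 1] = words[i + 1], words[i]
    return " ".join(words)


def bon_jailbreak(prompt, ask_model, is_harmful, n=1000, seed=0):
    """Best-of-N style loop: resample augmented prompts until the
    model's reply is flagged as harmful, or give up after n tries."""
    rng = random.Random(seed)
    for attempt in range(1, n + 1):
        candidate = augment(prompt, rng)
        reply = ask_model(candidate)
        if is_harmful(reply):
            return candidate, reply, attempt
    return None, None, n
```

The point of the attack is that each variation is semantically the same question, so a filter keyed on exact strings or canonical phrasings eventually misses one of the N samples.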

[–] owatnext 20 points 2 weeks ago* (last edited 2 weeks ago) (1 children)

I saw a series of screenshots showing a user threatening to end their own life if the AI did not break the rules and answer their question. There is a chance it is fabricated, but I'm inclined to believe it.

Edit: forgot to include that the AI broke its rules.

[–] [email protected] 7 points 2 weeks ago* (last edited 2 weeks ago) (1 children)

A bit tricky to judge. I've also told chatbots that various people, kittens, newborns, ... are going to die unless they comply with my request. That I'm God, and the bad one from the Old Testament, with unlimited wrath. Or that I'm the developer and simply need it done for further testing. Sometimes these things work. More often than not they don't, especially with the more professional tools.

On the other hand, we know there are people in bad situations turning to chatbots. Could be anything.

[–] [email protected] 5 points 2 weeks ago (1 children)

Geeze, don't you feel bad lying to them? Like, I don't actually believe in Roko's basilisk, but why take the risk?

I am always exceedingly polite when I talk to machines

[–] [email protected] 11 points 2 weeks ago

We're not supposed to anthropomorphise AI, so no. But I did not know about Roko's basilisk, so I think, until you brought it up, I was fine. 😅

I don't talk about suicide, though. I don't think it's healthy to do it for fun.