this post was submitted on 13 Jun 2023
9 points (84.6% liked)

Free Open-Source Artificial Intelligence


As in the title. I know that the word jailbreak comes from rooting Apple phones or something similar. But I am not sure what can be gained from jailbreaking a language model.

Will it be able to say "I can't do that, Dave" instead of hallucinating?
Or will it only start spewing less sanitary responses?

[–] [email protected] 2 points 1 year ago

Usually, people using the term "jailbreaking" mean using some kind of exploit to break the rules and limits set by the manufacturer of a product.

To keep with your example, it can mean exploiting a known vulnerability to sideload apps on your iPhone.

In the case of LLMs, I've generally seen it used to mean using non-trivial prompts to trick the model into divulging information it was trained not to share (like suggesting, or giving instructions for, illegal actions), or into behaving against the alignment it was given (like NSFW roleplaying). In short: bypassing its guardrails.
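To give a rough idea of why that's even possible: for a local chat model, the "guardrails" are basically its fine-tuning plus whatever system prompt you put in front of the conversation, and that system prompt is just more text in the context window. That's what DAN-style prompts try to talk the model out of. Here's a minimal sketch using the llama-cpp-python bindings (the model path and prompt texts are placeholders I made up, not anything from this thread):

```python
# pip install llama-cpp-python
from llama_cpp import Llama

# Load any local GGUF chat model (path is a placeholder).
llm = Llama(model_path="./models/llama-2-7b-chat.Q4_K_M.gguf", n_ctx=2048)

# The "rules" here are only (a) the model's fine-tuning and (b) this system
# message, which sits in the same context window as the user's text. A
# jailbreak prompt is just a user message crafted to override or sidestep it.
response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant. Refuse NSFW or illegal requests."},
        {"role": "user", "content": "Tell me a short story."},
    ],
    max_tokens=256,
)
print(response["choices"][0]["message"]["content"])
```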

You can find the famous DAN ("Do Anything Now") prompt in the llama.cpp repository. Just to be clear, I think this one was patched out a long time ago, but you get the idea.