this post was submitted on 13 Jun 2023
9 points (84.6% liked)

Free Open-Source Artificial Intelligence

2888 readers
2 users here now

Welcome to Free Open-Source Artificial Intelligence!

We are a community dedicated to forwarding the availability and access to:

Free Open Source Artificial Intelligence (F.O.S.A.I.)

More AI Communities

LLM Leaderboards

Developer Resources

GitHub Projects

FOSAI Time Capsule

founded 1 year ago
MODERATORS
 

As in the title. I know that the word jailbreak comes from rooting Apple phones or something similar. But I am not sure what can be gained from jailbreaking a language model.

It will be able to say "I can't do that Dave" instead of hallucinating?
Or will only start spewing less sanitary responses?

you are viewing a single comment's thread
view the rest of the comments
[–] deavid 6 points 1 year ago (3 children)

Large language models from corporations like OpenAI or Google need to limit the abilities of their AIs to prevent users from receiving potentially harmful or illegal instructions, as this could lead to a lawsuit.

So for example if you ask it how to break into a car or how to make drugs, the AI will reject the request and give you "alternatives".

It also happens for medical advice, and when treating the AI like a human.

Jailbreaking here refers to misleading the AI to a point that it will ignore these safeguards and tell you what you want.

[–] INeedMana 1 points 1 year ago (2 children)

So there's probably little to be gained from jailbreaking on HuggingFace chat?

[–] deavid 5 points 1 year ago

so far most models in HuggingFace are also "censored", so maybe something can be gained. But over there are "uncensored" models that can be used instead.

load more comments (1 replies)
load more comments (1 replies)