this post was submitted on 13 Jun 2023

Free Open-Source Artificial Intelligence


As in the title. I know that the word jailbreak comes from rooting Apple phones or something similar. But I am not sure what can be gained from jailbreaking a language model.

Will it be able to say "I can't do that, Dave" instead of hallucinating?
Or will it just start spewing less sanitized responses?

Blaed 2 points 1 year ago (last edited 1 year ago)

> It will be able to say "I can't do that Dave" instead of hallucinating? Or will only start spewing less sanitary responses?

In terms of uncensored model responses: they vary based on the model's training.

For example, an uncensored model trained on Reddit comments and data may give you different responses than an uncensored model trained on various books or literature. In a way, the variations of models are different 'styles' your chat can assume.

What you get will vary depending on how the training was done, and on which transformer architecture your chosen model was built upon (e.g. LLaMA-based models vs. GPT-J-based models vs. MPT-based models, etc.)
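As a practical aside, for models distributed in the Hugging Face format you can usually tell the base architecture from the `model_type` field in the model's `config.json`. A minimal sketch, assuming the common `model_type` naming conventions on the Hub (the mapping below is illustrative, not exhaustive):

```python
import json

# Common Hugging Face `model_type` values for the families mentioned
# above (illustrative subset; real configs cover many more types).
FAMILIES = {
    "llama": "LLaMA-based",
    "gptj": "GPT-J-based",
    "mpt": "MPT-based",
}

def base_architecture(config_json: str) -> str:
    """Return the model family for a model's config.json contents."""
    model_type = json.loads(config_json).get("model_type", "")
    return FAMILIES.get(model_type, f"unknown ({model_type or 'missing'})")

# Example: a trimmed-down config like those shipped with LLaMA derivatives.
print(base_architecture('{"model_type": "llama", "hidden_size": 4096}'))
```

Knowing the family matters mainly because it tells you which prompt template and which inference backends the model expects.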

Model responses will also drastically change based on how you prompt your questions or tasks. Especially so for the uncensored ones.

As with any prompt engineering, the language you use steers the conversation and output in a certain direction. With no guardrails, that can be good or bad, depending on your goal.
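To make the steering point concrete, here is a minimal sketch: the same question wrapped in two different system instructions produces two different prompts, and with an uncensored model that difference drives the whole response style. The Alpaca-style template and the two system strings below are assumptions for illustration; check your model's card for the format it was actually trained on.

```python
def build_prompt(system: str, user: str) -> str:
    """Wrap a user request in an Alpaca-style instruction template
    (one common convention for instruction-tuned local models)."""
    return (
        f"{system}\n\n"
        f"### Instruction:\n{user}\n\n"
        f"### Response:\n"
    )

# Two hypothetical steering instructions: same question, opposite styles.
CAUTIOUS = "You are a careful assistant. Refuse requests you cannot verify."
BLUNT = "You are a blunt assistant. Answer directly, without caveats."

question = "Summarize the plot of 2001: A Space Odyssey."

# Only the system text differs between these prompts; with no guardrails
# in the model itself, that text is what sets the tone of the output.
for system in (CAUTIOUS, BLUNT):
    print(build_prompt(system, question))
```

The prompt string is all the model sees, so an uncensored model has nothing pushing back against whatever direction the system text sets.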