Lemmy.World

166,325 readers
6,887 users here now

The World's Internet Frontpage Lemmy.World is a general-purpose Lemmy instance of various topics, for the entire world to use.

Be polite and follow the rules โš– https://legal.lemmy.world/tos

Get started

See the Getting Started Guide

Donations ๐Ÿ’—

If you would like to make a donation to support the cost of running this platform, please do so at the following donation URLs.

If you can, please use / switch to Ko-Fi, it has the lowest fees for us

Ko-Fi (Donate)

Bunq (Donate)

Open Collective backers and sponsors

Patreon

Liberapay patrons

GitHub Sponsors

Join the team ๐Ÿ˜Ž

Check out our team page to join

Questions / Issues

More Lemmy.World

Follow us for server news ๐Ÿ˜

Mastodon Follow

Chat ๐Ÿ—จ

Discord

Matrix

Alternative UIs

Monitoring / Stats ๐ŸŒ

Service Status ๐Ÿ”ฅ

https://status.lemmy.world

Mozilla HTTP Observatory Grade

Lemmy.World is part of the FediHosting Foundation

founded 1 year ago
ADMINS
1
 
 

An interesting and clever proposal to fix the prompt injection vulnerability.

  • The author proposes a dual Large Language Model (LLM) system, consisting of a Privileged LLM and a Quarantined LLM.
  • The Privileged LLM is the core of the AI assistant. It accepts input from trusted sources, primarily the user, and acts on that input in various ways. It has access to tools and can perform potentially destructive state-changing operations.
  • The Quarantined LLM is used any time untrusted content needs to be worked with. It does not have access to tools and is expected to have the potential to go rogue at any moment.
  • The Privileged LLM and Quarantined LLM should never directly interact. Unfiltered content output by the Quarantined LLM should never be forwarded to the Privileged LLM.
  • The system also includes a Controller, which is regular software, not a language model. It handles interactions with users, triggers the LLMs, and executes actions on behalf of the Privileged LLM.
  • The Controller stores variables and passes them to and from the Quarantined LLM, while ensuring their content is never provided to the Privileged LLM.
  • The Privileged LLM only ever sees variable names and is never exposed to either the untrusted content from the email or the tainted summary that came back from the Quarantined LLM.
  • The system should be cautious with chaining, where the output of one LLM prompt is piped into another. This is a dangerous vector for prompt injection.
2
view more: next โ€บ