Lemmy.World

166,325 readers
6,887 users here now

The World's Internet Frontpage Lemmy.World is a general-purpose Lemmy instance of various topics, for the entire world to use.

Be polite and follow the rules ⚖ https://legal.lemmy.world/tos

Get started

If you would like to make a donation to support the cost of running this platform, please do so at the following donation URLs.

If you can, please use / switch to Ko-Fi, it has the lowest fees for us

Questions/issues post to
To open a ticket
Reporting is to be done via the reporting button under a post/comment.
Additional Report Info HERE
Please note, you will NOT be able to comment or post while on a VPN or Tor connection

Lemmy.World is part of the FediHosting Foundation

founded 1 year ago

ADMINS

submitted 1 year ago by [email protected] to c/[email protected]

4 comments fedilink

An interesting and clever proposal to fix the prompt injection vulnerability.

The author proposes a dual Large Language Model (LLM) system, consisting of a Privileged LLM and a Quarantined LLM.
The Privileged LLM is the core of the AI assistant. It accepts input from trusted sources, primarily the user, and acts on that input in various ways. It has access to tools and can perform potentially destructive state-changing operations.
The Quarantined LLM is used any time untrusted content needs to be worked with. It does not have access to tools and is expected to have the potential to go rogue at any moment.
The Privileged LLM and Quarantined LLM should never directly interact. Unfiltered content output by the Quarantined LLM should never be forwarded to the Privileged LLM.
The system also includes a Controller, which is regular software, not a language model. It handles interactions with users, triggers the LLMs, and executes actions on behalf of the Privileged LLM.
The Controller stores variables and passes them to and from the Quarantined LLM, while ensuring their content is never provided to the Privileged LLM.
The Privileged LLM only ever sees variable names and is never exposed to either the untrusted content from the email or the tainted summary that came back from the Quarantined LLM.
The system should be cautious with chaining, where the output of one LLM prompt is piped into another. This is a dangerous vector for prompt injection.

submitted 1 year ago by [email protected] to c/[email protected]

0 comments fedilink