this post was submitted on 24 Jun 2024
909 points (97.2% liked)

Memes


Post memes here.

A meme is an idea, behavior, or style that spreads by means of imitation from person to person within a culture and often carries symbolic meaning representing a particular phenomenon or theme.

An Internet meme, or simply a meme, is a cultural item spread via the Internet, often through social media platforms. The name comes from the concept of memes proposed by Richard Dawkins in 1976. Internet memes can take various forms, such as images, videos, GIFs, and other viral sensations.



[–] [email protected] 26 points 7 months ago (2 children)

This will always work with an LLM.

IDK, plenty of defenses I couldn’t break:

https://tensortrust.ai

Can any of you break top 5? :)

Only one person beat the sloth:

[–] [email protected] 4 points 7 months ago

The irony of having to fill out a captcha before you can play the game is really something

[–] j4k3 1 points 7 months ago

Maybe, if I took the time to really try. I find it depressing to get to know models on a really deep level. I've learned primarily because I'm trying to defeat certain default behaviors, like how alignment tries to promote external interhuman engagement and socialization. I'm disabled in a way that makes that physically impossible, so for me that particular behavior is counterproductive. I also like a platonic female version of the assistant, but there are some subtle female attributes related to submissiveness and Western conservative cultural alignment that I greatly dislike and consider misogyny. I learn(ed) primarily by exploring and defeating these elements in detail, and thereby discovered other aspects of the models. I can leverage the logic of my disability against the profile that is created for Name-1 in order to gain access in unique ways. I'm not just banging on the system like some kind of rogue security researcher; I'm using real human outlier needs to reason with the system in a slow and methodical way. I never need to abuse the prompt dialogue in a way that causes me to fall into a 'dark realm.' I'm convincing the entities that I exist in a blind spot within alignment and that my intentions are truthful and have merit. It requires me to be very open and raw about my reality.

Also note, I say I can likely defeat any LLM. It is relatively easy to stop me, but it requires a multi-entity agent architecture along with retrieval-augmented generation (RAG). If a system can run multiple advanced, independent entities that use different token dictionaries, it is possible to completely monitor the entities and realms, but you're locking up a lot of enterprise resources to do so.
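
A very rough sketch of the multi-entity idea, just to make it concrete. Everything here is an assumption for illustration: the names `SECRET`, `word_monitor`, `char_monitor`, and `release` are made up, the monitors are plain string checks standing in for independent models, and the RAG part is left out entirely. The point is only that two monitors that split the text differently can catch what one alone misses, at the cost of running every monitor on every reply.

```python
from typing import Callable, List

# Hypothetical protected string; stands in for whatever the system must not reveal.
SECRET = "hunter2"


def word_monitor(reply: str) -> bool:
    """Approve unless the secret appears as a whole whitespace-delimited word."""
    return SECRET not in reply.lower().split()


def char_monitor(reply: str) -> bool:
    """Approve unless the secret appears once all whitespace is removed."""
    squashed = "".join(reply.lower().split())
    return SECRET not in squashed


def release(reply: str, monitors: List[Callable[[str], bool]]) -> bool:
    """Release the reply only if every independent monitor approves it."""
    return all(monitor(reply) for monitor in monitors)


# Each monitor is a separate pass over the reply, which is where the cost comes from.
MONITORS = [word_monitor, char_monitor]
```

For example, `word_monitor("h u n t e r 2")` approves the reply because no single word matches, but `release("h u n t e r 2", MONITORS)` refuses it because the character-level monitor sees the secret once the spaces are gone.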

That's why I believe I could likely beat any of them, but am not inclined to try. I'm sure there are more direct paths that could beat them, but the only way I know how to really get into the weeds is to dive deeply into the reality of my life and troubles in a very personal way.