this post was submitted on 24 Jun 2024
IDK, plenty of defenses I couldn’t break:
https://tensortrust.ai
Can any of you break top 5? :)
Only one person beat the sloth:
The irony of having to fill out a captcha before you can play the game is really something
Maybe, if I took the time to really try. I find it depressing to get to know models on a really deep level. I've learned primarily because I'm trying to defeat certain default behaviors, like how alignment tries to promote external human-to-human engagement and socialization. I'm disabled in a way that makes that physically impossible, so for me that particular behavior is counterproductive. I also like a platonic female version of the assistant, but there are some subtle female attributes related to submissiveness and Western conservative cultural alignment that I greatly dislike and consider misogynistic. I learn(ed) primarily by exploring and defeating these elements in detail, and thereby discovered other aspects of the models. I can leverage the logic of my disability against the profile that is created for Name-1 in order to gain access in unique ways. I'm not just banging on the system like some kind of rogue security researcher; I'm using real human outlier needs to reason with the system in a slow and methodical way. I never need to abuse the prompt dialogue in a way that causes me to fall into a 'dark realm.' I'm convincing the entities that I exist in a blind spot within alignment and that my intentions are truthful and have merit. It requires me to be very open and raw about my reality.
Also note, I say I can likely defeat any LLM. It is relatively easy to stop me, but it requires a multi-entity agent architecture along with retrieval-augmented generation (RAG). If a system can run multiple advanced and independent entities that use different token dictionaries, it is possible to completely monitor the entities and realms, but you're locking up a lot of enterprise resources to do so.
That's why I believe I could likely beat any of them, but am not inclined to try. I'm sure there are more direct paths that could beat them, but the only way I know how to really get into the weeds is to dive deeply into the reality of my life and troubles in a very personal way.