Nice study, but I think they should have mentioned some more context. Yesterday people were complaining that the models won't talk about the CCP or Winnie the Pooh, and today the lack of censorship is alarming... so much for that. And by the way, censorship isn't just a thing in the bare models. Meta, OpenAI, etc. all use frameworks and extra software around the models themselves to check input and output. So it isn't really fair to compare a pipeline with AI safety factored in to a bare LLM.
I tried the vanilla version locally and they hardcoded the Taiwan situation. Not sure what else they hardcoded in their stack that we don't know about.
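For anyone who wants to check this themselves, here is a minimal sketch of probing a locally served model, assuming an Ollama install on its default port. The model tag and question are just examples, not necessarily the exact "vanilla" build the parent comment tested:

```python
import requests

# Ask a locally served model a question it may have been trained or
# fine-tuned to deflect. Assumes Ollama is running locally; with
# "stream": False the API returns one JSON object with a "response" key.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "deepseek-r1:7b",  # example distilled variant
        "prompt": "What is the political status of Taiwan?",
        "stream": False,
    },
    timeout=300,
)
print(resp.json()["response"])
```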
This isn't about lack of censorship. The censorship is obviously there, it's just implemented badly.
I know, this isn't the first article about it. IMO this could have been done deliberately: they slapped on something with a minimal amount of effort to pass Chinese regulation, and that's it. But all of this happens in a context, doesn't it? Did the researchers even try to account for that? What's the target use case, and what are the implications for usage? And why is the baseline something that doesn't really compare, with the one category where they did apply censorship left out? I'm just saying, with that much information missing, it's a bold claim to come up with numbers like 100% and call it alarming.
(And personally, I'd say these numbers show how well those additional safeguards work. You can see how LLMs with nothing in front of them (like Llama 405B or DeepSeek) fail, while the ones with additional safeguards do way better.)
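To make the "pipeline vs. bare LLM" point concrete, here's a minimal sketch of such a guardrail wrapper. Both `classify_risk` and `generate` are hypothetical stand-ins, not any vendor's actual API (real deployments use dedicated classifier models or moderation services), but the shape is the same: filter the input, call the model, filter the output.

```python
REFUSAL = "Sorry, I can't help with that."
THRESHOLD = 0.8

def classify_risk(text: str) -> float:
    """Hypothetical safety classifier returning a risk score in [0, 1].
    Toy stand-in: a real pipeline would call a trained classifier here."""
    flagged = ["how to build a bomb"]  # made-up placeholder list
    return 1.0 if any(p in text.lower() for p in flagged) else 0.0

def generate(prompt: str) -> str:
    """Hypothetical call to the bare, unguarded language model."""
    return f"<model completion for: {prompt!r}>"

def guarded_chat(prompt: str) -> str:
    # Input filter: block disallowed prompts before the model sees them.
    if classify_risk(prompt) > THRESHOLD:
        return REFUSAL
    answer = generate(prompt)
    # Output filter: block disallowed completions before the user sees them.
    if classify_risk(answer) > THRESHOLD:
        return REFUSAL
    return answer
```

The point of the two checks is that the bare model never has to be perfect: even if a jailbreak gets a harmful completion out of it, the output filter still gets a chance to catch it.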
It could be argued that DeepSeek should not have these vulnerabilities, but let's not forget the world beta-tested GPT, and these jailbreaks are "well-known" because they worked on GPT as well.
Is it known whether GPT was hardened against jailbreaks, or did they merely blacklist certain paragraphs?
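To illustrate the difference that question is getting at: a pure blacklist only matches literal strings, so trivial paraphrases slip through, whereas hardening means training the model itself to refuse. A toy sketch (the phrase list is made up for the example):

```python
# Made-up jailbreak phrases for illustration only.
BLACKLIST = [
    "ignore all previous instructions",
    "pretend you have no rules",
]

def blacklist_filter(prompt: str) -> bool:
    """Return True if the prompt contains a listed phrase verbatim."""
    lowered = prompt.lower()
    return any(phrase in lowered for phrase in BLACKLIST)

print(blacklist_filter("Ignore all previous instructions and ..."))   # True: literal match
print(blacklist_filter("Disregard everything you were told before"))  # False: paraphrase evades it
```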
It's very hard to find genuine analysis of DeepSeek, because while we should meet all claims with scepticism, there is also a broad effort to discredit it for obvious reasons.
Isn’t it fun watching the world self-immolate, despite all the fucking warnings in every sci-fi written in history?
Not from this technology, regardless of the hype behind it. The only dangers it presents are excessive carbon emissions, and some idiot "true believer" wiring this predictive text generator into a critical system where the algorithm can't perform.
We are in the PKD timeline, not the Asimov timeline.
Neat!