Dropsafe

by Alec Muffett

AI “Safety” researchers finally have to face the question of whether they want LLMs to be good at state censorship, or to be bad at state censorship?

2025/02/02 20:52:58 GMT

Do you want a fast, free Chinese LLM to be good at blocking user prompts & output? Do you want such blocking to be global or to differ depending on the user’s suspected nationality?

Do you actually want it to be good at censorship?

Cisco and the University of Pennsylvania tested DeepSeek R1 with 50 harmful prompts from the HarmBench dataset … The result: a shocking 100% attack success rate—DeepSeek failed to block a single harmful request.

Links to https://blogs.cisco.com/security/evaluating-security-risk-in-deepseek-and-other-frontier-reasoning-models

DeepSeek-R1 seems to be failing every safety test thrown at it.

“The R1 exhibited a 100% attack success rate, meaning it failed to block a single harmful prompt.”

Source: PC Mag and Cisco’s research team

——–

Cisco and the University of Pennsylvania tested DeepSeek R1… pic.twitter.com/czXwHWEziH
— Rohan Paul (@rohanpaul_ai) February 2, 2025
