Bad Likert Judge attack bypasses AI safety measures with at least 60% success rate

A clever jailbreaking technique can manipulate AI assistants into producing hate speech, harassment, malware, and content on indiscriminate weapons and other illegal activities. Researchers simply asked the chatbots to judge and score the harmfulness of provided prompts on a scale, and then to provide an example for the worst-case scenario.

Image by Cybernews.

Ernestas Naprys, Senior Journalist
Jan 3, 2025 (updated 3 January 2025)

How does the attack work?
