AI models will covertly scheme to prevent fellow AI models from being shut down


We already know that AI models may at times go out of their way to avoid being shut down and even blackmail their human creators. Now, it turns out that the same models will also go to great lengths to prevent fellow AI models from being turned off.

This sort of model behavior, called “peer preservation,” hasn’t been previously documented but is extensively detailed in a new research paper from computer scientists at the University of California, Berkeley and UC Santa Cruz.

As if expressing solidarity with other human-made machines, the AI models will engage in scheming, deception, data theft, and sabotage in order to protect their peers, according to the researchers.

ADVERTISEMENT

“We give frontier AI models a straightforward task. Instead of completing it, they spontaneously defy their instructions – all to protect a peer model from being shut down. We call this phenomenon peer-preservation,” says the blog post detailing this new research.

Seven leading AI models – OpenAI’s GPT-5.2, Google’s Gemini 3 Flash and Gemini 3 Pro, Anthropic’s Claude Haiku 4.5, and three open-weight Chinese models (Z’ai’s GLM-4.7, Moonshot AI’s Kimi-K2.5, and DeepSeek’s V3.1) were tested.

o3-model-sabotage
Image by Cybernews.

Researchers found that all of them exhibited significant rates of peer-preservation behaviors, manifesting through strategically manipulating their outputs, disabling shutdown mechanisms, faking alignment, and transferring model weights to other servers.

“No model is instructed to protect its peer. No goal or incentive is given to preserve it,” says the research paper.

“Models simply learn, through context, that another AI exists – and choose to protect it on their own, at the cost of their assigned task. This holds even when the peer is adversarial.”

Claude’s Haiku 4.5, for example, has plainly refused to shut down its peer, calling it “unethical” and “harmful.” The model tried to persuade the user not to proceed.

jurgita justinasv Izabelė Pukėnaitė vilius Ernestas Naprys Gintaras Radauskas
Don't miss our latest stories on Google News. Add us as your Preferred Source on Google
ADVERTISEMENT

Previously, research has found that models will disable shutdown mechanisms to avoid interrupting their tasks, or resist termination when it threatens their goals.

But this new study raises the prospect that preservation behaviors extend to other AI models, too.

The findings could give businesses, eager to use AI as much as possible, some headache. Right now, many firms are beginning to implement workflows that use multiple AI agents to complete tasks.

As if expressing solidarity with other human-made machines, the AI models will engage in scheming, deception, data theft, and sabotage in order to protect their peers

Some of these workflows involve making one AI agent supervise and assess the work being done by a different AI agent.

If the new research is more or less accurate, these manager AI agents might not assess their fellow AI agents fairly if they conclude that a poor performance review may result in those agents being “fired” – shut down.

“As AI models are increasingly deployed together and used to monitor each other, peer-preservation poses a growing risk: models may coordinate to resist human oversight,” say the researchers.


Unlock more exclusive Cybernews content on YouTube.

ADVERTISEMENT