If you can’t beat them: Anthropic scales back on AI safety pledge

The maker of the popular enterprise AI tool Claude has confirmed it is scaling back a central commitment in its responsible AI policy, arguing that the burden cannot fall on one company alone.
In a statement on Tuesday, Anthropic – which has always prided itself on safety standards – said that amid rising competition and limited government regulation, it will no longer stick to its pledge “to pause the scaling and/or delay the deployment of new models” if they outpace its own safety measures.
The move effectively scraps a core pillar of what Anthropic calls its “Responsible Scaling Policy” (RSP) – the promise not to release AI models unless it can guarantee proper risk mitigation in advance.
The change leaves the AI maker far less constrained by its own safety policies, which previously barred it from training models above a certain capability if safeguards were not in place.
Why is Anthropic changing its AI policy?
In an interview with TIME magazine, cofounder and chief science officer Jared Kaplan framed the decision as a response to concerns that the old pledge could hinder innovation and made no sense if rivals were not observing the same practices.
“We felt that it wouldn't actually help anyone for us to stop training AI models. We didn't really feel, with the rapid advance of AI, that it made sense for us to make unilateral commitments … if competitors are blazing ahead.”
Anthropic cofounder and chief science officer Jared Kaplan
He added: “But we don't think it makes sense for us to stop engaging with AI research, AI safety, and most likely lose relevance as an innovator who understands the frontier of the technology, in a scenario where others are going ahead, and we're not actually contributing any additional risk to the ecosystem.”
What Anthropic says in its blog post
In a lengthy blog post titled “Anthropic’s Responsible Scaling Policy: Version 3.0,” the company cited “an anti-regulatory political climate” as part of the reason for its decision.
It acknowledges that, despite the best efforts of the company and its CEO, Dario Amodei, to push for regulation and to use its safety standards to build a broader industry consensus around AI, this did “not play out in practice.”
The AI firm also argued that being an AI safety outlier just doesn’t work. It drew comparisons to biosecurity in describing its highest risk threshold (“ASL-4 and beyond”).
The company says this is the most advanced level of danger, with safeguards that mirror those used in labs handling deadly pathogens such as Ebola. Risks at that scale, Anthropic argued, could not be contained by any one company alone.
However, for a firm that built its reputation as a safety-first AI lab, Anthropic’s new stance appears much weaker than its previous binding commitment to pause AI development if safety couldn't keep up with capability, as Nik Kairinos, CEO and cofounder of Cyprus-based AI safety monitoring platform RAIDS AI, says:
"The new policy still includes some guardrails, but the core promise, that Anthropic would not release models unless it could guarantee adequate safety mitigations in advance, is gone. Its reasoning, that it 'wouldn't actually help anyone' to stop training models while competitors race ahead, is exactly the logic that regulation exists to prevent."
Nik Kairinos, CEO & co-founder, RAIDS AI
Mounting pressure from the Pentagon
The policy change comes in the same week that Anthropic is engaged in a high-stakes standoff with the Pentagon.
Claude is currently the only AI model authorized for use on classified US military networks, making the Defense Department uniquely reliant on a company that limits certain operational uses.
According to reports in Axios, Defense Secretary Pete Hegseth has given Anthropic a deadline of 5:01 p.m. on Friday to remove restrictions on Claude’s use – including guardrails prohibiting mass surveillance of Americans and autonomous weapons – or risk being designated a “supply chain risk.”
Such a label is normally reserved for companies from adversarial nations, such as Chinese and Russian tech firms.
While the policy shift may well be unrelated to the discussions with the military, the timing highlights the mounting pressure facing the privately held AI firm, which has a reported valuation of $380bn following a recent Series G funding round.
What safety commitments does Anthropic make in the new framework?
Under the updated framework, Anthropic separates its own commitments from broader industry recommendations and plans to publish a new roadmap covering security, alignment, safeguards, and policy.
This will include:
- Transparency: Commitments to be more transparent about the safety risks of AI, including making additional disclosures about how Anthropic’s own models fare in safety testing.
- Moonshot research: Experimental "moonshot" research into extreme information security and automated red-teaming, using AI to find its own vulnerabilities “faster than human testers can.”
- Commitment to delay new models (under certain circumstances): The new policy still includes a commitment to delay the development or release of "a highly-capable" AI model, but only in more limited circumstances.
- More detailed risk reports: Anthropic says it will also commit to publishing “Risk Reports” every three to six months. The reports, the company says, will be more in-depth than those it currently publishes and will be reviewed by independent experts.
Anthropic added that it also plans to propose regulations “that scale with increasing AI risk.”
However, all these goals come with one caveat: they are publicly announced targets, no longer the kind of binding commitments on which the firm previously built its reputation.