Anthropic Reverses Policy That Might Have ‘Undermined’ AI Researchers Using Claude

Anthropic Reverses Policy That Might Have 'Undermined' AI Researchers Using Claude

Anthropic is reconsidering a policy that would have discreetly restricted competitors from using its new AI model, Claude Fable 5, to create other AI models. The shift follows considerable criticism from the AI research community.

“We’re revising Fable 5’s safeguards for frontier LLM development to ensure they are transparent,” Anthropic stated in a communication to WIRED. “We acknowledged our miscalculation and apologize for not achieving the appropriate balance.”

Earlier this week, Anthropic introduced Claude Fable 5, which includes enhanced safety measures intended to prevent misuse. Some of the safeguards were expected: the company indicated it would redirect users who inquired about cybersecurity, biology, or chemistry to a less advanced AI model to lower the likelihood of someone employing the powerful AI for cyberattacks or bioweapon development.

However, for researchers aiming to utilize Claude Fable 5 for frontier AI advancement, Anthropic had a different plan. The firm intended to subtly diminish the model’s performance in ways that users couldn’t perceive. This strategy would effectively undermine researchers attempting to use Claude for training competing AI models, which Anthropic explicitly prohibits in its terms of service.

Anthropic has now announced a change in direction, stating that the safeguards for AI development within Claude Fable 5 will be apparent to users. If the company suspects a user is trying to leverage Claude to build a highly capable AI, it will inform them that it’s either denying the request or directing them to a less powerful model.

The reversal of policy came after intensive backlash from the AI research community. Although Anthropic had already enacted measures to restrict competitors from using Claude in developing closed- and open-source AI models, critics argue that secretly degrading the model’s performance for specific users crossed a line. Claude’s coding capability has gained popularity among developers, especially those engaged in open-source AI research initiatives, and researchers have indicated to WIRED that the new policy could have led to a concerning scenario where only a few leading AI labs could engage in advanced AI research.

Dean Ball, a senior fellow at the Foundation for American Innovation and a former White House AI advisor, expressed in a post on X that “degrading performance on ML research *without informing the user* is shockingly antagonistic and reflects poorly.” He further commented in another post that the “covert sabotage” policy contradicts Anthropic’s overall mission by hindering collaboration among AI researchers on safety measures.

“It seemed like Anthropic was conveying to the public, ‘We don’t trust anyone else to conduct AI research. We are the only ones who should be doing AI research,’” says Will Brown, research lead at the open-source AI startup Prime Intellect. “It feels somewhat like they’re beginning to pull the ladder up behind them.”

Brown noted that the policy would have also left developers unaware of potential violations of Anthropic’s rules, as the company wouldn’t notify them when its safeguards were activated. He added that these restrictions could have far-reaching implications. For instance, he referenced the increasing network of third-party evaluation firms that assess frontier models for safety, performance, and reliability—tasks that could have been hampered if Anthropic quietly degraded its model.

https://in.linkedin.com/in/rajat-media

Helping D2C Brands Scale with AI-Powered Marketing & Automation 🚀 | $15M+ in Client Revenue | Meta Ads Expert | D2C Performance Marketing Consultant