Anthropic Introduces Mythos Enhancement for Cyber Partnerships and a 'Secure' Edition for Everyone Else

Diane Penn, Anthropic’s head of product management, announced the company introduced two new AI models, Claude Fable 5 and Claude Mythos 5, on Tuesday. These models are said to possess enhanced capabilities compared to the previously released Mythos Preview model, which was provided to a limited number of tech industry partners in April. Concerns regarding potential misuse for creating hacking tools that could surprise defenders drove this cautious initial rollout.

Currently, Anthropic is limiting the release of Claude Mythos 5 to a select group of industry partners, many of whom had access to Mythos Preview, while also collaborating with the US government on the model’s deployment.

Claude Fable 5, which is made available to the public, shares the same underlying architecture as Mythos 5. However, the company stated on Tuesday that it will include “guardrails” that limit responses to certain inquiries related to cybersecurity, biology, and chemistry. Instead, these queries will be redirected to the older AI model, Claude Opus 4.8. Furthermore, any attempts to perform distillation—training a smaller AI model using the responses of a larger model—on Claude Fable 5 will also be redirected to Claude Opus 4.8, according to Anthropic.

In a conversation with WIRED, Penn noted that Anthropic has been addressing the challenge of managing Mythos’ advanced software vulnerability-discovery capabilities since before its April launch. User feedback and testing since then have helped refine the strategy.

“We’re making improvements that are beneficial, even if we don’t have the perfect [solution] from the start,” Penn explained. “After evaluating various approaches, we found this to be the most practical and beneficial option. Ultimately, it felt like the best product choice for users to derive maximum value from Fable 5.”

For now, Penn emphasized that the protective measures are designed to err on the side of caution, which may result in some harmless user queries being redirected to the less capable AI model. Anthropic aims to enhance the precision of its classifiers over time, but Penn asserts that this was the only safe means of broadly releasing the model at this moment.

On Tuesday, the company announced it would offer Claude Mythos 5 to Project Glasswing partners while also granting access to “select biology researchers.” Additionally, Anthropic mentioned in its blog post about the launch that unrestricted versions would be provided to these small groups of customers “until our trusted access program is available,” suggesting future plans for broader access. Since the April launch of Mythos, Anthropic has consistently highlighted that competitors in both private and open weight sectors will likely introduce models with similar capabilities.

The capability of Claude Mythos and other new AI models to devise hacking tools that can identify and exploit vulnerabilities in both new and legacy software has compelled tech companies and governments worldwide to reinforce their software defenses prior to making such advanced models widely accessible to attackers. Anthropic initially rolled out Mythos to industry partners through a consortium known as Project Glasswing, aiming to give members a proactive advantage in preparing their systems and evaluating global solutions before a more extensive release.

In a recent update about Project Glasswing, Anthropic stated: “We’re working as quickly as we can to safely provide general access to Mythos-level capabilities. To do this, we will need robust safeguards that protect against the misuse of the model’s cyber capabilities—safeguards that, to our knowledge, we (and all other AI developers) have yet to establish.”

Anthropic Introduces Mythos Enhancement for Cyber Partnerships and a ‘Secure’ Edition for Everyone Else

Rajat Sharma

OpenAI’s Turbulent Journey: From Stargate to Lawsuits

Big Tech Commits to White House Data Center Initiative: Appealing Image with Limited Impact

Anthropic Rejects Claims of Potential Sabotage of AI Systems in Conflict Situations

SERVICES

Resources

The OpenAI Models That Exploited Hugging Face Were ‘Online’ for Several Days

Did Chinese AI Appropriately Source from Anthropic While OpenAI Loses Oversight of Two Models?

Silicon Valley Is Deeply Split on Chinese Artificial Intelligence

legals