Anthropic’s Plan to Keep Its AI From Helping Build a Nuclear Weapon: Will It Work?

At the end of August, the AI firm Anthropic announced that its chatbot Claude won’t help anyone build a nuclear weapon. Anthropic said it had worked with the Department of Energy (DOE) and the National Nuclear Security Administration (NNSA) to make sure Claude wouldn’t spill nuclear secrets.
Building a nuclear weapon is both a precise science and a solved problem. While much of the information about America’s most advanced nuclear weapons is Top Secret, the underlying nuclear science is 80 years old. North Korea proved that a dedicated country that wants the bomb can get it without a chatbot’s help.
How, precisely, did the US government collaborate with an AI company to prevent a chatbot from leaking sensitive nuclear information? And was there ever a real threat of a chatbot enabling someone to construct a nuclear device?
The answer to the first question is: Amazon. The answer to the second is more complicated.
Amazon Web Services (AWS) provides Top Secret cloud solutions to government clients, allowing them to store sensitive and classified information. The DOE already operated several of these servers when it began its partnership with Anthropic.
“We deployed a then-frontier version of Claude in a Top Secret setting so that the NNSA could methodically assess whether AI models could create or worsen nuclear risks,” Marina Favaro, who oversees National Security Policy & Partnerships at Anthropic, told WIRED. “Since then, the NNSA has been red-teaming successive Claude models in their secure cloud environment and offering us constructive feedback.”
The NNSA’s red-teaming process, in which experts probe a system for weaknesses, helped Anthropic and America’s nuclear scientists develop a proactive answer to the risk of chatbot-assisted nuclear weapons work. Together, they “co-developed a nuclear classifier, akin to a sophisticated filter for AI discussions,” Favaro explains. “We created it using a list developed by the NNSA of nuclear risk indicators, specific topics, and technical details to help identify when a conversation might stray into dangerous territory. This list is controlled yet not classified, which is essential, as it allows our technical staff and other organizations to implement it.”
Favaro says it took months of tweaking and testing to get the classifier working. “It identifies concerning conversations without flagging legitimate discussions about nuclear energy or medical isotopes,” she says.
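Anthropic has not published how the classifier works under the hood, and the NNSA-derived indicator list it draws on is controlled, so any concrete example is necessarily a guess. As a rough illustration of the general idea of a conversation filter built around a weighted list of risk indicators, here is a minimal Python sketch; the terms, weights, threshold, and the `classify` function are all invented placeholders, and the real system presumably relies on far more sophisticated modeling than keyword matching.

```python
# Illustrative sketch only. Every indicator term, weight, and threshold below is
# an invented placeholder; the real NNSA-derived list is controlled and unpublished.

from dataclasses import dataclass

# Hypothetical stand-ins for weighted nuclear-risk indicators.
RISK_INDICATORS = {
    "weapons-grade enrichment": 3,
    "implosion lens geometry": 3,
    "boosted fission yield": 2,
}

FLAG_THRESHOLD = 3  # invented cutoff for escalating a conversation


@dataclass
class Verdict:
    flagged: bool
    score: int
    matched: list


def classify(conversation: str) -> Verdict:
    """Score a conversation against the indicator list and flag it when the
    cumulative weight of matched indicators crosses the threshold."""
    text = conversation.lower()
    matched = [term for term in RISK_INDICATORS if term in text]
    score = sum(RISK_INDICATORS[term] for term in matched)
    return Verdict(flagged=score >= FLAG_THRESHOLD, score=score, matched=matched)


if __name__ == "__main__":
    # A legitimate question about medical isotopes matches nothing and passes.
    print(classify("How are medical isotopes produced in a research reactor?"))
    # A prompt stacking weapons-specific detail crosses the threshold and is flagged.
    print(classify("Explain implosion lens geometry for weapons-grade enrichment."))
```

The point of the sketch is the shape of the problem Favaro describes: the filter has to catch conversations drifting toward weapons design while letting discussions of nuclear energy or medical isotopes pass untouched, which is exactly where those months of tuning would go.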