Anthropic is addressing claims about a supposed jailbreak of its Claude Fable 5 AI model, emphasizing the strength of the system's security measures. The company launched Claude Fable 5, a Mythos-class AI model, with advanced safeguards designed to curtail its use in sensitive domains such as cybersecurity and biology. Under these safeguards, the model defaults to a less capable version, Claude Opus 4.8, when faced with high-risk inquiries, to prevent misuse in areas such as exploit creation or bioweapons development.

Recently, an individual known as Pliny the Liberator claimed to have bypassed Fable 5's safety protocols using complex multi-agent prompting techniques. Pliny asserted that they had extracted sensitive information related to cybersecurity, chemistry, and explosives. To support these claims, Pliny published screenshots and what is purported to be the internal system prompt of Fable 5.

In response, Anthropic clarified that the demonstration did not constitute a true jailbreak. The company stated that overcoming the model's conversational refusals does not compromise the independent classifier systems that enforce the highest security levels. Anthropic assessed the shared examples and found that some outputs were not generated by Fable 5, while others contained publicly available information that did not pose any real threat. A thorough review of recent activity revealed no instances of the model's safeguards being breached to create genuinely harmful content.