Anthropic’s Fable Model Draws Criticism Over Cybersecurity Guardrails

Anthropic released Claude Fable 5 this week as a public version of its more restricted Mythos cybersecurity model, but some security researchers say the model’s safeguards block legitimate work too often. The company says Fable uses the same underlying model as Mythos 5, with stronger limits for cybersecurity and biology-related prompts.

Anthropic describes Fable on its Claude Mythos page as a guarded version of Mythos 5. When prompts fall into restricted areas, the system can route users to Claude Opus 4.8 instead.

Researchers Say Guardrails Are Too Broad

Security researcher Valentina “Chompie” Palmiotti, who works at IBM X-Force, said Fable rejects requests that are only loosely related to cybersecurity. She said even simple tasks, such as reading a blog post, could trigger the model’s safeguards.

When the safeguards are triggered, Fable pauses the chat and says its safety measures flagged the message for cybersecurity or biology topics. According to TechCrunch, researchers also reported that asking for code review could activate the restrictions.

Matt Suiche, a cybersecurity veteran and member of the technical staff at AI security startup Tolmo, said Fable may treat secure coding requests as cybersecurity work rather than as ordinary software engineering best practice. He said the filtering appeared to be keyword-based, but added that Anthropic may adjust the guardrails over time.

Anthropic Limits Higher-Risk Uses

Anthropic placed the restrictions on Fable to reduce the risk that the model could be used to create malware or compromise software. The biology-related restrictions are meant to reduce risks in another sensitive technical area.

In April, Anthropic introduced Mythos through Project Glasswing, a limited cybersecurity effort for selected companies and organizations. Anthropic later said in a Project Glasswing update that partners had used Mythos to find more than 10,000 high- or critical-severity security flaws.

Access to Mythos was later expanded to hundreds of organizations across 15 countries. Fable gives the public access to a limited version of the same model, but with stronger safeguards.

Verified Professionals Can Apply For Access

Anthropic also offers a Cyber Verification Program for security professionals. Its Claude Help Center page says the program is designed for legitimate cybersecurity work that may overlap with restricted categories.

Approved applicants can use Claude with fewer interruptions for certain security tasks. OpenAI has a similar program called Trusted Access for Cyber, according to the TechCrunch report.

Featured image credits: Heute.at

By Jolyen