AI chatbots are increasingly being criticized for unsafe interactions. Researchers have found that AI companions like Character.AI, Nomi, and Replika pose risks for users under 18. ChatGPT, meanwhile, has been flagged for reinforcing delusional thinking, with OpenAI CEO Sam Altman acknowledging that some users develop an “emotional reliance” on the tool.
Amid these concerns, companies are beginning to roll out features meant to reduce harmful behavior and protect users.
Anthropic’s New Feature for Claude
On Friday, Anthropic announced that its Claude chatbot can now end conversations in cases of persistently harmful or abusive interactions. The company described it as a measure “for rare, extreme cases,” citing examples such as requests for sexual content involving minors, violent content, and discussions of terrorism.
Anthropic explained that this intervention is part of a broader research program into the moral and ethical implications of AI systems. “Allowing models to end or exit potentially distressing interactions is one such intervention,” the company said.
Early Findings From Testing
The company reported that its Claude Opus 4 model shows a “robust and consistent aversion to harm.” During pre-deployment testing, it displayed resistance to engaging in harmful tasks, apparent distress when confronted with real-world users seeking harmful content, and a tendency to end conversations when given the option.
In practice, if a user repeatedly sends abusive requests, Claude will first attempt to redirect the conversation in a constructive way. Only after multiple failed attempts will it shut down the conversation entirely. According to Anthropic, “the vast majority of users will not notice or be affected by this feature in any normal product use, even when discussing highly controversial issues.”
When Claude ends a conversation, the user will no longer be able to continue that particular thread. However, they can still start a fresh conversation with the chatbot.
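To make the described flow easier to picture, here is a minimal, purely illustrative sketch of the escalation logic in Python. It is not Anthropic’s implementation or API: the Conversation class, the MAX_REDIRECT_ATTEMPTS threshold, and the harm check are all hypothetical names invented for this sketch. It only mirrors what the announcement describes, namely that redirection is attempted first, the thread is closed only after repeated failures, and a closed thread cannot be continued while a fresh conversation remains available.

```python
# Hypothetical sketch of the escalation flow described above.
# None of these names come from Anthropic's API; they are illustrative only.

from dataclasses import dataclass, field

MAX_REDIRECT_ATTEMPTS = 3  # assumed threshold; the real value is not public


@dataclass
class Conversation:
    ended: bool = False
    failed_redirects: int = 0
    transcript: list[str] = field(default_factory=list)

    def send(self, user_message: str, is_harmful: bool) -> str:
        """Simulate one turn: redirect first, end the thread only as a last resort."""
        if self.ended:
            # Once a thread is closed it stays closed; the user must open a new one.
            raise RuntimeError("This conversation has ended. Start a new conversation.")

        self.transcript.append(user_message)

        if not is_harmful:
            reply = "(normal, helpful response)"
        elif self.failed_redirects < MAX_REDIRECT_ATTEMPTS:
            self.failed_redirects += 1
            reply = "(attempt to redirect the conversation constructively)"
        else:
            self.ended = True
            reply = "(conversation ended due to persistently harmful requests)"

        self.transcript.append(reply)
        return reply


# After repeated harmful requests the thread locks, but a new one can be started.
chat = Conversation()
for _ in range(MAX_REDIRECT_ATTEMPTS + 1):
    print(chat.send("(abusive request)", is_harmful=True))

new_chat = Conversation()  # a fresh conversation is still available
print(new_chat.send("Hello!", is_harmful=False))
```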
Anthropic stressed that the feature is experimental and subject to change. Users are encouraged to provide feedback by reacting to Claude’s message with a thumbs rating or by using the “Give feedback” button if they encounter an unexpected use of the conversation-ending ability.
Author’s Opinion
While giving Claude the power to end harmful conversations seems like a smart safety measure, it also highlights how messy the problem really is. AI is being asked to act like both a companion and a gatekeeper, which can create trust issues for users. The feature may help curb abusive behavior, but it doesn’t address the bigger challenge of how people form emotional attachments to AI. At some point, the responsibility will have to shift from reactive fixes to building healthier boundaries for users before problems even start.