Concerns are mounting around OpenAI’s transcription tool Whisper after researchers identified significant issues with “AI hallucinations,” especially alarming given the tool’s adoption in high-stakes fields like healthcare.
The transcription software, designed to convert audio into text, has reportedly been generating fabricated and sometimes disturbing content, according to research shared with AP News.
Engineers, developers, and academic researchers cited alarming examples, noting that Whisper had injected racial commentary, fabricated medical treatments, and other inaccuracies into its transcriptions. Such errors are rare in transcription tools, whose output is expected to reflect the audio content closely. With over 30,000 clinicians and 40 health systems, including Mankato Clinic in Minnesota and Children’s Hospital Los Angeles, integrating Whisper, these findings have ignited fears over potential risks to patient safety and decision-making accuracy.
Key findings reported by researchers include:
- University of Michigan Study: Found hallucinations in 80% of transcripts from public meetings.
- Machine Learning Engineer’s Review: Found hallucinations in more than half of over 100 hours of Whisper transcriptions analyzed.
- Developer’s Data: Detected hallucinations in nearly all of the 26,000 transcripts he generated.
Researchers warn that hallucination rates at this scale are unprecedented among transcription tools, and some feel Whisper’s potential risks could outweigh its benefits in high-risk industries.
In response, an OpenAI spokesperson said the company is working to improve model accuracy and reduce hallucinations, emphasizing that Whisper is not recommended for use in high-stakes decisions. “We thank researchers for sharing their findings,” the spokesperson added, indicating that OpenAI is aware of and addressing these concerns. Microsoft has also publicly stated that Whisper is not intended for critical use cases.
Risks for Healthcare Applications
Alondra Nelson, a professor at the Institute for Advanced Study in Princeton, New Jersey, highlighted the real-world implications, warning of “grave consequences” for healthcare. “Nobody wants a misdiagnosis,” she said, stressing the need for a high accuracy threshold in medical contexts. Former OpenAI employee William Saunders echoed her concerns, saying, “It’s problematic if you put this out there and people are overconfident about what it can do.”
This issue isn’t exclusive to OpenAI. Recently, AI tools from other tech giants have faced similar backlash. Google’s AI Overviews, for example, came under fire for erroneous advice, like recommending non-toxic glue on pizza to keep toppings in place. Meanwhile, Apple CEO Tim Cook has acknowledged that AI hallucinations could also be a challenge for Apple’s upcoming AI suite, Apple Intelligence.
As AI tools like Whisper find their way into areas like healthcare, concerns about their reliability grow. If AI can make up information in transcriptions, it may not be ready for tasks where accuracy is crucial. This highlights the need for caution as AI takes on more serious roles.
Featured Image courtesy of Nadine E on Unsplash