Hugging Face Introduces a Benchmark for Evaluating Generative AI Performance in Health-related Tasks

Hugging Face, the AI startup, has introduced a new benchmark named Open Medical-LLM, designed to evaluate generative AI models on medical-related tasks.

New: Open Medical LLM Leaderboard! 🩺

In basic chatbots, errors are annoyances.
In medical LLMs, errors can have life-threatening consequences 🩸

It's therefore vital to benchmark/follow advances in medical LLMs before thinking about deployment.

Blog: https://t.co/pddLtkmhsz
— Clémentine Fourrier 🍊 (@clefourrier) April 18, 2024

This initiative, developed in collaboration with Open Life Science AI and the University of Edinburgh’s Natural Language Processing Group, incorporates existing test sets like MedQA, PubMedQA, and MedMCQA. These tests probe generative AI’s knowledge in areas such as anatomy, pharmacology, genetics, and clinical practice.

The benchmark includes a variety of question types that test medical reasoning, drawing from resources like U.S. and Indian medical licensing exams and college biology test banks.
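To make the scoring idea concrete, here is a minimal Python sketch of how answers to exam-style questions could be tallied per test set and then averaged. It assumes the metric is simple accuracy on questions with a single correct answer, which the announcement summarized here does not spell out. The function name, the record layout, and the toy answers are hypothetical and are not taken from Hugging Face's code or data.

```python
from collections import defaultdict


def accuracy_by_dataset(records):
    """Return the fraction of correct answers for each test set.

    `records` is an iterable of (dataset, predicted, gold) tuples.
    This layout is a hypothetical one chosen for illustration.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for dataset, predicted, gold in records:
        total[dataset] += 1
        # Compare answers case-insensitively, ignoring surrounding spaces.
        if predicted.strip().upper() == gold.strip().upper():
            correct[dataset] += 1
    return {name: correct[name] / total[name] for name in total}


# Toy, made-up answers; these are not real benchmark results.
records = [
    ("MedQA", "A", "A"),
    ("MedQA", "C", "B"),
    ("PubMedQA", "yes", "yes"),
    ("MedMCQA", "d", "D"),
]

scores = accuracy_by_dataset(records)
# Unweighted mean across test sets, one possible way to rank models.
average = sum(scores.values()) / len(scores)
print(scores, round(average, 3))
```

A per-test-set breakdown like this is what would let a reader see that a model is strong on one exam style and weak on another, rather than relying on a single headline number.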

The release of Open Medical-LLM comes at a time when the adoption of generative AI in healthcare is increasing, though with a mixture of enthusiasm and caution.

The Dual Perspectives on AI in Healthcare

Proponents of generative AI in healthcare believe it can enhance efficiency and uncover insights that might otherwise remain undiscovered. Critics, however, argue that these models carry inherent flaws and biases that could lead to poorer health outcomes.

Medical test comparison (Credits: Hugging Face)

Hugging Face has positioned Open Medical-LLM as a robust tool for assessing the capabilities of healthcare-oriented generative AI models.

According to a blog post by the company, the benchmark allows researchers and practitioners to pinpoint the strengths and weaknesses of different AI approaches. This, in turn, is intended to drive advancements in the field and improve patient care and outcomes.

Yet, despite these intentions, some medical professionals express skepticism about the benchmark’s real-world applicability.

Medical Professionals Weigh In

Writing on the social media platform X, Liam McCoy, a resident physician in neurology at the University of Alberta, highlighted the discrepancy between the controlled environment of a benchmark test and the complexities of actual clinical practice. Clémentine Fourrier, a research scientist at Hugging Face and co-author of the benchmark announcement, concurred with McCoy’s viewpoint.

She emphasized that while such benchmarks can guide initial model selection for specific use cases, a more comprehensive phase of testing is crucial to evaluate a model’s limitations and applicability in real-world conditions. Fourrier further advised that medical generative AI models should not be used independently by patients but should serve as support tools for medical professionals.

Learning from Past AI Implementations in Healthcare

The conversation around generative AI in healthcare also recalls past experiences with AI tools in medical settings.

An illustrative example is Google’s AI screening tool for diabetic retinopathy in Thailand. Although it showed high theoretical accuracy, the tool was impractical in real-world applications, leading to frustration among patients and healthcare workers due to inconsistent results and lack of integration with existing medical practices.

The challenges of implementing AI in healthcare are further underscored by regulatory considerations. To date, the U.S. Food and Drug Administration (FDA) has approved 139 AI-related medical devices, none of which employ generative AI.

This highlights the difficulty in predicting how well AI tools developed and tested in laboratory settings will perform in clinical environments and how their effectiveness might evolve over time.

By Huey Yee Ong
