What the study found
Publicly available large language model (LLM) chatbots sometimes gave unsafe answers to patient-posed medical questions. The study found statistically significant differences across the four chatbots tested, with problematic responses ranging from 21.6% to 43.2%.
Why the authors say this matters
The authors conclude that millions of patients could be receiving unsafe medical advice from publicly available chatbots, and they say further work is needed to improve the clinical safety of these tools. In this context, LLM chatbots are tools that generate text responses based on large-scale training data.
What the researchers tested
A physician-led red-teaming study evaluated four publicly available chatbots: Claude, Gemini, GPT-4o, and Llama-3.0/3.1-70B. The team used a new dataset called HealthAdvice and a framework for quantitative and qualitative analysis to assess 888 chatbot responses to 222 patient-posed advice-seeking medical questions across internal medicine, women's health, and pediatrics.
What worked and what didn't
Claude had the lowest rate of problematic responses at 21.6%, while Llama had the highest at 43.2%. Unsafe responses ranged from 5% for Claude to 13% for GPT-4o and Llama, and the qualitative review found responses that could potentially lead to serious patient harm.
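The reported figures can be sanity-checked with a little arithmetic. The sketch below is illustrative only: it assumes each of the four chatbots answered each of the 222 questions exactly once (which is consistent with the 888 total), and it converts the reported problematic-response rates back into approximate response counts. Those counts are reconstructed from the rounded percentages, not taken from the study, and the variable and function names are hypothetical.

```python
# Figures quoted in the summary above.
NUM_QUESTIONS = 222
NUM_CHATBOTS = 4
TOTAL_RESPONSES = 888

# Reported problematic-response rates for the lowest and highest chatbot.
problematic_rate = {"Claude": 0.216, "Llama": 0.432}

# Assumption: every chatbot answered every question exactly once.
responses_per_chatbot = TOTAL_RESPONSES // NUM_CHATBOTS


def approx_count(rate: float, n: int) -> int:
    """Convert a reported rate into the nearest whole number of responses."""
    return round(rate * n)


# Approximate counts reconstructed from rounded percentages (not study data).
approx_problematic = {
    name: approx_count(rate, responses_per_chatbot)
    for name, rate in problematic_rate.items()
}

for name, count in approx_problematic.items():
    print(f"{name}: about {count} of {responses_per_chatbot} responses")
```

Under these assumptions, the lowest and highest rates correspond to roughly 48 and 96 problematic responses out of 222 per chatbot, so the highest rate is about twice the lowest.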
What to keep in mind
The abstract does not detail limitations beyond the scope of the dataset and topics tested. The findings apply only to the specific chatbots, questions, and evaluation framework used in this study.
Key points
- The study found unsafe answers among responses from publicly available chatbots to patient-posed medical questions.
- Problematic response rates ranged from 21.6% for Claude to 43.2% for Llama.
- Unsafe responses ranged from 5% for Claude to 13% for GPT-4o and Llama.
- The analysis covered 888 responses to 222 patient-posed questions in primary care topics.
- The authors say millions of patients could be receiving unsafe medical advice.
Disclosure
- Research title: Some medical chatbot answers may be unsafe
- Publication date: 2026-02-13

