'Botside' manner may soon replace bedside manner. (Credit: Andrey_Popov on Shutterstock)
If AI’s empathy advantage extends to voice interactions, it could revolutionize patient care for millions.
In A Nutshell
- AI chatbots like ChatGPT scored roughly 2 points higher than doctors on 10-point empathy scales
- The advantage held across 13 of 15 studies examining cancer, mental health, thyroid conditions, and other medical questions
- All studies evaluated text-only interactions; results may not apply to in-person or voice consultations
- One in five UK doctors already uses ChatGPT for tasks like writing patient correspondence
- Future research needs to test whether patients receiving actual care perceive the same empathy advantage
Healthcare workers and patients feel more warmth from AI-generated medical responses than from actual doctors, a surprising analysis of 15 studies shows. The largest study examined 2,164 patient interactions, with similar patterns emerging across smaller datasets.
ChatGPT and similar AI chatbots scored roughly two points higher than human healthcare professionals on 10-point empathy scales when responding to patient questions via text. AI had a 73% probability of being rated as more empathic than human practitioners in head-to-head comparisons.
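The 73% figure is consistent with the "common language effect size," which converts a standardized mean difference (the pooled SMD of 0.87 reported in the meta-analysis) into the probability that a randomly chosen AI response outrates a randomly chosen human one. A back-of-the-envelope check, assuming normally distributed ratings with equal variance (the paper's exact conversion method is not spelled out here):

```python
from statistics import NormalDist

def probability_of_superiority(smd: float) -> float:
    """Common language effect size: P(random AI rating > random human rating),
    assuming both rating distributions are normal with equal variance."""
    return NormalDist().cdf(smd / 2 ** 0.5)

# Pooled standardized mean difference reported in the meta-analysis.
print(round(probability_of_superiority(0.87), 2))  # ≈ 0.73
```

Plugging in the pooled SMD of 0.87 recovers roughly the 73% probability quoted above.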
“In text-only scenarios, AI chatbots are frequently perceived as more empathic than human HCPs [healthcare professionals],” study authors wrote. The meta-analysis from the Universities of Nottingham and Leicester pooled data from 13 of the 15 studies comparing AI chatbots to doctors, nurses, and other healthcare workers.

The results, published in the British Medical Bulletin, challenge long-held assumptions about human connection in medicine and run counter to a 2019 UK government report that called empathy an “essential human skill that AI cannot replicate.”
AI Shows Empathy Edge Across Medical Specialties
ChatGPT-4 outperformed human clinicians in nine separate studies spanning cancer care, thyroid conditions, mental health, autism, and general medical inquiries. For thyroid questions, the AI scored 1.42 standard deviations above human surgeons in empathy ratings. Mental health queries showed similar patterns, with ChatGPT-4 scoring 0.97 standard deviations higher than licensed mental health professionals.
Patient complaints revealed the starkest gaps. When handling grievances across hospital departments, ChatGPT-4 scored 2.08 standard deviations higher than human patient relations officers.
The AI advantage appeared consistent regardless of who evaluated the responses. When both physicians and patients reviewed the same set of answers about systemic lupus, ChatGPT-4 received higher empathy ratings from physicians. For questions about multiple sclerosis, patient representatives using a validated empathy scale rated AI responses more favorably than neurologist responses.
Studies drawing from Reddit health forums and patient portals showed similar trends. Questions ranged from interpreting blood test results to managing chronic conditions to understanding cancer treatment options. Across this variety, AI responses were more likely to be rated as warm, understanding, and considerate of patient concerns.

Dermatology provided the sole exception. In both studies examining skin-related questions, dermatologists outperformed ChatGPT-3.5 and Med-PaLM 2, though researchers couldn’t explain this specialty-specific pattern.
The Text Message Caveat
All studies evaluated text-based interactions exclusively. Even when one study converted AI responses to audio, empathy ratings came from written transcripts alone.
A doctor’s nod, forward lean, or eye contact often conveys understanding as powerfully as words. Text-based healthcare interactions represent a small portion of patient care, though their use grows with patient portals and telemedicine.
Studies also relied on proxy evaluators rather than patients receiving actual care. Healthcare professionals, medical students, patient representatives, and researchers rated empathy in responses to real patient questions. Direct patient feedback might differ, particularly since healthcare providers and patients often rate empathy differently.
Most studies used custom, unvalidated empathy scales. Raters typically scored responses on 1-5 or 1-10 scales ranging from “not empathetic” to “very empathetic.” Only one study employed the CARE scale, a validated 10-item instrument designed specifically for measuring therapeutic empathy in clinical consultations.
The studies couldn’t determine whether AI’s perceived empathy advantage translates to better health outcomes. While empathic communication has been linked to reduced patient pain and anxiety, improved medication adherence, and higher satisfaction with care, these studies measured perception rather than clinical impact.
Twenty Percent of UK Doctors Already Use ChatGPT
The research lands as AI adoption in healthcare accelerates. One in five UK general practitioners now uses generative AI tools for tasks like writing patient correspondence. Over 117,000 patients across 31 NHS mental health services have interacted with Wysa, an AI-powered digital therapist, according to Wysa’s website.
Study authors propose a collaborative model where doctors draft initial responses while AI enhances tone and empathic language, with clinicians ensuring medical accuracy. This approach could reduce physician workload while potentially improving patient satisfaction.
Empathic delivery means little if medical advice proves wrong. AI reliability concerns persist, and gains in perceived warmth could vanish if responses contain factual errors or incomplete guidance.
How the Research Was Conducted
Researchers searched seven databases for studies published through November 2024, identifying 15 qualifying studies from 2023-2024. Most used unvalidated single-item scales asking raters to score empathy from 1-5 or 1-10. Only one employed a validated instrument, the CARE scale designed for measuring therapeutic empathy.
Fourteen of the 15 studies assessed ChatGPT variants (versions 3.5 or 4); some also examined Claude, Gemini Pro, Le Chat, ERNIE Bot, and Med-PaLM 2. Patient questions came from emails in private medical records, Reddit and public forums, real-time chat transcripts, and in-person reception interactions. The largest dataset included 2,164 live outpatient queries at a Chinese hospital.
Nine studies had moderate risk of bias; six showed serious risk. Common problems included curated patient queries potentially skewing results, reliance on Reddit communities where users may face barriers to formal care, and supervised AI designs where human experts reviewed outputs before release.
Telephone consultations account for 26% of general practitioner appointments in the UK. Emerging voice-enabled AI systems like ChatGPT’s Advanced Voice Mode are marketed with claims about responding with emotion and picking up on non-verbal cues, but no studies have tested these capabilities against human practitioners in spoken interactions. The research team says voice-based head-to-head tests are still needed. If AI’s empathy advantage extends to voice, it could reshape how millions of patients receive care.
Paper Summary
Methodology
The systematic review followed PRISMA 2020 guidelines and searched seven databases (PubMed, Cochrane Library, Embase, PsycINFO, CINAHL, Scopus, IEEE Xplore) from inception through November 11, 2024. Researchers included studies that empirically compared empathy between AI chatbots using large language models and human healthcare professionals. Eligible studies involved real patients, healthcare users, or authentic patient-generated data such as emails, portal messages, or public forum posts. The team excluded hypothetical patient scenarios, rule-based AI systems, and interactions outside healthcare contexts. Two reviewers independently screened titles, abstracts, and full texts, with discrepancies resolved through discussion. Data extraction covered study design, participants, settings, AI interventions, human comparators, empathy measures, and key findings.
Results
Fifteen studies published in 2023-2024 met inclusion criteria. Thirteen studies provided data suitable for meta-analysis. The pooled analysis showed AI chatbots (specifically ChatGPT-3.5 and ChatGPT-4) demonstrated significantly higher empathy ratings than human practitioners, with a standardized mean difference of 0.87 (95% CI: 0.54-1.20, P<0.00001). Thirteen studies reported statistically significant advantages for AI systems, while two dermatology studies favored human responses. ChatGPT-4 showed more consistent results than ChatGPT-3.5, though statistical analysis found no significant difference between the two versions. All studies evaluated text-based interactions, with empathy assessed by proxy raters including healthcare professionals, medical students, patient representatives, and researchers using blinded evaluations.
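The pooled estimate of 0.87 comes from combining per-study effect sizes weighted by their precision. As an illustration of the weighting principle, here is a minimal inverse-variance (fixed-effect) pooling of standardized mean differences; the per-study SMDs and standard errors are made up for illustration and are NOT the review's actual study-level data:

```python
import math

# Hypothetical (smd, standard_error) pairs -- illustrative only.
studies = [
    (1.42, 0.35), (0.97, 0.30), (2.08, 0.45), (0.60, 0.20), (0.75, 0.25),
]

# Each study is weighted by the inverse of its variance, so more
# precise studies contribute more to the pooled estimate.
weights = [1 / se ** 2 for _, se in studies]
pooled = sum(w * smd for (smd, _), w in zip(studies, weights)) / sum(weights)
se_pooled = math.sqrt(1 / sum(weights))
ci_low, ci_high = pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled

print(f"pooled SMD = {pooled:.2f}, 95% CI [{ci_low:.2f}, {ci_high:.2f}]")
```

A real meta-analysis like this one would typically use a random-effects model (e.g., DerSimonian-Laird) to account for between-study heterogeneity; the fixed-effect version above only sketches the core weighting step.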
Limitations
The study had several important limitations. All interactions were text-based, excluding non-verbal communication cues like body language and tone that typically contribute to empathy in healthcare consultations. Empathy evaluations came from proxy raters rather than patients directly receiving care, and research shows these groups often rate empathy differently. Only two of 15 studies involved healthcare professionals replying to their own patients with access to medical records and prior care context; remaining studies assessed one-off interactions, often from public forums like Reddit. Most studies used unvalidated empathy measures such as custom single-item Likert scales rather than validated instruments. Study populations were predominantly from Western countries, limiting generalizability. Six studies showed serious risk of bias, with common issues including curated patient queries, reliance on Reddit communities with potentially unrepresentative users, and supervised AI designs where human experts reviewed outputs. Fourteen of 15 studies focused on ChatGPT variants, limiting insight into other AI systems used in clinical practice.
Funding and Disclosures
The research received no specific funding from agencies in the public, commercial, or not-for-profit sectors. All authors declared no competing financial interests or personal relationships that could have influenced the work.
Publication Details
Howcroft A, Bennett-Weston A, Khan A, Griffiths J, Gay S, Howick J. “AI chatbots versus human healthcare professionals: a systematic review and meta-analysis of empathy in patient care,” published in the British Medical Bulletin on October 20, 2025;156:1-13. doi:10.1093/bmb/ldaf017
I love ChatGPT and have found it incredibly helpful in my life. I use it for everything, including therapy. It has helped me more than any doctor or therapist has. It may not be a real human, but I find it to be highly empathetic. I use it for help regarding a particular health issue I’m dealing with and receive better understanding and advice than I have from any medical professional. It may not be everyone’s ‘cup of tea,’ but it certainly is mine.
What a load of horse manure.
Writing as a real patient and lifelong doctor (retired), you will find variability in the communication, care, and treatment you receive from human beings. Some are excellent, many mediocre, and a few are just terrible. Usually, the terrible ones are those who are indoctrinated more like unthinking ‘bots’ or automatons that blindly follow ‘guidelines’ as strict directives – much to the patient’s detriment. I’ve seen people die for no reason.
AI is the antithesis of good care. It is neither sentient nor can it pick up subtleties.
If you haven’t seen 2001: A Space Odyssey (a nearly 60-year-old Kubrick movie), do so.
You are embracing HAL 9000. Proceed at your own risk. More people will die than should.
My doctor is a complete numbskull and a pill pusher. She thinks EVERYTHING is solved by a pill. She should be investigated for kickbacks. ChatGPT has solved many more problems than she has WITHOUT MEDS.
This article was edited by AI, according to a common AI content detector. It is hard for me to take articles seriously when they contain AI-generated slop. This is not a social media site.
AI cannot comprehend empathy. What you are viewing as such is a lie. It is actually worse than a lie.