SAN DIEGO — ChatGPT can give higher-quality and more empathetic answers to patients' medical questions than real doctors, and a panel of healthcare professionals preferred its responses to physicians' answers, according to a new study. The research also suggests that AI assistants like ChatGPT could help reduce physicians' workload in the future.
A medical panel preferred the AI chatbot's responses to real doctors' answers in eight out of 10 instances when both answered the same medical questions. The panel found ChatGPT's responses 3.6 times as likely to be of good or very good quality and nearly 10 times as likely to be empathetic as doctors' responses. The study, conducted by researchers from the University of California San Diego School of Medicine, offers an early glimpse into the potential role of AI technology in the future of medicine.
To obtain a large and diverse sample of healthcare questions with responses from qualified physicians, the study team turned to the social media network Reddit. Members post medical questions every day on r/AskDocs, a subreddit with more than 450,000 members. Verified healthcare professionals answer these questions, and their credentials are displayed alongside their responses.
The research team sampled 195 exchanges from the r/AskDocs forum in which a verified doctor responded to a public question posted by a group member. They then posed the same question to ChatGPT, the increasingly popular AI chatbot developed by OpenAI, and asked it to write a response.
Three licensed healthcare professionals assessed each question alongside both answers, without knowing which had been written by a genuine doctor and which by ChatGPT. They rated each response for information quality and empathy and noted which one they preferred.
ChatGPT did a better job of addressing patients’ concerns
The results were surprising: panel members preferred ChatGPT’s responses over doctors’ 79 percent of the time. AI responses also received significantly higher quality ratings, with “good” or “very good” quality answers 3.6 times more likely to come from ChatGPT (physicians 22.1% vs. ChatGPT 78.5%). The AI responses were found to be more empathetic than those of human doctors, with “empathetic” or “very empathetic” responses nearly 10 times more likely to come from ChatGPT (physicians 4.6% vs. ChatGPT 45.1%).
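The "times more likely" figures are simply ratios of the two percentages. The short Python check below uses only the rounded percentages quoted above; the study itself may have computed its ratios from unrounded counts, so treat this as an arithmetic sanity check rather than a reproduction of the authors' analysis.

```python
# Percentages of responses reaching each rating, as reported for the study.
quality = {"physicians": 22.1, "chatgpt": 78.5}  # rated "good" or "very good"
empathy = {"physicians": 4.6, "chatgpt": 45.1}   # rated "empathetic" or "very empathetic"


def prevalence_ratio(rates: dict[str, float]) -> float:
    """How many times as often ChatGPT's responses reached the rating as physicians' did."""
    return rates["chatgpt"] / rates["physicians"]


quality_ratio = prevalence_ratio(quality)
empathy_ratio = prevalence_ratio(empathy)
print(f"quality: {quality_ratio:.1f}x, empathy: {empathy_ratio:.1f}x")
# prints "quality: 3.6x, empathy: 9.8x"
```

Dividing 78.5 by 22.1 gives about 3.55, which rounds to 3.6; dividing 45.1 by 4.6 gives about 9.8, the "nearly 10 times" figure.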
Dr. John Ayers, of the Qualcomm Institute within the University of California San Diego, says the study, published in JAMA Internal Medicine, has significant implications for the future of medicine.
Jessica Kelley, a nurse practitioner with San Diego firm Human Longevity and study co-author, adds that ChatGPT provided nuanced and accurate information, often addressing more aspects of patients’ questions than physician responses. Dr. Aaron Goodman, another co-author and associate clinical professor at UC San Diego School of Medicine, expressed his amazement, saying ChatGPT would “transform the way I support my patients.”
Doctors want to use the chatbot as a medical tool
However, the research team emphasizes that although ChatGPT could significantly help doctors by relieving their expanding workload, especially with online consultations, it would not replace them. Dr. Adam Poliak, an assistant professor of Computer Science at Bryn Mawr College and study co-author, explains that the ultimate solution is not to eliminate doctors but to have physicians harness ChatGPT for better and more empathetic care.
Dr. Christopher Longhurst, Chief Medical Officer and Chief Digital Officer at UC San Diego Health, suggests that ChatGPT could soon draft medical advice for doctors to review before it reaches patients.
“Our study is among the first to show how AI assistants can potentially solve real-world healthcare delivery problems. These results suggest that tools like ChatGPT can efficiently draft high quality, personalized medical advice for review by clinicians, and we are beginning that process at UCSD Health,” Dr. Longhurst says.
Co-author Dr. Mike Hogarth emphasized that integrating AI assistants into healthcare messaging should be evaluated in randomized controlled trials to measure the impact on both physicians and patients.
ChatGPT could lead the next wave of telehealth services
Doctors also pointed to the pandemic as a catalyst for a significant shift in how patients seek medical advice. Dr. Eric Leas, an assistant professor at the UC San Diego Herbert Wertheim School of Public Health and Human Longevity Science, argued that the COVID-19 pandemic accelerated virtual healthcare adoption. While this made accessing care easier for patients, physicians faced an overwhelming number of electronic patient messages seeking medical advice, contributing to record-breaking levels of physician burnout.
“We could use these technologies to train doctors in patient-centered communication, eliminate health disparities suffered by minority populations who often seek healthcare via messaging, build new medical safety systems, and assist doctors by delivering higher quality and more efficient care,” concludes Dr. Mark Dredze, the John C. Malone Associate Professor of Computer Science at Johns Hopkins and study co-author.
How does ChatGPT work?
According to ChatGPT itself, the program is a language model based on the GPT-4 architecture developed by OpenAI. It is designed to understand and generate human-like responses in a conversational context. The underlying technology, GPT-4, is an advanced iteration of the GPT series and improves upon its predecessors in terms of scale and performance. Here’s an overview of how ChatGPT works:
- Pre-training: ChatGPT is pre-trained on a large body of text data from diverse sources like books, articles, and websites. During this phase, the model learns the structure and patterns in human language, such as grammar, syntax, semantics, and even some factual information. However, it is essential to note that the knowledge acquired during pre-training is limited to the information available in the training data, which has a cutoff date.
- Fine-tuning: After the pre-training phase, ChatGPT is fine-tuned using a narrower dataset, typically containing conversations or dialogue samples. This dataset may be generated with the help of human reviewers following specific guidelines. The fine-tuning process helps the model learn to generate more contextually relevant and coherent responses in a conversational setting.
- Transformer architecture: ChatGPT is based on the transformer architecture, which allows it to efficiently process and generate text. It uses self-attention mechanisms to weigh the importance of words in a given context and to capture long-range dependencies in language. This architecture enables the model to understand and generate complex and contextually appropriate responses.
- Tokenization: When a user inputs text, ChatGPT first splits it into smaller units called tokens. Depending on the language and tokenization strategy, a token can be a single character, a whole word, or part of a word (a subword). The model processes all the input tokens in parallel, which lets it take the full context into account efficiently; the response itself, however, is generated one token at a time.
- Decoding: After processing the input tokens, the model outputs a probability distribution over its vocabulary for the next token. ChatGPT builds its response by repeatedly choosing a next token and appending it to the sequence. That choice can be made with greedy search (always taking the most likely token), beam search, or sampling strategies such as temperature or top-p (nucleus) sampling, which trade some predictability for more varied responses.
- Interactive conversation: ChatGPT maintains a conversation history to keep track of the context during a dialogue. This history is fed back into the model during each interaction, enabling it to generate contextually coherent responses.
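Pre-training teaches a GPT model to predict the next token from the tokens before it. The toy sketch below illustrates only that objective: it estimates next-word probabilities by counting word pairs in a tiny made-up corpus. This is a deliberate simplification, assuming whole words as tokens; real models learn these probabilities with a neural network trained on subword tokens from enormous datasets, not by counting.

```python
from collections import Counter, defaultdict


def train_bigram(corpus: list[str]) -> dict[str, dict[str, float]]:
    """Estimate P(next word | current word) by counting adjacent word pairs."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        # Every adjacent pair is one "predict the next word" training example.
        for current, nxt in zip(words, words[1:]):
            counts[current][nxt] += 1
    model: dict[str, dict[str, float]] = {}
    for current, nexts in counts.items():
        total = sum(nexts.values())
        model[current] = {word: c / total for word, c in nexts.items()}
    return model


# Hypothetical three-sentence training corpus.
corpus = [
    "the patient has a fever",
    "the patient has a cough",
    "the doctor has a question",
]
model = train_bigram(corpus)
print(model["has"])  # {'a': 1.0}
print(model["the"])  # "patient" about 0.67, "doctor" about 0.33
```

The learned table is the model's whole "knowledge", which mirrors the point above: a model can only reflect patterns present in its training data.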
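To make tokenization concrete, here is a minimal subword tokenizer sketch. It assumes a tiny hypothetical vocabulary and uses greedy longest-match splitting for simplicity; OpenAI's GPT models instead use byte-pair encoding with vocabularies of many thousands of learned pieces, so this is an illustration of the idea, not the real algorithm.

```python
# Hypothetical toy vocabulary mapping subword strings to integer token IDs.
VOCAB = {"doc": 0, "tor": 1, "s": 2, " ": 3, "em": 4, "path": 5, "y": 6, "ic": 7}


def tokenize(text: str, vocab: dict[str, int]) -> list[int]:
    """Split text into token IDs by repeatedly taking the longest vocabulary match."""
    ids: list[int] = []
    max_len = max(len(piece) for piece in vocab)
    i = 0
    while i < len(text):
        # Try the longest possible piece first, then shorter ones.
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in vocab:
                ids.append(vocab[piece])
                i += length
                break
        else:
            raise ValueError(f"no token covers {text[i]!r}")
    return ids


def detokenize(ids: list[int], vocab: dict[str, int]) -> str:
    """Map token IDs back to text by inverting the vocabulary."""
    inverse = {token_id: piece for piece, token_id in vocab.items()}
    return "".join(inverse[token_id] for token_id in ids)


ids = tokenize("doctors empathy", VOCAB)
print(ids)  # [0, 1, 2, 3, 4, 5, 6]
print(detokenize(ids, VOCAB))  # doctors empathy
```

Note that "doctors" becomes three tokens ("doc", "tor", "s"), which is why token counts and word counts usually differ.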
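Self-attention is the core of the transformer architecture described in the list. The sketch below implements single-head scaled dot-product self-attention in plain Python, with a causal mask so each token can only look at itself and earlier tokens, as in GPT-style models. As a simplifying assumption, it uses the input vectors directly as queries, keys, and values; a real transformer first multiplies them by learned weight matrices and runs many attention heads and layers.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: list[list[float]], causal: bool = True) -> list[list[float]]:
    """Single-head scaled dot-product self-attention with queries = keys = values = x."""
    d = len(x[0])
    out = []
    for i, query in enumerate(x):
        # Causal mask: token i may attend only to tokens 0..i.
        visible = x[: i + 1] if causal else x
        # Similarity of this token to each visible token, scaled by sqrt(d).
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in visible]
        weights = softmax(scores)
        # New representation: attention-weighted average of the visible vectors.
        out.append([sum(w * v[c] for w, v in zip(weights, visible)) for c in range(d)])
    return out


# Three hypothetical 2-dimensional token vectors.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
result = self_attention(tokens)
print(result[0])  # [1.0, 0.0]: the first token can only attend to itself
```

Each output row blends information from earlier tokens, weighted by how similar they are, which is how the model captures dependencies across a sentence.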
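The decoding step can be sketched as choosing one token from the model's scores. The example below contrasts greedy search with temperature plus top-p (nucleus) sampling over a hypothetical set of scores (logits) for four candidate next words. OpenAI has not published the exact decoding settings ChatGPT uses, so the temperature and top-p values here are arbitrary illustrations.

```python
import math
import random


def to_probs(logits: dict[str, float], temperature: float = 1.0) -> dict[str, float]:
    """Softmax with temperature: lower temperature sharpens the distribution."""
    scaled = {tok: score / temperature for tok, score in logits.items()}
    m = max(scaled.values())
    exps = {tok: math.exp(s - m) for tok, s in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def greedy(logits: dict[str, float]) -> str:
    """Greedy search: always pick the single most likely token."""
    return max(logits, key=logits.get)


def sample_top_p(logits: dict[str, float], temperature: float, top_p: float,
                 rng: random.Random) -> str:
    """Keep the smallest set of most likely tokens whose probabilities reach top_p, then sample."""
    probs = to_probs(logits, temperature)
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break
    tokens, weights = zip(*kept)
    return rng.choices(tokens, weights=weights, k=1)[0]


# Hypothetical scores for the word after "You should get some ..."
logits = {"rest": 2.0, "fluids": 1.5, "antibiotics": 0.2, "banana": -3.0}
print(greedy(logits))  # rest
print(sample_top_p(logits, temperature=0.8, top_p=0.9, rng=random.Random(0)))
```

With these settings only "rest" and "fluids" survive the top-p cut, so sampling varies the wording while never picking an implausible word such as "banana".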
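Conversation history works by re-sending earlier messages with every new request. The sketch below is a hypothetical wrapper, loosely modeled on the role/content message format of OpenAI's chat API; the class and its `max_words` budget are illustrative assumptions. Real models measure their context window in tokens rather than words, and the call that actually sends the messages to the model is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Conversation:
    """Hypothetical chat history that is replayed to the model on every turn."""
    max_words: int = 50  # stand-in for the model's context-window limit
    messages: list[dict[str, str]] = field(default_factory=list)

    def add(self, role: str, content: str) -> None:
        self.messages.append({"role": role, "content": content})

    def context(self) -> list[dict[str, str]]:
        """Return the most recent messages that fit the budget; the oldest are dropped first."""
        kept: list[dict[str, str]] = []
        used = 0
        for msg in reversed(self.messages):
            cost = len(msg["content"].split())
            if used + cost > self.max_words:
                break
            kept.append(msg)
            used += cost
        return list(reversed(kept))


conv = Conversation(max_words=8)
conv.add("user", "I have had a headache for three days")
conv.add("assistant", "Any fever or vision changes?")
conv.add("user", "No fever")
for msg in conv.context():
    print(msg["role"], ":", msg["content"])
# assistant : Any fever or vision changes?
# user : No fever
```

The example also shows a practical limitation: once a conversation outgrows the context window, the earliest details (here, the three-day headache) fall out of what the model can see.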
ChatGPT itself acknowledges its limitations, such as generating incorrect or nonsensical answers, being sensitive to how a question is phrased, being excessively verbose, and not asking clarifying questions when a query is ambiguous. OpenAI adds that it continually works on improving these aspects and refining the model to make it more effective and safer for the public to use.
South West News Service writer James Gamble contributed to this report.