This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

Are There Best Ways to Respond to a Patient With an AI-Generated Answer You May Disagree With?

2025 · 0 citations · The Laryngoscope · Open Access

Abstract

With the increasing use of generative artificial intelligence by patients, discrepancies can arise during the patient–physician clinic visit. This paper offers practical guidelines for responding during a clinical encounter in which the physician disagrees with an AI-generated medical answer brought by a patient.

Generative artificial intelligence (AI) and large language models (LLMs) have revolutionized public access to information about medical illnesses and treatments. For patients, LLMs can summarize information from hundreds of online resources and generate friendly, human-like text responses regarding their medical condition. Thus, the use of AI by patients is becoming increasingly popular. A quandary now exists because many different LLMs are readily available to give patients medical opinions on how to diagnose and manage medical conditions (common examples include ChatGPT, Claude, Copilot, DeepSeek, Gemini, and Grok). Answers generated by AI can sometimes be assertive in nature and thus can skew a patient's opinion, resulting in a discrepancy with the physician's evaluation, diagnosis, or recommended management plan. This can be daunting for patients and challenging for physicians to respond to in clinic. In this Triological Best Practices article, we attempt to address practical ways one could respond to such a clinical encounter.

In a narrative JAMA article, Sundar shares his experience with a patient who arrives in his clinic with intermittent dizziness, describes her symptoms using medical terminology, and asks for tilt table testing [1]. After some questioning, he discovers that the patient had used ChatGPT to query her symptoms. Not only was tilt table testing not routinely available in his clinic, but it was also not particularly necessary at this stage in her diagnosis. He emphasizes that in the age of AI he has been tasked with “explaining concepts like overdiagnosis, false-positives, or other risks of unnecessary testing.” The article states that some patients feel that the knowledge gained from generative AI allows them to “advocate for [themselves] better.” In addition, he notes that the friendly tone of LLM-generated answers allows patients to feel their concerns are heard. Ultimately, the call to action was to approach the patient with empathy first and knowledge second, because only after developing rapport can one establish oneself as a trusted authority.

Liu et al. reviewed papers that demonstrated the utility of ChatGPT in clinical practice, including clinical decision support, question answering and medical queries, and medical documentation [2]. Overall, ChatGPT was able to answer medical questions across different specialty areas (retinal disease, obstetrics and gynecology, hepatic disease, and cancer) with greater than 70% accuracy. One limitation contributing to the inaccuracies was that the performance of the LLM was only as strong as its training data. The authors emphasize that answers generated by ChatGPT must be independently verified by physicians. Additionally, ChatGPT cannot consider the whole patient, including their specific medical history and physical exam findings.

A research study by Xu et al. examined the performance of ChatGPT within the field of otolaryngology [3]. They tested ChatGPT-4o on a set of 50 questions about thyroid cancer across multiple domains: assessment and diagnosis, treatment strategies, postoperative care, psychological support, and rehabilitation and prognosis. The authors asked each of the 50 questions either with no prior prompt or with a prompt that asked ChatGPT to answer at a 6th grade reading level, to answer at an 8th grade reading level, or to use statistics and references to support the answer. The “statistics and references” prompt improved the quality of the responses as graded by expert reviewers. This finding suggests that the way in which the user interacts with ChatGPT changes the content of the generated answer. Since patients will continue to use LLMs, physicians may find it useful to teach their patients that the most accurate results come from confirming AI-generated answers with their physician.

Patients may also be using LLMs to understand whether they are a candidate for surgery and may come to the office requesting a surgery that may not be indicated or may be too risky. Langlie et al. asked ChatGPT-3.5 for a description (“how do I know if I need [procedure]”), treatment alternatives, risks, surgical procedure, and recovery process for five common otolaryngology procedures (adenotonsillectomy, tympanoplasty, endoscopic sinus surgery, parotidectomy, and total laryngectomy) and qualitatively graded these answers [4]. Their results suggested that the basic description and treatment alternatives were accurate, but specific risks, surgical steps, and important aspects of recovery were not always included in the ChatGPT answers. Given this information, a physician who is faced with this situation should give a realistic account of the decision to perform surgery and explain risks that ChatGPT may have missed so that the patient is better informed.

A study by Armbruster et al. surveyed patients and physicians on ChatGPT 4.0-generated answers versus responses from a web-based platform's expert panel (EP) [5]. Responses were graded for 20 questions in each of 5 specialties, one of which was otolaryngology. A key finding was that both patients and physicians rated the ChatGPT answers as higher in empathy and usefulness than the responses from the EP. Additionally, physicians classified some ChatGPT answers as potentially harmful due to “overtreatment or overdiagnosis, undertreatment or underdiagnosis, or insufficient patient education.” The data also show that physicians gave the ChatGPT answers containing potentially harmful advice lower empathy, usefulness, and correctness scores. However, patients did not rate the same ChatGPT answers as inferior. Thus, it is important to remember that patients do not necessarily have the medical knowledge and clinical experience to critically evaluate an LLM-generated answer. As a result, the patient can be more vulnerable to misinterpreting ChatGPT's friendly answer and may not recognize potentially harmful advice.

When a patient arrives at the clinic with an AI-generated answer, several responses are important to convey to the patient. As AI continues to evolve to educate the public about medical ailments, physicians are uniquely qualified to show the authenticity and value of the true patient–physician relationship by comprehensively evaluating and treating the patient as a whole human being (Table 1).

The studies included four level 3 studies and one level 4 study.

The authors have nothing to report. The authors declare no conflicts of interest. Data sharing is not applicable; no new data were generated, as this is a review of publicly available papers.

Topics

Artificial Intelligence in Healthcare and Education · AI in Service Interactions · Clinical Reasoning and Diagnostic Skills
