This is an overview page with metadata for this scientific work. The full article is available from the publisher.
Assessment of ChatGPT-4o’s Answers to Common Questions on Thyroid Fine-Needle Aspiration Biopsy
Citations: 0
Authors: 2
Year: 2026
Abstract
This study set out to assess the quality of ChatGPT-4o’s (Chat Generative Pre-trained Transformer, version 4o) replies to common patient questions concerning thyroid fine-needle aspiration biopsy (FNAB). A cross-sectional design was employed, in which patient-focused questions were gathered using the search phrase “frequently asked questions about thyroid biopsy” on Google. Following the removal of duplicates and overlapping items, 20 unique questions were chosen. Each question was submitted to ChatGPT-4o in a new session. The generated responses were then evaluated by 12 radiologists, all blinded to the source of the answers. Ratings were given on a 5-point Likert scale across four categories: relevance, accuracy, clarity, and completeness. Descriptive analyses were performed, and interrater reliability was calculated using the intraclass correlation coefficient (ICC). All 20 questions received scores between 3 and 5 in every category. The overall mean score was 4.72±0.12. Relevance achieved the best performance with a mean of 4.95±0.06, while clarity was the lowest at 4.61±0.23. The reliability analysis showed weak agreement among evaluators, with ICC values of –0.028 (p=0.863) for relevance, 0.061 (p=0.005) for accuracy, 0.072 (p=0.002) for clarity, 0.031 (p=0.016) for completeness, and 0.061 (p=0.002) for the overall score. In conclusion, ChatGPT-4o produced highly relevant, accurate, and generally comprehensive responses to patient inquiries regarding thyroid FNAB. Nonetheless, the limited interrater reliability underscores variability in expert judgment, especially in clarity and completeness. Although ChatGPT-4o holds promise as a supportive tool for patient education, its outputs should be reviewed and tailored by healthcare professionals prior to use in clinical practice.
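The abstract reports interrater reliability via the intraclass correlation coefficient but does not state which ICC form was used. As a minimal sketch, the following assumes a two-way random-effects model with absolute agreement for a single rater, i.e. ICC(2,1), computed from a subjects-by-raters matrix of Likert scores; the toy data below are illustrative, not from the study:

```python
def icc2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: list of rows, one per subject (e.g. question), with one
    column per rater. Assumes a complete matrix (no missing scores).
    """
    n = len(ratings)      # number of subjects (e.g. 20 questions)
    k = len(ratings[0])   # number of raters (e.g. 12 radiologists)
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(ratings[i][j] for i in range(n)) / n for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_total = sum((ratings[i][j] - grand) ** 2
                   for i in range(n) for j in range(k))
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)                # mean square, subjects
    msc = ss_cols / (k - 1)                # mean square, raters
    mse = ss_err / ((n - 1) * (k - 1))     # residual mean square

    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Perfect agreement among three hypothetical raters yields ICC = 1.0.
print(icc2_1([[5, 5, 5], [4, 4, 4], [3, 3, 3], [2, 2, 2]]))
```

ICC values near zero or negative, as reported in the abstract, indicate that rater disagreement is as large as (or larger than) the variance between questions.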
Related Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,490 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,376 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,832 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,553 citations