This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Evaluating Hallucination in Medical Prompt Responses: A Comparative Study of ChatGPT-4 and ChatGPT-4o
Citations: 2
Authors: 6
Year: 2024
Abstract
This study evaluates the performance of ChatGPT-4 and its optimized version, ChatGPT-4o, in generating medical responses, aiming to determine which model offers more accurate and contextually relevant outputs in medical contexts. We compared the models using a range of metrics: cosine similarity, BERTScore, BLEU, ROUGE, METEOR, Jaccard similarity, F1-scores, topic modeling, and linguistic patterns with readability scores. ChatGPT-4 demonstrated better semantic alignment with reference answers, achieving a higher mean cosine similarity score, while ChatGPT-4o performed better on overall alignment and recall metrics. The Jaccard similarity showed moderate vocabulary overlap for both models, reflecting effective use of medical terminology. F1-score analysis revealed perfect scores for “BMI” but identified deficiencies for terms like “weight,” “obesity,” and “health.” Topic modeling indicated that both models addressed similar health-related themes but differed in specific term usage, with unique terms like “mmHg” more prevalent in ChatGPT-4o and “child” and “protein” in ChatGPT-4. Linguistic patterns and readability analysis showed that ChatGPT-4 provided more readable and lexically diverse responses, making it more suitable for general users, whereas ChatGPT-4o offered more detailed but complex responses, beneficial for specialized medical contexts. These findings emphasize the need for continuous refinement of AI models to enhance their performance and usability in healthcare applications.
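To make two of the metrics named in the abstract concrete, the following is a minimal sketch of Jaccard similarity and cosine similarity between a model response and a reference answer. It is an illustration under stated assumptions, not the study's pipeline: the abstract does not say how texts were tokenized or vectorized, so this sketch assumes simple lowercase word tokens and term-frequency vectors (the study may well have used embeddings for cosine similarity). The function names and the two example sentences are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def jaccard_similarity(a: str, b: str) -> float:
    """Shared vocabulary size divided by combined vocabulary size."""
    set_a, set_b = set(tokenize(a)), set(tokenize(b))
    if not set_a and not set_b:
        return 1.0  # convention chosen here: two empty texts count as identical
    return len(set_a & set_b) / len(set_a | set_b)


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between the two texts' term-frequency vectors."""
    counts_a, counts_b = Counter(tokenize(a)), Counter(tokenize(b))
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[t] * counts_b[t] for t in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)


# Made-up example texts, not taken from the study's data.
reference = "A BMI of 30 or higher indicates obesity in adults."
response = "In adults, obesity is defined as a BMI of 30 or higher."

print(f"Jaccard: {jaccard_similarity(reference, response):.3f}")
print(f"Cosine:  {cosine_similarity(reference, response):.3f}")
```

Jaccard similarity looks only at which words occur, so it captures vocabulary overlap, while the cosine score also weights how often each word occurs. Neither variant shown here captures meaning beyond surface wording, which is why embedding-based measures such as BERTScore are typically reported alongside them.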