This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Evaluating Hallucination in Medical Prompt Responses: A Comparative Study of ChatGPT-4 and ChatGPT-4o

2024 · 2 citations



Abstract

This study evaluates the performance of ChatGPT-4 and its optimized version, ChatGPT-4o, in generating medical responses, aiming to determine which model produces more accurate and contextually relevant output in medical contexts. Using a range of metrics, including cosine similarity, BERTScore, BLEU, ROUGE, METEOR, Jaccard similarity, F1 scores, topic modeling, and linguistic-pattern and readability analyses, we comprehensively compared the effectiveness of the two models. ChatGPT-4 demonstrated better semantic alignment with reference answers, achieving a higher mean cosine similarity score, whereas ChatGPT-4o performed better on overall alignment and recall metrics. Jaccard similarity showed moderate vocabulary overlap for both models, reflecting effective use of medical terminology. F1-score analysis revealed perfect scores for “BMI” but identified deficiencies for terms such as “weight,” “obesity,” and “health.” Topic modeling indicated that both models addressed similar health-related themes but differed in specific term usage: unique terms such as “mmHg” were more prevalent in ChatGPT-4o, and “child” and “protein” in ChatGPT-4. Linguistic-pattern and readability analyses showed that ChatGPT-4 gave more readable and lexically diverse responses, making it more suitable for general users, whereas ChatGPT-4o gave more detailed but more complex responses, beneficial for specialized medical contexts. These findings emphasize the need for continuous refinement of AI models to improve their performance and usability in healthcare applications.
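The abstract lists cosine similarity and Jaccard similarity among its metrics but does not spell out how they are computed. As a rough illustration only, the sketch assumes simple lowercase word tokenization and raw term-frequency vectors; the study itself may have used different preprocessing or embedding-based vectors for cosine similarity. The helper names (`tokenize`, `jaccard_similarity`, `cosine_similarity`) and the sample sentences are hypothetical and are not taken from the paper.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into runs of letters/digits (simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def jaccard_similarity(response: str, reference: str) -> float:
    """Shared vocabulary size divided by combined vocabulary size (word sets, no counts)."""
    a, b = set(tokenize(response)), set(tokenize(reference))
    if not a and not b:
        # Convention chosen here: two empty texts count as identical.
        return 1.0
    return len(a & b) / len(a | b)


def cosine_similarity(response: str, reference: str) -> float:
    """Cosine of the angle between raw term-frequency vectors of the two texts."""
    a, b = Counter(tokenize(response)), Counter(tokenize(reference))
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    # Return 0.0 if either text has no tokens, to avoid dividing by zero.
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    # Hypothetical reference answer and model response, for illustration only.
    reference = "A BMI of 30 or higher indicates obesity."
    response = "Obesity is usually defined as a BMI of 30 or higher."
    print(f"Jaccard: {jaccard_similarity(response, reference):.3f}")
    print(f"Cosine:  {cosine_similarity(response, reference):.3f}")
```

Jaccard similarity treats each text as a set, so repeated words do not count, whereas cosine similarity over term frequencies does weight repetition.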


Institutions

Surya University (ID)

Topics

Machine Learning in Healthcare
Artificial Intelligence in Healthcare and Education
Functional Brain Connectivity Studies
