This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Accuracy, consistency, and contextual understanding of large language models in restorative dentistry and endodontics
Citations: 5
Authors: 4
Year: 2025
Abstract
OBJECTIVE: This study aimed to evaluate and compare the performance of several large language models (LLMs) in the context of restorative dentistry and endodontics, focusing on their accuracy, consistency, and contextual understanding.
METHODS: The dataset was extracted from the national educational archives of the Collège National des Enseignants en Odontologie Conservatrice (CNEOC) and includes all chapters from the reference manual for dental residency applicants. Multiple-choice questions (MCQs) were selected following a review by three independent academic experts. Four LLMs were assessed: ChatGPT-3.5, ChatGPT-4 (OpenAI), Claude-3 (Anthropic), and Mistral 7B (Mistral AI). Model accuracy was determined by comparing responses with expert-provided answers. Consistency was measured through robustness (the ability to provide identical responses to paraphrased questions) and repeatability (the ability to provide identical responses to the same question). Contextual understanding was evaluated based on the model's ability to categorise questions correctly and infer terms from definitions. Additionally, accuracy was reassessed after providing the LLMs with the relevant full course chapter.
RESULTS: A total of 517 MCQs and 539 definitions were included. ChatGPT-4 and Claude-3 demonstrated significantly higher accuracy and repeatability than Mistral 7B, with ChatGPT-4 showing the greater robustness. Advanced LLMs displayed high accuracy in presenting dental content, although performance varied on closely related concepts. Supplying course chapters generally improved response accuracy, though inconsistently across topics.
CONCLUSION: Even the most advanced LLMs, such as ChatGPT-4 and Claude-3, achieve moderate performance and require cautious use due to inconsistencies in robustness. Future studies should focus on integrating validated content and refining prompt engineering to enhance the educational and clinical utility of LLMs.
CLINICAL SIGNIFICANCE: The findings underscore the potential of advanced LLMs and context-based prompting in restorative dentistry and endodontics.
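The three quantitative measures named in the METHODS section can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the encoding of an MCQ response as a string of selected options, and the toy data are all assumptions made for illustration, and the abstract does not state the exact formulas used in the study.

```python
def accuracy(responses, answer_key):
    """Fraction of questions whose response equals the expert answer.

    Assumption: each response is a string of selected options, e.g. "BC".
    """
    correct = sum(responses[q] == answer_key[q] for q in answer_key)
    return correct / len(answer_key)


def agreement_rate(response_sets):
    """Fraction of questions for which all collected responses are identical.

    Used here for both consistency measures (an assumption):
    - repeatability: responses to the same question asked several times
    - robustness: responses to the original question and its paraphrases
    """
    identical = sum(len(set(rs)) == 1 for rs in response_sets.values())
    return identical / len(response_sets)


# Hypothetical toy data, not taken from the study.
answer_key = {"q1": "A", "q2": "BC", "q3": "D"}
first_run = {"q1": "A", "q2": "B", "q3": "D"}
repeated_runs = {"q1": ["A", "A", "A"], "q2": ["B", "BC", "B"], "q3": ["D", "D", "D"]}
paraphrase_runs = {"q1": ["A", "A"], "q2": ["B", "B"], "q3": ["D", "C"]}

acc = accuracy(first_run, answer_key)
repeatability = agreement_rate(repeated_runs)
robustness = agreement_rate(paraphrase_runs)
print(f"accuracy={acc:.2f} repeatability={repeatability:.2f} robustness={robustness:.2f}")
```

In this sketch a question counts as consistent only if every response matches exactly; a study could instead use partial agreement or a chance-corrected statistic, so the numbers are only meaningful under the stated assumptions.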
Authors
Institutions
- Université de Bordeaux (FR)
- Centre Hospitalier Universitaire de Bordeaux (FR)
- Bordeaux Population Health (FR)
- Inserm (FR)
- Economic & Social Sciences, Health Systems & Medical Informatics (FR)
- Institut de Recherche pour le Développement (FR)
- Université Claude Bernard Lyon 1 (FR)
- Centre National de la Recherche Scientifique (FR)
- Université de Lyon (FR)
- Institut National des Sciences Appliquées de Lyon (FR)