

Benchmarking GPT-5, LLaMA, and Mistral for Clinical Named Entity Recognition in Ophthalmology Progress Notes

2026 · 0 citations · Translational Vision Science & Technology · Open Access

3 authors

Abstract

Purpose: To compare the performance of GPT-5, LLaMA 3.1-70B, and Mistral 7B in extracting clinically relevant entities from unstructured ophthalmology progress notes.

Methods: We evaluated GPT-5, quantized and full-precision LLaMA 3.1-70B, and Mistral 7B on a corpus of 480 publicly available, deidentified progress notes from patients with glaucoma at a single academic center. Each note was presented to all models with a standardized prompt. The outputs were manually annotated for six clinical entity types: ocular surgical history, follow-up dates, intraocular pressure (IOP), diagnostic tests and results, ocular conditions, and family history. Precision, recall, F1 score, and accuracy were calculated for each model. Micro- and macro-averaged evaluations were also performed, alongside stratification by note length and a qualitative error analysis.

Results: GPT-5 outperformed LLaMA 3.1-70B and Mistral 7B across all metrics, with its highest F1 score in surgical history extraction (0.929) and its lowest in diagnostic tests (0.820). Full-precision LLaMA 3.1-70B and Mistral 7B performed best on IOP extraction (F1 scores: 0.875 and 0.884, respectively) and worst on diagnostic tests (F1 scores: 0.338 and 0.642, respectively). Micro- and macro-averaged evaluations confirmed GPT-5's superior performance (F1: 0.941 micro, 0.936 macro) compared with full-precision LLaMA 3.1-70B (F1: 0.718 micro, 0.694 macro) and Mistral 7B (F1: 0.804 micro, 0.789 macro).

Conclusions: These findings highlight the promise of large language models for accurate clinical information extraction from unstructured ophthalmic text.

Translational Relevance: Large language models may help streamline information retrieval, enhance clinical decision-making, and generate structured datasets that can accelerate research and improve patient outcomes in ophthalmology.
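The abstract reports both micro- and macro-averaged F1. The sketch below illustrates the difference between the two. It assumes that each model's extractions have already been matched against the manual annotations and reduced to true-positive, false-positive, and false-negative counts per entity type. The matching criteria and exact averaging conventions used in the study are not stated here. The `Counts` class, the function names, and the example counts are hypothetical illustrations, not the study's data.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Hypothetical per-entity-type counts: true positives, false positives, false negatives."""
    tp: int
    fp: int
    fn: int


def prf(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1), treating each as 0.0 when its denominator is 0."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def micro_f1(counts: dict[str, Counts]) -> float:
    """Micro average: pool counts across all entity types, then compute a single F1."""
    tp = sum(c.tp for c in counts.values())
    fp = sum(c.fp for c in counts.values())
    fn = sum(c.fn for c in counts.values())
    return prf(tp, fp, fn)[2]


def macro_f1(counts: dict[str, Counts]) -> float:
    """Macro average: compute F1 per entity type, then take the unweighted mean."""
    scores = [prf(c.tp, c.fp, c.fn)[2] for c in counts.values()]
    return sum(scores) / len(scores)


# Hypothetical counts for two of the six entity types (illustration only).
example = {
    "iop": Counts(tp=90, fp=5, fn=5),            # frequent type, F1 = 90/95
    "family_history": Counts(tp=8, fp=2, fn=2),  # rare type, F1 = 0.8
}

print(f"micro F1: {micro_f1(example):.3f}")  # → micro F1: 0.933
print(f"macro F1: {macro_f1(example):.3f}")  # → macro F1: 0.874
```

Pooling counts lets frequent entity types dominate the micro average. The macro average weights every entity type equally, so a rare type such as family history affects it as much as a frequent one. This is one reason the two figures can differ for the same model.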

Topics

Biomedical Text Mining and Ontologies · Artificial Intelligence in Healthcare and Education · Topic Modeling