This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Benchmarking GPT-5, LLaMA, and Mistral for Clinical Named Entity Recognition in Ophthalmology Progress Notes
Citations: 0
Authors: 3
Year: 2026
Abstract
Purpose: To compare the performance of GPT-5, LLaMA 3.1-70B, and Mistral 7B in extracting clinically relevant entities from unstructured ophthalmology progress notes.

Methods: We evaluated GPT-5, quantized and full-precision LLaMA 3.1-70B, and Mistral 7B on a corpus of 480 publicly available deidentified progress notes from patients with glaucoma at a single academic center. Each note was presented to all models using a standardized prompt. The outputs were manually annotated for six clinical entity types: ocular surgical history, follow-up dates, intraocular pressure (IOP), diagnostic tests and results, ocular conditions, and family history. Precision, recall, F1 score, and accuracy were calculated for each model. Micro- and macro-averaged evaluations were also performed, alongside stratification by note length and qualitative error analysis.

Results: GPT-5 outperformed LLaMA 3.1-70B and Mistral 7B across all metrics, with its highest F1 score in surgical history extraction (0.929) and its lowest in diagnostic tests (0.820). Full-precision LLaMA 3.1-70B and Mistral performed best on IOP extraction (F1 score: 0.875 and 0.884, respectively) and worst on diagnostic tests (F1 score: 0.338 and 0.642, respectively). Micro- and macro-averaged evaluations confirmed GPT-5's superior performance (F1: 0.941 micro, 0.936 macro) compared with full-precision LLaMA (F1: 0.718 micro, 0.694 macro) and Mistral (F1: 0.804 micro, 0.789 macro).

Conclusions: These findings highlight the promise of large language models for accurate clinical information extraction from unstructured ophthalmic text.

Translational Relevance: Large language models may help streamline information retrieval, enhance clinical decision-making, and generate structured datasets that can accelerate research and improve patient outcomes in ophthalmology.
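The Methods report precision, recall, and F1 per entity type, plus micro- and macro-averaged scores. The sketch below shows how the two averages differ. It assumes each model's output has already been matched against the manual annotation to give true-positive, false-positive, and false-negative counts per entity type; the abstract does not describe those matching rules. The `Counts` class, the function names, and the toy counts are hypothetical and illustrative only. They are not the study's code or data. Macro-F1 is also sometimes defined as the F1 of macro-averaged precision and recall. This sketch instead uses the unweighted mean of the per-type F1 scores.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Hypothetical per-entity-type match counts (illustrative only)."""
    tp: int  # extracted entities that match the manual annotation
    fp: int  # extracted entities absent from the annotation
    fn: int  # annotated entities the model missed


def prf(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision, recall, and F1 from raw counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def micro_f1(per_entity: dict[str, Counts]) -> float:
    """Pool counts across all entity types, then compute F1 once."""
    tp = sum(c.tp for c in per_entity.values())
    fp = sum(c.fp for c in per_entity.values())
    fn = sum(c.fn for c in per_entity.values())
    return prf(tp, fp, fn)[2]


def macro_f1(per_entity: dict[str, Counts]) -> float:
    """Compute F1 per entity type, then take the unweighted mean."""
    scores = [prf(c.tp, c.fp, c.fn)[2] for c in per_entity.values()]
    return sum(scores) / len(scores)


# Toy counts for two of the six entity types (NOT the study's data).
counts = {
    "intraocular pressure": Counts(tp=90, fp=5, fn=5),
    "family history": Counts(tp=8, fp=2, fn=6),
}

print(f"micro F1: {micro_f1(counts):.3f}")  # → micro F1: 0.916
print(f"macro F1: {macro_f1(counts):.3f}")  # → macro F1: 0.807
```

Micro-averaging weights frequent entity types more heavily, so a rare type that scores poorly (here the toy family-history row) pulls the macro average down more than the micro average.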
Similar works
Research electronic data capture (REDCap)—A metadata-driven methodology and workflow process for providing translational research informatics support
2008 · 51,056 citations
Gene Ontology: tool for the unification of biology
2000 · 44,406 citations
STRING v11: protein–protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets
2018 · 19,057 citations
Haploview: analysis and visualization of LD and haplotype maps
2004 · 14,715 citations
A translation approach to portable ontology specifications
1993 · 12,507 citations