This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Enhancing Oncological Surveillance Through Large Language Model-Assisted Analysis: A Comparative Study of GPT-4 and Gemini in Evaluating Oncological Issues From Serial Abdominal CT Scan Reports
Citations: 12
Authors: 9
Year: 2024
Abstract
RATIONALE AND OBJECTIVES: We aimed to compare the capabilities of two leading large language models (LLMs), GPT-4 and Gemini, in analyzing serial radiology reports to highlight oncological issues that require further clinical attention.

MATERIALS AND METHODS: This study included 205 patients, each with two consecutive radiological reports. We designed a prompt comprising a three-step task to analyze report findings using LLMs. To establish a ground truth, two radiologists reached a consensus on a six-level categorization comprising tumor findings (categorized as improved, stable, or aggravated), "benign," "no tumor description," and "other malignancy." The performance of GPT-4 and Gemini was then compared based on their ability to match corresponding findings between the two radiological reports and to assign these categories accurately.

RESULTS: The proportion of correctly matched findings between serial reports was significantly higher for GPT-4 (96.2%) than for Gemini (91.7%) (P < 0.01). For identifying tumor-related findings, GPT-4 and Gemini achieved precision of 0.68 and 0.63 (P = 0.006), recall of 0.91 and 0.80 (P < 0.001), and F1-scores of 0.78 and 0.70, respectively. GPT-4 was more accurate than Gemini in determining the correct tumor status for tumor-related findings (P < 0.001).

CONCLUSION: This study demonstrated the potential of LLM-assisted analysis of serial radiology reports, using a carefully engineered prompt, to enhance oncological surveillance. GPT-4 outperformed Gemini in matching corresponding findings, identifying tumor-related findings, and accurately determining tumor status.
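
The reported F1-scores can be checked against the reported precision and recall, since F1 is their harmonic mean. The sketch below is only an illustration, not the authors' evaluation code: it assumes the standard F1 definition and uses the rounded values from the abstract as inputs, and the names `f1_score`, `reported`, and `f1_by_model` are hypothetical.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (standard F1)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both inputs are 0
    return 2 * precision * recall / (precision + recall)


# Rounded (precision, recall) pairs as stated in the abstract.
reported = {
    "GPT-4": (0.68, 0.91),
    "Gemini": (0.63, 0.80),
}

# Recompute F1 per model and round to two decimals for comparison.
f1_by_model = {
    model: round(f1_score(p, r), 2) for model, (p, r) in reported.items()
}

for model, f1 in f1_by_model.items():
    print(f"{model}: F1 = {f1:.2f}")
```

Under these assumptions the recomputed values round to 0.78 for GPT-4 and 0.70 for Gemini, matching the F1-scores given in the abstract.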