This is an overview page with metadata for this scientific article. The full article is available from the publisher.
Artificial intelligence for procedural coding in cardiac critical care: Evaluating large language models for current procedural terminology accuracy
Citations: 0
Authors: 7
Year: 2025
Abstract
Background: Accurate procedural coding is essential for resource allocation, billing integrity, and quality reporting in critical care. Current Procedural Terminology (CPT) coding is largely manual and error-prone, especially in high-acuity environments such as the cardiovascular surgical intensive care unit (CVSICU), where complex procedures like extracorporeal membrane oxygenation (ECMO) are common. Large language models (LLMs) may offer scalable solutions for automated coding, but their performance in the CVSICU has not been systematically evaluated.
Methods: Six publicly accessible LLMs (GPT-4, Claude 3.7 Sonnet, Perplexity, DeepSeek, Google Gemini 2.5 Pro, Mistral) were tested on CPT code assignment for 47 CVSICU procedures, including 7 ECMO-related interventions, from a single tertiary center (July 2023 to May 2025). Models received prompts in a standardized format and were evaluated on code accuracy. Statistical comparisons were conducted to assess inter-model performance differences for ECMO-related and non-ECMO procedures.
Results: For non-ECMO procedures, Gemini 2.5 Pro and Perplexity achieved the highest accuracy (88%), followed by DeepSeek (78%), Claude 3.7 Sonnet (75%), Mistral (68%), and GPT-4 (56%). For ECMO-related codes, Perplexity outperformed all other models (86%), followed by Gemini 2.5 Pro (71%), Mistral (43%), DeepSeek (29%), Claude 3.7 Sonnet (14%), and GPT-4 (0%). Pairwise comparisons revealed statistically significant inter-model differences.
Conclusions: While LLMs such as Perplexity and Gemini show promise for automated coding, their limited grasp of context-dependent nuances, particularly those of ECMO, remains a key barrier. Future work should focus on domain-specific fine-tuning to capture procedural context before such models are deployed in high-acuity clinical settings.
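The abstract does not specify which statistical test underlies the pairwise comparisons. For small counts such as the 7 ECMO-related procedures, Fisher's exact test is a common choice; the sketch below illustrates one such comparison under that assumption, using illustrative counts (6/7 vs 0/7 correct) that roughly match the reported 86% and 0% ECMO accuracies of Perplexity and GPT-4. The function name and figures are for illustration only and do not come from the paper.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for a 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed one.
    """
    r1, r2 = a + b, c + d          # row totals (correct + incorrect per model)
    c1 = a + c                     # column total of correct codes
    n = r1 + r2
    denom = comb(n, c1)
    # Compare integer numerators to avoid floating-point ties.
    obs = comb(r1, a) * comb(r2, c1 - a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    total = sum(
        comb(r1, k) * comb(r2, c1 - k)
        for k in range(lo, hi + 1)
        if comb(r1, k) * comb(r2, c1 - k) <= obs
    )
    return total / denom

# Illustrative ECMO comparison: model A correct on 6/7, model B on 0/7.
p = fisher_exact_two_sided(6, 1, 0, 7)
print(f"p = {p:.4f}")
```

With these illustrative counts the p-value falls below 0.05, consistent with the abstract's report of statistically significant inter-model differences even on only 7 ECMO procedures.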
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,490 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,376 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,832 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,553 citations