This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
61 COMPASS: Comprehensive multimodal performance assessment for safe vascular AI systems using Curie8, an LLM-based medical imaging data pipeline
Citations: 0
Authors: 9
Year: 2026
Abstract
Objectives/Goals: To identify and characterize real-world AI error points in vascular imaging, and then to develop and use curie8, a large language model (LLM)-based data pipeline, to curate a broad, multimodal dataset for COMPASS, an AI stress-testing framework that defines safety boundaries for clinical deployment.

Methods/Study Population: We analyzed a pilot dataset of CT pulmonary angiography (CTPA) studies processed by a commercial AI tool for pulmonary embolism (PE) detection. AI errors were categorized using a radiologist-defined clinical and technical taxonomy to identify factors that affect AI performance. In parallel, curie8 was developed as an institutional LLM pipeline that queries the radiology report text database and automatically links the corresponding DICOM images and metadata. Together, the pilot dataset and taxonomy will inform the design of a full curie8-curated, real-world CTPA stress-test dataset spanning the health system's nine imaging sites (Figure 1).

Figure and tables: 2025-10-20 ACTS Fig and Table.pptx [https://somumaryland-my.sharepoint.com/:p:/g/personal/fdoo_som_umaryland_edu/EXPxiwh3cxhHpnWYLglMxK4BaByWvFivJq5p9ho5Csm4GA?e=ZPCrAC]

Results/Anticipated Results: Pilot analyses (n=5,923) showed high overall performance of the commercial AI PE detection tool (sensitivity 89.6%, specificity 98.9%, positive predictive value (PPV) 88.6%, negative predictive value (NPV) 99.0%, accuracy 98.1%). AI errors accounted for only a small fraction of studies but had characteristic features, such as false negatives on small emboli and artifact-driven false positives, influenced by patient and scan characteristics (Table 1) and technical factors (Table 2). These findings will guide the curie8 pipeline in building a full multimodal vascular imaging dataset for benchmarking AI models under real-world stressors. Expected deliverables include a harmonized stress-test framework for evaluating the safety of vascular imaging AI.
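The five performance figures reported above are standard ratios over a confusion matrix of AI output versus the reference standard. As a minimal sketch, assuming study-level true/false positive and negative counts are available, they can be computed as below. The names `ConfusionCounts` and `diagnostic_metrics` are hypothetical and are not part of curie8 or the commercial tool, and the example counts are invented for illustration; they do not reproduce the pilot results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Study-level counts from comparing AI output with the reference standard."""
    tp: int  # AI positive, PE present
    fp: int  # AI positive, PE absent
    tn: int  # AI negative, PE absent
    fn: int  # AI negative, PE present


def diagnostic_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Return sensitivity, specificity, PPV, NPV and accuracy as fractions."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "sensitivity": c.tp / (c.tp + c.fn),  # share of PE-positive studies the AI flags
        "specificity": c.tn / (c.tn + c.fp),  # share of PE-negative studies the AI clears
        "ppv": c.tp / (c.tp + c.fp),          # share of AI alerts that are true PE
        "npv": c.tn / (c.tn + c.fn),          # share of AI negatives that are truly negative
        "accuracy": (c.tp + c.tn) / total,    # share of all studies classified correctly
    }


# Hypothetical counts for illustration only; NOT the pilot study's data.
example = ConfusionCounts(tp=90, fp=10, tn=880, fn=20)
for name, value in diagnostic_metrics(example).items():
    print(f"{name}: {value:.1%}")
```

With these invented counts the sketch prints sensitivity 81.8%, specificity 98.9%, PPV 90.0%, NPV 97.8% and accuracy 97.0%. Unlike sensitivity and specificity, PPV and NPV also depend on how common PE is in the cohort, which is one reason stress-test datasets drawn from different sites can yield different predictive values for the same tool.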
Discussion/Significance of Impact: COMPASS combines institutional LLM-driven data curation (curie8) with structured AI error analysis to build standardized stress-testing resources. This framework advances reproducible, regulatory-aligned evaluation of AI safety and reliability in clinical imaging settings.
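At its simplest, structured AI error analysis means labelling each reviewed AI error with a taxonomy category and counting categories separately for false negatives and false positives. The Python sketch below assumes each radiologist-reviewed error is stored as a record with a single category label. `ErrorRecord`, `tally_errors` and the category names (`small_embolus`, `motion_artifact`, `contrast_timing`) are hypothetical placeholders, not the study's actual taxonomy, which is summarized in Tables 1 and 2.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Literal

ErrorType = Literal["false_negative", "false_positive"]


@dataclass(frozen=True)
class ErrorRecord:
    """One AI error on a CTPA study, labelled during radiologist review."""
    study_id: str         # hypothetical de-identified study key
    error_type: ErrorType
    category: str         # hypothetical taxonomy label, e.g. "small_embolus"


def tally_errors(records: list[ErrorRecord]) -> dict[ErrorType, Counter]:
    """Count taxonomy categories separately for false negatives and false positives."""
    tallies: dict[ErrorType, Counter] = {
        "false_negative": Counter(),
        "false_positive": Counter(),
    }
    for r in records:
        tallies[r.error_type][r.category] += 1
    return tallies


# Invented records for illustration only.
records = [
    ErrorRecord("s001", "false_negative", "small_embolus"),
    ErrorRecord("s002", "false_positive", "motion_artifact"),
    ErrorRecord("s003", "false_negative", "small_embolus"),
    ErrorRecord("s004", "false_positive", "contrast_timing"),
]
for error_type, counts in tally_errors(records).items():
    print(error_type, counts.most_common())
```

Keeping false negatives and false positives in separate tallies mirrors the pattern the abstract reports (small emboli driving misses, artifacts driving false alarms), so that each failure mode can later be targeted with its own stress-test subset.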
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,773 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,682 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8,242 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,898 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations