OpenAlex · Updated hourly · Last updated: 27.05.2026, 18:28

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Beyond Accuracy: An Interpretability-Driven Audit of Deep Learning Models for Pneumonia Detection from Chest X-Rays

2026 · 0 citations · Open Access

Authors: 1

Abstract

Pneumonia remains one of the leading causes of morbidity and mortality among pediatric populations worldwide, with chest X-ray imaging serving as the primary diagnostic modality due to its accessibility and low cost. In recent years, deep learning-based approaches have demonstrated strong performance for automated pneumonia detection from chest radiographs. However, high predictive accuracy alone does not guarantee clinically trustworthy behavior, as neural networks may exploit spurious correlations or dataset-specific artifacts rather than learning true pathological features. In this work, we present a comprehensive deep learning framework for pediatric pneumonia screening from chest X-ray images with a strong emphasis on interpretability, robustness, and model auditing. A DenseNet-based convolutional neural network was trained using transfer learning and evaluated using clinically relevant metrics, including accuracy, F1 score, sensitivity, specificity, and the area under the receiver operating characteristic curve (AUROC). While the model achieved high AUROC and near-perfect sensitivity, extensive interpretability analysis using Grad-CAM++ and RISE revealed that high-confidence predictions were frequently driven by non-pathological cues, such as radiographic text markers, osseous structures, and central thoracic anatomy. To rigorously validate these observations, controlled causal experiments were conducted by injecting and removing radiographic markers in chest X-ray images. These interventions resulted in statistically significant changes in predicted pneumonia probability for normal images, providing direct evidence of shortcut learning, while having negligible impact on true pneumonia cases. Following this audit, a targeted mitigation strategy was implemented by systematically removing these artifacts from the training data. 
The retrained model exhibited improved specificity (0.49 → 0.86), more anatomically plausible saliency maps, and a better balance between performance and trustworthiness. Finally, external validation on the NIH ChestX-ray14 dataset demonstrated that the audited model retained discriminative capability (AUROC 0.65) in a zero-shot transfer scenario, comparable to supervised baselines given the domain shift. Overall, this study highlights the limitations of accuracy-centric evaluation in medical imaging and demonstrates how explainability and causal analysis can be leveraged to build more reliable and clinically trustworthy deep learning systems.
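
The marker-injection experiment described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' code: the bright square standing in for a radiographic text marker, the `toy_shortcut_model` function (a hypothetical stand-in for the trained DenseNet), the synthetic "normal" images, and the choice of a paired sign-flip permutation test, since the abstract does not say which statistical test was used.

```python
import numpy as np


def inject_marker(image, top=2, left=2, size=6, value=1.0):
    """Return a copy of `image` with a bright square patch (hypothetical marker)."""
    out = image.copy()
    out[top:top + size, left:left + size] = value
    return out


def paired_sign_flip_test(deltas, n_perm=10_000, seed=0):
    """Two-sided sign-flip permutation test of the hypothesis mean(deltas) == 0."""
    rng = np.random.default_rng(seed)
    deltas = np.asarray(deltas, dtype=float)
    observed = abs(deltas.mean())
    # Under the null, each paired difference is equally likely to be + or -.
    signs = rng.choice([-1.0, 1.0], size=(n_perm, deltas.size))
    null = np.abs((signs * deltas).mean(axis=1))
    # Add-one correction keeps the estimated p-value strictly positive.
    return (1 + np.sum(null >= observed)) / (n_perm + 1)


def marker_effect(predict_proba, images):
    """Mean change in predicted probability after injection, with its p-value."""
    base = np.array([predict_proba(img) for img in images])
    intervened = np.array([predict_proba(inject_marker(img)) for img in images])
    deltas = intervened - base
    return deltas.mean(), paired_sign_flip_test(deltas)


def toy_shortcut_model(image):
    """Hypothetical model that keys on top-left corner brightness (a shortcut)."""
    corner = image[:8, :8].mean()
    return 1.0 / (1.0 + np.exp(-(8.0 * corner - 4.0)))


# Synthetic low-intensity images standing in for normal chest X-rays.
rng = np.random.default_rng(42)
normal_images = [rng.uniform(0.0, 0.4, size=(32, 32)) for _ in range(40)]

mean_shift, p_value = marker_effect(toy_shortcut_model, normal_images)
print(f"mean shift = {mean_shift:.3f}, p = {p_value:.4f}")
```

In an actual audit, `predict_proba` would wrap the trained network, and the same procedure would be run separately on normal and pneumonia images. A large, significant shift on normal images alongside a negligible shift on pneumonia images would match the shortcut-learning pattern the abstract reports.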

Topics

COVID-19 diagnosis using AI · Pneumonia and Respiratory Infections · Artificial Intelligence in Healthcare and Education