This is an overview page with metadata for this scholarly work. The full article is available from the publisher.
AI-First, Expert-Verified: Validating Generative AI for HFACS-Based Coding of Healthcare RCA Transcripts with Governance Considerations
Citations: 0
Authors: 9
Year: 2026
Abstract
BACKGROUND: Root Cause Analysis (RCA) is widely used in healthcare incident investigation, but its outputs can be limited by inconsistent causal framing and variable integration of human factors. The Human Factors Analysis and Classification System (HFACS) offers a structured taxonomy for causal attribution, yet manual coding is resource intensive. Empirical validation of generative AI for document-level HFACS coding from complete healthcare RCA transcripts remains limited.

METHODS: We conducted a cross-sectional validation study at an 829-bed medical center in Taiwan. Thirty-five de-identified RCA interview transcripts (2024-2025) with verbatim transcription were analyzed using SKH-AI, an in-house platform integrating an Azure OpenAI-hosted GPT-4o model with deterministic decoding (temperature = 0; top_p = 1.0). The model processed each transcript holistically to identify salient narrative segments and assign HFACS codes with evidence-linked rationales, without rule-based post-processing. Outputs were compared with dual-expert HFACS coding with adjudicated consensus. Performance was assessed using precision, recall, Micro-/Macro-F1, and Cohen's κ with bootstrapped 95% confidence intervals.

RESULTS: Across 562 AI-derived segments, Micro-F1 was 0.66 (95% CI: 0.63-0.69) and Macro-F1 was 0.68 (95% CI: 0.64-0.72), with moderate agreement versus expert coding (κ = 0.56, 95% CI: 0.52-0.60). Performance was higher for text-anchored categories (Level 2 Preconditions F1 = 0.70; Level 1 Unsafe Acts F1 = 0.69) than for more abstract domains (Level 3 F1 = 0.66; Level 4 F1 = 0.65). Subcategory analyses showed stronger detection of concrete cues (e.g., decision errors, physical environment, communication) and weaker performance for latent constructs (e.g., process management). Bias analyses indicated a recall-leaning tendency at Levels 3-4, consistent with increased over-attribution risk.

CONCLUSIONS: Generative AI can produce auditable, evidence-linked candidate HFACS attributions from document-level RCA transcripts with moderate concordance to expert coding. Higher-level supervisory and organizational attributions remain vulnerable to overgeneralization and should be governed as decision support, with evidence anchoring and mandatory expert sign-off for Level 3-4 codes.
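The methods specify deterministic decoding (temperature = 0; top_p = 1.0) on an Azure OpenAI-hosted GPT-4o model. A minimal Python sketch of such a request is shown below; the endpoint, API version, deployment name, and prompt text are hypothetical placeholders, as the paper's actual SKH-AI prompts and pipeline are not reproduced on this page.

```python
# Minimal sketch of a deterministic GPT-4o call via Azure OpenAI.
# Endpoint, API version, deployment name, and prompt are placeholders
# (assumptions), not the study's actual configuration.
from openai import AzureOpenAI

client = AzureOpenAI(
    api_key="<your-api-key>",                           # placeholder
    api_version="2024-06-01",                           # assumed version
    azure_endpoint="https://example.openai.azure.com",  # placeholder
)

response = client.chat.completions.create(
    model="gpt-4o",    # Azure deployment name (assumption)
    temperature=0,     # deterministic decoding, as reported in the study
    top_p=1.0,
    messages=[
        {"role": "system",
         "content": "Identify salient segments of this RCA transcript, "
                    "assign HFACS codes, and cite supporting evidence."},
        {"role": "user", "content": "<de-identified RCA transcript text>"},
    ],
)
print(response.choices[0].message.content)
```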
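The evaluation relies on Micro-/Macro-F1, Cohen's κ, and bootstrapped 95% confidence intervals computed over segments. The following sketch illustrates that metric setup with scikit-learn and a percentile bootstrap; the label arrays are randomly generated stand-ins, since the study's data are not published here.

```python
# Sketch of the reported metrics: Micro-/Macro-F1, Cohen's kappa, and
# percentile-bootstrap 95% CIs over 562 segments. Labels are synthetic
# stand-ins, not the study's data.
import numpy as np
from sklearn.metrics import f1_score, cohen_kappa_score

rng = np.random.default_rng(0)
expert_codes = rng.integers(0, 4, size=562)  # stand-in adjudicated expert codes
ai_codes = rng.integers(0, 4, size=562)      # stand-in AI-assigned HFACS levels

def bootstrap_ci(metric, y_true, y_pred, n_boot=2000, alpha=0.05):
    """Point estimate plus percentile-bootstrap CI, resampling segments."""
    n = len(y_true)
    stats = [metric(y_true[idx], y_pred[idx])
             for idx in (rng.integers(0, n, size=n) for _ in range(n_boot))]
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return metric(y_true, y_pred), lo, hi

micro = bootstrap_ci(lambda t, p: f1_score(t, p, average="micro"),
                     expert_codes, ai_codes)
macro = bootstrap_ci(lambda t, p: f1_score(t, p, average="macro"),
                     expert_codes, ai_codes)
kappa = bootstrap_ci(cohen_kappa_score, expert_codes, ai_codes)

print(f"Micro-F1 {micro[0]:.2f} (95% CI {micro[1]:.2f}-{micro[2]:.2f})")
print(f"Macro-F1 {macro[0]:.2f} (95% CI {macro[1]:.2f}-{macro[2]:.2f})")
print(f"kappa    {kappa[0]:.2f} (95% CI {kappa[1]:.2f}-{kappa[2]:.2f})")
```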
Similar Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,561 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,452 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,948 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,797 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations