This is an overview page with metadata for this scholarly work. The full article is available from the publisher.
AI-First, Expert-Verified: Validating Generative AI for HFACS-Based Coding of Healthcare RCA Transcripts with Governance Considerations
Citations: 0
Authors: 9
Year: 2026
Abstract
BACKGROUND: Root Cause Analysis (RCA) is widely used in healthcare incident investigation, but its outputs can be limited by inconsistent causal framing and variable integration of human factors. The Human Factors Analysis and Classification System (HFACS) offers a structured taxonomy for causal attribution, yet manual coding is resource intensive. Empirical validation of generative AI for document-level HFACS coding from complete healthcare RCA transcripts remains limited.

METHODS: We conducted a cross-sectional validation study at an 829-bed medical center in Taiwan. Thirty-five de-identified RCA interview transcripts (2024-2025) with verbatim transcription were analyzed using SKH-AI, an in-house platform integrating an Azure OpenAI-hosted GPT-4o model with deterministic decoding (temperature = 0; top_p = 1.0). The model processed each transcript holistically to identify salient narrative segments and assign HFACS codes with evidence-linked rationales, without rule-based post-processing. Outputs were compared with dual-expert HFACS coding with adjudicated consensus. Performance was assessed using precision, recall, Micro-/Macro-F1, and Cohen's κ with bootstrapped 95% confidence intervals.

RESULTS: Across 562 AI-derived segments, Micro-F1 was 0.66 (95% CI: 0.63-0.69) and Macro-F1 was 0.68 (95% CI: 0.64-0.72), with moderate agreement versus expert coding (κ = 0.56, 95% CI: 0.52-0.60). Performance was higher for text-anchored categories (Level 2 Preconditions F1 = 0.70; Level 1 Unsafe Acts F1 = 0.69) than for more abstract domains (Level 3 F1 = 0.66; Level 4 F1 = 0.65). Subcategory analyses showed stronger detection of concrete cues (e.g., decision errors, physical environment, communication) and weaker performance for latent constructs (e.g., process management). Bias analyses indicated a recall-leaning tendency at Levels 3-4, consistent with increased over-attribution risk.

CONCLUSIONS: Generative AI can produce auditable, evidence-linked candidate HFACS attributions from document-level RCA transcripts with moderate concordance to expert coding. Higher-level supervisory and organizational attributions remain vulnerable to overgeneralization and should be governed as decision support, with evidence anchoring and mandatory expert sign-off for Level 3-4 codes.
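The methods specify deterministic decoding (temperature = 0; top_p = 1.0) on an Azure OpenAI-hosted GPT-4o model. A minimal Python sketch of such a request is shown below; the endpoint, API version, deployment name, and prompt text are hypothetical placeholders, as the paper's actual SKH-AI prompts and pipeline are not reproduced on this page.

```python
# Minimal sketch of a deterministic GPT-4o call via Azure OpenAI.
# Endpoint, API version, deployment name, and prompt are placeholders
# (assumptions), not the study's actual configuration.
from openai import AzureOpenAI

client = AzureOpenAI(
    api_key="<your-api-key>",                           # placeholder
    api_version="2024-06-01",                           # assumed version
    azure_endpoint="https://example.openai.azure.com",  # placeholder
)

response = client.chat.completions.create(
    model="gpt-4o",    # Azure deployment name (assumption)
    temperature=0,     # deterministic decoding, as reported in the study
    top_p=1.0,
    messages=[
        {"role": "system",
         "content": "Identify salient segments of this RCA transcript, "
                    "assign HFACS codes, and cite supporting evidence."},
        {"role": "user", "content": "<de-identified RCA transcript text>"},
    ],
)
print(response.choices[0].message.content)
```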
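The evaluation relies on Micro-/Macro-F1, Cohen's κ, and bootstrapped 95% confidence intervals computed over segments. The following sketch illustrates that metric setup with scikit-learn and a percentile bootstrap; the label arrays are randomly generated stand-ins, since the study's data are not published here.

```python
# Sketch of the reported metrics: Micro-/Macro-F1, Cohen's kappa, and
# percentile-bootstrap 95% CIs over 562 segments. Labels are synthetic
# stand-ins, not the study's data.
import numpy as np
from sklearn.metrics import f1_score, cohen_kappa_score

rng = np.random.default_rng(0)
expert_codes = rng.integers(0, 4, size=562)  # stand-in adjudicated expert codes
ai_codes = rng.integers(0, 4, size=562)      # stand-in AI-assigned HFACS levels

def bootstrap_ci(metric, y_true, y_pred, n_boot=2000, alpha=0.05):
    """Point estimate plus percentile-bootstrap CI, resampling segments."""
    n = len(y_true)
    stats = [metric(y_true[idx], y_pred[idx])
             for idx in (rng.integers(0, n, size=n) for _ in range(n_boot))]
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return metric(y_true, y_pred), lo, hi

micro = bootstrap_ci(lambda t, p: f1_score(t, p, average="micro"),
                     expert_codes, ai_codes)
macro = bootstrap_ci(lambda t, p: f1_score(t, p, average="macro"),
                     expert_codes, ai_codes)
kappa = bootstrap_ci(cohen_kappa_score, expert_codes, ai_codes)

print(f"Micro-F1 {micro[0]:.2f} (95% CI {micro[1]:.2f}-{micro[2]:.2f})")
print(f"Macro-F1 {macro[0]:.2f} (95% CI {macro[1]:.2f}-{macro[2]:.2f})")
print(f"kappa    {kappa[0]:.2f} (95% CI {kappa[1]:.2f}-{kappa[2]:.2f})")
```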
Similar Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,561 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,452 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,948 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,797 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations