This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

Improving Medical Assessment Validity with Generative AI: Lessons from Human–AI Interaction in Bloom’s Taxonomy Classification

2026 · 0 citations · Medical Science Educator · Open Access


Abstract

While Bloom's Taxonomy is essential for developing higher-order clinical reasoning, its practical application by faculty remains limited. This perpetuates a reliance on assessments that favor memorization over clinical reasoning. This study aimed to develop and validate a Generative AI tool to assist educators in accurately classifying assessment items, and to evaluate the tool's impact on medical faculty's classification accuracy. A two-phase experimental design was used. Phase 1 involved training and validating a ChatGPT-4 model using 200 medical exam items classified by a panel that established a “gold standard”. The model’s performance was then tested on an independent set of 100 items and assessed using overall accuracy, Cohen’s kappa (κ), and the Matthews Correlation Coefficient (MCC). Phase 2 involved experienced medical professors using the validated AI tool. In Phase 1, the AI model achieved a high overall accuracy of 95.0% (95% CI: 90.0–99.0%) and “very good” inter-rater agreement with the expert standard (κ = 0.85). In Phase 2, faculty demonstrated high overall adherence (75.2%) to the AI’s recommendations. In cases of disagreement where faculty chose to override the AI, the faculty’s final decision was accurate only 29.4% of the time, demonstrating human overconfidence. Generative AI, despite a specific and predictable flaw in differentiating “Analyze” from “Understand,” serves as a powerful partner for medical faculty. The tool provides a more reliable classification than unaided, non-expert human judgment. It can significantly improve assessment validity by supporting faculty, helping to bridge pedagogical training gaps, and promoting the development of assessments that target higher-order cognitive skills essential for medical practice.
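
The three agreement measures named in the abstract (overall accuracy, Cohen's kappa, and MCC) can be computed directly from two parallel lists of labels. The following is a minimal sketch, not the authors' evaluation code: the function name `agreement_metrics` and the six example labels are hypothetical and do not come from the study. MCC is computed here in its multiclass form, which is an assumption, since the abstract does not say which variant was used.

```python
from collections import Counter


def agreement_metrics(gold, pred):
    """Return (accuracy, Cohen's kappa, multiclass MCC) for two label lists.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and equal in length")

    n = len(gold)
    correct = sum(g == p for g, p in zip(gold, pred))
    gold_counts = Counter(gold)
    pred_counts = Counter(pred)
    labels = set(gold_counts) | set(pred_counts)

    # Observed agreement: share of items where both raters chose the same level.
    accuracy = correct / n

    # Cohen's kappa: agreement corrected for the agreement expected by chance.
    joint = sum(gold_counts[label] * pred_counts[label] for label in labels)
    p_e = joint / n**2
    kappa = (accuracy - p_e) / (1 - p_e) if p_e != 1 else float("nan")

    # Multiclass MCC, built from the same marginal counts.
    cov = correct * n - joint
    denom_pred = n**2 - sum(c**2 for c in pred_counts.values())
    denom_gold = n**2 - sum(c**2 for c in gold_counts.values())
    denom = (denom_pred * denom_gold) ** 0.5
    mcc = cov / denom if denom else 0.0

    return accuracy, kappa, mcc


# Illustrative labels only (not study data): one "Analyze" item is
# classified as "Understand", the confusion the abstract describes.
gold = ["Remember", "Understand", "Apply", "Analyze", "Understand", "Analyze"]
pred = ["Remember", "Understand", "Apply", "Understand", "Understand", "Analyze"]
accuracy, kappa, mcc = agreement_metrics(gold, pred)
print(f"accuracy={accuracy:.3f} kappa={kappa:.3f} mcc={mcc:.3f}")
```

Kappa and MCC come out lower than raw accuracy in this toy example because both discount the agreement that the label frequencies alone would produce, which is why they are commonly reported alongside accuracy for classification tasks with uneven class sizes.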

Topics

Clinical Reasoning and Diagnostic Skills
Artificial Intelligence in Healthcare and Education
Intelligent Tutoring Systems and Adaptive Learning
