This is an overview page with metadata for this scholarly article. The full article is available from the publisher.
Improving Medical Assessment Validity with Generative AI: Lessons from Human–AI Interaction in Bloom’s Taxonomy Classification
Citations: 0 · Authors: 7 · Year: 2026
Abstract
While Bloom’s Taxonomy is essential for developing higher-order clinical reasoning, faculty rarely apply it in practice. As a result, assessments continue to favor memorization over clinical reasoning. This study aimed to develop and validate a Generative AI tool that helps educators classify assessment items accurately, and to evaluate the tool’s impact on medical faculty’s classification accuracy. A two-phase experimental design was used. In Phase 1, a ChatGPT-4 model was trained and validated on 200 medical exam items classified by an expert panel, which established a “gold standard”. The model was then tested on an independent set of 100 items, and its performance was assessed using overall accuracy, Cohen’s kappa, and the Matthews Correlation Coefficient (MCC). In Phase 2, experienced medical professors used the validated AI tool. In Phase 1, the AI model achieved an overall accuracy of 95.0% (95% CI: 90.0–99.0%) and “very good” agreement with the expert standard (κ = 0.85). In Phase 2, faculty adhered to the AI’s recommendations in 75.2% of cases. When faculty disagreed with the AI and overrode it, their final decision was accurate only 29.4% of the time, demonstrating human overconfidence. Generative AI has a specific and predictable weakness in distinguishing “Analyze” from “Understand”. Even so, it serves as a powerful partner for medical faculty. The tool classifies items more reliably than unaided, non-expert human judgment. By supporting faculty, it can substantially improve assessment validity, help bridge gaps in pedagogical training, and promote assessments that target the higher-order cognitive skills essential for medical practice.
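The abstract names its evaluation metrics (overall accuracy, Cohen’s kappa, MCC, a 95% CI, and in Phase 2 the faculty adherence rate and the accuracy of faculty overrides) but does not give formulas or tooling. The sketch below shows one plausible way to compute them in plain Python. Several parts are assumptions and do not come from the paper. All function names are hypothetical. The multiclass MCC uses Gorodkin’s generalization, which is the form scikit-learn’s `matthews_corrcoef` computes. The percentile bootstrap is only one choice of CI method, because the abstract does not say which one was used. The example labels are invented toy data.

```python
from collections import Counter
import math
import random

def accuracy(gold, pred):
    """Share of items whose predicted Bloom level matches the gold-standard level."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

def cohen_kappa(gold, pred):
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(gold)
    p_o = accuracy(gold, pred)
    t, p = Counter(gold), Counter(pred)
    # Chance agreement: product of the two raters' marginal proportions per label.
    p_e = sum(t[k] * p[k] for k in t) / (n * n)
    if p_e == 1.0:
        return 1.0 if p_o == 1.0 else 0.0  # degenerate case: a single label everywhere
    return (p_o - p_e) / (1 - p_e)

def multiclass_mcc(gold, pred):
    """Multiclass MCC (Gorodkin's R_K), computed from per-label counts."""
    n = len(gold)
    c = sum(g == p for g, p in zip(gold, pred))  # correctly classified items
    t, p = Counter(gold), Counter(pred)          # true / predicted counts per label
    labels = set(t) | set(p)
    cross = sum(p[k] * t[k] for k in labels)
    denom = math.sqrt((n * n - sum(p[k] ** 2 for k in labels))
                      * (n * n - sum(t[k] ** 2 for k in labels)))
    return 0.0 if denom == 0 else (c * n - cross) / denom

def bootstrap_accuracy_ci(gold, pred, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI for accuracy (assumed method, not stated in the paper)."""
    rng = random.Random(seed)
    n = len(gold)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample items with replacement
        stats.append(sum(gold[i] == pred[i] for i in idx) / n)
    stats.sort()
    return stats[int(alpha / 2 * n_boot)], stats[int((1 - alpha / 2) * n_boot) - 1]

def error_pairs(gold, pred):
    """Count (gold, predicted) pairs for misclassified items, e.g. Analyze -> Understand."""
    return Counter((g, p) for g, p in zip(gold, pred) if g != p)

def phase2_stats(ai, faculty, gold):
    """Adherence: share of items where faculty kept the AI label.
    Override accuracy: share of overridden items where the faculty label matches gold."""
    kept = sum(a == f for a, f in zip(ai, faculty))
    overrides = [(f, g) for a, f, g in zip(ai, faculty, gold) if a != f]
    override_acc = (sum(f == g for f, g in overrides) / len(overrides)
                    if overrides else float("nan"))
    return kept / len(ai), override_acc

if __name__ == "__main__":
    # Hypothetical toy labels, not data from the study.
    gold = ["Remember", "Understand", "Analyze", "Apply", "Analyze"]
    pred = ["Remember", "Understand", "Understand", "Apply", "Analyze"]
    print(accuracy(gold, pred), cohen_kappa(gold, pred), multiclass_mcc(gold, pred))
    print(bootstrap_accuracy_ci(gold, pred), error_pairs(gold, pred))
```

On this toy data, the one error is an “Analyze” item labeled “Understand”, which is the kind of confusion the abstract reports. Counting error pairs gives a quick way to surface such systematic mistakes alongside the summary scores.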
Similar works
The Strengths and Difficulties Questionnaire: A Research Note
1997 · 14,699 citations
Making sense of Cronbach's alpha
2011 · 14,061 citations
QUADAS-2: A Revised Tool for the Quality Assessment of Diagnostic Accuracy Studies
2011 · 13,802 citations
A method for estimating the probability of adverse drug reactions
1981 · 11,544 citations
Clarifying Confusion: The Confusion Assessment Method
1990 · 5,253 citations