This is an overview page with metadata for this scholarly article. The full article is available from the publisher.
A comparison of the psychometric properties of GPT-4 versus human novice and expert authors of clinically complex MCQs in a mock examination of Australian medical students
1 citation · 6 authors · 2025
Abstract
PURPOSE: Creating clinically complex Multiple Choice Questions (MCQs) for medical assessment can be time-consuming. Large language models such as GPT-4, a type of generative artificial intelligence (AI), are a potential MCQ design tool. Evaluating the psychometric properties of AI-generated MCQs is essential to ensuring quality. METHODS: A 120-item mock examination was constructed, containing 40 human-generated MCQs at novice item-writer level, 40 at expert level, and 40 AI-generated MCQs. All examination items underwent panel review to ensure they tested higher-order cognitive skills and met a minimum acceptable standard. The online mock examination was administered to Australian medical students, who were blinded to each item's author. RESULTS: = 0.382). CONCLUSIONS: The psychometric properties of AI-generated MCQs are comparable to those of human-generated MCQs at both novice and expert level. Item quality can be improved across all author groups. AI-generated items should undergo human review to enhance distractor efficiency.
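The abstract does not include the study's analysis code. As context for what "psychometric properties" of MCQ items typically means, below is a minimal Python sketch of two standard classical-test-theory statistics: the item difficulty index (proportion correct) and the corrected point-biserial discrimination. It assumes a binary response matrix; the function and variable names are illustrative and not taken from the paper.

import numpy as np

def item_statistics(responses):
    """Classical test theory item statistics for a 0/1 response matrix.

    responses: (n_examinees, n_items) array, 1 = correct, 0 = incorrect.
    Returns per-item difficulty (proportion correct) and corrected
    point-biserial discrimination (correlation between the item score
    and the rest-of-test score, excluding that item).
    """
    responses = np.asarray(responses, dtype=float)
    n_items = responses.shape[1]
    difficulty = responses.mean(axis=0)
    discrimination = np.empty(n_items)
    for j in range(n_items):
        # Rest-of-test score: total score minus the item under analysis,
        # so the item is not correlated with itself.
        rest_score = responses.sum(axis=1) - responses[:, j]
        discrimination[j] = np.corrcoef(responses[:, j], rest_score)[0, 1]
    return difficulty, discrimination

# Hypothetical example: 5 examinees, 3 items
data = [[1, 0, 1],
        [1, 1, 1],
        [0, 0, 1],
        [1, 1, 0],
        [0, 0, 1]]
p, r_pb = item_statistics(data)
print("difficulty:", p)        # proportion of examinees answering each item correctly
print("discrimination:", r_pb)

In an item analysis of this kind, a very high or very low difficulty index, or a discrimination near zero, flags an item (or a non-functioning distractor) for the kind of human review the abstract recommends.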
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,611 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,504 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8,025 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,835 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations