This is an overview page with metadata for this scholarly work. The full article is available from the publisher.
Comparison of a generative large language model to pharmacy student performance on therapeutics examinations
Citations: 3 · Authors: 3 · Year: 2025
Abstract
OBJECTIVE: To compare the performance of a generative language model (ChatGPT-3.5) with that of pharmacy students on therapeutics examinations.

METHODS: Questions were drawn from two pharmacotherapeutics courses in a 4-year PharmD program. They were classified as case based or non-case based and as application or recall. Each question was entered into ChatGPT version 3.5, and the responses were scored. ChatGPT's score for each exam was calculated as the number of correct responses divided by the total number of questions. ChatGPT's mean composite score was calculated by summing its individual exam scores and dividing by the number of exams. The students' mean composite score was calculated by summing the mean class performance on each exam and dividing by the number of exams. A chi-square test was used to identify factors associated with incorrect responses from ChatGPT.

RESULTS: Across 6 exams, ChatGPT's mean composite score was 53 (SD = 19.2), compared with 82 (SD = 4) for the pharmacy students (p = 0.0048). ChatGPT answered 51% of the questions correctly. It was less likely to answer application-based questions correctly than recall-based questions (44% vs 80%) and less likely to answer case-based questions correctly than non-case-based questions (45% vs 74%).

CONCLUSION: ChatGPT scored below the pharmacy students' average grade and was less likely to answer application-based and case-based questions correctly. These findings offer insight into how this technology performs, which can help inform best practices for item development and highlight the technology's limitations.
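The scoring and comparison steps described in the Methods can be sketched in Python as follows. This is a minimal illustration, not the authors' analysis code: the function names are hypothetical, the chi-square test is assumed to be a Pearson test on a 2x2 table without continuity correction (the abstract does not say which variant was used), and the numbers in the usage example are made-up toy inputs, not study data.

```python
import math


def exam_score(correct: int, total: int) -> float:
    """Per-exam score (%): correct responses divided by total questions."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * correct / total


def composite_score(exam_scores: list[float]) -> float:
    """Mean composite score: sum of per-exam scores divided by number of exams."""
    return sum(exam_scores) / len(exam_scores)


def pooled_score(results: list[tuple[int, int]]) -> float:
    """Share of all questions answered correctly (%), pooled across exams."""
    correct = sum(c for c, _ in results)
    total = sum(t for _, t in results)
    return 100.0 * correct / total


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square test (no continuity correction) for a 2x2 table.

    Assumed layout (hypothetical):
                       correct  incorrect
        question type 1    a        b
        question type 2    c        d

    Returns (statistic, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("every row and column total must be non-zero")
    stat = n * (a * d - b * c) ** 2 / denom
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


if __name__ == "__main__":
    # Toy inputs (NOT study data): (correct, total) for two exams of unequal length.
    results = [(5, 10), (30, 40)]
    scores = [exam_score(c, t) for c, t in results]
    print(scores)                    # [50.0, 75.0]
    print(composite_score(scores))   # 62.5
    print(pooled_score(results))     # 70.0
    print(chi_square_2x2(30, 10, 10, 30))
```

The toy example illustrates why a mean composite score and the pooled share of correct answers need not coincide: averaging per-exam percentages weights every exam equally regardless of how many questions it contains. That may be one reason the abstract reports both a composite of 53 and 51% of questions answered correctly, although the abstract does not give per-exam question counts.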
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,697 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,602 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8,127 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,872 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations