This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

Revolutionizing Educational Assessment: The Role of Bing Chat GPT-4 Chatbot in Increasing Efficiency in Grading Open-Ended Questions

2025 · 0 citations · Cureus · Open Access

Abstract

Background: Bing Chat (Microsoft Corporation, Redmond, WA) is an artificial intelligence (AI) program that can respond to typed text. Many educational institutions are exploring the incorporation of AI into their materials, given its rapid advancements and potential to streamline various processes. One area that still requires investigation is the ability of AI models to grade open-ended questions. The purpose of this study is to determine whether Bing Chat or university faculty exhibit greater consistency in grading such questions.

Methods: The authors recruited 21 medical students from the American University of the Caribbean (AUC) to answer five open-ended questions related to United States Medical Licensing Examination (USMLE) Step 1 topics. The volunteer participants were first-year and second-year medical students. Each student's responses to the five questions were graded by six different Bing Chat accounts and six faculty members. Differences in scores between Bing Chat and the faculty members were compared, and inter-rater reliability estimates were calculated.

Results: Both Bing Chat and faculty graded the same responses consistently; some variability appeared in both groups, but it was more pronounced in faculty grading. For the analysis, problem-solving (with elements of application), explanatory, and recall-type questions, Bing Chat's grading closely paralleled that of the faculty. However, for the combined recall-and-application question, a significant gap between Bing Chat and faculty scores was observed (p = 0.010). Overall, Bing Chat demonstrated higher inter-rater reliability than faculty, as evidenced by both percent agreement and Gwet's agreement coefficient 1 (AC1).

Conclusion: Bing Chat demonstrated promising results in evaluating written answers to open-ended questions and shows potential as a supportive grading tool. As educational leaders seek more dependable, faster, and economical methods for assessment, Bing Chat may offer a notable contribution to education. Large language models (LLMs), such as Bing Chat, can be beneficial to both students and educators.
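The abstract names two inter-rater reliability measures, percent agreement and Gwet's AC1, without giving the computation. The sketch below shows one common multi-rater formulation of both, assuming every rater scores every response and that scores are treated as unordered categories. The function names, the score scale, and the sample data are hypothetical and are not taken from the study; the authors' actual procedure (for example, any weighting for ordinal scores) may differ.

```python
from collections import Counter
from typing import Hashable, Sequence


def percent_agreement(ratings: Sequence[Sequence[Hashable]]) -> float:
    """Mean share of agreeing rater pairs per response.

    `ratings` has one row per response and one column per rater.
    """
    if not ratings:
        raise ValueError("ratings must contain at least one response")
    per_response = []
    for row in ratings:
        r = len(row)
        if r < 2:
            raise ValueError("each response needs at least two raters")
        counts = Counter(row)
        # Ordered pairs of raters that gave the same score to this response.
        agreeing_pairs = sum(c * (c - 1) for c in counts.values())
        per_response.append(agreeing_pairs / (r * (r - 1)))
    return sum(per_response) / len(per_response)


def gwet_ac1(ratings: Sequence[Sequence[Hashable]],
             categories: Sequence[Hashable]) -> float:
    """Gwet's AC1 for multiple raters and nominal categories.

    `categories` lists every score a rater could assign (at least two),
    including scores that never occur in `ratings`.
    """
    q = len(categories)
    if q < 2:
        raise ValueError("AC1 needs at least two possible categories")
    pa = percent_agreement(ratings)
    n = len(ratings)
    pe = 0.0
    for k in categories:
        # Average fraction of raters who assigned category k.
        pi_k = sum(list(row).count(k) / len(row) for row in ratings) / n
        pe += pi_k * (1 - pi_k)
    pe /= (q - 1)  # chance-agreement probability
    return (pa - pe) / (1 - pe)


if __name__ == "__main__":
    # Hypothetical data: 4 responses, each scored 0 or 1 by 3 raters.
    sample = [[1, 1, 1], [1, 1, 0], [0, 0, 0], [1, 0, 1]]
    print("percent agreement:", percent_agreement(sample))
    print("Gwet's AC1:", gwet_ac1(sample, categories=[0, 1]))
```

To compare two rater groups in this way, one would compute both statistics separately on each group's matrix of scores for the same responses and compare the resulting values.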

Institutions

American University of the Caribbean School of Medicine (SX)

Topics

Artificial Intelligence in Healthcare and Education; AI in Service Interactions; Social Media in Health Education
