This is an overview page with metadata for this scholarly work. The full article is available from the publisher.
A Comparative Study on the Consistency of GPT-Based AI Grading Using Human-Developed Assessment Criteria
Citations: 0
Authors: 3
Year: 2025
Abstract
Integrating artificial intelligence (AI) into educational assessment offers promising opportunities but also poses notable challenges for evaluating student performance. This study compares grading by a ChatGPT-based AI system with grading by humans, using structured rubrics as a common framework. Data were collected from two distinct assignments in a computer programming course, and both the AI and the human graders assessed 20 student submissions. The study applies three statistical methods: the intraclass correlation coefficient (ICC) to evaluate grading consistency, Bland-Altman analysis to measure agreement between AI and human grades, and paired t-tests to identify significant differences between them. Results indicate moderate to high grading consistency for the AI system. AI grades agreed with human grades overall, although discrepancies emerged for specific evaluation criteria. These findings offer insight into the capabilities and current limitations of AI-assisted grading in educational settings.
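The three statistics named in the abstract can be sketched in plain Python as follows. This is an illustrative sketch only, not the authors' analysis code: the abstract does not say which ICC form was used, so the sketch assumes the common Shrout and Fleiss single-rater forms ICC(2,1) (absolute agreement) and ICC(3,1) (consistency), the 1.96 multiplier for Bland-Altman limits of agreement (which presumes roughly normal differences), and a paired t statistic without a p-value. The function names `icc_two_way`, `bland_altman`, and `paired_t_test` are hypothetical, and the scores in the usage block are made-up toy numbers, not data from the study.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence, Tuple


def icc_two_way(scores: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Return (ICC(2,1), ICC(3,1)) for an n-submissions x k-graders matrix.

    ICC(2,1): two-way random effects, absolute agreement, single grader.
    ICC(3,1): two-way mixed effects, consistency, single grader.
    """
    n, k = len(scores), len(scores[0])
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a rectangular matrix with >= 2 rows and >= 2 columns")
    grand = mean(x for row in scores for x in row)
    row_means = [mean(row) for row in scores]
    col_means = [mean(row[j] for row in scores) for j in range(k)]
    # Two-way ANOVA decomposition: total = submissions + graders + residual.
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_err = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    icc2 = (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)
    icc3 = (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)
    return icc2, icc3


def bland_altman(ai: Sequence[float], human: Sequence[float]) -> Tuple[float, float, float]:
    """Return (bias, lower limit, upper limit) of the AI-minus-human differences."""
    diffs = [a - h for a, h in zip(ai, human)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def paired_t_test(ai: Sequence[float], human: Sequence[float]) -> Tuple[float, int]:
    """Return (t statistic, degrees of freedom) for paired AI/human grades."""
    diffs = [a - h for a, h in zip(ai, human)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    return mean(diffs) / (sd / sqrt(len(diffs))), len(diffs) - 1


if __name__ == "__main__":
    # Hypothetical rubric scores for five submissions (toy numbers, not study data).
    ai_scores = [8.0, 6.5, 9.0, 7.0, 5.5]
    human_scores = [7.5, 7.0, 9.0, 6.0, 5.0]
    print(icc_two_way([list(p) for p in zip(ai_scores, human_scores)]))
    print(bland_altman(ai_scores, human_scores))
    print(paired_t_test(ai_scores, human_scores))
```

As a sanity check, the ICC formulas give about 0.29 for ICC(2,1) and 0.71 for ICC(3,1) on the six-target, four-judge example in Shrout and Fleiss (1979). For a p-value, the t statistic would be compared against a t distribution with n - 1 degrees of freedom, which `scipy.stats.ttest_rel(ai, human)` does directly.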
Similar Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,460 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,341 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,791 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,536 citations