OpenAlex · Updated hourly · Last updated: 26 May 2026, 09:28

This is an overview page with metadata about this scholarly work. The full article is available from the publisher.

The Role of Prompt Engineering in AI Essay Scoring: A Comparative Analysis of ChatGPT's Scoring Stability Across Varying Prompt Designs

2026 · 0 citations · 3 authors

Abstract

The rapid adoption of large language models (LLMs), including ChatGPT, in educational contexts has renewed interest in their potential use for automated essay scoring (AES). While prior studies report moderate to strong agreement between ChatGPT and human raters, the role of prompt engineering in shaping scoring reliability and validity remains insufficiently examined. This study investigates how different prompt designs influence the consistency and human alignment of ChatGPT-based essay scoring. A stratified sample of 100 learner essays was drawn from the ICNALE corpus, and each essay was evaluated under four prompt conditions with increasing levels of instructional structure. Scoring outcomes were analyzed using descriptive statistics, intraclass correlation coefficients (ICC), repeated-measures ANOVA, and error-based metrics. The results reveal statistically significant differences across prompt conditions, with rubric-aligned prompts yielding substantially lower score variability and the highest agreement with human ratings (average-measure ICC > 0.92). Distributional analyses further demonstrate that structured prompts effectively constrain stochastic scoring behavior. These findings provide empirical evidence that prompt design is a critical methodological factor in LLM-based AES and that reliability limitations commonly attributed to ChatGPT can be mitigated through principled prompt engineering. The study offers practical guidance for the responsible deployment of AI-assisted assessment systems and contributes to the growing literature on prompt-sensitive behavior in educational applications of LLMs.
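The abstract reports an average-measure ICC but does not say which ICC form was used. As a hedged illustration only, the sketch below assumes the two-way random-effects, absolute-agreement, average-measures form (ICC(2,k) in Shrout and Fleiss's notation) and adds mean absolute error as one example of an error-based metric. The function names and the data layout (one row per essay, one column per rater or scoring run) are hypothetical and are not taken from the study.

```python
from statistics import mean


def icc_average_measures(ratings):
    """Assumed form: ICC(2,k), two-way random effects, absolute agreement,
    average of k raters.

    ratings: list of rows, one per essay; each row holds k scores
    (hypothetical layout, e.g. [human_score, chatgpt_score]).
    """
    n = len(ratings)
    k = len(ratings[0])
    if n < 2 or k < 2:
        raise ValueError("need at least 2 essays and 2 raters")

    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]                       # per essay
    col_means = [mean(row[j] for row in ratings) for j in range(k)]  # per rater

    # Two-way ANOVA sums of squares (no replication).
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    # ICC(2,k) = (MSR - MSE) / (MSR + (MSC - MSE) / n)
    return (ms_rows - ms_error) / (ms_rows + (ms_cols - ms_error) / n)


def mean_absolute_error(scores_a, scores_b):
    """Average absolute difference between two paired score lists."""
    return mean(abs(a - b) for a, b in zip(scores_a, scores_b))
```

For example, three essays scored by a human rater and by a model that is always exactly one point higher give `icc_average_measures([[1, 2], [2, 3], [3, 4]])` ≈ 0.8 and an MAE of 1: the absolute-agreement form penalizes the systematic offset even though the rank order is identical. A consistency-type ICC would ignore that offset, which is one reason the reported value cannot be reproduced without knowing the form the authors used.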


Topics

Artificial Intelligence in Healthcare and Education · Topic Modeling · Ethics and Social Impacts of AI