OpenAlex · Updated hourly · Last updated: 22 May 2026, 03:41

This is an overview page with metadata about this scholarly work. The full article is available from the publisher.

Automatic Short Answer Grading in the LLM Era: Does GPT-4 with Prompt Engineering beat Traditional Models?

2025 · 24 citations · 8 authors · Open Access

Abstract

Assessing short answers in educational settings is challenging due to the need for scalability and accuracy, which led to the field of Automatic Short Answer Grading (ASAG). Traditional machine learning models, such as ensemble and embedding-based models, have been widely researched in ASAG, but they often suffer from generalizability issues. Recently, Large Language Models (LLMs) have emerged as an alternative for optimizing ASAG systems. However, previous research has not presented a comprehensive analysis of LLM performance under prompt engineering strategies, nor compared LLM capabilities to those of traditional models. This study presents a comparative analysis between traditional machine learning models and GPT-4 in the context of ASAG. We investigated the effectiveness of different models and text representation techniques and explored prompt engineering strategies for LLMs. The results indicate that traditional machine learning models outperform LLMs. However, GPT-4 showed promising capabilities, especially when configured with optimized prompt components, such as few-shot examples and clear instructions. This study contributes to the literature by providing a detailed evaluation of LLM performance compared to traditional machine learning models in a multilingual ASAG context, offering insights for developing more efficient automatic grading systems.

Similar works