OpenAlex · Updated hourly · Last updated: 21.04.2026, 21:28

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Explainable Machine Learning Framework for Evaluating Academic Integrity and Educational Validity in Student Writing

2026 · 0 citations · Zenodo (CERN European Organization for Nuclear Research) · Open Access

Citations: 0 · Authors: 5 · Year: 2026

Abstract

This dataset supports the study entitled “Explainable Machine Learning Framework for Evaluating Academic Integrity and Educational Validity in Student Writing.” It comprises a structured collection of student-written responses annotated for quality assessment and enriched with features used in machine learning-based evaluation. The dataset includes textual responses collected from academic writing tasks, categorized into three quality levels: High Quality, Medium Quality, and Low Quality. Each instance is accompanied by corresponding labels and derived linguistic and semantic features, including but not limited to semantic depth, coherence, and lexical richness. These features are extracted through a preprocessing pipeline designed to standardize textual inputs and enhance downstream model performance.

In addition to raw and processed text data, the dataset contains model-related outputs generated using a BERT-based classification framework. These outputs include predicted class labels, probability scores, and evaluation metrics such as precision, recall, and F1-score across categories. Furthermore, explainability components are incorporated through SHAP (SHapley Additive exPlanations) values, providing both local and global interpretability of feature contributions to model predictions.

To support causal inference analysis, the dataset also includes variables used in propensity score matching (PSM). These variables enable the estimation of treatment effects associated with key linguistic features, facilitating a deeper understanding of their impact on writing quality. Corresponding statistical outputs, including effect sizes and p-values, are provided to ensure reproducibility and transparency.

This dataset is intended for research in natural language processing, educational data mining, and explainable artificial intelligence. It enables the development, evaluation, and interpretation of machine learning models for automated writing assessment while supporting rigorous analysis of feature importance and causal relationships. The dataset is organized into multiple files, including raw text data, feature matrices, model predictions, SHAP explanations, and causal inference results. Detailed documentation is provided to ensure ease of use and reproducibility of the experimental framework.
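As an illustration of the per-category evaluation metrics mentioned in the abstract, the following minimal sketch computes precision, recall, and F1-score for the three quality levels from paired true and predicted labels. The function name `per_class_metrics` and the toy labels are assumptions made for this example; they are not taken from the dataset or its documentation, and the dataset's own file layout and column names may differ.

```python
from collections import Counter

# The three quality levels named in the abstract.
LABELS = ["High Quality", "Medium Quality", "Low Quality"]


def per_class_metrics(y_true, y_pred, labels=LABELS):
    """Return {label: (precision, recall, f1)}, computed one-vs-rest.

    Hypothetical helper for illustration; not part of the dataset's code.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class receives a false positive
            fn[truth] += 1   # true class receives a false negative

    metrics = {}
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        metrics[label] = (precision, recall, f1)
    return metrics


# Toy labels invented for illustration only (not drawn from the dataset).
y_true = ["High Quality", "High Quality", "Medium Quality", "Low Quality"]
y_pred = ["High Quality", "Medium Quality", "Medium Quality", "Low Quality"]
metrics = per_class_metrics(y_true, y_pred)
```

In this toy case one High Quality response is misclassified as Medium Quality, which lowers recall for High Quality and precision for Medium Quality while leaving Low Quality unaffected.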

Similar works