This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Explainable Machine Learning Framework for Evaluating Academic Integrity and Educational Validity in Student Writing
Citations: 0
Authors: 5
Year: 2026
Abstract
This dataset supports the study entitled “Explainable Machine Learning Framework for Evaluating Academic Integrity and Educational Validity in Student Writing.” It comprises a structured collection of student-written responses annotated for quality assessment and enriched with features used in machine learning-based evaluation. The dataset includes textual responses collected from academic writing tasks, categorized into three quality levels: High Quality, Medium Quality, and Low Quality. Each instance is accompanied by corresponding labels and derived linguistic and semantic features, including but not limited to semantic depth, coherence, and lexical richness. These features are extracted through a preprocessing pipeline designed to standardize textual inputs and enhance downstream model performance. In addition to raw and processed text data, the dataset contains model-related outputs generated using a BERT-based classification framework. These outputs include predicted class labels, probability scores, and evaluation metrics such as precision, recall, and F1-score across categories. Furthermore, explainability components are incorporated through SHAP (SHapley Additive exPlanations) values, providing both local and global interpretability of feature contributions to model predictions. To support causal inference analysis, the dataset also includes variables used in propensity score matching (PSM). These variables enable the estimation of treatment effects associated with key linguistic features, facilitating a deeper understanding of their impact on writing quality. Corresponding statistical outputs, including effect sizes and p-values, are provided to ensure reproducibility and transparency. This dataset is intended for research in natural language processing, educational data mining, and explainable artificial intelligence. 
It enables the development, evaluation, and interpretation of machine learning models for automated writing assessment while supporting rigorous analysis of feature importance and causal relationships. The dataset is organized into multiple files, including raw text data, feature matrices, model predictions, SHAP explanations, and causal inference results. Detailed documentation is provided to ensure ease of use and reproducibility of the experimental framework.
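SHAP values are estimates of Shapley values from cooperative game theory. The following is a minimal sketch of the underlying computation. It does not use the BERT classifier described above. Instead, it assumes a small hypothetical scoring function over the three features the abstract names. Exact Shapley values are obtained by enumerating every coalition of features, with absent features replaced by a baseline. The function names, weights, interaction term, and baseline values are illustrative assumptions and do not come from the dataset. For models with many inputs, the SHAP library relies on approximations such as KernelSHAP rather than this exponential enumeration.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List

# Hypothetical feature names taken from the abstract; the dataset's actual
# column names and scales may differ.
FEATURES = ["semantic_depth", "coherence", "lexical_richness"]


def toy_quality_score(features: List[float]) -> float:
    """Hypothetical stand-in for a classifier's "High Quality" score.

    Illustrative weights only, plus one interaction term so that credit
    has to be shared between semantic_depth and coherence.
    """
    weights = [0.8, 0.5, 0.3]
    linear = sum(w * f for w, f in zip(weights, features))
    return linear + 0.4 * features[0] * features[1] - 0.2


def exact_shapley(
    model: Callable[[List[float]], float],
    x: List[float],
    baseline: List[float],
) -> List[float]:
    """Exact Shapley values for a single instance x.

    A coalition S of "present" features takes its values from x; every
    feature outside S is set to its baseline value. This needs O(2^n)
    model calls, so it is only practical for a handful of features.
    """
    n = len(x)

    def value(coalition: frozenset) -> float:
        # Mix instance values (inside the coalition) with baseline values.
        mixed = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return model(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        # Subsets of the other n - 1 features have sizes 0 .. n - 1.
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Weighted marginal contribution of feature i to coalition s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


if __name__ == "__main__":
    x = [0.9, 0.7, 0.6]         # hypothetical feature values for one response
    baseline = [0.5, 0.5, 0.5]  # hypothetical reference point, e.g. feature means
    phi = exact_shapley(toy_quality_score, x, baseline)
    for name, contribution in zip(FEATURES, phi):
        print(f"{name}: {contribution:+.3f}")
```

The attributions satisfy the efficiency property. Their sum equals `toy_quality_score(x) - toy_quality_score(baseline)`, which is what makes them usable as additive local explanations. Averaging their absolute values over many instances gives a global importance ranking.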
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,493 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,377 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,835 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,555 citations