OpenAlex · Updated hourly · Last updated: 09.04.2026, 22:28

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

SPRoC: Semantics-Preserving Mutations for Robustness Evaluation of Code Generation Large Language Models

2025 · 0 citations · 4 authors

Abstract

With the widespread use of large language models (LLMs) in code generation, their capabilities continue to improve. However, LLMs still exhibit instability when faced with minor variations in the input prompt, which poses challenges for practical deployment. Existing prompt mutation methods, such as random insertions, deletions, or replacements applied without understanding prompt semantics and structure, have notable limitations: they fail to capture the diverse ways real users express themselves, limiting their ability to assess the code generation robustness of LLMs. To address this, we propose SPRoC (Semantics-Preserving Robustness of Code generation), a method for evaluating LLM robustness through prompt mutation. Using a BERT-based model, SPRoC generates mutated prompts that preserve semantic consistency while offering diverse expressions. These prompts form a new dataset for verifying the functionality of LLM-generated code. SPRoC compares the functional correctness of code generated before and after mutation to assess the robustness of LLMs to input variations. We conduct experiments on the HumanEval dataset with several mainstream LLMs, including ChatGPT, DeepSeek, Claude, ERNIE, and Qwen, to evaluate performance under SPRoC mutations. Results show that SPRoC reduces the models' Pass@k scores with minimal semantic changes, outperforming the baseline Radamsa method. In addition, SPRoC achieves better performance on similarity metrics such as BLEU and BERTScore, improving by 12.96% and 1.83%, respectively.
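The abstract's core evaluation step, comparing Pass@k on original versus mutated prompts, can be sketched as follows. This is a minimal illustration, not the paper's implementation: it uses the standard unbiased Pass@k estimator (as popularized with HumanEval), and the function names and the `(n, c)` per-task input format are assumptions for this example.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased Pass@k estimator: given n generated samples for a task,
    of which c pass all unit tests, estimate the probability that at
    least one of k randomly drawn samples passes."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

def robustness_drop(original, mutated, k: int = 1) -> float:
    """Hypothetical robustness measure: mean Pass@k over tasks with the
    original prompts minus mean Pass@k with the mutated prompts.
    `original` and `mutated` are lists of (n, c) pairs, one per task."""
    mean = lambda pairs: sum(pass_at_k(n, c, k) for n, c in pairs) / len(pairs)
    return mean(original) - mean(mutated)

# Example: one task where all 10 samples pass on the original prompt,
# but only 5 of 10 pass after mutation.
drop = robustness_drop([(10, 10)], [(10, 5)], k=1)  # 1.0 - 0.5 = 0.5
```

A larger drop indicates the model is more sensitive to semantics-preserving rephrasing of the prompt, which is the robustness signal the paper reports.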

Topics

Topic Modeling · Artificial Intelligence in Healthcare and Education · Software Engineering Research