Dies ist eine Übersichtsseite mit Metadaten zu dieser wissenschaftlichen Arbeit. Der vollständige Artikel ist beim Verlag verfügbar.

A Large-Scale Comprehensive Measurement of AI-Generated Code in Real-World Repositories A Large-Scale Comprehensive Measurement of AI-Generated Code in Real-World Repositories

2026·0 Zitationen·arXiv (Cornell University)Open Access

Volltext beim Verlag öffnen

Zitationen

Autoren

2026

Jahr

Abstract

Large language models (LLMs) are rapidly transforming software engineering by enabling developers to generate code ranging from small snippets to entire projects. As AI-generated code becomes increasingly integrated into real-world systems, understanding its characteristics and impact is critical. However, prior work primarily focuses on small-scale, controlled evaluations and lacks comprehensive analysis in real-world settings. In this paper, we present a large-scale empirical study of AI-generated code in real-world repositories. We analyze both code-level metrics (\eg complexity, structure, and defect-related indicators) and commit-level characteristics (\eg commit size, frequency, and post-commit stability). To enable this study, we develop heuristic filter with LLM classification to identify AI-generated code and construct a large dataset. Our results provide new insights into how AI-generated code differs from human-written code and how AI assistance influences development practices. These findings contribute to a deeper understanding of the practical implications of AI-assisted programming.

Autoren

Institutionen

Indiana University Bloomington(US)

Themen

Software Engineering ResearchSoftware Engineering Techniques and PracticesArtificial Intelligence in Healthcare and Education

Volltext beim Verlag öffnen

A Large-Scale Comprehensive Measurement of AI-Generated Code in Real-World Repositories A Large-Scale Comprehensive Measurement of AI-Generated Code in Real-World Repositories

Abstract

Ähnliche Arbeiten

Autoren

Institutionen

Themen