An empirical study on the code refactoring capability of large language models

2024 · arXiv 2411.02320

7 Pith papers cite this work. Polarity classification is still indexing.


citation-role summary

background 2

citation-polarity summary

background 1 · support 1

representative citing papers

SmellBench: Evaluating LLM Agents on Architectural Code Smell Repair

cs.SE · 2026-05-07 · unverdicted · novelty 7.0 · 2 refs

SmellBench is the first benchmark showing LLM agents resolve 47.7% of architectural code smells while accurately spotting false positives, but aggressive repairs often introduce new smells and degrade overall quality.

Patterns of Developer Adoption of LLM-Generated Code Refactoring Suggestions

cs.SE · 2026-05-06 · unverdicted · novelty 7.0

Analysis of GitHub commits shows developers mostly accept LLM refactoring suggestions without changes, with modifications clustering into five patterns based on activity, prompt, and response validity.

Foundation Models as Oracles for Refactoring Correctness Detection

cs.SE · 2026-05-03 · unverdicted · novelty 7.0

Foundation models serve as effective oracles for detecting refactoring correctness issues in Java programs, achieving up to 93.8% accuracy in zero-shot evaluations on 226 real bugs.

Structural Anchors and Reasoning Fragility: Understanding CoT Robustness in LLM4Code

cs.SE · 2026-04-14 · unverdicted · novelty 7.0

CoT prompting in LLM4Code shows mixed robustness that depends on model family, task structure, and perturbations destabilizing structural anchors, leading to trajectory deformations like lengthening, branching, and simplification.

Your Build Scripts Stink: The State of Code Smells in Build Scripts

cs.SE · 2025-06-22 · conditional · novelty 7.0

The study identifies 13 categories of code smells in build scripts, detects 10,895 occurrences across 5,882 scripts from 4,877 repositories, and finds common patterns like insecure URLs in Maven and hardcoded paths in Gradle and CMake.

AI-Generated Smells: An Analysis of Code and Architecture in LLM and Agent-Driven Development

cs.SE · 2026-05-04 · unverdicted · novelty 6.0

More capable LLMs and agents generate code with greater volume and architectural decay, following a Volume-Quality Inverse Law that neither functional correctness nor prompting mitigates.

A Blueprint for AI-Driven Software Quality: Integrating LLMs with Established Standards

cs.SE · 2025-05-19 · unverdicted · novelty 3.0

Survey mapping LLM applications in software quality assurance to established standards including ISO/IEC 12207, ISO 25010, CMMI, and TMM, with case studies, challenges, and future directions.

citing papers explorer


SmellBench: Evaluating LLM Agents on Architectural Code Smell Repair · cs.SE · 2026-05-07 · unverdicted · none · ref 8 · 2 links
Patterns of Developer Adoption of LLM-Generated Code Refactoring Suggestions · cs.SE · 2026-05-06 · unverdicted · none · ref 6
Foundation Models as Oracles for Refactoring Correctness Detection · cs.SE · 2026-05-03 · unverdicted · none · ref 70
Structural Anchors and Reasoning Fragility: Understanding CoT Robustness in LLM4Code · cs.SE · 2026-04-14 · unverdicted · none · ref 22
Your Build Scripts Stink: The State of Code Smells in Build Scripts · cs.SE · 2025-06-22 · conditional · none · ref 16
AI-Generated Smells: An Analysis of Code and Architecture in LLM and Agent-Driven Development · cs.SE · 2026-05-04 · unverdicted · none · ref 6
A Blueprint for AI-Driven Software Quality: Integrating LLMs with Established Standards · cs.SE · 2025-05-19 · unverdicted · none · ref 44
