9 Pith papers cite this work.
citation-role summary: background (2)
citation-polarity summary: background (2)
years: 2026 (9)
representative citing papers
Reversa is a reverse documentation engineering framework that deploys a multi-agent pipeline to extract implicit rules from legacy software and produce traceable specifications with confidence scores and explicit gaps for human review.
MR-Coupler leverages functional coupling analysis and LLMs to generate valid metamorphic test cases for over 90% of tasks while detecting 44% of real bugs, outperforming baselines by 64.90% in validity and 36.56% in false-alarm reduction.
Qualitative study of 19 practitioners reveals ten LLM product evaluation practices and introduces the results-actionability gap as a key barrier to turning findings into improvements.
citing papers explorer
- SlopCodeBench: Benchmarking How Coding Agents Degrade Over Long-Horizon Iterative Tasks
  SlopCodeBench shows coding agents degrade in structural quality and verbosity across iterative extensions, with no agent solving any problem completely and agent code 2x more eroded than human code.
- How AI Coding Agents Modify Code: A Large-Scale Study of GitHub Pull Requests
  AI coding agents produce pull requests with substantially more commits and slightly higher description-to-diff similarity than human developers, based on analysis of 29,095 merged PRs.
- A Comparative Study of Semantic Log Representations for Software Log-based Anomaly Detection
  QTyBERT matches or exceeds BERT-based log anomaly detection effectiveness while reducing embedding generation time to near static word embedding levels.
- Bringing Managed Language Support to WebAssembly with External Library Linking
  WALL-E uses external library linking via client-server architecture to support ten managed languages in WebAssembly with hundreds-fold speedup over nested runtimes.
- How to Compare the Security of Code Written by Humans to LLM-generated Code
  Proposes and validates via feasibility study an open-source automated framework for reproducible, species-fair security comparisons of human-written, LLM-generated, and hybrid code.
- How Humans, Bots, and Agents Communicate About Vulnerabilities in Pull Requests
  The authors present a registered report outlining their planned large-scale empirical study of vulnerability communication in pull requests by different account types.