Mäntylä, Jesse Nyyssölä, Ke Ping, and Liqiang Wang

Danning Xie, Byungwoo Yoo, Nan Jiang, Mijung Kim, Lin Tan, Xiangyu Zhang, Judy S · 2025 · arXiv 4311.2025

6 Pith papers cite this work. Polarity classification is still indexing.

6 Pith papers citing it

read on arXiv browse 6 citing papers

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

SlopCodeBench: Benchmarking How Coding Agents Degrade Over Long-Horizon Iterative Tasks

cs.SE · 2026-03-25 · unverdicted · novelty 8.0

SlopCodeBench shows coding agents degrade in structural quality and verbosity across iterative extensions, with no agent solving any problem completely and agent code 2x more eroded than human code.

Reversa: A Reverse Documentation Engineering Framework for Converting Legacy Software into Operational Specifications for AI Agents

cs.SE · 2026-05-18 · conditional · novelty 7.0

Reversa is a reverse documentation engineering framework that deploys a multi-agent pipeline to extract implicit rules from legacy software and produce traceable specifications with confidence scores and explicit gaps for human review.

MR-Coupler: Automated Metamorphic Test Generation via Functional Coupling Analysis

cs.SE · 2026-04-11 · conditional · novelty 7.0

MR-Coupler leverages functional coupling analysis and LLMs to generate valid metamorphic test cases for over 90% of tasks while detecting 44% of real bugs, outperforming baselines by 64.90% in validity and 36.56% in false-alarm reduction.

Results-Actionability Gap: Understanding How Practitioners Evaluate LLM Products in the Wild

cs.SE · 2026-01-25 · conditional · novelty 7.0

Qualitative study of 19 practitioners reveals ten LLM product evaluation practices and introduces the results-actionability gap as a key barrier to turning findings into improvements.

How AI Coding Agents Modify Code: A Large-Scale Study of GitHub Pull Requests

cs.SE · 2026-01-24 · unverdicted · novelty 7.0

AI coding agents produce pull requests with substantially more commits and slightly higher description-to-diff similarity than human developers, based on analysis of 29,095 merged PRs.

A Comparative Study of Semantic Log Representations for Software Log-based Anomaly Detection

cs.SE · 2026-04-09 · unverdicted · novelty 6.0

QTyBERT matches or exceeds BERT-based log anomaly detection effectiveness while reducing embedding generation time to near static word embedding levels.

citing papers explorer

Showing 6 of 6 citing papers.

SlopCodeBench: Benchmarking How Coding Agents Degrade Over Long-Horizon Iterative Tasks cs.SE · 2026-03-25 · unverdicted · none · ref 1
SlopCodeBench shows coding agents degrade in structural quality and verbosity across iterative extensions, with no agent solving any problem completely and agent code 2x more eroded than human code.
Reversa: A Reverse Documentation Engineering Framework for Converting Legacy Software into Operational Specifications for AI Agents cs.SE · 2026-05-18 · conditional · none · ref 40
Reversa is a reverse documentation engineering framework that deploys a multi-agent pipeline to extract implicit rules from legacy software and produce traceable specifications with confidence scores and explicit gaps for human review.
MR-Coupler: Automated Metamorphic Test Generation via Functional Coupling Analysis cs.SE · 2026-04-11 · conditional · none · ref 72
MR-Coupler leverages functional coupling analysis and LLMs to generate valid metamorphic test cases for over 90% of tasks while detecting 44% of real bugs, outperforming baselines by 64.90% in validity and 36.56% in false-alarm reduction.
Results-Actionability Gap: Understanding How Practitioners Evaluate LLM Products in the Wild cs.SE · 2026-01-25 · conditional · none · ref 48
Qualitative study of 19 practitioners reveals ten LLM product evaluation practices and introduces the results-actionability gap as a key barrier to turning findings into improvements.
How AI Coding Agents Modify Code: A Large-Scale Study of GitHub Pull Requests cs.SE · 2026-01-24 · unverdicted · none · ref 9
AI coding agents produce pull requests with substantially more commits and slightly higher description-to-diff similarity than human developers, based on analysis of 29,095 merged PRs.
A Comparative Study of Semantic Log Representations for Software Log-based Anomaly Detection cs.SE · 2026-04-09 · unverdicted · none · ref 40
QTyBERT matches or exceeds BERT-based log anomaly detection effectiveness while reducing embedding generation time to near static word embedding levels.

Mäntylä, Jesse Nyyssölä, Ke Ping, and Liqiang Wang

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer