Demystifying LLM-based software engineering agents

Canonical reference. 82% of citing Pith papers cite this work as background.

27 Pith papers citing it
38 external citations · Crossref
Background 82% of classified citations

citation-role summary

background: 9 · baseline: 1 · method: 1

years

2026: 23 · 2025: 4

representative citing papers

SmellBench: Evaluating LLM Agents on Architectural Code Smell Repair

cs.SE · 2026-05-07 · unverdicted · novelty 7.0 · 2 refs

SmellBench is the first benchmark showing LLM agents resolve 47.7% of architectural code smells while accurately spotting false positives, but aggressive repairs often introduce new smells and degrade overall quality.

Investigating Test Overfitting on SWE-bench

cs.SE · 2025-11-20 · unverdicted · novelty 7.0

The first empirical study of test overfitting shows that auto-generated tests from issues can lead to code that passes observed tests but misses important cases or breaks functionality in SWE-bench issue resolution.

FuzzAgent: Multi-Agent System for Evolutionary Library Fuzzing

cs.SE · 2026-05-14 · conditional · novelty 6.0

FuzzAgent deploys specialized agents that collaborate on harness generation, execution, and crash triage to evolve fuzzing campaigns, delivering 45-191% more branch coverage than four baselines on 20 C/C++ libraries and surfacing 102 real bugs.

Reproduction Test Generation for Java SWE Issues

cs.SE · 2026-05-05 · unverdicted · novelty 6.0 · 2 refs

Introduces the first benchmark for Java reproduction test generation from repository issues and adapts a prior Python tool that achieves high performance on it.

CodeTracer: Towards Traceable Agent States

cs.SE · 2026-04-13 · unverdicted · novelty 6.0

CodeTracer reconstructs hierarchical state-transition trees from heterogeneous code-agent run artifacts and localizes failure onsets, outperforming baselines on the new CodeTraceBench dataset while enabling recovery of failed trajectories.

Can Old Tests Do New Tricks for Resolving SWE Issues?

cs.SE · 2025-10-21 · conditional · novelty 6.0

TestPrune minimizes regression test suites to improve bug reproduction and patch validation in LLM-based agentic repair pipelines, delivering 6-13% relative gains on SWE-Bench benchmarks at low API cost.
