Zhiyu Fan, Kirill Vasilevski, Dayi Lin, Boyuan Chen, Yihao Chen, Zhiqing Zhong, Jie M. Zhang, Pinjia He, and Ahmed E · 2025 · arXiv 2509.09853

7 Pith papers cite this work. Polarity classification is still being indexed.

representative citing papers

Claw-SWE-Bench: A Benchmark for Evaluating OpenClaw-style Agent Harnesses on Coding Tasks

cs.LG · 2026-06-10 · conditional · novelty 7.0

Claw-SWE-Bench is a 350-instance multilingual benchmark for OpenClaw-style agent harnesses. It shows that adapter design raises Pass@1 from 19.1% to 73.4% with the same model, and it releases its data for reproducible comparison.

Profile-Then-Reason: Bounded Semantic Complexity for Tool-Augmented Language Agents

cs.AI · 2026-04-05 · unverdicted · novelty 7.0

The PTR framework profiles a workflow upfront and then executes it deterministically with bounded verification and repair, limiting LM calls to two or three while outperforming ReAct in 16 of 24 tested configurations.

Inside the Scaffold: A Source-Code Taxonomy of Coding Agent Architectures

cs.SE · 2026-04-03 · accept · novelty 7.0

An analysis of 13 coding-agent scaffolds at pinned commits yields a 12-dimension taxonomy with five composable loop primitives; 11 of the agents combine multiple primitives rather than following one fixed structure.

JETO-Bench: A Reproducible Benchmark for Execution Time Improvement Patches in Java

cs.SE · 2026-06-30 · conditional · novelty 6.0

JETO-Mine is a reusable three-phase pipeline that mines 1.8 million Java commits to produce JETO-Bench, a set of 91 verified, executable execution time improvement patches (ETIPs); OpenHands achieves a 14.3% success rate on this benchmark.

Retrieval-Conditioned Topology Selection with Provable Budget Conservation for Multi-Agent Code Generation

cs.AI · 2026-05-07 · unverdicted · novelty 5.0

RGAO combines retrieval-based complexity assessment with a formal budget algebra to enable dynamic topology selection in multi-agent code generation with provable budget conservation.

Position: Coding Benchmarks Are Misaligned with Agentic Software Engineering

cs.SE · 2026-06-16 · unverdicted · novelty 4.0

Coding benchmarks are misaligned with agentic software engineering because they conflate the model with the harness, grade against a single reference, and provide no component-level signals for iteration.

Projecting the Emerging Mindset of SWE Agent by Launching a Wild Code Understanding Journey

cs.SE · 2026-06-07 · unverdicted · novelty 4.0

Ada is a scoped apparatus that records SWE-agent trajectories in real repositories and applies observation lenses to them, projecting navigation, evidence selection, synthesis, grounding, and stopping behaviors across 408 runs.

citing papers explorer

Showing 2 of 2 citing papers after filters.

Profile-Then-Reason: Bounded Semantic Complexity for Tool-Augmented Language Agents
cs.AI · 2026-04-05 · unverdicted · none · ref 8
The PTR framework profiles a workflow upfront and then executes it deterministically with bounded verification and repair, limiting LM calls to two or three while outperforming ReAct in 16 of 24 tested configurations.

Retrieval-Conditioned Topology Selection with Provable Budget Conservation for Multi-Agent Code Generation
cs.AI · 2026-05-07 · unverdicted · none · ref 60
RGAO combines retrieval-based complexity assessment with a formal budget algebra to enable dynamic topology selection in multi-agent code generation with provable budget conservation.
