Claw-SWE-Bench is a 350-instance multilingual benchmark for OpenClaw-style agent harnesses. It shows that adapter design raises Pass@1 from 19.1% to 73.4% on the same model, and it releases data for reproducible comparison.
Zhang, Pinjia He, and Ahmed E
7 Pith papers cite this work.
Years: 2026 (7)
Representative citing papers
PTR framework profiles a workflow upfront then executes it deterministically with bounded verification and repair, limiting LM calls to 2-3 while outperforming ReAct in 16 of 24 tested configurations.
Analysis of 13 coding agent scaffolds at pinned commits yields a 12-dimension taxonomy showing five composable loop primitives, with 11 agents combining multiple primitives instead of using one fixed structure.
JETO-Mine is a reusable three-phase pipeline that mines 1.8 million Java commits to produce JETO-Bench containing 91 verified executable ETIPs, on which OpenHands succeeds at 14.3%.
RGAO combines retrieval-based complexity assessment with a formal budget algebra to enable dynamic topology selection in multi-agent code generation with provable conservation.
Coding benchmarks misalign with agentic software engineering because they conflate model and harness, grade against single references, and provide no component-level iteration signals.
Ada is a scoped apparatus that records SWE-agent trajectories in real repositories and applies observation lenses to project navigation, evidence selection, synthesis, grounding, and stopping behaviors across 408 runs.
citing papers explorer
- Profile-Then-Reason: Bounded Semantic Complexity for Tool-Augmented Language Agents
PTR framework profiles a workflow upfront then executes it deterministically with bounded verification and repair, limiting LM calls to 2-3 while outperforming ReAct in 16 of 24 tested configurations.
- Retrieval-Conditioned Topology Selection with Provable Budget Conservation for Multi-Agent Code Generation
RGAO combines retrieval-based complexity assessment with a formal budget algebra to enable dynamic topology selection in multi-agent code generation with provable conservation.