arXiv preprint arXiv:2506.18824 · 2025

12 Pith papers cite this work.

citation-role summary

background: 1 · baseline: 1

representative citing papers

RepoMirage: Probing Repository Context Reasoning in Code Agents with Perturbations

cs.SE · 2026-05-25 · unverdicted · novelty 7.0

RepoMirage uses semantics-preserving perturbations on SWE-Bench to show code agents lack repository context reasoning, with performance falling sharply on extended structure tasks, and introduces RepoAnchor as a structure-first fix.

AgentLens: Revealing The Lucky Pass Problem in SWE-Agent Evaluation

cs.SE · 2026-05-13 · unverdicted · novelty 7.0

AgentLens finds that 10.7% of passing SWE-agent trajectories exhibit Lucky Pass behaviors and introduces a process-level evaluation framework along with a new annotated dataset of 1,815 trajectories.

Inside the Scaffold: A Source-Code Taxonomy of Coding Agent Architectures

cs.SE · 2026-04-03 · accept · novelty 7.0

Analysis of 13 coding agent scaffolds at pinned commits yields a 12-dimension taxonomy showing five composable loop primitives, with 11 agents combining multiple primitives instead of using one fixed structure.

Agentic Much? Adoption of Coding Agents on GitHub

cs.SE · 2026-01-26 · conditional · novelty 7.0

Coding agents reached 22-29% adoption in GitHub projects within months of release; agent-assisted commits are larger and focus on features and bug fixes.

When Agents Fail: A Comprehensive Study of Bugs in LLM Agents with Automated Labeling

cs.SE · 2026-01-21 · unverdicted · novelty 7.0

A large-scale empirical study categorizes bugs in LLM agents and demonstrates that a specialized LLM agent can annotate them accurately at very low cost.

TraceView: Interactive Visualization of Agentic Program Repair Trajectories

cs.SE · 2026-06-20 · accept · novelty 6.0

TraceView organizes agentic APR trajectories into Thought-Action-Result components for semantic labeling and renders them as interactive graphs; a user study with five researchers showed improved scannability and understanding.

LLM-as-Code: Agentic Programming for Agent Harness

cs.AI · 2026-06-14 · unverdicted · novelty 6.0

Proposes Agentic Programming in which programs control execution flow and LLMs act as invoked components (LLM-as-Code) only for reasoning, producing DAG-structured contexts that improve stability in long-horizon computer-use agents.

Human oversight of agentic systems in practice: Examining the oversight work, challenges, and heuristics of developers using software agents

cs.SE · 2026-06-03 · unverdicted · novelty 6.0

Exploratory interview study with 17 developers identifies four forms of emergent oversight work for software agents and documents situated challenges and heuristics.

FALAT: Tracing Failures in LLM Agent Trajectories via Dependency-Guided Search

cs.AI · 2026-05-30 · unverdicted · novelty 6.0

FALAT improves failure attribution in LLM agent trajectories via dependency-guided search, achieving 46.0% step-level accuracy on algorithm-generated and 29.1% on hand-crafted trajectories in the Who&When benchmark.

Towards Knowledgeable Deep Research: Framework and Benchmark

cs.AI · 2026-04-09 · unverdicted · novelty 6.0

The paper introduces the KDR task, the HKA multi-agent framework, and KDR-Bench to enable LLM agents to integrate structured knowledge into deep research reports, with experiments showing that it outperforms prior agents.

Projecting the Emerging Mindset of SWE Agent by Launching a Wild Code Understanding Journey

cs.SE · 2026-06-07 · unverdicted · novelty 4.0

Ada is a scoped apparatus that records SWE-agent trajectories in real repositories and applies observation lenses to project navigation, evidence selection, synthesis, grounding, and stopping behaviors across 408 runs.

Understanding Conversational Patterns in Multi-agent Programming: A Case Study on Fibonacci Game Development

cs.SE · 2026-05-22 · unverdicted · novelty 4.0

A case study of 12 LLM agent pairs on Fibonacci game development finds that only the DeepSeek-R1:DeepSeek-R1 pair converges correctly from the first iteration, while the others either diverge or fail to converge.

citing papers explorer

RepoMirage: Probing Repository Context Reasoning in Code Agents with Perturbations cs.SE · 2026-05-25 · unverdicted · none · ref 36
AgentLens: Revealing The Lucky Pass Problem in SWE-Agent Evaluation cs.SE · 2026-05-13 · unverdicted · none · ref 3
Inside the Scaffold: A Source-Code Taxonomy of Coding Agent Architectures cs.SE · 2026-04-03 · accept · none · ref 2
Agentic Much? Adoption of Coding Agents on GitHub cs.SE · 2026-01-26 · conditional · none · ref 6
When Agents Fail: A Comprehensive Study of Bugs in LLM Agents with Automated Labeling cs.SE · 2026-01-21 · unverdicted · none · ref 9
TraceView: Interactive Visualization of Agentic Program Repair Trajectories cs.SE · 2026-06-20 · accept · none · ref 7
LLM-as-Code: Agentic Programming for Agent Harness cs.AI · 2026-06-14 · unverdicted · none · ref 1
Human oversight of agentic systems in practice: Examining the oversight work, challenges, and heuristics of developers using software agents cs.SE · 2026-06-03 · unverdicted · none · ref 11
FALAT: Tracing Failures in LLM Agent Trajectories via Dependency-Guided Search cs.AI · 2026-05-30 · unverdicted · none · ref 8
Towards Knowledgeable Deep Research: Framework and Benchmark cs.AI · 2026-04-09 · unverdicted · none · ref 5
Projecting the Emerging Mindset of SWE Agent by Launching a Wild Code Understanding Journey cs.SE · 2026-06-07 · unverdicted · none · ref 10
Understanding Conversational Patterns in Multi-agent Programming: A Case Study on Fibonacci Game Development cs.SE · 2026-05-22 · unverdicted · none · ref 1