Process-Centric Analysis of Agentic Software Systems

· 2025 · cs.SE · arXiv 2512.02393

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

open full Pith review browse 4 citing papers arXiv PDF

abstract

Agentic systems are modern software systems: they consist of orchestrated modules, expose interfaces, and are deployed in software pipelines. Unlike conventional programs, their execution, i.e., trajectories, is inherently stochastic and adaptive to the problems they solve. Evaluation of such systems is often outcome-centric. This narrow focus overlooks detailed insights, failing to explain how agents reason, plan, act, or change their strategies. Inspired by the structured representation of conventional software systems as graphs, we introduce Graphectory to systematically encode the temporal and semantic relations in such systems. Using Graphectory, we automatically analyze 4000 trajectories of two dominant agentic programming workflows, SWE-agent and OpenHands, with four backbone Large Language Models (LLMs), attempting to resolve SWE-bench issues. Our automated analyses (completed within four minutes) reveal that: (1) agents using richer prompts or stronger LLMs exhibit more complex Graphectory, reflecting deeper exploration, broader context gathering, and more thorough validation; (2) agents' strategies vary with problem difficulty and the underlying LLM - for resolved issues, strategies often follow coherent localization-patching-validation steps, while unresolved ones exhibit chaotic or backtracking behaviors; and (3) even successful agentic systems often display inefficient processes. We also implement a novel technique for real-time construction and analysis of Graphectory and Langutory during agent execution to flag trajectory issues. Upon detecting such issues, the technique notifies the agent with a diagnostic message and, when applicable, rolls back the trajectory. Experiments show that online monitoring and interventions improve resolution rates by 6.9%-23.5% across models for problematic instances, while significantly shortening trajectories with near-zero overhead.

citation-role summary

background 3

citation-polarity summary

background 3

representative citing papers

Evaluating Plan Compliance in Autonomous Programming Agents

cs.SE · 2026-04-13 · unverdicted · novelty 7.0

Autonomous programming agents frequently fail to follow instructed plans, falling back on incomplete internalized workflows, while standard plans and periodic reminders improve performance but poor plans can degrade it more than no plan.

ReCodeAgent: A Multi-Agent Workflow for Language-agnostic Translation and Validation of Large-scale Repositories

cs.SE · 2026-04-08 · unverdicted · novelty 7.0

ReCodeAgent uses a multi-agent system to translate and validate large code repositories across multiple programming languages, achieving 60.8% higher test pass rates than prior neuro-symbolic and agentic methods on 118 real-world projects.

AgentSZZ: Teaching the LLM Agent to Play Detective with Bug-Inducing Commits

cs.SE · 2026-04-03 · conditional · novelty 7.0

AgentSZZ is an LLM-agent framework that identifies bug-inducing commits with up to 27.2% higher F1 scores than prior methods by enabling adaptive exploration and causal tracing, especially for cross-file and ghost commits.

Beyond Resolution Rates: Behavioral Drivers of Coding Agent Success and Failure

cs.SE · 2026-04-02 · accept · novelty 7.0

Large-scale trajectory analysis of 19 coding agents on 500 tasks finds that LLM choice drives outcomes more than framework design and that context-gathering plus validation behaviors improve success beyond task difficulty predictions.

citing papers explorer

Showing 4 of 4 citing papers.

Evaluating Plan Compliance in Autonomous Programming Agents cs.SE · 2026-04-13 · unverdicted · none · ref 18 · internal anchor
Autonomous programming agents frequently fail to follow instructed plans, falling back on incomplete internalized workflows, while standard plans and periodic reminders improve performance but poor plans can degrade it more than no plan.
ReCodeAgent: A Multi-Agent Workflow for Language-agnostic Translation and Validation of Large-scale Repositories cs.SE · 2026-04-08 · unverdicted · none · ref 40 · internal anchor
ReCodeAgent uses a multi-agent system to translate and validate large code repositories across multiple programming languages, achieving 60.8% higher test pass rates than prior neuro-symbolic and agentic methods on 118 real-world projects.
AgentSZZ: Teaching the LLM Agent to Play Detective with Bug-Inducing Commits cs.SE · 2026-04-03 · conditional · none · ref 22 · internal anchor
AgentSZZ is an LLM-agent framework that identifies bug-inducing commits with up to 27.2% higher F1 scores than prior methods by enabling adaptive exploration and causal tracing, especially for cross-file and ghost commits.
Beyond Resolution Rates: Behavioral Drivers of Coding Agent Success and Failure cs.SE · 2026-04-02 · accept · none · ref 14 · internal anchor
Large-scale trajectory analysis of 19 coding agents on 500 tasks finds that LLM choice drives outcomes more than framework design and that context-gathering plus validation behaviors improve success beyond task difficulty predictions.

Process-Centric Analysis of Agentic Software Systems

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer