arXiv preprint arXiv:2406.01304 (2024)

17 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 4

citation-polarity summary

background 4

representative citing papers

Evaluating LLM-Based 0-to-1 Software Generation in End-to-End CLI Tool Scenarios

cs.SE · 2026-04-08 · unverdicted · novelty 7.0

A new benchmark for 0-to-1 CLI tool generation shows state-of-the-art LLMs achieve under 43% success rate with black-box equivalence testing against real oracles.

OrchestrXR: A Multi-Agent System for Idea-to-Prototype XR Study Authoring

cs.HC · 2026-07-02 · unverdicted · novelty 6.0

OrchestrXR uses multi-agent orchestration with structured schemas to generate Unity XR study prototypes from ideas, supported by a user study with 12 researchers indicating effective support and intent preservation.

On the Reliability of Networks of AI Agents: Density Evolution, Stopping Sets, and Architecture Optimization

cs.MA · 2026-06-16 · unverdicted · novelty 6.0

Extends density evolution to role-typed factor graphs with nonlinear Boolean verifiers to predict asymptotic unresolved subclaims in AI agent networks under three erasure failure modes.

Agentic Coding Needs Proactivity, Not Just Autonomy

cs.SE · 2026-05-07 · conditional · novelty 6.0

Coding agents require a three-level proactivity taxonomy (Reactive, Scheduled, Situation Aware) evaluated by insight policy quality using Insight Decision Quality, Context Grounding Score, and Learning Lift.

SAFEdit: Does Multi-Agent Decomposition Resolve the Reliability Challenges of Instructed Code Editing?

cs.SE · 2026-04-28 · unverdicted · novelty 6.0

SAFEdit reaches 68.6% task success on EditBench code edits by using planner, editor, and verifier agents plus a failure abstraction layer, beating single-model and ReAct baselines.

REAgent: Requirement-Driven LLM Agents for Software Issue Resolution

cs.SE · 2026-04-08 · unverdicted · novelty 6.0

REAgent improves LLM patch generation for software issues by 17.4% on average through automated construction, quality checking, and iterative refinement of structured issue-oriented requirements.

Beyond Fixed Tests: Repository-Level Issue Resolution as Coevolution of Code and Behavioral Constraints

cs.SE · 2026-04-06 · unverdicted · novelty 6.0

Agent-CoEvo is a multi-agent LLM framework that coevolves code patches and test patches to resolve repository-level issues, outperforming fixed-test baselines on SWE-bench Lite and SWT-bench Lite.

A Retrieval-Augmented Generation Approach to Extracting Algorithmic Logic from Neural Networks

cs.CV · 2025-12-03 · unverdicted · novelty 6.0

NN-RAG extracts 1,289 candidate neural modules from 19 PyTorch repositories, validates 941 of them, and supplies roughly 72% of the novel structures in the LEMUR dataset while enabling cross-repository migration.

Process-Centric Analysis of Agentic Software Systems

cs.SE · 2025-12-02 · unverdicted · novelty 6.0

Graphectory turns stochastic agent trajectories into analyzable graphs, showing that stronger models and successful fixes follow coherent localization-validation steps while failures are chaotic, and online detection plus rollback improves resolution rates by 6.9-23.5%.

Evo-Memory: Benchmarking LLM Agent Test-time Learning with Self-Evolving Memory

cs.CL · 2025-11-25 · unverdicted · novelty 6.0

Evo-Memory is a new streaming benchmark and evaluation framework for self-evolving memory in LLM agents, unifying over ten memory modules and introducing the ReMem pipeline for continual improvement on multi-turn and reasoning datasets.

Agentless: Demystifying LLM-based Software Engineering Agents

cs.SE · 2024-07-01 · conditional · novelty 6.0

Agentless, a basic three-phase LLM pipeline for bug localization, repair, and validation, outperforms complex open-source agents on SWE-bench Lite with 32% success rate at $0.70 cost.

Exploration Structure in LLM Agents for Multi-File Change Localization

cs.SE · 2026-06-10 · unverdicted · novelty 4.0

Non-linear domain-scoped parallel LLM agents achieve higher micro F1 than linear exploration and some baselines for multi-file change localization on SWE-bench Pro ansible tasks.

What makes a harness a harness: necessary and sufficient conditions for an agent harness

cs.SE · 2026-06-08 · unverdicted · novelty 4.0

Proposes and tests a constitutive definition of 'agent harness' via conceptual analysis of literature and six real systems.

LLM-Based Automated Diagnosis Of Integration Test Failures At Google

cs.SE · 2026-04-13 · unverdicted · novelty 4.0

Auto-Diagnose applies LLMs to summarize and diagnose root causes of integration test failures, reporting 90.14% accuracy on 71 manual cases and positive adoption after Google-wide rollout.

From Cool Demos to Production-Ready FMware: Core Challenges and a Technology Roadmap

cs.SE · 2024-10-28 · unverdicted · novelty 4.0

A semi-structured thematic synthesis identifies core challenges in FM selection, alignment, prompting, orchestration, testing, deployment, and cross-cutting concerns like observability for production-ready FMware.

Large Language Model-Based Agents for Software Engineering: A Survey

cs.SE · 2024-09-04 · unverdicted · novelty 4.0

A literature survey that collects and categorizes 124 papers on LLM-based agents for software engineering from SE and agent perspectives.

iCoRe: An Iterative Correlation-Aware Retriever for Bug Reproduction Test Generation

cs.SE · 2026-04-21

citing papers explorer

Showing 17 of 17 citing papers.

Evaluating LLM-Based 0-to-1 Software Generation in End-to-End CLI Tool Scenarios · cs.SE · 2026-04-08 · unverdicted · none · ref 4
A new benchmark for 0-to-1 CLI tool generation shows state-of-the-art LLMs achieve under 43% success rate with black-box equivalence testing against real oracles.
OrchestrXR: A Multi-Agent System for Idea-to-Prototype XR Study Authoring · cs.HC · 2026-07-02 · unverdicted · none · ref 10
OrchestrXR uses multi-agent orchestration with structured schemas to generate Unity XR study prototypes from ideas, supported by a user study with 12 researchers indicating effective support and intent preservation.
On the Reliability of Networks of AI Agents: Density Evolution, Stopping Sets, and Architecture Optimization · cs.MA · 2026-06-16 · unverdicted · none · ref 2
Extends density evolution to role-typed factor graphs with nonlinear Boolean verifiers to predict asymptotic unresolved subclaims in AI agent networks under three erasure failure modes.
Agentic Coding Needs Proactivity, Not Just Autonomy · cs.SE · 2026-05-07 · conditional · none · ref 10
Coding agents require a three-level proactivity taxonomy (Reactive, Scheduled, Situation Aware) evaluated by insight policy quality using Insight Decision Quality, Context Grounding Score, and Learning Lift.
SAFEdit: Does Multi-Agent Decomposition Resolve the Reliability Challenges of Instructed Code Editing? · cs.SE · 2026-04-28 · unverdicted · none · ref 5
SAFEdit reaches 68.6% task success on EditBench code edits by using planner, editor, and verifier agents plus a failure abstraction layer, beating single-model and ReAct baselines.
REAgent: Requirement-Driven LLM Agents for Software Issue Resolution · cs.SE · 2026-04-08 · unverdicted · none · ref 8
REAgent improves LLM patch generation for software issues by 17.4% on average through automated construction, quality checking, and iterative refinement of structured issue-oriented requirements.
Beyond Fixed Tests: Repository-Level Issue Resolution as Coevolution of Code and Behavioral Constraints · cs.SE · 2026-04-06 · unverdicted · none · ref 9
Agent-CoEvo is a multi-agent LLM framework that coevolves code patches and test patches to resolve repository-level issues, outperforming fixed-test baselines on SWE-bench Lite and SWT-bench Lite.
A Retrieval-Augmented Generation Approach to Extracting Algorithmic Logic from Neural Networks · cs.CV · 2025-12-03 · unverdicted · none · ref 20
NN-RAG extracts 1,289 candidate neural modules from 19 PyTorch repositories, validates 941 of them, and supplies roughly 72% of the novel structures in the LEMUR dataset while enabling cross-repository migration.
Process-Centric Analysis of Agentic Software Systems · cs.SE · 2025-12-02 · unverdicted · none · ref 13
Graphectory turns stochastic agent trajectories into analyzable graphs, showing that stronger models and successful fixes follow coherent localization-validation steps while failures are chaotic, and online detection plus rollback improves resolution rates by 6.9-23.5%.
Evo-Memory: Benchmarking LLM Agent Test-time Learning with Self-Evolving Memory · cs.CL · 2025-11-25 · unverdicted · none · ref 221
Evo-Memory is a new streaming benchmark and evaluation framework for self-evolving memory in LLM agents, unifying over ten memory modules and introducing the ReMem pipeline for continual improvement on multi-turn and reasoning datasets.
Agentless: Demystifying LLM-based Software Engineering Agents · cs.SE · 2024-07-01 · conditional · none · ref 31
Agentless, a basic three-phase LLM pipeline for bug localization, repair, and validation, outperforms complex open-source agents on SWE-bench Lite with 32% success rate at $0.70 cost.
Exploration Structure in LLM Agents for Multi-File Change Localization · cs.SE · 2026-06-10 · unverdicted · none · ref 9
Non-linear domain-scoped parallel LLM agents achieve higher micro F1 than linear exploration and some baselines for multi-file change localization on SWE-bench Pro ansible tasks.
What makes a harness a harness: necessary and sufficient conditions for an agent harness · cs.SE · 2026-06-08 · unverdicted · none · ref 7
Proposes and tests a constitutive definition of 'agent harness' via conceptual analysis of literature and six real systems.
LLM-Based Automated Diagnosis Of Integration Test Failures At Google · cs.SE · 2026-04-13 · unverdicted · none · ref 5
Auto-Diagnose applies LLMs to summarize and diagnose root causes of integration test failures, reporting 90.14% accuracy on 71 manual cases and positive adoption after Google-wide rollout.
From Cool Demos to Production-Ready FMware: Core Challenges and a Technology Roadmap · cs.SE · 2024-10-28 · unverdicted · none · ref 32
A semi-structured thematic synthesis identifies core challenges in FM selection, alignment, prompting, orchestration, testing, deployment, and cross-cutting concerns like observability for production-ready FMware.
Large Language Model-Based Agents for Software Engineering: A Survey · cs.SE · 2024-09-04 · unverdicted · none · ref 242
A literature survey that collects and categorizes 124 papers on LLM-based agents for software engineering from SE and agent perspectives.
iCoRe: An Iterative Correlation-Aware Retriever for Bug Reproduction Test Generation · cs.SE · 2026-04-21 · unreviewed · ref 8
