{"total":13,"items":[{"citing_arxiv_id":"2605.06717","ref_index":10,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Agentic Coding Needs Proactivity, Not Just Autonomy","primary_cat":"cs.SE","submitted_at":"2026-05-07T03:52:56+00:00","verdict":"CONDITIONAL","verdict_confidence":"UNKNOWN","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Coding agents require a three-level proactivity taxonomy (Reactive, Scheduled, Situation Aware) evaluated by insight policy quality using Insight Decision Quality, Context Grounding Score, and Learning Lift.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.25737","ref_index":5,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"SAFEdit: Does Multi-Agent Decomposition Resolve the Reliability Challenges of Instructed Code Editing?","primary_cat":"cs.SE","submitted_at":"2026-04-28T15:04:46+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"SAFEdit reaches 68.6% task success on EditBench code edits by using planner, editor, and verifier agents plus a failure abstraction layer, beating single-model and ReAct baselines.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.19224","ref_index":8,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"iCoRe: An Iterative Correlation-Aware Retriever for Bug Reproduction Test Generation","primary_cat":"cs.SE","submitted_at":"2026-04-21T08:26:30+00:00","verdict":"CONDITIONAL","verdict_confidence":"MODERATE","novelty_score":6.0,"formal_verification":"none","one_line_summary":"iCoRe improves Fail-to-Pass rates to 42.0% and 52.8% on two bug reproduction benchmarks by using correlation-aware iterative retrieval instead of standard semantic or BM25 
methods.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.12108","ref_index":5,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"LLM-Based Automated Diagnosis Of Integration Test Failures At Google","primary_cat":"cs.SE","submitted_at":"2026-04-13T22:30:53+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Auto-Diagnose applies LLMs to summarize and diagnose root causes of integration test failures, reporting 90.14% accuracy on 71 manual cases and positive adoption after Google-wide rollout.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.06861","ref_index":8,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"REAgent: Requirement-Driven LLM Agents for Software Issue Resolution","primary_cat":"cs.SE","submitted_at":"2026-04-08T09:22:30+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"REAgent improves LLM patch generation for software issues by 17.4% on average through automated construction, quality checking, and iterative refinement of structured issue-oriented requirements.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"code search, and test execution to facilitate repository interaction. Agentless [77] decomposes issue resolution into predefined stages, including localization, patch generation, and patch validation. Subsequent work further improves these frameworks by introducing advanced retrieval strategies [11, 51], context compression methods [39, 74], and multi-agent collaboration mechanisms [8, 52]. 
Despite these advances, existing techniques primarily focus on improving how LLMs solve problems through better tools or workflows, while largely overlooking what is being solved, namely the quality of the task specification itself. Most techniques directly treat issue descriptions as input, implicitly assuming that they accurately capture the programming specifications for the desired code"},{"citing_arxiv_id":"2604.06742","ref_index":4,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Evaluating LLM-Based 0-to-1 Software Generation in End-to-End CLI Tool Scenarios","primary_cat":"cs.SE","submitted_at":"2026-04-08T07:09:10+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"A new benchmark for 0-to-1 CLI tool generation shows state-of-the-art LLMs achieve under 43% success rate with black-box equivalence testing against real oracles.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.04580","ref_index":9,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Beyond Fixed Tests: Repository-Level Issue Resolution as Coevolution of Code and Behavioral Constraints","primary_cat":"cs.SE","submitted_at":"2026-04-06T10:26:46+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Agent-CoEvo is a multi-agent LLM framework that coevolves code patches and test patches to resolve repository-level issues, outperforming fixed-test baselines on SWE-bench Lite and SWT-bench Lite.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"space of code patches that satisfy these fixed behavioral signals. A large body of work explores agent-based repository reasoning and patch synthesis under this assumption. 
Systems such as SWE-agent [50], OpenHands [45], MarsCode [27], and AutoCodeRover [51] design tool-augmented agents to navigate codebases and iteratively refine patches. Other efforts investigate structured collaboration [9], historical and dependency-aware exploration [28, 30], or simplified agent pipelines that reduce step-wise error propagation [47]. Another line of work improves repair effectiveness by enhancing repository understanding and structured reasoning. Methods such as RepoGraph [38], knowledge-graph-augmented approaches [49], and intent-grounding systems like SpecRover [40] inject structural and semantic information"},{"citing_arxiv_id":"2512.04329","ref_index":20,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"A Retrieval-Augmented Generation Approach to Extracting Algorithmic Logic from Neural Networks","primary_cat":"cs.CV","submitted_at":"2025-12-03T23:28:30+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"NN-RAG extracts 1,289 candidate neural modules from 19 PyTorch repositories, validates 941 of them, and supplies roughly 72% of the novel structures in the LEMUR dataset while enabling cross-repository migration.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2512.02393","ref_index":13,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Process-Centric Analysis of Agentic Software Systems","primary_cat":"cs.SE","submitted_at":"2025-12-02T04:12:29+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Graphectory turns stochastic agent trajectories into analyzable graphs, showing that stronger models and successful fixes follow coherent localization-validation steps while failures are chaotic, and online detection plus rollback improves resolution rates by 
6.9-23.5%.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2511.20857","ref_index":221,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Evo-Memory: Benchmarking LLM Agent Test-time Learning with Self-Evolving Memory","primary_cat":"cs.CL","submitted_at":"2025-11-25T21:08:07+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Evo-Memory is a new streaming benchmark and evaluation framework for self-evolving memory in LLM agents, unifying over ten memory modules and introducing the ReMem pipeline for continual improvement on multi-turn and reasoning datasets.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2410.20791","ref_index":32,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"From Cool Demos to Production-Ready FMware: Core Challenges and a Technology Roadmap","primary_cat":"cs.SE","submitted_at":"2024-10-28T07:16:00+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"A semi-structured thematic synthesis identifies core challenges in FM selection, alignment, prompting, orchestration, testing, deployment, and cross-cutting concerns like observability for production-ready FMware.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2409.02977","ref_index":242,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Large Language Model-Based Agents for Software Engineering: A Survey","primary_cat":"cs.SE","submitted_at":"2024-09-04T15:59:41+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"A literature survey that collects and categorizes 124 papers on LLM-based agents for software engineering 
from SE and agent perspectives.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"FlowGen Scrum [237]✓Agile Pre-defined Vertical + Horizontal Direct Communication CodeS [238]✓- Pre-defined Vertical Direct Communication Qian et al. [239]✓- Pre-defined Vertical + Horizontal Direct Communication CTC [240]✓Waterfall Pre-defined Vertical + Horizontal Direct Communication + Memory AgileCoder [241]✓Agile Pre-defined Vertical Direct Communication + Memory MacNet [242]✓- Pre-defined Vertical + Horizontal Direct Communication + Memory Sami et al. [243]✓Waterfall Pre-defined Vertical Direct Communication developing a Snake Game application from scratch) beyond an individual phase of software development. In particular, like the real-world software development team, these agent systems can cover the entire software development life cycle"},{"citing_arxiv_id":"2407.01489","ref_index":31,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Agentless: Demystifying LLM-based Software Engineering Agents","primary_cat":"cs.SE","submitted_at":"2024-07-01T17:24:45+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Agentless, a basic three-phase LLM pipeline for bug localization, repair, and validation, outperforms complex open-source agents on SWE-bench Lite with 32% success rate at $0.70 cost.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"in autonomous software development. We hope AGENTLESS will help reset the baseline, starting point, and horizon for autonomous software agents, and inspire future work along this crucial direction. We have open-sourced AGENTLESS at: https://github.com/OpenAutoCoder/Agentless 1 Introduction Large language models (LLMs) have become the go-to default choice for code generation [31, 28, 53, 89]. 
State-of-the-art LLMs like GPT-4 [71] and Claude 3.5 Sonnet [26] have demonstrated their prowess in being able to synthesize code snippets based on given user description. However, compared to the main evaluation setting of simple, self-contained problems, applying LLMs on repository-level software engineering tasks has been understudied."}],"limit":50,"offset":0}