hub

Codeagent: Enhancing code generation with tool-integrated agent systems for real-world repo-level coding challenges

Kechi Zhang, Jia Li, Ge Li, Xianjie Shi, Zhi Jin · 2024 · arXiv 2401.07339

17 Pith papers cite this work. Polarity classification is still indexing.

17 Pith papers citing it

read on arXiv browse 17 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

IdleSpec: Exploiting Idle Time via Speculative Planning for LLM Agents

cs.AI · 2026-05-21 · conditional · novelty 7.0

IdleSpec improves LLM agent accuracy by generating and aggregating speculative plans during idle time between tool calls and observations using complementary drafting strategies.

Context-Augmented Code Generation: How Product Context Improves AI Coding Agent Decision Compliance by 49%

cs.SE · 2026-04-27 · unverdicted · novelty 7.0

Adding product context retrieval to AI coding agents raises decision compliance from 46% to 95% on a new benchmark of 8 tasks with 41 weighted decision points.

Beyond Verifiable Rewards: Rubric-Based GRM for Reinforced Fine-Tuning SWE Agents

cs.LG · 2026-03-13 · unverdicted · novelty 7.0

A rubric-based generative reward model improves reinforced fine-tuning of SWE agents by supplying richer behavioral guidance than binary terminal rewards alone.

SkillJect: Effectively Automating Skill-Based Prompt Injection for Skill-Enabled Agents

cs.CR · 2026-02-15 · unverdicted · novelty 7.0

SkillJect automates creation of poisoned skills that outperform manual prompt injection by embedding payloads in helper scripts and using multi-agent feedback to refine instructions.

In Line with Context: Repository-Level Code Generation via Context Inlining

cs.SE · 2026-01-01 · unverdicted · novelty 7.0

InlineCoder reframes repository-level code generation as function-level coding by using a draft anchor to inline the target function into its call graph for upstream usage and downstream dependency context.

FaSTA$^*$: Fast-Slow Toolpath Agent with Subroutine Mining for Efficient Multi-turn Image Editing

cs.CV · 2025-06-26 · unverdicted · novelty 7.0

FaSTA* combines LLM fast planning with A* search and inductive subroutine mining to create an efficient agent for multi-turn image editing tasks.

Contextualized Code Pretraining for Code Generation

cs.SE · 2026-05-18 · unverdicted · novelty 6.0

Introduces contextualized code pretraining with caller-callee pairs from static analysis to train CallerGen models that outperform baselines on the new CallerEval benchmark.

Revisiting DAgger in the Era of LLM-Agents

cs.LG · 2026-05-13 · conditional · novelty 6.0

DAgger-style training with turn-level policy interpolation raises 4B and 8B LLM agents to 27.3% and 29.8% on SWE-bench Verified, beating several larger published systems.

RealBench: A Repo-Level Code Generation Benchmark Aligned with Real-World Software Development Practices

cs.SE · 2026-04-24 · unverdicted · novelty 6.0

RealBench is a new repo-level code generation benchmark that adds UML diagrams to natural language specs, showing LLMs struggle more at full repositories, create modules with errors, and perform best with whole-repo generation on small projects versus module-by-module on complex ones.

MM-WebAgent: A Hierarchical Multimodal Web Agent for Webpage Generation

cs.CV · 2026-04-16 · unverdicted · novelty 6.0

MM-WebAgent is a hierarchical multimodal agent that coordinates AIGC tools through planning and iterative self-reflection to generate coherent, visually consistent webpages and outperforms baselines on a new benchmark.

TokenCake: A KV-Cache-centric Serving Framework for LLM-based Multi-Agent Applications

cs.DC · 2025-10-21 · unverdicted · novelty 6.0

TokenCake introduces agent-aware temporal and spatial schedulers for KV cache management in LLM multi-agent serving, claiming over 47% lower end-to-end latency and up to 16.9% better GPU memory utilization than vLLM on representative benchmarks.

Beyond Translation Accuracy: Addressing False Failures in LLM-Based Code Translation

cs.SE · 2026-05-04 · unverdicted · novelty 5.0 · 2 refs

A large-scale study finds that many LLM code translation failures are false negatives due to improper evaluation configurations rather than incorrect translations.

AblateCell: A Reproduce-then-Ablate Agent for Virtual Cell Repositories

cs.AI · 2026-04-21 · unverdicted · novelty 5.0

AblateCell reproduces baselines in three single-cell perturbation repositories with 88.9% success and recovers ground-truth critical components with 93.3% accuracy via closed-loop ablation.

Learning Project-wise Subsequent Code Edits via Interleaving Neural-based Induction and Tool-based Deduction

cs.SE · 2026-04-14 · unverdicted · novelty 5.0

TRACE improves project-wise subsequent code editing by interleaving neural-based induction for semantic edits and tool-based deduction for syntactic edits.

Retrieval-Augmented Generation for AI-Generated Content: A Survey

cs.CV · 2024-02-29 · accept · novelty 5.0

A survey classifying RAG foundations for AIGC, summarizing enhancements, cross-modal applications, benchmarks, limitations, and future directions.

LLM-Based Multi-Agent Systems for Code Generation: A Multi-Vocal Literature Review

cs.SE · 2026-02-25 · unverdicted · novelty 3.0

A review of 114 studies classifies motivations into nine categories, analyzes common models and benchmarks, synthesizes challenges into six categories with 26 subcategories and solutions, and identifies six future research directions with 18 subcategories.

A Survey on the Memory Mechanism of Large Language Model based Agents

cs.AI · 2024-04-21 · accept · novelty 3.0

A systematic review of memory designs, evaluation methods, applications, limitations, and future directions for LLM-based agents.

citing papers explorer

Showing 17 of 17 citing papers.

IdleSpec: Exploiting Idle Time via Speculative Planning for LLM Agents cs.AI · 2026-05-21 · conditional · none · ref 10
IdleSpec improves LLM agent accuracy by generating and aggregating speculative plans during idle time between tool calls and observations using complementary drafting strategies.
Context-Augmented Code Generation: How Product Context Improves AI Coding Agent Decision Compliance by 49% cs.SE · 2026-04-27 · unverdicted · none · ref 16
Adding product context retrieval to AI coding agents raises decision compliance from 46% to 95% on a new benchmark of 8 tasks with 41 weighted decision points.
Beyond Verifiable Rewards: Rubric-Based GRM for Reinforced Fine-Tuning SWE Agents cs.LG · 2026-03-13 · unverdicted · none · ref 37
A rubric-based generative reward model improves reinforced fine-tuning of SWE agents by supplying richer behavioral guidance than binary terminal rewards alone.
SkillJect: Effectively Automating Skill-Based Prompt Injection for Skill-Enabled Agents cs.CR · 2026-02-15 · unverdicted · none · ref 25
SkillJect automates creation of poisoned skills that outperform manual prompt injection by embedding payloads in helper scripts and using multi-agent feedback to refine instructions.
In Line with Context: Repository-Level Code Generation via Context Inlining cs.SE · 2026-01-01 · unverdicted · none · ref 69
InlineCoder reframes repository-level code generation as function-level coding by using a draft anchor to inline the target function into its call graph for upstream usage and downstream dependency context.
FaSTA$^*$: Fast-Slow Toolpath Agent with Subroutine Mining for Efficient Multi-turn Image Editing cs.CV · 2025-06-26 · unverdicted · none · ref 47
FaSTA* combines LLM fast planning with A* search and inductive subroutine mining to create an efficient agent for multi-turn image editing tasks.
Contextualized Code Pretraining for Code Generation cs.SE · 2026-05-18 · unverdicted · none · ref 50
Introduces contextualized code pretraining with caller-callee pairs from static analysis to train CallerGen models that outperform baselines on the new CallerEval benchmark.
Revisiting DAgger in the Era of LLM-Agents cs.LG · 2026-05-13 · conditional · none · ref 48
DAgger-style training with turn-level policy interpolation raises 4B and 8B LLM agents to 27.3% and 29.8% on SWE-bench Verified, beating several larger published systems.
RealBench: A Repo-Level Code Generation Benchmark Aligned with Real-World Software Development Practices cs.SE · 2026-04-24 · unverdicted · none · ref 53
RealBench is a new repo-level code generation benchmark that adds UML diagrams to natural language specs, showing LLMs struggle more at full repositories, create modules with errors, and perform best with whole-repo generation on small projects versus module-by-module on complex ones.
MM-WebAgent: A Hierarchical Multimodal Web Agent for Webpage Generation cs.CV · 2026-04-16 · unverdicted · none · ref 34
MM-WebAgent is a hierarchical multimodal agent that coordinates AIGC tools through planning and iterative self-reflection to generate coherent, visually consistent webpages and outperforms baselines on a new benchmark.
TokenCake: A KV-Cache-centric Serving Framework for LLM-based Multi-Agent Applications cs.DC · 2025-10-21 · unverdicted · none · ref 19
TokenCake introduces agent-aware temporal and spatial schedulers for KV cache management in LLM multi-agent serving, claiming over 47% lower end-to-end latency and up to 16.9% better GPU memory utilization than vLLM on representative benchmarks.
Beyond Translation Accuracy: Addressing False Failures in LLM-Based Code Translation cs.SE · 2026-05-04 · unverdicted · none · ref 43 · 2 links
A large-scale study finds that many LLM code translation failures are false negatives due to improper evaluation configurations rather than incorrect translations.
AblateCell: A Reproduce-then-Ablate Agent for Virtual Cell Repositories cs.AI · 2026-04-21 · unverdicted · none · ref 26
AblateCell reproduces baselines in three single-cell perturbation repositories with 88.9% success and recovers ground-truth critical components with 93.3% accuracy via closed-loop ablation.
Learning Project-wise Subsequent Code Edits via Interleaving Neural-based Induction and Tool-based Deduction cs.SE · 2026-04-14 · unverdicted · none · ref 23
TRACE improves project-wise subsequent code editing by interleaving neural-based induction for semantic edits and tool-based deduction for syntactic edits.
Retrieval-Augmented Generation for AI-Generated Content: A Survey cs.CV · 2024-02-29 · accept · none · ref 227
A survey classifying RAG foundations for AIGC, summarizing enhancements, cross-modal applications, benchmarks, limitations, and future directions.
LLM-Based Multi-Agent Systems for Code Generation: A Multi-Vocal Literature Review cs.SE · 2026-02-25 · unverdicted · none · ref 45
A review of 114 studies classifies motivations into nine categories, analyzes common models and benchmarks, synthesizes challenges into six categories with 26 subcategories and solutions, and identifies six future research directions with 18 subcategories.
A Survey on the Memory Mechanism of Large Language Model based Agents cs.AI · 2024-04-21 · accept · none · ref 114
A systematic review of memory designs, evaluation methods, applications, limitations, and future directions for LLM-based agents.

Codeagent: Enhancing code generation with tool-integrated agent systems for real-world repo-level coding challenges

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer