LLMs frequently specify library versions with known CVEs in generated code (36-56% of tasks), show low compatibility (20-63%), and converge on the same risky versions across models.
hub Canonical reference
Llm-based multi-agent systems for software engineering: Literature review, vision, and the road ahead.ACM Trans.Softw
Canonical reference. 100% of citing Pith papers cite this work as background.
hub tools
citation-role summary
citation-polarity summary
roles
background 5polarities
background 5representative citing papers
AIDev is a new open dataset of 456k AI-agent pull requests showing agents submit code faster than humans but with lower acceptance rates and simpler changes.
A new six-dimension process taxonomy for AI software development frameworks shows convergence on artifact persistence and human oversight but reveals that no framework covers all dimensions strongly, indicating a depth-portability trade-off.
MALMAS is a memory-augmented multi-agent LLM system that generates diverse, high-quality features for tabular data via agent decomposition, routing, and iterative memory-guided refinement.
FLARE extracts specifications from multi-agent LLM code and applies coverage-guided fuzzing to achieve 96.9% inter-agent and 91.1% intra-agent coverage while uncovering 56 new failures across 16 applications.
VIGA introduces a training-free interleaved multimodal reasoning loop that improves vision-as-inverse-graphics accuracy over one-shot baselines on BlenderGym, SlideBench, and new BlenderBench.
POIROT protocol repurposes agents in LLM multi-agent systems as an internal diagnostic layer for failure detection, outperforming single-LLM evaluators with gains that increase with complexity, agent count, and fault types.
ReasonVul deploys three LLM agents with independent analysis and structured debate to achieve 40% PairAcc and 72.52% F1 on PrimeVul, outperforming baselines by 81% in PairAcc.
SelfHeal uses two ReAct agents and empirical fix patterns to repair bugs in LLM agents, outperforming baselines on a new 37-instance benchmark.
Agentic Business Process Management reframes BPM around autonomous agents that must exhibit framed autonomy, explainability, conversational actionability, and self-modification to keep their actions aligned with organizational objectives.
Large-scale experiment with 1440 task executions finds dynamic routing of coordination strategies achieves near-best quality scores across models and classes but does not reliably identify exact winners.
RocketSmith is an LLM-based agentic system that designs four high-powered rockets via additive manufacturing, with two achieving stable launches and recovery after reaching 80% of simulated apogee.
Agentic architecture with context engineering enables scalable UML diagram generation from source code across multiple languages and diagram types.
Systematic study of inter-agent communication in LLM multi-agent systems shows reasoning and verification are critical for performance, with a new augmentation technique recovering 86.2% of failures.
Comparative review of AI coding tool ToS shows responsibility for code quality and compliance shifted to users, with policy misalignment for autonomous agents, plus a research roadmap.
CodeWiki presents a unified framework for repository-level documentation across seven languages using hierarchical decomposition, recursive multi-agent processing, and multi-modal synthesis, outperforming DeepWiki by 4.73% on CodeWikiBench.
Metagente is an LLM multi-agent system using Teacher-Student collaboration that outperforms baselines on real-world software documentation summarization for requirements analysis and technical docs.
A multi-case study plus survey produces seven actionable recommendations for efficient and responsible LLM use in industrial software engineering.
Compact constraint headers reduce prompt tokens by 25-30% with no significant change in constraint compliance rates across tested models and tasks.
A rapid review of fairness in LLM-enabled multi-agent systems for the software development lifecycle concludes that the field lacks standardized evaluations, broad coverage, and effective governance, leaving it unprepared for deployable fair systems.
citing papers explorer
No citing papers match the current filters.