LLMs frequently specify library versions with known CVEs in generated code (36-56% of tasks), show low compatibility (20-63%), and converge on the same risky versions across models.
LLM-based multi-agent systems for software engineering: Literature review, vision, and the road ahead. ACM Trans. Softw
Canonical reference. 100% of citing Pith papers cite this work as background.
citation-role summary: background 5
citation-polarity summary: background 5
representative citing papers
AIDev is a new open dataset of 456k AI-agent pull requests showing agents submit code faster than humans but with lower acceptance rates and simpler changes.
A new six-dimension process taxonomy for AI software development frameworks shows convergence on artifact persistence and human oversight but reveals that no framework covers all dimensions strongly, indicating a depth-portability trade-off.
MALMAS is a memory-augmented multi-agent LLM system that generates diverse, high-quality features for tabular data via agent decomposition, routing, and iterative memory-guided refinement.
FLARE extracts specifications from multi-agent LLM code and applies coverage-guided fuzzing to achieve 96.9% inter-agent and 91.1% intra-agent coverage while uncovering 56 new failures across 16 applications.
VIGA introduces a training-free interleaved multimodal reasoning loop that improves vision-as-inverse-graphics accuracy over one-shot baselines on BlenderGym, SlideBench, and the new BlenderBench.
POIROT protocol repurposes agents in LLM multi-agent systems as an internal diagnostic layer for failure detection, outperforming single-LLM evaluators with gains that increase with complexity, agent count, and fault types.
ReasonVul deploys three LLM agents with independent analysis and structured debate to achieve 40% PairAcc and 72.52% F1 on PrimeVul, outperforming baselines by 81% in PairAcc.
SelfHeal uses two ReAct agents and empirical fix patterns to repair bugs in LLM agents, outperforming baselines on a new 37-instance benchmark.
Agentic Business Process Management reframes BPM around autonomous agents that must exhibit framed autonomy, explainability, conversational actionability, and self-modification to keep their actions aligned with organizational objectives.
Large-scale experiment with 1440 task executions finds dynamic routing of coordination strategies achieves near-best quality scores across models and classes but does not reliably identify exact winners.
RocketSmith is an LLM-based agentic system that designs four high-powered rockets via additive manufacturing, with two achieving stable launches and recovery after reaching 80% of simulated apogee.
Agentic architecture with context engineering enables scalable UML diagram generation from source code across multiple languages and diagram types.
Systematic study of inter-agent communication in LLM multi-agent systems shows reasoning and verification are critical for performance, with a new augmentation technique recovering 86.2% of failures.
Comparative review of AI coding tool ToS shows responsibility for code quality and compliance shifted to users, with policy misalignment for autonomous agents, plus a research roadmap.
CodeWiki presents a unified framework for repository-level documentation across seven languages using hierarchical decomposition, recursive multi-agent processing, and multi-modal synthesis, outperforming DeepWiki by 4.73% on CodeWikiBench.
Metagente is an LLM multi-agent system using Teacher-Student collaboration that outperforms baselines on real-world software documentation summarization for requirements analysis and technical docs.
A multi-case study plus survey produces seven actionable recommendations for efficient and responsible LLM use in industrial software engineering.
Compact constraint headers reduce prompt tokens by 25-30% with no significant change in constraint compliance rates across tested models and tasks.
A rapid review of fairness in LLM-enabled multi-agent systems for the software development lifecycle concludes that the field lacks standardized evaluations, broad coverage, and effective governance, leaving it unprepared for deployable fair systems.
citing papers explorer
RocketSmith: Agentic Additive Manufacturing of High-Powered Rockets