hub
LLM-based multi-agent reinforcement learning: Current and future directions
11 Pith papers cite this work. Polarity classification is still indexing.
roles
background 3
polarities
background 3
representative citing papers
citing papers explorer
- ReCrit: Transition-Aware Reinforcement Learning for Scientific Critic Reasoning
ReCrit frames critic interaction as a correctness-transition problem and uses quadrant-based RL rewards to improve LLM performance on scientific reasoning benchmarks by rewarding corrections and robustness while penalizing sycophancy.
- Do LLM-derived graph priors improve multi-agent coordination?
LLM-generated coordination graph priors improve multi-agent reinforcement learning performance on MPE benchmarks, with models as small as 1.5B parameters proving effective.
- Joint Optimization of Multi-agent Memory System
CoMAM jointly optimizes agents in multi-agent LLM memory systems via end-to-end RL and adaptive credit assignment to improve collaboration and performance.
- Agentic Memory: Learning Unified Long-Term and Short-Term Memory Management for Large Language Model Agents
AgeMem unifies long-term and short-term memory management in LLM agents by exposing memory operations as learnable tool actions trained via three-stage progressive reinforcement learning, outperforming baselines on long-horizon tasks.
- WebSailor: Navigating Super-human Reasoning for Web Agent
WebSailor trains open-source web agents to match proprietary performance on complex information-seeking tasks by generating high-uncertainty scenarios and using a new RL method called DUPO.
- CoEvolve: Training LLM Agents via Agent-Data Mutual Evolution
CoEvolve improves LLM agent performance by 15-19% on AppWorld and BFCL benchmarks through mutual evolution of the agent and data distribution using feedback-driven task synthesis.
- Adaptive Obstacle-Aware Task Assignment and Planning for Heterogeneous Robot Teaming
OATH combines adaptive Halton sampling, obstacle-aware clustering with auctions, and LLM-based instruction interpretation to improve task assignment and planning for heterogeneous robot teams in obstacle-rich environments.
- Multi-Agent Systems: From Classical Paradigms to Large Foundation Model-Enabled Futures
A survey comparing classical multi-agent systems with large foundation model-enabled multi-agent systems, showing how the latter enable semantic-level collaboration and greater adaptability.
- A Survey of Self-Evolving Agents: What, When, How, and Where to Evolve on the Path to Artificial Super Intelligence
The paper delivers the first systematic review of self-evolving agents, structured around what components evolve, when adaptation occurs, and how it is implemented.
- Multi-Agent Collaboration Mechanisms: A Survey of LLMs
The survey organizes LLM-based multi-agent collaboration mechanisms into a framework with dimensions of actors, types, structures, strategies, and coordination protocols, reviews applications across domains, and identifies challenges for future research.
- Large Language Model-Brained GUI Agents: A Survey
A survey consolidating frameworks, data practices, large action models, benchmarks, applications, and research gaps in LLM-brained GUI agents.