hub

Marft: Multi-agent reinforcement fine-tuning

Junwei Liao, Muning Wen, Jun Wang, Weinan Zhang · 2025 · arXiv 2504.16129

11 Pith papers cite this work. Polarity classification is still indexing.

11 Pith papers citing it

open full Pith review browse 11 citing papers arXiv PDF

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 3

citation-polarity summary

background 3

representative citing papers

Learning from Self-Debate: Preparing Reasoning Models for Multi-Agent Debate

cs.CL · 2026-01-29 · unverdicted · novelty 7.0

SDRL trains LLMs via self-generated multi-path debates and joint optimization of standalone plus debate-conditioned responses to boost both single-model reasoning and multi-agent debate performance.

AIPO: Learning to Reason from Active Interaction

cs.CL · 2026-05-08 · unverdicted · novelty 6.0 · 2 refs

AIPO adds active multi-agent consultation (Verify, Knowledge, Reasoning agents) plus custom importance sampling to RLVR training so LLMs expand their reasoning boundary and then operate without the agents.

Tree-based Credit Assignment for Multi-Agent Memory System

cs.MA · 2026-05-06 · unverdicted · novelty 6.0

TreeMem assigns credit to agents in multi-agent memory systems by expanding outputs into a tree and using Monte Carlo averaging of final rewards to optimize each agent's policy.

Joint Optimization of Multi-agent Memory System

cs.MA · 2026-03-13 · unverdicted · novelty 6.0

CoMAM jointly optimizes agents in multi-agent LLM memory systems via end-to-end RL and adaptive credit assignment to improve collaboration and performance.

Memory in the Age of AI Agents

cs.CL · 2025-12-15 · unverdicted · novelty 6.0

The paper maps agent memory research via three forms (token-level, parametric, latent), three functions (factual, experiential, working), and dynamics of formation/evolution/retrieval, plus benchmarks and future directions.

Reinforced Collaboration in Multi-Agent Flow Networks

cs.LG · 2026-05-13 · unverdicted · novelty 5.0

MANGO optimizes multi-agent LLM workflows via flow networks, RL, and textual gradients, delivering up to 12.8% higher performance and 47.4% better efficiency while generalizing to new domains.

A Comprehensive Survey of Self-Evolving AI Agents: A New Paradigm Bridging Foundation Models and Lifelong Agentic Systems

cs.AI · 2025-08-10 · unverdicted · novelty 5.0

A comprehensive review of self-evolving AI agents that improve themselves over time, organized via a framework of inputs, agent system, environment, and optimizers, with domain-specific and safety discussions.

Position: Agentic AI System Is a Foreseeable Pathway to AGI

cs.AI · 2026-05-13 · unverdicted · novelty 4.0

Agentic AI systems with DAG topologies are claimed to deliver exponentially superior generalization and sample efficiency compared to monolithic scaling for achieving AGI.

Reinforcement Learning for LLM-based Multi-Agent Systems through Orchestration Traces

cs.CL · 2026-05-04 · unverdicted · novelty 4.0

This survey organizes RL for LLM multi-agent systems into reward families, credit units, and five orchestration sub-decisions, notes the absence of explicit stopping-decision training in its paper pool, and releases a tagged corpus.

Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems

cs.AI · 2025-03-31 · unverdicted · novelty 2.0

This survey frames foundation agents using brain-inspired modular architectures and reviews challenges in evolution, collaboration, and safety.

Learning Decentralized LLM Collaboration with Multi-Agent Actor Critic

cs.AI · 2026-01-29

citing papers explorer

Showing 4 of 4 citing papers after filters.

Learning from Self-Debate: Preparing Reasoning Models for Multi-Agent Debate cs.CL · 2026-01-29 · unverdicted · none · ref 13 · internal anchor
SDRL trains LLMs via self-generated multi-path debates and joint optimization of standalone plus debate-conditioned responses to boost both single-model reasoning and multi-agent debate performance.
AIPO: Learning to Reason from Active Interaction cs.CL · 2026-05-08 · unverdicted · none · ref 36 · 2 links · internal anchor
AIPO adds active multi-agent consultation (Verify, Knowledge, Reasoning agents) plus custom importance sampling to RLVR training so LLMs expand their reasoning boundary and then operate without the agents.
Memory in the Age of AI Agents cs.CL · 2025-12-15 · unverdicted · none · ref 67 · internal anchor
The paper maps agent memory research via three forms (token-level, parametric, latent), three functions (factual, experiential, working), and dynamics of formation/evolution/retrieval, plus benchmarks and future directions.
Reinforcement Learning for LLM-based Multi-Agent Systems through Orchestration Traces cs.CL · 2026-05-04 · unverdicted · none · ref 32 · internal anchor
This survey organizes RL for LLM multi-agent systems into reward families, credit units, and five orchestration sub-decisions, notes the absence of explicit stopping-decision training in its paper pool, and releases a tagged corpus.

Marft: Multi-agent reinforcement fine-tuning

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer