CLS-DP distills privileged multi-agent dynamics into a collaborative latent space that each agent infers from local RGB observations to condition diffusion-based actions, achieving 38% mean success on six RoboFactory tasks versus 20% for the best centralized baseline.
An initial introduction to cooperative multi-agent reinforcement learning
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
MAVIC corrects Bellman backups at instruction boundaries by adjusting the incoming objective and restoring continuation value, enabling consistent estimation under stochastic instruction switching in cooperative MARL.
citing papers explorer
- Robust Instruction Compliance in Cooperative Multi-Agent Reinforcement Learning