Decentralized diffusion policies trained with importance sampling score matching improve exploration and performance in cooperative MARL relative to Gaussian policy baselines.
By the triangle inequality, we have that $D_{\mathrm{TV}}^2(P_{X_0} \,\|\, P_{Y_0}) \le 2 D_{\mathrm{TV}}^2(P_{X_0} \,\|\, P_{\tilde{X}_0}) + 2 D_{\mathrm{TV}}^2(P_{\tilde{X}_0} \,\|\, P_{Y_0})$.
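The quoted bound does not spell out where the factor of 2 comes from. One plausible reading, sketched here as an assumption rather than as the paper's own derivation, is that the triangle inequality for total variation is squared and then combined with the elementary inequality $2ab \le a^2 + b^2$. The shorthand $a$ and $b$ is introduced only for this illustration.

```latex
% Shorthand (introduced here for illustration only):
%   a = D_TV(P_{X_0} || P_{\tilde X_0}),   b = D_TV(P_{\tilde X_0} || P_{Y_0}).
\begin{align*}
D_{\mathrm{TV}}(P_{X_0} \,\|\, P_{Y_0})
  &\le a + b
  && \text{(triangle inequality for } D_{\mathrm{TV}}\text{)} \\
D_{\mathrm{TV}}^2(P_{X_0} \,\|\, P_{Y_0})
  &\le (a + b)^2 = a^2 + 2ab + b^2
  && \text{(both sides are nonnegative)} \\
  &\le 2a^2 + 2b^2
  && \text{(since } 2ab \le a^2 + b^2\text{)} \\
  &= 2 D_{\mathrm{TV}}^2(P_{X_0} \,\|\, P_{\tilde{X}_0})
   + 2 D_{\mathrm{TV}}^2(P_{\tilde{X}_0} \,\|\, P_{Y_0}).
\end{align*}
```

The last inequality is a rearrangement of $(a - b)^2 \ge 0$, so the bound is tight only when the two intermediate distances are equal.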
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.MA (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
Decentralized Diffusion Policy Learning for Enhanced Exploration in Cooperative Multi-agent Reinforcement Learning