Entropy-adaptive Gumbel-Sinkhorn formulation for unsupervised permutation learning that modulates temperature per assignment to address non-uniform uncertainty.
2 Pith papers cite this work. Polarity classification is still indexing.
years
2026: 2
verdicts
UNVERDICTED: 2
representative citing papers
MAVIC corrects Bellman backups at instruction boundaries by adjusting the incoming objective and restoring continuation value, enabling consistent estimation under stochastic instruction switching in cooperative MARL.
citing papers explorer
- Learning Permutation from Structure Without Supervision
Entropy-adaptive Gumbel-Sinkhorn formulation for unsupervised permutation learning that modulates temperature per assignment to address non-uniform uncertainty.
- Robust Instruction Compliance in Cooperative Multi-Agent Reinforcement Learning
MAVIC corrects Bellman backups at instruction boundaries by adjusting the incoming objective and restoring continuation value, enabling consistent estimation under stochastic instruction switching in cooperative MARL.