MAVIC corrects Bellman backups at instruction boundaries by adjusting the incoming objective and restoring continuation value, enabling consistent estimation under stochastic instruction switching in cooperative MARL.
2 Pith papers cite this work. Polarity classification is still indexing.
Citing papers by year: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
Refined Analysis of Entropy-Regularized Actor-Critic
Exact critic in entropy-regularized actor-critic yields strong variance reduction, enabling Õ(log(1/ε)) sample complexity for ε-optimal regularized value.