Monte-Carlo Expectation Maximization for Decentralized POMDPs. Proceedings of the International Joint Conference on Artificial Intelligence, 2013.
1 Pith paper cites this work.
Fields: math.OC (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
Approximations and Learning for Decentralized Stochastic Control and Near Optimal Finite Window Policies
Finite sliding window policies achieve near-optimality and Q-learning converges to them for decentralized stochastic control under OSDISP and KSPISP information structures when a predictor stability condition holds in expected total variation.