On the role of information structure in reinforcement learning for partially-observable sequential teams and games
1 Pith paper cites this work.
Fields: math.OC (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Approximations and Learning for Decentralized Stochastic Control and Near Optimal Finite Window Policies
Finite sliding window policies achieve near-optimality and Q-learning converges to them for decentralized stochastic control under OSDISP and KSPISP information structures when a predictor stability condition holds in expected total variation.