Under spectral and step-size conditions, periodic and soft target updates guarantee that linear Q-learning converges to the exact projected Q-Bellman solution. The guarantee is established through a joint spectral radius analysis of switched linear systems.
4 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary: background 1
citation-polarity summary: background 1
verdicts: UNVERDICTED 4
representative citing papers
Derives an exact linear switched model for the mean dynamics of Q-learning with linear function approximation and relates convergence to joint spectral radius stability of the switched system, extending the view to stochastic and regularized cases.
Introduces and analyzes the λ-target update for linear Q-learning via geometric averaging of periodic target maps, studied with a switching-system model in the deterministic case.
Derives optimal strategies for a partially observed Stackelberg SDE game with asymmetric information and extends deterministic multi-agent formation control to the stochastic case.