Feature-Based Q-Learning for Two-Player Stochastic Games

Lin F. Yang; Mengdi Wang; Zeyu Jia

arxiv: 1906.00423 · v1 · pith:737QV3ULnew · submitted 2019-06-02 · 💻 cs.LG · cs.GT · stat.ML

classification: cs.LG · cs.GT · stat.ML

keywords: algorithm, strategy, epsilon, sample, two-player, features, find, game

Consider a two-player zero-sum stochastic game where the transition function can be embedded in a given feature space. We propose a two-player Q-learning algorithm for approximating the Nash equilibrium strategy via sampling. The algorithm is shown to find an $\epsilon$-optimal strategy using a sample size linear in the number of features. To further improve its sample efficiency, we develop an accelerated algorithm by adopting techniques such as variance reduction, monotonicity preservation, and two-sided strategy approximation. We prove that the accelerated algorithm is guaranteed to find an $\epsilon$-optimal strategy using no more than $\tilde{\mathcal{O}}(K/(\epsilon^{2}(1-\gamma)^{4}))$ samples with high probability, where $K$ is the number of features and $\gamma$ is the discount factor. The sample, time, and space complexities of the algorithm are independent of the original dimensions of the game.
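
To make the scaling in the stated bound concrete, the following is a minimal sketch that evaluates only the leading term $K/(\epsilon^{2}(1-\gamma)^{4})$. The function name `sample_bound_scale` and the example parameter values are hypothetical and chosen for illustration. The constants and logarithmic factors hidden by the $\tilde{\mathcal{O}}$ notation are not given in the abstract and are omitted here, so only ratios between calls are meaningful, not absolute sample counts.

```python
def sample_bound_scale(num_features: int, epsilon: float, gamma: float) -> float:
    """Leading term K / (eps^2 * (1 - gamma)^4) of the stated sample bound.

    Assumption: constants and log factors hidden by tilde-O are dropped,
    so the return value is a relative scale, not a sample count.
    """
    if num_features <= 0:
        raise ValueError("num_features must be positive")
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    return num_features / (epsilon ** 2 * (1.0 - gamma) ** 4)


# Hypothetical baseline: K = 100 features, eps = 0.1, gamma = 0.9.
base = sample_bound_scale(100, 0.1, 0.9)

# Doubling K doubles the leading term (linear dependence on features).
ratio_features = sample_bound_scale(200, 0.1, 0.9) / base

# Halving eps multiplies the leading term by about 4 (1 / eps^2).
ratio_epsilon = sample_bound_scale(100, 0.05, 0.9) / base

# Halving the effective horizon gap (1 - gamma) multiplies it by about 16.
ratio_gamma = sample_bound_scale(100, 0.1, 0.95) / base

print(ratio_features, ratio_epsilon, ratio_gamma)
```

The sketch takes no argument for the number of states or actions, which mirrors the abstract's claim that the complexity depends on the number of features $K$ rather than on the original dimensions of the game.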

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Sample efficient inductive matrix completion with noise and inexact side information
stat.ML 2026-05 unverdicted novelty 7.0

Nonconvex projected gradient descent for noisy inductive matrix completion achieves linear convergence and order-optimal error at sample complexity scaling with side-information dimension a instead of ambient dimension n.