pith. sign in

arxiv: 1810.04433 · v3 · pith:BEHDKPTSnew · submitted 2018-10-10 · 💻 cs.LG · stat.ML

Lazy-CFR: fast and near optimal regret minimization for extensive games with imperfect information

classification 💻 cs.LG stat.ML
keywords regretinformationlazy-cfrvanillaboundgameslazymathcal
0
0 comments X
read the original abstract

Counterfactual regret minimization (CFR) is the most popular algorithm on solving two-player zero-sum extensive games with imperfect information and achieves state-of-the-art performance in practice. However, the performance of CFR is not fully understood, since empirical results on the regret are much better than the upper bound proved in \cite{zinkevich2008regret}. Another issue is that CFR has to traverse the whole game tree in each round, which is time-consuming in large scale games. In this paper, we present a novel technique, lazy update, which can avoid traversing the whole game tree in CFR, as well as a novel analysis on the regret of CFR with lazy update. Our analysis can also be applied to the vanilla CFR, resulting in a much tighter regret bound than that in \cite{zinkevich2008regret}. Inspired by lazy update, we further present a novel CFR variant, named Lazy-CFR. Compared to traversing $O(|\mathcal{I}|)$ information sets in vanilla CFR, Lazy-CFR needs only to traverse $O(\sqrt{|\mathcal{I}|})$ information sets per round while keeping the regret bound almost the same, where $\mathcal{I}$ is the class of all information sets. As a result, Lazy-CFR shows better convergence result compared with vanilla CFR. Experimental results consistently show that Lazy-CFR outperforms the vanilla CFR significantly.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Low-Variance and Zero-Variance Baselines for Extensive-Form Games

    cs.GT 2019-07 unverdicted novelty 7.0

    A framework for baseline-corrected values in EFGs is introduced, with new baselines including a predictive one that is provably optimal for zero-variance estimates under certain sampling schemes.