Selecting Normal-Form Nash Equilibria in Extensive-Form Games via a Sequence-Form Variant of Logit Quantal Response Equilibrium
Pith reviewed 2026-05-10 07:05 UTC · model grok-4.3
The pith
A sequence-form version of logit QRE lets researchers compute selected normal-form Nash equilibria in extensive-form games by following a differentiable path to the limit.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The sequence-form logit QRE is equivalent to the normal-form version, and a path-following method based on it converges to a normal-form Nash equilibrium as the rationality parameter tends to infinity, starting from an arbitrary initial point.
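For orientation, the standard normal-form logit QRE condition that this claim refers to can be written as below. The notation is an assumption made here and may differ from the paper's: $\lambda \ge 0$ is the rationality parameter, $S_i$ is player $i$'s set of pure strategies, $\sigma_i$ is player $i$'s mixed strategy, and $u_i(s_i, \sigma_{-i})$ is the expected payoff of $s_i$ against the other players' strategies.

```latex
\sigma_i(s_i) \;=\;
\frac{\exp\bigl(\lambda\, u_i(s_i, \sigma_{-i})\bigr)}
     {\sum_{s_i' \in S_i} \exp\bigl(\lambda\, u_i(s_i', \sigma_{-i})\bigr)},
\qquad s_i \in S_i,\quad i = 1, \dots, n.
```

At $\lambda = 0$ every pure strategy is played with equal probability, and limit points of solutions as $\lambda \to \infty$ are Nash equilibria. The sum in the denominator runs over all of $S_i$, which is the set that grows exponentially in an extensive-form game.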
What carries the argument
The sequence-form formulation of logit QRE, which expresses the equilibrium conditions in terms of sequences rather than complete strategies, enabling the differentiable path-following method.
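As a reminder of what "in terms of sequences" means, the usual sequence-form description of a player's strategy is a realization plan. The symbols below are assumed for illustration and are not taken from the paper: $r_i$ assigns a weight to each sequence of player $i$, $\emptyset$ is the empty sequence, $H_i$ is the set of player $i$'s information sets, $A(h)$ is the action set at $h$, and $\sigma_h$ is the unique sequence of player $i$'s own actions leading to $h$ (unique because of perfect recall).

```latex
r_i(\emptyset) = 1, \qquad
\sum_{a \in A(h)} r_i(\sigma_h a) = r_i(\sigma_h) \quad \text{for all } h \in H_i, \qquad
r_i(\sigma) \ge 0 \quad \text{for all sequences } \sigma.
```

These are linear constraints whose number grows with the size of the game tree rather than with the number of pure strategies, which is what makes a sequence-form formulation compact.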
If this is right
- The method computes equilibria in extensive-form games without explicitly constructing the normal form.
- Each point on the computed path corresponds to a logit QRE for a specific rationality level.
- The limit point is a Nash equilibrium selected according to the logit response criterion.
- The method provides an efficient framework for exploiting the equilibrium selection property of logit QRE in extensive-form games.
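The path idea in the list above can be illustrated on a tiny normal-form game. This is a minimal sketch under stated assumptions, not the paper's sequence-form method: the stag hunt payoffs, the uniform starting point at lambda = 0, and the warm-started fixed-point corrector are all illustrative choices, whereas the paper describes a differentiable path-following method that starts from an arbitrary initial point.

```python
import math

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

# Hypothetical stag hunt (assumed payoffs); index 0 = Stag, 1 = Hare.
A = [[4.0, 0.0], [3.0, 3.0]]  # row player's payoffs, A[row][col]
B = [[4.0, 3.0], [0.0, 3.0]]  # column player's payoffs, B[row][col]

def logit_response(lam, p, q):
    """Logit responses given p = Pr(row plays Stag), q = Pr(col plays Stag)."""
    du_row = (A[0][0] - A[1][0]) * q + (A[0][1] - A[1][1]) * (1 - q)
    du_col = (B[0][0] - B[0][1]) * p + (B[1][0] - B[1][1]) * (1 - p)
    return sigmoid(lam * du_row), sigmoid(lam * du_col)

def trace_path(lam_max=20.0, steps=200, tol=1e-12, max_iter=10_000):
    """Raise lambda in small steps, re-solving the fixed point from a warm start."""
    p = q = 0.5  # at lambda = 0 the logit QRE is uniform play
    path = [(0.0, p, q)]
    for k in range(1, steps + 1):
        lam = lam_max * k / steps
        for _ in range(max_iter):
            p_new, q_new = logit_response(lam, p, q)
            done = abs(p_new - p) + abs(q_new - q) < tol
            p, q = p_new, q_new
            if done:
                break
        path.append((lam, p, q))
    return path

path = trace_path()
lam, p, q = path[-1]
print(f"lambda={lam:.1f}  Pr(Stag): row={p:.3e}, col={q:.3e}")
```

Each stored triple satisfies the logit fixed-point equations for its lambda up to the tolerance. In this example the path that starts from uniform play ends near (Hare, Hare), the risk-dominant equilibrium, rather than the payoff-dominant (Stag, Stag), which is the kind of selection the list refers to.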
Where Pith is reading between the lines
- This could allow selection of particular equilibria in very large games like those arising in multi-agent reinforcement learning.
- The differentiability may enable integration with gradient-based optimization techniques for game solving.
- Similar reformulations might apply to other quantal response variants or equilibrium concepts.
Load-bearing premise
In a game with perfect recall, the sequence-form logit QRE is mathematically equivalent to the normal-form logit QRE and therefore selects the same normal-form Nash equilibrium in the limit.
What would settle it
A game with perfect recall in which the limit point of the sequence-form path is not a Nash equilibrium of the normal-form game, or is a different equilibrium from the one selected by standard normal-form logit QRE.
Original abstract
Although logit quantal response equilibrium (logit QRE) offers a natural equilibrium selection mechanism and converges to Nash equilibrium as the rationality parameter tends to infinity, its computation in extensive-form games is generally intractable when based on the normal-form representation, due to the exponential growth of the strategy space. To address this difficulty, this paper develops a sequence-form formulation of logit QRE for finite n-player extensive-form games with perfect recall, which avoids explicit construction of the normal form and enables compact computation. Based on this formulation, we further develop a differentiable path-following method starting from an arbitrary initial point, such that each point on the path corresponds to a logit QRE associated with a particular value of the rationality parameter, and the limiting point yields a Nash equilibrium. In this way, the proposed method provides an efficient computational framework for exploiting the equilibrium selection property of logit QRE in extensive-form games. The effectiveness of the proposed method is validated by theoretical analysis and numerical experiments.
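The exponential growth mentioned in the abstract can be made concrete with a small count. This is an illustrative sketch, assuming a single player with perfect recall whose information sets all have the same number of actions; the function name `strategy_space_sizes` is hypothetical.

```python
def strategy_space_sizes(num_info_sets, actions_per_set):
    """Return (number of pure strategies, number of sequences) for one player.

    Assumes perfect recall and the same number of actions at every
    information set of the player.
    """
    # A pure strategy picks one action at every information set.
    pure_strategies = actions_per_set ** num_info_sets
    # Under perfect recall each sequence is identified by its last action,
    # plus one for the empty sequence.
    sequences = 1 + num_info_sets * actions_per_set
    return pure_strategies, sequences

for k in (5, 10, 20):
    print(k, strategy_space_sizes(k, 2))
```

With two actions per information set, the pure-strategy count doubles with every added information set while the sequence count grows by two, which is the gap a sequence-form formulation exploits.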
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript develops a sequence-form formulation of logit quantal response equilibrium (QRE) for finite n-player extensive-form games with perfect recall. This avoids explicit construction of the normal-form strategy space and enables compact computation. The authors further propose a differentiable path-following procedure that traces a continuum of logit QRE points parameterized by the rationality level, converging in the limit to a normal-form Nash equilibrium selected by the logit QRE mechanism. Theoretical analysis and numerical experiments are used to support the approach.
Significance. If the sequence-form variant preserves the equilibrium selection properties of standard normal-form logit QRE, the work would provide a practical computational framework for exploiting QRE-based selection in large extensive-form games without exponential blow-up. The differentiable path-following method and its ability to start from arbitrary initial points are notable technical strengths that could support reproducibility and further algorithmic extensions.
major comments (1)
- The central claim that the sequence-form logit QRE converges to the same normal-form Nash equilibrium that would be selected by applying logit QRE directly to the normal-form representation is load-bearing for the title and abstract. Because the logit response is applied locally to sequence probabilities at information sets rather than to full pure strategies (whose payoffs are defined over the exponential normal-form space), the selection dynamics may differ even under perfect recall and Kuhn equivalence. A formal argument or counterexample establishing that the limiting equilibria coincide (or explicitly characterizing when they do not) is required.
minor comments (1)
- Clarify in the experimental section how the initial points for the path-following are chosen and whether different starts can yield different limiting equilibria in the tested games.
Simulated Author's Rebuttal
We thank the referee for the careful reading of our manuscript and the insightful comments on the relationship between sequence-form and normal-form logit QRE. We address the major comment below with a clarification of our claims and a commitment to strengthen the presentation.
Point-by-point responses
- Referee: The central claim that the sequence-form logit QRE converges to the same normal-form Nash equilibrium that would be selected by applying logit QRE directly to the normal-form representation is load-bearing for the title and abstract. Because the logit response is applied locally to sequence probabilities at information sets rather than to full pure strategies (whose payoffs are defined over the exponential normal-form space), the selection dynamics may differ even under perfect recall and Kuhn equivalence. A formal argument or counterexample establishing that the limiting equilibria coincide (or explicitly characterizing when they do not) is required.
Authors: We agree that the distinction between local sequence-form responses and global normal-form responses is important and merits explicit treatment. The manuscript positions the contribution as a sequence-form variant of logit QRE whose limiting points are normal-form Nash equilibria; it does not assert that these limits are identical to those obtained by applying standard normal-form logit QRE to the exponentially expanded strategy space. Because the quantal-response function is defined on sequence probabilities at each information set (conditional on reaching that set), the selection dynamics are those of a behavioral/agent-form QRE rather than a normal-form QRE. Under perfect recall the two representations are strategically equivalent by Kuhn's theorem, yet the equilibrium-selection properties generally differ. In the revised manuscript we will (i) add a short clarifying paragraph to the abstract and introduction, (ii) insert a new subsection in the theoretical analysis that formally relates the two notions, and (iii) supply both a characterization of the games in which the selected equilibria coincide and a simple counter-example (a two-player game with a non-trivial information set) in which they diverge. These additions will make the precise scope of the selection claim transparent without altering the algorithmic contribution.
Revision: partial
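The difference between a logit response over full pure strategies and one applied information set by information set can be seen in a very small case. The sketch below is an illustration under assumptions, not the counterexample promised in the rebuttal: it uses a hypothetical one-player decision tree (Out pays 1; In leads to a second choice where L pays 2 and R pays 0), and the function names are invented here.

```python
import math

def softmax(values, lam):
    """Logit choice probabilities proportional to exp(lam * value)."""
    m = max(values)  # subtract the maximum for numerical stability
    weights = [math.exp(lam * (v - m)) for v in values]
    total = sum(weights)
    return [w / total for w in weights]

def normal_form_prob_in(lam):
    """Pr(In) when the logit response is taken over complete pure strategies."""
    # Pure strategies: (Out, L), (Out, R), (In, L), (In, R).
    payoffs = [1.0, 1.0, 2.0, 0.0]
    probs = softmax(payoffs, lam)
    return probs[2] + probs[3]

def agent_form_prob_in(lam):
    """Pr(In) when the logit response is applied at each information set."""
    # Second information set: L pays 2, R pays 0.
    pr_left = softmax([2.0, 0.0], lam)[0]
    value_in = 2.0 * pr_left  # expected payoff of In given the later logit choice
    # First information set: Out pays 1, In pays value_in.
    return softmax([1.0, value_in], lam)[1]

for lam in (1.0, 5.0, 30.0):
    print(lam, round(normal_form_prob_in(lam), 4), round(agent_form_prob_in(lam), 4))
```

At lambda = 1 the two computations give roughly 0.607 and 0.682 for the probability of In, while both approach 1 as lambda grows. So the two notions can disagree at finite lambda even when their limits agree; whether the limits can differ in a multi-player game is exactly what the referee asks the authors to establish, and this example does not settle it.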
Circularity Check
No significant circularity in derivation of sequence-form logit QRE or path-following method
Full rationale
The paper introduces a sequence-form formulation of logit QRE for perfect-recall extensive-form games and a differentiable path-following procedure whose limit is a normal-form Nash equilibrium. These rest on the standard property that logit QRE converges to Nash as the rationality parameter tends to infinity, together with the known equivalence of behavioral and mixed strategies under perfect recall. No equation or step is shown to reduce by construction to a fitted parameter, a self-defined quantity, or a load-bearing self-citation whose content is merely renamed. The computational framework and selection claim are therefore independent of the target result and self-contained against external benchmarks.
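The equivalence of behavioral and mixed strategies under perfect recall is what lets a sequence-form solution be read back as play at each information set. A minimal sketch of that conversion follows, assuming a realization plan stored as a dictionary over action tuples; the data layout, the function name `behavior_from_realization`, and the example tree are hypothetical and not taken from the paper.

```python
def behavior_from_realization(plan, info_sets):
    """Convert a realization plan into a behavioral strategy.

    plan: dict mapping a sequence (tuple of actions) to its realization weight.
    info_sets: dict mapping an information-set name to (parent sequence, actions).
    """
    behavior = {}
    for name, (parent, actions) in info_sets.items():
        reach = plan[parent]
        for a in actions:
            if reach > 0:
                # Conditional probability of action a given that the set is reached.
                behavior[(name, a)] = plan[parent + (a,)] / reach
            else:
                # Unreached information set: any distribution is consistent; use uniform.
                behavior[(name, a)] = 1.0 / len(actions)
    return behavior

# Hypothetical one-player tree: choose Out or In, then L or R after In.
info_sets = {"h1": ((), ("Out", "In")), "h2": (("In",), ("L", "R"))}
plan = {(): 1.0, ("Out",): 0.4, ("In",): 0.6, ("In", "L"): 0.45, ("In", "R"): 0.15}

behavior = behavior_from_realization(plan, info_sets)
print(behavior)
```

Dividing each sequence weight by the weight of its parent sequence recovers the local choice probabilities, here about 0.75 for L after In. The uniform fallback at unreached information sets is a design choice of this sketch, since the realization plan carries no information there.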
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: players have perfect recall in the extensive-form game.