Stackelberg Stochastic Linear-Quadratic Differential Games: A Closed-Loop Equilibrium Approach
Pith reviewed 2026-05-08 11:21 UTC · model grok-4.3
The pith
A closed-loop equilibrium reformulation derives the leader's strategy in stochastic Stackelberg LQ games via a variational method and shows that it matches the feedback Stackelberg solution, with global well-posedness for any finite horizon.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By treating the leader's problem as a nonlinear forward-backward optimal control problem and applying a variational method, the paper obtains an equilibrium Riccati equation (ERE) that coincides exactly with the coupled HJB system defining the feedback Stackelberg solution. A priori estimates on the ERE then establish global existence and uniqueness for arbitrary finite horizons, in both the one-dimensional and high-dimensional cases, removing the short-horizon or control-independent-diffusion restrictions of previous results.
What carries the argument
The equilibrium Riccati equation obtained by imposing the variational necessary condition on the leader's forward-backward system in which the follower applies the global best response to any leader control.
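To make the phrase "controlled Riccati equation" concrete, here is a minimal sketch in assumed notation. The symbols $A, B_i, C, D_i, Q_2, R_2, G_2, \Theta_1, P_2$ are illustrative and are not taken from the paper, whose exact formulation (cross terms, leader weights in the follower's cost) may differ. Suppose the state follows a linear SDE and the leader announces a linear feedback $u_1 = \Theta_1 X$. The follower then faces a standard stochastic LQ problem whose Riccati equation depends on $\Theta_1$:

```latex
% Assumed generic setup (illustrative notation, not the paper's)
\begin{align*}
dX &= (AX + B_1 u_1 + B_2 u_2)\,dt + (CX + D_1 u_1 + D_2 u_2)\,dW, \\
J_2 &= \mathbb{E}\Big[\int_0^T \big(X^\top Q_2 X + u_2^\top R_2 u_2\big)\,dt
       + X(T)^\top G_2 X(T)\Big].
\end{align*}
% With u_1 = \Theta_1 X, write \hat A = A + B_1\Theta_1 and \hat C = C + D_1\Theta_1.
\begin{align*}
0 &= \dot P_2 + P_2 \hat A + \hat A^\top P_2 + \hat C^\top P_2 \hat C + Q_2 \\
  &\quad - (P_2 B_2 + \hat C^\top P_2 D_2)\,(R_2 + D_2^\top P_2 D_2)^{-1}\,
          (B_2^\top P_2 + D_2^\top P_2 \hat C), \qquad P_2(T) = G_2, \\
u_2^{*} &= -(R_2 + D_2^\top P_2 D_2)^{-1}\,(B_2^\top P_2 + D_2^\top P_2 \hat C)\,X.
\end{align*}
```

In this sketch $P_2$ enters the follower's gain and itself depends on $\Theta_1$ through a quadratic-fractional equation, so the leader's choice of $\Theta_1$ is a nonlinear problem even though the state dynamics are linear.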
If this is right
- The leader's closed-loop equilibrium strategy is identical to the feedback Stackelberg solution.
- The game is globally well-posed for any finite time horizon, without requiring a short horizon or a control-independent diffusion coefficient.
- The follower's response remains optimal against every admissible leader control, not merely along the equilibrium trajectory.
- The discretization-and-limit procedure used in earlier literature, whose convergence had remained open, is rigorously justified because the coupled HJB system it produces coincides with the ERE.
- Numerical or analytic solution of the equilibrium Riccati equation directly yields the equilibrium strategies.
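As a rough illustration of what solving a Riccati equation numerically involves, the sketch below integrates a scalar equation backward from the terminal time with a fixed-step RK4 scheme. The ERE itself is not reproduced in this summary, so the code uses the classical scalar stochastic LQ Riccati equation with control-dependent diffusion as a stand-in. The function names `solve_riccati` and `feedback_gain` and the parameters `a, b, c, d, q, r, g` are hypothetical and chosen only for this example.

```python
import math


def solve_riccati(a, b, c, d, q, r, g, T, steps=1000):
    """Backward RK4 for the scalar stochastic LQ Riccati equation
        P' + (2a + c^2) P + q - ((b + c d) P)^2 / (r + d^2 P) = 0,  P(T) = g,
    associated with dx = (a x + b u) dt + (c x + d u) dW and running cost
    q x^2 + r u^2 (stand-in model, not the paper's ERE).
    Returns a list of (t, P) pairs ordered from t = T down to t = 0."""

    def rhs(p):
        # dP/dt expressed as a function of P (constant coefficients)
        return -((2 * a + c * c) * p + q - ((b + c * d) * p) ** 2 / (r + d * d * p))

    h = T / steps
    p = g
    path = [(T, p)]
    for k in range(steps):
        # One RK4 step with negative step size -h (integrating backward in time)
        k1 = rhs(p)
        k2 = rhs(p - 0.5 * h * k1)
        k3 = rhs(p - 0.5 * h * k2)
        k4 = rhs(p - h * k3)
        p = p - (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        path.append((T - (k + 1) * h, p))
    return path


def feedback_gain(p, b, c, d, r):
    """Optimal linear feedback gain K in u = K x for the stand-in model."""
    return -(b + c * d) * p / (r + d * d * p)


if __name__ == "__main__":
    # Control-dependent diffusion (d != 0) over a moderately long horizon
    path = solve_riccati(a=0.1, b=1.0, c=0.2, d=0.5, q=1.0, r=1.0, g=1.0, T=5.0)
    t0, p0 = path[-1]
    print("P at initial time:", p0, "gain:", feedback_gain(p0, 1.0, 0.2, 0.5, 1.0))
```

A simple sanity check is the deterministic special case a = c = d = 0, b = q = r = 1, g = 0, where the equation reduces to P' = P^2 - 1 with exact solution P(t) = tanh(T - t). A coupled, matrix-valued ERE would need a vector-valued right-hand side and probably an adaptive solver, which this sketch does not attempt.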
Where Pith is reading between the lines
- The same forward-backward reformulation may supply a route to equilibria in non-linear-quadratic Stackelberg games where HJB equations are unavailable.
- Solving the equilibrium Riccati equation numerically could provide a practical algorithm for computing credible Stackelberg strategies in applications such as asset management.
- Requiring global optimality of the follower may strengthen credibility results in other dynamic games with commitment.
Load-bearing premise
The variational necessary condition applied to the nonlinear forward-backward system with controlled Riccati equations is sufficient to characterize the equilibrium and the resulting equation admits a global solution for any finite horizon.
What would settle it
A concrete stochastic LQ Stackelberg game with long time horizon in which either the solution of the derived equilibrium Riccati equation differs from the strategy obtained from the coupled HJB system or the Riccati equation fails to remain solvable up to the terminal time.
Original abstract
This paper addresses a Stackelberg stochastic linear-quadratic (LQ) differential game under closed-loop information, a problem inherently time-inconsistent. Existing approaches rely on solving two coupled Hamilton-Jacobi-Bellman (HJB) equations derived via time discretization and a limiting argument, whose convergence remains an open problem. We propose an alternative framework based on closed-loop equilibrium strategies. We reformulate the leader's problem as a forward-backward optimal control problem involving a coupled system of forward SDEs and backward Riccati equations. Due to the presence of controlled Riccati equations, the leader's problem becomes essentially nonlinear. Using a variational method, we characterize the leader's closed-loop equilibrium strategy and derive the associated equilibrium Riccati equation (ERE). A key conceptual distinction is that the follower adopts a globally optimal strategy against any admissible control of the leader, whereas in previous literature the follower's strategy was only locally optimal along the leader's specific equilibrium path. This makes the follower's strategy more robust and the leader's commitment more credible. In our LQ setting, the resulting ERE coincides exactly with the coupled HJB system from the literature, showing the leader's strategy is equivalent to the feedback Stackelberg solution. Thus, our framework provides not only an alternative derivation but also a rigorous justification of the limiting argument. We establish a priori estimates for the ERE, covering 1D and high-dimensional cases, ensuring global well-posedness for any finite horizon. This significantly extends existing results which require a sufficiently short time horizon or control-independent diffusion. An application to an asset management problem with numerical simulations illustrates the theoretical results.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper develops a closed-loop equilibrium approach for Stackelberg stochastic linear-quadratic differential games under closed-loop information. It reformulates the leader's problem as a nonlinear forward-backward optimal control problem with controlled Riccati equations, applies a variational method to characterize the leader's equilibrium strategy, and derives an equilibrium Riccati equation (ERE) that exactly coincides with the coupled HJB system from prior literature. A priori estimates are established for the ERE to guarantee global well-posedness on arbitrary finite horizons in both 1D and higher dimensions, extending results that previously required short horizons or control-independent diffusion. The framework emphasizes the follower's global optimality against any leader control and is illustrated via an asset management application with numerical simulations.
Significance. If the variational derivation and a priori estimates hold, the work supplies an independent route to the feedback Stackelberg solution that justifies the limiting argument from time-discretization methods whose convergence was previously open. The distinction between global and local optimality for the follower strengthens the conceptual credibility of the equilibrium. The global well-posedness results in high dimensions represent a substantial technical advance over existing LQ Stackelberg theory.
major comments (2)
- The central claim that the ERE coincides exactly with the coupled HJB system rests on the variational first-order condition applied to the nonlinear forward-backward system; the manuscript should explicitly verify that the controlled-Riccati structure does not introduce additional terms that would break the equivalence (see the derivation leading to the ERE in the main technical section).
- The a priori estimates guaranteeing global solvability for arbitrary finite horizons in high dimensions are load-bearing for the well-posedness claim; the proof should clarify the dependence on the dimension and on the control-dependent diffusion coefficients, as the extension beyond control-independent cases is a key contribution.
minor comments (2)
- Notation for the controlled Riccati equations could be introduced earlier with a clear distinction between the leader's and follower's Riccati variables to improve readability.
- The numerical example in the asset-management application would benefit from a brief discussion of how the computed strategies differ from those obtained via the classical coupled-HJB approach.
Simulated Author's Rebuttal
We thank the referee for the thorough review and the encouraging recommendation for minor revision. We address each of the major comments in detail below.
- Referee: The central claim that the ERE coincides exactly with the coupled HJB system rests on the variational first-order condition applied to the nonlinear forward-backward system; the manuscript should explicitly verify that the controlled-Riccati structure does not introduce additional terms that would break the equivalence (see the derivation leading to the ERE in the main technical section).
  Authors: We appreciate the referee's suggestion to make the equivalence more explicit. In deriving the equilibrium Riccati equation (ERE) via the variational approach in Section 3, we start from the first-order condition on the nonlinear forward-backward system and substitute the controlled Riccati dynamics. To confirm no additional terms arise, we will include a direct comparison in the revised version, showing term-by-term that the ERE matches the coupled HJB system previously obtained in the literature. This verification will be added as a dedicated paragraph following the derivation of the ERE.
  Revision: yes
- Referee: The a priori estimates guaranteeing global solvability for arbitrary finite horizons in high dimensions are load-bearing for the well-posedness claim; the proof should clarify the dependence on the dimension and on the control-dependent diffusion coefficients, as the extension beyond control-independent cases is a key contribution.
  Authors: We agree that clarifying the dependence is important for highlighting the contribution. The a priori estimates in Section 4 rely on Gronwall-type inequalities and matrix norm bounds that explicitly account for the dimension through the trace and Frobenius norms, as well as the control-dependent diffusion via the uniform boundedness of the diffusion coefficients. In the revision, we will expand the proof to include explicit remarks on these dependencies, particularly how the estimates remain uniform in dimension under our assumptions and how control dependence is handled without restricting to short horizons. This will not change the mathematical content but improve readability.
  Revision: yes
Circularity Check
No significant circularity; derivation proceeds from independent variational characterization
The paper starts from the closed-loop equilibrium definition, reformulates the leader's problem as a nonlinear forward-backward stochastic control system with controlled Riccati equations, and applies a variational method to characterize the equilibrium strategy and obtain the equilibrium Riccati equation (ERE). The subsequent observation that this ERE coincides with the coupled HJB system of prior literature is presented as a verification result rather than an input assumption. Global well-posedness follows from a priori estimates derived directly on the ERE for arbitrary finite horizons in both one and high dimensions. No load-bearing step reduces by construction to a fitted parameter, self-citation, or definitional identity; the framework supplies an alternative route whose equivalence to the literature is a derived property, not a presupposition.
Axiom & Free-Parameter Ledger
axioms (2)
- standard math: Existence and uniqueness of solutions to the forward SDEs and backward Riccati equations under the linear-quadratic coefficients
- domain assumption: Applicability of the variational method to the nonlinear controlled-Riccati forward-backward system