
arxiv: 2604.22317 · v1 · submitted 2026-04-24 · 🧮 math.OC

Stackelberg Stochastic Linear-Quadratic Differential Games: A Closed-Loop Equilibrium Approach

Pith reviewed 2026-05-08 11:21 UTC · model grok-4.3

classification 🧮 math.OC
keywords Stackelberg games · stochastic differential games · linear-quadratic control · closed-loop equilibria · Riccati equations · time-inconsistency · optimal control

The pith

A closed-loop equilibrium reformulation derives the leader's strategy in stochastic Stackelberg LQ games via a variational method and shows it matches the feedback solution with global well-posedness for any finite horizon.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper addresses time-inconsistency in Stackelberg stochastic linear-quadratic differential games under closed-loop information. It replaces time-discretization methods with a direct reformulation of the leader's problem as a forward-backward stochastic control problem whose backward part consists of controlled Riccati equations. A variational necessary condition then yields the leader's equilibrium strategy together with an associated equilibrium Riccati equation. Because the follower is required to respond optimally against every admissible leader control rather than only along the equilibrium path, the resulting strategy is more robust. In the linear-quadratic case the derived equation coincides exactly with the known coupled HJB system, supplying both an alternative derivation and a rigorous justification for the limiting procedure used in earlier work.

Core claim

By treating the leader's problem as a nonlinear forward-backward optimal control problem and applying a variational method, the paper obtains an equilibrium Riccati equation that coincides exactly with the coupled HJB system defining the feedback Stackelberg solution. A priori estimates on this equation establish global existence and uniqueness for arbitrary finite horizons in both the one-dimensional and high-dimensional cases, removing the short-horizon or control-independent-diffusion restrictions of previous results.

What carries the argument

The equilibrium Riccati equation obtained by imposing the variational necessary condition on the leader's forward-backward system in which the follower applies the global best response to any leader control.
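
For orientation, the textbook stochastic LQ problem with control-dependent diffusion shows the shape of Riccati equation that a best-response computation leads to. The sketch below uses generic coefficients A, B, C, D, Q, R, G and a single control u; these symbols are illustrative assumptions and are not the paper's notation. Once the leader fixes a linear feedback, its gain would enter the follower's drift and diffusion coefficients, which is presumably the sense in which the Riccati equations become "controlled".

```latex
% Generic single-controller stochastic LQ problem (illustrative notation only).
\begin{aligned}
dX(s) &= \bigl[A X(s) + B u(s)\bigr]\,ds + \bigl[C X(s) + D u(s)\bigr]\,dW(s), \qquad X(t) = x,\\
J(t,x;u) &= \tfrac12\,\mathbb{E}\Bigl[\int_t^T \bigl(\langle Q X, X\rangle + \langle R u, u\rangle\bigr)\,ds
            + \langle G X(T), X(T)\rangle\Bigr].
\end{aligned}

% Associated Riccati equation and optimal feedback, assuming R + D^\top P D is invertible.
\begin{aligned}
&\dot P + P A + A^\top P + C^\top P C + Q
 - \bigl(P B + C^\top P D\bigr)\bigl(R + D^\top P D\bigr)^{-1}\bigl(B^\top P + D^\top P C\bigr) = 0,
 \qquad P(T) = G,\\
&u^*(s) = -\bigl(R + D^\top P(s) D\bigr)^{-1}\bigl(B^\top P(s) + D^\top P(s) C\bigr) X(s).
\end{aligned}
```

The D^\top P D term inside the inverse is the contribution of control-dependent diffusion; with D = 0 the quadratic term reduces to P B R^{-1} B^\top P.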

If this is right

  • The leader's closed-loop equilibrium strategy is identical to the feedback Stackelberg solution.
  • The game is globally well-posed for any finite time horizon without additional restrictions on the diffusion coefficient.
  • The follower's response remains optimal against every admissible leader control, not merely along the equilibrium trajectory.
  • The discretization-and-limit procedure used in earlier literature converges to the true equilibrium.
  • Numerical or analytic solution of the equilibrium Riccati equation directly yields the equilibrium strategies.
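
To make the last point concrete, here is a minimal sketch of integrating a Riccati equation backward from its terminal condition and reading off a feedback gain. It uses the standard scalar LQ Riccati equation with control-dependent diffusion as a stand-in, not the paper's equilibrium Riccati equation, and the names `riccati_rhs`, `solve_riccati_backward`, and `feedback_gain` as well as the coefficients a, b, c, d, q, r, g are hypothetical.

```python
import math


def riccati_rhs(p, a, b, c, d, q, r):
    """dP/ds for the scalar equation
    P' + 2aP + c^2 P + q - ((b + c d) P)^2 / (r + d^2 P) = 0.
    Assumes r + d^2 P > 0 along the solution (an assumption of this sketch).
    """
    gain_num = (b + c * d) * p
    weight = r + d * d * p
    return -(2.0 * a * p + c * c * p + q - gain_num * gain_num / weight)


def solve_riccati_backward(a, b, c, d, q, r, g, horizon, steps=1000):
    """Classical RK4 from s = horizon down to s = 0 with P(horizon) = g.
    Returns a list whose i-th entry approximates P(i * horizon / steps).
    """
    h = horizon / steps
    p = g
    values = [p]
    for _ in range(steps):
        # One RK4 step with negative step size: s -> s - h.
        k1 = riccati_rhs(p, a, b, c, d, q, r)
        k2 = riccati_rhs(p - 0.5 * h * k1, a, b, c, d, q, r)
        k3 = riccati_rhs(p - 0.5 * h * k2, a, b, c, d, q, r)
        k4 = riccati_rhs(p - h * k3, a, b, c, d, q, r)
        p = p - (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        values.append(p)
    values.reverse()  # index 0 now corresponds to s = 0
    return values


def feedback_gain(p, b, c, d, r):
    """Gain theta in u = theta * X for the scalar problem above."""
    return -(b + c * d) * p / (r + d * d * p)


if __name__ == "__main__":
    # With a = c = d = 0 and b = q = r = 1, g = 0 the exact solution is tanh(T - s).
    sol = solve_riccati_backward(0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0)
    print(sol[0], math.tanh(1.0))
```

The closed-form case in the `__main__` block is only a sanity check on the integrator; a coupled leader-follower system would need a vector-valued right-hand side in place of `riccati_rhs`.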

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same forward-backward reformulation may supply a route to equilibria in non-linear-quadratic Stackelberg games where HJB equations are unavailable.
  • Solving the equilibrium Riccati equation numerically could provide a practical algorithm for computing credible Stackelberg strategies in applications such as asset management.
  • Requiring global optimality of the follower may strengthen credibility results in other dynamic games with commitment.

Load-bearing premise

The variational necessary condition, applied to the nonlinear forward-backward system with controlled Riccati equations, is sufficient to characterize the equilibrium, and the resulting equation admits a global solution for any finite horizon.

What would settle it

A concrete stochastic LQ Stackelberg game with a long time horizon in which either the strategy obtained from the derived equilibrium Riccati equation differs from the one obtained from the coupled HJB system, or the Riccati equation fails to remain solvable up to the terminal time.
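
A solvability check of this kind can be sketched numerically: integrate the Riccati equation backward from the terminal time and report whether the solution stays bounded over the whole horizon. The toy below uses a scalar equation in backward time tau = T - s, not the paper's equilibrium Riccati equation; the function name `backward_escape_time`, the coefficients, and the blow-up threshold `bound` are all hypothetical. With an indefinite weight (r < 0) the toy equation becomes dP/dtau = 1 + P^2, whose solution tan(tau) escapes at tau = pi/2, so a horizon longer than that is not solvable.

```python
import math


def backward_escape_time(a, b, q, r, g, horizon, steps=20000, bound=1e6):
    """Integrate dP/dtau = 2*a*P + q - (b*b/r)*P*P from P(0) = g with RK4,
    where tau = T - s measures time remaining before the terminal time.

    Returns None if |P| stays within `bound` on [0, horizon]; otherwise
    returns the first grid value of tau at which the bound is violated.
    """
    def rhs(p):
        return 2.0 * a * p + q - (b * b / r) * p * p

    h = horizon / steps
    p = g
    for k in range(1, steps + 1):
        k1 = rhs(p)
        k2 = rhs(p + 0.5 * h * k1)
        k3 = rhs(p + 0.5 * h * k2)
        k4 = rhs(p + h * k3)
        p = p + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        if not abs(p) <= bound:  # also catches inf and nan
            return k * h
    return None


if __name__ == "__main__":
    # Definite weight r = 1: solution tanh(tau), bounded for every horizon.
    print(backward_escape_time(0.0, 1.0, 1.0, 1.0, 0.0, 10.0))
    # Indefinite weight r = -1: solution tan(tau), escapes near pi/2.
    print(backward_escape_time(0.0, 1.0, 1.0, -1.0, 0.0, 2.0), math.pi / 2)
```

A numerical escape is only evidence, not proof, of non-solvability; the reported time depends on the step size and on the chosen threshold.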

Figures

Figures reproduced from arXiv: 2604.22317 by Bowen Ma, Hanxiao Wang, Qi Lü.

Figure 1: Deterministic solutions P1(s) and P2(s) governing the feedback Stackelberg equilibrium.
Figure 2 displays three sample paths of the wealth process X(s) under the closed-loop equilibrium strategies. Several observations are in order: 1. Convergence to the target. All three paths start from the same initial condition, increase over time, and approach the terminal target z as s → T. This shows that the equilibrium strategies successfully steer wealth toward the goal despite stochastic volatility. 2. Rob…
Figure 3: Comparison of control trajectories u1(s) (leader) and u2(s) (follower) across three sample paths. Remark 5.2. The numerical simulation presented here is merely an approximation, and the observed phenomena are illustrative rather than mathematically rigorous. A full justification (including a stability analysis of the ERE and a precise interpretation of the approximate equilibrium strategy) is required but f…
Abstract

This paper addresses a Stackelberg stochastic linear-quadratic (LQ) differential game under closed-loop information, a problem inherently time-inconsistent. Existing approaches rely on solving two coupled Hamilton-Jacobi-Bellman (HJB) equations derived via time discretization and a limiting argument, whose convergence remains an open problem. We propose an alternative framework based on closed-loop equilibrium strategies. We reformulate the leader's problem as a forward-backward optimal control problem involving a coupled system of forward SDEs and backward Riccati equations. Due to the presence of controlled Riccati equations, the leader's problem becomes essentially nonlinear. Using a variational method, we characterize the leader's closed-loop equilibrium strategy and derive the associated equilibrium Riccati equation (ERE). A key conceptual distinction is that the follower adopts a globally optimal strategy against any admissible control of the leader, whereas in previous literature the follower's strategy was only locally optimal along the leader's specific equilibrium path. This makes the follower's strategy more robust and the leader's commitment more credible. In our LQ setting, the resulting ERE coincides exactly with the coupled HJB system from the literature, showing the leader's strategy is equivalent to the feedback Stackelberg solution. Thus, our framework provides not only an alternative derivation but also a rigorous justification of the limiting argument. We establish a priori estimates for the ERE, covering 1D and high-dimensional cases, ensuring global well-posedness for any finite horizon. This significantly extends existing results which require a sufficiently short time horizon or control-independent diffusion. An application to an asset management problem with numerical simulations illustrates the theoretical results.
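
Sample paths of the kind used in such numerical illustrations can be produced with an Euler-Maruyama scheme once a feedback gain is available. The sketch below simulates a scalar linear SDE under a time-dependent gain; the model dX = (a + b*theta(s)) X ds + (c + d*theta(s)) X dW, the function `simulate_path`, and all coefficient values are assumptions made for illustration and are not the paper's asset management model.

```python
import math
import random


def simulate_path(x0, horizon, steps, gain, a, b, c, d, rng):
    """Euler-Maruyama for dX = (a + b*gain(s)) X ds + (c + d*gain(s)) X dW.

    `gain` is any callable s -> theta(s); `rng` is a random.Random instance
    so that runs are reproducible. Returns the list [X_0, ..., X_steps].
    """
    h = horizon / steps
    x = x0
    path = [x]
    for k in range(steps):
        s = k * h
        theta = gain(s)
        dw = rng.gauss(0.0, math.sqrt(h))  # Brownian increment over [s, s + h]
        x = x + (a + b * theta) * x * h + (c + d * theta) * x * dw
        path.append(x)
    return path


if __name__ == "__main__":
    rng = random.Random(0)
    for i in range(3):
        # Constant gain -0.5 chosen arbitrarily for the demonstration.
        path = simulate_path(1.0, 1.0, 1000, lambda s: -0.5, 0.1, 1.0, 0.2, 0.3, rng)
        print(f"path {i}: X(T) = {path[-1]:.4f}")
```

In practice the constant gain would be replaced by a gain interpolated from a numerically solved Riccati equation, and the time step would need to be refined until the paths stabilize.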

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper develops a closed-loop equilibrium approach for Stackelberg stochastic linear-quadratic differential games under closed-loop information. It reformulates the leader's problem as a nonlinear forward-backward optimal control problem with controlled Riccati equations, applies a variational method to characterize the leader's equilibrium strategy, and derives an equilibrium Riccati equation (ERE) that exactly coincides with the coupled HJB system from prior literature. A priori estimates are established for the ERE to guarantee global well-posedness on arbitrary finite horizons in both 1D and higher dimensions, extending results that previously required short horizons or control-independent diffusion. The framework emphasizes the follower's global optimality against any leader control and is illustrated via an asset management application with numerical simulations.

Significance. If the variational derivation and a priori estimates hold, the work supplies an independent route to the feedback Stackelberg solution that justifies the limiting argument from time-discretization methods whose convergence was previously open. The distinction between global and local optimality for the follower strengthens the conceptual credibility of the equilibrium. The global well-posedness results in high dimensions represent a substantial technical advance over existing LQ Stackelberg theory.

major comments (2)
  1. The central claim that the ERE coincides exactly with the coupled HJB system rests on the variational first-order condition applied to the nonlinear forward-backward system; the manuscript should explicitly verify that the controlled-Riccati structure does not introduce additional terms that would break the equivalence (see the derivation leading to the ERE in the main technical section).
  2. The a priori estimates guaranteeing global solvability for arbitrary finite horizons in high dimensions are load-bearing for the well-posedness claim; the proof should clarify the dependence on the dimension and on the control-dependent diffusion coefficients, as the extension beyond control-independent cases is a key contribution.
minor comments (2)
  1. Notation for the controlled Riccati equations could be introduced earlier with a clear distinction between the leader's and follower's Riccati variables to improve readability.
  2. The numerical example in the asset-management application would benefit from a brief discussion of how the computed strategies differ from those obtained via the classical coupled-HJB approach.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the thorough review and the encouraging recommendation for minor revision. We address each of the major comments in detail below.

  1. Referee: The central claim that the ERE coincides exactly with the coupled HJB system rests on the variational first-order condition applied to the nonlinear forward-backward system; the manuscript should explicitly verify that the controlled-Riccati structure does not introduce additional terms that would break the equivalence (see the derivation leading to the ERE in the main technical section).

    Authors: We appreciate the referee's suggestion to make the equivalence more explicit. In deriving the equilibrium Riccati equation (ERE) via the variational approach in Section 3, we start from the first-order condition on the nonlinear forward-backward system and substitute the controlled Riccati dynamics. To confirm no additional terms arise, we will include a direct comparison in the revised version, showing term-by-term that the ERE matches the coupled HJB system previously obtained in the literature. This verification will be added as a dedicated paragraph following the derivation of the ERE.
    Revision: yes

  2. Referee: The a priori estimates guaranteeing global solvability for arbitrary finite horizons in high dimensions are load-bearing for the well-posedness claim; the proof should clarify the dependence on the dimension and on the control-dependent diffusion coefficients, as the extension beyond control-independent cases is a key contribution.

    Authors: We agree that clarifying the dependence is important for highlighting the contribution. The a priori estimates in Section 4 rely on Gronwall-type inequalities and matrix norm bounds that explicitly account for the dimension through the trace and Frobenius norms, as well as the control-dependent diffusion via the uniform boundedness of the diffusion coefficients. In the revision, we will expand the proof to include explicit remarks on these dependencies, particularly how the estimates remain uniform in dimension under our assumptions and how control dependence is handled without restricting to short horizons. This will not change the mathematical content but improve readability.
    Revision: yes
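
For readers unfamiliar with the tool named in the second response, the backward form of Gronwall's inequality that suits terminal-value problems can be stated as follows. This is the standard textbook statement, with a continuous function \varphi, a constant \alpha \ge 0, and an integrable \beta \ge 0 as assumptions; it is not the paper's actual estimate.

```latex
% Backward Gronwall inequality on [t, T] (standard form).
\varphi(s) \le \alpha + \int_s^T \beta(r)\,\varphi(r)\,dr \quad \text{for all } s \in [t,T]
\quad\Longrightarrow\quad
\varphi(s) \le \alpha \exp\Bigl(\int_s^T \beta(r)\,dr\Bigr) \quad \text{for all } s \in [t,T].
```

Applied with \varphi(s) a matrix norm of the Riccati solution, a bound of this type is finite for every finite horizon T, which is the mechanism by which an a priori estimate can rule out blow-up. The inequality only handles linear growth, so how the quadratic term of the Riccati equation is controlled is the part this summary does not show.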

Circularity Check

0 steps flagged

No significant circularity; derivation proceeds from independent variational characterization


The paper starts from the closed-loop equilibrium definition, reformulates the leader's problem as a nonlinear forward-backward stochastic control system with controlled Riccati equations, and applies a variational method to characterize the equilibrium strategy and obtain the equilibrium Riccati equation (ERE). The subsequent observation that this ERE coincides with the coupled HJB system of prior literature is presented as a verification result rather than an input assumption. Global well-posedness follows from a priori estimates derived directly on the ERE for arbitrary finite horizons in both one and high dimensions. No load-bearing step reduces by construction to a fitted parameter, self-citation, or definitional identity; the framework supplies an alternative route whose equivalence to the literature is a derived property, not a presupposition.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on standard existence and uniqueness results for forward-backward SDEs and Riccati equations in stochastic control, plus the linear-quadratic structure that permits the variational argument; no new free parameters or invented entities are introduced in the abstract.

axioms (2)
  • standard math: Existence and uniqueness of solutions to the forward SDEs and backward Riccati equations under the linear-quadratic coefficients.
    Invoked implicitly when the equilibrium Riccati equation is stated to be well-posed globally.
  • domain assumption: Applicability of the variational method to the nonlinear controlled-Riccati forward-backward system.
    Required for characterizing the leader's equilibrium strategy.

pith-pipeline@v0.9.0 · 5602 in / 1576 out tokens · 46796 ms · 2026-05-08T11:21:20.125138+00:00 · methodology

