Closed-loop equilibria for Stackelberg games: a story about stochastic targets
Pith reviewed 2026-05-23 23:51 UTC · model grok-4.3
The pith
Stackelberg games under closed-loop strategies reduce to single-level stochastic control with target constraints.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By considering the second-order backward stochastic differential equation associated with the continuation utility of the follower as a controlled state variable for the leader, the latter's unconventional optimisation problem can be reformulated as a more standard stochastic control problem with target constraints. Thereafter the optimal strategies and equilibrium value are characterised through the solution of a system of Hamilton-Jacobi-Bellman equations.
What carries the argument
The second-order backward stochastic differential equation for the follower's continuation utility, adjoined as an additional controlled state variable for the leader.
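To make the idea of an adjoined state under a target constraint concrete, here is a minimal toy sketch. It is not the paper's model: the output X is taken to be a driftless unit-volatility process, the target is the hypothetical terminal value X_T squared, and the names `simulate_target`, `y` and `z` are illustrative. The state Y is started at a chosen initial value and driven only by observed increments of X, with sensitivity z = 2X, so that Y_T lands (up to discretisation error) on the target.

```python
import numpy as np

def simulate_target(n_steps=20_000, horizon=1.0, seed=1):
    """Toy target problem: steer Y to X_T**2 using only increments of X."""
    rng = np.random.default_rng(seed)
    dt = horizon / n_steps
    # Observed increments of the output process (assumed driftless, unit volatility).
    dx = rng.normal(0.0, np.sqrt(dt), n_steps)
    x = np.concatenate(([0.0], np.cumsum(dx)))
    y = np.empty(n_steps + 1)
    # Initial value chosen so the target can be hit: E[X_T**2] = horizon.
    y[0] = horizon
    for k in range(n_steps):
        z = 2.0 * x[k]              # sensitivity, a function of the observed path only
        y[k + 1] = y[k] + z * dx[k]  # dY = Z dX, no direct use of the noise
    return x, y

x, y = simulate_target()
target_gap = abs(y[-1] - x[-1] ** 2)
print(target_gap)
```

In this toy X coincides with a Brownian motion, so it does not separate the output from the noise; it only shows a state variable built pathwise from X that meets a terminal constraint.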
If this is right
- The bi-level Stackelberg problem becomes a single-level stochastic control problem with terminal constraints.
- Equilibrium strategies and value functions are recovered from solutions to a system of Hamilton-Jacobi-Bellman equations.
- The method applies to games where both players control drift and volatility of the same output process.
- Numerical and theoretical comparisons with open-loop or other information structures become feasible via the resulting HJB system.
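As a rough schematic of the first bullet, the passage from a bi-level to a single-level problem might be written as follows. All notation is assumed for illustration and is not taken from the paper: X is the output process, alpha and beta are the leader's and follower's controls, Y is the follower's continuation utility, (Z, Gamma) are the sensitivity processes of the 2BSDE, H^F is the follower's Hamiltonian, g^F the follower's terminal utility, and J^L, J^F the two objectives. A tie-breaking rule for the follower's best response is also assumed.

```latex
% Schematic only; every symbol here is assumed notation, not the paper's.
\begin{aligned}
&\text{Bi-level problem:}\quad
  V^{L} = \sup_{\alpha}\; J^{L}\bigl(\alpha, \beta^{\star}(\alpha)\bigr),
  \qquad \beta^{\star}(\alpha) \in \arg\max_{\beta} J^{F}(\alpha,\beta), \\
&\text{Adjoined state:}\quad
  Y_t = Y_0 - \int_0^t H^{F}(s, X, Y_s, Z_s, \Gamma_s, \alpha_s)\,\mathrm{d}s
        + \int_0^t Z_s \cdot \mathrm{d}X_s
        + \tfrac12 \int_0^t \operatorname{Tr}\bigl[\Gamma_s\,\mathrm{d}\langle X\rangle_s\bigr], \\
&\text{Single-level problem:}\quad
  V^{L} = \sup_{Y_0}\;\sup_{(\alpha, Z, \Gamma)} J^{L}
  \quad\text{subject to}\quad Y_T = g^{F}(X_T).
\end{aligned}
```

The point of the sketch is that the inner maximisation disappears: the follower's best response is encoded in the Hamiltonian, and the leader optimises over the initial value and the sensitivities of Y subject to the terminal constraint.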
Where Pith is reading between the lines
- The reformulation may allow existing numerical solvers for target-constrained control to be applied directly to Stackelberg settings.
- It suggests a route to closed-loop solutions in other hierarchical stochastic games where the follower's value process can be written as a BSDE.
- If the HJB system admits unique solutions, the method would guarantee existence of closed-loop equilibria without requiring the leader to observe the noise.
Load-bearing premise
The follower's continuation-utility BSDE can be adjoined as a controlled state for the leader while preserving decisions based only on output history and without introducing explicit dependence on the unobservable driving noise.
What would settle it
An explicit closed-loop equilibrium strategy obtained from the HJB system that cannot be implemented using only the observed output path and requires direct knowledge of the driving noise.
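One way to picture this test is a replay check: simulate the output with some candidate strategies, then try to recompute the controls from the recorded output path alone. The sketch below is a simplified illustration under assumptions not made in the paper: only the drift is controlled, the volatility `sigma` is a known constant, and the feedback rules `leader_control` and `follower_control` are hypothetical.

```python
import numpy as np

def leader_control(t, x_hist):
    # Hypothetical closed-loop rule: uses the running mean of the observed path.
    return 0.5 - 0.2 * np.mean(x_hist)

def follower_control(t, x_hist):
    # Hypothetical closed-loop rule: uses the current observed value.
    return 0.1 * x_hist[-1]

def simulate(n_steps=200, horizon=1.0, sigma=0.3, seed=0):
    """Euler scheme for dX = (a + b) dt + sigma dW with path-dependent controls."""
    rng = np.random.default_rng(seed)
    dt = horizon / n_steps
    dw = rng.normal(0.0, np.sqrt(dt), n_steps)
    x = np.zeros(n_steps + 1)
    a = np.zeros(n_steps)
    b = np.zeros(n_steps)
    for k in range(n_steps):
        hist = x[: k + 1]  # information available at step k: the output history
        a[k] = leader_control(k * dt, hist)
        b[k] = follower_control(k * dt, hist)
        x[k + 1] = x[k] + (a[k] + b[k]) * dt + sigma * dw[k]
    return x, a, b

def replay_controls(x, horizon=1.0):
    """Recompute both controls from the recorded output path, without the noise."""
    n_steps = len(x) - 1
    dt = horizon / n_steps
    a = np.array([leader_control(k * dt, x[: k + 1]) for k in range(n_steps)])
    b = np.array([follower_control(k * dt, x[: k + 1]) for k in range(n_steps)])
    return a, b

x, a, b = simulate()
a_replayed, b_replayed = replay_controls(x)
implementable = np.allclose(a, a_replayed) and np.allclose(b, b_replayed)
print(implementable)
```

A strategy that needed the noise increments as an input could not be replayed this way, which is the failure mode described above. In this toy the noise could be backed out from X because `sigma` is known; the check only confirms that the controls never take it as an argument.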
Original abstract
We provide a general approach to reformulating any continuous-time stochastic Stackelberg differential game under closed-loop strategies as a single-level optimisation problem with target constraints. More precisely, we consider a Stackelberg game in which the leader and the follower can both control the drift and the volatility of a stochastic output process, in order to maximise their respective expected utility. The aim is to characterise the Stackelberg equilibrium when the players adopt 'closed-loop strategies', i.e. their decisions are based solely on the historical information of the output process, excluding especially any direct dependence on the underlying driving noise, often unobservable in real-world applications. We first show that, by considering the second-order backward stochastic differential equation associated with the continuation utility of the follower as a controlled state variable for the leader, the latter's unconventional optimisation problem can be reformulated as a more standard stochastic control problem with target constraints. Thereafter, adapting the methodology developed by Soner and Touzi (2002a) or Bouchard, Elie and Imbert (2010), the optimal strategies, as well as the corresponding value of the Stackelberg equilibrium, can be characterised through the solution of a well-specified system of Hamilton-Jacobi-Bellman equations. For a more comprehensive insight, we illustrate our approach through a simple example, facilitating both theoretical and numerical detailed comparisons with the solutions under different information structures studied in the literature.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims to reformulate any continuous-time stochastic Stackelberg differential game under closed-loop strategies (decisions based only on the history of the output process X) as a single-level stochastic control problem with target constraints. This is achieved by adjoining the second-order BSDE for the follower's continuation utility as an additional controlled state for the leader; the resulting problem is then solved via HJB equations adapting Soner-Touzi (2002) and Bouchard et al. (2010), with an illustrative example for comparison across information structures.
Significance. If the adjoining step is rigorously justified, the approach would extend standard stochastic control methods (with target constraints) to closed-loop Stackelberg games where the driving noise is unobservable, providing a general framework beyond open-loop or full-information cases and enabling numerical comparisons in the example.
major comments (2)
- [Reformulation of the leader's problem] The central reformulation (abstract and the step adjoining the follower's 2BSDE) treats the 2BSDE solution as a controlled state while claiming to preserve the closed-loop filtration generated by X alone. No verification is supplied that this state process remains adapted to the observable sigma-field of X (without explicit dependence on the unobservable W), which is load-bearing for the single-level problem to be a valid closed-loop Stackelberg equilibrium; the cited external methods do not automatically guarantee this under the paper's information structure.
- [HJB characterization step] The subsequent HJB characterization (abstract) is asserted to yield the optimal strategies and equilibrium value, but the manuscript supplies no proof sketch, verification theorem, or error analysis for the reformulated control problem with the adjoined state; this leaves the soundness of the equilibrium characterization unassessed beyond the high-level claim.
minor comments (2)
- The abstract refers to 'a well-specified system of Hamilton-Jacobi-Bellman equations' without indicating its dimension or the form of the target constraints in the general case; a brief outline would improve readability.
- Notation for the output process, controls, and filtrations could be introduced more explicitly at the start of the main text to aid comparison with the cited literature on different information structures.
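On the first minor comment, a hedged guess at the shape of the object may help. Assume, purely for illustration, a d-dimensional output with dynamics dX = mu^u dt + sigma^u dW and a scalar adjoined state with dynamics dY = b^u dt + z . sigma^u dW, where u = (alpha, z, gamma) collects the leader's controls and b^u = -H^F + z . mu^u + (1/2) Tr[gamma sigma^u (sigma^u)^T]. Assume also that the leader has a terminal reward g^L only. None of this notation is taken from the paper. Itô's formula then gives a generator on the (d+1)-dimensional state (x, y), and the interior equation for the leader's value V would take the form:

```latex
% Illustrative only: \mu^u, \sigma^u, b^u, g^L and g^F are assumed symbols.
\begin{aligned}
\mathcal{L}^{u}\varphi
  &= \mu^{u}\cdot\partial_x\varphi
   + b^{u}\,\partial_y\varphi
   + \tfrac12\operatorname{Tr}\bigl[\sigma^{u}(\sigma^{u})^{\top}\partial_{xx}\varphi\bigr]
   + \bigl(\sigma^{u}(\sigma^{u})^{\top}z\bigr)\cdot\partial_{xy}\varphi
   + \tfrac12\bigl|(\sigma^{u})^{\top}z\bigr|^{2}\,\partial_{yy}\varphi, \\
0 &= -\partial_t V(t,x,y) - \sup_{u}\,\mathcal{L}^{u}V(t,x,y),
\qquad V(T,x,y) = g^{L}(x)\ \text{ on } \{y = g^{F}(x)\}.
\end{aligned}
```

The "system" mentioned in the abstract presumably also includes equations for the boundary of the set of points (t, x, y) from which the target can be reached, as is usual in the stochastic target literature; those are omitted from this sketch.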
Simulated Author's Rebuttal
We thank the referee for their careful reading of the manuscript and for highlighting these important points regarding the rigor of the reformulation. We address the major comments point by point below.
-
Referee: [Reformulation of the leader's problem] The central reformulation (abstract and the step adjoining the follower's 2BSDE) treats the 2BSDE solution as a controlled state while claiming to preserve the closed-loop filtration generated by X alone. No verification is supplied that this state process remains adapted to the observable sigma-field of X (without explicit dependence on the unobservable W), which is load-bearing for the single-level problem to be a valid closed-loop Stackelberg equilibrium; the cited external methods do not automatically guarantee this under the paper's information structure.
Authors: We agree that an explicit verification of the adaptation to the filtration generated by X is necessary to confirm that the reformulated problem indeed corresponds to a closed-loop equilibrium. In the manuscript, the 2BSDE is constructed with coefficients that depend on the closed-loop controls, which are functions of the history of X, ensuring by construction that its solution is adapted to the observable filtration. However, we acknowledge that this is not stated explicitly as a separate result. In the revision, we will add a short lemma verifying that the solution to the follower's 2BSDE remains adapted to the sigma-field generated by X, without direct dependence on the unobservable Brownian motion W. This will strengthen the justification for adjoining it as a controlled state. revision: yes
-
Referee: [HJB characterization step] The subsequent HJB characterization (abstract) is asserted to yield the optimal strategies and equilibrium value, but the manuscript supplies no proof sketch, verification theorem, or error analysis for the reformulated control problem with the adjoined state; this leaves the soundness of the equilibrium characterization unassessed beyond the high-level claim.
Authors: The HJB system is obtained by direct application of the methodology in Soner and Touzi (2002) and Bouchard et al. (2010) to the reformulated stochastic control problem with target constraints and the additional controlled state from the 2BSDE. While the manuscript relies on these references for the characterization, we concur that including a brief outline of how the verification theorem applies in this augmented setting would be beneficial. We will add a short proof sketch in the revised version, detailing the key steps of the verification argument and noting any modifications required due to the presence of the adjoined state and the target constraints. revision: yes
Circularity Check
No circularity; reformulation adapts external Soner-Touzi and Bouchard methodologies without self-reduction
The paper's core step adjoins the follower's 2BSDE as a controlled state and then applies the cited external frameworks of Soner-Touzi (2002) and Bouchard-Elie-Imbert (2010) to obtain an HJB system. These citations are independent, externally published, and not self-citations by the present authors. No equation or claim reduces by construction to a fitted parameter or prior result defined inside this manuscript; the closed-loop information structure is preserved by the problem setup rather than by redefinition. The derivation therefore remains non-circular.
Forward citations
Cited by 3 Pith papers
-
Principal-agent problems with adverse selection: A stochastic target problem formulation
Agent's optimization in unique-contract principal-agent problem with adverse selection is recast as stochastic target problem, enabling principal's objective as stochastic optimal control with partial information and ...
-
Principal-agent problems with adverse selection: A stochastic target problem formulation
Principal-agent adverse selection with unique contracts is reformulated as a stochastic target problem for the agent and a stochastic optimal control problem for the principal.
-
Optimal control of Volterra integral diffusions and application to contract theory
The value function for optimal control of non-convolution Volterra integral diffusions is characterized as the unique viscosity solution to a parabolic PDE on Sobolev space, with applications to time-inconsistent cont...
Reference graph
Works this paper leans on
-
[1]
R. Aïd, M. Basei, and H. Pham. A McKean–Vlasov approach to distributed electricity generation development. Mathematical Methods of Operations Research, 91:269–310, 2020
-
[2]
A. Bagchi. Stackelberg differential games in economic models, volume 64 of Lecture notes in control and information sciences. Springer Berlin, Heidelberg, 1984
-
[3]
T. Başar. Stochastic stagewise Stackelberg strategies for linear quadratic systems. In M. Kohlmann and W. Vogel, editors, Stochastic control theory and stochastic differential systems. Proceedings of a workshop of the „Sonderforschungsbereich 72 der Deutschen Forschungsgemeinschaft an der Universität Bonn” which took place in January 1979 at Bad Honnef, v...
-
[4]
T. Başar. A new method for the Stackelberg solution of differential games with sampled-data state information. IFAC Proceedings Volumes, 14(2):1365–1370, 1981
-
[5]
T. Başar and A. Haurie. Feedback equilibria in differential games with structural and modal uncertainties. In J.B. Cruz Jr., editor, Advances in large scale systems, volume 1, pages 163–201. 1984
-
[6]
T. Başar and G.J. Olsder. Team-optimal closed-loop Stackelberg strategies in hierarchical control problems. Automatica, 16(4):409–414, 1980
-
[7]
T. Başar and G.J. Olsder. Dynamic noncooperative game theory. SIAM, 2nd revised edition, 1999
-
[8]
T. Başar and H. Selbuz. A new approach for derivation of closed-loop Stackelberg strategies. In R.E. Larson and A.S. Willsky, editors, 1978 IEEE conference on decision and control including the 17th symposium on adaptive processes, pages 1113–1118, 1978
-
[9]
T. Başar and H. Selbuz. Closed-loop Stackelberg strategies with applications in the optimal control of multilevel systems. IEEE Transactions on Automatic Control, 24(2):166–179, 1979
-
[10]
A. Bensoussan, S. Chen, and S.P. Sethi. Feedback Stackelberg solutions of infinite-horizon stochastic differential games. In F. El Ouardighi and K. Kogan, editors, Models and methods in economics and management science: essays in honor of Charles S. Tapiero, volume 198 of International series in operations research & management science, pages 3–15. Springer ...
-
[11]
A. Bensoussan, S. Chen, and S.P. Sethi. The maximum principle for global solutions of stochastic Stackelberg differential games. SIAM Journal on Control and Optimization, 53(4):1956–1981, 2015
-
[12]
A. Bensoussan, S. Chen, A. Chutani, S.P. Sethi, C.C. Siu, and S.C.P. Yam. Feedback Stackelberg–Nash equilibria in mixed leadership games with an application to cooperative advertising. SIAM Journal on Control and Optimization, 57(5):3413–3444, 2019
-
[13]
B. Bouchard, R. Élie, and N. Touzi. Stochastic target problems with controlled loss. SIAM Journal on Control and Optimization, 48(5):3123–3150, 2009
-
[14]
B. Bouchard, R. Élie, and C. Imbert. Optimal control under stochastic target constraints. SIAM Journal on Control and Optimization, 48(5):3501–3531, 2010
-
[15]
A. Bressan. Noncooperative differential games. Milan Journal of Mathematics, 79:357–427, 2011
-
[16]
R. Carmona. Lectures on BSDEs, stochastic control, and stochastic differential games with financial applications, volume 1 of Financial mathematics. SIAM, 2016
- [17]
-
[18]
C.I. Chen and J.B. Cruz Jr. Stackelberg solution for two-person games with biased information patterns. IEEE Transactions on Automatic Control, 17(6):791–798, 1972
-
[19]
A. Chutani and S.P. Sethi. A feedback Stackelberg game of cooperative advertising in a durable goods oligopoly. In J. Haunschmied, V.M. Veliov, and S. Wrzaczek, editors, Dynamic games in economics, volume 16 of Dynamic modeling and econometrics in economics and finance, pages 89–114. Springer, 2014
-
[20]
W. Cong and J. Shi. Direct approach of linear–quadratic Stackelberg mean field games of backward–forward stochastic systems. ArXiv preprint arXiv:2401.15835, 2024
-
[21]
M.G. Crandall, H. Ishii, and P.-L. Lions. User’s guide to viscosity solutions of second order partial differential equations. Bulletin of the American Mathematical Society, 27(1):1–67, 1992
-
[22]
J.B. Cruz Jr. Survey of Nash and Stackelberg equilibrium strategies in dynamic games. In Annals of economic and social measurement, volume 4, pages 339–344. National Bureau of Economic Research, 1975
-
[23]
J.B. Cruz Jr. Stackelberg strategies for multilevel systems. In Y.C. Ho and S.K. Mitter, editors, Directions in large-scale systems, pages 139–147. Springer New York, NY, 1976
-
[24]
J. Cvitanić and J. Zhang. Contract theory in continuous-time models. Springer, 2012
-
[25]
J. Cvitanić, D. Possamaï, and N. Touzi. Dynamic programming approach to principal–agent problems. Finance and Stochastics, 22(1):1–37, 2018
-
[26]
G. Dayanıklı and M. Laurière. A machine learning method for Stackelberg mean field games. ArXiv preprint arXiv:2302.10440, 2023
-
[27]
E.J. Dockner, S. Jorgensen, N. Van Long, and G. Sorger. Differential games in economics and management science. Cambridge University Press, 2000
-
[28]
N. El Karoui and X. Tan. Capacities, measurable selection and dynamic programming part II: application in stochastic control problems. Technical report, École Polytechnique and université Paris-Dauphine, 2013
-
[29]
X. Feng, Y. Hu, and J. Huang. Backward Stackelberg differential game with constraints: a mixed terminal-perturbation and linear–quadratic approach. SIAM Journal on Control and Optimization, 60(3):1488–1518, 2022
- [30]
-
[31]
B. Gardner and J.B. Cruz Jr. Feedback Stackelberg strategy for a two player game. IEEE Transactions on Automatic Control, 22(2):270–271, 1977
- [32]
-
[33]
G. Guan, Z. Liang, and Y. Song. A Stackelberg reinsurance–investment game under α-maxmin mean–variance criterion and stochastic volatility. Scandinavian Actuarial Journal, 2024(1):28–63, 2024
-
[34]
X. Han, D. Landriault, and D. Li. Optimal reinsurance contract in a Stackelberg game framework: a view of social planner. Scandinavian Actuarial Journal, 2024(2):124–148, 2024
-
[35]
Y. Havrylenko, M. Hinken, and R. Zagst. Risk sharing in equity-linked insurance products: Stackelberg equilibrium between an insurer and a reinsurer. ArXiv preprint arXiv:2203.04053, 2022
-
[36]
X. He, A. Prasad, S.P. Sethi, and G.J. Gutierrez. A survey of Stackelberg differential game models in supply and marketing channels. Journal of Systems Science and Systems Engineering, 16:385–413, 2007
-
[37]
X. He, A. Prasad, and S.P. Sethi. Cooperative advertising and pricing in a dynamic stochastic supply chain: feedback Stackelberg strategies. In D.F. Kocaoglu, T.R. Anderson, and T.U. Daim, editors, PICMET ’08, Portland international conference on management of engineering & technology, pages 1634–1649, 2008
-
[38]
Q. Huang and J. Shi. A verification theorem for Stackelberg stochastic differential games in feedback information pattern. ArXiv preprint arXiv:2108.06498, 2021
-
[39]
K. Kang and J. Shi. A three-level stochastic linear–quadratic Stackelberg differential game with asymmetric information. ArXiv preprint arXiv:2210.11808, 2022
-
[40]
R.L. Karandikar. On pathwise stochastic integration. Stochastic Processes and their Applications, 57(1):11–18, 1995
- [41]
- [42]
- [43]
-
[44]
T. Li and S.P. Sethi. A review of dynamic Stackelberg game models. Discrete & Continuous Dynamical Systems–B, 22(1):125–129, 2017
- [45]
- [46]
- [47]
-
[48]
J. Liu, Y. Fan, Z. Chen, and Y. Zheng. Pessimistic bilevel optimization: a survey. International Journal of Computational Intelligence Systems, 11(1):725–736, 2018
-
[49]
S. Lv, J. Xiong, and X. Zhang. Linear quadratic leader–follower stochastic differential games for mean-field switching diffusions. Automatica, 154(111072):1–9, 2023
-
[50]
L. Mallozzi and J. Morgan. Weak Stackelberg problem and mixed solutions under data perturbations. Optimization, 32(3):269–290, 1995
-
[51]
J. Moon. Linear–quadratic stochastic Stackelberg differential games for jump–diffusion systems. SIAM Journal on Control and Optimization, 59(2):954–976, 2021
-
[52]
Y.-H. Ni, L. Liu, and X. Zhang. Deterministic dynamic Stackelberg games: time-consistent open-loop solution. Automatica, 148(110757):1–9, 2023
-
[53]
M. Nutz. Pathwise construction of stochastic integrals. Electronic Communications in Probability, 17(24):1–7, 2012
-
[54]
B. Øksendal, L. Sandal, and J. Ubøe. Stochastic Stackelberg equilibria with applications to time-dependent newsvendor models. Journal of Economic Dynamics and Control, 37(7):1284–1299, 2013
-
[55]
G.P. Papavassilopoulos. Leader–follower and Nash strategies with state information. PhD thesis, University of Illinois at Urbana-Champaign, 1979
-
[56]
G.P. Papavassilopoulos and J.B. Cruz Jr. Nonclassical control problems and Stackelberg games. IEEE Transactions on Automatic Control, 24(2):155–166, 1979
-
[57]
G.P. Papavassilopoulos and J.B. Cruz Jr. Sufficient conditions for Stackelberg and Nash strategies with memory. Journal of Optimization Theory and Applications, 31(2):233–260, 1980
-
[58]
D. Possamaï, X. Tan, and C. Zhou. Stochastic control for a class of nonlinear kernels and applications. The Annals of Probability, 46(1):551–603, 2018
-
[59]
D. Possamaï, N. Touzi, and J. Zhang. Zero-sum path-dependent stochastic differential games in weak formulation. The Annals of Applied Probability, 30(3):1415–1457, 2020
-
[60]
Z. Ren, X. Tan, N. Touzi, and J. Yang. Entropic optimal planning for path-dependent mean field games. SIAM Journal on Control and Optimization, 61(3):1415–1437, 2023
- [61]
-
[62]
J. Shi, G. Wang, and J. Xiong. Leader–follower stochastic differential game with asymmetric information and applications. Automatica, 63:60–73, 2016
- [63]
-
[64]
M. Simaan and J.B. Cruz Jr. Additional aspects of the Stackelberg strategy in nonzero-sum games. Journal of Optimization Theory and Applications, 11(6):613–626, 1973
-
[65]
M. Simaan and J.B. Cruz Jr. On the Stackelberg strategy in nonzero-sum games. Journal of Optimization Theory and Applications, 11(5):533–555, 1973
-
[66]
M. Simaan and J.B. Cruz Jr. On the Stackelberg strategy in nonzero-sum games. In G. Leitmann, editor, Multicriteria decision making and differential games, Mathematical concepts and methods in science and engineering, pages 173–195. Springer New York, NY, 1976
-
[67]
H.M. Soner and N. Touzi. Dynamic programming for stochastic target problems and geometric flows. Journal of the European Mathematical Society, 4(3):201–236, 2002
-
[68]
H.M. Soner and N. Touzi. Stochastic target problems, dynamic programming, and viscosity solutions. SIAM Journal on Control and Optimization, 41(2):404–424, 2002
-
[69]
H.M. Soner and N. Touzi. A stochastic representation for mean curvature type geometric flows. The Annals of Probability, 31(3):1145–1165, 2003
- [70]
-
[71]
D.W. Stroock and S.R.S. Varadhan. Multidimensional diffusion processes, volume 233 of Grundlehren der mathematischen Wissenschaften. Springer-Verlag Berlin Heidelberg, 1997
-
[72]
J. Sun, H. Wang, and J. Wen. Zero-sum Stackelberg stochastic linear–quadratic differential games. SIAM Journal on Control and Optimization, 61(1):252–284, 2023
- [73]
- [74]
-
[75]
D. Vasal. Sequential decomposition of stochastic Stackelberg games. In B. Ferri and F. Zhang, editors, 2022 American control conference, pages 1266–1271. IEEE, 2022
-
[76]
H. von Stackelberg. Marktform und Gleichgewicht. Springer-Verlag Wien New York, 1934
-
[77]
G. Wang, Y. Wang, and S. Zhang. An asymmetric information mean-field type linear–quadratic stochastic Stackelberg differential game with one leader and two followers. Optimal Control Applications and Methods, 41(4):1034–1051, 2020
-
[78]
W. Wiesemann, A. Tsoukalas, P.-M. Kleniati, and B. Rustem. Pessimistic bilevel optimization. SIAM Journal on Optimization, 23(1):353–380, 2013
-
[79]
Z. Wu. A general maximum principle for optimal control of forward–backward stochastic systems. Automatica, 49(5):1473–1480, 2013
-
[80]
J. Yong. A leader–follower stochastic linear quadratic differential game. SIAM Journal on Control and Optimization, 41(4):1015–1041, 2002