When the Correct Model Fails: The Optimality of Stackelberg Equilibria with Follower Intention Updates
Pith reviewed 2026-05-17 23:30 UTC · model grok-4.3
The pith
Assuming an incorrect follower best response can yield lower leader costs in dynamic Stackelberg games with intention updates.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We prove that, in general, assuming an incorrect best response for the follower may lead to a lower leader cost over the entire game than knowing the follower's true best response. This holds when the leader receives updated beliefs about the follower's best response before the end of the game, such that the update prompts the leader, and subsequently the follower, to re-optimize their strategies. We characterize the optimality guarantees for open-loop and feedback information structures and support the results with examples in linear-quadratic Stackelberg games.
What carries the argument
Stackelberg equilibrium under belief updates about the follower's best-response function; it allows re-optimization after the leader receives new information on the follower's intentions, enabling comparison of costs between correct and incorrect assumptions.
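The mechanism can be made concrete with a small simulation. The sketch below is a minimal, hypothetical instance and not the paper's code. It assumes a scalar system x_{t+1} = a x_t + b_L u^L_t + b_F u^F_t with the LQ costs of Appendix A, eq. (49), reduced to scalars, and it models the follower's "intention" (our assumption) as its state weight q_F. The leader commits to an open-loop plan computed with a believed weight `qF_belief`, the follower best-responds with its true weight, and at `t_update` the leader learns the true weight and both players re-solve the open-loop Stackelberg problem from the current state. All names (`Game`, `follower_br`, `leader_plan`, `realized_leader_cost`) are illustrative. Because the follower's best response is affine in the leader's plan, the sketch recovers that map by probing it with unit plans instead of deriving it symbolically.

```python
from dataclasses import dataclass


@dataclass
class Game:
    """Hypothetical scalar game: x_{t+1} = a*x_t + bL*uL_t + bF*uF_t."""
    a: float
    bL: float
    bF: float
    qL: float  # leader weights: scalar Q^L and R^L from (49)
    rL: float
    qF: float  # follower's TRUE state weight, standing in for its "intention"
    rF: float  # follower input weight: scalar R^F


def transpose(A):
    return [list(col) for col in zip(*A)]


def solve(A, b):
    """Solve A y = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    y = [0.0] * n
    for i in reversed(range(n)):
        y[i] = (M[i][n] - sum(M[i][k] * y[k] for k in range(i + 1, n))) / M[i][i]
    return y


def quad_solve(H, free, q, r):
    """argmin_u q*||free + H u||^2 + r*||u||^2, via (q H^T H + r I) u = -q H^T free."""
    rows, n = len(H), len(H[0])
    Ht = transpose(H)
    A = [[q * sum(Ht[i][k] * H[k][j] for k in range(rows)) + (r if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve(A, [-q * sum(Ht[i][k] * free[k] for k in range(rows)) for i in range(n)])


def stacked(g, N):
    """Future states [x_{s+1}, ..., x_{s+N}] = G*x_s + HL uL + HF uF."""
    G = [g.a ** (k + 1) for k in range(N)]
    HL = [[g.a ** (k - j) * g.bL if j <= k else 0.0 for j in range(N)] for k in range(N)]
    HF = [[g.a ** (k - j) * g.bF if j <= k else 0.0 for j in range(N)] for k in range(N)]
    return G, HL, HF


def follower_br(g, x, uL, qF):
    """Follower's open-loop best response to the committed leader plan uL."""
    N = len(uL)
    G, HL, HF = stacked(g, N)
    free = [G[k] * x + sum(HL[k][j] * uL[j] for j in range(N)) for k in range(N)]
    return quad_solve(HF, free, qF, g.rF)


def leader_plan(g, x, N, qF_belief):
    """Open-loop Stackelberg plan under the BELIEVED follower weight qF_belief."""
    G, HL, HF = stacked(g, N)
    m = follower_br(g, x, [0.0] * N, qF_belief)  # BR is affine: uF = M uL + m
    cols = []  # cols[j] is column j of M, recovered by probing with a unit plan
    for j in range(N):
        e = [1.0 if i == j else 0.0 for i in range(N)]
        cols.append([v - mv for v, mv in zip(follower_br(g, x, e, qF_belief), m)])
    # States the leader expects: K uL + c, with K = HL + HF M and c = G x_s + HF m.
    K = [[HL[k][j] + sum(HF[k][i] * cols[j][i] for i in range(N)) for j in range(N)]
         for k in range(N)]
    c = [G[k] * x + sum(HF[k][i] * m[i] for i in range(N)) for k in range(N)]
    return quad_solve(K, c, g.qL, g.rL)


def realized_leader_cost(g, x0, T, qF_belief, t_update):
    """Leader plans with qF_belief, follower responds with true g.qF; at t_update < T
    both re-optimise from the current state (t_update >= T: no update). Leader J_{0:T}."""
    uL = leader_plan(g, x0, T, qF_belief)
    uF = follower_br(g, x0, uL, g.qF)
    x, xs, us = x0, [x0], []
    for t in range(T):
        if t == t_update:  # belief update: leader re-plans, follower re-responds
            uL = [0.0] * t + leader_plan(g, x, T - t, g.qF)
            uF = [0.0] * t + follower_br(g, x, uL[t:], g.qF)
        x = g.a * x + g.bL * uL[t] + g.bF * uF[t]
        xs.append(x)
        us.append(uL[t])
    return g.qL * sum(v * v for v in xs) + g.rL * sum(u * u for u in us)


if __name__ == "__main__":
    g = Game(a=1.0, bL=1.0, bF=0.5, qL=1.0, rL=0.5, qF=2.0, rF=1.0)
    base = realized_leader_cost(g, 1.0, 6, g.qF, t_update=6)  # true BR, no update
    for belief, t_u in [(0.5, 1), (0.5, 3), (4.0, 1), (4.0, 3)]:
        diff = realized_leader_cost(g, 1.0, 6, belief, t_u) - base
        print(f"belief={belief}, t_update={t_u}: cost - baseline = {diff:+.4f}")
```

In this sketch, without an update (`t_update >= T`) an incorrect belief can never beat the true one: planning with the true weight minimizes the leader's cost over all open-loop plans, given the follower's true response. Any advantage of a wrong belief therefore has to come from the re-optimization at the update, which is consistent with the load-bearing premise. Open-loop Stackelberg plans are generally not time-consistent, so re-planning changes the continuation. A negative printed difference would be one instance of the claimed effect; the sketch makes no claim about how often that happens.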
If this is right
- In open-loop information structures, the Stackelberg equilibrium reached after an update from an incorrect best-response assumption can achieve a lower leader cost than the equilibrium computed with the true best response.
- The same cost advantage for incorrect assumptions appears under feedback information structures.
- Numerical examples in linear-quadratic Stackelberg games demonstrate concrete cases of lower leader costs with incorrect best-response assumptions.
- Monte Carlo simulations show that instances where an incorrect best response improves leader cost are non-trivial in collision-avoidance linear-quadratic games.
Where Pith is reading between the lines
- Leaders facing uncertain follower intentions might sometimes prefer to retain an approximate model rather than pursue full accuracy.
- The result connects to other dynamic decision problems where mid-game information updates allow strategies to adapt in ways that reward initial model mismatch.
Load-bearing premise
The leader receives updated beliefs about the follower's best response before the end of the game, such that the update prompts the leader, and subsequently the follower, to re-optimize their strategies.
What would settle it
A linear-quadratic Stackelberg game instance where, for every possible mid-game update of the follower's best-response belief, the leader's cumulative cost is always minimized by using the true best response rather than any incorrect one.
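Written out, with notation introduced here only for illustration (an assumption, not the paper's symbols): let β* be the follower's true best response, β̂ the leader's initial belief, t_u < T the update time, J^L_{0:T}(β̂, t_u) the leader's total cost when it plans with β̂ and switches to β* at t_u, and J^L_{0:T}(β*) its cost when it plans with β* from the start. The claim and the settling instance described above then read:

```latex
% Claim: some incorrect belief plus an update beats the true model
\exists\, \hat\beta \neq \beta^\star,\ t_u < T:\quad
  J^L_{0:T}(\hat\beta, t_u) \;<\; J^L_{0:T}(\beta^\star)

% Settling instance: no belief and update time ever beats the true model
\forall\, \hat\beta,\ \forall\, t_u < T:\quad
  J^L_{0:T}(\hat\beta, t_u) \;\ge\; J^L_{0:T}(\beta^\star)
```

For a given game, the second statement is the negation of the first.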
Original abstract
We study a two-player dynamic Stackelberg game where the follower's intention is unknown to the leader. Classical formulations of the Stackelberg equilibrium (SE) assume that the follower's best response (BR) function is known to the leader. However, this is not always true in practice. We study a setting in which the leader receives updated beliefs about the follower BR before the end of the game, such that the update prompts the leader and subsequently the follower to re-optimize their strategies. We characterize the optimality guarantees of the SE solutions under this belief update for both open loop and feedback information structures. Interestingly, we prove that in general, assuming an incorrect follower's BR may lead to a lower leader cost over the entire game than knowing the true follower's BR. We support these results with numerical examples in a linear quadratic (LQ) Stackelberg game, and use Monte Carlo simulations to show that the instances of incorrect BR achieving lower leader costs are non-trivial in collision avoidance LQ Stackelberg games.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript studies a two-player dynamic Stackelberg game in which the leader initially holds an incorrect belief about the follower's best-response function. The model incorporates a mid-game belief update received by the leader that triggers re-optimization by both players. The authors characterize the optimality properties of the resulting Stackelberg equilibria under both open-loop and feedback information structures. The central claim is that, in general, the leader's cumulative cost can be strictly lower when starting from an incorrect BR assumption than when starting from the true BR. The theoretical results are illustrated with linear-quadratic examples and supported by Monte Carlo simulations in a collision-avoidance setting.
Significance. If the central characterization holds, the result is noteworthy because it identifies a counter-intuitive regime in which imperfect information about the follower can be advantageous for the leader over the full horizon. The paper earns credit for supplying a general, parameter-free theoretical characterization for both information structures and for using Monte Carlo runs to demonstrate that the cost-improving instances are non-trivial rather than measure-zero artifacts. This contributes to the literature on dynamic games with incomplete information and has potential implications for robust controller design in multi-agent systems.
major comments (2)
- [§3.1] Open-loop characterization: The proof that an incorrect initial BR yields a lower leader cost relies on the specific timing and form of the belief update that forces re-optimization. The manuscript should state explicitly whether the inequality continues to hold for arbitrary (including stochastic) update times or requires the update to occur before a fixed fraction of the horizon.
- [§4] Feedback structure: The optimality guarantee for the feedback SE under an incorrect BR is derived assuming that the follower observes the leader's updated strategy and re-optimizes. The derivation does not make clear whether the same strict cost reduction holds when the follower has only noisy observations of the leader's action after the update.
minor comments (2)
- [Preliminaries] The definition of the belief-update operator should be placed in the preliminaries with a dedicated symbol rather than being introduced inline in the main theorems.
- [Numerical Examples] In the Monte Carlo section, the histograms would benefit from an additional panel showing the distribution of cost differences rather than only the fraction of improving cases.
Simulated Author's Rebuttal
We thank the referee for the thoughtful review and positive assessment of the manuscript. We address the two major comments below, indicating the revisions we plan to incorporate.
Point-by-point responses
- Referee: [§3.1] Open-loop characterization: The proof that an incorrect initial BR yields a lower leader cost relies on the specific timing and form of the belief update that forces re-optimization. The manuscript should state explicitly whether the inequality continues to hold for arbitrary (including stochastic) update times or requires the update to occur before a fixed fraction of the horizon.
  Authors: We agree that the open-loop characterization in §3.1 is derived for a deterministic belief update occurring at a fixed interior time t_update. The strict leader cost reduction requires a sufficiently long horizon to remain after re-optimization. For arbitrary or stochastic update times, the inequality does not hold in general (e.g., if the update occurs near the terminal time). We will revise §3.1 to state this assumption explicitly and add a short remark on the conditions required for the result to extend to stochastic updates. Revision: yes.
- Referee: [§4] Feedback structure: The optimality guarantee for the feedback SE under an incorrect BR is derived assuming that the follower observes the leader's updated strategy and re-optimizes. The derivation does not make clear whether the same strict cost reduction holds when the follower has only noisy observations of the leader's action after the update.
  Authors: We thank the referee for this clarification request. The feedback equilibrium analysis in §4 assumes that the follower perfectly observes the leader's updated strategy after the belief update, enabling exact re-optimization. With only noisy observations, the follower's best response would be computed from a noisy estimate, which can change the resulting equilibrium and may remove the strict cost advantage. We will revise §4 to state this observability assumption clearly and note the noisy-observation case as an open direction for future work. Revision: yes.
Circularity Check
No significant circularity
The paper's central claim is a general theoretical characterization of Stackelberg equilibrium optimality under follower intention updates, for both open-loop and feedback information structures. It is derived directly from the stated model definitions, including the timing of the belief update and the subsequent re-optimization by leader and follower. The result that an incorrect initial best-response assumption can produce a lower total leader cost is obtained by comparing the leader's cost functionals under correct and incorrect best-response assumptions within each information structure; no step reduces to a fitted parameter, a self-referential definition, or a load-bearing self-citation. The LQ numerical examples and Monte Carlo runs are presented explicitly as illustrations of non-trivial instances rather than as the source of the inequality. The derivation is therefore self-contained and receives a circularity score of 0.
Axiom & Free-Parameter Ledger
axioms (2)
- Domain assumption: Classical Stackelberg equilibrium assumes the leader knows the follower's best-response function.
- Domain assumption: Belief updates occur before the game ends and trigger re-optimization by both players.
APPENDIX A. LQ game
We consider a finite-horizon dynamic LQ game. The cost functions of both players are denoted as

$$
J^i_{0:T} = x_T^\top Q^i x_T + \sum_{t=0}^{T-1} \left( x_t^\top Q^i x_t + (u^i_t)^\top R^i u^i_t \right), \quad i \in \{L, F\}, \tag{49}
$$

where $Q^i \succeq 0$ and $R^i \succ 0$ to guara...
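As a quick check on how the cost (49) is assembled, the hypothetical helper below evaluates J^i_{0:T} for one player from a stored trajectory, with vectors as Python lists and Q^i, R^i as lists of rows. The names `quad` and `lq_cost` and the example numbers are illustrative assumptions. Each player's cost penalizes only its own input u^i_t, while the state term depends on the shared state, and the terminal and running state terms use the same Q^i.

```python
def quad(v, M):
    """Quadratic form v^T M v, with v a list and M a list of rows."""
    n = len(v)
    return sum(v[i] * M[i][j] * v[j] for i in range(n) for j in range(n))


def lq_cost(xs, us, Q, R):
    """J^i_{0:T} as in (49): xs = [x_0, ..., x_T], us = [u^i_0, ..., u^i_{T-1}]."""
    if len(xs) != len(us) + 1:
        raise ValueError("expected T + 1 states and T inputs")
    terminal = quad(xs[-1], Q)  # x_T^T Q^i x_T
    running = sum(quad(x, Q) + quad(u, R) for x, u in zip(xs[:-1], us))
    return terminal + running


# Hypothetical two-step example (T = 2) with a 2-D state and a 1-D input for player i.
xs = [[1.0, 0.0], [0.5, 0.5], [0.0, 0.25]]
us = [[0.2], [0.1]]
print(lq_cost(xs, us, Q=[[1.0, 0.0], [0.0, 2.0]], R=[[2.0]]))
```

Here the terminal term contributes 0.125 and the two running terms contribute 1.08 and 0.77, so the printed value is 1.975 up to floating-point rounding.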