arXiv: 2511.07363 · v4 · submitted 2025-11-10 · eess.SY · cs.GT · cs.SY

When the Correct Model Fails: The Optimality of Stackelberg Equilibria with Follower Intention Updates

Pith reviewed 2026-05-17 23:30 UTC · model grok-4.3

classification: eess.SY, cs.GT, cs.SY
keywords: Stackelberg games, best response, intention updates, dynamic games, linear quadratic games, collision avoidance, information structures, optimality guarantees

The pith

Assuming an incorrect follower best response can yield lower leader costs in dynamic Stackelberg games with intention updates.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

In two-player dynamic Stackelberg games, the leader typically benefits from knowing the follower's true best-response function. This paper examines what happens when the leader instead receives updates on the follower's intentions mid-game and must re-optimize. The authors show that using an incorrect model of the follower can produce a lower total cost for the leader than using the correct model. They provide theoretical characterizations for both open-loop and feedback settings and back the claim with examples from linear-quadratic games, including collision avoidance scenarios. A sympathetic reader might care because this upends the intuition that better information always leads to better decisions in strategic interactions.
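The baseline intuition, that a correct follower model helps the leader when there is no mid-game update, can be sketched with a one-shot toy game. Everything below is an illustrative assumption and not taken from the paper: the follower cost `(u_f - theta * u_l)**2`, the leader cost `(u_l - 1)**2 + u_f**2`, and the names `follower_br`, `leader_action`, and `realized_leader_cost`.

```python
# Toy static Stackelberg game (hypothetical, not the paper's model).
# Follower cost: (u_f - theta * u_l)^2, so its best response is u_f = theta * u_l.
# Leader cost:   (u_l - 1)^2 + u_f^2.

def follower_br(u_l, theta):
    """Follower's best response to the leader action u_l."""
    return theta * u_l

def leader_action(theta_belief):
    """Leader's optimal action if the follower's parameter were theta_belief.

    Minimizes (u_l - 1)^2 + (theta_belief * u_l)^2 in closed form.
    """
    return 1.0 / (1.0 + theta_belief ** 2)

def realized_leader_cost(theta_belief, theta_true):
    """Leader cost when it plans with theta_belief but the follower uses theta_true."""
    u_l = leader_action(theta_belief)
    u_f = follower_br(u_l, theta_true)
    return (u_l - 1.0) ** 2 + u_f ** 2

theta_true = 1.0
for belief in (0.0, 0.5, 1.0, 2.0):
    print(belief, realized_leader_cost(belief, theta_true))
```

In this one-shot setting the correct belief cannot be beaten, because the realized cost is minimized exactly at the action the correct belief selects. The paper's result concerns the dynamic case with a mid-game update, which this toy game does not model.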

Core claim

We prove that in general, assuming an incorrect follower's best response may lead to a lower leader cost over the entire game than knowing the true follower's best response. This holds when the leader receives updated beliefs about the follower best response before the end of the game such that the update prompts the leader and subsequently the follower to re-optimize their strategies. We characterize the optimality guarantees for open loop and feedback information structures and support the results with examples in linear quadratic Stackelberg games.

What carries the argument

Stackelberg equilibrium under belief updates about the follower's best-response function; it allows re-optimization after the leader receives new information on the follower's intentions, enabling comparison of costs between correct and incorrect assumptions.

If this is right

  • In open-loop information structures, the Stackelberg equilibrium with an incorrect best response can achieve lower leader cost than the true best response after an update.
  • The same cost advantage for incorrect assumptions appears under feedback information structures.
  • Numerical examples in linear-quadratic Stackelberg games demonstrate concrete cases of lower leader costs with incorrect best-response assumptions.
  • Monte Carlo simulations show that instances where an incorrect best response improves leader cost are non-trivial in collision-avoidance linear-quadratic games.
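The Monte Carlo statistic mentioned in the last bullet (the share of simulations in which each BR belief attains the lowest leader cost) could be tallied roughly as follows. This is a minimal sketch: the cost matrix is made-up placeholder data, and `percent_lowest_cost` is a hypothetical helper, not code from the paper.

```python
import numpy as np

def percent_lowest_cost(costs):
    """Percent of simulations won by each belief.

    costs[k, j] is the leader's total cost in simulation k under BR belief j.
    Ties go to the lowest belief index (np.argmin convention).
    """
    costs = np.asarray(costs, dtype=float)
    winners = np.argmin(costs, axis=1)
    counts = np.bincount(winners, minlength=costs.shape[1])
    return 100.0 * counts / costs.shape[0]

# Placeholder costs for 4 simulations and 3 candidate beliefs (illustrative only).
costs = np.array([[3.0, 2.5, 4.0],
                  [1.0, 1.2, 1.1],
                  [2.0, 2.2, 1.9],
                  [5.0, 4.0, 4.5]])
print(percent_lowest_cost(costs))
```

If column 0 held the costs under the true BR, the remaining columns' shares would measure how often an incorrect belief comes out ahead.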

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Leaders facing uncertain follower intentions might sometimes prefer to retain an approximate model rather than pursue full accuracy.
  • The result connects to other dynamic decision problems where mid-game information updates allow strategies to adapt in ways that reward initial model mismatch.

Load-bearing premise

The leader receives updated beliefs about the follower best response before the end of the game such that the update prompts the leader and subsequently the follower to re-optimize their strategies.
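The plan, update, re-plan sequence in this premise can be sketched for a scalar open-loop LQ game. This is a hedged illustration under assumptions that are not from the paper: dynamics `x[t+1] = x[t] + u_l[t] + u_f[t]`, a leader that regulates the state to zero, a follower whose "intention" is a goal state, and the hypothetical helpers `follower_br`, `leader_plan`, and `leader_total_cost`. It does not reproduce the paper's examples, and which belief ends up cheaper depends on the parameters chosen.

```python
import numpy as np

def follower_br(x0, u_lead, goal, q_f=1.0, r_f=1.0):
    """Follower's open-loop best response to an announced leader sequence.

    Minimizes q_f * sum (x_t - goal)^2 + r_f * sum u_f[t]^2 over the horizon.
    """
    n = len(u_lead)
    S = np.tril(np.ones((n, n)))        # x_{1:n} = x0 + S @ (u_lead + u_f)
    err = x0 + S @ u_lead - goal        # tracking error if the follower did nothing
    H = q_f * S.T @ S + r_f * np.eye(n)
    return -np.linalg.solve(H, q_f * S.T @ err)

def leader_plan(x0, goal_belief, n, q_l=1.0, r_l=1.0, q_f=1.0, r_f=1.0):
    """Open-loop Stackelberg leader sequence under a believed follower goal."""
    S = np.tril(np.ones((n, n)))
    K = q_f * np.linalg.solve(q_f * S.T @ S + r_f * np.eye(n), S.T)  # BR gain
    A = np.eye(n) - S @ K
    M = A @ S                           # predicted states: M @ u_lead + d
    d = A @ np.full(n, x0) + S @ K @ np.full(n, goal_belief)
    return -np.linalg.solve(q_l * M.T @ M + r_l * np.eye(n), q_l * M.T @ d)

def leader_total_cost(goal_belief, goal_true, x0=1.0, T=6, t_update=3,
                      q_l=1.0, r_l=1.0):
    """Leader cost over the whole game; requires 0 <= t_update < T."""
    # Phase 1: leader plans with its belief; follower answers with its true BR.
    u_l = leader_plan(x0, goal_belief, T)
    u_f = follower_br(x0, u_l, goal_true)
    u_l1, u_f1 = u_l[:t_update], u_f[:t_update]
    x_upd = x0 + np.sum(u_l1 + u_f1)
    # Phase 2: the belief is corrected; both players re-optimize from x_upd.
    u_l2 = leader_plan(x_upd, goal_true, T - t_update)
    u_f2 = follower_br(x_upd, u_l2, goal_true)
    u_l_all = np.concatenate([u_l1, u_l2])
    u_f_all = np.concatenate([u_f1, u_f2])
    xs = x0 + np.concatenate([[0.0], np.cumsum(u_l_all + u_f_all)])
    return q_l * np.sum(xs ** 2) + r_l * np.sum(u_l_all ** 2)

for belief in (0.0, 1.0, 2.0, 3.0):
    print(belief, leader_total_cost(belief, goal_true=2.0))
```

Setting `t_update=0` makes the initial belief irrelevant, since nothing is played before the correction. For interior update times, the pre-update segment is played under the belief-based plan, and that segment is where the comparison between correct and incorrect beliefs is decided.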

What would settle it

A linear-quadratic Stackelberg game instance where, for every possible mid-game update of the follower's best-response belief, the leader's cumulative cost is always minimized by using the true best response rather than any incorrect one.

Figures

Figures reproduced from arXiv: 2511.07363 by Cayetana Salinas-Rodriguez, Jonathan Rogers, Sarah H.Q. Li.

Figure 1: Dynamics of Stackelberg game with BR update.
Figure 2: Percent of simulations with lowest cost achieved by each BR belief
Figure 3: Percentage of simulations where each BR belief obtains the lowest
Figure 4: Percent of simulations with lowest cost achieved by BR beliefs
Figure 5: Percentage of simulations where each BR belief obtains the lowest
Original abstract

We study a two-player dynamic Stackelberg game where the follower's intention is unknown to the leader. Classical formulations of the Stackelberg equilibrium (SE) assume that the follower's best response (BR) function is known to the leader. However, this is not always true in practice. We study a setting in which the leader receives updated beliefs about the follower BR before the end of the game, such that the update prompts the leader and subsequently the follower to re-optimize their strategies. We characterize the optimality guarantees of the SE solutions under this belief update for both open loop and feedback information structures. Interestingly, we prove that in general, assuming an incorrect follower's BR may lead to a lower leader cost over the entire game than knowing the true follower's BR. We support these results with numerical examples in a linear quadratic (LQ) Stackelberg game, and use Monte Carlo simulations to show that the instances of incorrect BR achieving lower leader costs are non-trivial in collision avoidance LQ Stackelberg games.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript studies a two-player dynamic Stackelberg game in which the leader initially holds an incorrect belief about the follower's best-response function. The model incorporates a mid-game belief update received by the leader that triggers re-optimization by both players. The authors characterize the optimality properties of the resulting Stackelberg equilibria under both open-loop and feedback information structures. The central claim is that, in general, the leader's cumulative cost can be strictly lower when starting from an incorrect BR assumption than when starting from the true BR. The theoretical results are illustrated with linear-quadratic examples and supported by Monte Carlo simulations in a collision-avoidance setting.

Significance. If the central characterization holds, the result is noteworthy because it identifies a counter-intuitive regime in which imperfect information about the follower can be advantageous for the leader over the full horizon. The paper earns credit for supplying a general, parameter-free theoretical characterization for both information structures and for using Monte Carlo runs to demonstrate that the cost-improving instances are non-trivial rather than measure-zero artifacts. This contributes to the literature on dynamic games with incomplete information and has potential implications for robust controller design in multi-agent systems.

major comments (2)
  1. [§3.1] (open-loop characterization): The proof that an incorrect initial BR yields lower leader cost relies on the specific timing and form of the belief update that forces re-optimization; the manuscript should state explicitly whether the inequality continues to hold for arbitrary (including stochastic) update times or requires the update to occur before a fixed fraction of the horizon.
  2. [§4] (feedback structure): The optimality guarantee for the feedback SE under incorrect BR is derived under the assumption that the follower observes the leader's updated strategy and re-optimizes; it is unclear from the derivation whether the same strict cost reduction holds when the follower only has noisy observations of the leader's action after the update.
minor comments (2)
  1. [Preliminaries] The definition of the belief-update operator should be placed in the preliminaries with a dedicated symbol rather than being introduced inline in the main theorems.
  2. [Numerical Examples] In the Monte Carlo section, the histograms would benefit from an additional panel showing the distribution of cost differences rather than only the fraction of improving cases.
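The extra panel requested in the second minor comment could be backed by a summary along these lines. This is a sketch under stated assumptions: the input arrays are placeholders, and `cost_difference_summary` is a hypothetical helper rather than anything from the manuscript.

```python
import numpy as np

def cost_difference_summary(cost_true, cost_incorrect, bins=10):
    """Summarize per-simulation leader cost differences.

    diff < 0 means the incorrect BR belief gave the leader a lower cost
    than the true BR in that simulation.
    """
    diff = np.asarray(cost_incorrect, dtype=float) - np.asarray(cost_true, dtype=float)
    hist, edges = np.histogram(diff, bins=bins)
    return {
        "fraction_improving": float(np.mean(diff < 0)),
        "median_diff": float(np.median(diff)),
        "hist": hist,      # counts per bin, ready for a bar plot
        "edges": edges,    # bin edges, length bins + 1
    }

# Placeholder costs for four simulations (illustrative only).
summary = cost_difference_summary([2.0, 2.0, 2.0, 2.0], [1.0, 3.0, 2.5, 1.5], bins=4)
print(summary["fraction_improving"], summary["median_diff"])
```

Reporting the histogram next to the improving fraction shows whether the wins are large or marginal, which the fraction alone cannot convey.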

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful review and positive assessment of the manuscript. We address the two major comments below, indicating the revisions we plan to incorporate.

Point-by-point responses
  1. Referee: [§3.1] (open-loop characterization): The proof that an incorrect initial BR yields lower leader cost relies on the specific timing and form of the belief update that forces re-optimization; the manuscript should state explicitly whether the inequality continues to hold for arbitrary (including stochastic) update times or requires the update to occur before a fixed fraction of the horizon.

    Authors: We agree that the open-loop characterization in §3.1 is derived for a deterministic belief update occurring at a fixed interior time t_update. The strict leader cost reduction depends on sufficient remaining horizon length after re-optimization. For arbitrary or stochastic update times, the inequality does not hold in general (e.g., if the update occurs near the terminal time). We will revise §3.1 to state this assumption explicitly and add a short remark on the conditions required for the result to extend to stochastic updates.
    Revision: yes

  2. Referee: [§4] (feedback structure): The optimality guarantee for the feedback SE under incorrect BR is derived under the assumption that the follower observes the leader's updated strategy and re-optimizes; it is unclear from the derivation whether the same strict cost reduction holds when the follower only has noisy observations of the leader's action after the update.

    Authors: We thank the referee for this clarification request. The feedback equilibrium analysis in §4 assumes that the follower perfectly observes the leader's updated strategy after the belief update, enabling exact re-optimization. With only noisy observations, the follower's best response would be computed from a noisy estimate, which can change the resulting equilibrium and may remove the strict cost advantage. We will revise §4 to state this observability assumption clearly and note the noisy-observation case as an open direction for future work.
    Revision: yes
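The dependence on the update time discussed in the first response can be made concrete by splitting the leader's cost at t_update. The decomposition below is a sketch that assumes the LQ stage costs (state weight Q^L, control weight R^L); the tail notation J^L_{t_update:T} is introduced here for illustration and is not taken from the paper.

```latex
% Assumption: LQ stage costs; t_update is the fixed interior update time.
J^L_{0:T}
  = \underbrace{\sum_{t=0}^{t_{\mathrm{update}}-1}
      \left( x_t^\top Q^L x_t + {u^L_t}^\top R^L u^L_t \right)}_{\text{incurred before the update}}
  + \underbrace{x_T^\top Q^L x_T
      + \sum_{t=t_{\mathrm{update}}}^{T-1}
      \left( x_t^\top Q^L x_t + {u^L_t}^\top R^L u^L_t \right)}_{J^L_{t_{\mathrm{update}}:T}\ \text{(re-optimized after the update)}}
```

Only the second term is affected by re-optimization. As t_update approaches T, that term contains fewer free controls, which is in line with the authors' remark that the strict inequality can fail when the update arrives near the terminal time.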

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper's central claim is a general theoretical characterization of Stackelberg equilibrium optimality under follower intention updates, for both open-loop and feedback information structures. This is derived directly from the stated model definitions, including the timing of belief updates and subsequent re-optimization by leader and follower. The result that an incorrect initial best-response assumption can produce lower total leader cost is obtained by comparing the leader's cost functionals under the correct and incorrect assumptions, within each of the two information structures; no step reduces to a fitted parameter, self-referential definition, or load-bearing self-citation. The LQ numerical examples and Monte Carlo runs are presented explicitly as illustrations of non-trivial instances rather than as the source of the inequality. The derivation is therefore self-contained against external benchmarks and receives a score of 0.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The claim rests on standard dynamic-game assumptions plus the specific mid-game belief-update mechanism; no free parameters or new entities are introduced in the abstract.

axioms (2)
  • Domain assumption: Classical Stackelberg equilibrium assumes the leader knows the follower's best-response function.
    Invoked to contrast with the uncertain-BR setting studied here.
  • Domain assumption: Belief updates occur before game end and trigger re-optimization by both players.
    Central modeling choice that enables the cost-comparison result.

pith-pipeline@v0.9.0 · 5496 in / 1215 out tokens · 31842 ms · 2026-05-17T23:30:44.831091+00:00 · methodology

Appendix A. LQ game

We consider a finite-horizon dynamic LQ game. The cost function of each player is

$$J^i_{0:T} = x_T^\top Q^i x_T + \sum_{t=0}^{T-1} \left( x_t^\top Q^i x_t + {u^i_t}^\top R^i u^i_t \right), \quad i \in \{L, F\}, \tag{49}$$

where $Q^i \succeq 0$ and $R^i \succ 0$ to guara...
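The LQ cost in Eq. (49) can be evaluated on a given trajectory with a few lines of NumPy. This is a minimal sketch: the function name `lq_cost` and the example trajectory are assumptions made for illustration, and the controls passed in are those of the player whose cost is being computed.

```python
import numpy as np

def lq_cost(xs, us, Q, R):
    """Evaluate J_{0:T} = x_T' Q x_T + sum_{t=0}^{T-1} (x_t' Q x_t + u_t' R u_t).

    xs: states x_0..x_T, shape (T + 1, n); us: this player's controls, shape (T, m).
    """
    xs = np.asarray(xs, dtype=float)
    us = np.asarray(us, dtype=float)
    stage = sum(x @ Q @ x + u @ R @ u for x, u in zip(xs[:-1], us))
    terminal = xs[-1] @ Q @ xs[-1]
    return float(stage + terminal)

# Illustrative two-step trajectory with a 2-D state and a scalar control.
Q = np.eye(2)            # positive semidefinite state weight
R = np.array([[2.0]])    # positive definite control weight
xs = [[1.0, 0.0], [0.5, 0.0], [0.0, 0.0]]
us = [[-0.5], [-0.5]]
print(lq_cost(xs, us, Q, R))
```

Calling the function once per player with that player's `Q`, `R`, and controls gives the leader and follower costs for the same state trajectory.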