When the Correct Model Fails: The Optimality of Stackelberg Equilibria with Follower Intention Updates

Cayetana Salinas-Rodriguez; Jonathan Rogers; Sarah H.Q. Li

arxiv: 2511.07363 · v4 · submitted 2025-11-10 · eess.SY · cs.GT · cs.SY


Pith reviewed 2026-05-17 23:30 UTC · model grok-4.3

classification eess.SY · cs.GT · cs.SY

keywords Stackelberg games · best response · intention updates · dynamic games · linear quadratic games · collision avoidance · information structures · optimality guarantees


The pith

Assuming an incorrect follower best response can yield lower leader costs in dynamic Stackelberg games with intention updates.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

In two-player dynamic Stackelberg games, the leader typically benefits from knowing the follower's true best-response function. This paper examines what happens when the leader instead receives updates on the follower's intentions mid-game and must re-optimize. The authors show that using an incorrect model of the follower can produce a lower total cost for the leader than using the correct model. They provide theoretical characterizations for both open-loop and feedback settings and back the claim with examples from linear-quadratic games, including collision avoidance scenarios. A sympathetic reader might care because this upends the intuition that better information always leads to better decisions in strategic interactions.

Core claim

We prove that, in general, assuming an incorrect best response for the follower may lead to a lower leader cost over the entire game than knowing the follower's true best response. This holds when the leader receives updated beliefs about the follower's best response before the end of the game, such that the update prompts the leader, and subsequently the follower, to re-optimize their strategies. We characterize the optimality guarantees for open-loop and feedback information structures and support the results with examples in linear-quadratic Stackelberg games.

What carries the argument

Stackelberg equilibrium under belief updates about the follower's best-response function; it allows re-optimization after the leader receives new information on the follower's intentions, enabling comparison of costs between correct and incorrect assumptions.

If this is right

In open-loop information structures, the Stackelberg equilibrium with an incorrect best response can achieve lower leader cost than the true best response after an update.
The same cost advantage for incorrect assumptions appears under feedback information structures.
Numerical examples in linear-quadratic Stackelberg games demonstrate concrete cases of lower leader costs with incorrect best-response assumptions.
Monte Carlo simulations show that instances where an incorrect best response improves leader cost are non-trivial in collision-avoidance linear-quadratic games.
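The Monte Carlo statement above amounts to a tally: for each simulated game, find which initial belief gave the leader the lowest total cost, then report the share of simulations won by each belief. A minimal sketch of that tally follows. The function name `lowest_cost_share`, the array layout, and the numbers are illustrative assumptions, not the authors' code or data.

```python
import numpy as np

def lowest_cost_share(costs):
    """Percent of simulations in which each belief attains the lowest leader cost.

    costs[k, j] is the leader's total cost in simulation k when the leader
    starts from belief j. Ties go to the lowest-index belief (np.argmin rule).
    """
    costs = np.asarray(costs, dtype=float)
    winners = np.argmin(costs, axis=1)                       # best belief per simulation
    counts = np.bincount(winners, minlength=costs.shape[1])  # wins per belief
    return 100.0 * counts / costs.shape[0]

# Placeholder numbers for illustration only; these are NOT results from the paper.
example_costs = np.array([
    [3.0, 2.5, 4.0],
    [1.0, 1.2, 1.1],
    [5.0, 4.0, 4.5],
    [2.0, 2.2, 1.9],
])
shares = lowest_cost_share(example_costs)
print(shares)
```

If one column is designated as the true best response, the shares of the remaining columns add up to the fraction of runs in which some incorrect belief came out cheapest.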

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Leaders facing uncertain follower intentions might sometimes prefer to retain an approximate model rather than pursue full accuracy.
The result connects to other dynamic decision problems where mid-game information updates allow strategies to adapt in ways that reward initial model mismatch.

Load-bearing premise

The leader receives updated beliefs about the follower best response before the end of the game such that the update prompts the leader and subsequently the follower to re-optimize their strategies.
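One way to make this premise concrete is to split the leader's cumulative cost at the update time. The notation below is an assumption introduced here for illustration and is not taken from the paper: $b$ is the leader's initial belief about the follower's best response, $b^\star$ is the true best response, $t_u$ is the update time, and $x_{t_u}(b)$ is the state reached at $t_u$ when play starts from belief $b$. The split also assumes the updated belief is the same in every case being compared, so the continuation depends on $b$ only through $x_{t_u}(b)$.

```latex
J^{L}_{0:T}(b)
  \;=\;
  \underbrace{J^{L}_{0:t_u}(b)}_{\text{stage costs before } t_u,\ \text{played under belief } b}
  \;+\;
  \underbrace{J^{L}_{t_u:T}\bigl(x_{t_u}(b)\bigr)}_{\text{stage costs from } t_u \text{ on, after re-optimization}},
  \qquad 0 < t_u < T .
```

In this notation, the paper's claim is that instances exist with $b \neq b^\star$ and $J^{L}_{0:T}(b) < J^{L}_{0:T}(b^\star)$.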

What would settle it

A linear-quadratic Stackelberg game instance where, for every possible mid-game update of the follower's best-response belief, the leader's cumulative cost is always minimized by using the true best response rather than any incorrect one.

Figures

Figures reproduced from arXiv: 2511.07363 by Cayetana Salinas-Rodriguez, Jonathan Rogers, Sarah H.Q. Li.

**Figure 2.** Percent of simulations with lowest cost achieved by each BR belief

**Figure 3.** Percentage of simulations where each BR belief obtains the lowest

**Figure 4.** Percent of simulations with lowest cost achieved by BR beliefs

**Figure 5.** Percentage of simulations where each BR belief obtains the lowest

Original abstract

We study a two-player dynamic Stackelberg game where the follower's intention is unknown to the leader. Classical formulations of the Stackelberg equilibrium (SE) assume that the follower's best response (BR) function is known to the leader. However, this is not always true in practice. We study a setting in which the leader receives updated beliefs about the follower BR before the end of the game, such that the update prompts the leader and subsequently the follower to re-optimize their strategies. We characterize the optimality guarantees of the SE solutions under this belief update for both open loop and feedback information structures. Interestingly, we prove that in general, assuming an incorrect follower's BR may lead to a lower leader cost over the entire game than knowing the true follower's BR. We support these results with numerical examples in a linear quadratic (LQ) Stackelberg game, and use Monte Carlo simulations to show that the instances of incorrect BR achieving lower leader costs are non-trivial in collision avoidance LQ Stackelberg games.
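To make the LQ setting concrete, here is a minimal sketch of an open-loop Stackelberg solve under a believed follower model. It rests on several assumptions that this page does not confirm: linear dynamics `x_{t+1} = A x_t + BL uL_t + BF uF_t`, quadratic costs in the state and each player's own input, and follower "intentions" represented by the follower's state weight `QF`. All function names are hypothetical. The sketch does not implement the mid-game update or the feedback case.

```python
import numpy as np

def stack_dynamics(A, BL, BF, T):
    """Stacked maps with [x_1; ...; x_T] = Phi x0 + GL uL + GF uF."""
    n, mL, mF = A.shape[0], BL.shape[1], BF.shape[1]
    Phi = np.vstack([np.linalg.matrix_power(A, t + 1) for t in range(T)])
    GL = np.zeros((T * n, T * mL))
    GF = np.zeros((T * n, T * mF))
    for t in range(T):            # block row t holds x_{t+1}
        for s in range(t + 1):    # block column s holds the input at time s
            Apow = np.linalg.matrix_power(A, t - s)
            GL[t * n:(t + 1) * n, s * mL:(s + 1) * mL] = Apow @ BL
            GF[t * n:(t + 1) * n, s * mF:(s + 1) * mF] = Apow @ BF
    return Phi, GL, GF

def follower_gain(GF, QF, RF, T):
    """Gain K such that the follower's best response is uF = -K (Phi x0 + GL uL)."""
    QFb, RFb = np.kron(np.eye(T), QF), np.kron(np.eye(T), RF)
    return np.linalg.solve(GF.T @ QFb @ GF + RFb, GF.T @ QFb)

def follower_best_response(A, BL, BF, QF, RF, x0, uL, T):
    """Follower's cost-minimizing input sequence against a fixed leader plan uL."""
    Phi, GL, GF = stack_dynamics(A, BL, BF, T)
    return -follower_gain(GF, QF, RF, T) @ (Phi @ x0 + GL @ uL)

def open_loop_leader_plan(A, BL, BF, QL, RL, QF_belief, RF, x0, T):
    """Leader plan that is optimal if the follower's state weight were QF_belief."""
    Phi, GL, GF = stack_dynamics(A, BL, BF, T)
    QLb, RLb = np.kron(np.eye(T), QL), np.kron(np.eye(T), RL)
    K = follower_gain(GF, QF_belief, RF, T)
    M = np.eye(Phi.shape[0]) - GF @ K      # x = M (Phi x0 + GL uL) under believed BR
    H = GL.T @ M.T @ QLb @ M @ GL + RLb    # positive definite because RL > 0
    return -np.linalg.solve(H, GL.T @ M.T @ QLb @ M @ (Phi @ x0))

def stacked_cost(A, BL, BF, Q, R, x0, uL, uF, u_own, T):
    """Sum of x_t' Q x_t for t = 1..T plus own-input cost (constant x_0 term omitted)."""
    Phi, GL, GF = stack_dynamics(A, BL, BF, T)
    x = Phi @ x0 + GL @ uL + GF @ uF
    return float(x @ np.kron(np.eye(T), Q) @ x + u_own @ np.kron(np.eye(T), R) @ u_own)

# Scalar toy instance; every number here is an arbitrary placeholder.
A, BL, BF = np.array([[1.0]]), np.array([[1.0]]), np.array([[1.0]])
QL, RL, RF = np.array([[1.0]]), np.array([[1.0]]), np.array([[1.0]])
QF_true, QF_wrong = np.array([[2.0]]), np.array([[0.5]])
x0, T = np.array([1.0]), 3

def realized_leader_cost(QF_belief):
    """Leader plans with QF_belief; the follower responds with its true weight."""
    uL = open_loop_leader_plan(A, BL, BF, QL, RL, QF_belief, RF, x0, T)
    uF = follower_best_response(A, BL, BF, QF_true, RF, x0, uL, T)
    return stacked_cost(A, BL, BF, QL, RL, x0, uL, uF, uL, T)

cost_true_belief = realized_leader_cost(QF_true)
cost_wrong_belief = realized_leader_cost(QF_wrong)
print(cost_true_belief, cost_wrong_belief)
```

With no mid-game update, the plan built on the true weight cannot cost the leader more than the plan built on the wrong one in this sketch, because it minimizes the leader's cost against the follower's actual response. The paper's result concerns the case where both players re-optimize after an update. In this sketch that would mean calling the planner again from the state reached at the update time with the remaining horizon.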

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper shows that in dynamic Stackelberg games with mid-game belief updates, starting from an incorrect follower best response can sometimes give the leader lower total cost than the true response.


The paper's main result is that, in a dynamic Stackelberg game, the leader can end up with a lower overall cost by initially assuming the wrong follower best response, provided a belief update arrives in time to trigger re-optimization by both players. The authors set up the model so that the leader receives an updated belief before the game ends, then characterize the resulting equilibria under both open-loop and feedback information structures. The proof covers the general case, and they back it with linear-quadratic examples plus Monte Carlo runs in a collision-avoidance setting to show that the lower-cost instances are not rare.

The numerics serve as illustration rather than as the foundation of the claim; the Monte Carlo runs are a reasonable way to check that the effect appears in a non-trivial fraction of cases. The central argument holds up within the stated assumptions, and the stress-test note is right that the update timing is treated as part of the model definition rather than an unexamined loophole. A minor soft spot is how tightly the result depends on the precise form and timing of the belief update; readers will want to see whether the inequality survives modest changes to when or how the update occurs. Within the given framework there is no visible internal inconsistency.

The paper is for researchers working on game-theoretic control and multi-agent systems with model mismatch, for example in robotics or autonomous decision-making. It deserves peer review because the counter-intuitive optimality claim is new relative to classical Stackelberg formulations and the supporting examples are concrete enough to warrant expert scrutiny.

Referee Report

2 major / 2 minor

Summary. The manuscript studies a two-player dynamic Stackelberg game in which the leader initially holds an incorrect belief about the follower's best-response function. The model incorporates a mid-game belief update received by the leader that triggers re-optimization by both players. The authors characterize the optimality properties of the resulting Stackelberg equilibria under both open-loop and feedback information structures. The central claim is that, in general, the leader's cumulative cost can be strictly lower when starting from an incorrect BR assumption than when starting from the true BR. The theoretical results are illustrated with linear-quadratic examples and supported by Monte Carlo simulations in a collision-avoidance setting.

Significance. If the central characterization holds, the result is noteworthy because it identifies a counter-intuitive regime in which imperfect information about the follower can be advantageous for the leader over the full horizon. The paper earns credit for supplying a general, parameter-free theoretical characterization for both information structures and for using Monte Carlo runs to demonstrate that the cost-improving instances are non-trivial rather than measure-zero artifacts. This contributes to the literature on dynamic games with incomplete information and has potential implications for robust controller design in multi-agent systems.

major comments (2)

[§3.1] Open-loop characterization: The proof that an incorrect initial BR yields lower leader cost relies on the specific timing and form of the belief update that forces re-optimization; the manuscript should state explicitly whether the inequality continues to hold for arbitrary (including stochastic) update times or requires the update to occur before a fixed fraction of the horizon.
[§4] Feedback structure: The optimality guarantee for the feedback SE under incorrect BR is derived under the assumption that the follower observes the leader's updated strategy and re-optimizes; it is unclear from the derivation whether the same strict cost reduction holds when the follower only has noisy observations of the leader's action after the update.

minor comments (2)

[Preliminaries] The definition of the belief-update operator should be placed in the preliminaries with a dedicated symbol rather than being introduced inline in the main theorems.
[Numerical Examples] In the Monte Carlo section, the histograms would benefit from an additional panel showing the distribution of cost differences rather than only the fraction of improving cases.
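The extra panel requested in the last comment could be built from the same per-simulation cost table used for the win fractions. The sketch below is a hypothetical illustration with placeholder numbers. The function `cost_gap_to_true_belief` and its sign convention are assumptions, not part of the manuscript.

```python
import numpy as np

def cost_gap_to_true_belief(costs, true_idx):
    """Per-simulation gap: cost under the true belief minus the best incorrect belief.

    A positive gap means some incorrect belief was cheaper for the leader in
    that simulation; a negative gap means the true belief was cheapest.
    """
    costs = np.asarray(costs, dtype=float)
    true_cost = costs[:, true_idx]
    best_incorrect = np.delete(costs, true_idx, axis=1).min(axis=1)
    return true_cost - best_incorrect

# Placeholder costs (rows: simulations, columns: beliefs); NOT data from the paper.
gaps = cost_gap_to_true_belief(
    [[3.0, 2.5, 4.0],
     [1.0, 1.2, 1.1],
     [5.0, 4.0, 4.5]],
    true_idx=0,
)
counts, edges = np.histogram(gaps, bins=4)  # bin counts for a histogram panel
print(gaps, counts, edges)
```

Reporting the gaps alongside the win fractions shows whether the improving cases are large or marginal, which is the distinction the referee is asking for.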

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful review and positive assessment of the manuscript. We address the two major comments below, indicating the revisions we plan to incorporate.


Referee: [§3.1] Open-loop characterization: The proof that an incorrect initial BR yields lower leader cost relies on the specific timing and form of the belief update that forces re-optimization; the manuscript should state explicitly whether the inequality continues to hold for arbitrary (including stochastic) update times or requires the update to occur before a fixed fraction of the horizon.

Authors: We agree that the open-loop characterization in §3.1 is derived for a deterministic belief update occurring at a fixed interior time t_update. The strict leader cost reduction depends on sufficient remaining horizon length after re-optimization. For arbitrary or stochastic update times, the inequality does not hold in general (e.g., if the update occurs near the terminal time). We will revise §3.1 to state this assumption explicitly and add a short remark on the conditions required for the result to extend to stochastic updates.

Revision: yes

Referee: [§4] Feedback structure: The optimality guarantee for the feedback SE under incorrect BR is derived under the assumption that the follower observes the leader's updated strategy and re-optimizes; it is unclear from the derivation whether the same strict cost reduction holds when the follower only has noisy observations of the leader's action after the update.

Authors: We thank the referee for this clarification request. The feedback equilibrium analysis in §4 assumes that the follower perfectly observes the leader's updated strategy after the belief update, enabling exact re-optimization. With only noisy observations, the follower's best response would be computed from a noisy estimate, which can change the resulting equilibrium and may remove the strict cost advantage. We will revise §4 to state this observability assumption clearly and note the noisy-observation case as an open direction for future work.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity


The paper's central claim is a general theoretical characterization of Stackelberg equilibrium optimality under follower intention updates, for both open-loop and feedback information structures. This is derived directly from the stated model definitions, including the timing of belief updates and subsequent re-optimization by leader and follower. The result that an incorrect initial best-response assumption can produce lower total leader cost is obtained by comparing the resulting cost functionals under the two information structures; no step reduces to a fitted parameter, self-referential definition, or load-bearing self-citation. The LQ numerical examples and Monte Carlo runs are presented explicitly as illustrations of non-trivial instances rather than as the source of the inequality. The derivation is therefore self-contained against external benchmarks and receives a score of 0.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The claim rests on standard dynamic-game assumptions plus the specific mid-game belief-update mechanism; no free parameters or new entities are introduced in the abstract.

axioms (2)

Domain assumption: Classical Stackelberg equilibrium assumes the leader knows the follower's best-response function. Invoked to contrast with the uncertain-BR setting studied here.
Domain assumption: Belief updates occur before game end and trigger re-optimization by both players. Central modeling choice that enables the cost-comparison result.

pith-pipeline@v0.9.0 · 5496 in / 1215 out tokens · 31842 ms · 2026-05-17T23:30:44.831091+00:00 · methodology


