Efficiently Solving Mixed-Hierarchy Games with Quasi-Policy Approximations

Christian Ellis; David Fridovich-Keil; Dong Ho Lee; Hamzah Khan; Jesse Milzman; Jingqi Li; Tianyu Qiu; Wesley Suttle

arxiv: 2602.01568 · v2 · pith:IZUIAPNHnew · submitted 2026-02-02 · 💻 cs.GT · cs.RO

Efficiently Solving Mixed-Hierarchy Games with Quasi-Policy Approximations

Hamzah Khan , Dong Ho Lee , Jingqi Li , Tianyu Qiu , Christian Ellis , Jesse Milzman , Wesley Suttle , David Fridovich-Keil This is my paper

Pith reviewed 2026-05-21 15:08 UTC · model grok-4.3

classification 💻 cs.GT cs.RO

keywords mixed-hierarchy gamesquasi-policy approximationforest-structured gamesStackelberg-Nash equilibriainexact Newton methodmulti-robot coordinationKKT conditions

0 comments

The pith

A quasi-policy approximation removes higher-order derivatives from KKT conditions, enabling an inexact Newton method to solve mixed-hierarchy games with local exponential convergence.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that multi-robot coordination games often mix simultaneous Nash decisions and hierarchical Stackelberg decisions in a forest structure. Direct solution of the optimality conditions becomes intractable due to escalating high-order policy derivatives with depth. By introducing a quasi-policy approximation that eliminates these higher-order terms, the authors develop an inexact Newton method that solves the approximated system efficiently. They prove local exponential convergence for non-quadratic objectives and nonlinear constraints. This matters because it allows real-time computation for complex multi-agent scenarios where existing solvers fail.

Core claim

For N-robot forest-structured mixed-hierarchy games, where each robot leads a subtree in Stackelberg fashion while branches interact via Nash, the KKT conditions involve high-order derivatives of best-response policies. The quasi-policy approximation removes these higher-order derivatives, yielding an approximated KKT system solvable by an inexact Newton method that exhibits local exponential convergence even with non-quadratic objectives and nonlinear constraints.

What carries the argument

The quasi-policy approximation, which approximates best-response policies to eliminate higher-order derivatives in the optimality conditions of mixed-hierarchy games.

Load-bearing premise

The quasi-policy approximation that removes higher-order policy derivatives remains sufficiently accurate for the target class of forest-structured games and does not invalidate the local convergence result.

What would settle it

A concrete falsifier would be a counterexample game in the forest class where the inexact Newton method diverges or fails to converge exponentially due to the approximation error accumulating with depth.

Figures

Figures reproduced from arXiv: 2602.01568 by Christian Ellis, David Fridovich-Keil, Dong Ho Lee, Hamzah Khan, Jesse Milzman, Jingqi Li, Tianyu Qiu, Wesley Suttle.

**Figure 4.** Figure 4: We demonstrate real-time motion planning under complex hierarchi [PITH_FULL_IMAGE:figures/full_fig_p003_4.png] view at source ↗

**Figure 2.** Figure 2: We introduce three representations of a mixed-hierarchy game which [PITH_FULL_IMAGE:figures/full_fig_p006_2.png] view at source ↗

**Figure 3.** Figure 3: Examples of invalid hierarchical information structure graphs. [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗

**Figure 4.** Figure 4: We simulate a target-guarding game in Gazebo with three Turtlebots. [PITH_FULL_IMAGE:figures/full_fig_p015_4.png] view at source ↗

**Figure 5.** Figure 5: We experiment on four hierarchy structures on a merging scenario and [PITH_FULL_IMAGE:figures/full_fig_p015_5.png] view at source ↗

**Figure 6.** Figure 6: We report the mean and standard deviation of the merit function value ∥π(sk )∥2 over 10 runs with random initial states under a fixed hierarchy structure, corroborating Theorem 1 via monotone decay and local exponential convergence of ∥π(sk )∥. vehicle 2. The final structure describes a situation in which the entire convoy is a Stackelberg chain led by vehicle 1, and they play a Nash game with vehicle … view at source ↗

read the original abstract

Multi-robot coordination often exhibits hierarchical structure, with some robots' decisions depending on the planned behaviors of others. While game theory provides a principled framework for such interactions, existing solvers struggle to handle mixed information structures that combine simultaneous (Nash) and hierarchical (Stackelberg) decision-making. We study N-robot forest-structured mixed-hierarchy games, in which each robot acts as a Stackelberg leader over its subtree while robots in different branches interact via Nash equilibria. We derive the Karush-Kuhn-Tucker (KKT) first-order optimality conditions for this class of games and show that they involve increasingly high-order derivatives of robots' best-response policies as the hierarchy depth grows, rendering a direct solution intractable. To overcome this challenge, we introduce a quasi-policy approximation that removes higher-order policy derivatives and develop an inexact Newton method for efficiently solving the resulting approximated KKT systems. We prove local exponential convergence of the proposed algorithm for games with non-quadratic objectives and nonlinear constraints. The approach is implemented in a highly optimized Julia library (MixedHierarchyGames.jl) and evaluated in hardware and simulated multi-agent experiments, demonstrating real-time convergence for complex mixed-hierarchy information structures.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives a workable solver for mixed-hierarchy forest games by approximating away high-order policy derivatives in the KKT system and claims local exponential convergence plus real-time experiments.

read the letter

The main thing here is a method to solve N-robot forest games that mix Stackelberg and Nash interactions by approximating the best-response policies to cut out high-order derivatives from the KKT system. They set up the optimality conditions for these hierarchical structures and show how depth causes derivative blowup. The quasi-policy approximation removes those terms, turning the problem into something an inexact Newton solver can handle. They prove local exponential convergence for non-quadratic objectives and nonlinear constraints, which is the part that stands out. The Julia package and multi-agent experiments, including hardware, back it up with real-time results. This is new in how it targets the mixed-hierarchy forest case specifically, rather than general approximations in game solving. The implementation looks solid and the experiments demonstrate practical speed. The soft spot is the approximation's effect on the true solution. Dropping higher-order terms works if they are small near equilibrium, but for nonlinear constraints they may not vanish quickly. The proof claims convergence for the approximated system, but it should also show the approximated KKT point is close enough to the original game's equilibrium. If that gap is not controlled, the exponential rate might not apply to the actual game. The abstract does not detail the error analysis, so that section will need scrutiny. This paper is for people building solvers for multi-robot coordination or similar distributed systems. Anyone needing to compute equilibria in hierarchical info structures will find the approach and code useful. It deserves a serious referee. The combination of theory, approximation, and implementation is worth reviewing even if revisions are needed on the convergence details. I recommend putting it through peer review.

Referee Report

1 major / 2 minor

Summary. The paper studies N-robot forest-structured mixed-hierarchy games combining Nash and Stackelberg interactions. It derives the KKT optimality conditions, which involve high-order derivatives of best-response policies that grow with hierarchy depth. To address intractability, the authors introduce a quasi-policy approximation that drops higher-order policy derivatives, formulate an inexact Newton method on the resulting system, prove local exponential convergence for non-quadratic objectives and nonlinear constraints, and demonstrate real-time performance via a Julia implementation (MixedHierarchyGames.jl) together with hardware and simulated multi-agent experiments.

Significance. If the local exponential convergence result is shown to hold for the original (unapproximated) KKT system, the work would supply a practical, scalable solver for mixed-hierarchy multi-robot coordination problems that existing game-theoretic methods cannot handle efficiently. The open-source library and hardware validation constitute concrete strengths that would make the contribution immediately usable in robotics applications.

major comments (1)

[§5, Theorem 5.1] §5, Theorem 5.1: The stated local exponential convergence applies to the inexact Newton iterates on the quasi-policy approximated KKT system. For the headline claim to extend to the original game, the proof must additionally verify that the approximation error (the omitted higher-order policy derivatives) satisfies the standard inexact-Newton condition that the perturbation is o(‖r_k‖) where r_k is the residual of the true KKT system; without this control, the distance between the approximated and true solutions may remain O(1) near a nonlinear equilibrium and invalidate q-superlinear convergence.

minor comments (2)

[§3.2] §3.2: The notation for the best-response policy derivatives (e.g., the distinction between first- and higher-order terms) is introduced without an explicit small example; adding a two-robot toy instance would clarify how the quasi-policy truncation is applied.
[Figure 4] Figure 4: The convergence plots would be easier to interpret if the y-axis were normalized by the initial residual norm rather than shown in absolute scale.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their careful reading of the manuscript and for the constructive major comment. We address it directly below.

read point-by-point responses

Referee: [§5, Theorem 5.1] §5, Theorem 5.1: The stated local exponential convergence applies to the inexact Newton iterates on the quasi-policy approximated KKT system. For the headline claim to extend to the original game, the proof must additionally verify that the approximation error (the omitted higher-order policy derivatives) satisfies the standard inexact-Newton condition that the perturbation is o(‖r_k‖) where r_k is the residual of the true KKT system; without this control, the distance between the approximated and true solutions may remain O(1) near a nonlinear equilibrium and invalidate q-superlinear convergence.

Authors: We agree that Theorem 5.1 establishes local exponential convergence strictly for the inexact Newton method on the quasi-policy approximated KKT system. The quasi-policy approximation is introduced precisely to drop the higher-order policy derivatives that render the original KKT conditions intractable for deep hierarchies. We acknowledge that, without an additional argument showing the omitted terms satisfy the o(‖r_k‖) inexact-Newton perturbation condition, the solver is guaranteed to converge only to a root of the approximated system; the distance to a root of the true system could in principle remain bounded away from zero for sufficiently nonlinear problems. In the revised manuscript we will (i) explicitly restate the theorem and all claims to make the scope (approximated system) unambiguous and (ii) add a short discussion of approximation quality, supported by the existing numerical comparisons in the experiments section that show the approximated equilibria remain close to those recovered by more expensive exact methods on small instances. We do not claim to supply the full o(‖r_k‖) control for arbitrary nonlinear objectives and constraints, as deriving such a bound appears to require additional structural assumptions beyond those already stated. revision: partial

Circularity Check

0 steps flagged

No circularity detected in derivation chain

full rationale

The paper derives the KKT conditions directly from the definition of the mixed-hierarchy game, introduces the quasi-policy approximation as an explicit modeling choice to truncate higher-order policy derivatives, and then applies a standard inexact Newton solver to the resulting approximated system. The local exponential convergence result is stated for the algorithm on this approximated KKT system under the maintained assumptions of non-quadratic objectives and nonlinear constraints. No step reduces by construction to a fitted parameter, a self-citation chain, or a renaming of the target result; the approximation is introduced independently of the convergence claim rather than being justified by it.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities; all technical details required to populate the ledger are absent.

pith-pipeline@v0.9.0 · 5761 in / 979 out tokens · 39771 ms · 2026-05-21T15:08:11.096064+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

43 extracted references · 43 canonical work pages

[1]

Advances in Neural Information Processing Systems37, 6648–6710 (2024)

Altabaa, A., Yang, Z.: On the role of information structure in reinforcement learning for partially-observable sequential teams and games. Advances in Neural Information Processing Systems37, 6648–6710 (2024)

work page 2024
[2]

Econometrica: Journal of the Econometric Society pp

Aumann, R.J.: Correlated equilibrium as an expression of bayesian ratio- nality. Econometrica: Journal of the Econometric Society pp. 1–18 (1987)

work page 1987
[3]

SIAM (1998)

Başar, T., Olsder, G.J.: Dynamic noncooperative game theory. SIAM (1998)

work page 1998
[4]

Automatica17(5), 749–754 (1981)

Başar, T.: Equilibrium strategies in dynamic games with multi-levels of hierarchy. Automatica17(5), 749–754 (1981)

work page 1981
[5]

arXiv preprint arXiv:2405.18703 (2024)

Becker, T., Sunberg, Z.: Bridging the gap between partially observ- able stochastic games and sparse pomdp methods. arXiv preprint arXiv:2405.18703 (2024)

work page arXiv 2024
[6]

An- nals of Operations Research34(1992)

Blair, C.: The computational complexity of multi-level linear programs. An- nals of Operations Research34(1992)

work page 1992
[7]

Operations Research21(1), 37–44 (1973)

Bracken, J., McGill, J.T.: Mathematical programs with optimization prob- lems in the constraints. Operations Research21(1), 37–44 (1973)

work page 1973
[8]

Games and Economic Behavior4(2), 182–201 (1992)

Brandenburger, A., Dekel, E., Geanakoplos, J.: Correlated equilibrium with generalized information structures. Games and Economic Behavior4(2), 182–201 (1992)

work page 1992
[9]

European Journal of Operational Research1(1), 1–11 (1977)

Candler, W., Norton, R.D.: Multi-level programming: theory, algorithms, and applications. European Journal of Operational Research1(1), 1–11 (1977)

work page 1977
[10]

Management Science70(10), 7308–7324 (2024)

Carvalho, M., Dragotto, G., Feijoo, F., Lodi, A., Sankaranarayanan, S.: When nash meets stackelberg. Management Science70(10), 7308–7324 (2024)

work page 2024
[11]

In: IJCAI

Castiglioni, M., Marchesi, A., Gatti, N., et al.: Be a leader or become a follower: The strategy to commit to with multiple leaders. In: IJCAI. pp. 123–129 (2019)

work page 2019
[12]

Annals of Operations Research 235(1), 129–153 (2015)

Chang, Y., Erera, A.L., White III, C.C.: Value of information for a leader– follower partially observed markov game. Annals of Operations Research 235(1), 129–153 (2015)

work page 2015
[13]

In: Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems, 2004

Emery-Montemerlo, R., Gordon, G., Schneider, J., Thrun, S.: Approximate solutions for partially observable stochastic games with common payoffs. In: Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems, 2004. AAMAS 2004. pp. 136–143. IEEE (2004)

work page 2004
[14]

Computational management science6(4), 377–397 (2009)

Faísca, N.P., Saraiva, P.M., Rustem, B., Pistikopoulos, E.N.: A multi- parametric programming approach for multilevel hierarchical and decen- tralised optimisation problems. Computational management science6(4), 377–397 (2009)

work page 2009
[15]

In: Proceedings of the AAAI Confer- ence on Artificial Intelligence

Farina, G., Sandholm, T.: Model-free online learning in unknown sequential decision making problems and games. In: Proceedings of the AAAI Confer- ence on Artificial Intelligence. vol. 35, pp. 5381–5390 (2021) 18 Authors Suppressed Due to Excessive Length

work page 2021
[16]

Springer Sci- ence & Business Media (2012)

Filar, J., Vrieze, K.: Competitive Markov decision processes. Springer Sci- ence & Business Media (2012)

work page 2012
[17]

Journal of the Operational Research Society32(9), 783–792 (1981)

Fortuny-Amat, J., McCarl, B.A.: A representation of linear complemen- tarity problems as linear programs. Journal of the Operational Research Society32(9), 783–792 (1981)

work page 1981
[18]

In: IEEE International Conference on Robotics and Automation (ICRA) (2020)

Fridovich-Keil, D., Ratner, E., Peters, L., Dragan, A.D., Tomlin, C.J.: Efficient iterative linear-quadratic approximations for nonlinear multi- player general-sum differential games. In: IEEE International Conference on Robotics and Automation (ICRA) (2020)

work page 2020
[19]

In: AAAI

Hansen, E.A., Bernstein, D.S., Zilberstein, S.: Dynamic programming for partially observable stochastic games. In: AAAI. vol. 4, pp. 709–715 (2004)

work page 2004
[20]

Mathematical programming32(2), 146–164 (1985)

Jeroslow, R.G.: The polynomial hierarchy and a simple model for competi- tive analysis. Mathematical programming32(2), 146–164 (1985)

work page 1985
[21]

IEEE Robotics and Automation Letters (RA-L) (2024)

Khan, H., Fridovich-Keil, D.: Leadership inference for multi-agent interac- tions. IEEE Robotics and Automation Letters (RA-L) (2024)

work page 2024
[22]

In: 2004 IEEE/RSJ international conference on intelligent robots and systems (IROS)(IEEE Cat

Koenig, N., Howard, A.: Design and use paradigms for gazebo, an open- source multi-robot simulator. In: 2004 IEEE/RSJ international conference on intelligent robots and systems (IROS)(IEEE Cat. No. 04CH37566). vol. 3, pp. 2149–2154. Ieee (2004)

work page 2004
[23]

Advances in Neural Information Processing Systems34, 11987–11998 (2021)

Kozuno, T., Ménard, P., Munos, R., Valko, M.: Learning in two-player zero- sum partially observable markov games with perfect recall. Advances in Neural Information Processing Systems34, 11987–11998 (2021)

work page 2021
[24]

Springer Science & Business Media (2002)

Krantz, S.G., Parks, H.R.: The implicit function theorem: history, theory, and applications. Springer Science & Business Media (2002)

work page 2002
[25]

Set-Valued and Variational Analysis22(4), 691–720 (2014).https://doi.org/10.1007/s11228-013-0271-4

Kulkarni, A.A., Shanbhag, U.V.: A shared-constraint approach to multi- leader multi-follower games. Set-Valued and Variational Analysis22(4), 691–720 (2014).https://doi.org/10.1007/s11228-013-0271-4

work page doi:10.1007/s11228-013-0271-4 2014
[26]

arXiv preprint arXiv:2404.03767 (2024)

Laine, F.: Mathematical program networks. arXiv preprint arXiv:2404.03767 (2024)

work page arXiv 2024
[27]

SIAM Journal on Opti- mization (2023)

Laine, F., Fridovich-Keil, D., Chiu, C.Y., Tomlin, C.: The computation of approximate generalized feedback Nash equilibria. SIAM Journal on Opti- mization (2023)

work page 2023
[28]

SIAM Journal on Optimization (2024)

Li, J., Sojoudi, S., Tomlin, C., Fridovich-Keil, D.: The Computation of Ap- proximate Feedback Stackelberg Equilibria in Multi-Player Nonlinear Con- strained Dynamic Games. SIAM Journal on Optimization (2024)

work page 2024
[29]

Energy320, 135418 (2025)

Li, X., Wu, N., Lei, L.: Nash-stackelberg-nash three-layer mixed game op- timal control strategy for multi-integrated energy systems considering mul- tiple uncertainties. Energy320, 135418 (2025)

work page 2025
[30]

In: Machine learning proceedings 1994, pp

Littman, M.L.: Markov games as a framework for multi-agent reinforce- ment learning. In: Machine learning proceedings 1994, pp. 157–163. Elsevier (1994)

work page 1994
[31]

Advances in Neural Information Pro- cessing Systems35, 18296–18308 (2022)

Liu, Q., Szepesvári, C., Jin, C.: Sample-efficient reinforcement learning of partially observable markov games. Advances in Neural Information Pro- cessing Systems35, 18296–18308 (2022)

work page 2022
[32]

Cambridge University Press (1996) Title Suppressed Due to Excessive Length 19

Luo, Z.Q., Pang, J.S., Ralph, D.: Mathematical Programs with Equilibrium Constraints. Cambridge University Press (1996) Title Suppressed Due to Excessive Length 19

work page 1996
[33]

arXiv preprint arXiv:2504.15019 (2025)

Mohapatra, P.S., Reddy, P.V., Zaccour, G.: Feedback stackelberg-nash equi- libria in difference games with quasi-hierarchical interactions and inequality constraints. arXiv preprint arXiv:2504.15019 (2025)

work page arXiv 2025
[34]

In: The Foundations of Price Theory Vol 4, pp

Nash, J.F.: Non-cooperative games. In: The Foundations of Price Theory Vol 4, pp. 329–340. Routledge (2024)

work page 2024
[35]

Springer (1999)

Nocedal, J., Wright, S.J.: Numerical optimization. Springer (1999)

work page 1999
[36]

Peters, L.: SymbolicTracingUtils.jl.https://github.com/ JuliaGameTheoreticPlanning/SymbolicTracingUtils.jl(2025), gitHub repository, last accessed 2025-12-02

work page 2025
[37]

Advances in Neural Information Processing Systems34, 7522– 7533 (2021)

Sato, R., Tanaka, M., Takeda, A.: A gradient method for multilevel opti- mization. Advances in Neural Information Processing Systems34, 7522– 7533 (2021)

work page 2021
[38]

Mathematical Methods of Operations Research99(1), 77–114 (2024)

Shafiei, A., Kungurtsev, V., Marecek, J.: Trilevel and multilevel optimiza- tion using monotone operator theory. Mathematical Methods of Operations Research99(1), 77–114 (2024)

work page 2024
[39]

Proceedings of the national academy of sciences39(10), 1095–1100 (1953)

Shapley, L.S.: Stochastic games. Proceedings of the national academy of sciences39(10), 1095–1100 (1953)

work page 1953
[40]

Operations Research31(2), 254–272 (1983)

Sherali, H.D., Soyster, A.L., Murphy, F.H.: A stackelberg-nash-cournot equilibrium model for oligopolistic markets. Operations Research31(2), 254–272 (1983)

work page 1983
[41]

Journal of Information and Optimization Sciences2(1), 51–68 (1981)

Shi, W., Shimizu, K.: A general formulation of the bilevel programming problem. Journal of Information and Optimization Sciences2(1), 51–68 (1981)

work page 1981
[42]

(No Title) (1952)

Stackelberg, H.v.: The theory of the market economy. (No Title) (1952)

work page 1952
[43]

Advances in neural information processing systems20(2007)

Zinkevich, M., Johanson, M., Bowling, M., Piccione, C.: Regret minimiza- tion in games with incomplete information. Advances in neural information processing systems20(2007)

work page 2007

[1] [1]

Advances in Neural Information Processing Systems37, 6648–6710 (2024)

Altabaa, A., Yang, Z.: On the role of information structure in reinforcement learning for partially-observable sequential teams and games. Advances in Neural Information Processing Systems37, 6648–6710 (2024)

work page 2024

[2] [2]

Econometrica: Journal of the Econometric Society pp

Aumann, R.J.: Correlated equilibrium as an expression of bayesian ratio- nality. Econometrica: Journal of the Econometric Society pp. 1–18 (1987)

work page 1987

[3] [3]

SIAM (1998)

Başar, T., Olsder, G.J.: Dynamic noncooperative game theory. SIAM (1998)

work page 1998

[4] [4]

Automatica17(5), 749–754 (1981)

Başar, T.: Equilibrium strategies in dynamic games with multi-levels of hierarchy. Automatica17(5), 749–754 (1981)

work page 1981

[5] [5]

arXiv preprint arXiv:2405.18703 (2024)

Becker, T., Sunberg, Z.: Bridging the gap between partially observ- able stochastic games and sparse pomdp methods. arXiv preprint arXiv:2405.18703 (2024)

work page arXiv 2024

[6] [6]

An- nals of Operations Research34(1992)

Blair, C.: The computational complexity of multi-level linear programs. An- nals of Operations Research34(1992)

work page 1992

[7] [7]

Operations Research21(1), 37–44 (1973)

Bracken, J., McGill, J.T.: Mathematical programs with optimization prob- lems in the constraints. Operations Research21(1), 37–44 (1973)

work page 1973

[8] [8]

Games and Economic Behavior4(2), 182–201 (1992)

Brandenburger, A., Dekel, E., Geanakoplos, J.: Correlated equilibrium with generalized information structures. Games and Economic Behavior4(2), 182–201 (1992)

work page 1992

[9] [9]

European Journal of Operational Research1(1), 1–11 (1977)

Candler, W., Norton, R.D.: Multi-level programming: theory, algorithms, and applications. European Journal of Operational Research1(1), 1–11 (1977)

work page 1977

[10] [10]

Management Science70(10), 7308–7324 (2024)

Carvalho, M., Dragotto, G., Feijoo, F., Lodi, A., Sankaranarayanan, S.: When nash meets stackelberg. Management Science70(10), 7308–7324 (2024)

work page 2024

[11] [11]

In: IJCAI

Castiglioni, M., Marchesi, A., Gatti, N., et al.: Be a leader or become a follower: The strategy to commit to with multiple leaders. In: IJCAI. pp. 123–129 (2019)

work page 2019

[12] [12]

Annals of Operations Research 235(1), 129–153 (2015)

Chang, Y., Erera, A.L., White III, C.C.: Value of information for a leader– follower partially observed markov game. Annals of Operations Research 235(1), 129–153 (2015)

work page 2015

[13] [13]

In: Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems, 2004

Emery-Montemerlo, R., Gordon, G., Schneider, J., Thrun, S.: Approximate solutions for partially observable stochastic games with common payoffs. In: Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems, 2004. AAMAS 2004. pp. 136–143. IEEE (2004)

work page 2004

[14] [14]

Computational management science6(4), 377–397 (2009)

Faísca, N.P., Saraiva, P.M., Rustem, B., Pistikopoulos, E.N.: A multi- parametric programming approach for multilevel hierarchical and decen- tralised optimisation problems. Computational management science6(4), 377–397 (2009)

work page 2009

[15] [15]

In: Proceedings of the AAAI Confer- ence on Artificial Intelligence

Farina, G., Sandholm, T.: Model-free online learning in unknown sequential decision making problems and games. In: Proceedings of the AAAI Confer- ence on Artificial Intelligence. vol. 35, pp. 5381–5390 (2021) 18 Authors Suppressed Due to Excessive Length

work page 2021

[16] [16]

Springer Sci- ence & Business Media (2012)

Filar, J., Vrieze, K.: Competitive Markov decision processes. Springer Sci- ence & Business Media (2012)

work page 2012

[17] [17]

Journal of the Operational Research Society32(9), 783–792 (1981)

Fortuny-Amat, J., McCarl, B.A.: A representation of linear complemen- tarity problems as linear programs. Journal of the Operational Research Society32(9), 783–792 (1981)

work page 1981

[18] [18]

In: IEEE International Conference on Robotics and Automation (ICRA) (2020)

Fridovich-Keil, D., Ratner, E., Peters, L., Dragan, A.D., Tomlin, C.J.: Efficient iterative linear-quadratic approximations for nonlinear multi- player general-sum differential games. In: IEEE International Conference on Robotics and Automation (ICRA) (2020)

work page 2020

[19] [19]

In: AAAI

Hansen, E.A., Bernstein, D.S., Zilberstein, S.: Dynamic programming for partially observable stochastic games. In: AAAI. vol. 4, pp. 709–715 (2004)

work page 2004

[20] [20]

Mathematical programming32(2), 146–164 (1985)

Jeroslow, R.G.: The polynomial hierarchy and a simple model for competi- tive analysis. Mathematical programming32(2), 146–164 (1985)

work page 1985

[21] [21]

IEEE Robotics and Automation Letters (RA-L) (2024)

Khan, H., Fridovich-Keil, D.: Leadership inference for multi-agent interac- tions. IEEE Robotics and Automation Letters (RA-L) (2024)

work page 2024

[22] [22]

In: 2004 IEEE/RSJ international conference on intelligent robots and systems (IROS)(IEEE Cat

Koenig, N., Howard, A.: Design and use paradigms for gazebo, an open- source multi-robot simulator. In: 2004 IEEE/RSJ international conference on intelligent robots and systems (IROS)(IEEE Cat. No. 04CH37566). vol. 3, pp. 2149–2154. Ieee (2004)

work page 2004

[23] [23]

Advances in Neural Information Processing Systems34, 11987–11998 (2021)

Kozuno, T., Ménard, P., Munos, R., Valko, M.: Learning in two-player zero- sum partially observable markov games with perfect recall. Advances in Neural Information Processing Systems34, 11987–11998 (2021)

work page 2021

[24] [24]

Springer Science & Business Media (2002)

Krantz, S.G., Parks, H.R.: The implicit function theorem: history, theory, and applications. Springer Science & Business Media (2002)

work page 2002

[25] [25]

Set-Valued and Variational Analysis22(4), 691–720 (2014).https://doi.org/10.1007/s11228-013-0271-4

Kulkarni, A.A., Shanbhag, U.V.: A shared-constraint approach to multi- leader multi-follower games. Set-Valued and Variational Analysis22(4), 691–720 (2014).https://doi.org/10.1007/s11228-013-0271-4

work page doi:10.1007/s11228-013-0271-4 2014

[26] [26]

arXiv preprint arXiv:2404.03767 (2024)

Laine, F.: Mathematical program networks. arXiv preprint arXiv:2404.03767 (2024)

work page arXiv 2024

[27] [27]

SIAM Journal on Opti- mization (2023)

Laine, F., Fridovich-Keil, D., Chiu, C.Y., Tomlin, C.: The computation of approximate generalized feedback Nash equilibria. SIAM Journal on Opti- mization (2023)

work page 2023

[28] [28]

SIAM Journal on Optimization (2024)

Li, J., Sojoudi, S., Tomlin, C., Fridovich-Keil, D.: The Computation of Ap- proximate Feedback Stackelberg Equilibria in Multi-Player Nonlinear Con- strained Dynamic Games. SIAM Journal on Optimization (2024)

work page 2024

[29] [29]

Energy320, 135418 (2025)

Li, X., Wu, N., Lei, L.: Nash-stackelberg-nash three-layer mixed game op- timal control strategy for multi-integrated energy systems considering mul- tiple uncertainties. Energy320, 135418 (2025)

work page 2025

[30] [30]

In: Machine learning proceedings 1994, pp

Littman, M.L.: Markov games as a framework for multi-agent reinforce- ment learning. In: Machine learning proceedings 1994, pp. 157–163. Elsevier (1994)

work page 1994

[31] [31]

Advances in Neural Information Pro- cessing Systems35, 18296–18308 (2022)

Liu, Q., Szepesvári, C., Jin, C.: Sample-efficient reinforcement learning of partially observable markov games. Advances in Neural Information Pro- cessing Systems35, 18296–18308 (2022)

work page 2022

[32] [32]

Cambridge University Press (1996) Title Suppressed Due to Excessive Length 19

Luo, Z.Q., Pang, J.S., Ralph, D.: Mathematical Programs with Equilibrium Constraints. Cambridge University Press (1996) Title Suppressed Due to Excessive Length 19

work page 1996

[33] [33]

arXiv preprint arXiv:2504.15019 (2025)

Mohapatra, P.S., Reddy, P.V., Zaccour, G.: Feedback stackelberg-nash equi- libria in difference games with quasi-hierarchical interactions and inequality constraints. arXiv preprint arXiv:2504.15019 (2025)

work page arXiv 2025

[34] [34]

In: The Foundations of Price Theory Vol 4, pp

Nash, J.F.: Non-cooperative games. In: The Foundations of Price Theory Vol 4, pp. 329–340. Routledge (2024)

work page 2024

[35] [35]

Springer (1999)

Nocedal, J., Wright, S.J.: Numerical optimization. Springer (1999)

work page 1999

[36] [36]

Peters, L.: SymbolicTracingUtils.jl.https://github.com/ JuliaGameTheoreticPlanning/SymbolicTracingUtils.jl(2025), gitHub repository, last accessed 2025-12-02

work page 2025

[37] [37]

Advances in Neural Information Processing Systems34, 7522– 7533 (2021)

Sato, R., Tanaka, M., Takeda, A.: A gradient method for multilevel opti- mization. Advances in Neural Information Processing Systems34, 7522– 7533 (2021)

work page 2021

[38] [38]

Mathematical Methods of Operations Research99(1), 77–114 (2024)

Shafiei, A., Kungurtsev, V., Marecek, J.: Trilevel and multilevel optimiza- tion using monotone operator theory. Mathematical Methods of Operations Research99(1), 77–114 (2024)

work page 2024

[39] [39]

Proceedings of the national academy of sciences39(10), 1095–1100 (1953)

Shapley, L.S.: Stochastic games. Proceedings of the national academy of sciences39(10), 1095–1100 (1953)

work page 1953

[40] [40]

Operations Research31(2), 254–272 (1983)

Sherali, H.D., Soyster, A.L., Murphy, F.H.: A stackelberg-nash-cournot equilibrium model for oligopolistic markets. Operations Research31(2), 254–272 (1983)

work page 1983

[41] [41]

Journal of Information and Optimization Sciences2(1), 51–68 (1981)

Shi, W., Shimizu, K.: A general formulation of the bilevel programming problem. Journal of Information and Optimization Sciences2(1), 51–68 (1981)

work page 1981

[42] [42]

(No Title) (1952)

Stackelberg, H.v.: The theory of the market economy. (No Title) (1952)

work page 1952

[43] [43]

Advances in neural information processing systems20(2007)

Zinkevich, M., Johanson, M., Bowling, M., Piccione, C.: Regret minimiza- tion in games with incomplete information. Advances in neural information processing systems20(2007)

work page 2007