Learning in Structured Stackelberg Games

Keegan Harris; Kiriaki Fragkia; Maria-Florina Balcan

arxiv: 2504.09006 · v4 · pith:ZBIFCJHFnew · submitted 2025-04-11 · 💻 cs.GT · cs.LG

Learning in Structured Stackelberg Games

Maria-Florina Balcan , Kiriaki Fragkia , Keegan Harris This is my paper

Pith reviewed 2026-05-22 21:15 UTC · model grok-4.3

classification 💻 cs.GT cs.LG

keywords structured Stackelberg gamesStackelberg-Littlestone dimensiononline learningregret boundscontextual informationsample complexitygame theory

0 comments

The pith

A new Stackelberg-Littlestone dimension tightly characterizes the leader's instance-optimal regret in structured Stackelberg games with contextual type information.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper studies Stackelberg games in which context can predict the follower's unknown type, as occurs in security and AI safety settings. Standard complexity measures from online learning fail to describe how hard it is for the leader to learn a good policy. The authors introduce the Stackelberg-Littlestone dimension and prove it exactly determines the leader's best achievable regret in the online case, yielding an algorithm that matches this bound. Two related dimensions play the same role for sample complexity in the distributional setting. A sympathetic reader would care because the extra structure from context changes what is learnable and how quickly.

Core claim

In structured Stackelberg games, where contextual information is predictive of the follower's type, the Stackelberg-Littlestone dimension tightly characterizes the leader's instance-optimal regret and supports a provably optimal online learning algorithm; two new dimensions similarly control the sample complexity upper and lower bounds in the distributional setting.

What carries the argument

The Stackelberg-Littlestone dimension, a measure of learning complexity for the leader that accounts for sequences of contexts and the follower's best responses to the leader's actions.

If this is right

The leader obtains an online algorithm whose regret matches the Stackelberg-Littlestone dimension on every instance.
Ordinary Littlestone dimension does not suffice to characterize regret once context predicts follower type.
In the distributional setting the sample complexity is bounded above and below by two distinct new dimensions.
The contextual structure strictly improves learnability compared with the unstructured case.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The dimension could be approximated efficiently in games with compact strategy spaces or known payoff structure.
Similar dimensions might be derived for repeated leader-follower interactions beyond one-shot Stackelberg play.
Empirical validation could involve running the algorithm on synthetic security-game instances with varying context quality and measuring realized regret.

Load-bearing premise

Contextual information must carry predictive value about the follower's type; without such predictive power the new dimension collapses to ordinary Littlestone dimension and loses its advantage.

What would settle it

Construct a family of contextual Stackelberg instances, compute the Stackelberg-Littlestone dimension, run the proposed algorithm, and check whether the leader's regret stays within a small constant factor of that dimension value across all instances.

Figures

Figures reproduced from arXiv: 2504.09006 by Keegan Harris, Kiriaki Fragkia, Maria-Florina Balcan.

**Figure 1.** Figure 1: H3-shattered Littlestone Tree showing a consistent hypothesis for each root-to-leaf path. The multiclass Littlestone dimension reflects how the ability of a function class to shatter deeper trees corresponds to greater complexity in the class.4 This definition is due to Daniely et al. [20] and it generalizes the classic definition of the Littlestone dimension for binary classification [31]. Definition 3.5… view at source ↗

**Figure 2.** Figure 2: Follower types as a function of the context in the construction of Theorem [PITH_FULL_IMAGE:figures/full_fig_p007_2.png] view at source ↗

**Figure 3.** Figure 3: H3-shattered SL-Tree showing the consistent hypothesis for each root-to-leaf path. Leaf nodes have weight 0, nodes at depth 1 have weight 1/2 and the root node has weight 1/2 + 2/3. Definition 3.9 (Stackelberg Littlestone (SL) dimension). The SL dimension of a hypothesis class H for Stackelberg game G is defined as SLdimG(H) := sup{κ ≥ 0 : there exists an SL tree shattered by H under G with root node weigh… view at source ↗

**Figure 4.** Figure 4: Example construction of an H-shattered Littlestone tree. Each node represents a context in Z = {1, 2, 3, 4} and each edge corresponds to a follower type. (a) (b) [PITH_FULL_IMAGE:figures/full_fig_p021_4.png] view at source ↗

**Figure 5.** Figure 5: H-shattered SL-Tree with (a) z = 1 as the root node and (b) z = 2 as the root node. that a follower of type f (i) best-responds to all x ∈ Xz(f (i) , af ) by playing action af , that is, Xz(f (i) , af ) = {x ∈ X : bf (i) (z, x) = af } Notice that any possible follower best-response region is (1) a subset of ∆(A) and (2) the intersection of finitely-many halfspaces. Therefore, every Xz(f (i) , af ) is conve… view at source ↗

read the original abstract

We initiate the study of structured Stackelberg games, a novel form of strategic interaction between a leader and a follower where contextual information can be predictive of the follower's (unknown) type. Motivated by applications such as security games and AI safety, we show how this additional structure can help the leader learn a utility-maximizing policy in both the online and distributional settings. In the online setting, we first prove that standard learning-theoretic measures of complexity do not characterize the difficulty of the leader's learning task. Notably, we find that there exists a learning-theoretic measure of complexity, analogous to the Littlestone dimension in online classification, that tightly characterizes the leader's instance-optimal regret. We term this the Stackelberg-Littlestone dimension, and leverage it to provide a provably optimal online learning algorithm. In the distributional setting, we provide analogous results by showing that two new dimensions control the sample complexity upper- and lower-bound.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper introduces the Stackelberg-Littlestone dimension to tightly bound the leader's regret in contextual Stackelberg games and gives matching algorithms.

read the letter

The main point is that this paper defines the Stackelberg-Littlestone dimension to measure how context predicts the follower's type in these leader-follower games. It shows this new measure gives tight upper and lower bounds on the leader's instance-optimal regret in the online setting, along with two distributional analogs that control sample complexity. Standard dimensions like Littlestone or VC do not work here, which is a useful negative result before they build the positive one. The full paper supplies the formal definitions, the online algorithm whose regret scales with the dimension, and the matching lower-bound constructions, so the tightness claims hold up without obvious circularity or hidden assumptions on feedback or best responses. The authors credit the analogy to existing online learning work and keep the framing clean. A minor soft spot is that the practical gain depends on how informative the context really is; if context is weak or uninformative about type, the new dimension reduces to ordinary Littlestone and the extra structure adds little. The motivating applications in security and AI safety are noted but not explored with concrete data or simulations. This is for researchers in algorithmic game theory and online learning who want tools for strategic settings with partial information. A reader working on regret bounds or sample complexity in games would get direct value from the new measures and algorithms. It shows clear thinking and solid engagement with the literature. I would bring it to a reading group to walk through the dimension definition and bounds. It deserves peer review because the results are formally grounded and address a clear gap.

Referee Report

0 major / 2 minor

Summary. The paper initiates the study of structured Stackelberg games in which contextual information can predict the follower's unknown type. It shows that standard complexity measures such as the Littlestone dimension fail to characterize the leader's learning task, introduces the Stackelberg-Littlestone dimension that tightly characterizes instance-optimal regret in the online setting, supplies a matching optimal algorithm, and defines two new distributional dimensions that govern sample-complexity upper and lower bounds.

Significance. If the claimed tight characterizations hold, the work supplies the first instance-optimal learning-theoretic analysis for contextual Stackelberg games, extending classical online-learning dimensions to a strategic setting with explicit type-prediction structure. The matching upper/lower bounds and explicit optimal algorithm constitute a clear technical contribution with direct relevance to security games and AI-safety applications. The reader's abstract-only concern about missing proofs does not apply once the full manuscript is examined, as it contains the formal definitions, algorithm, and lower-bound constructions.

minor comments (2)

[Abstract] Abstract: the two distributional dimensions are referred to only as 'two new dimensions' without names or brief definitions; supplying their names would improve readability.
[§3] The manuscript would benefit from an explicit statement (perhaps in §3 or §4) of the precise feedback model assumed for the leader (e.g., whether the follower’s realized type or only the best response is observed).

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive summary, significance assessment, and recommendation of minor revision. No major comments were provided in the report, so we have no specific points to address point-by-point. We are pleased that the full manuscript addresses any concerns about missing proofs.

Circularity Check

0 steps flagged

New dimension defined by direct analogy; characterization results remain independent

full rationale

The paper defines the Stackelberg-Littlestone dimension by explicit analogy to the classical Littlestone dimension to capture context-dependent follower types in Stackelberg games, then proves that an online algorithm achieves regret scaling with this dimension and that a matching lower bound exists. This is a standard learning-theoretic characterization and does not reduce any claimed bound to a fitted parameter, self-citation chain, or definitional tautology. No load-bearing step collapses the regret result to its own inputs by construction; the new dimension supplies independent structure when context is predictive.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 2 invented entities

The central claims rest on the existence of new complexity measures whose definitions and properties are not supplied in the abstract; the work inherits standard online-learning and game-theoretic background but adds new objects whose independence from prior literature cannot be verified from the given text.

axioms (1)

standard math Standard assumptions of online learning theory and Stackelberg game models
The paper builds directly on the Littlestone dimension and classic Stackelberg setup without stating additional ad-hoc axioms in the abstract.

invented entities (2)

Stackelberg-Littlestone dimension no independent evidence
purpose: Measure that tightly characterizes the leader's instance-optimal regret
Newly introduced quantity defined by analogy to Littlestone dimension; no independent evidence supplied in abstract.
Two new distributional dimensions no independent evidence
purpose: Control sample-complexity upper and lower bounds
Newly introduced quantities for the distributional setting; no independent evidence supplied in abstract.

pith-pipeline@v0.9.0 · 5692 in / 1523 out tokens · 36729 ms · 2026-05-22T21:15:50.937476+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We term this the Stackelberg-Littlestone dimension... tightly characterizes the leader's instance-optimal regret (Theorem 3.10, Theorem 3.11).

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

45 extracted references · 45 canonical work pages

[1]

Strategic littlestone dimension: Improved bounds on online strategic classification.Advances in Neural Information Processing Systems, 37:101696–101724, 2024

Saba Ahmadi, Kunhe Yang, and Hanrui Zhang. Strategic littlestone dimension: Improved bounds on online strategic classification.Advances in Neural Information Processing Systems, 37:101696–101724, 2024

work page 2024
[2]

A Theory of PAC Learnability of Partial Concept Classes

Noga Alon, Steve Hanneke, Ron Holzman, and Shay Moran. A Theory of PAC Learnability of Partial Concept Classes. In2021 IEEE 62nd Annual Symposium on Foundations of Computer Science (FOCS), 2022

work page 2022
[3]

Bartlett.Neural Network Learning: Theoretical Foundations

Martin Anthony and Peter L. Bartlett.Neural Network Learning: Theoretical Foundations. Cambridge University Press, 1999

work page 1999
[4]

Optimal learners for realizable regression: Pac learning and online learning

Idan Attias, Steve Hanneke, Alkis Kalavasis, Amin Karbasi, and Grigoris Velegkas. Optimal learners for realizable regression: Pac learning and online learning. InProceedings of the 37th International Conference on Neural Information Processing Systems, 2023

work page 2023
[5]

The Sample Complexity of Stackelberg Games

Francesco Bacchiocchi, Matteo Bollini, Matteo Castiglioni, Alberto Marchesi, and Nicola Gatti. The Sample Complexity of Stackelberg Games. InInternational Conference on Artificial Intelligence and Statistics, pages 2053–2061, 2025

work page 2053
[6]

10-806 Foundations of Machine Learning and Data Science Lecture

Maria-Florina Balcan. 10-806 Foundations of Machine Learning and Data Science Lecture

work page
[7]

URLhttps://www.cs.cmu.edu/~ninamf/ML11/lect0908.pdf

2011. URLhttps://www.cs.cmu.edu/~ninamf/ML11/lect0908.pdf

work page 2011
[8]

Data-Driven Algorithm Design

Maria-Florina Balcan. Data-Driven Algorithm Design. In Tim Roughgarden, editor, Beyond the Worst-Case Analysis of Algorithms, pages 626–645. Cambridge University Press, Cambridge, 2021

work page 2021
[9]

Commitment without regrets: Online learning in Stackelberg security games

Maria-Florina Balcan, Avrim Blum, Nika Haghtalab, and Ariel D Procaccia. Commitment without regrets: Online learning in Stackelberg security games. InProceedings of the sixteenth ACM Conference on Economics and Computation, pages 61–78, 2015

work page 2015
[10]

Dispersion for data-driven algorithm design, online learning, and private optimization

Maria-Florina Balcan, Travis Dick, and Ellen Vitercik. Dispersion for data-driven algorithm design, online learning, and private optimization. In2018 IEEE 59th Annual Symposium on Foundations of Computer Science (FOCS), 2018

work page 2018
[11]

A general theory of sample complexity for multi-item profit maximization

Maria-Florina Balcan, Tuomas Sandholm, and Ellen Vitercik. A general theory of sample complexity for multi-item profit maximization. InProceedings of the 2018 ACM Conference on Economics and Computation, pages 173–174, 2018

work page 2018
[12]

Semi-bandit optimization in the dispersed setting

Maria-Florina Balcan, Travis Dick, and Wesley Pegden. Semi-bandit optimization in the dispersed setting. InConference on Uncertainty in Artificial Intelligence, pages 909–918, 2020

work page 2020
[13]

Learning- to-learn non-convex piecewise-lipschitz functions

Maria-Florina Balcan, Mikhail Khodak, Dravyansh Sharma, and Ameet Talwalkar. Learning- to-learn non-convex piecewise-lipschitz functions. InProceedings of the 35th International Conference on Neural Information Processing Systems, 2021

work page 2021
[14]

How much data is sufficient to learn high-performing algorithms?J

Maria-Florina Balcan, Dan Deblasio, Travis Dick, Carl Kingsford, Tuomas Sandholm, and Ellen Vitercik. How much data is sufficient to learn high-performing algorithms?J. ACM,

work page
[15]

doi: 10.1145/3676278

work page doi:10.1145/3676278
[16]

Generalization guarantees for multi-item profit maximization: Pricing, auctions, and randomized mechanisms.Operations Research, 73(2):648–663, 2025

Maria-Florina Balcan, Tuomas Sandholm, and Ellen Vitercik. Generalization guarantees for multi-item profit maximization: Pricing, auctions, and randomized mechanisms.Operations Research, 73(2):648–663, 2025. 13

work page 2025
[17]

Nearly-optimal bandit learning in Stackelberg games with side information

Maria-Florina Balcan, Martino Bernasconi, Matteo Castiglioni, Andrea Celli, Keegan Harris, and Zhiwei Steven Wu. Nearly-optimal bandit learning in Stackelberg games with side information. InInternational Conference on Learning Representations (ICLR), 2026

work page 2026
[18]

Learning optimal commitment to overcome insecurity.Advances in Neural Information Processing Systems, 27, 2014

Avrim Blum, Nika Haghtalab, and Ariel D Procaccia. Learning optimal commitment to overcome insecurity.Advances in Neural Information Processing Systems, 27, 2014

work page 2014
[19]

A characterization of multiclass learnability

Nataly Brukhim, Daniel Carmon, Irit Dinur, Shay Moran, and Amir Yehudayoff. A characterization of multiclass learnability. In2022 IEEE 63rd Annual Symposium on Foundations of Computer Science (FOCS), pages 943–955. IEEE, 2022

work page 2022
[20]

Computing the optimal strategy to commit to

Vincent Conitzer and Tuomas Sandholm. Computing the optimal strategy to commit to. In Proceedings of the 7th ACM conference on Electronic commerce, pages 82–90, 2006

work page 2006
[21]

Optimal learners for multiclass problems

Amit Daniely and Shai Shalev-Shwartz. Optimal learners for multiclass problems. In Conference on Learning Theory, pages 287–316. PMLR, 2014

work page 2014
[22]

Multiclass learn- ability and the erm principle

Amit Daniely, Sivan Sabato, Shai Ben-David, and Shai Shalev-Shwartz. Multiclass learn- ability and the erm principle. InProceedings of the 24th Annual Conference on Learning Theory. PMLR, 2011

work page 2011
[23]

Fast rates for nonparametric online learning: from realizability to learning in games

Constantinos Daskalakis and Noah Golowich. Fast rates for nonparametric online learning: from realizability to learning in games. InProceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing, pages 846–859, 2022

work page 2022
[24]

Learning in auctions: Regret is hard, envy is easy

Constantinos Daskalakis and Vasilis Syrgkanis. Learning in auctions: Regret is hard, envy is easy. In2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS), pages 219–228. IEEE, 2016

work page 2016
[25]

Universal donsker classes and metric entropy

Richard M Dudley. Universal donsker classes and metric entropy. InSelected Works of RM Dudley, pages 345–365. Springer, 2010

work page 2010
[26]

Schapire

Yoav Freund and Robert E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting.Journal of Computer and System Sciences, 55(1):119–139, 1997

work page 1997
[27]

Multiclass online learning and uniform convergence

Steve Hanneke, Shay Moran, Vinod Raman, Unique Subedi, and Ambuj Tewari. Multiclass online learning and uniform convergence. InThe Thirty Sixth Annual Conference on Learning Theory, pages 5682–5696. PMLR, 2023

work page 2023
[28]

Regret minimization in Stackelberg games with side information

Keegan Harris, Steven Wu, and Maria Florina Balcan. Regret minimization in Stackelberg games with side information. InThe Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024

work page 2024
[29]

Bayesian persuasion.American Economic Review, 101(6):2590–2615, 2011

Emir Kamenica and Matthew Gentzkow. Bayesian persuasion.American Economic Review, 101(6):2590–2615, 2011

work page 2011
[30]

Poaching detection technologies—a survey.Sensors, 18(5), 2018

Jacob Kamminga, Eyuel Ayele, Nirvana Meratnia, and Paul Havinga. Poaching detection technologies—a survey.Sensors, 18(5), 2018. ISSN 1424-8220. doi: 10.3390/s18051474

work page doi:10.3390/s18051474 2018
[31]

Complexity of computing optimal stackelberg strategies in security resource allocation games

Dmytro Korzhyk, Vincent Conitzer, and Ronald Parr. Complexity of computing optimal stackelberg strategies in security resource allocation games. InProceedings of the AAAI Conference on Artificial Intelligence, volume 24, pages 805–810, 2010

work page 2010
[32]

Learning and approximating the optimal strategy to commit to

Joshua Letchford, Vincent Conitzer, and Kamesh Munagala. Learning and approximating the optimal strategy to commit to. InAlgorithmic Game Theory: Second International Symposium, pages 250–262. Springer, 2009. 14

work page 2009
[33]

Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm.Machine learning, 2:285–318, 1988

Nick Littlestone. Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm.Machine learning, 2:285–318, 1988

work page 1988
[34]

Chasing moving targets with online self-play reinforcement learning for safer language models, 2025

Mickel Liu, Liwei Jiang, Yancheng Liang, Simon Shaolei Du, Yejin Choi, Tim Althoff, and Natasha Jaques. Chasing moving targets with online self-play reinforcement learning for safer language models, 2025. URLhttps://arxiv.org/abs/2506.07468

work page arXiv 2025
[35]

On learning sets and functions.Machine Learning, 4:67–97, 1989

Balas K Natarajan. On learning sets and functions.Machine Learning, 4:67–97, 1989

work page 1989
[36]

Learning optimal strategies to commit to

Binghui Peng, Weiran Shen, Pingzhong Tang, and Song Zuo. Learning optimal strategies to commit to. InProceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 2149–2156, 2019

work page 2019
[37]

Springer Science & Business Media, 2012

David Pollard.Convergence of stochastic processes. Springer Science & Business Media, 2012

work page 2012
[38]

Cambridge University Press, 2014

Shai Shalev-Shwartz and Shai Ben-David.Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014. ISBN 1107057132

work page 2014
[39]

Learning piecewise lipschitz functions in changing environments

Dravyansh Sharma, Maria-Florina Balcan, and Travis Dick. Learning piecewise lipschitz functions in changing environments. InInternational Conference on Artificial Intelligence and Statistics, 2019

work page 2019
[40]

Red teaming llms: A stackelberg game approach to AI safety

Samit Shivadekar. Red teaming llms: A stackelberg game approach to AI safety. In2025 4th International Conference on Innovative Mechanisms for Industry Applications (ICIMIA), pages 1147–1154. IEEE, 2025

work page 2025
[41]

Stackelberg security games: Looking beyond a decade of success

Arunesh Sinha, Fei Fang, Bo An, Christopher Kiekintveld, and Milind Tambe. Stackelberg security games: Looking beyond a decade of success. InInternational Joint Conference on Artificial Intelligence, 2018

work page 2018
[42]

The effectiveness of stackelberg strategies and tolls for network congestion games

Chaitanya Swamy. The effectiveness of stackelberg strategies and tolls for network congestion games. InProceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms, pages 1133–1142, 2007

work page 2007
[43]

A theory of the learnable.Communications of the ACM, 27(11):1134–1142, 1984

Leslie G Valiant. A theory of the learnable.Communications of the ACM, 27(11):1134–1142, 1984

work page 1984
[44]

Springer, Vienna, Austria, 1934

Heinrich von Stackelberg.Marktform und Gleichgewicht. Springer, Vienna, Austria, 1934

work page 1934
[45]

if-then” rules, specifically of the form “ifℓ1 then k1, else if ℓ2 then k2, else if ℓ3 then k3, ..., else kfinal

Jing Wang, Jie Shen, Dean Foster, Zohar Karnin, and Jeremy C Weiss. Optimal budgeted adaptation of large language models.arXiv preprint arXiv:2602.00952, 2026. 15 A Applications beyond Stackelberg games Our results also apply to the problems of learning to bid in simultaneous second-price auctions with side information and Bayesian persuasion with both pu...

work page arXiv 2026

[1] [1]

Strategic littlestone dimension: Improved bounds on online strategic classification.Advances in Neural Information Processing Systems, 37:101696–101724, 2024

Saba Ahmadi, Kunhe Yang, and Hanrui Zhang. Strategic littlestone dimension: Improved bounds on online strategic classification.Advances in Neural Information Processing Systems, 37:101696–101724, 2024

work page 2024

[2] [2]

A Theory of PAC Learnability of Partial Concept Classes

Noga Alon, Steve Hanneke, Ron Holzman, and Shay Moran. A Theory of PAC Learnability of Partial Concept Classes. In2021 IEEE 62nd Annual Symposium on Foundations of Computer Science (FOCS), 2022

work page 2022

[3] [3]

Bartlett.Neural Network Learning: Theoretical Foundations

Martin Anthony and Peter L. Bartlett.Neural Network Learning: Theoretical Foundations. Cambridge University Press, 1999

work page 1999

[4] [4]

Optimal learners for realizable regression: Pac learning and online learning

Idan Attias, Steve Hanneke, Alkis Kalavasis, Amin Karbasi, and Grigoris Velegkas. Optimal learners for realizable regression: Pac learning and online learning. InProceedings of the 37th International Conference on Neural Information Processing Systems, 2023

work page 2023

[5] [5]

The Sample Complexity of Stackelberg Games

Francesco Bacchiocchi, Matteo Bollini, Matteo Castiglioni, Alberto Marchesi, and Nicola Gatti. The Sample Complexity of Stackelberg Games. InInternational Conference on Artificial Intelligence and Statistics, pages 2053–2061, 2025

work page 2053

[6] [6]

10-806 Foundations of Machine Learning and Data Science Lecture

Maria-Florina Balcan. 10-806 Foundations of Machine Learning and Data Science Lecture

work page

[7] [7]

URLhttps://www.cs.cmu.edu/~ninamf/ML11/lect0908.pdf

2011. URLhttps://www.cs.cmu.edu/~ninamf/ML11/lect0908.pdf

work page 2011

[8] [8]

Data-Driven Algorithm Design

Maria-Florina Balcan. Data-Driven Algorithm Design. In Tim Roughgarden, editor, Beyond the Worst-Case Analysis of Algorithms, pages 626–645. Cambridge University Press, Cambridge, 2021

work page 2021

[9] [9]

Commitment without regrets: Online learning in Stackelberg security games

Maria-Florina Balcan, Avrim Blum, Nika Haghtalab, and Ariel D Procaccia. Commitment without regrets: Online learning in Stackelberg security games. InProceedings of the sixteenth ACM Conference on Economics and Computation, pages 61–78, 2015

work page 2015

[10] [10]

Dispersion for data-driven algorithm design, online learning, and private optimization

Maria-Florina Balcan, Travis Dick, and Ellen Vitercik. Dispersion for data-driven algorithm design, online learning, and private optimization. In2018 IEEE 59th Annual Symposium on Foundations of Computer Science (FOCS), 2018

work page 2018

[11] [11]

A general theory of sample complexity for multi-item profit maximization

Maria-Florina Balcan, Tuomas Sandholm, and Ellen Vitercik. A general theory of sample complexity for multi-item profit maximization. InProceedings of the 2018 ACM Conference on Economics and Computation, pages 173–174, 2018

work page 2018

[12] [12]

Semi-bandit optimization in the dispersed setting

Maria-Florina Balcan, Travis Dick, and Wesley Pegden. Semi-bandit optimization in the dispersed setting. InConference on Uncertainty in Artificial Intelligence, pages 909–918, 2020

work page 2020

[13] [13]

Learning- to-learn non-convex piecewise-lipschitz functions

Maria-Florina Balcan, Mikhail Khodak, Dravyansh Sharma, and Ameet Talwalkar. Learning- to-learn non-convex piecewise-lipschitz functions. InProceedings of the 35th International Conference on Neural Information Processing Systems, 2021

work page 2021

[14] [14]

How much data is sufficient to learn high-performing algorithms?J

Maria-Florina Balcan, Dan Deblasio, Travis Dick, Carl Kingsford, Tuomas Sandholm, and Ellen Vitercik. How much data is sufficient to learn high-performing algorithms?J. ACM,

work page

[15] [15]

doi: 10.1145/3676278

work page doi:10.1145/3676278

[16] [16]

Generalization guarantees for multi-item profit maximization: Pricing, auctions, and randomized mechanisms.Operations Research, 73(2):648–663, 2025

Maria-Florina Balcan, Tuomas Sandholm, and Ellen Vitercik. Generalization guarantees for multi-item profit maximization: Pricing, auctions, and randomized mechanisms.Operations Research, 73(2):648–663, 2025. 13

work page 2025

[17] [17]

Nearly-optimal bandit learning in Stackelberg games with side information

Maria-Florina Balcan, Martino Bernasconi, Matteo Castiglioni, Andrea Celli, Keegan Harris, and Zhiwei Steven Wu. Nearly-optimal bandit learning in Stackelberg games with side information. InInternational Conference on Learning Representations (ICLR), 2026

work page 2026

[18] [18]

Learning optimal commitment to overcome insecurity.Advances in Neural Information Processing Systems, 27, 2014

Avrim Blum, Nika Haghtalab, and Ariel D Procaccia. Learning optimal commitment to overcome insecurity.Advances in Neural Information Processing Systems, 27, 2014

work page 2014

[19] [19]

A characterization of multiclass learnability

Nataly Brukhim, Daniel Carmon, Irit Dinur, Shay Moran, and Amir Yehudayoff. A characterization of multiclass learnability. In2022 IEEE 63rd Annual Symposium on Foundations of Computer Science (FOCS), pages 943–955. IEEE, 2022

work page 2022

[20] [20]

Computing the optimal strategy to commit to

Vincent Conitzer and Tuomas Sandholm. Computing the optimal strategy to commit to. In Proceedings of the 7th ACM conference on Electronic commerce, pages 82–90, 2006

work page 2006

[21] [21]

Optimal learners for multiclass problems

Amit Daniely and Shai Shalev-Shwartz. Optimal learners for multiclass problems. In Conference on Learning Theory, pages 287–316. PMLR, 2014

work page 2014

[22] [22]

Multiclass learn- ability and the erm principle

Amit Daniely, Sivan Sabato, Shai Ben-David, and Shai Shalev-Shwartz. Multiclass learn- ability and the erm principle. InProceedings of the 24th Annual Conference on Learning Theory. PMLR, 2011

work page 2011

[23] [23]

Fast rates for nonparametric online learning: from realizability to learning in games

Constantinos Daskalakis and Noah Golowich. Fast rates for nonparametric online learning: from realizability to learning in games. InProceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing, pages 846–859, 2022

work page 2022

[24] [24]

Learning in auctions: Regret is hard, envy is easy

Constantinos Daskalakis and Vasilis Syrgkanis. Learning in auctions: Regret is hard, envy is easy. In2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS), pages 219–228. IEEE, 2016

work page 2016

[25] [25]

Universal donsker classes and metric entropy

Richard M Dudley. Universal donsker classes and metric entropy. InSelected Works of RM Dudley, pages 345–365. Springer, 2010

work page 2010

[26] [26]

Schapire

Yoav Freund and Robert E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting.Journal of Computer and System Sciences, 55(1):119–139, 1997

work page 1997

[27] [27]

Multiclass online learning and uniform convergence

Steve Hanneke, Shay Moran, Vinod Raman, Unique Subedi, and Ambuj Tewari. Multiclass online learning and uniform convergence. InThe Thirty Sixth Annual Conference on Learning Theory, pages 5682–5696. PMLR, 2023

work page 2023

[28] [28]

Regret minimization in Stackelberg games with side information

Keegan Harris, Steven Wu, and Maria Florina Balcan. Regret minimization in Stackelberg games with side information. InThe Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024

work page 2024

[29] [29]

Bayesian persuasion.American Economic Review, 101(6):2590–2615, 2011

Emir Kamenica and Matthew Gentzkow. Bayesian persuasion.American Economic Review, 101(6):2590–2615, 2011

work page 2011

[30] [30]

Poaching detection technologies—a survey.Sensors, 18(5), 2018

Jacob Kamminga, Eyuel Ayele, Nirvana Meratnia, and Paul Havinga. Poaching detection technologies—a survey.Sensors, 18(5), 2018. ISSN 1424-8220. doi: 10.3390/s18051474

work page doi:10.3390/s18051474 2018

[31] [31]

Complexity of computing optimal stackelberg strategies in security resource allocation games

Dmytro Korzhyk, Vincent Conitzer, and Ronald Parr. Complexity of computing optimal stackelberg strategies in security resource allocation games. InProceedings of the AAAI Conference on Artificial Intelligence, volume 24, pages 805–810, 2010

work page 2010

[32] [32]

Learning and approximating the optimal strategy to commit to

Joshua Letchford, Vincent Conitzer, and Kamesh Munagala. Learning and approximating the optimal strategy to commit to. InAlgorithmic Game Theory: Second International Symposium, pages 250–262. Springer, 2009. 14

work page 2009

[33] [33]

Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm.Machine learning, 2:285–318, 1988

Nick Littlestone. Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm.Machine learning, 2:285–318, 1988

work page 1988

[34] [34]

Chasing moving targets with online self-play reinforcement learning for safer language models, 2025

Mickel Liu, Liwei Jiang, Yancheng Liang, Simon Shaolei Du, Yejin Choi, Tim Althoff, and Natasha Jaques. Chasing moving targets with online self-play reinforcement learning for safer language models, 2025. URLhttps://arxiv.org/abs/2506.07468

work page arXiv 2025

[35] [35]

On learning sets and functions.Machine Learning, 4:67–97, 1989

Balas K Natarajan. On learning sets and functions.Machine Learning, 4:67–97, 1989

work page 1989

[36] [36]

Learning optimal strategies to commit to

Binghui Peng, Weiran Shen, Pingzhong Tang, and Song Zuo. Learning optimal strategies to commit to. InProceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 2149–2156, 2019

work page 2019

[37] [37]

Springer Science & Business Media, 2012

David Pollard.Convergence of stochastic processes. Springer Science & Business Media, 2012

work page 2012

[38] [38]

Cambridge University Press, 2014

Shai Shalev-Shwartz and Shai Ben-David.Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014. ISBN 1107057132

work page 2014

[39] [39]

Learning piecewise lipschitz functions in changing environments

Dravyansh Sharma, Maria-Florina Balcan, and Travis Dick. Learning piecewise lipschitz functions in changing environments. InInternational Conference on Artificial Intelligence and Statistics, 2019

work page 2019

[40] [40]

Red teaming llms: A stackelberg game approach to AI safety

Samit Shivadekar. Red teaming llms: A stackelberg game approach to AI safety. In2025 4th International Conference on Innovative Mechanisms for Industry Applications (ICIMIA), pages 1147–1154. IEEE, 2025

work page 2025

[41] [41]

Stackelberg security games: Looking beyond a decade of success

Arunesh Sinha, Fei Fang, Bo An, Christopher Kiekintveld, and Milind Tambe. Stackelberg security games: Looking beyond a decade of success. InInternational Joint Conference on Artificial Intelligence, 2018

work page 2018

[42] [42]

The effectiveness of stackelberg strategies and tolls for network congestion games

Chaitanya Swamy. The effectiveness of stackelberg strategies and tolls for network congestion games. InProceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms, pages 1133–1142, 2007

work page 2007

[43] [43]

A theory of the learnable.Communications of the ACM, 27(11):1134–1142, 1984

Leslie G Valiant. A theory of the learnable.Communications of the ACM, 27(11):1134–1142, 1984

work page 1984

[44] [44]

Springer, Vienna, Austria, 1934

Heinrich von Stackelberg.Marktform und Gleichgewicht. Springer, Vienna, Austria, 1934

work page 1934

[45] [45]

if-then” rules, specifically of the form “ifℓ1 then k1, else if ℓ2 then k2, else if ℓ3 then k3, ..., else kfinal

Jing Wang, Jie Shen, Dean Foster, Zohar Karnin, and Jeremy C Weiss. Optimal budgeted adaptation of large language models.arXiv preprint arXiv:2602.00952, 2026. 15 A Applications beyond Stackelberg games Our results also apply to the problems of learning to bid in simultaneous second-price auctions with side information and Bayesian persuasion with both pu...

work page arXiv 2026