Learning vs. Optimizing Bidders in Budgeted Auctions

Balasubramanian Sivan; Éva Tardos; Giannis Fikioris

arXiv: 2604.08517 · v1 · submitted 2026-04-09 · 💻 cs.GT



Pith reviewed 2026-05-10 17:03 UTC · model grok-4.3


keywords: budgeted auctions, Stackelberg equilibrium, proportional controller, repeated auctions, second-price auctions, budget constraints, strategic manipulation, non-manipulability


The pith

A learner using a proportional controller to pace bids ensures that an optimizer cannot obtain more utility than it would in the Budgeted Stackelberg Equilibrium.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper studies repeated second-price auctions in which both a learning bidder and a strategic optimizer face budget constraints that span multiple rounds. It first generalizes the Stackelberg equilibrium to a budgeted version and proves that, for a k-dimensional budget constraint, the optimizer's optimal strategy decomposes into at most k+1 distinct phases, each possibly using a different mixed strategy. When the learner manages spending with a standard proportional controller, the optimizer's resulting utility is provably no higher than its value in this budgeted equilibrium. The result shows that a common practical heuristic for budget pacing remains robust even when rounds are linked by shared budgets.

Core claim

We generalize the classic Stackelberg equilibrium to the Budgeted Stackelberg Equilibrium for repeated second-price auctions in a Bayesian setting with cross-round budget constraints. The optimizer's optimal strategy requires time-multiplexing into up to k+1 phases, each possibly employing a distinct mixed strategy. When the learner employs a standard Proportional controller to pace bids, the optimizer's utility is upper bounded by their objective value in the Budgeted Stackelberg Equilibrium baseline.

What carries the argument

The Budgeted Stackelberg Equilibrium, which extends the classic version by incorporating cross-round budget constraints and allowing the optimizer to employ time-multiplexed mixed strategies across up to k+1 phases.
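To see why even a single budget constraint ($k = 1$) can force time-multiplexing, here is a toy linear-programming sketch; it is an illustration of the counting idea, not the paper's construction or proof. It assumes the optimizer chooses fractions of time $\alpha_j$ to spend in candidate strategies with hypothetical per-round utility $u_j$ and spend $c_j$, subject to average spend at most $\rho$. Such an LP has an optimal basic solution with at most $k + 1 = 2$ nonzero fractions, mirroring the $k+1$ phase bound. The function names and the strategy numbers are invented for illustration.

```python
import itertools
import random


def best_two_phase(strategies, rho):
    """Best average per-round utility using at most two phases (one budget constraint).

    strategies: list of (utility, spend) pairs; hypothetical values, not from the paper.
    rho: per-round budget, so the time-averaged spend must be <= rho.
    Returns (value, phases), where phases is a list of (index, time_fraction).
    """
    best_value, best_phases = float("-inf"), []
    # One phase: a single strategy whose spend already fits the budget.
    for j, (u, c) in enumerate(strategies):
        if c <= rho and u > best_value:
            best_value, best_phases = u, [(j, 1.0)]
    # Two phases: a fraction a of the time in a cheap strategy i and 1 - a in an
    # expensive strategy j, chosen so that the budget binds: a*ci + (1-a)*cj = rho.
    for (i, (ui, ci)), (j, (uj, cj)) in itertools.permutations(enumerate(strategies), 2):
        if ci <= rho < cj:
            a = (cj - rho) / (cj - ci)
            value = a * ui + (1 - a) * uj
            if value > best_value:
                best_value, best_phases = value, [(i, a), (j, 1 - a)]
    return best_value, best_phases


def random_mixtures_never_beat(strategies, rho, value, trials=10_000, seed=0):
    """Sanity check: no random budget-feasible mixture over ALL strategies beats `value`."""
    rng = random.Random(seed)
    for _ in range(trials):
        w = [rng.random() for _ in strategies]
        total = sum(w)
        w = [x / total for x in w]
        spend = sum(a * c for a, (_, c) in zip(w, strategies))
        utility = sum(a * u for a, (u, _) in zip(w, strategies))
        if spend <= rho and utility > value + 1e-12:
            return False
    return True


strategies = [(0.2, 0.1), (1.0, 1.0), (0.5, 0.6)]  # hypothetical (utility, spend) pairs
value, phases = best_two_phase(strategies, rho=0.5)
print(value, phases)
print(random_mixtures_never_beat(strategies, 0.5, value))
```

In this toy instance only strategy 0 fits the budget on its own (utility 0.2), but splitting time between strategies 0 and 1 so that the budget binds reaches 5/9, and no three-strategy mixture does better.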

Load-bearing premise

The learner follows the proportional controller rule exactly, while both agents know the value distributions and face strict budget constraints that link spending across rounds.

What would settle it

A concrete bidding strategy for the optimizer that achieves strictly higher utility than the Budgeted Stackelberg Equilibrium value while the learner continues to follow the proportional controller.

Figures

Figures reproduced from arXiv: 2604.08517 by Balasubramanian Sivan, Éva Tardos, Giannis Fikioris.

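**Figure 3.** The functions $U_{\mathrm{O}}(\cdot)$, $P_{\mathrm{O}}(\cdot)$, $P_{\mathrm{L}}(\cdot)$ of the example of Section D (left), and the optimizer's optimal value when she uses a single distribution of actions with a budget of $\rho$ (right). Excerpt from the accompanying text: "… distribution over $\hat{v}_{\mathrm{O}} \in \{\tfrac{1}{2}, 1, 2\}$, the optimization problem we have to solve is $\widehat{\mathrm{OPT}}(\rho) = \max_{\lambda, q_1, q_2, q_3 \ge 0,\; q_1 + q_2 + q_3 = 1} \; p_1 U_{\mathrm{O}}(\tfrac{1}{2}) + p_2 U_{\mathrm{O}}(1) + p_3 U_{\mathrm{O}}(2)$ such that $\lambda\big(p_1 P_{\mathrm{O}}(\tfrac{1}{2}) + p_2 P_{\mathrm{O}}(1) + p_3 P_{\mathrm{O}}(2)\big) \le \rho$, $\lambda\,(p_1 P_{\mathrm{L}}(\ldots$ …"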

**Figure 4.** The dual function $f(\lambda, g)$ of the example of Section E, along with the induced function $g^\star(\lambda)$, for $\delta = 0.05$, plotted for small and large $\lambda$. We note that $g^\star(\lambda)$ goes to 0 at a very slow rate, indicating the optimizer's ability to gain substantial utility as $\lambda^{(t)}$ increases in the repeated game. Excerpt from the accompanying text: "Consider that $\hat{V}_{\mathrm{O}} \in [0, 1/2]$ with probability $q_1$, achieving expected value $\hat{v}_{\mathrm{O}}$ conditioned on that interval. The…"

**Figure 5.** The experiments of Section E.1. For $10^5 \le T \le 10^7$ (x-axis) and different values of $\delta$ (different lines), we compare the optimizer's reward when using strategy 1 (dotted lines that collapse to one line) and strategy 2 (solid lines).

Original abstract

The study of repeated interactions between a learner and a utility-maximizing optimizer has yielded deep insights into the manipulability of learning algorithms. However, existing literature primarily focuses on independent, unlinked rounds, largely ignoring the ubiquitous practical reality of budget constraints. In this paper, we study this interaction in repeated second-price auctions in a Bayesian setting between a learning agent and a strategic agent, both subject to strict budget constraints, showing that such cross-round constraints fundamentally alter the strategic landscape. First, we generalize the classic Stackelberg equilibrium to the Budgeted Stackelberg Equilibrium. We prove that an optimizer's optimal strategy in a budgeted setting requires time-multiplexing; for a $k$-dimensional budget constraint, the optimal strategy strictly decomposes into up to $k+1$ distinct phases, with each phase employing a possibly unique mixed strategy (the case of $k=0$ recovers the classic Stackelberg equilibrium where the optimizer repeatedly uses a single mixed strategy). Second, we address the intriguing question of non-manipulability. We prove that when the learner employs a standard Proportional controller (the "P" of the PID-controller) to pace their bids, the optimizer's utility is upper bounded by their objective value in the Budgeted Stackelberg Equilibrium baseline. By bounding the dynamics of the PID controller via a novel analysis, our results establish that this widely used control-theoretic heuristic is actually strategically robust.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Budget constraints turn the repeated auction into a phased Stackelberg game, but a plain proportional pacing rule still caps the optimizer's utility at the budgeted equilibrium value.


The key point is that adding cross-round budget constraints changes the strategic structure of repeated second-price auctions, and the paper gives a clean way to handle it. They define the Budgeted Stackelberg Equilibrium and show that the optimizer's optimal strategy decomposes into at most k+1 distinct phases for a k-dimensional budget; when k=0 this reduces to the usual repeated mixed strategy. They then prove that if the learner paces with the standard proportional controller, the optimizer's total utility cannot exceed its value in the BSE baseline. The controller analysis is the part that delivers the bound.

This is new: earlier work on learning versus optimizing bidders focused primarily on independent rounds without budgets, so the phase decomposition and the robustness result for the P-controller fill a practical gap. The Bayesian setting with known distributions keeps the model tractable and lets them focus on the budget linkage. The result is conditional on the learner actually using the proportional rule, which the paper states clearly rather than hiding. That assumption is strong but realistic for many deployed pacing systems. The main soft spot is that the abstract only sketches the dynamic analysis of the controller, so the phase-transition arguments and the utility bound need to be checked against the proofs. The paper does not claim the bound survives if the learner deviates or if distributions are unknown, which is honest but narrows the immediate applicability.

This is worth a serious referee. It targets mechanism-design and online-auction researchers who care about budgets in ad markets or procurement. A reader who works on control-based bidding or repeated games will get concrete value from the equilibrium characterization and the robustness statement. I would send it out for review.

Referee Report

0 major / 2 minor

Summary. The paper studies repeated second-price auctions in a Bayesian setting with budget constraints linking across rounds, between a learner using a proportional pacing controller and a strategic optimizer. It generalizes Stackelberg equilibrium to the Budgeted Stackelberg Equilibrium (BSE), proving that an optimizer's optimal strategy decomposes into at most k+1 phases (each possibly using a distinct mixed strategy) for a k-dimensional budget. It further proves that the learner's exact use of the proportional controller upper-bounds the optimizer's utility by its BSE objective value, via a novel dynamics analysis of the controller.

Significance. If the results hold, the work is significant for showing how cross-round budget constraints fundamentally change the strategic landscape relative to independent-round models, while establishing robustness of a widely used control heuristic. The phase-decomposition characterization for multi-dimensional budgets and the conditional utility bound are notable contributions. The paper provides proofs for the equilibrium characterization and the utility bound, which strengthens the assessment.

minor comments (2)

[Abstract] The statement that the optimal strategy "strictly decomposes" into up to k+1 phases should be checked against the proof; if the decomposition is not always strict, rephrase to avoid overstatement while preserving the load-bearing claim.
[Abstract] Add a one-sentence pointer to the specific section containing the novel dynamics analysis of the proportional controller, to help readers locate the core technical argument for the utility bound.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive summary of our work and for recommending minor revision. The referee's description accurately reflects our contributions on generalizing Stackelberg equilibrium to the budgeted setting and establishing the robustness of the proportional controller.

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The paper's core results are a generalization of Stackelberg equilibrium to the budgeted case and a conditional upper-bound proof on optimizer utility when the learner exactly follows a proportional pacing controller. Both rest on explicit mathematical analysis of strategy decomposition into phases and dynamics bounding in a Bayesian setting with known distributions and cross-round budget constraints. No fitted parameters are renamed as predictions, no self-definitional loops appear in the equilibrium definition or bound, and no load-bearing self-citations reduce the central claims to prior unverified assertions by the same authors. The derivation chain is self-contained against the stated assumptions.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The claims rest on standard Bayesian mechanism-design assumptions and the existence of mixed-strategy equilibria in finite games; no new free parameters or invented entities are introduced.

axioms (2)

domain assumption: Bayesian setting with common knowledge of value distributions
Invoked to define the Stackelberg interaction and equilibrium utilities.
domain assumption: Strict budget constraints that couple decisions across rounds
Central to the generalization from classic to budgeted Stackelberg equilibrium.

pith-pipeline@v0.9.0 · 5561 in / 1318 out tokens · 56875 ms · 2026-05-10T17:03:35.865793+00:00 · methodology


Reference graph

Works this paper leans on

6 extracted references · 6 canonical work pages

[1]

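For rounds $t \le T/2$, the learner's payment is $\lambda^{(t)} P_{\mathrm{L}}(v^\star_1) = \frac{1}{2}\lambda^{(t)}$. This makes the expected change in the learner's $\lambda$ equal to $\eta\big(\rho_{\mathrm{L}} - \lambda^{(t)} P_{\mathrm{L}}(v^\star_1)\big) = \frac{\eta}{2}\big(1 - \lambda^{(t)}\big)$, implying that the learner will stabilize around $\lambda^{(t)} \approx 1 = \lambda^\star_1$ in the first $T/2$ rounds, as suggested by the Budgeted Stackelberg equilibrium. This implies that the optimizer is going to approximately sat…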
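The expected drift in this excerpt can be iterated directly. This is a minimal sketch, assuming the learner's update is exactly the expected-change expression above with $\rho_{\mathrm{L}} = P_{\mathrm{L}}(v^\star_1) = \tfrac12$ (the values implied by the excerpt's arithmetic), a starting point $\lambda^{(1)} = 0$, and the step size $\eta = T^{-2/3}$ used in the Section E.1 experiments. It ignores the randomness of the actual auctions, so it only illustrates the $\Theta(1/\eta)$ convergence of the deterministic drift, not the paper's controller analysis. The function names are hypothetical.

```python
import math


def expected_lambda_path(eta, rounds, lam0=0.0, rho_L=0.5, P_L=0.5):
    """Iterate the expected proportional-controller drift
    lambda <- lambda + eta * (rho_L - lambda * P_L).

    With rho_L = P_L = 1/2 the step equals (eta / 2) * (1 - lambda).
    """
    lam = lam0
    path = [lam]
    for _ in range(rounds):
        lam += eta * (rho_L - lam * P_L)
        path.append(lam)
    return path


def rounds_to_within(eps, eta, lam0=0.0, rho_L=0.5, P_L=0.5):
    """Number of expected-drift steps until lambda is within eps of rho_L / P_L."""
    if not 0 < eta * P_L < 2:
        raise ValueError("the drift only converges when 0 < eta * P_L < 2")
    target = rho_L / P_L
    lam, t = lam0, 0
    while abs(target - lam) > eps:
        lam += eta * (rho_L - lam * P_L)
        t += 1
    return t


T = 10**6
eta = T ** (-2 / 3)  # step size from the Section E.1 experiments
t_conv = rounds_to_within(0.01, eta)
print(t_conv, T // 2)
```

Because the gap $1 - \lambda^{(t)}$ shrinks by a factor $(1 - \eta/2)$ per round, reaching accuracy $\epsilon$ takes about $(2/\eta)\ln(1/\epsilon)$ rounds, well inside the first $T/2$ rounds for this $T$.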

[2]

The optimizer bids $\lambda^{(t)}$ in round $t$, as long as there is budget remaining.

[3]


The optimizer bids $\lambda^{(t)}$ in round $t$ for $t \le \frac{T}{2} - \tau$ and then bids $\frac{1}{\mu} = \frac{\delta}{2(1+\delta)}$. If at any point there is no budget remaining, bid 0. We notice that bidding $\lambda^{(t)}$ and then stopping after half the rounds is the BSE. In fact, because the learner starts at a low $\lambda^{(1)} = 0$ and takes $\Theta(1/\eta)$ rounds to converge to $\lambda = 1$, we expect strategy 1 to do slightly better than the BS…
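The two bidding rules quoted in these excerpts can be written as plain functions. This is a sketch, not the authors' experiment code: the names are hypothetical, the switch time $\tau$ is left as a parameter because the excerpt does not fix it, and the late-phase bid is passed in as `late_bid` rather than hard-coding the partially garbled expression for $1/\mu$.

```python
def strategy_1_bid(lam_t, budget_left):
    """Strategy 1: bid the learner's current lambda^(t) while budget remains, else 0."""
    return lam_t if budget_left > 0 else 0.0


def strategy_2_bid(t, T, tau, lam_t, late_bid, budget_left):
    """Strategy 2: bid lambda^(t) for t <= T/2 - tau, then a fixed late_bid; 0 once out of budget."""
    if budget_left <= 0:
        return 0.0
    return lam_t if t <= T / 2 - tau else late_bid
```

Both rules share the same early behavior; they differ only in what happens after round $T/2 - \tau$, which is where the experiments compare them.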

[4]

Due to the large number of rounds, there is little variation between experiments, implying that random events concentrate very well.

[5]


For all the values of $\delta$, strategy 1 does slightly better than the BSE value of $\frac{T}{4}$. In fact, for all the values of $\delta$ we examine, this strategy performs consistently the same.

[6]


Strategy 2 does considerably better. Specifically, according to our previous analysis and since $\eta = T^{-2/3}$, we expect the optimizer's reward to be $\frac{T}{4}\big(1 + T^{-\delta/3}\big)$ for small $\delta$. For $\delta = 0.01$, this reward becomes …

[Plot: total reward (×10^6) against total number of rounds T (×10^7), one line per δ ∈ {0.01, 0.05, 0.10, 0.20, 0.50, 0.90}; a second panel is cut off.]
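The quoted prediction can be evaluated numerically against the BSE value $T/4$ from the same experiments. This sketch only evaluates the stated formula $\frac{T}{4}(1 + T^{-\delta/3})$, which the excerpt claims for small $\delta$ with $\eta = T^{-2/3}$; it does not simulate the auction, and the function names are hypothetical.

```python
def bse_reward(T):
    """BSE benchmark value quoted in the experiments: T / 4."""
    return T / 4


def predicted_strategy2_reward(T, delta):
    """Small-delta prediction (T / 4) * (1 + T**(-delta / 3)), stated for eta = T**(-2/3)."""
    return T / 4 * (1 + T ** (-delta / 3))


T = 10**7
for delta in (0.01, 0.05, 0.10):  # small-delta regime, where the formula is claimed
    reward = predicted_strategy2_reward(T, delta)
    print(f"delta={delta:.2f}  predicted={reward:.3e}  ratio_to_BSE={reward / bse_reward(T):.3f}")
```

The ratio to the BSE value is $1 + T^{-\delta/3}$, which lies between 1 and 2 and approaches 2 as $\delta \to 0$; for $T = 10^7$ and $\delta = 0.01$ the prediction is roughly $4.9 \times 10^6$, close to twice $T/4 = 2.5 \times 10^6$.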
