Adaptive Bidding Policies for First-Price Auctions with Budget Constraints under Non-stationarity
Pith reviewed 2026-05-13 17:56 UTC · model grok-4.3
The pith
A dual-gradient-descent policy achieves order-optimal square-root regret for budget-constrained bidders in non-stationary first-price auctions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
A dual-gradient-descent bidding policy that maintains a dual variable for the budget constraint achieves regret of order O(sqrt(T)) plus a Wasserstein-based variation term capturing non-stationarity in the uninformative setting, which is order-optimal. In the informative setting the variation term is eliminated by using advance predictions, yielding O(sqrt(T)) regret plus prediction error. A refined benchmark based on a per-period budget allocation plan achieves exactly O(sqrt(T)) regret, and the policy remains robust to both ideal and adversarial deviations from the planned allocation.
What carries the argument
Dual-gradient-descent bidding policy that maintains a dual variable for the budget constraint and updates bids as the budget is consumed.
Load-bearing premise
The bidder knows whether it operates in the uninformative or informative regime and an optimal complete-information benchmark policy exists.
What would settle it
Run the policy on a sequence of first-price auctions whose value distributions change at a known rate; measure whether observed regret exceeds O(sqrt(T)) by more than the computed Wasserstein variation term.
read the original abstract
In this paper, we study how a budget-constrained bidder should learn to bid adaptively in repeated first-price auctions to maximize cumulative payoff. This problem arises from the recent industry-wide shift from second-price auctions to first-price auctions in display advertising, which renders truthful bidding suboptimal. We propose a simple dual-gradient-descent-based bidding policy that maintains a dual variable for the budget constraint as the bidder consumes the budget. We analyze two settings based on the bidder's knowledge of future private values: (i) an uninformative setting where all distributional knowledge (potentially non-stationary) is entirely unknown, and (ii) an informative setting where a prediction of budget allocation is available in advance. We characterize the performance loss (regret) relative to an optimal policy with complete information. For uninformative setting, we show that the regret is ~O(sqrt(T)) plus a Wasserstein-based variation term capturing non-stationarity, which is order-optimal. In the informative setting, the variation term can be eliminated using predictions, yielding a regret of ~O(sqrt(T)) plus the prediction error. Furthermore, we go beyond the global budget constraint by introducing a refined benchmark based on a per-period budget allocation plan, achieving exactly ~O(sqrt(T)) regret. We also establish robustness guarantees when the baseline policy deviates from the planned allocation, covering both ideal and adversarial deviations.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a dual-gradient-descent bidding policy for a budget-constrained bidder in repeated first-price auctions under non-stationary private-value distributions. It analyzes two regimes: an uninformative setting where the policy achieves regret ~O(sqrt(T)) plus a Wasserstein-based variation term capturing non-stationarity (claimed order-optimal relative to a complete-information benchmark), and an informative setting where predictions eliminate the variation term, again yielding ~O(sqrt(T)) plus prediction error. A refined per-period budget benchmark is introduced that removes the variation term entirely, and robustness results are stated for deviations from the planned allocation.
Significance. If the regret bounds and their order-optimality hold under the stated conditions, the work supplies the first adaptive, budget-aware bidding strategy with provable guarantees for the industry shift to first-price auctions. The use of Wasserstein variation to quantify non-stationarity and the per-period benchmark refinement are technically interesting extensions of online convex optimization to auction settings.
major comments (2)
- [Abstract] Abstract: the regularity conditions on the value distributions (bounded support, continuity/Lipschitz properties, etc.) required for the Wasserstein distance to bound instantaneous regret are not stated. Without these, it is impossible to confirm that the claimed O(sqrt(T)) + V_W bound is valid rather than an artifact of unstated restrictions.
- [Abstract] Abstract: the precise definition of the Wasserstein-based variation term V_W (including the metric, the way non-stationarity is aggregated across periods, and the proof that V_W is sublinear) is omitted. This definition is load-bearing for the order-optimality claim and must be supplied before the regret statement can be evaluated.
Simulated Author's Rebuttal
We thank the referee for the thoughtful review and positive assessment of the paper's contributions. We address the major comments below and will revise the manuscript to improve clarity.
read point-by-point responses
-
Referee: [Abstract] Abstract: the regularity conditions on the value distributions (bounded support, continuity/Lipschitz properties, etc.) required for the Wasserstein distance to bound instantaneous regret are not stated. Without these, it is impossible to confirm that the claimed O(sqrt(T)) + V_W bound is valid rather than an artifact of unstated restrictions.
Authors: We agree that the abstract should explicitly list the key regularity conditions for transparency. The distributions are assumed to have bounded support in [0,1] and to be Lipschitz continuous in the 1-Wasserstein metric, which ensures the per-period regret is controlled by the Wasserstein distance between consecutive distributions. We will revise the abstract to state these assumptions upfront. revision: yes
-
Referee: [Abstract] Abstract: the precise definition of the Wasserstein-based variation term V_W (including the metric, the way non-stationarity is aggregated across periods, and the proof that V_W is sublinear) is omitted. This definition is load-bearing for the order-optimality claim and must be supplied before the regret statement can be evaluated.
Authors: We acknowledge the abstract omits the formal definition. In the paper, V_W is defined as the sum_{t=1}^{T-1} W_1(F_t, F_{t+1}) using the 1-Wasserstein metric, where F_t denotes the value distribution in period t; the total variation term aggregates non-stationarity across periods. The regret bound O(sqrt(T) + V_W) holds relative to a complete-information benchmark whose lower bound also scales with V_W, establishing order-optimality without requiring V_W itself to be sublinear. We will add a concise definition of V_W and a note on the benchmark comparison to the abstract. revision: yes
Circularity Check
No circularity: regret bounds derived from standard online convex optimization and external variation measures
full rationale
The abstract describes a dual-gradient-descent policy whose regret is bounded by O(sqrt(T)) plus a Wasserstein variation term for non-stationarity (uninformative case) or prediction error (informative case), with a refined per-period benchmark yielding pure O(sqrt(T)). These are standard regret decompositions from online optimization literature; the Wasserstein term is an external measure of distributional shift, not defined in terms of the policy itself. No equations, self-citations, fitted parameters renamed as predictions, or ansatzes are present in the provided text. The derivation chain is therefore self-contained and does not reduce to its own inputs by construction.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
dual-gradient-descent-based bidding policy... regret is ~O(sqrt(T)) plus a Wasserstein-based variation term
-
IndisputableMonolith/Foundation/DimensionForcing.leanalexander_duality_circle_linking unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
Wasserstein-based variation term capturing non-stationarity
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.