Adaptive Bidding Policies for First-Price Auctions with Budget Constraints under Non-stationarity

Jiashuo Jiang; Yige Wang

arxiv: 2505.02796 · v2 · submitted 2025-05-05 · 💻 cs.GT · cs.LG

Adaptive Bidding Policies for First-Price Auctions with Budget Constraints under Non-stationarity

Yige Wang , Jiashuo Jiang This is my paper

Pith reviewed 2026-05-22 16:58 UTC · model grok-4.3

classification 💻 cs.GT cs.LG

keywords first-price auctionsbudget constraintsonline learningregret boundsnon-stationary distributionsdual gradient descentadaptive bidding

0 comments

The pith

A dual-gradient-descent policy achieves near-optimal regret for budget-constrained bidding in non-stationary first-price auctions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops an adaptive bidding strategy for repeated first-price auctions where the bidder has a limited budget. It shows that by maintaining a dual variable to track budget usage and updating bids via gradient descent, the bidder can control her regret to scale with the square root of the number of auctions plus a term measuring how much the value distributions change over time. In a second setting, when the bidder has an advance prediction of how to allocate her budget across periods, the variation term disappears and only the prediction error remains in the bound. This matters because the shift from second-price to first-price auctions in advertising means bidders can no longer just bid their true value and must actively learn while respecting budgets.

Core claim

The proposed dual-gradient-descent-based bidding policy achieves a regret of tilde-O(sqrt(T)) plus a variation term reflecting the non-stationarity of value distributions in the uninformative setting, which is of optimal order. In the informative setting with a prediction of budget allocation, the regret becomes tilde-O(sqrt(T)) plus the prediction error term, removing the variation penalty.

What carries the argument

The dual-gradient-descent-based bidding policy that maintains a dual variable for the budget constraint as the bidder consumes her budget.

Load-bearing premise

The analysis assumes that an optimal policy with complete information on the stochasticity exists and that the bidder's dual variable can be maintained to enforce the budget constraint while the value distributions admit a well-defined variation measure.

What would settle it

A sequence of repeated first-price auctions where the measured cumulative regret exceeds tilde-O(sqrt(T)) plus the observed variation in value distributions would falsify the claimed bound.

read the original abstract

We study how a budget-constrained bidder should learn to adaptively bid in repeated first-price auctions to maximize her cumulative payoff. This problem arose due to an industry-wide shift from second-price auctions to first-price auctions in display advertising recently, which renders truthful bidding (i.e., always bidding one's private value) no longer optimal. We propose a simple dual-gradient-descent-based bidding policy that maintains a dual variable for budget constraint as the bidder consumes her budget. In analysis, we consider two settings regarding the bidder's knowledge of her private values in the future: (i) an uninformative setting where all the distributional knowledge (can be non-stationary) is entirely unknown to the bidder, and (ii) an informative setting where a prediction of the budget allocation in advance. We characterize the performance loss (or regret) relative to an optimal policy with complete information on the stochasticity. For uninformative setting, We show that the regret is \tilde{O}(\sqrt{T}) plus a variation term that reflects the non-stationarity of the value distributions, and this is of optimal order. We then show that we can get rid of the variation term with the help of the prediction; specifically, the regret is \tilde{O}(\sqrt{T}) plus the prediction error term in the informative setting.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives a dual-gradient-descent bidding rule that achieves near-optimal regret for budget-constrained first-price auctions when values are non-stationary, with separate bounds for the uninformative and informative cases.

read the letter

The core contribution is a straightforward dual-gradient-descent policy that tracks a budget dual variable while bidding in repeated first-price auctions. It delivers regret of order sqrt(T) plus a variation term in the fully unknown non-stationary case, and replaces the variation term with a prediction-error term once a budget-allocation forecast is available. Both bounds are claimed to be optimal order relative to a complete-information benchmark policy. That split between the two information regimes is the main technical step that feels new compared with earlier auction-learning work on stationary or second-price settings. The problem itself is timely because of the industry move to first-price auctions, where truthful bidding is no longer optimal and budget constraints matter in practice. The policy is simple enough that practitioners could implement it, and the regret statements are stated cleanly. The analysis appears to rest on standard online convex optimization ideas adapted to the auction constraint, which is fine as long as the dual updates remain stable. The main soft spot is that the abstract only sketches the regret decomposition; without the full derivation it is hard to check whether the non-stationarity variation is measured in a way that is both tight and easy to estimate, or whether the dual variable stays bounded under the stated assumptions. The complete-information benchmark is standard but strong, so the practical gap could be larger than the bounds suggest if the optimal policy is hard to approximate. This paper is aimed at researchers working on online learning for mechanism design and at people who build bidding systems for display ads. A reader who already knows the dual-descent literature on constrained bandits will see the extension quickly and can judge whether the auction-specific details add enough. It deserves a serious referee because the problem is well-motivated, the policy is concrete, and the claimed bounds are of the right form; any gaps in the proofs can be fixed in revision rather than requiring a complete rethink.

Referee Report

0 major / 2 minor

Summary. The paper proposes a dual-gradient-descent bidding policy for a budget-constrained bidder in repeated first-price auctions. It analyzes two settings: an uninformative setting with unknown (possibly non-stationary) value distributions, where regret is shown to be tilde-O(sqrt(T)) plus a variation term and claimed to be of optimal order, and an informative setting with a prediction of budget allocation, where regret is tilde-O(sqrt(T)) plus a prediction-error term. Both regrets are measured relative to an optimal complete-information policy.

Significance. If the regret bounds hold, the work is significant for delivering practical, near-optimal adaptive bidding strategies in non-stationary first-price auction environments, directly relevant to the recent industry shift in display advertising. The dual-variable approach for enforcing budget constraints combined with standard online-learning techniques for variation and prediction-error penalties extends existing methods in a clean way and supplies falsifiable performance guarantees.

minor comments (2)

Abstract: the phrase 'a prediction of the budget allocation in advance' is introduced without defining how the prediction is generated or what accuracy is assumed; this should be stated explicitly in the model section to make the informative-setting result self-contained.
The manuscript should include a brief discussion of how the variation measure for non-stationary distributions is computed or bounded in practice, as this term directly affects the reported regret.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive assessment of our work and the recommendation for minor revision. The referee summary accurately captures the dual-gradient-descent bidding policy, the two information settings, and the resulting regret bounds relative to the optimal complete-information benchmark. We are pleased that the practical relevance to display advertising and the clean extension of online-learning techniques are recognized.

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

full rationale

The paper derives regret bounds for a dual-gradient-descent bidding policy relative to an external benchmark consisting of an optimal policy with complete information on the stochasticity. The uninformative regret bound of tilde-O(sqrt(T)) plus a variation term for non-stationarity and the informative bound of tilde-O(sqrt(T)) plus prediction error are standard additive penalties from online learning and dynamic regret analysis; neither term is defined in terms of the proposed policy's outputs or fitted parameters. The analysis assumes existence of a well-defined optimal dual for the budget constraint but does not reduce any claimed result to a self-citation, ansatz smuggled via prior work, or renaming of a known empirical pattern. No load-bearing step collapses to the algorithm's own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on standard online-learning assumptions about the existence of an optimal benchmark policy and the ability to track a dual variable for the budget; no free parameters or invented entities are introduced in the abstract.

axioms (2)

domain assumption An optimal policy with complete information on the stochasticity exists and serves as the benchmark for regret.
Invoked when characterizing performance loss in both settings.
domain assumption The bidder can maintain and update a dual variable for the budget constraint as budget is consumed.
Core mechanism of the proposed bidding policy.

pith-pipeline@v0.9.0 · 5764 in / 1303 out tokens · 58107 ms · 2026-05-22T16:58:09.213264+00:00 · methodology

Adaptive Bidding Policies for First-Price Auctions with Budget Constraints under Non-stationarity

Core claim

What carries the argument

Load-bearing premise

What would settle it

discussion (0)