Generative Bid Shading in Real-Time Bidding Advertising
Pith reviewed 2026-05-19 01:18 UTC · model grok-4.3
The pith
Generative Bid Shading generates shading ratios autoregressively via stepwise residuals to optimize surplus without unimodal assumptions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Generative Bid Shading comprises an end-to-end generative model that utilizes an autoregressive approach to generate shading ratios by stepwise residuals, capturing complex value dependencies without relying on predefined priors, and a reward preference alignment system with CHNet and GRPO that optimizes both short-term and long-term surplus.
What carries the argument
Autoregressive generative model that produces shading ratios through successive residual steps to encode value dependencies directly.
If this is right
- Handles non-convex surplus curves that defeat unimodal landscape models.
- Eliminates cascading errors by replacing sequential two-stage workflows with a single generative pass.
- Captures dependencies across discrete value intervals that independent discretization ignores.
- Balances short-term and long-term surplus through explicit reward alignment and exploration terms.
- Supports deployment at scale for billions of daily requests without intermediate model handoffs.
Where Pith is reading between the lines
- The residual autoregressive structure may transfer to other continuous decision problems in auctions or dynamic pricing where value surfaces are non-convex.
- Removing the need for hand-crafted priors could reduce maintenance costs when bidding environments shift across markets or channels.
- The CHNet-GRPO alignment loop suggests a template for incorporating long-horizon rewards in other real-time optimization systems.
Load-bearing premise
That an autoregressive model trained on historical bidding data can accurately represent non-convex surplus curves and extract generalizable features without new errors from discretization or selection bias.
What would settle it
An online A/B test on the production platform showing no statistically significant lift in advertiser surplus when GBS replaces the existing two-stage baseline across millions of bid requests.
Figures
read the original abstract
Bid shading plays a crucial role in Real-Time Bidding (RTB) by adaptively adjusting the bid to avoid advertisers overspending. Existing mainstream two-stage methods, which first model bid landscapes and then optimize surplus using operations research techniques, are constrained by unimodal assumptions that fail to adapt for non-convex surplus curves and are vulnerable to cascading errors in sequential workflows. Additionally, existing discretization models of continuous values ignore the dependence between discrete intervals, reducing the model's error correction ability, while sample selection bias in bidding scenarios presents further challenges for prediction. To address these issues, this paper introduces Generative Bid Shading (GBS), which comprises two primary components: 1) an end-to-end generative model that utilizes an autoregressive approach to generate shading ratios by stepwise residuals, capturing complex value dependencies without relying on predefined priors; and 2) a reward preference alignment system, which incorporates a channel-aware hierarchical dynamic network (CHNet) as the reward model to extract fine-grained features, along with modules for surplus optimization and exploration utility reward alignment, ultimately optimizing both short-term and long-term surplus using group relative policy optimization (GRPO). Extensive experiments on both offline and online A/B tests validate GBS's effectiveness. Moreover, GBS has been deployed on the Meituan DSP platform, serving billions of bid requests daily.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes Generative Bid Shading (GBS) for real-time bidding in advertising. It consists of an end-to-end autoregressive generative model that generates shading ratios through stepwise residuals to capture complex value dependencies without predefined priors, and a reward preference alignment system incorporating a Channel-aware Hierarchical Dynamic Network (CHNet) as the reward model along with modules for surplus optimization and exploration utility reward alignment, optimized using Group Relative Policy Optimization (GRPO) for both short-term and long-term surplus. The approach is validated through offline and online A/B tests and has been deployed on the Meituan DSP platform serving billions of bid requests daily.
Significance. If the empirical results hold under rigorous scrutiny, this work could represent a meaningful advance in bid shading techniques by addressing limitations of two-stage methods and discretization approaches in handling non-convex surplus curves. The integration of generative modeling with preference alignment for multi-term optimization offers a fresh perspective in the game theory and advertising auction literature, potentially leading to more efficient bidding strategies in large-scale RTB systems.
major comments (2)
- [Abstract (validation paragraph)] The abstract states that 'Extensive experiments on both offline and online A/B tests validate GBS's effectiveness' but provides no quantitative results, error bars, baseline comparisons, or details on how non-convexity or sample bias were handled. This leaves the central empirical claim without visible supporting evidence in the provided text and undermines the ability to assess the claimed improvements.
- [Generative model component (described in abstract)] The autoregressive generation of shading ratios by stepwise residuals is presented as capturing complex dependencies without cascading errors. However, sequential conditional predictions in autoregressive decoding can still accumulate per-step residual errors, especially for continuous ratios. This risks undermining the asserted advantage over unimodal or discretized baselines when modeling non-convex surplus curves.
minor comments (1)
- [Notation and terminology] The introduction of new entities such as CHNet and GRPO would benefit from clearer definitions or references to prior work on similar hierarchical networks or policy optimization methods to aid reader comprehension.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment point by point below, clarifying our approach and indicating planned revisions to strengthen the presentation of results and technical details.
read point-by-point responses
-
Referee: [Abstract (validation paragraph)] The abstract states that 'Extensive experiments on both offline and online A/B tests validate GBS's effectiveness' but provides no quantitative results, error bars, baseline comparisons, or details on how non-convexity or sample bias were handled. This leaves the central empirical claim without visible supporting evidence in the provided text and undermines the ability to assess the claimed improvements.
Authors: We acknowledge that the abstract, due to its brevity, summarizes the validation at a high level without specific metrics. The full manuscript details the offline and online A/B test results in the Experiments section, including quantitative surplus improvements over baselines, error bars from repeated trials, and explicit handling of non-convex surplus curves through the generative residual modeling as well as sample bias mitigation via end-to-end training and the CHNet reward model. To make the central claims more immediately verifiable, we will revise the abstract to include key quantitative highlights (e.g., relative surplus gains and statistical significance) while remaining within length limits. revision: yes
-
Referee: [Generative model component (described in abstract)] The autoregressive generation of shading ratios by stepwise residuals is presented as capturing complex dependencies without cascading errors. However, sequential conditional predictions in autoregressive decoding can still accumulate per-step residual errors, especially for continuous ratios. This risks undermining the asserted advantage over unimodal or discretized baselines when modeling non-convex surplus curves.
Authors: We appreciate the referee highlighting this potential limitation of autoregressive approaches in general. Our design uses stepwise residual generation specifically to model inter-value dependencies and enable per-step correction, which we contrast with discretization methods that discard interval dependencies entirely. The manuscript provides both theoretical motivation and empirical ablations showing reduced error propagation and better performance on non-convex curves relative to unimodal and discretized baselines. To directly address the concern, we will expand the discussion of the generative component with additional analysis of residual error accumulation and further ablation results demonstrating the mitigation effect. revision: partial
Circularity Check
No circularity: model trained end-to-end on external bidding data with independent validation
full rationale
The paper introduces an autoregressive generative model for shading ratios and a CHNet+GRPO alignment system, both trained on observed bidding data and evaluated via offline/online A/B tests plus production deployment. No equations or sections reduce a claimed prediction or surplus objective to a fitted parameter or self-citation by construction; the surplus optimization uses standard policy-gradient methods on externally measured outcomes rather than redefining the target as the model's own output. The derivation chain therefore remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
free parameters (2)
- autoregressive residual steps
- GRPO group size and reward weights
axioms (1)
- domain assumption Bidding surplus curves can be non-convex and that autoregressive generation can model them without unimodal assumptions.
invented entities (2)
-
Channel-aware Hierarchical Dynamic Network (CHNet)
no independent evidence
-
Group Relative Policy Optimization (GRPO)
no independent evidence
Reference graph
Works this paper leans on
-
[1]
AppNexus. 2017.Demystifying Auction Dynamics for Digital Buyers and Sell- ers.https://www.appnexus.com/sites/default/files/whitepapers/49344-CM- Auction-Type-Whitepaper-V9.pdf
work page 2017
-
[2]
Mohammad Gheshlaghi Azar, Zhaohan Daniel Guo, Bilal Piot, Remi Munos, Mark Rowland, Michal Valko, and Daniele Calandriello. 2024. A general theoret- ical paradigm to understand learning from human preferences. InInternational Conference on Artificial Intelligence and Statistics. PMLR, 4447–4455
work page 2024
-
[3]
2019.Rolling out first price auctions to Google Ad Manager part- ners
Jason Bigler. 2019.Rolling out first price auctions to Google Ad Manager part- ners. https://blog.google/products/admanager/rolling-out-first-price-auctions- google-ad-manager-partners/
work page 2019
-
[4]
Olivier Chapelle. 2015. Offline evaluation of response prediction in online advertising auctions. InProceedings of the 24th international conference on world wide web. 919–922
work page 2015
-
[5]
Sayak Ray Chowdhury, Anush Kini, and Nagarajan Natarajan. [n. d.]. Provably Robust DPO: Aligning Language Models with Noisy Feedback. InForty-first International Conference on Machine Learning
-
[6]
John M Crespi and Richard J Sexton. 2005. A Multinomial logit framework to estimate bid shading in procurement auctions: Application to cattle sales in the Texas Panhandle.Review of industrial organization27, 3 (2005), 253–278
work page 2005
-
[7]
Ying Cui, Ruofei Zhang, Wei Li, and Jianchang Mao. 2011. Bid landscape forecast- ing in online ad exchange marketplace. InProceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining. 265–273
work page 2011
-
[8]
Jiaxin Deng, Shiyao Wang, Kuo Cai, Lejian Ren, Qigen Hu, Weifeng Ding, Qiang Luo, and Guorui Zhou. 2025. Onerec: Unifying retrieve and rank with generative recommender and iterative preference alignment.arXiv preprint arXiv:2502.18965 (2025)
work page internal anchor Pith review Pith/arXiv arXiv 2025
- [9]
-
[10]
2017.RTB Auctions: Fair Play?https://blog.getintent.com/rtb- auctions-fair-play-3b372d505089
Getintent. 2017.RTB Auctions: Fair Play?https://blog.getintent.com/rtb- auctions-fair-play-3b372d505089
work page 2017
-
[11]
Djordje Gligorijevic, Tian Zhou, Bharatbhushan Shetty, Brendan Kitts, Shengjun Pan, Junwei Pan, and Aaron Flores. 2020. Bid shading in the brave new world of first-price auctions. InProceedings of the 29th ACM International Conference on Information & Knowledge Management. 2453–2460
work page 2020
-
[12]
Zhen Gong, Lvyin Niu, Yang Zhao, Miao Xu, Haoqi Zhang, Zhenzhe Zheng, Zhilin Zhang, Rongquan Bai, Chuan Yu, Jian Xu, et al. 2023. MEBS: Multi-task End-to-end Bid Shading for Multi-slot Display Advertising. InProceedings of the 32nd ACM International Conference on Information and Knowledge Management. 4588–4594
work page 2023
- [13]
-
[14]
Jonathan Ho, Ajay Jain, and Pieter Abbeel. 2020. Denoising diffusion probabilistic models.Advances in neural information processing systems33 (2020), 6840–6851
work page 2020
-
[15]
Jiani Huang, Zhenzhe Zheng, Yanrong Kang, and Zixiao Wang. 2024. From Sec- ond to First: Mixed Censored Multi-Task Learning for Winning Price Prediction. InProceedings of the 17th ACM International Conference on Web Search and Data Mining. 295–303
work page 2024
-
[16]
Xu Li, Michelle Ma Zhang, Zhenya Wang, and Youjun Tong. 2022. Arbitrary distribution modeling with censorship in real-time bidding advertising. InPro- ceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 3250–3258
work page 2022
-
[17]
Hairen Liao, Lingxiao Peng, Zhenchuan Liu, and Xuehua Shen. 2014. iPinYou global rtb bidding algorithm competition dataset. InProceedings of the Eighth International Workshop on Data Mining for Online Advertising. 1–6
work page 2014
-
[18]
Xiao Lin, Xiaokai Chen, Linfeng Song, Jingwei Liu, Biao Li, and Peng Jiang
-
[19]
InProceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
Tree based progressive regression model for watch-time prediction in short-video recommendation. InProceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 4497–4506
-
[20]
Zhutian Lin, Junwei Pan, Shangyu Zhang, Ximei Wang, Xi Xiao, Shudong Huang, Lei Xiao, and Jie Jiang. 2024. Understanding the ranking loss for recommendation with sparse user feedback. InProceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 5409–5418
work page 2024
- [21]
-
[22]
Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, et al. 2022. Training language models to follow instructions with human feedback.Advances in neural information processing systems35 (2022), 27730–27744
work page 2022
- [23]
-
[24]
Rafael Rafailov, Archit Sharma, Eric Mitchell, Christopher D Manning, Stefano Ermon, and Chelsea Finn. 2023. Direct preference optimization: Your language model is secretly a reward model.Advances in neural information processing systems36 (2023), 53728–53741
work page 2023
-
[25]
Shashank Rajput, Nikhil Mehta, Anima Singh, Raghunandan Hulikal Keshavan, Trung Vu, Lukasz Heldt, Lichan Hong, Yi Tay, Vinh Tran, Jonah Samost, et al
-
[26]
Recommender systems with generative retrieval.Advances in Neural Information Processing Systems36 (2023), 10299–10315
work page 2023
-
[27]
Kan Ren, Jiarui Qin, Lei Zheng, Zhengyu Yang, Weinan Zhang, and Yong Yu. 2019. Deep landscape forecasting for real-time bidding advertising. InProceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining. 363–372
work page 2019
-
[28]
Kan Ren, Weinan Zhang, Ke Chang, Yifei Rong, Yong Yu, and Jun Wang. 2017. Bidding machine: Learning to bid for directly optimizing profits in display ad- vertising.IEEE Transactions on Knowledge and Data Engineering30, 4 (2017), 645–659
work page 2017
-
[29]
Kan Ren, Weinan Zhang, Yifei Rong, Haifeng Zhang, Yong Yu, and Jun Wang
-
[30]
InProceedings of the 25th acm international on conference on information and knowledge management
User response learning for directly optimizing campaign performance in display advertising. InProceedings of the 25th acm international on conference on information and knowledge management. 679–688
-
[31]
Burr Settles and Mark Craven. 2008. An analysis of active learning strategies for sequence labeling tasks. Inproceedings of the 2008 conference on empirical methods in natural language processing. 1070–1079
work page 2008
-
[32]
Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, YK Li, et al. 2024. Deepseekmath: Pushing the limits of mathematical reasoning in open language models.arXiv preprint arXiv:2402.03300 (2024)
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[33]
Nian Si, San Gultekin, Jose Blanchet, and Aaron Flores. 2023. Optimal bidding and experimentation for multi-layer auctions in online advertising.A vailable at SSRN 4358914(2023)
work page 2023
-
[34]
2017.Explainer: More On The Widespread Fee Practice Behind The Guardian’s Lawsuit Vs
Sarah Sluis. 2017.Explainer: More On The Widespread Fee Practice Behind The Guardian’s Lawsuit Vs. Rubicon Project. https://www.adexchanger.com/ad- exchange-news/explainer-widespread-fee-practice-behind-guardians- lawsuit-vs-rubicon-project/
work page 2017
-
[35]
Jake Snell, Kevin Swersky, and Richard Zemel. 2017. Prototypical networks for few-shot learning.Advances in neural information processing systems30 (2017)
work page 2017
-
[36]
Kihyuk Sohn, Honglak Lee, and Xinchen Yan. 2015. Learning structured output representation using deep conditional generative models.Advances in neural information processing systems28 (2015)
work page 2015
-
[37]
Jie Sun, Zhaoying Ding, Xiaoshuang Chen, Qi Chen, Yincheng Wang, Kaiqiao Zhan, and Ben Wang. 2024. Cread: A classification-restoration framework with error adaptive discretization for watch time prediction in video recommender systems. InProceedings of the AAAI Conference on Artificial Intelligence, Vol. 38. 9027–9034
work page 2024
-
[38]
Zixiao Wang, Zhenzhe Zheng, Yanrong Kang, and Jiani Huang. 2024. Cost- Effective Active Learning for Bid Exploration in Online Advertising. InProceed- ings of the 17th ACM International Conference on Web Search and Data Mining. 788–796
work page 2024
-
[39]
Wush Chi-Hsuan Wu, Mi-Yen Yeh, and Ming-Syan Chen. 2015. Predicting winning price in real time bidding with censored data. InProceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 1305–1314
work page 2015
-
[40]
Wei Zhang, Brendan Kitts, Yanjun Han, Zhengyuan Zhou, Tingyu Mao, Hao He, Shengjun Pan, Aaron Flores, San Gultekin, and Tsachy Weissman. 2021. MEOW: A space-efficient nonparametric bid shading algorithm. InProceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining. 3928–3936
work page 2021
-
[41]
Tian Zhou, Hao He, Shengjun Pan, Niklas Karlsson, Bharatbhushan Shetty, Brendan Kitts, Djordje Gligorijevic, San Gultekin, Tingyu Mao, Junwei Pan, et al
-
[42]
InProceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining
An efficient deep distribution network for bid shading in first-price auctions. InProceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining. 3996–4004
-
[43]
Wen-Yuan Zhu, Wen-Yueh Shih, Ying-Hsuan Lee, Wen-Chih Peng, and Jiun- Long Huang. 2017. A gamma-based regression for winning price estimation in real-time bidding advertising. In2017 IEEE International Conference on Big Data (Big Data). IEEE, 1610–1619. Yinqiu Huang et al. A Vocabulary Construction We use the shading ratios generated by the two-stage meth...
work page 2017
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.