pith. sign in

arxiv: 2508.06550 · v3 · submitted 2025-08-06 · 💻 cs.GT · cs.LG

Generative Bid Shading in Real-Time Bidding Advertising

Pith reviewed 2026-05-19 01:18 UTC · model grok-4.3

classification 💻 cs.GT cs.LG
keywords bid shadingreal-time biddinggenerative modelautoregressive generationsurplus optimizationreinforcement learningadvertising auction
0
0 comments X p. Extension

The pith

Generative Bid Shading generates shading ratios autoregressively via stepwise residuals to optimize surplus without unimodal assumptions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents Generative Bid Shading as a direct response to limitations in existing real-time bidding methods. Traditional two-stage pipelines first model bid landscapes under unimodal assumptions then apply operations research steps, which break down on non-convex surplus curves and accumulate errors across stages. GBS instead trains a single autoregressive generative model that builds shading ratios incrementally from residuals, modeling dependencies between value intervals directly from data. A separate reward preference alignment module uses a channel-aware hierarchical network and group relative policy optimization to balance immediate and future surplus gains. The approach is validated through offline experiments and online deployment on a large advertising platform.

Core claim

Generative Bid Shading comprises an end-to-end generative model that utilizes an autoregressive approach to generate shading ratios by stepwise residuals, capturing complex value dependencies without relying on predefined priors, and a reward preference alignment system with CHNet and GRPO that optimizes both short-term and long-term surplus.

What carries the argument

Autoregressive generative model that produces shading ratios through successive residual steps to encode value dependencies directly.

If this is right

  • Handles non-convex surplus curves that defeat unimodal landscape models.
  • Eliminates cascading errors by replacing sequential two-stage workflows with a single generative pass.
  • Captures dependencies across discrete value intervals that independent discretization ignores.
  • Balances short-term and long-term surplus through explicit reward alignment and exploration terms.
  • Supports deployment at scale for billions of daily requests without intermediate model handoffs.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The residual autoregressive structure may transfer to other continuous decision problems in auctions or dynamic pricing where value surfaces are non-convex.
  • Removing the need for hand-crafted priors could reduce maintenance costs when bidding environments shift across markets or channels.
  • The CHNet-GRPO alignment loop suggests a template for incorporating long-horizon rewards in other real-time optimization systems.

Load-bearing premise

That an autoregressive model trained on historical bidding data can accurately represent non-convex surplus curves and extract generalizable features without new errors from discretization or selection bias.

What would settle it

An online A/B test on the production platform showing no statistically significant lift in advertiser surplus when GBS replaces the existing two-stage baseline across millions of bid requests.

Figures

Figures reproduced from arXiv: 2508.06550 by Haitao Wang, Hao Ma, Shuli Wang, Wenshuai Chen, Xingxing Wang, Xue Wei, Yinhua Zhu, Yinqiu Huang, Yongqiang Zhang, Zongwei Wang.

Figure 1
Figure 1. Figure 1: Overview of two-stage bid shading process. [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: The winning rate and surplus curves derived from [PITH_FULL_IMAGE:figures/full_fig_p002_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: The proposed generative model employs an encoder-decoder architecture to predict the token sequence autoregres [PITH_FULL_IMAGE:figures/full_fig_p004_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: The architecture of CHNet. It outputs the PDF dis [PITH_FULL_IMAGE:figures/full_fig_p005_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: The architecture of post-training with GRPO. [PITH_FULL_IMAGE:figures/full_fig_p006_5.png] view at source ↗
Figure 7
Figure 7. Figure 7: Architecture of the online deployment with GBS. [PITH_FULL_IMAGE:figures/full_fig_p008_7.png] view at source ↗
Figure 6
Figure 6. Figure 6: The performance of CHNet. The experimental results are shown in [PITH_FULL_IMAGE:figures/full_fig_p008_6.png] view at source ↗
read the original abstract

Bid shading plays a crucial role in Real-Time Bidding (RTB) by adaptively adjusting the bid to avoid advertisers overspending. Existing mainstream two-stage methods, which first model bid landscapes and then optimize surplus using operations research techniques, are constrained by unimodal assumptions that fail to adapt for non-convex surplus curves and are vulnerable to cascading errors in sequential workflows. Additionally, existing discretization models of continuous values ignore the dependence between discrete intervals, reducing the model's error correction ability, while sample selection bias in bidding scenarios presents further challenges for prediction. To address these issues, this paper introduces Generative Bid Shading (GBS), which comprises two primary components: 1) an end-to-end generative model that utilizes an autoregressive approach to generate shading ratios by stepwise residuals, capturing complex value dependencies without relying on predefined priors; and 2) a reward preference alignment system, which incorporates a channel-aware hierarchical dynamic network (CHNet) as the reward model to extract fine-grained features, along with modules for surplus optimization and exploration utility reward alignment, ultimately optimizing both short-term and long-term surplus using group relative policy optimization (GRPO). Extensive experiments on both offline and online A/B tests validate GBS's effectiveness. Moreover, GBS has been deployed on the Meituan DSP platform, serving billions of bid requests daily.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes Generative Bid Shading (GBS) for real-time bidding in advertising. It consists of an end-to-end autoregressive generative model that generates shading ratios through stepwise residuals to capture complex value dependencies without predefined priors, and a reward preference alignment system incorporating a Channel-aware Hierarchical Dynamic Network (CHNet) as the reward model along with modules for surplus optimization and exploration utility reward alignment, optimized using Group Relative Policy Optimization (GRPO) for both short-term and long-term surplus. The approach is validated through offline and online A/B tests and has been deployed on the Meituan DSP platform serving billions of bid requests daily.

Significance. If the empirical results hold under rigorous scrutiny, this work could represent a meaningful advance in bid shading techniques by addressing limitations of two-stage methods and discretization approaches in handling non-convex surplus curves. The integration of generative modeling with preference alignment for multi-term optimization offers a fresh perspective in the game theory and advertising auction literature, potentially leading to more efficient bidding strategies in large-scale RTB systems.

major comments (2)
  1. [Abstract (validation paragraph)] The abstract states that 'Extensive experiments on both offline and online A/B tests validate GBS's effectiveness' but provides no quantitative results, error bars, baseline comparisons, or details on how non-convexity or sample bias were handled. This leaves the central empirical claim without visible supporting evidence in the provided text and undermines the ability to assess the claimed improvements.
  2. [Generative model component (described in abstract)] The autoregressive generation of shading ratios by stepwise residuals is presented as capturing complex dependencies without cascading errors. However, sequential conditional predictions in autoregressive decoding can still accumulate per-step residual errors, especially for continuous ratios. This risks undermining the asserted advantage over unimodal or discretized baselines when modeling non-convex surplus curves.
minor comments (1)
  1. [Notation and terminology] The introduction of new entities such as CHNet and GRPO would benefit from clearer definitions or references to prior work on similar hierarchical networks or policy optimization methods to aid reader comprehension.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment point by point below, clarifying our approach and indicating planned revisions to strengthen the presentation of results and technical details.

read point-by-point responses
  1. Referee: [Abstract (validation paragraph)] The abstract states that 'Extensive experiments on both offline and online A/B tests validate GBS's effectiveness' but provides no quantitative results, error bars, baseline comparisons, or details on how non-convexity or sample bias were handled. This leaves the central empirical claim without visible supporting evidence in the provided text and undermines the ability to assess the claimed improvements.

    Authors: We acknowledge that the abstract, due to its brevity, summarizes the validation at a high level without specific metrics. The full manuscript details the offline and online A/B test results in the Experiments section, including quantitative surplus improvements over baselines, error bars from repeated trials, and explicit handling of non-convex surplus curves through the generative residual modeling as well as sample bias mitigation via end-to-end training and the CHNet reward model. To make the central claims more immediately verifiable, we will revise the abstract to include key quantitative highlights (e.g., relative surplus gains and statistical significance) while remaining within length limits. revision: yes

  2. Referee: [Generative model component (described in abstract)] The autoregressive generation of shading ratios by stepwise residuals is presented as capturing complex dependencies without cascading errors. However, sequential conditional predictions in autoregressive decoding can still accumulate per-step residual errors, especially for continuous ratios. This risks undermining the asserted advantage over unimodal or discretized baselines when modeling non-convex surplus curves.

    Authors: We appreciate the referee highlighting this potential limitation of autoregressive approaches in general. Our design uses stepwise residual generation specifically to model inter-value dependencies and enable per-step correction, which we contrast with discretization methods that discard interval dependencies entirely. The manuscript provides both theoretical motivation and empirical ablations showing reduced error propagation and better performance on non-convex curves relative to unimodal and discretized baselines. To directly address the concern, we will expand the discussion of the generative component with additional analysis of residual error accumulation and further ablation results demonstrating the mitigation effect. revision: partial

Circularity Check

0 steps flagged

No circularity: model trained end-to-end on external bidding data with independent validation

full rationale

The paper introduces an autoregressive generative model for shading ratios and a CHNet+GRPO alignment system, both trained on observed bidding data and evaluated via offline/online A/B tests plus production deployment. No equations or sections reduce a claimed prediction or surplus objective to a fitted parameter or self-citation by construction; the surplus optimization uses standard policy-gradient methods on externally measured outcomes rather than redefining the target as the model's own output. The derivation chain therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axioms · 2 invented entities

The central claim rests on the unstated premise that bidding data contains learnable stepwise residual dependencies that an autoregressive model can exploit without external priors, plus the assumption that CHNet features and GRPO rewards align with true advertiser surplus.

free parameters (2)
  • autoregressive residual steps
    Number of stepwise residuals used to generate shading ratios; chosen to capture dependencies but value not specified.
  • GRPO group size and reward weights
    Hyperparameters controlling short-term versus long-term surplus trade-off in the alignment stage.
axioms (1)
  • domain assumption Bidding surplus curves can be non-convex and that autoregressive generation can model them without unimodal assumptions.
    Invoked when contrasting with existing two-stage methods that rely on unimodal assumptions.
invented entities (2)
  • Channel-aware Hierarchical Dynamic Network (CHNet) no independent evidence
    purpose: Reward model to extract fine-grained features for surplus optimization.
    New component introduced to support the alignment system; no independent evidence outside this work.
  • Group Relative Policy Optimization (GRPO) no independent evidence
    purpose: Optimization method for aligning short- and long-term surplus rewards.
    Introduced or adapted for the bid-shading task; no external verification cited.

pith-pipeline@v0.9.0 · 5796 in / 1554 out tokens · 56525 ms · 2026-05-19T01:18:09.811689+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

43 extracted references · 43 canonical work pages · 2 internal anchors

  1. [1]

    2017.Demystifying Auction Dynamics for Digital Buyers and Sell- ers.https://www.appnexus.com/sites/default/files/whitepapers/49344-CM- Auction-Type-Whitepaper-V9.pdf

    AppNexus. 2017.Demystifying Auction Dynamics for Digital Buyers and Sell- ers.https://www.appnexus.com/sites/default/files/whitepapers/49344-CM- Auction-Type-Whitepaper-V9.pdf

  2. [2]

    Mohammad Gheshlaghi Azar, Zhaohan Daniel Guo, Bilal Piot, Remi Munos, Mark Rowland, Michal Valko, and Daniele Calandriello. 2024. A general theoret- ical paradigm to understand learning from human preferences. InInternational Conference on Artificial Intelligence and Statistics. PMLR, 4447–4455

  3. [3]

    2019.Rolling out first price auctions to Google Ad Manager part- ners

    Jason Bigler. 2019.Rolling out first price auctions to Google Ad Manager part- ners. https://blog.google/products/admanager/rolling-out-first-price-auctions- google-ad-manager-partners/

  4. [4]

    Olivier Chapelle. 2015. Offline evaluation of response prediction in online advertising auctions. InProceedings of the 24th international conference on world wide web. 919–922

  5. [5]

    Sayak Ray Chowdhury, Anush Kini, and Nagarajan Natarajan. [n. d.]. Provably Robust DPO: Aligning Language Models with Noisy Feedback. InForty-first International Conference on Machine Learning

  6. [6]

    John M Crespi and Richard J Sexton. 2005. A Multinomial logit framework to estimate bid shading in procurement auctions: Application to cattle sales in the Texas Panhandle.Review of industrial organization27, 3 (2005), 253–278

  7. [7]

    Ying Cui, Ruofei Zhang, Wei Li, and Jianchang Mao. 2011. Bid landscape forecast- ing in online ad exchange marketplace. InProceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining. 265–273

  8. [8]

    Jiaxin Deng, Shiyao Wang, Kuo Cai, Lejian Ren, Qigen Hu, Weifeng Ding, Qiang Luo, and Guorui Zhou. 2025. Onerec: Unifying retrieve and rank with generative recommender and iterative preference alignment.arXiv preprint arXiv:2502.18965 (2025)

  9. [9]

    Jingtong Gao, Bo Chen, Menghui Zhu, Xiangyu Zhao, Xiaopeng Li, Yuhao Wang, Yichao Wang, Huifeng Guo, and Ruiming Tang. 2023. Scenario-Aware Hierar- chical Dynamic Network for Multi-Scenario Recommendation.arXiv preprint arXiv:2309.02061(2023)

  10. [10]

    2017.RTB Auctions: Fair Play?https://blog.getintent.com/rtb- auctions-fair-play-3b372d505089

    Getintent. 2017.RTB Auctions: Fair Play?https://blog.getintent.com/rtb- auctions-fair-play-3b372d505089

  11. [11]

    Djordje Gligorijevic, Tian Zhou, Bharatbhushan Shetty, Brendan Kitts, Shengjun Pan, Junwei Pan, and Aaron Flores. 2020. Bid shading in the brave new world of first-price auctions. InProceedings of the 29th ACM International Conference on Information & Knowledge Management. 2453–2460

  12. [12]

    Zhen Gong, Lvyin Niu, Yang Zhao, Miao Xu, Haoqi Zhang, Zhenzhe Zheng, Zhilin Zhang, Rongquan Bai, Chuan Yu, Jian Xu, et al. 2023. MEBS: Multi-task End-to-end Bid Shading for Multi-slot Display Advertising. InProceedings of the 32nd ACM International Conference on Information and Knowledge Management. 4588–4594

  13. [13]

    Xian Guo, Ben Chen, Siyuan Wang, Ying Yang, Chenyi Lei, Yuqing Ding, and Han Li. 2025. OneSug: The Unified End-to-End Generative Framework for E-commerce Query Suggestion.arXiv preprint arXiv:2506.06913(2025)

  14. [14]

    Jonathan Ho, Ajay Jain, and Pieter Abbeel. 2020. Denoising diffusion probabilistic models.Advances in neural information processing systems33 (2020), 6840–6851

  15. [15]

    Jiani Huang, Zhenzhe Zheng, Yanrong Kang, and Zixiao Wang. 2024. From Sec- ond to First: Mixed Censored Multi-Task Learning for Winning Price Prediction. InProceedings of the 17th ACM International Conference on Web Search and Data Mining. 295–303

  16. [16]

    Xu Li, Michelle Ma Zhang, Zhenya Wang, and Youjun Tong. 2022. Arbitrary distribution modeling with censorship in real-time bidding advertising. InPro- ceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 3250–3258

  17. [17]

    Hairen Liao, Lingxiao Peng, Zhenchuan Liu, and Xuehua Shen. 2014. iPinYou global rtb bidding algorithm competition dataset. InProceedings of the Eighth International Workshop on Data Mining for Online Advertising. 1–6

  18. [18]

    Xiao Lin, Xiaokai Chen, Linfeng Song, Jingwei Liu, Biao Li, and Peng Jiang

  19. [19]

    InProceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

    Tree based progressive regression model for watch-time prediction in short-video recommendation. InProceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 4497–4506

  20. [20]

    Zhutian Lin, Junwei Pan, Shangyu Zhang, Ximei Wang, Xi Xiao, Shudong Huang, Lei Xiao, and Jie Jiang. 2024. Understanding the ranking loss for recommendation with sparse user feedback. InProceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 5409–5418

  21. [21]

    Hongxu Ma, Kai Tian, Tao Zhang, Xuefeng Zhang, Han Zhou, Chunjie Chen, Han Li, Jihong Guan, and Shuigeng Zhou. 2024. Generative Regression Based Watch Time Prediction for Short-Video Recommendation.arXiv preprint arXiv:2412.20211(2024)

  22. [22]

    Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, et al. 2022. Training language models to follow instructions with human feedback.Advances in neural information processing systems35 (2022), 27730–27744

  23. [23]

    Shengjun Pan, Brendan Kitts, Tian Zhou, Hao He, Bharatbhushan Shetty, Aaron Flores, Djordje Gligorijevic, Junwei Pan, Tingyu Mao, San Gultekin, et al. 2020. Bid shading by win-rate estimation and surplus maximization.arXiv preprint arXiv:2009.09259(2020)

  24. [24]

    Rafael Rafailov, Archit Sharma, Eric Mitchell, Christopher D Manning, Stefano Ermon, and Chelsea Finn. 2023. Direct preference optimization: Your language model is secretly a reward model.Advances in neural information processing systems36 (2023), 53728–53741

  25. [25]

    Shashank Rajput, Nikhil Mehta, Anima Singh, Raghunandan Hulikal Keshavan, Trung Vu, Lukasz Heldt, Lichan Hong, Yi Tay, Vinh Tran, Jonah Samost, et al

  26. [26]

    Recommender systems with generative retrieval.Advances in Neural Information Processing Systems36 (2023), 10299–10315

  27. [27]

    Kan Ren, Jiarui Qin, Lei Zheng, Zhengyu Yang, Weinan Zhang, and Yong Yu. 2019. Deep landscape forecasting for real-time bidding advertising. InProceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining. 363–372

  28. [28]

    Kan Ren, Weinan Zhang, Ke Chang, Yifei Rong, Yong Yu, and Jun Wang. 2017. Bidding machine: Learning to bid for directly optimizing profits in display ad- vertising.IEEE Transactions on Knowledge and Data Engineering30, 4 (2017), 645–659

  29. [29]

    Kan Ren, Weinan Zhang, Yifei Rong, Haifeng Zhang, Yong Yu, and Jun Wang

  30. [30]

    InProceedings of the 25th acm international on conference on information and knowledge management

    User response learning for directly optimizing campaign performance in display advertising. InProceedings of the 25th acm international on conference on information and knowledge management. 679–688

  31. [31]

    Burr Settles and Mark Craven. 2008. An analysis of active learning strategies for sequence labeling tasks. Inproceedings of the 2008 conference on empirical methods in natural language processing. 1070–1079

  32. [32]

    Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, YK Li, et al. 2024. Deepseekmath: Pushing the limits of mathematical reasoning in open language models.arXiv preprint arXiv:2402.03300 (2024)

  33. [33]

    Nian Si, San Gultekin, Jose Blanchet, and Aaron Flores. 2023. Optimal bidding and experimentation for multi-layer auctions in online advertising.A vailable at SSRN 4358914(2023)

  34. [34]

    2017.Explainer: More On The Widespread Fee Practice Behind The Guardian’s Lawsuit Vs

    Sarah Sluis. 2017.Explainer: More On The Widespread Fee Practice Behind The Guardian’s Lawsuit Vs. Rubicon Project. https://www.adexchanger.com/ad- exchange-news/explainer-widespread-fee-practice-behind-guardians- lawsuit-vs-rubicon-project/

  35. [35]

    Jake Snell, Kevin Swersky, and Richard Zemel. 2017. Prototypical networks for few-shot learning.Advances in neural information processing systems30 (2017)

  36. [36]

    Kihyuk Sohn, Honglak Lee, and Xinchen Yan. 2015. Learning structured output representation using deep conditional generative models.Advances in neural information processing systems28 (2015)

  37. [37]

    Jie Sun, Zhaoying Ding, Xiaoshuang Chen, Qi Chen, Yincheng Wang, Kaiqiao Zhan, and Ben Wang. 2024. Cread: A classification-restoration framework with error adaptive discretization for watch time prediction in video recommender systems. InProceedings of the AAAI Conference on Artificial Intelligence, Vol. 38. 9027–9034

  38. [38]

    Zixiao Wang, Zhenzhe Zheng, Yanrong Kang, and Jiani Huang. 2024. Cost- Effective Active Learning for Bid Exploration in Online Advertising. InProceed- ings of the 17th ACM International Conference on Web Search and Data Mining. 788–796

  39. [39]

    Wush Chi-Hsuan Wu, Mi-Yen Yeh, and Ming-Syan Chen. 2015. Predicting winning price in real time bidding with censored data. InProceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 1305–1314

  40. [40]

    Wei Zhang, Brendan Kitts, Yanjun Han, Zhengyuan Zhou, Tingyu Mao, Hao He, Shengjun Pan, Aaron Flores, San Gultekin, and Tsachy Weissman. 2021. MEOW: A space-efficient nonparametric bid shading algorithm. InProceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining. 3928–3936

  41. [41]

    Tian Zhou, Hao He, Shengjun Pan, Niklas Karlsson, Bharatbhushan Shetty, Brendan Kitts, Djordje Gligorijevic, San Gultekin, Tingyu Mao, Junwei Pan, et al

  42. [42]

    InProceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining

    An efficient deep distribution network for bid shading in first-price auctions. InProceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining. 3996–4004

  43. [43]

    Wen-Yuan Zhu, Wen-Yueh Shih, Ying-Hsuan Lee, Wen-Chih Peng, and Jiun- Long Huang. 2017. A gamma-based regression for winning price estimation in real-time bidding advertising. In2017 IEEE International Conference on Big Data (Big Data). IEEE, 1610–1619. Yinqiu Huang et al. A Vocabulary Construction We use the shading ratios generated by the two-stage meth...