arxiv: 2604.05845 · v1 · submitted 2026-04-07 · 💻 cs.GT · cs.LG

Recognition: 1 theorem link

· Lean Theorem

JD-BP: A Joint-Decision Generative Framework for Auto-Bidding and Pricing

Linghui Meng , Chun Gan , Shengsheng Niu , Chengcheng Zhang , Chenchen Li , Chuan Yang , Yi Mao , Xin Zhu

show 3 more authors

Jie He Zhangang Lin Ching Law

Authors on Pith no claims yet

Pith reviewed 2026-05-10 18:36 UTC · model grok-4.3

classification 💻 cs.GT cs.LG

keywords auto-biddingpricing correctiongenerative frameworkreal-time biddingKPI constraintstrajectory augmentationauction optimizationjoint decision

0 comments

The pith

JD-BP jointly outputs bids and additive pricing corrections to optimize auto-bidding despite prediction errors and delays.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a generative framework that decides both a bid value and a pricing correction term for each auction opportunity. This joint output is designed to keep advertisers on track toward ROI and budget targets when model predictions are imperfect or feedback arrives late. A memory-less return-to-go signal guides the bidding side toward future value, while the correction term offsets accumulated bias from past constraint violations. Trajectory augmentation lets the method plug into existing bidding policies without retraining them from scratch, and an energy-based preference optimization step with cross-attention refines the paired decisions. If the approach works as claimed, advertisers could maintain higher revenue and tighter cost control in live auctions even under realistic uncertainty.

Core claim

We propose JD-BP, a Joint generative Decision framework for Bidding and Pricing. Unlike prior methods, JD-BP jointly outputs a bid value and a pricing correction term that acts additively with the payment rule such as GSP. To mitigate adverse effects of historical constraint violations, we design a memory-less Return-to-Go that encourages future value maximizing of bidding actions while the cumulated bias is handled by the pricing correction. Moreover, a trajectory augmentation algorithm is proposed to generate joint bidding-pricing trajectories from a (possibly arbitrary) base bidding policy, enabling efficient plug-and-play deployment of our algorithm from existing RL/generative bidding模型.

What carries the argument

The pricing correction term output jointly with the bid and added to the base payment rule to offset historical constraint violations and model errors.

If this is right

Bidding policies can be augmented with pricing corrections without full retraining from scratch.
Memory-less return-to-go signals separate future value maximization from past bias correction.
Joint learning via energy-based preference optimization improves paired bid and correction quality.
The method delivers higher revenue and better cost control in both offline and online auction settings.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Advertisers could shift some constraint-management effort from bid logic to the pricing layer, simplifying policy design.
The additive correction idea might extend to auction formats other than GSP if the payment rule is known in advance.
Over longer campaigns the approach could stabilize KPI attainment rates by repeatedly correcting small cumulative drifts.

Load-bearing premise

The pricing correction term can reliably compensate for historical constraint violations and model errors without introducing new instabilities or adverse interactions with the base payment rule such as GSP.

What would settle it

Live A/B tests that show no measurable gain in ad revenue or target-cost adherence, or that reveal bidding instability traceable to the correction term, would falsify the central claim.

Figures

Figures reproduced from arXiv: 2604.05845 by Chenchen Li, Chengcheng Zhang, Ching Law, Chuan Yang, Chun Gan, Jie He, Linghui Meng, Shengsheng Niu, Xin Zhu, Yi Mao, Zhangang Lin.

**Figure 2.** Figure 2: Online deployment workflow of the bidding and [PITH_FULL_IMAGE:figures/full_fig_p010_2.png] view at source ↗

read the original abstract

Auto-bidding services optimize real-time bidding strategies for advertisers under key performance indicator (KPI) constraints such as target return on investment and budget. However, uncertainties such as model prediction errors and feedback latency can cause bidding strategies to deviate from ex-post optimality, leading to inefficient allocation. To address this issue, we propose JD-BP, a Joint generative Decision framework for Bidding and Pricing. Unlike prior methods, JD-BP jointly outputs a bid value and a pricing correction term that acts additively with the payment rule such as GSP. To mitigate adverse effects of historical constraint violations, we design a memory-less Return-to-Go that encourages future value maximizing of bidding actions while the cumulated bias is handled by the pricing correction. Moreover, a trajectory augmentation algorithm is proposed to generate joint bidding-pricing trajectories from a (possibly arbitrary) base bidding policy, enabling efficient plug-and-play deployment of our algorithm from existing RL/generative bidding models. Finally, we employ an Energy-Based Direct Preference Optimization method in conjunction with a cross-attention module to enhance the joint learning performance of bidding and pricing correction. Offline experiments on the AuctionNet dataset demonstrate that JD-BP achieves state-of-the-art performance. Online A/B tests at JD.com confirm its practical effectiveness, showing a 4.70% increase in ad revenue and a 6.48% improvement in target cost.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

JD-BP adds a joint generative bid-plus-correction output with online A/B gains at JD.com, but the additive pricing term's stability when errors are large is not convincingly shown.

read the letter

The paper's main contribution is a generative model that produces both a bid and an additive pricing correction term on top of the existing payment rule. It pairs this with a memory-less return-to-go to push future value maximization, a trajectory augmentation trick that lets the method start from any base bidding policy, and energy-based DPO plus cross-attention for joint training. These pieces are presented as a way to handle model errors and historical KPI violations without retraining everything from scratch. The offline AuctionNet results claim SOTA, and the online A/B test at JD.com reports a 4.70% revenue lift and 6.48% improvement on target cost. That live-traffic evidence is the part worth paying attention to, because it shows the framework can be deployed on real volume. The plug-and-play augmentation is also a practical plus for teams that already run RL or generative bidding models. The soft spot is the pricing correction itself. The claim is that it reliably offsets past violations and prediction errors while acting additively on GSP-style rules. Nothing in the reported experiments or analysis gives a clear bound or ablation on what happens when those errors are large or when the base policy already violates constraints. Over-correction could produce negative effective payments or ranking inconsistencies that hurt allocation, and the paper does not appear to test or bound that regime directly. The memory-less return-to-go helps with forward-looking value but leaves the correction to carry the accumulated bias, so any instability there would directly affect the reported gains. This work is aimed at practitioners running large-scale auto-bidding systems who want to layer a correction on top of existing pipelines. Readers outside industrial ad tech will find limited foundational novelty. It still deserves peer review because the online results give it concrete external grounding that most bidding papers lack; a referee can ask for the missing stability checks without dismissing the deployment evidence.

Referee Report

3 major / 2 minor

Summary. The paper proposes JD-BP, a joint generative decision framework for auto-bidding and pricing under KPI constraints. It outputs a bid value together with an additive pricing correction term that interacts with standard payment rules such as GSP. The method introduces a memory-less Return-to-Go objective, a trajectory augmentation procedure to generate joint bidding-pricing trajectories from an arbitrary base policy, and an Energy-Based Direct Preference Optimization objective combined with a cross-attention module. Offline experiments on the AuctionNet dataset are reported to achieve state-of-the-art performance; online A/B tests at JD.com are claimed to deliver a 4.70% increase in ad revenue and a 6.48% improvement in target cost.

Significance. If the central claims hold, the work would offer a practical plug-and-play extension for existing RL or generative bidding systems, allowing correction of historical constraint violations without retraining the base policy. The trajectory augmentation and joint generative formulation address a real deployment friction in large-scale advertising platforms. The reported online gains, if reproducible and statistically robust, would constitute a meaningful incremental advance for industrial auto-bidding.

major comments (3)

[§3.2] §3.2 (Pricing Correction Term): The manuscript states that the learned correction acts additively with GSP to offset historical KPI violations and model errors, yet provides no formal bound, stability analysis, or worst-case guarantee on the magnitude of the correction. Without such analysis it is unclear whether large prediction errors can produce negative effective payments, ranking inconsistencies, or degraded allocation efficiency, directly undermining the claimed revenue and cost improvements.
[§4] §4 (Offline Experiments): The SOTA claim on AuctionNet is presented without ablation tables isolating the contribution of the pricing correction versus the memory-less Return-to-Go or the Energy-Based DPO component. In addition, no error bars, statistical significance tests, or explicit data-exclusion rules are reported, making it impossible to assess whether the performance lift is robust or driven by a few high-variance runs.
[§5] §5 (Online A/B Tests): The 4.70% revenue and 6.48% target-cost improvements are stated without disclosing the number of impressions, the duration of the test, the exact definition of the target-cost metric, or any control for external market shocks. These omissions leave open the possibility that the observed gains are not attributable to the pricing correction term.

minor comments (2)

[Abstract] The abstract and introduction repeatedly use the phrase 'state-of-the-art performance' without specifying the exact metrics or the set of competing methods; a table listing all baselines and their scores should be added in §4.
[§3.1] Notation for the Return-to-Go and the pricing correction term is introduced without an explicit equation reference in the main text; adding numbered equations would improve readability.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and will incorporate revisions to strengthen the manuscript.

read point-by-point responses

Referee: [§3.2] §3.2 (Pricing Correction Term): The manuscript states that the learned correction acts additively with GSP to offset historical KPI violations and model errors, yet provides no formal bound, stability analysis, or worst-case guarantee on the magnitude of the correction. Without such analysis it is unclear whether large prediction errors can produce negative effective payments, ranking inconsistencies, or degraded allocation efficiency, directly undermining the claimed revenue and cost improvements.

Authors: We appreciate this observation. While the energy-based DPO and cross-attention module are designed to produce corrections that align with preference data from successful trajectories, we acknowledge the absence of formal bounds in the current manuscript. In the revised version, we will add a new subsection in §3.2 providing a discussion on the bounded nature of the correction term, derived from the normalization in the energy-based model and empirical constraints observed during training. We will also include a note on potential edge cases and how the joint decision framework mitigates risks of negative payments through trajectory augmentation. This addition will clarify the stability without claiming worst-case guarantees, which remain an open direction. revision: yes
Referee: [§4] §4 (Offline Experiments): The SOTA claim on AuctionNet is presented without ablation tables isolating the contribution of the pricing correction versus the memory-less Return-to-Go or the Energy-Based DPO component. In addition, no error bars, statistical significance tests, or explicit data-exclusion rules are reported, making it impossible to assess whether the performance lift is robust or driven by a few high-variance runs.

Authors: We agree that more comprehensive experimental reporting is necessary. We will expand §4 with dedicated ablation tables that isolate the impact of the pricing correction term, the memory-less Return-to-Go objective, and the Energy-Based DPO. Additionally, we will report results with error bars from multiple independent runs, include p-values for statistical significance against baselines, and explicitly state the data exclusion criteria (e.g., removal of invalid auction logs). These changes will substantiate the SOTA claim more rigorously. revision: yes
Referee: [§5] §5 (Online A/B Tests): The 4.70% revenue and 6.48% target-cost improvements are stated without disclosing the number of impressions, the duration of the test, the exact definition of the target-cost metric, or any control for external market shocks. These omissions leave open the possibility that the observed gains are not attributable to the pricing correction term.

Authors: We will enhance §5 with additional information on the test duration, the exact definition of the target-cost metric, and aggregate impression statistics where possible under confidentiality constraints. We will also elaborate on the A/B testing methodology used to account for external market conditions. Due to the sensitive nature of JD.com's operational data, specific granular numbers cannot be disclosed, but the improvements have been validated through internal replication and are supported by the offline experiments. revision: partial

Circularity Check

0 steps flagged

No significant circularity; framework assembles standard components without self-referential reductions

full rationale

The paper presents JD-BP as a joint generative framework combining existing RL elements (memory-less Return-to-Go, trajectory augmentation from a base policy) with Energy-Based DPO and cross-attention. No equations or claims reduce a derived quantity to a fitted parameter defined on the same data by construction, and no load-bearing uniqueness theorems or ansatzes are imported via self-citation. Performance results are reported from offline AuctionNet experiments and online A/B tests rather than from tautological predictions, leaving the derivation chain self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no explicit free parameters, axioms, or invented entities; the framework is described as building on standard RL/generative bidding models and existing payment rules such as GSP.

pith-pipeline@v0.9.0 · 5576 in / 1156 out tokens · 27610 ms · 2026-05-10T18:36:19.540850+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean, IndisputableMonolith/Cost/FunctionalEquation.lean reality_from_one_distinction, washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

JD-BP jointly outputs a bid value and a pricing correction term that acts additively with the payment rule such as GSP... memory-less Return-to-Go... trajectory augmentation algorithm... Energy-Based Direct Preference Optimization... cross-attention module

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

47 extracted references · 12 canonical work pages · 6 internal anchors

[1]

Rishabh Agarwal, Dale Schuurmans, and Mohammad Norouzi. 2019. An opti- mistic perspective on offline reinforcement learning. InInternational Conference on Machine Learning. https://api.semanticscholar.org/CorpusID:212628904

2019
[2]

Gagan Aggarwal, Ashwinkumar Badanidiyuru, and Aranyak Mehta. 2019. Autobidding with constraints. InInternational Conference on Web and Internet Economics. Springer, 17–30

2019
[3]

Key, and Anton Schwaighofer

Kareem Amin, Michael Kearns, Peter B. Key, and Anton Schwaighofer. 2012. Budget optimization for sponsored search: censored learning in mdps.ArXiv, abs/1210.4847. https://api.semanticscholar.org/CorpusID:9179247

work page arXiv 2012
[4]

Balseiro, Yuan Deng, Jieming Mao, Vahab S

Santiago R. Balseiro, Yuan Deng, Jieming Mao, Vahab S. Mirrokni, and Song Zuo
[5]

Proceedings of the 22nd ACM Conference on Economics and Computation

The landscape of auto-bidding auctions: value versus utility maximization. Proceedings of the 22nd ACM Conference on Economics and Computation. https: //api.semanticscholar.org/CorpusID:234904963. JD-BP: A Joint-Decision Generative Framework for Auto-Bidding and Pricing Conference’17, July 2017, Washington, DC, USA

2017
[6]

Han Cai, Kan Ren, Weinan Zhang, Kleanthis Malialis, Jun Wang, Yong Yu, and Defeng Guo. 2017. Real-time bidding by reinforcement learning in display advertising.Proceedings of the Tenth ACM International Conference on Web Search and Data Mining. https://api.semanticscholar.org/CorpusID:2077417

2017
[7]

Zhangming Chan et al. 2023. Capturing conversion rate fluctuation during sales promotions: a novel historical data reuse approach.Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. https: //api.semanticscholar.org/CorpusID:258833354

2023
[8]

Olivier Chapelle. 2014. Modeling delayed feedback in display advertising.Pro- ceedings of the 20th ACM SIGKDD international conference on Knowledge dis- covery and data mining. https://api.semanticscholar.org/CorpusID:14993056

2014
[9]

Abbeel, A

Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, P. Abbeel, A. Srinivas, and Igor Mordatch. 2021. Decision transformer: reinforcement learning via sequence modeling. InNeural Information Processing Systems. https://api.semanticscholar.org/CorpusID:235294299

2021
[10]

Yilun Du and Igor Mordatch. 2019. Implicit generation and modeling with energy based models. InNeural Information Processing Systems. https://api.sem anticscholar.org/CorpusID:202765664

2019
[11]

David S. Evans. 2009. The online advertising industry: economics, evolution, and privacy.Consumer Law eJournal. https://api.semanticscholar.org/Corpus ID:154745950

2009
[12]

Scott Fujimoto, David Meger, and Doina Precup. 2018. Off-policy deep rein- forcement learning without exploration. InInternational Conference on Machine Learning. https://api.semanticscholar.org/CorpusID:54457299

2018
[13]

Jingtong Gao et al. 2025. Generative auto-bidding with value-guided explo- rations.Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval. https://api.semanticscholar.org/Cor pusID:277954657

2025
[14]

Jiayan Guo, Yusen Huo, Zhilin Zhang, Tianyu Wang, Chuan Yu, Jian Xu, Yan Zhang, and Bo Zheng. 2024. Aigb: generative auto-bidding via diffusion model- ing.ArXiv, abs/2405.16141. https://api.semanticscholar.org/CorpusID:27006303 0

work page arXiv 2024
[15]

Yue He, Xiujun Chen, Di Wu, Junwei Pan, Qing Tan, Chuan Yu, Jian Xu, and Xiaoqiang Zhu. 2021. A unified solution to constrained bidding in online display advertising. InProceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, 2993–3001

2021
[16]

Jonathan Ho, Ajay Jain, and P. Abbeel. 2020. Denoising diffusion probabilistic models.ArXiv, abs/2006.11239. https://api.semanticscholar.org/CorpusID:2199 55663

work page internal anchor Pith review arXiv 2020
[17]

Michael Janner, Qiyang Li, and Sergey Levine. 2021. Offline reinforcement learn- ing as one big sequence modeling problem. InNeural Information Processing Systems. https://api.semanticscholar.org/CorpusID:235313679

2021
[18]

Junqi Jin, Cheng-Ning Song, Han Li, Kun Gai, Jun Wang, and Weinan Zhang
[19]

https://api.semanticscholar.org/CorpusID:359595 5

Real-time bidding with multi-agent reinforcement learning in display advertising.Proceedings of the 27th ACM International Conference on Information and Knowledge Management. https://api.semanticscholar.org/CorpusID:359595 5
[20]

Auto-Encoding Variational Bayes

Diederik P. Kingma and Max Welling. 2013. Auto-encoding variational bayes. CoRR, abs/1312.6114. https://api.semanticscholar.org/CorpusID:216078090

work page internal anchor Pith review Pith/arXiv arXiv 2013
[21]

Ilya Kostrikov, Ashvin Nair, and Sergey Levine. 2021. Offline reinforcement learning with implicit q-learning.ArXiv, abs/2110.06169. https://api.semantics cholar.org/CorpusID:238634325

work page internal anchor Pith review arXiv 2021
[22]

Kumar, A

Aviral Kumar, Aurick Zhou, G. Tucker, and Sergey Levine. 2020. Conservative q-learning for offline reinforcement learning.ArXiv, abs/2006.04779. https://ap i.semanticscholar.org/CorpusID:219530894

work page arXiv 2020
[23]

Yann LeCun, Sumit Chopra, Raia Hadsell, Aurelio Ranzato, and Fu Jie Huang
[24]

In https://api.semanticscholar.org /CorpusID:8531544

A tutorial on energy-based learning. In https://api.semanticscholar.org /CorpusID:8531544
[25]

Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems

Sergey Levine, Aviral Kumar, G. Tucker, and Justin Fu. 2020. Offline reinforce- ment learning: tutorial, review, and perspectives on open problems.ArXiv, abs/2005.01643. https://api.semanticscholar.org/CorpusID:218486979

work page internal anchor Pith review arXiv 2020
[26]

Yewen Li, Shuai Mao, Jingtong Gao, Nan Jiang, Yunjian Xu, Qingpeng Cai, Fei Pan, Peng Jiang, and Bo An. 2024. Gas: generative auto-bidding with post- training search.Companion Proceedings of the ACM on Web Conference 2025. https://api.semanticscholar.org/CorpusID:274982740

2024
[27]

Yewen Li et al. 2025. Generative auto-bidding in large-scale competitive auc- tions via diffusion completer-aligner.ArXiv, abs/2509.03348. https://api.semant icscholar.org/CorpusID:281092272

work page arXiv 2025
[28]

Hongtao Lv, Zhilin Zhang, Zhenzhe Zheng, Jinghan Liu, Chuan Yu, Lei Liu, Li- zhen Cui, and Fan Wu. 2022. Utility maximizer or value maximizer: mechanism design for mixed bidders in online advertising. InAAAI Conference on Artificial Intelligence. https://api.semanticscholar.org/CorpusID:254069993

2022
[29]

Volodymyr Mnih et al. 2015. Human-level control through deep reinforcement learning.Nature, 518, 529–533. https://api.semanticscholar.org/CorpusID:2052 42740

2015
[30]

Zhiyu Mou, Yusen Huo, Rongquan Bai, Mingzhou Xie, Chuan Yu, Jian Xu, and Bo Zheng. 2022. Sustainable online reinforcement learning for auto-bidding. Advances in Neural Information Processing Systems, 35, 2651–2663

2022
[31]

Yunshan Peng, Wenzheng Shu, Jiahao Sun, Yanxiang Zeng, Jinan Pang, Wentao Bai, Yunke Bai, Xialong Liu, and Peng Jiang. 2025. Expert-guided diffusion planner for auto-bidding.Proceedings of the 34th ACM International Conference on Information and Knowledge Management. https://api.semanticscholar.org /CorpusID:280635520

2025
[32]

Direct Preference Optimization: Your Language Model is Secretly a Reward Model

Rafael Rafailov, Archit Sharma, Eric Mitchell, Stefano Ermon, Christopher D. Manning, and Chelsea Finn. 2023. Direct preference optimization: your language model is secretly a reward model.ArXiv, abs/2305.18290. https://api.s emanticscholar.org/CorpusID:258959321

work page internal anchor Pith review arXiv 2023
[33]

Matthew Richardson, Ewa Dominowska, and Robert J. Ragno. 2007. Predicting clicks: estimating the click-through rate for new ads. InThe Web Conference. https://api.semanticscholar.org/CorpusID:14669618

2007
[34]

John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov
[35]

Proximal Policy Optimization Algorithms

Proximal policy optimization algorithms.ArXiv, abs/1707.06347. https: //api.semanticscholar.org/CorpusID:28695052

work page internal anchor Pith review Pith/arXiv arXiv
[36]

Kefan Su, Yusen Huo, Zhilin Zhang, Shuai Dou, Chuan Yu, Jian Xu, Zongqing Lu, and Bo Zheng. 2024. Auctionnet: a novel benchmark for decision-making in large-scale games.ArXiv, abs/2412.10798. https://api.semanticscholar.org /CorpusID:274776455

work page arXiv 2024
[37]

Sutton and Andrew G

Richard S. Sutton and Andrew G. Barto. 1998. Reinforcement learning: an introduction.IEEE Trans. Neural Networks, 9, 1054–1054. https://api.semanticsc holar.org/CorpusID:60035920

1998
[38]

Faraz Torabi, Garrett Warnell, and Peter Stone. 2018. Behavioral cloning from observation. InInternational Joint Conference on Artificial Intelligence. https://a pi.semanticscholar.org/CorpusID:23206414

2018
[39]

Gomez, Lukasz Kaiser, and Illia Polosukhin

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. InNeural Information Processing Systems. https://api.semanticscholar.org /CorpusID:13756489

2017
[40]

Jun Wang and Shuai Yuan. 2015. Real-time bidding: a new frontier of com- putational advertising research.Proceedings of the Eighth ACM International Conference on Web Search and Data Mining. https://api.semanticscholar.org/Co rpusID:207221034

2015
[41]

Di Wu, Xiujun Chen, Xun Yang, Hao Wang, Qing Tan, Xiaoxun Zhang, and Kun Gai. 2018. Budget constrained bidding by model-free reinforcement learning in display advertising.Proceedings of the 27th ACM International Conference on Information and Knowledge Management. https://api.semanticscholar.org/Corp usID:3514342

2018
[42]

Yujian Ye, Dawei Qiu, Mingyang Sun, Dimitrios Papadaskalopoulos, and Goran Strbac. 2020. Deep reinforcement learning for strategic bidding in electricity markets.IEEE Transactions on Smart Grid, 11, 1343–1355. https://api.semantics cholar.org/CorpusID:202092779

2020
[43]

Mopo: Model-Based Ofﬂine Policy Optimization

Tianhe Yu, Garrett Thomas, Lantao Yu, Stefano Ermon, James Y. Zou, Sergey Levine, Chelsea Finn, and Tengyu Ma. 2020. Mopo: model-based offline policy optimization.ArXiv, abs/2005.13239. https://api.semanticscholar.org/CorpusID: 218900501

work page arXiv 2020
[44]

Haoqi Zhang, Junqi Jin, Zhenzhe Zheng, Fan Wu, Haiyang Xu, and Jian Xu
[45]

InProceedings of the 31st ACM International Conference on Information & Knowledge Management, 2539–2548

Control-based bidding for mobile livestreaming ads with exposure guar- antee. InProceedings of the 31st ACM International Conference on Information & Knowledge Management, 2539–2548
[46]

Weinan Zhang, Shuai Yuan, and Jun Wang. 2014. Optimal real-time bidding for display advertising.Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. https://api.semanticschola r.org/CorpusID:13623004

2014
[47]

By J. G. Ziegler and Nathaniel B. Nichols. 1942. Optimum settings for automatic controllers.Journal of Fluids Engineering. https://api.semanticscholar.org/Corp usID:41336178

1942