R&F-Inventory: A Large-Scale Dataset for Monotonic Inventory Estimation in Reach and Frequency Advertising
Pith reviewed 2026-05-10 06:46 UTC · model grok-4.3
The pith
The paper releases a dataset that records unique visitor and page view metrics at multiple budget levels under fixed targeting, scheduling, and frequency controls to form complete monotonic budget-performance curves.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper releases the R&F-Inventory dataset, which uses the R&F contract context consisting of targeting-scheduling-frequency control as the basic context, providing observations of UV and PV corresponding to multiple budget points within the same context, thus forming a complete budget-performance curve. The dataset explicitly includes a time-window-based frequency control mechanism and naturally satisfies the monotonicity and diminishing marginal returns characteristics in the budget and scheduling dimensions. It further derives the theoretical maximum exposure ceiling and uses it as a consistency check to evaluate data quality and the feasibility of model predictions, while defining two standardized benchmark tasks: single-point performance prediction and reconstruction of budget-performance curves.
What carries the argument
The R&F contract context of targeting-scheduling-frequency control, which generates multiple budget points to produce a complete budget-performance curve of UV and PV observations.
If this is right
- The dataset supports systematic research on structural constraint learning for advertising models.
- Monotonic regression techniques can be applied and tested for inventory estimation tasks.
- Curve consistency modeling becomes feasible for R&F contract planning problems.
- Reproducible baselines and evaluation protocols are available for single-point performance prediction and budget-performance curve reconstruction.
- The theoretical exposure ceiling provides a built-in validation tool for new predictive models.
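As an illustration of the monotonic regression point above, here is a minimal sketch of the pool-adjacent-violators algorithm (PAVA), a standard least-squares isotonic fit. It is not one of the paper's baselines; the function name `isotonic_fit` and the sample values are hypothetical, and the input is assumed to be already ordered by increasing budget within a single contract context.

```python
from typing import List, Sequence


def isotonic_fit(values: Sequence[float]) -> List[float]:
    """Least-squares non-decreasing fit via pool-adjacent-violators (PAVA).

    `values` are observations (e.g. predicted UV) ordered by increasing budget.
    """
    # Each block stores [sum of values, count]; its fitted value is the mean.
    blocks: List[List[float]] = []
    for v in values:
        blocks.append([float(v), 1.0])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted: List[float] = []
    for total, count in blocks:
        fitted.extend([total / count] * int(count))
    return fitted


# Hypothetical noisy UV predictions at four increasing budget points.
raw_uv = [100.0, 180.0, 170.0, 240.0]
smoothed_uv = isotonic_fit(raw_uv)
```

The fit pools the two out-of-order middle points into their mean, so the reconstructed curve never decreases as budget increases.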
Where Pith is reading between the lines
- The curves could be used to train optimization algorithms that allocate ad budgets more efficiently while respecting frequency limits.
- Similar data structures might apply to other resource allocation settings with diminishing returns, such as supply chain or network capacity planning.
- Extensions could test whether models trained on these curves generalize across different advertising platforms or time periods.
- The consistency checks may inspire theoretical bounds for monotonic functions in related machine learning applications.
Load-bearing premise
The collected data accurately reflects real-world R&F contract dynamics without platform-specific biases or sampling artifacts that would invalidate the monotonicity or the theoretical maximum exposure ceiling as a consistency check.
What would settle it
Observation of any budget-performance curve in the released dataset where unique visitors or page views decrease as budget increases, or where any value exceeds the derived theoretical maximum exposure ceiling.
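The falsification test above can be sketched as a small check on one curve. This is an illustrative sketch only: the function name `curve_is_consistent`, its arguments, and the sample numbers are assumptions, not the released dataset's schema or the paper's code, and the ceiling is taken as a given input rather than derived.

```python
from typing import Optional, Sequence


def curve_is_consistent(
    budgets: Sequence[float],
    metric: Sequence[float],
    ceiling: Optional[float] = None,
) -> bool:
    """Return False if the curve decreases with budget or exceeds the ceiling."""
    if len(budgets) != len(metric):
        raise ValueError("budgets and metric must have the same length")
    # Order the observations by budget before comparing neighbours.
    ordered = [m for _, m in sorted(zip(budgets, metric))]
    monotone = all(a <= b for a, b in zip(ordered, ordered[1:]))
    under_ceiling = ceiling is None or all(m <= ceiling for m in ordered)
    return monotone and under_ceiling


# Hypothetical UV observations at three budget points.
ok = curve_is_consistent([10, 20, 30], [100, 150, 180])
violates = curve_is_consistent([10, 20, 30], [100, 150, 140])
```

A single curve for which this returns `False` would be a counterexample of the kind described above.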
Original abstract
Reach and Frequency (R&F) contract advertising is an important and widely used form of brand advertising. Unlike performance advertising, R&F contracts emphasize controllable delivery of UV and PV under given targeting, scheduling, and frequency control constraints. In practical systems, advertisers typically need to view the UV and PV change curves at different budget levels in real time when creating an R&F contract. However, most existing publicly available advertising datasets are based on independent samples, lacking a characterization of the core structure of the "budget-performance curve" (including UV and PV) in R&F contracts. This paper proposes and releases a large-scale R&F contract inventory estimation dataset. This dataset uses the R&F contract context consisting of "targeting-scheduling-frequency control" as the basic context, providing observations of UV and PV corresponding to multiple budget points within the same context, thus forming a complete budget-performance curve. The dataset explicitly includes a time-window-based frequency control mechanism (e.g., "no more than 3 times within 5 days") and naturally satisfies the monotonicity and diminishing marginal returns characteristics in the budget and scheduling dimensions. We further derive the theoretical maximum exposure ceiling and use it as a consistency check to evaluate data quality and the feasibility of model predictions. Using this dataset, this paper defines two standardized benchmark tasks: single-point performance prediction and reconstruction of budget-performance curves, and provides a set of reproducible baseline methods and evaluation protocols. This dataset can support systematic research on problems such as structural constraint learning, monotonic regression, curve consistency modeling, and R&F contract planning. The code for our experiments can be found at https://github.com/pengyunshan/RF-Inventory.
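To make the frequency-cap example concrete, the sketch below computes one simple upper bound on exposures. The abstract does not give the paper's actual derivation, so this is an assumption: it reads "no more than `cap` times within `window_days` days" as a sliding-window limit, under which a schedule of `schedule_days` days can be covered by ceil(schedule_days / window_days) windows, each holding at most `cap` exposures per user. The function names and the audience size are hypothetical.

```python
def max_exposures_per_user(schedule_days: int, cap: int, window_days: int) -> int:
    """Upper bound on exposures one user can receive over the schedule."""
    if schedule_days < 0 or cap < 0 or window_days <= 0:
        raise ValueError("expected schedule_days >= 0, cap >= 0, window_days > 0")
    # Integer ceiling of schedule_days / window_days without floating point.
    windows = -(-schedule_days // window_days)
    return cap * windows


def pv_ceiling(audience_size: int, schedule_days: int, cap: int, window_days: int) -> int:
    """Upper bound on total PV if every targeted user is reached at the cap."""
    return audience_size * max_exposures_per_user(schedule_days, cap, window_days)


# "No more than 3 times within 5 days" over a hypothetical 14-day schedule.
per_user = max_exposures_per_user(schedule_days=14, cap=3, window_days=5)
total = pv_ceiling(audience_size=1000, schedule_days=14, cap=3, window_days=5)
```

Under these assumptions a 14-day schedule spans three 5-day windows, so no user can see more than 9 exposures, and any observed or predicted PV above `audience_size * 9` would fail the consistency check.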
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces and releases R&F-Inventory, a large-scale dataset for reach and frequency (R&F) advertising. It structures observations around fixed 'targeting-scheduling-frequency control' contexts, supplying UV and PV values at multiple budget levels to form complete budget-performance curves. The dataset incorporates time-window-based frequency capping and is presented as naturally exhibiting monotonicity and diminishing marginal returns; a derived theoretical maximum exposure ceiling serves as a post-collection consistency check. Two standardized benchmark tasks are defined (single-point performance prediction and full curve reconstruction) together with reproducible baseline methods and evaluation protocols. Code is released at the provided GitHub repository.
Significance. If the collected curves accurately reflect real R&F contract dynamics, the dataset supplies a missing public resource for research on structural constraint learning, monotonic regression, and budget-aware planning in brand advertising. The multi-budget-per-context design directly supports curve-level modeling that independent-sample datasets cannot, and the theoretical ceiling provides an explicit, falsifiable consistency mechanism. Releasing data, code, and baseline implementations is a clear strength that lowers the barrier for follow-on work.
minor comments (2)
- [Introduction / Data Description] The abstract and introduction assert that the data 'naturally satisfies' monotonicity and diminishing returns; a short quantitative verification (e.g., percentage of curves violating monotonicity or average second differences) in the data-description section would make this claim immediately verifiable without requiring readers to download and inspect the full release.
- [Methods / Consistency Check] The derivation of the theoretical maximum exposure ceiling is referenced as a consistency check, but the manuscript would benefit from an explicit formula or short derivation (even if placed in an appendix) so that the check can be reproduced independently of the released data files.
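The quantitative verification suggested in the first comment could look like the sketch below. It is a hypothetical illustration, not part of the released code: `curve_diagnostics` and the sample curves are assumptions, each curve is assumed to be ordered by increasing budget, and the second differences are only comparable across curves if the budget points are evenly spaced.

```python
from typing import Dict, List, Sequence


def curve_diagnostics(curves: Sequence[Sequence[float]]) -> Dict[str, float]:
    """Share of curves violating monotonicity and the mean second difference."""
    violating = 0
    second_diffs: List[float] = []
    for curve in curves:
        # First differences: change in the metric between adjacent budget points.
        first = [b - a for a, b in zip(curve, curve[1:])]
        if any(d < 0 for d in first):
            violating += 1
        # Second differences: negative values indicate diminishing returns.
        second_diffs.extend(b - a for a, b in zip(first, first[1:]))
    return {
        "pct_violating": 100.0 * violating / len(curves) if curves else 0.0,
        "mean_second_diff": (
            sum(second_diffs) / len(second_diffs) if second_diffs else 0.0
        ),
    }


# Two hypothetical UV curves: one well-behaved, one with a dip.
stats = curve_diagnostics([[100, 150, 180], [100, 90, 120]])
```

A dataset that "naturally satisfies" both properties would be expected to report a violation percentage near zero and a negative mean second difference.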
Simulated Author's Rebuttal
We thank the referee for the positive and constructive review of our manuscript on the R&F-Inventory dataset. We appreciate the recognition of the dataset's design around fixed targeting-scheduling-frequency contexts, the provision of complete multi-budget UV/PV curves, the built-in monotonicity and diminishing marginal returns, the theoretical exposure ceiling for quality control, and the standardized benchmarks with baselines. The recommendation for minor revision is noted, and we will incorporate any suggested improvements accordingly.
Circularity Check
No significant circularity in dataset release and validation
full rationale
The paper releases observed data forming budget-performance curves under fixed targeting-scheduling-frequency contexts, with natural monotonicity and diminishing returns as empirical properties of the collection process. The theoretical maximum exposure ceiling is derived independently as a post-hoc consistency check for data quality, not as a fitted prediction or load-bearing derivation that reduces to the inputs. No self-citations, ansatzes, or uniqueness theorems are invoked to support central claims; benchmark tasks are defined directly on the released observations without circular reduction. The derivation chain is self-contained against external benchmarks.