R&F-Inventory: A Large-Scale Dataset for Monotonic Inventory Estimation in Reach and Frequency Advertising

Jinan Pang; Ji Wu; Peng Jiang; Wentao Bai; Wenzheng Shu; Xialong Liu; Yanxiang Zeng; Yunke Bai; Yunshan Peng

arxiv: 2604.16821 · v1 · submitted 2026-04-18 · 💻 cs.LG

Pith reviewed 2026-05-10 06:46 UTC · model grok-4.3

classification 💻 cs.LG

keywords Reach and Frequency advertising · Inventory estimation · Monotonic regression · Budget-performance curve · Advertising dataset · Frequency control · UV and PV metrics · Monotonicity

The pith

The paper releases a dataset that records unique visitor and page view metrics at multiple budget levels under fixed targeting, scheduling, and frequency controls to form complete monotonic budget-performance curves.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper introduces and releases the R&F-Inventory dataset for monotonic inventory estimation in reach and frequency advertising. It supplies observations of unique visitors and page views at several budget points within the same targeting-scheduling-frequency control context, creating full budget-performance curves rather than isolated samples. The data incorporates time-window-based frequency limits such as no more than a set number of exposures within a period and exhibits monotonicity along with diminishing marginal returns. A derived theoretical maximum exposure ceiling serves as a check on data quality and model predictions, while two benchmark tasks are defined for single-point prediction and full curve reconstruction.

Core claim

The paper releases the R&F-Inventory dataset, which takes the R&F contract context (targeting, scheduling, and frequency control) as its basic unit and provides UV and PV observations at multiple budget points within the same context, thus forming a complete budget-performance curve. The dataset explicitly includes a time-window-based frequency control mechanism and naturally satisfies the monotonicity and diminishing marginal returns characteristics in the budget and scheduling dimensions. It further derives the theoretical maximum exposure ceiling and uses it as a consistency check to evaluate data quality and the feasibility of model predictions, while defining two standardized benchmark tasks: single-point performance prediction and reconstruction of budget-performance curves.

What carries the argument

The R&F contract context of targeting-scheduling-frequency control, which generates multiple budget points to produce a complete budget-performance curve of UV and PV observations.
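To make the "one context, many budget points" structure concrete, here is a minimal sketch of how flat observations could be grouped into curves. The field names (`context_id`, `budget`, `uv`, `pv`) and the sample values are hypothetical illustrations, not the released schema or real data.

```python
from collections import defaultdict
from typing import NamedTuple


class Observation(NamedTuple):
    # Hypothetical field names; the released files may use a different schema.
    context_id: str  # one targeting-scheduling-frequency-control context
    budget: float
    uv: float        # unique visitors observed at this budget
    pv: float        # page views observed at this budget


def build_curves(rows):
    """Group observations by context and sort each group by budget."""
    curves = defaultdict(list)
    for row in rows:
        curves[row.context_id].append(row)
    return {cid: sorted(points, key=lambda r: r.budget)
            for cid, points in curves.items()}


# Made-up values, for illustration only.
rows = [
    Observation("ctx_a", 2000.0, 180.0, 420.0),
    Observation("ctx_a", 1000.0, 100.0, 200.0),
    Observation("ctx_b", 500.0, 40.0, 60.0),
]
curves = build_curves(rows)
```

Each value in `curves` is then one budget-performance curve, ordered by budget, which is the unit the two benchmark tasks operate on.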

If this is right

The dataset supports systematic research on structural constraint learning for advertising models.
Monotonic regression techniques can be applied and tested for inventory estimation tasks.
Curve consistency modeling becomes feasible for R&F contract planning problems.
Reproducible baselines and evaluation protocols are available for single-point performance prediction and budget-performance curve reconstruction.
The theoretical exposure ceiling provides a built-in validation tool for new predictive models.
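As one illustration of the monotonic regression point, the sketch below fits a non-decreasing curve to noisy predictions with the classic pool-adjacent-violators procedure. It is a generic technique, not one of the paper's baselines, and the function name `isotonic_fit` is chosen here for illustration.

```python
def isotonic_fit(values):
    """Least-squares non-decreasing fit via pool-adjacent-violators.

    `values` must already be ordered by increasing budget.
    """
    blocks = []  # each block is [sum of values, number of values]
    for v in values:
        blocks.append([float(v), 1])
        # Merge while the previous block's mean exceeds the last block's mean.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted
```

For example, `isotonic_fit([1, 3, 2, 4])` pools the out-of-order pair and returns `[1.0, 2.5, 2.5, 4.0]`. This enforces monotonicity only; diminishing marginal returns would need an additional concavity constraint.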

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The curves could be used to train optimization algorithms that allocate ad budgets more efficiently while respecting frequency limits.
Similar data structures might apply to other resource allocation settings with diminishing returns, such as supply chain or network capacity planning.
Extensions could test whether models trained on these curves generalize across different advertising platforms or time periods.
The consistency checks may inspire theoretical bounds for monotonic functions in related machine learning applications.

Load-bearing premise

The collected data accurately reflects real-world R&F contract dynamics without platform-specific biases or sampling artifacts that would invalidate the monotonicity or the theoretical maximum exposure ceiling as a consistency check.

What would settle it

Observation of any budget-performance curve in the released dataset where unique visitors or page views decrease as budget increases, or where any value exceeds the derived theoretical maximum exposure ceiling.
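The test described above can be sketched for a single curve as follows. The function name, the tolerance `tol`, and the assumption that the ceiling is available as one number per curve are illustrative choices, not details taken from the paper.

```python
def find_violations(budgets, values, ceiling, tol=1e-9):
    """Check one UV or PV curve against the two falsification conditions.

    Returns (drops, over): `drops` lists (lower_budget, higher_budget) pairs
    where the value decreases as budget increases, and `over` lists budgets
    whose value exceeds `ceiling`.
    """
    order = sorted(range(len(budgets)), key=lambda i: budgets[i])
    b = [budgets[i] for i in order]
    v = [values[i] for i in order]
    drops = [(b[i], b[i + 1])
             for i in range(len(v) - 1) if v[i + 1] < v[i] - tol]
    over = [b[i] for i in range(len(v)) if v[i] > ceiling + tol]
    return drops, over
```

A curve that returns two empty lists passes both conditions; any non-empty result is a counterexample of the kind described above.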

Figures

Figures reproduced from arXiv: 2604.16821 by Jinan Pang, Ji Wu, Peng Jiang, Wentao Bai, Wenzheng Shu, Xialong Liu, Yanxiang Zeng, Yunke Bai, Yunshan Peng.

**Figure 2.** Schematic diagram of dataset splitting methods.

Original abstract

Reach and Frequency (R&F) contract advertising is an important and widely used form of brand advertising. Unlike performance advertising, R&F contracts emphasize controllable delivery of UV and PV under given targeting, scheduling, and frequency control constraints. In practical systems, advertisers typically need to view the UV and PV change curves at different budget levels in real time when creating an R&F contract. However, most existing publicly available advertising datasets are based on independent samples, lacking a characterization of the core structure of the "budget-performance curve" (including UV and PV) in R&F contracts. This paper proposes and releases a large-scale R&F contract inventory estimation dataset. This dataset uses the R&F contract context consisting of "targeting-scheduling-frequency control" as the basic context, providing observations of UV and PV corresponding to multiple budget points within the same context, thus forming a complete budget-performance curve. The dataset explicitly includes a time-window-based frequency control mechanism (e.g., "no more than 3 times within 5 days") and naturally satisfies the monotonicity and diminishing marginal returns characteristics in the budget and scheduling dimensions. We further derive the theoretical maximum exposure ceiling and use it as a consistency check to evaluate data quality and the feasibility of model predictions. Using this dataset, this paper defines two standardized benchmark tasks: single-point performance prediction and reconstruction of budget-performance curves, and provides a set of reproducible baseline methods and evaluation protocols. This dataset can support systematic research on problems such as structural constraint learning, monotonic regression, curve consistency modeling, and R&F contract planning. The code for our experiments can be found at https://github.com/pengyunshan/RF-Inventory.
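The abstract does not state the ceiling formula. As a rough illustration only, one simple upper bound that follows from a window cap can be written as below. The symbols are assumptions introduced here: k is the maximum exposures per user in any window of w days, D is the schedule length in days, and N is the size of the targeted audience. The paper's own derivation may differ.

```latex
% Split the D scheduled days into \lceil D/w \rceil consecutive blocks of at
% most w days. Each block fits inside one w-day window, so a user receives at
% most k exposures per block.
\[
  \mathrm{UV} \le N,
  \qquad
  \mathrm{PV} \;\le\; \mathrm{UV} \cdot k \left\lceil \frac{D}{w} \right\rceil
  \;\le\; N \, k \left\lceil \frac{D}{w} \right\rceil .
\]
% Worked example with the cap "no more than 3 times within 5 days" (k = 3, w = 5),
% a hypothetical 14-day schedule (D = 14), and a hypothetical audience N = 10^6:
\[
  \left\lceil \tfrac{14}{5} \right\rceil = 3,
  \qquad
  \mathrm{PV} \le 10^{6} \cdot 3 \cdot 3 = 9 \times 10^{6}.
\]
```

Under these assumptions, any observed or predicted PV above the right-hand side would be infeasible regardless of budget, which is the sense in which a ceiling can act as a consistency check.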

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This paper releases a dataset of multi-budget curves for R&F ad contracts that includes frequency controls and a simple consistency check.

The core of this paper is a dataset release that gives complete budget-performance curves instead of independent samples. For each fixed context of targeting, scheduling, and frequency control, it supplies UV and PV observations at several budget levels, plus an explicit time-window frequency cap. That structure matches what R&F systems actually need when advertisers want to see how performance changes with spend under the same constraints. The authors also derive a theoretical maximum exposure ceiling and use it to flag data quality issues, which is a straightforward and useful addition for validation. The paper defines two clean benchmark tasks (single-point prediction and full curve reconstruction) and ships code with baselines, so the resource is immediately usable. These elements are the real contribution: observed curves that respect monotonicity and diminishing returns by construction.

The soft spots are limited. Collection details and filtering rules are not fully expanded in the abstract, so it is not yet clear how much platform-specific sampling might affect the curves or the ceiling check. The baselines are basic and not heavily benchmarked against prior methods, which keeps the immediate methodological advance modest.

This work is for researchers in advertising systems and constrained learning who need real curve data to test monotonic regression or inventory planning models. A reader focused on operations research or ad tech would get direct value from the data and protocols. It deserves peer review because the dataset fills a documented gap with observable properties rather than fitted claims, and the community can evaluate the collection process once the full methods are available.

Referee Report

0 major / 2 minor

Summary. The manuscript introduces and releases R&F-Inventory, a large-scale dataset for reach and frequency (R&F) advertising. It structures observations around fixed 'targeting-scheduling-frequency control' contexts, supplying UV and PV values at multiple budget levels to form complete budget-performance curves. The dataset incorporates time-window-based frequency capping and is presented as naturally exhibiting monotonicity and diminishing marginal returns; a derived theoretical maximum exposure ceiling serves as a post-collection consistency check. Two standardized benchmark tasks are defined (single-point performance prediction and full curve reconstruction) together with reproducible baseline methods and evaluation protocols. Code is released at the provided GitHub repository.

Significance. If the collected curves accurately reflect real R&F contract dynamics, the dataset supplies a missing public resource for research on structural constraint learning, monotonic regression, and budget-aware planning in brand advertising. The multi-budget-per-context design directly supports curve-level modeling that independent-sample datasets cannot, and the theoretical ceiling provides an explicit, falsifiable consistency mechanism. Releasing data, code, and baseline implementations is a clear strength that lowers the barrier for follow-on work.

minor comments (2)

[Introduction / Data Description] The abstract and introduction assert that the data 'naturally satisfies' monotonicity and diminishing returns; a short quantitative verification (e.g., percentage of curves violating monotonicity or average second differences) in the data-description section would make this claim immediately verifiable without requiring readers to download and inspect the full release.
[Methods / Consistency Check] The derivation of the theoretical maximum exposure ceiling is referenced as a consistency check, but the manuscript would benefit from an explicit formula or short derivation (even if placed in an appendix) so that the check can be reproduced independently of the released data files.
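The quantitative verification requested in the first comment could look roughly like the sketch below. The input layout (a dict from context id to `(budget, value)` pairs) and the function name are hypothetical. Slope changes between consecutive segments stand in for second differences so that unevenly spaced budgets are handled.

```python
def curve_stats(curves, tol=1e-9):
    """Summarize monotonicity and diminishing returns over many curves.

    `curves` maps a context id to a list of (budget, value) pairs.
    Returns (share of curves with at least one decrease,
             mean change in slope between consecutive segments).
    """
    violating = 0
    slope_changes = []
    for points in curves.values():
        pts = sorted(points)
        # Slope of each segment between neighbouring budget points.
        slopes = [(y1 - y0) / (x1 - x0)
                  for (x0, y0), (x1, y1) in zip(pts, pts[1:]) if x1 > x0]
        if any(s < -tol for s in slopes):
            violating += 1
        slope_changes.extend(s1 - s0 for s0, s1 in zip(slopes, slopes[1:]))
    share = violating / len(curves) if curves else 0.0
    mean_change = (sum(slope_changes) / len(slope_changes)
                   if slope_changes else 0.0)
    return share, mean_change
```

A violation share near zero together with a negative mean slope change would be consistent with the claimed monotonicity and diminishing marginal returns.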

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive and constructive review of our manuscript on the R&F-Inventory dataset. We appreciate the recognition of the dataset's design around fixed targeting-scheduling-frequency contexts, the provision of complete multi-budget UV/PV curves, the built-in monotonicity and diminishing marginal returns, the theoretical exposure ceiling for quality control, and the standardized benchmarks with baselines. The recommendation for minor revision is noted, and we will incorporate any suggested improvements accordingly.

Circularity Check

0 steps flagged

No significant circularity in dataset release and validation

The paper releases observed data forming budget-performance curves under fixed targeting-scheduling-frequency contexts, with natural monotonicity and diminishing returns as empirical properties of the collection process. The theoretical maximum exposure ceiling is derived independently as a post-hoc consistency check for data quality, not as a fitted prediction or load-bearing derivation that reduces to the inputs. No self-citations, ansatzes, or uniqueness theorems are invoked to support central claims; benchmark tasks are defined directly on the released observations without circular reduction. The derivation chain is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claim rests on the existence and quality of the collected advertising data plus the correctness of the derived theoretical maximum exposure ceiling. No free parameters, ad-hoc axioms, or new invented entities are introduced in the abstract.
