arxiv: 2604.16821 · v1 · submitted 2026-04-18 · 💻 cs.LG

R&F-Inventory: A Large-Scale Dataset for Monotonic Inventory Estimation in Reach and Frequency Advertising

Pith reviewed 2026-05-10 06:46 UTC · model grok-4.3

classification 💻 cs.LG
keywords Reach and Frequency advertising, Inventory estimation, Monotonic regression, Budget-performance curve, Advertising dataset, Frequency control, UV and PV metrics, Monotonicity

The pith

The paper releases a dataset that records unique visitor and page view metrics at multiple budget levels under fixed targeting, scheduling, and frequency controls to form complete monotonic budget-performance curves.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper introduces and releases the R&F-Inventory dataset for monotonic inventory estimation in reach and frequency advertising. It supplies observations of unique visitors and page views at several budget points within the same targeting-scheduling-frequency control context, creating full budget-performance curves rather than isolated samples. The data incorporates time-window-based frequency limits such as no more than a set number of exposures within a period and exhibits monotonicity along with diminishing marginal returns. A derived theoretical maximum exposure ceiling serves as a check on data quality and model predictions, while two benchmark tasks are defined for single-point prediction and full curve reconstruction.
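
One way to make the monotonicity and diminishing-returns properties concrete is to compute per-segment marginal gains for each curve and count how many curves break either property. The Python sketch below assumes a hypothetical in-memory layout in which each curve is a pair of lists (budgets, values) sorted by strictly increasing budget. The function names are illustrative and are not taken from the released code.

```python
def marginal_gains(budgets, values):
    """Gain in the metric per unit of extra budget, for each adjacent pair of points."""
    points = list(zip(budgets, values))
    return [(v1 - v0) / (b1 - b0) for (b0, v0), (b1, v1) in zip(points, points[1:])]


def curve_property_rates(curves):
    """Share of curves that break monotonicity or diminishing marginal returns.

    curves: list of (budgets, values) pairs; budgets strictly increasing (assumed).
    """
    non_monotone = 0
    non_diminishing = 0
    for budgets, values in curves:
        gains = marginal_gains(budgets, values)
        # Monotonicity: the metric never drops when budget grows.
        if any(g < 0 for g in gains):
            non_monotone += 1
        # Diminishing returns: each marginal gain is no larger than the one before.
        if any(later > earlier for earlier, later in zip(gains, gains[1:])):
            non_diminishing += 1
    total = len(curves)
    return {
        "non_monotone": non_monotone / total,
        "non_diminishing": non_diminishing / total,
    }
```

The comparisons are exact, so a real check on noisy measurements would likely want a small tolerance.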

Core claim

The paper releases the R&F-Inventory dataset, which uses the R&F contract context consisting of targeting-scheduling-frequency control as the basic context, providing observations of UV and PV corresponding to multiple budget points within the same context, thus forming a complete budget-performance curve. The dataset explicitly includes a time-window-based frequency control mechanism and naturally satisfies the monotonicity and diminishing marginal returns characteristics in the budget and scheduling dimensions. It further derives the theoretical maximum exposure ceiling and uses it as a consistency check to evaluate data quality and the feasibility of model predictions, while defining two

What carries the argument

The R&F contract context of targeting-scheduling-frequency control, which generates multiple budget points to produce a complete budget-performance curve of UV and PV observations.
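
A minimal sketch of how one such context and its curve might be held in memory. The class and field names (`RFContext`, `BudgetCurve`, `targeting`, `schedule_days`, and so on) are hypothetical and do not reflect the schema of the released dataset.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RFContext:
    """Hypothetical fixed context shared by every point on one curve."""
    targeting: str          # audience segment identifier (assumed representation)
    schedule_days: int      # length of the delivery schedule in days
    freq_cap: int           # at most this many exposures per user ...
    freq_window_days: int   # ... within this many days


@dataclass
class BudgetCurve:
    """Observed (budget, UV, PV) points collected under a single context."""
    context: RFContext
    points: list = field(default_factory=list)

    def add(self, budget, uv, pv):
        # Keep points ordered by budget so the curve can be read left to right.
        self.points.append((budget, uv, pv))
        self.points.sort(key=lambda p: p[0])

    def budgets(self):
        return [p[0] for p in self.points]
```

Grouping points under a frozen context object mirrors the idea that budget is the only quantity varying along a curve.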

If this is right

  • The dataset supports systematic research on structural constraint learning for advertising models.
  • Monotonic regression techniques can be applied and tested for inventory estimation tasks.
  • Curve consistency modeling becomes feasible for R&F contract planning problems.
  • Reproducible baselines and evaluation protocols are available for single-point performance prediction and budget-performance curve reconstruction.
  • The theoretical exposure ceiling provides a built-in validation tool for new predictive models.
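
As an illustration of the monotonic regression point above, the sketch below implements the classic Pool Adjacent Violators procedure for an equal-weight, least-squares, non-decreasing fit. It is a generic textbook technique, not one of the paper's baselines, and it assumes the input values are already ordered by increasing budget.

```python
def isotonic_fit(values):
    """Least-squares non-decreasing fit via Pool Adjacent Violators (equal weights)."""
    blocks = []  # each block is [sum_of_values, count]
    for v in values:
        blocks.append([float(v), 1])
        # Merge backwards while the previous block's mean exceeds the newest block's mean.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        # Every point in a pooled block receives the block mean.
        fitted.extend([total / count] * count)
    return fitted
```

For example, a noisy UV sequence `[100, 180, 170, 240]` is smoothed by pooling the two out-of-order middle points to their mean.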

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The curves could be used to train optimization algorithms that allocate ad budgets more efficiently while respecting frequency limits.
  • Similar data structures might apply to other resource allocation settings with diminishing returns, such as supply chain or network capacity planning.
  • Extensions could test whether models trained on these curves generalize across different advertising platforms or time periods.
  • The consistency checks may inspire theoretical bounds for monotonic functions in related machine learning applications.

Load-bearing premise

The collected data accurately reflects real-world R&F contract dynamics without platform-specific biases or sampling artifacts that would invalidate the monotonicity or the theoretical maximum exposure ceiling as a consistency check.

What would settle it

Observation of any budget-performance curve in the released dataset where unique visitors or page views decrease as budget increases, or where any value exceeds the derived theoretical maximum exposure ceiling.
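
The falsification test described above can be sketched as a small scan over one curve. The function name and the tuple format of the reported issues are assumptions made for illustration; the ceiling is passed in as a plain number because the paper's formula is not reproduced on this page.

```python
def find_violations(budgets, values, ceiling=None):
    """Return the places where a curve decreases or exceeds a supplied ceiling."""
    # Visit the points in order of increasing budget.
    order = sorted(range(len(budgets)), key=lambda i: budgets[i])
    issues = []
    for prev, cur in zip(order, order[1:]):
        if values[cur] < values[prev]:
            issues.append(("decrease", budgets[prev], budgets[cur]))
    if ceiling is not None:
        for i in order:
            if values[i] > ceiling:
                issues.append(("above_ceiling", budgets[i], values[i]))
    return issues
```

An empty result for every curve, for both UV and PV, would be consistent with the dataset's stated properties; any non-empty result would be a counterexample.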

Figures

Figures reproduced from arXiv: 2604.16821 by Jinan Pang, Ji Wu, Peng Jiang, Wentao Bai, Wenzheng Shu, Xialong Liu, Yanxiang Zeng, Yunke Bai, Yunshan Peng.

Figure 1: Simulation diagram of a single user inserting a CPM
Figure 2: Schematic diagram of dataset splitting methods

Original abstract

Reach and Frequency (R&F) contract advertising is an important and widely used form of brand advertising. Unlike performance advertising, R&F contracts emphasize controllable delivery of UV and PV under given targeting, scheduling, and frequency control constraints. In practical systems, advertisers typically need to view the UV and PV change curves at different budget levels in real time when creating an R&F contract. However, most existing publicly available advertising datasets are based on independent samples, lacking a characterization of the core structure of the "budget-performance curve" (including UV and PV) in R&F contracts. This paper proposes and releases a large-scale R&F contract inventory estimation dataset. This dataset uses the R&F contract context consisting of "targeting-scheduling-frequency control" as the basic context, providing observations of UV and PV corresponding to multiple budget points within the same context, thus forming a complete budget-performance curve. The dataset explicitly includes a time-window-based frequency control mechanism (e.g., "no more than 3 times within 5 days") and naturally satisfies the monotonicity and diminishing marginal returns characteristics in the budget and scheduling dimensions. We further derive the theoretical maximum exposure ceiling and use it as a consistency check to evaluate data quality and the feasibility of model predictions. Using this dataset, this paper defines two standardized benchmark tasks: single-point performance prediction and reconstruction of budget-performance curves, and provides a set of reproducible baseline methods and evaluation protocols. This dataset can support systematic research on problems such as structural constraint learning, monotonic regression, curve consistency modeling, and R&F contract planning. The code for our experiments can be found at https://github.com/pengyunshan/RF-Inventory.
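
The frequency rule quoted in the abstract ("no more than 3 times within 5 days") can be read as a rolling-window cap on a single user. The sketch below is one possible interpretation, not the paper's delivery simulator. It assumes integer day indices fed in non-decreasing order, and a window that covers the current day plus the preceding `window_days - 1` days. The class name `FrequencyCap` is hypothetical.

```python
from collections import deque


class FrequencyCap:
    """Rolling-window cap: at most `max_exposures` within any `window_days` days."""

    def __init__(self, max_exposures, window_days):
        self.max_exposures = max_exposures
        self.window_days = window_days
        self._days = deque()  # days of exposures still inside the current window

    def try_expose(self, day):
        """Record an exposure on `day` if the cap allows it; return whether it was shown."""
        # Forget exposures that have slid out of the window ending on `day`.
        while self._days and self._days[0] <= day - self.window_days:
            self._days.popleft()
        if len(self._days) < self.max_exposures:
            self._days.append(day)
            return True
        return False
```

With a cap of 3 in 5 days, three exposures on days 0, 0, and 1 block further delivery until day 5, when the two day-0 exposures leave the window.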

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The manuscript introduces and releases R&F-Inventory, a large-scale dataset for reach and frequency (R&F) advertising. It structures observations around fixed 'targeting-scheduling-frequency control' contexts, supplying UV and PV values at multiple budget levels to form complete budget-performance curves. The dataset incorporates time-window-based frequency capping and is presented as naturally exhibiting monotonicity and diminishing marginal returns; a derived theoretical maximum exposure ceiling serves as a post-collection consistency check. Two standardized benchmark tasks are defined (single-point performance prediction and full curve reconstruction) together with reproducible baseline methods and evaluation protocols. Code is released at the provided GitHub repository.

Significance. If the collected curves accurately reflect real R&F contract dynamics, the dataset supplies a missing public resource for research on structural constraint learning, monotonic regression, and budget-aware planning in brand advertising. The multi-budget-per-context design directly supports curve-level modeling that independent-sample datasets cannot, and the theoretical ceiling provides an explicit, falsifiable consistency mechanism. Releasing data, code, and baseline implementations is a clear strength that lowers the barrier for follow-on work.

minor comments (2)
  1. [Introduction / Data Description] The abstract and introduction assert that the data 'naturally satisfies' monotonicity and diminishing returns; a short quantitative verification (e.g., percentage of curves violating monotonicity or average second differences) in the data-description section would make this claim immediately verifiable without requiring readers to download and inspect the full release.
  2. [Methods / Consistency Check] The derivation of the theoretical maximum exposure ceiling is referenced as a consistency check, but the manuscript would benefit from an explicit formula or short derivation (even if placed in an appendix) so that the check can be reproduced independently of the released data files.
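
To illustrate the kind of explicit formula the second comment asks for, here is a simple upper bound that follows from the frequency cap alone. It is an assumption for illustration and is not the paper's derivation, which is not reproduced on this page. If a schedule of D days is split into ceil(D / w) blocks of at most w days, a cap of k exposures per w-day window allows at most k exposures per block, so N targeted users can receive at most N * k * ceil(D / w) exposures. The parameter names are hypothetical.

```python
import math


def exposure_ceiling(audience_size, schedule_days, freq_cap, freq_window_days):
    """Upper bound on total exposures (PV) implied by a windowed frequency cap.

    Illustrative bound only: audience_size * freq_cap * ceil(schedule_days / freq_window_days).
    """
    # Number of blocks of at most `freq_window_days` days needed to cover the schedule.
    blocks = math.ceil(schedule_days / freq_window_days)
    # Each user can receive at most `freq_cap` exposures inside any single block.
    per_user_max = freq_cap * blocks
    return audience_size * per_user_max
```

Under the same assumptions, UV is bounded by the audience size itself, so a predicted PV above this value or a predicted UV above `audience_size` would be infeasible.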

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive and constructive review of our manuscript on the R&F-Inventory dataset. We appreciate the recognition of the dataset's design around fixed targeting-scheduling-frequency contexts, the provision of complete multi-budget UV/PV curves, the built-in monotonicity and diminishing marginal returns, the theoretical exposure ceiling for quality control, and the standardized benchmarks with baselines. The recommendation for minor revision is noted, and we will incorporate any suggested improvements accordingly.

Circularity Check

0 steps flagged

No significant circularity in dataset release and validation

full rationale

The paper releases observed data forming budget-performance curves under fixed targeting-scheduling-frequency contexts, with natural monotonicity and diminishing returns as empirical properties of the collection process. The theoretical maximum exposure ceiling is derived independently as a post-hoc consistency check for data quality, not as a fitted prediction or load-bearing derivation that reduces to the inputs. No self-citations, ansatzes, or uniqueness theorems are invoked to support central claims; benchmark tasks are defined directly on the released observations without circular reduction. The derivation chain is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claim rests on the existence and quality of the collected advertising data plus the correctness of the derived theoretical maximum exposure ceiling. No free parameters, ad-hoc axioms, or new invented entities are introduced in the abstract.


