arxiv: 2605.21736 · v1 · pith:QKLZHCKLnew · submitted 2026-05-20 · stat.ML · cs.AI · cs.LG

Support-aware offline policy selection for advertising marketplaces

Pith reviewed 2026-05-22 08:33 UTC · model grok-4.3

classification stat.ML · cs.AI · cs.LG
keywords offline policy selection · advertising marketplaces · reserve price · support estimation · off-policy evaluation · regret certification · replay evaluation

The pith

A support-aware framework turns logged auction data into certified reserve-policy decisions rather than point estimates.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Logged advertising auctions let marketplaces evaluate reserve-price policies offline, yet standard replay often overstates gains when data support is thin or uncertainty is ignored. This paper builds a decision framework that processes the logs into three groups: policies that pass conservative support and uncertainty tests, alternatives that are statistically dominated, and candidates still needing live checks. The central guarantee keeps the strongest policy that clears the gates while removing only those shown to carry certified regret. A sympathetic reader cares because the method replaces risky single-winner rankings with an operational shortlist that limits exposure across many advertiser segments.

Core claim

The main theoretical result gives a unified finite-catalog guarantee showing that, under simultaneous uncertainty control and conservative support gates, the framework preserves the best gate-passing policy while eliminating only policies with certified regret.
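
The claim above can be made concrete with a standard elimination argument. The block below is only a sketch under assumed notation, not the paper's theorem: the band $[L_\pi, U_\pi]$, the gate-passing set $\mathcal{G}$, and the elimination rule are all assumptions introduced here for illustration.

```latex
% Assumed notation (not taken from the paper): \Delta_\pi is the true lift of
% policy \pi over baseline, [L_\pi, U_\pi] is a simultaneous confidence band,
% and \mathcal{G} \subseteq \mathcal{P} is the set of policies passing the support gate.
\begin{align*}
&\text{Event } E:\quad L_\pi \le \Delta_\pi \le U_\pi \ \text{ for all } \pi \in \mathcal{P},
  \qquad \Pr(E) \ge 1-\alpha. \\
&\text{Rule: eliminate } \pi \text{ when } U_\pi < L_{\hat\pi},
  \quad \hat\pi := \arg\max_{\pi' \in \mathcal{G}} L_{\pi'}. \\
&\text{On } E,\ \pi^\star := \arg\max_{\pi \in \mathcal{G}} \Delta_\pi \text{ satisfies }
  U_{\pi^\star} \ge \Delta_{\pi^\star} \ge \Delta_{\hat\pi} \ge L_{\hat\pi},
  \text{ so } \pi^\star \text{ is kept.} \\
&\text{On } E,\ \text{any eliminated } \pi \text{ has }
  \Delta_\pi \le U_\pi < L_{\hat\pi} \le \Delta_{\hat\pi} \le \Delta_{\pi^\star},
  \text{ so its regret } \Delta_{\pi^\star} - \Delta_\pi \text{ is strictly positive.}
\end{align*}
```

Under these assumptions both halves of the claim hold on the same event $E$, which is why simultaneous (rather than per-policy) uncertainty control matters.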

What carries the argument

The support-aware offline decision framework that converts logged evidence into a conservative decision object of certified policies, statistically dominated alternatives, and unresolved candidates.
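
As a rough illustration of such a decision object, the sketch below splits a catalog into the three groups. It is a minimal sketch under assumptions: the `PolicyEvidence` fields, the `min_support` and `tolerance` parameters, and the exact certification rule (gate passed and lower bound above zero) are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyEvidence:
    name: str
    lower: float    # simultaneous lower bound on lift vs. baseline (assumed given)
    upper: float    # simultaneous upper bound on lift vs. baseline (assumed given)
    support: float  # estimated boundary-support share in [0, 1] (assumed given)


def classify(catalog, min_support=0.05, tolerance=0.0):
    """Split a finite catalog into certified / dominated / unresolved names."""
    # Only gate-passing policies may serve as the reference for elimination.
    gated = [p for p in catalog if p.support >= min_support]
    best_lower = max((p.lower for p in gated), default=float("-inf"))

    out = {"certified": [], "dominated": [], "unresolved": []}
    for p in catalog:
        if p.upper < best_lower - tolerance:
            # Even the optimistic bound is below the best conservative bound.
            out["dominated"].append(p.name)
        elif p.support >= min_support and p.lower > 0:
            # Passes the support gate and shows a positive conservative lift.
            out["certified"].append(p.name)
        else:
            # Thin support or an interval that still straddles zero.
            out["unresolved"].append(p.name)
    return out


# Hypothetical numbers, chosen only to exercise each branch.
example = [
    PolicyEvidence("A", lower=0.40, upper=0.55, support=0.20),
    PolicyEvidence("B", lower=0.10, upper=0.30, support=0.30),
    PolicyEvidence("C", lower=0.05, upper=0.60, support=0.01),
    PolicyEvidence("D", lower=-0.02, upper=0.45, support=0.25),
]
result = classify(example)
print(result)
```

The design choice worth noting is that a thinly supported policy (C) is never used to eliminate others and is never eliminated on weak evidence; it simply stays unresolved.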

If this is right

  • A 19-policy catalog shrinks to a two-policy validation shortlist.
  • Non-harm is certified across 44 advertiser, exchange, and region segments.
  • The leading reserve rule shows 47.66 percent replay lift together with a 40.71 percent simultaneous lower bound.
  • Information-theoretic limits on threshold resolution are characterized for the catalog.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same support-gate logic could be tested in other logged-policy domains such as recommendation or pricing where coverage varies by context.
  • Experiments that deliberately increase bidder heterogeneity beyond the levels studied here would check whether the supporting results on response uncertainty continue to hold.
  • The emphasis on producing a shortlist of unresolved candidates points toward hybrid systems that combine this offline filter with targeted online A/B tests.

Load-bearing premise

The logged auction data supplies representative samples that permit accurate support estimation and uncertainty control, with bidder-response heterogeneity not overturning localized replay rankings.
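
To make "support estimation" concrete, one simple diagnostic counts how many logged auctions fall near a policy's reserve threshold. The sketch below assumes support is measured as the number of logged scores within a window of half-width `half_width` around the threshold; the function name, the integer price units, and the data are hypothetical, and the paper's own definition of boundary sample size may differ.

```python
def boundary_support(scores, threshold, half_width):
    """Count logged scores within `half_width` of a reserve threshold.

    Returns (n_boundary, share), where share is n_boundary / len(scores).
    """
    n_total = len(scores)
    n_boundary = sum(1 for g in scores if abs(g - threshold) <= half_width)
    share = n_boundary / n_total if n_total else 0.0
    return n_boundary, share


# Hypothetical logged bid-like scores, in integer price units.
logged_scores = [80, 100, 190, 200, 210, 350, 500]

narrow = boundary_support(logged_scores, threshold=200, half_width=15)
wide = boundary_support(logged_scores, threshold=200, half_width=100)
print(narrow, wide)
```

A log can be large overall while the narrow window holds only a handful of observations, which is the situation a conservative support gate is meant to catch.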

What would settle it

Fresh auction logs in which a policy the framework eliminated yields higher revenue than the certified set would falsify the guarantee.
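
A check of this kind could be sketched as follows. The function name, the `margin` parameter (a crude allowance for sampling noise), and the revenue figures are hypothetical; a real test would also need uncertainty intervals on the fresh-log estimates.

```python
def falsifying_policies(fresh_revenue, certified, eliminated, margin=0.0):
    """Return eliminated policies that beat every certified policy on fresh logs."""
    best_certified = max(fresh_revenue[p] for p in certified)
    return [p for p in eliminated if fresh_revenue[p] > best_certified + margin]


# Hypothetical fresh-log revenue per policy (arbitrary units).
fresh = {"A": 1.40, "B": 1.10, "C": 1.55}

violations = falsifying_policies(fresh, certified=["A"], eliminated=["B", "C"])
print(violations)
```

An empty list is consistent with the guarantee on that sample; a non-empty list flags eliminated policies that warrant a closer look.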

Figures

Figures reproduced from arXiv: 2605.21736 by Caroline Howard, Prashant Shekhar.

Figure 1: Offline reserve-policy selection from logged advertising auctions. Logged marketplace data and a finite reserve-policy catalog make offline replay evaluation possible, but naive replay rankings can be misleading because apparent gains may hide weak threshold support, multiple-comparison effects, subgroup harm, or bidder-response uncertainty. The figure illustrates the central decision problem of determinin…
Figure 2: Conservative shortlist construction on season two. Panel (a) shows the replay frontier for non-baseline reserve policies, with replay yield lift plotted against retained impression share. Panel (b) shows simultaneous lower-bound ranking. Colored points indicate whether the decision rule certifies the policy, eliminates it as dominated, or leaves it unresolved. Black points show support-adjusted lower bound…
Figure 3: Support-localized threshold resolution. Panel (a) reports effective boundary sample size n_boundary(h) as the diagnostic boundary window expands. Panel (b) reports support-adjusted lower-bound lift for the leading policies. The same season-two panel can be statistically large overall while remaining locally thin near narrow reserve-threshold bands.
… from 0 to 0.10, and Appendix B.1 shows that bootstrap repla…
Figure 4: Validation readiness through transfer and subgroup safety. Panel (a) compares season-two and season-three replay lifts under the frozen catalog. Panel (b) reports mean segment-level replay lifts for the covered segments with the smallest lower endpoints, with 95% normal confidence bars computed from daily segment-level replay variation.
… maximum support-adjusted lower bound rises from 9.17% at h = 1 to 47.6…
Figure 5: Additional replay diagnostics. Panel (a) shows daily replay-lift dispersion for the leading policies. Panel (b) shows how the Bonferroni critical value and the leader's simultaneous lower bound change as the policy catalog grows.
Figure 6: Pairwise boundary-support diagnostics. Panel (a) reports the empirical distribution of pairwise boundary-support shares across policy pairs. Panel (b) relates mean candidate-floor distance to absolute replay-lift gaps, with marker size proportional to boundary support.
Figure 7: q-localized replay selection. Panel (a) reports localized boundary lift over localization levels q. Panel (b) reports day-bootstrap winner frequencies. P11 is the stable q-localized boundary-lift leader, while P18 remains the aggregate replay leader.
The monotone pattern from the main text remains. Boundary support grows as the diagnostic window widens, and support-adjusted lower bounds become less conserv…
Figure 8: Out-of-time transfer for q-localized selections. Panel (a) compares season-two localized boundary lift with season-three aggregate replay lift for policies selected by the q-localized rule. Panel (b) reports season-three aggregate replay lift across localization levels. The localized winner P11 transfers positively but does not exceed P18's aggregate season-three replay performance.
Figure 9: Shortlist and decision-rule robustness. Panel (a) reports retained shortlist size as the elimination tolerance varies. Panel (b) compares simpler decision rules with the support-aware elimination shortlist.
… not eliminated because it is strongly supported locally. P18 remains the preferred validation candidate because it dominates on aggregate replay, conservative lower-bound ranking, and out-of-time transf…
Original abstract

Logged advertising auctions make offline reserve-price evaluation attractive but risky. Replay tables can identify policies with large apparent yield gains, yet they can also hide weak threshold support, multiple-comparison effects, subgroup harm, and bidder-response uncertainty. Existing replay and off-policy evaluation methods estimate or rank policy values, but they do not directly answer the operational question of whether the available evidence is strong enough to justify validation. This paper develops a support-aware offline decision framework for reserve-policy selection. Rather than outputting a single point-estimate winner, the framework converts logged evidence into a conservative decision object consisting of certified policies, statistically dominated alternatives, and unresolved candidates requiring further validation. The main theoretical result gives a unified finite-catalog guarantee showing that, under simultaneous uncertainty control and conservative support gates, the framework preserves the best gate-passing policy while eliminating only policies with certified regret. Supporting results characterize support-localized replay generalization, establish information-theoretic threshold-resolution limits, and quantify when heterogeneous bidder response can overturn localized replay rankings. Experiments on iPinYou real-time-bidding logs show that the leading reserve rule achieves a 47.66% replay lift in season two, a 40.71% simultaneous lower-bound lift, and a 43.87% frozen out-of-time replay lift in season three. The framework reduces a 19-policy catalog to a two-policy validation shortlist while certifying non-harm across 44 advertiser, exchange, and region segments. The results support the central claim that offline reserve-policy evaluation should produce certified validation decisions rather than point-estimate rankings alone.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper develops a support-aware offline decision framework for reserve-price policy selection in advertising marketplaces. Instead of point-estimate rankings, it produces a conservative decision object of certified policies, statistically dominated alternatives, and unresolved candidates. The central theoretical result is a unified finite-catalog guarantee that, under simultaneous uncertainty control and conservative support gates, preserves the best gate-passing policy while eliminating only policies with certified regret. Supporting results address support-localized replay generalization, information-theoretic limits, and heterogeneous bidder response. Experiments on iPinYou RTB logs report a 47.66% replay lift, 40.71% lower-bound lift, and 43.87% out-of-time lift for the leading rule, reducing a 19-policy catalog to a two-policy shortlist while certifying non-harm across 44 segments.

Significance. If the theoretical guarantee holds and the support-estimation assumptions are satisfied, the framework offers a meaningful shift from standard OPE rankings toward certifiable, risk-aware decisions suitable for operational deployment. The empirical demonstration of catalog reduction and segment-level non-harm certification on real auction logs is practically relevant. The work also provides characterizations of localized generalization and bidder-response heterogeneity that could inform future OPE methods. These elements, if substantiated, strengthen the case for conservative offline selection in marketplaces.

major comments (2)
  1. [Main theoretical result] The unified finite-catalog guarantee is stated to preserve the best gate-passing policy while eliminating only certified-regret policies, under simultaneous uncertainty control and conservative support gates. However, the paper does not show that the construction of these gates, or their robustness to non-stationarity and cross-segment dependence in logged auctions, is independent of the target guarantee; if support estimation fails to certify true coverage, the framework can either drop the optimal policy or retain weak alternatives. A concrete counter-example or an additional robustness theorem under temporal shifts would be required to substantiate the claim.
  2. [Supporting results on heterogeneous bidder response] The quantification of overturning risk for localized replay rankings is performed under specific modeling assumptions and iPinYou season splits. The paper does not demonstrate that these results extend to unmodeled temporal shifts or to bidder heterogeneity patterns outside the observed splits; such patterns could overturn the support-gate decisions and thereby invalidate the catalog reduction and non-harm certification reported in the experiments.
minor comments (2)
  1. [Abstract and introduction] The abstract and introduction introduce several new decision objects (certified policies, statistically dominated alternatives) without an early table or diagram that maps them to standard OPE quantities; adding such a mapping would improve readability.
  2. [Experiments] The reported lifts (47.66% replay, 40.71% lower-bound, 43.87% out-of-time) are given without explicit confidence intervals or details on how the simultaneous lower bound is computed; including these would strengthen the empirical section.
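
On the computation raised in minor comment 2: the Figure 5 caption mentions a Bonferroni critical value, so one plausible construction is a one-sided normal lower bound with the error level split across the catalog. The sketch below assumes that construction, with day-level replay lifts as the sampling unit; the function name and the data are hypothetical, and the paper's actual procedure may differ.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def bonferroni_lower_bounds(daily_lifts, alpha=0.05):
    """One-sided simultaneous lower bounds on mean lift for a finite catalog.

    daily_lifts maps policy name -> list of per-day replay lifts (>= 2 days).
    Returns (critical value, {policy: lower bound}).
    """
    k = len(daily_lifts)
    # Split alpha across k policies so all bounds hold jointly (union bound).
    z = NormalDist().inv_cdf(1 - alpha / k)
    bounds = {}
    for name, lifts in daily_lifts.items():
        standard_error = stdev(lifts) / sqrt(len(lifts))
        bounds[name] = mean(lifts) - z * standard_error
    return z, bounds


# Hypothetical per-day lifts for two policies.
example_lifts = {
    "A": [0.40, 0.50, 0.45, 0.48, 0.52],
    "B": [0.10, 0.30, 0.05, 0.25, 0.20],
}

z_one, _ = bonferroni_lower_bounds({"A": example_lifts["A"]})
z_two, lower = bonferroni_lower_bounds(example_lifts)
print(z_one, z_two, lower)
```

The critical value grows as policies are added, so the same point estimate yields a more conservative lower bound in a larger catalog.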

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments and for recognizing the potential of our support-aware framework to shift offline policy selection toward certifiable decisions in advertising marketplaces. We address each major comment below with clarifications on the assumptions underlying our results and commit to targeted revisions that strengthen the discussion of robustness without altering the core claims.

Point-by-point responses
  1. Referee: [Main theoretical result] The unified finite-catalog guarantee is stated to preserve the best gate-passing policy while eliminating only certified-regret policies, under simultaneous uncertainty control and conservative support gates. However, the paper does not show that the construction of these gates, or their robustness to non-stationarity and cross-segment dependence in logged auctions, is independent of the target guarantee; if support estimation fails to certify true coverage, the framework can either drop the optimal policy or retain weak alternatives. A concrete counter-example or an additional robustness theorem under temporal shifts would be required to substantiate the claim.

    Authors: The unified finite-catalog guarantee is derived under the explicit joint conditions of uncertainty control and conservative support gates, as formalized in the main theorem; it does not claim independence from support-estimation quality. The gates are deliberately conservative, requiring empirical coverage thresholds that favor retaining unresolved candidates over risking the elimination of the best-supported policy. Supporting results on support-localized replay generalization already incorporate coverage estimation error. The iPinYou experiments include out-of-time replay on season-three data as an empirical check against temporal effects. We agree that a dedicated robustness subsection would improve clarity. In revision we will add a discussion of sensitivity to non-stationarity and cross-segment dependence, together with a brief illustrative example showing gate behavior under mild shifts, while preserving the conditional nature of the guarantee.

    Revision: partial

  2. Referee: [Supporting results on heterogeneous bidder response] The quantification of overturning risk for localized replay rankings is performed under specific modeling assumptions and iPinYou season splits. The paper does not demonstrate that these results extend to unmodeled temporal shifts or to bidder heterogeneity patterns outside the observed splits; such patterns could overturn the support-gate decisions and thereby invalidate the catalog reduction and non-harm certification reported in the experiments.

    Authors: The characterizations of overturning risk and bidder-response heterogeneity are explicitly tied to the modeling assumptions and the observed season splits in the iPinYou logs; we do not assert universal extension. The conservative support gates and the 44-segment non-harm certification are computed directly on the available data, and the reported catalog reduction (19 to 2 policies) and lifts are likewise dataset-specific. The framework outputs unresolved candidates precisely when heterogeneity may threaten gate decisions. In revision we will expand the relevant section to state the scope of these supporting results more explicitly and to note that further validation would be required for environments exhibiting substantially different heterogeneity patterns.

    Revision: yes

Circularity Check

0 steps flagged

No significant circularity; theoretical guarantee is self-contained under explicitly defined controls

full rationale

The paper's central claim is a unified finite-catalog guarantee that, under simultaneous uncertainty control and conservative support gates, preserves the best gate-passing policy while eliminating only policies with certified regret. This is presented as a derived result from the framework's construction rather than a reduction to fitted parameters or self-cited premises by definition. The abstract and supporting results on support-localized replay and heterogeneous bidder response characterize the conditions without evidence of self-definitional loops, fitted inputs renamed as predictions, or load-bearing self-citations that collapse the argument. The derivation remains independent, with the decision object built from logged evidence and explicit gates, qualifying as self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 2 invented entities

The framework rests on domain assumptions about data representativeness and bidder response, with new decision constructs introduced; specific free parameters such as support thresholds are implied but not quantified in the abstract.

free parameters (1)
  • support thresholds and uncertainty control parameters
    Conservative support gates and simultaneous uncertainty controls are central to the guarantee, but their specific values or selection method are not detailed.
axioms (2)
  • domain assumption: Logged auction data is representative for estimating policy support and performance under the gates
    Implicit in the use of replay tables and the finite-catalog guarantee.
  • domain assumption: Heterogeneous bidder responses can be quantified without overturning localized rankings
    Referenced in the supporting result on when bidder response can overturn rankings.
invented entities (2)
  • certified policies (no independent evidence)
    purpose: Policies that pass support and uncertainty checks for safe validation
    New decision category introduced by the framework.
  • statistically dominated alternatives (no independent evidence)
    purpose: Policies eliminated as provably inferior under the guarantee
    New decision category introduced by the framework.

pith-pipeline@v0.9.0 · 5808 in / 1648 out tokens · 45838 ms · 2026-05-22T08:33:21.614778+00:00 · methodology

A Proofs

A.1 Proof of Theorem 4.1

Proof. For each $\pi \in \mathcal{P}$, define the centered replay difference $Z_i^{\pi} := Y_i^{\pi} - Y_i^{0}$, with $\mu_{Z,\pi} := \mathbb{E}[Z_i^{\pi}]$. Then $\Delta_\pi = \mu_{Z,\pi} / \mu_0$. Fix $\pi$ and $q \in (0,1)$. Let $A_{\pi,q} := \{ |G_i - \tau_\pi| \le r_\pi(q) \}$ and $m_{\pi,q} = \mathbb{P}(A_{\pi,q})$. By assumption, the replay difference admits the decomposition Z ...