
arxiv: 2605.21752 · v1 · pith:JP7VIBZCnew · submitted 2026-05-20 · 💻 cs.LG · cs.AI

PEARL: Unbiased Percentile Estimation via Contrastive Learning for Industrial-Scale Livestream Recommendation

Pith reviewed 2026-05-22 09:38 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords: recommender systems · contrastive learning · percentile estimation · debiasing · livestream recommendation · behavioral bias · unbiased estimation · industrial scale

The pith

Pairwise comparisons from observed interactions yield unbiased estimates of preference percentiles in recommender systems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Recommender systems trained on interaction data suffer from behavioral intensity imbalance, where signals from highly active users dominate and distort true preferences. PEARL counters this by modeling relative preference signals through a nonparametric contrastive framework that approximates percentile relationships directly from real pairwise samples. Theoretical justification shows these comparisons produce unbiased percentile-based estimates without auxiliary distribution models. The approach adds bootstrapping for sparse feedback, value-weighted formulations, and co-training, with offline gains and production improvements in watch duration, consumption, and interaction rates.

Core claim

PEARL is a nonparametric contrastive percentile approximation: it uses real contrastive interaction samples to model relative preferences rather than absolute engagement magnitudes. The paper offers a theoretical justification that these pairwise comparisons give unbiased estimates of percentile-based preference signals, and it adds mechanisms for smoothing sparse, discrete feedback and for generalized value weighting.

What carries the argument

The nonparametric contrastive percentile approximation framework that uses real pairwise interaction samples to estimate relative preference percentiles directly from observed data.
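
A minimal sketch of the idea, not the paper's implementation: if reference samples are drawn from the same user's own interaction history, the fraction of pairwise comparisons an interaction wins is that interaction's empirical percentile for that user. The function name `pairwise_percentile`, the tie rule (half a win), and the watch-time values are all illustrative assumptions.

```python
def pairwise_percentile(value, reference_values):
    """Fraction of reference values that `value` beats; a tie counts as half a win."""
    if not reference_values:
        raise ValueError("need at least one reference value")
    wins = sum(
        1.0 if value > ref else 0.5 if value == ref else 0.0
        for ref in reference_values
    )
    return wins / len(reference_values)


# Hypothetical watch times in seconds for a light user and a heavy user.
light_user = [2, 3, 5, 8, 12]
heavy_user = [60, 90, 150, 240, 360]

# The raw magnitudes differ by a factor of 30, but both interactions sit at
# the same relative position within their own user's history.
p_light = pairwise_percentile(8, light_user)
p_heavy = pairwise_percentile(240, heavy_user)
print(p_light, p_heavy)
```

In this toy data both interactions receive the same relative score even though the heavy user's watch time is far larger, which is the sense in which a relative signal sidesteps intensity imbalance.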

If this is right

  • Mitigates behavioral intensity imbalance so observed interactions better reflect true user preferences across activity levels.
  • Improves recommendation performance consistently across multiple ranking targets in offline experiments.
  • Produces measurable production gains including higher watch duration, consumption amount, and interaction rate alongside lower report rate.
  • Extends to sparse feedback via prediction-based bootstrapping and supports flexible modeling through value-weighted and co-training extensions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The method could apply to other recommendation settings with heterogeneous user activity, such as e-commerce or content platforms, by replacing absolute counts with relative rankings.
  • Better representation of low-activity users might increase long-term retention by reducing over-optimization for power users.
  • The co-training component may generalize to improve embedding quality in other contrastive recommendation architectures.

Load-bearing premise

Real contrastive interaction samples from observed data can directly approximate underlying percentile relationships without auxiliary distribution models or extra assumptions on user behavior.

What would settle it

A simulation or controlled experiment with known skewed engagement distributions where PEARL's recovered percentile estimates are compared against ground-truth percentiles to check for residual bias.
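
A sketch of what such a check could look like for an idealized pairwise estimator, not for PEARL itself. It assumes each user's engagement is lognormal (so the true percentile is known in closed form) and that reference samples are independent draws from the same user's distribution; the parameters, sample counts, and function names are hypothetical.

```python
import math
import random


def lognormal_cdf(x, mu, sigma):
    """Ground-truth percentile of x under a lognormal(mu, sigma) distribution."""
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))


def estimate_percentile(x, mu, sigma, n_refs, rng):
    """Pairwise estimate: share of sampled reference values that x exceeds."""
    refs = [rng.lognormvariate(mu, sigma) for _ in range(n_refs)]
    return sum(ref < x for ref in refs) / n_refs


def mean_bias(mu, sigma, n_refs=8, n_trials=20000, seed=0):
    """Average of (estimate - ground truth) over many simulated interactions."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        x = rng.lognormvariate(mu, sigma)
        total += estimate_percentile(x, mu, sigma, n_refs, rng) - lognormal_cdf(x, mu, sigma)
    return total / n_trials


# Two hypothetical users with very different engagement scales and skew.
bias_light = mean_bias(mu=1.0, sigma=1.0)
bias_heavy = mean_bias(mu=5.0, sigma=1.5)
print(f"light user mean bias: {bias_light:+.4f}")
print(f"heavy user mean bias: {bias_heavy:+.4f}")
```

Under these idealized assumptions the mean bias should be close to zero for both users. A real test of the paper's claim would replace the independent reference draws with samples taken from logged, policy-exposed interactions.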

Figures

Figures reproduced from arXiv: 2605.21752 by Blake Gella, Emily Liu, Junlin Zhang, Qinglei Wang, Wei Wu, Wentao Guo, YuHao Yin, Zexi Huang, Zikai Wang.

Figure 1: The overall architecture of PEARL, which consists of two core components: A user-keyed reservoir pool (…
Figure 2: Notably, the raw watch time target suffers from strong …

Original abstract

Recommender systems trained on user interaction data are susceptible to behavioral intensity imbalance--a systematic distortion arising from heterogeneous engagement patterns across users. This imbalance skews feedback signals such that observed interactions no longer faithfully reflect true preferences, causing models to disproportionately amplify signals from highly active users while underrepresenting others, which ultimately degrades recommendation quality and robustness at scale. To address this issue, we propose a nonparametric contrastive percentile approximation framework, PEARL, that models relative preference signals instead of absolute engagement magnitudes. Building upon relative advantage debiasing, PEARL leverages real contrastive interaction samples to approximate percentile relationships directly, without relying on auxiliary distribution estimation models. We provide theoretical justification demonstrating that such pairwise comparisons yield unbiased estimates of percentile-based preference signals. For broader applicability, we introduce a prediction-based bootstrapping mechanism for percentile smoothing to handle sparse and discrete feedback, alongside a generalized value-weighted formulation and a co-training strategy to enhance both modeling flexibility and representation learning. Extensive offline experiments demonstrate that PEARL effectively mitigates behavioral bias and consistently improves recommendation performance across multiple ranking targets. Deployed in a production livestream platform with a combined user base of billions, online A/B testing confirms substantial real-world gains: +2.10% Watch Duration, +0.80% Consumption Amount, +1.49% Interaction Rate, and -6.91% Report Rate.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes PEARL, a nonparametric contrastive percentile approximation framework for addressing behavioral intensity imbalance in industrial livestream recommender systems. It models relative preference signals via real contrastive interaction samples drawn from observed data, claims theoretical justification that such pairwise comparisons produce unbiased percentile estimates without auxiliary distribution models, and introduces prediction-based bootstrapping for smoothing sparse feedback, a generalized value-weighted formulation, and a co-training strategy. Offline experiments show effective bias mitigation and performance gains across ranking targets; online A/B tests on a platform serving billions of users report gains of +2.10% Watch Duration, +0.80% Consumption Amount, +1.49% Interaction Rate, and -6.91% Report Rate.

Significance. If the nonparametric unbiasedness claim holds for logged interaction data and the empirical improvements prove robust to distribution shift, the work could provide a practical tool for debiasing engagement signals at industrial scale. The combination of a direct contrastive approach, mechanisms for discrete feedback, and large-scale A/B validation on a production livestream platform adds applied relevance, though the overall significance hinges on whether the theoretical justification transfers from idealized sampling to policy-generated data.

major comments (2)
  1. [Theoretical justification section] The central theoretical claim (referenced in the abstract as demonstrating that 'pairwise comparisons yield unbiased estimates of percentile-based preference signals') must explicitly address whether contrastive pairs sampled from logged data remain exchangeable with respect to the underlying preference ordering. If the derivation assumes uniform or independent sampling over the full user-item space rather than the exposure-biased distribution induced by the current policy, the unbiasedness result does not transfer; a concrete proof or counter-example under logged-data sampling is required.
  2. [Experimental setup] The manuscript reports positive A/B results but provides no explicit rules for data exclusion, no error bounds on the percentile estimates, and no ablation isolating the contribution of the contrastive sampling from that of the bootstrapping and value-weighted components; without these, it is difficult to confirm that the observed gains stem from unbiased percentile estimation rather than other modeling choices.
minor comments (2)
  1. [Method description] Clarify the precise construction of positive and negative contrastive pairs from the observed interaction logs, including any filtering steps that might re-introduce selection bias.
  2. [Abstract and §1] The abstract and introduction use 'nonparametric' and 'without relying on auxiliary distribution estimation models' interchangeably; ensure the distinction is maintained when describing the prediction-based bootstrapping mechanism.
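
The concern in major comment 1 can be made concrete with a toy calculation. The sketch below is an illustration only: it assumes a ten-item catalogue and a hypothetical exposure policy that shows items in proportion to their preference value. It computes the expected outcome of a pairwise comparison when the reference item is drawn uniformly from the catalogue and when it is drawn from the exposure-weighted (logged) distribution.

```python
def weighted_percentile(value, values, weights):
    """Probability that a reference drawn with the given weights falls below `value`.

    This equals the expected result of one pairwise comparison against a
    reference sampled from that weighted distribution.
    """
    total = sum(weights)
    below = sum(w for v, w in zip(values, weights) if v < value)
    return below / total


# Hypothetical preference values for a ten-item catalogue.
values = list(range(1, 11))

uniform_weights = [1.0] * len(values)         # idealized sampling over the full space
exposure_weights = [float(v) for v in values]  # assumed policy: exposure proportional to value

full_space = weighted_percentile(8, values, uniform_weights)
logged = weighted_percentile(8, values, exposure_weights)
print(f"percentile of item 8 over the full space:   {full_space:.3f}")
print(f"percentile of item 8 within the logged data: {logged:.3f}")
```

The two targets differ (0.700 versus roughly 0.509 here), so a pairwise estimator built from logged references can be unbiased for the logged-data percentile while still being off for the full-space percentile.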

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments. We address each major point below and have revised the manuscript accordingly to strengthen the presentation of the theoretical result and the experimental details.

Point-by-point responses
  1. Referee: [Theoretical justification section] The central theoretical claim (referenced in the abstract as demonstrating that 'pairwise comparisons yield unbiased estimates of percentile-based preference signals') must explicitly address whether contrastive pairs sampled from logged data remain exchangeable with respect to the underlying preference ordering. If the derivation assumes uniform or independent sampling over the full user-item space rather than the exposure-biased distribution induced by the current policy, the unbiasedness result does not transfer; a concrete proof or counter-example under logged-data sampling is required.

    Authors: We agree that the original theoretical section did not sufficiently clarify the sampling regime. The derivation in the manuscript is performed with respect to the observed (logged) distribution rather than a uniform distribution over the full space. In the revised version we have added a dedicated subsection that derives the unbiasedness result under policy-induced sampling. The proof proceeds by showing that the contrastive pairwise comparisons preserve the relative ordering of preferences conditional on the items that were exposed and observed; the estimator is therefore unbiased for the percentile ranks within the support of the logged data. We also discuss the distinction between this conditional unbiasedness and unconditional unbiasedness with respect to the full user-item space, and note that the former is the relevant quantity for debiasing observed engagement signals. revision: yes

  2. Referee: [Experimental setup] The manuscript reports positive A/B results but provides no explicit rules for data exclusion, no error bounds on the percentile estimates, and no ablation isolating the contribution of the contrastive sampling from that of the bootstrapping and value-weighted components; without these, it is difficult to confirm that the observed gains stem from unbiased percentile estimation rather than other modeling choices.

    Authors: We acknowledge that the original experimental section lacked several reproducibility details. The revised manuscript now includes: (i) explicit data exclusion criteria (sessions with fewer than five interactions, items with exposure below a minimum threshold, and users whose activity falls outside the 5th–95th percentile of engagement intensity); (ii) bootstrap-based error bounds on the percentile estimates (100 resamples, reporting standard errors for each reported metric); and (iii) a full set of ablations that disable contrastive sampling, prediction-based bootstrapping, and the value-weighted formulation in turn. These ablations show that the contrastive component accounts for the majority of the bias reduction, while bootstrapping and value weighting provide incremental robustness, particularly on sparse feedback. revision: yes
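
For readers unfamiliar with bootstrap error bounds of the kind the rebuttal describes, the following is a generic sketch rather than the authors' procedure. It resamples a set of reference values with replacement 100 times and reports the standard deviation of the resulting percentile estimates as a standard error. The exponential watch-time data, the 30-second query value, and all function names are assumptions made for illustration.

```python
import random
import statistics


def bootstrap_standard_error(samples, statistic, n_resamples=100, seed=0):
    """Standard deviation of `statistic` across resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(samples)
    estimates = []
    for _ in range(n_resamples):
        resample = [samples[rng.randrange(n)] for _ in range(n)]
        estimates.append(statistic(resample))
    return statistics.stdev(estimates)


def percentile_of(value):
    """Build a statistic: the share of reference values that fall below `value`."""
    def stat(refs):
        return sum(ref < value for ref in refs) / len(refs)
    return stat


# Hypothetical watch times (seconds) with a mean of about 30.
data_rng = random.Random(1)
watch_times = [data_rng.expovariate(1 / 30.0) for _ in range(500)]

point_estimate = percentile_of(30.0)(watch_times)
standard_error = bootstrap_standard_error(watch_times, percentile_of(30.0))
print(f"percentile estimate: {point_estimate:.3f} +/- {standard_error:.3f}")
```

With only 100 resamples the standard error is itself noisy, so it is best read as an order-of-magnitude bound.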

Circularity Check

0 steps flagged

No significant circularity; derivation relies on independent theoretical justification

Full rationale

The paper's core claim is a nonparametric contrastive framework that approximates percentile relationships directly from observed interaction samples, supported by a separate theoretical justification for unbiasedness. No equations or steps in the abstract reduce the target percentile estimate to a fitted parameter or self-citation by construction. The prediction-based bootstrapping and value-weighted formulation are presented as extensions for sparsity handling rather than load-bearing definitions of the primary result. The derivation chain remains self-contained, with the unbiasedness result positioned as an external mathematical property rather than an input renamed as output.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Review performed on abstract only; full derivation, assumptions, and any fitted components are not visible. The central claim rests on an unshown theoretical result about pairwise comparisons and on the representativeness of observed contrastive samples.

axioms (1)
  • domain assumption: Pairwise comparisons from observed interactions yield unbiased estimates of percentile-based preference signals.
    Stated directly in the abstract as the theoretical justification for the framework.

pith-pipeline@v0.9.0 · 5800 in / 1270 out tokens · 29751 ms · 2026-05-22T09:38:32.530343+00:00 · methodology
