PEARL: Unbiased Percentile Estimation via Contrastive Learning for Industrial-Scale Livestream Recommendation
Pith reviewed 2026-05-22 09:38 UTC · model grok-4.3
The pith
Pairwise comparisons from observed interactions yield unbiased estimates of preference percentiles in recommender systems.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
PEARL establishes a nonparametric contrastive percentile approximation that uses real contrastive interaction samples to model relative preferences rather than absolute engagement magnitudes. The paper gives a theoretical proof that such pairwise comparisons deliver unbiased estimates of percentile-based preference signals, and adds mechanisms for smoothing sparse, discrete feedback and for generalized value weighting.
What carries the argument
The nonparametric contrastive percentile approximation framework that uses real pairwise interaction samples to estimate relative preference percentiles directly from observed data.
If this is right
- Mitigates behavioral intensity imbalance so observed interactions better reflect true user preferences across activity levels.
- Improves recommendation performance consistently across multiple ranking targets in offline experiments.
- Produces measurable production gains including higher watch duration, consumption amount, and interaction rate alongside lower report rate.
- Extends to sparse feedback via prediction-based bootstrapping and supports flexible modeling through value-weighted and co-training extensions.
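The first consequence above can be illustrated with a toy example. The sketch below is not the paper's model; the two users, their watch times, and the helper `within_user_percentile` are all hypothetical. It shows that two users who rank their sessions identically get identical percentile labels even though one user's raw engagement is 20 times larger.

```python
def within_user_percentile(value, user_history):
    """Share of the user's own observed engagement values that `value` exceeds."""
    return sum(1 for other in user_history if value > other) / len(user_history)


# Hypothetical watch times in seconds. Both users order their sessions the same way,
# but the heavy user's values are 20x larger.
light_user = [5, 10, 15, 20, 60]
heavy_user = [100, 200, 300, 400, 1200]

# Percentile labels depend only on the ordering within each user.
light_pct = [within_user_percentile(v, light_user) for v in light_user]
heavy_pct = [within_user_percentile(v, heavy_user) for v in heavy_user]

# Share of total raw watch time contributed by the heavy user, for comparison.
heavy_share_of_raw = sum(heavy_user) / (sum(light_user) + sum(heavy_user))
```

With raw magnitudes as the training signal, the heavy user supplies most of the total watch time in this toy data; with percentile labels, both users contribute on the same [0, 1) scale.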
Where Pith is reading between the lines
- The method could apply to other recommendation settings with heterogeneous user activity, such as e-commerce or content platforms, by replacing absolute counts with relative rankings.
- Better representation of low-activity users might increase long-term retention by reducing over-optimization for power users.
- The co-training component may generalize to improve embedding quality in other contrastive recommendation architectures.
Load-bearing premise
Real contrastive interaction samples from observed data can directly approximate underlying percentile relationships without auxiliary distribution models or extra assumptions on user behavior.
What would settle it
A simulation or controlled experiment with known skewed engagement distributions where PEARL's recovered percentile estimates are compared against ground-truth percentiles to check for residual bias.
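A minimal version of such a check might look like the sketch below. It assumes a single user whose engagement follows a log-normal distribution (an arbitrary choice of skewed distribution, not one taken from the paper) and that contrast samples are drawn independently from that same distribution. The functions `lognormal_cdf`, `pairwise_percentile`, and `run_check` are illustrative names.

```python
import math
import random


def lognormal_cdf(y, mu=0.0, sigma=1.0):
    """Ground-truth CDF of the assumed log-normal engagement distribution."""
    return 0.5 * (1.0 + math.erf((math.log(y) - mu) / (sigma * math.sqrt(2.0))))


def pairwise_percentile(y, contrast_samples):
    """Mean of the indicator I(y > y') over the contrast samples."""
    return sum(1 for other in contrast_samples if y > other) / len(contrast_samples)


def run_check(n_contrast=20000, seed=0):
    """Return (y, estimated percentile, true percentile) for a few query values."""
    rng = random.Random(seed)
    # Hypothetical skewed engagement values for one user.
    contrast_samples = [rng.lognormvariate(0.0, 1.0) for _ in range(n_contrast)]
    rows = []
    for y in (0.25, 1.0, 4.0):
        rows.append((y, pairwise_percentile(y, contrast_samples), lognormal_cdf(y)))
    return rows
```

Calling `run_check()` and comparing the second and third element of each row gives the residual gap at each query value. Under independent sampling the gap should shrink as `n_contrast` grows; a gap that persists under a different sampling scheme would be the kind of evidence this section asks for.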
Original abstract
Recommender systems trained on user interaction data are susceptible to behavioral intensity imbalance--a systematic distortion arising from heterogeneous engagement patterns across users. This imbalance skews feedback signals such that observed interactions no longer faithfully reflect true preferences, causing models to disproportionately amplify signals from highly active users while underrepresenting others, which ultimately degrades recommendation quality and robustness at scale. To address this issue, we propose a nonparametric contrastive percentile approximation framework, PEARL, that models relative preference signals instead of absolute engagement magnitudes. Building upon relative advantage debiasing, PEARL leverages real contrastive interaction samples to approximate percentile relationships directly, without relying on auxiliary distribution estimation models. We provide theoretical justification demonstrating that such pairwise comparisons yield unbiased estimates of percentile-based preference signals. For broader applicability, we introduce a prediction-based bootstrapping mechanism for percentile smoothing to handle sparse and discrete feedback, alongside a generalized value-weighted formulation and a co-training strategy to enhance both modeling flexibility and representation learning. Extensive offline experiments demonstrate that PEARL effectively mitigates behavioral bias and consistently improves recommendation performance across multiple ranking targets. Deployed in a production livestream platform with a combined user base of billions, online A/B testing confirms substantial real-world gains: +2.10% Watch Duration, +0.80% Consumption Amount, +1.49% Interaction Rate, and -6.91% Report Rate.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes PEARL, a nonparametric contrastive percentile approximation framework for addressing behavioral intensity imbalance in industrial livestream recommender systems. It models relative preference signals via real contrastive interaction samples drawn from observed data, claims theoretical justification that such pairwise comparisons produce unbiased percentile estimates without auxiliary distribution models, and introduces prediction-based bootstrapping for smoothing sparse feedback, a generalized value-weighted formulation, and a co-training strategy. Offline experiments show effective bias mitigation and performance gains across ranking targets; online A/B tests on a platform serving billions of users report gains of +2.10% Watch Duration, +0.80% Consumption Amount, +1.49% Interaction Rate, and -6.91% Report Rate.
Significance. If the nonparametric unbiasedness claim holds for logged interaction data and the empirical improvements prove robust to distribution shift, the work could provide a practical tool for debiasing engagement signals at industrial scale. The combination of a direct contrastive approach, mechanisms for discrete feedback, and large-scale A/B validation on a production livestream platform adds applied relevance, though the overall significance hinges on whether the theoretical justification transfers from idealized sampling to policy-generated data.
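The transfer question can be made concrete with a toy simulation. The sketch below is an illustration, not the paper's setting: it assumes latent engagement is Uniform(0, 1) and a hypothetical exposure policy that logs an item with probability equal to its engagement value. Under those assumptions the logged density is 2t on [0, 1], so the logged CDF is y², while the full-population CDF is y.

```python
import random


def pairwise_percentile(y, samples):
    """Mean of the indicator I(y > y') over the given samples."""
    return sum(1 for other in samples if y > other) / len(samples)


def simulate(y=0.5, n=50000, seed=1):
    rng = random.Random(seed)
    # Latent engagement over the full user-item space (assumed uniform).
    full = [rng.random() for _ in range(n)]
    # Hypothetical policy: an item is exposed and logged with probability equal to its value.
    logged = [v for v in full if rng.random() < v]
    return {
        "estimate_on_logged": pairwise_percentile(y, logged),
        "cdf_logged": y ** 2,  # CDF of the exposure-biased (logged) distribution
        "cdf_full": y,         # CDF of the full-population distribution
    }
```

In this toy model the pairwise estimate computed on logged samples tracks `cdf_logged` rather than `cdf_full`. That matches the distinction between unbiasedness within the support of the logged data and unbiasedness over the full user-item space.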
major comments (2)
- [Theoretical justification section] The central theoretical claim (referenced in the abstract as demonstrating that 'pairwise comparisons yield unbiased estimates of percentile-based preference signals') must explicitly address whether contrastive pairs sampled from logged data remain exchangeable with respect to the underlying preference ordering. If the derivation assumes uniform or independent sampling over the full user-item space rather than the exposure-biased distribution induced by the current policy, the unbiasedness result does not transfer; a concrete proof or counter-example under logged-data sampling is required.
- [Experimental setup] § on experimental setup and data processing: the manuscript reports positive A/B results but provides no explicit rules for data exclusion, error bounds on the percentile estimates, or ablation isolating the contribution of the contrastive sampling versus the bootstrapping and value-weighted components; without these, it is difficult to confirm that the observed gains stem from unbiased percentile estimation rather than other modeling choices.
minor comments (2)
- [Method description] Clarify the precise construction of positive and negative contrastive pairs from the observed interaction logs, including any filtering steps that might re-introduce selection bias.
- [Abstract and §1] The abstract and introduction use 'nonparametric' and 'without relying on auxiliary distribution estimation models' interchangeably; ensure the distinction is maintained when describing the prediction-based bootstrapping mechanism.
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive comments. We address each major point below and have revised the manuscript accordingly to strengthen the presentation of the theoretical result and the experimental details.
- Referee: [Theoretical justification section] The central theoretical claim (referenced in the abstract as demonstrating that 'pairwise comparisons yield unbiased estimates of percentile-based preference signals') must explicitly address whether contrastive pairs sampled from logged data remain exchangeable with respect to the underlying preference ordering. If the derivation assumes uniform or independent sampling over the full user-item space rather than the exposure-biased distribution induced by the current policy, the unbiasedness result does not transfer; a concrete proof or counter-example under logged-data sampling is required.
  Authors: We agree that the original theoretical section did not sufficiently clarify the sampling regime. The derivation in the manuscript is performed with respect to the observed (logged) distribution rather than a uniform distribution over the full space. In the revised version we have added a dedicated subsection that derives the unbiasedness result under policy-induced sampling. The proof proceeds by showing that the contrastive pairwise comparisons preserve the relative ordering of preferences conditional on the items that were exposed and observed; the estimator is therefore unbiased for the percentile ranks within the support of the logged data. We also discuss the distinction between this conditional unbiasedness and unconditional unbiasedness with respect to the full user-item space, and note that the former is the relevant quantity for debiasing observed engagement signals.
  Revision: yes
- Referee: [Experimental setup] § on experimental setup and data processing: the manuscript reports positive A/B results but provides no explicit rules for data exclusion, error bounds on the percentile estimates, or ablation isolating the contribution of the contrastive sampling versus the bootstrapping and value-weighted components; without these, it is difficult to confirm that the observed gains stem from unbiased percentile estimation rather than other modeling choices.
  Authors: We acknowledge that the original experimental section lacked several reproducibility details. The revised manuscript now includes: (i) explicit data exclusion criteria (sessions with fewer than five interactions, items with exposure below a minimum threshold, and users whose activity falls outside the 5th–95th percentile of engagement intensity); (ii) bootstrap-based error bounds on the percentile estimates (100 resamples, reporting standard errors for each reported metric); and (iii) a full set of ablations that disable contrastive sampling, prediction-based bootstrapping, and the value-weighted formulation in turn. These ablations show that the contrastive component accounts for the majority of the bias reduction, while bootstrapping and value weighting provide incremental robustness, particularly on sparse feedback.
  Revision: yes
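The bootstrap error bounds mentioned in the response could be computed along the following lines. This is a generic sketch of the nonparametric bootstrap with 100 resamples, not the authors' code. The function names are hypothetical, and the exponential test data in the usage note is an assumption chosen only because its median is known.

```python
import random
import statistics


def pairwise_percentile(y, samples):
    """Mean of the indicator I(y > y') over the given samples."""
    return sum(1 for other in samples if y > other) / len(samples)


def bootstrap_standard_error(y, samples, n_resamples=100, seed=0):
    """Standard error of the percentile estimate from resampling with replacement."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_resamples):
        resample = rng.choices(samples, k=len(samples))  # draw with replacement
        estimates.append(pairwise_percentile(y, resample))
    return statistics.stdev(estimates)
```

As a usage example, drawing 2000 values with `random.Random(42).expovariate(1.0)` and querying `y = math.log(2)` (the median of that distribution) should give a point estimate near 0.5, with a bootstrap standard error close to the binomial value `sqrt(0.25 / 2000)`.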
Circularity Check
No significant circularity; derivation relies on independent theoretical justification
The paper's core claim is a nonparametric contrastive framework that approximates percentile relationships directly from observed interaction samples, supported by a separate theoretical justification for unbiasedness. No equations or steps in the abstract reduce the target percentile estimate to a fitted parameter or self-citation by construction. The prediction-based bootstrapping and value-weighted formulation are presented as extensions for sparsity handling rather than load-bearing definitions of the primary result. The derivation chain remains self-contained, with the unbiasedness result positioned as an external mathematical property rather than an input renamed as output.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Pairwise comparisons from observed interactions yield unbiased estimates of percentile-based preference signals.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · tag: unclear
  Relation between the paper passage and the cited Recognition theorem.
  Theorem 1 (Unbiased Estimator of Percentile). ... E_{Y'∼f_u}[I(y>Y')] = ∫_0^y f_u(t) dt = CDF_u(y)
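The identity quoted from Theorem 1 follows in one step from the definition of expectation. The derivation below is a sketch under stated assumptions: engagement is nonnegative, f_u is a continuous density for user u (so ties have probability zero), and the contrast samples are i.i.d. draws from f_u. The estimator symbol \hat{p}_u and the sample size n are notation introduced here, not taken from the paper.

```latex
\begin{align*}
\mathbb{E}_{Y' \sim f_u}\bigl[\mathbb{I}(y > Y')\bigr]
  &= \int_0^{\infty} \mathbb{I}(y > t)\, f_u(t)\, dt \\
  &= \int_0^{y} f_u(t)\, dt \\
  &= \mathrm{CDF}_u(y).
\end{align*}
% Averaging the indicator over n i.i.d. contrast samples Y'_1, ..., Y'_n ~ f_u
% is unbiased by linearity of expectation:
\begin{align*}
\hat{p}_u(y) = \frac{1}{n} \sum_{i=1}^{n} \mathbb{I}(y > Y'_i),
\qquad
\mathbb{E}\bigl[\hat{p}_u(y)\bigr] = \mathrm{CDF}_u(y).
\end{align*}
```

The i.i.d. assumption is where the referee's concern enters. If the Y'_i are drawn from an exposure-biased distribution rather than f_u, the same argument yields the CDF of that biased distribution.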
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.