DADF: A Distribution-Aware Debiasing Framework for Watch-Time Regression in Recommender Systems

Han Li; Kun Gai; Ruiming Tang; Xiao Lv; XinLong Zhao; Yiqing Yang; Zhao Liu

arxiv: 2605.17863 · v1 · pith:3ZWLIPEJnew · submitted 2026-05-18 · 💻 cs.IR

Yiqing Yang, Xinlong Zhao, Zhao Liu, Xiao Lv, Ruiming Tang, Han Li, Kun Gai

Pith reviewed 2026-05-20 00:55 UTC · model grok-4.3

classification 💻 cs.IR

keywords: watch-time prediction, debiasing, recommender systems, long-tailed distribution, residual correction, calibration bias, short-video platforms

The pith

A plug-in second-stage correction can remove local calibration biases in long-tailed watch-time regression without retraining the base predictor.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to demonstrate that watch-time models often look calibrated in aggregate while systematically overestimating short views and underestimating long views because errors cancel out. This matters in short-video platforms because ranking and user engagement depend on accurate per-video time estimates. DADF adds a lightweight multiplicative adjustment layer that stabilizes the long-tailed targets, conditions the correction on observable factors such as video duration, and draws on auxiliary engagement predictions. The approach is presented as a deployable fix that improves both accuracy and downstream metrics while leaving the original predictor untouched.
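The cancellation effect described above can be illustrated with a few made-up numbers. The sketch below is not from the paper: the watch times are hypothetical and chosen so that a 5-second overestimate on short views exactly offsets a 5-second underestimate on long views.

```python
from statistics import mean

# Hypothetical (predicted, observed) watch times in seconds; not from the paper.
short_views = [(8.0, 3.0), (9.0, 4.0), (7.0, 2.0)]       # model overestimates
long_views = [(40.0, 45.0), (50.0, 55.0), (60.0, 65.0)]  # model underestimates


def mean_signed_error(pairs):
    """Average of (prediction - observation); positive means overestimation."""
    return mean(pred - obs for pred, obs in pairs)


short_bias = mean_signed_error(short_views)
long_bias = mean_signed_error(long_views)
global_bias = mean_signed_error(short_views + long_views)

print(f"short-view bias: {short_bias:+.1f}s")
print(f"long-view bias:  {long_bias:+.1f}s")
print(f"global bias:     {global_bias:+.1f}s")
```

A model with these errors would look perfectly calibrated on the aggregate metric while being off by 5 seconds in every individual region.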

Core claim

DADF performs second-stage multiplicative residual correction on top of an existing watch-time predictor. It combines a dynamic distribution-aware transformation to stabilize long-tailed correction targets, a debias-factor-aware module that models heterogeneous residual patterns using inference-time observables, especially video duration, and a multi-label-aware module that exploits auxiliary prediction signals from engagement heads. The framework is evaluated on public benchmarks and a large-scale industrial system, where it improves pointwise accuracy, ranking quality, and real-world engagement metrics.
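As a rough illustration of what "second-stage multiplicative residual correction" could mean in code, the sketch below treats the correction target as the ratio of observed to predicted watch time and applies a factor to a frozen base prediction. The function names and the `EPS` guard are assumptions made for this example; the paper's actual modules are learned networks and are not reproduced here.

```python
EPS = 1e-6  # assumed guard against division by zero; not from the paper


def correction_target(observed: float, base_pred: float) -> float:
    """Ratio a second stage could be trained to predict (hypothetical form)."""
    return observed / max(base_pred, EPS)


def apply_correction(base_pred: float, factor: float) -> float:
    """Second-stage output: the untouched base prediction scaled by a factor."""
    return base_pred * factor


# Hypothetical example: the base model predicts 10s, the user watched 4s.
base_pred, observed = 10.0, 4.0
factor = correction_target(observed, base_pred)
corrected = apply_correction(base_pred, factor)
print(f"factor={factor:.2f}, corrected={corrected:.1f}s")
```

Because the base prediction enters only as an input, the first-stage model never needs to be retrained, which is the plug-in property the paper emphasizes.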

What carries the argument

Second-stage multiplicative residual correction that applies dynamic distribution-aware transformation and factors conditioned on video duration plus auxiliary engagement signals to adjust region-specific biases in long-tailed watch-time targets.

If this is right

The method yields consistent gains in pointwise accuracy and ranking quality across multiple public short-video datasets and different base model backbones.
In a production ranking system it produces a 1.88 percentage-point WUAUC improvement and a 12.57 percent MAE reduction.
Online A/B testing records a statistically significant 0.347 percent increase in average time spent per device.
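The figures above mix two kinds of change: the WUAUC gain is an absolute difference in percentage points, while the MAE and time-spent numbers are relative percentages. The sketch below shows the arithmetic with hypothetical baselines, since the absolute baseline values are not stated here.

```python
def relative_reduction(baseline: float, new: float) -> float:
    """Percent reduction relative to the baseline value."""
    return 100.0 * (baseline - new) / baseline


def point_gain(baseline_pct: float, new_pct: float) -> float:
    """Absolute difference between two percentages, in percentage points."""
    return new_pct - baseline_pct


# Hypothetical baselines chosen only to make the arithmetic concrete.
mae_base = 20.0                      # seconds
mae_new = mae_base * (1 - 0.1257)    # a 12.57% relative reduction
wuauc_base = 65.00                   # percent
wuauc_new = wuauc_base + 1.88        # a 1.88 percentage-point gain

print(f"MAE:   {mae_base:.2f}s -> {mae_new:.2f}s "
      f"({relative_reduction(mae_base, mae_new):.2f}% lower)")
print(f"WUAUC: {wuauc_base:.2f}% -> {wuauc_new:.2f}% "
      f"(+{point_gain(wuauc_base, wuauc_new):.2f} points)")
```

The same relative reduction corresponds to a different absolute change for every baseline, so the deltas alone do not fix the effect size.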

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same second-stage correction pattern could be tested on other long-tailed regression targets such as dwell time or completion rate in recommendation systems.
Because the correction uses only inference-time observables, it may allow debiasing in settings where full model retraining is impractical or expensive.
Similar multiplicative residual adjustments might address local calibration issues in ranking or regression tasks outside video recommendations.

Load-bearing premise

Residual errors vary systematically across watch-time regions in a way that multiplicative correction factors conditioned on inference-time observables can capture without introducing new distributional biases.
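To make the premise concrete, here is a deliberately simplified stand-in for a correction conditioned on an inference-time observable: one multiplicative factor per duration bucket, fitted as the ratio of total observed to total predicted watch time. This lookup table is an assumption for illustration only; the paper's debias-factor-aware module is a learned model, and the bucket names and data are hypothetical.

```python
from collections import defaultdict


def fit_bucket_factors(rows):
    """rows: iterable of (bucket, base_pred, observed). Returns bucket -> factor."""
    pred_sum = defaultdict(float)
    obs_sum = defaultdict(float)
    for bucket, pred, obs in rows:
        pred_sum[bucket] += pred
        obs_sum[bucket] += obs
    return {b: obs_sum[b] / pred_sum[b] for b in pred_sum if pred_sum[b] > 0}


def correct(bucket, base_pred, factors):
    """Scale the base prediction; unseen buckets fall back to a factor of 1."""
    return base_pred * factors.get(bucket, 1.0)


# Hypothetical training rows: (duration bucket, base prediction, observed).
train = [
    ("short", 8.0, 3.0), ("short", 10.0, 6.0),
    ("long", 40.0, 50.0), ("long", 60.0, 70.0),
]
factors = fit_bucket_factors(train)
print(factors)
print(correct("short", 12.0, factors), correct("unseen", 12.0, factors))
```

The bucket is known before the video is watched, so the factor can be applied at serving time. Whether such factors introduce new biases within a bucket is exactly what the premise leaves open.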

What would settle it

On a held-out set, bin videos by watch-time length and check whether the corrected model still shows statistically significant overestimation in the shortest bin and underestimation in the longest bin.
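One way to run the check described above is sketched below: group held-out examples by observed watch time, then report the mean signed error and a one-sample t statistic per bin. The bin edges and records are hypothetical, and the choice of a t statistic is an assumption; the paper does not specify a test here.

```python
from math import sqrt
from statistics import mean, stdev


def bin_bias(records, edges):
    """Group (pred, obs) pairs by observed watch time and report signed bias.

    Returns one (lo, hi, n, mean_error, t_stat) tuple per bin with at least
    two examples, where t_stat tests whether the mean error differs from 0.
    """
    rows = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        errs = [pred - obs for pred, obs in records if lo <= obs < hi]
        if len(errs) < 2:
            continue  # stdev needs at least two points
        m, s = mean(errs), stdev(errs)
        t = m / (s / sqrt(len(errs))) if s > 0 else float("nan")
        rows.append((lo, hi, len(errs), m, t))
    return rows


# Hypothetical held-out (prediction, observation) pairs in seconds.
records = [
    (6.0, 2.0), (7.0, 4.0), (9.0, 3.0), (8.0, 5.0),
    (30.0, 42.0), (35.0, 50.0), (41.0, 48.0), (38.0, 55.0),
]
rows = bin_bias(records, edges=[0.0, 10.0, 60.0])
for lo, hi, n, m, t in rows:
    print(f"[{lo:.0f}, {hi:.0f})s  n={n}  mean error={m:+.2f}s  t={t:+.2f}")
```

Running this on corrected predictions and finding a large positive statistic in the shortest bin alongside a large negative one in the longest bin would count against the claim.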

Figures

Figures reproduced from arXiv: 2605.17863 by Han Li, Kun Gai, Ruiming Tang, Xiao Lv, XinLong Zhao, Yiqing Yang, Zhao Liu.

**Figure 2.** Overview of DADF. The framework corrects an …

**Figure 3.** MAE reduction across duration/watch-time buckets.

**Figure 5.** Sensitivity to the number of duration buckets

**Figure 6.** Distribution comparison of the raw multiplicative correction factor (top) and the group-specific Box–Cox transformed …

**Figure 7.** Learned group-specific Box–Cox transformation …
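Figures 6 and 7 refer to a group-specific Box–Cox transformation of the correction factor. For readers unfamiliar with it, the sketch below implements the standard Box–Cox transform and its inverse. The group names and lambda values are hypothetical placeholders; the paper learns its own parameters, and how it defines groups is not shown here.

```python
from math import exp, log


def box_cox(x: float, lam: float) -> float:
    """Box-Cox transform for x > 0: log(x) if lam == 0, else (x**lam - 1) / lam."""
    if x <= 0:
        raise ValueError("Box-Cox requires x > 0")
    return log(x) if lam == 0 else (x ** lam - 1.0) / lam


def inv_box_cox(z: float, lam: float) -> float:
    """Inverse transform, mapping a transformed value back to the factor scale."""
    return exp(z) if lam == 0 else (lam * z + 1.0) ** (1.0 / lam)


# Hypothetical per-group lambdas; not values from the paper.
group_lambda = {"short": 0.0, "medium": 0.25, "long": 0.5}

factor = 3.2  # a hypothetical raw multiplicative correction factor
for group, lam in group_lambda.items():
    z = box_cox(factor, lam)
    back = inv_box_cox(z, lam)
    print(f"{group}: lambda={lam}, transformed={z:.4f}, round trip={back:.4f}")
```

Smaller lambda values compress large factors more strongly, which is why a power transform of this family is a common choice for taming a long right tail.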

Original abstract

Watch-time prediction is a central regression task in short-video recommender systems, where labels are highly long-tailed and residual errors vary systematically across observed watch-time regions. In practice, a model may appear globally calibrated while still overestimating short views and underestimating long views, because opposite errors cancel out in aggregate. Existing methods mainly improve the first-stage watch-time predictor, but often leave such residual distributional bias insufficiently corrected. We propose DADF, a distribution-aware debiasing framework for watch-time regression. Instead of replacing a deployed predictor, DADF performs second-stage multiplicative residual correction on top of it. DADF combines three complementary designs: a dynamic distribution-aware transformation for stabilizing long-tailed correction targets, a debias-factor-aware module for modeling heterogeneous residual patterns using inference-time observable factors, especially video duration, and a multi-label-aware module that exploits auxiliary prediction signals from engagement heads. We evaluate DADF on public short-video benchmarks and a large-scale industrial ranking system. DADF consistently improves both pointwise accuracy and ranking quality across datasets and backbones. In the industrial setting, it achieves a 1.88 percentage-point WUAUC gain over the production baseline, reduces MAE by 12.57%, and yields a statistically significant 0.347% lift in average time spent per device in online A/B testing. These results demonstrate that DADF effectively mitigates local calibration bias and provides a practical plug-in solution for debiasing long-tailed continuous targets. The source code is available at https://github.com/liuzhao09/DADF.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

DADF is a practical second-stage correction for local calibration bias in watch-time regression that shows offline gains plus a small but real online lift in production.

This paper gives a workable plug-in fix for the calibration problem that shows up in watch-time prediction for short-video systems. The core idea is to leave the deployed first-stage model alone and add a second-stage multiplicative correction that targets residual errors varying by watch-time region. They stabilize the long tail with a dynamic distribution-aware transform, condition the correction on observables like video duration, and pull in auxiliary engagement signals through a multi-label module. That combination is the new piece relative to earlier watch-time work. The results look solid on public benchmarks and, more usefully, in an industrial ranking system: a 1.88-point WUAUC lift and a 12.57% MAE drop over the production baseline, plus a statistically significant 0.347% increase in average time spent per device in an online A/B test. The plug-in framing is a real strength because it avoids the cost of retraining the base predictor. Code is released, which helps reproducibility. The main soft spot is that the online gain is modest and the paper does not test how sensitive the correction is to changes in the base model or to shifts in the underlying distribution. It also leaves open whether the multiplicative form could create new biases in edge cases, though nothing in the reported numbers flags that as a problem. The work is aimed at recsys engineers who already have a watch-time model in production and want a lightweight way to reduce local over- and under-prediction without a full rewrite. It deserves a serious referee because the online validation and the explicit second-stage design give it practical weight beyond another incremental offline improvement. I would send it to review.

Referee Report

0 major / 2 minor

Summary. The paper proposes DADF, a distribution-aware debiasing framework for watch-time regression in short-video recommender systems. It performs second-stage multiplicative residual correction on top of an existing deployed predictor rather than replacing it. The method combines a dynamic distribution-aware transformation to stabilize long-tailed correction targets, a debias-factor-aware module that models heterogeneous residual patterns conditioned on inference-time observables (especially video duration), and a multi-label-aware module that exploits auxiliary signals from engagement heads. Evaluations on public short-video benchmarks and a large-scale industrial ranking system report consistent gains in pointwise accuracy and ranking quality, including a 1.88 percentage-point WUAUC improvement, 12.57% MAE reduction, and a statistically significant 0.347% lift in average time spent per device from online A/B testing.

Significance. If the results hold, DADF provides a practical plug-in solution for addressing local calibration bias in long-tailed continuous targets without retraining base predictors, which is directly relevant to production recommender systems. The work is strengthened by the public release of source code at https://github.com/liuzhao09/DADF, reproducible offline results across multiple backbones and datasets, and statistically significant online A/B metrics that measure real user engagement rather than internal model quantities.

minor comments (2)

[Abstract and Section 5] The abstract and experimental sections report a 1.88 percentage-point WUAUC gain and 12.57% MAE reduction but do not state the absolute baseline values of these metrics; providing the raw baseline numbers alongside the deltas would improve interpretability of the effect sizes.
[Section 3.2] In the description of the debias-factor-aware module, the conditioning on video duration and other observables is motivated by observed residual patterns, but an explicit statement of how these factors are encoded (e.g., as categorical embeddings or continuous features) would clarify the implementation.
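The second comment asks whether duration enters as a categorical or a continuous feature. The sketch below shows what each option could look like. The bucket boundaries and function names are hypothetical, and nothing here indicates which encoding the authors actually use.

```python
from bisect import bisect_right
from math import log1p

# Hypothetical bucket boundaries in seconds; not taken from the paper.
DURATION_EDGES = [7.0, 15.0, 30.0, 60.0, 120.0]


def duration_bucket(duration_s: float) -> int:
    """Categorical encoding: an index (0..len(edges)) into an embedding table."""
    return bisect_right(DURATION_EDGES, duration_s)


def duration_continuous(duration_s: float) -> float:
    """Continuous encoding: log-compressed duration as a dense feature."""
    return log1p(duration_s)


for d in (5.0, 45.0, 300.0):
    print(f"{d:>5.0f}s -> bucket {duration_bucket(d)}, "
          f"log1p {duration_continuous(d):.3f}")
```

The categorical form lets each bucket learn an unrelated correction, while the continuous form shares strength across nearby durations; stating which one is used would settle the referee's question.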

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive review, the recognition of DADF's practical plug-in nature, and the recommendation to accept. We appreciate the emphasis on reproducibility, code release, and the real-world A/B testing results measuring user engagement.

Circularity Check

0 steps flagged

No significant circularity detected in DADF derivation

full rationale

The paper presents DADF as an empirical second-stage multiplicative correction framework for watch-time regression, explicitly motivated by observed residual bias patterns across long-tailed targets and conditioned on inference-time observables plus auxiliary signals. All reported gains (WUAUC, MAE, online time-spent lift) are measured against external production baselines and public benchmarks rather than quantities defined solely inside the model. No load-bearing equations, self-citations, or fitted parameters are shown to reduce the central claims to inputs by construction; the plug-in design remains an add-on to the base predictor and is independently falsifiable via the stated A/B results and code release.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The framework rests on standard supervised regression assumptions plus the domain premise that residual bias is locally systematic and observable at inference time via video duration and auxiliary heads. No new physical entities or ad-hoc constants are introduced beyond typical ML hyperparameters.

axioms (1)

Domain assumption: Residual errors in watch-time regression vary systematically across observed watch-time regions and can be modeled multiplicatively.
Invoked in the description of the debias-factor-aware module and the overall second-stage correction design.



[1] [1]

Narayanaswamy Balakrishnan and Asit P. Basu. 1996.Exponential Distribution: Theory, Methods and Applications

work page 1996

[2] [2]

Bartlett

Maurice S. Bartlett. 1936. The Square Root Transformation in Analysis of Variance. Supplement to the Journal of the Royal Statistical Society3, 1 (1936), 68–78

work page 1936

[3] [3]

Bartlett

Maurice S. Bartlett. 1947. The Use of Transformations.Biometrics3, 1 (1947), 39–52

work page 1947

[4] [4]

Christopher M. Bishop. 1994.Mixture Density Networks. Technical Report. Aston University

work page 1994

[5] [5]

Stephen Bonner and Flavian Vasile. 2018. Causal Embeddings for Recommen- dation. InProceedings of the 12th ACM Conference on Recommender Systems. 104–112

work page 2018

[6] [6]

George E. P. Box and David R. Cox. 1964. An Analysis of Transformations.Journal of the Royal Statistical Society: Series B (Methodological)26, 2 (1964), 211–252

work page 1964

[7] [7]

Qingpeng Cai, Shuchang Liu, Xueliang Wang, Tianyou Zuo, Wentao Xie, Bin Yang, Dong Zheng, Peng Jiang, and Kun Gai. 2023. Reinforcing User Retention in a Billion Scale Short Video Recommender System. InCompanion Proceedings of the ACM Web Conference 2023. 421–426

work page 2023

[8] [8]

Paul Covington, Jay Adams, and Emre Sargin. 2016. Deep Neural Networks for YouTube Recommendations. InProceedings of the 10th ACM Conference on Recommender Systems. 191–198

work page 2016

[9] [9]

Naihua Duan. 1983. Smearing Estimate: A Nonparametric Retransformation Method.J. Amer. Statist. Assoc.78, 383 (1983), 605–610

work page 1983

[10] [10]

Silvia Ferrari and Francisco Cribari-Neto. 2004. Beta Regression for Modelling Rates and Proportions.Journal of Applied Statistics31, 7 (2004), 799–815

work page 2004

[11] [11]

D. J. Finney. 1941. On the Distribution of a Variate Whose Logarithm Is Normally Distributed.Supplement to the Journal of the Royal Statistical Society7, 2 (1941), 155–161

work page 1941

[12] [12]

Chongming Gao, Shijun Li, Wenqiang Lei, Jiawei Chen, Biao Li, Peng Jiang, Xiangnan He, Jiaxin Mao, and Tat-Seng Chua. 2022. KuaiRec: A Fully-observed Dataset and Insights for Evaluating Recommender Systems. InProceedings of the 31st ACM International Conference on Information and Knowledge Management. 540–550

work page 2022

[13] [13]

Chongming Gao, Shijun Li, Yuan Zhang, Jiawei Chen, Biao Li, Wenqiang Lei, Peng Jiang, and Xiangnan He. 2022. KuaiRand: An Unbiased Sequential Rec- ommendation Dataset with Randomly Exposed Videos. InProceedings of the 31st ACM International Conference on Information and Knowledge Management. 3953–3957

work page 2022

[14] [14]

Xudong Gong, Qinlin Feng, Yuan Zhang, Jiangling Qin, Weijie Ding, Biao Li, Peng Jiang, and Kun Gai. 2022. Real-Time Short Video Recommendation on Mobile Devices. InProceedings of the 31st ACM International Conference on Information and Knowledge Management. 3103–3112

work page 2022

[15] [15]

Jie Hu, Li Shen, and Gang Sun. 2018. Squeeze-and-Excitation Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 7132–7141

work page 2018

[16] [16]

Koch and Gary M

Roy W. Koch and Gary M. Smillie. 1986. Bias in Hydrologic Prediction Using Log-Transformed Regression Models.Journal of the American Water Resources Association22, 5 (1986), 717–723

work page 1986

[17] [17]

Ron Kohavi, Alex Deng, Brian Frasca, Toby Walker, Ya Xu, and Nils Pohlmann

work page

[18] [18]

InProceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Online Controlled Experiments at Large Scale. InProceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 1168–1176

work page

[19] [19]

Chengzhi Lin, Shuchang Liu, Chuyuan Wang, and Yongqi Liu. 2024. Conditional Quantile Estimation for Uncertain Watch Time in Short-Video Recommendation. arXiv:2407.12223

work page arXiv 2024

[20] [20]

Xiao Lin, Xiaokai Chen, Linfeng Song, Jingwei Liu, Biao Li, and Peng Jiang. 2023. Tree Based Progressive Regression Model for Watch-Time Prediction in Short- Video Recommendation. InProceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 4497–4506

work page 2023

[21] [21]

Jiaqi Ma, Zhe Zhao, Xinyang Yi, Jilin Chen, Lichan Hong, and Ed H. Chi. 2018. Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture- of-Experts. InProceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 1930–1939

work page 2018

[22] [22]

Xiao Ma, Liqin Zhao, Guan Huang, Zhi Wang, Zelin Hu, Xiaoqiang Zhu, and Kun Gai. 2018. Entire Space Multi-Task Model: An Effective Approach for Estimating Post-Click Conversion Rate. InProceedings of the 41st International ACM SIGIR Conference on Research and Development in Information Retrieval. 1137–1140

work page 2018

[23] [23]

McLachlan, Sharon X

Geoffrey J. McLachlan, Sharon X. Lee, and Suren I. Rathnayake. 2019. Finite Mixture Models.Annual Review of Statistics and Its Application6 (2019), 355–378

work page 2019

[24] [24]

Mean Absolute Error. 2016. Mean Absolute Error. Retrieved September 19, 2016, 14

work page 2016

[25] [25]

Sushant More. 2022. Identifying and Overcoming Transformation Bias in Fore- casting Models. arXiv:2208.12264

work page arXiv 2022

[26] [26]

Michael C. Newman. 1993. Regression Analysis of Log-Transformed Data: Sta- tistical Bias and Its Correction.Environmental Toxicology and Chemistry12, 6 (1993), 1129–1133

work page 1993

[27] [27]

Jerzy Neyman and Elizabeth L. Scott. 1960. Correction for Bias Introduced by a Transformation of Variables.The Annals of Mathematical Statistics31, 3 (1960), 643–655

work page 1960

[28] Vincent Moshi Ouma, Samuel Musili Mwalili, and Anthony Wanjoya Kiberia. 2016. Poisson Inverse Gaussian Regression Model for Infectious Disease Count Data. American Journal of Theoretical and Applied Statistics 5, 5 (2016), 326–333.

[30] Yunzhu Pan, Chen Gao, Jianxin Chang, Yanan Niu, Yang Song, Kun Gai, Depeng Jin, and Yong Li. 2023. Understanding and Modeling Passive-Negative Feedback for Short-Video Sequential Recommendation. In Proceedings of the 17th ACM Conference on Recommender Systems. 540–550.

[31] George Papamakarios, Eric Nalisnick, Danilo Jimenez Rezende, Shakir Mohamed, and Balaji Lakshminarayanan. 2021. Normalizing Flows for Probabilistic Modeling and Inference. Journal of Machine Learning Research 22, 57 (2021), 1–64.

[32] Yuta Saito, Suguru Yaginuma, Yuta Nishino, Kazuhide Nakata, and Keiichi Sakata. 2020. Unbiased Recommender Learning from Missing-Not-at-Random Implicit Feedback. In Proceedings of the 13th International Conference on Web Search and Data Mining. 501–509.

[34] Remi M. Sakia. 1992. The Box-Cox Transformation Technique: A Review. Journal of the Royal Statistical Society: Series D (The Statistician) 41, 2 (1992), 169–178.

[35] Hiroshi Shono. 2008. Application of the Tweedie Distribution to Zero-Catch Data in CPUE Analysis. Fisheries Research 93, 1–2 (2008), 154–162.

[36] Craig A. Stow, Kenneth H. Reckhow, and Song S. Qian. 2006. A Bayesian Approach to Retransformation Bias in Transformed Regression. Ecology 87, 6 (2006), 1472–1477.

[37] Bogdan M. Strimbu, Alexandru Amarioarei, John Paul McTague, and Mihaela M. Paun. 2018. A Posteriori Bias Correction of Three Models Used for Environmental Reporting. Forestry: An International Journal of Forest Research 91, 1 (2018), 49–62.

[38] Jie Sun, Zhaoying Ding, Xiaoshuang Chen, Qi Chen, Yincheng Wang, Kaiqiao Zhan, and Ben Wang. 2024. CREAD: A Classification-Restoration Framework with Error Adaptive Discretization for Watch Time Prediction in Video Recommender Systems. Proceedings of the AAAI Conference on Artificial Intelligence 38, 8 (2024), 9027–9034.

[39] Diane Tang, Ashish Agarwal, Deirdre O’Brien, and Mike Meyer. 2010. Overlapping Experiment Infrastructure: More, Better, Faster Experimentation. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 17–26.

[40] Hongyan Tang, Junning Liu, Ming Zhao, and Xudong Gong. 2020. Progressive Layered Extraction (PLE): A Novel Multi-Task Learning (MTL) Model for Personalized Recommendations. In Proceedings of the 14th ACM Conference on Recommender Systems. 269–278.

[41] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention Is All You Need. In Advances in Neural Information Processing Systems, Vol. 30. 5998–6008.

[42] Ruoxi Wang, Bin Fu, Gang Fu, and Mingliang Wang. 2017. Deep & Cross Network for Ad Click Predictions. In Proceedings of the ADKDD’17. 1–7.

[43] Tianxin Wang, Jingwu Chen, Fuzhen Zhuang, Leyu Lin, Feng Xia, Lihuan Du, and Qing He. 2020. Capturing Attraction Distribution: Sequential Attentive Network for Dwell Time Prediction. In ECAI 2020. 529–536.

[44] Wenjie Wang, Fuli Feng, Xiangnan He, and Tat-Seng Chua. 2021. Deconfounded Recommendation for Alleviating Bias Amplification. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 1717–1725.

[45] Yunpeng Weng, Xing Tang, Zhenhao Xu, Fuyuan Lyu, Dugang Liu, Zexu Sun, and Xiuqiang He. 2024. OptDist: Learning Optimal Distribution for Customer Lifetime Value Prediction. In Proceedings of the 33rd ACM International Conference on Information and Knowledge Management. 2523–2533.

[46] Siqi Wu, Marian-Andrei Rizoiu, and Lexing Xie. 2018. Beyond Views: Measuring and Predicting Engagement in Online Videos. Proceedings of the International AAAI Conference on Web and Social Media 12, 1 (2018), 434–442.

[47] Dongbo Xi, Zhen Chen, Peng Yan, Yao Zhang, Yongchun Zhu, Fuzhen Zhuang, and Yu Chen. 2021. Modeling the Sequential Dependence among Audience Multi-Step Conversions with Multi-Task Learning in Targeted Display Advertising. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 3745–3755.

[48] Xing Yi, Liangjie Hong, Erheng Zhong, Nanthan Nan Liu, and Suju Rajan. 2014. Beyond Clicks: Dwell Time for Personalization. In Proceedings of the 8th ACM Conference on Recommender Systems. 113–120.

[49] Jiahao Yu, Haozhuang Liu, Yeqiu Yang, Lu Chen, Jian Wu, Yuning Jiang, and Bo Zheng. 2025. TranSUN: A Preemptive Paradigm to Eradicate Retransformation Bias Intrinsically from Regression Models in Recommender Systems. arXiv:2505.13881. NeurIPS 2025 poster.

[50] Ruohan Zhan, Changhua Pei, Qiang Su, Jianfeng Wen, Xueliang Wang, Guanyu Mu, Dong Zheng, Peng Jiang, and Kun Gai. 2022. Deconfounding Duration Bias in Watch-Time Prediction for Video Recommendation. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 4472–4481.

[51] Haiyuan Zhao, Guohao Cai, Jieming Zhu, Zhenhua Dong, Jun Xu, and Ji-Rong Wen. 2024. Counteracting Duration Bias in Video Recommendation via Counterfactual Watch Time. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 4455–4466.

RecSys ’26, September 28–October 2, 2026, Minneapolis, MN, USA Yiqing Yang et al.

[52] Haiyuan Zhao, Lei Zhang, Jun Xu, Guohao Cai, Zhenhua Dong, and Ji-Rong Wen. 2023. Uncovering User Interest from Biased and Noised Watch Time in Video Recommendation. In Proceedings of the 17th ACM Conference on Recommender Systems. 528–539.

[54] Xu Zhao, RuiBo Ma, Jiaqi Chen, Weiqi Zhao, Ping Yang, and Yao Hu. 2025. Multi-Granularity Distribution Modeling for Video Watch Time Prediction via Exponential-Gaussian Mixture Network. In Proceedings of the 19th ACM Conference on Recommender Systems. 309–318.

[55] Yu Zheng, Chen Gao, Jingtao Ding, Lingling Yi, Depeng Jin, Yong Li, and Meng Wang. 2022. DVR: Micro-Video Recommendation Optimizing Watch-Time-Gain under Duration Bias. In Proceedings of the 30th ACM International Conference on Multimedia. 334–345.

[56] Guorui Zhou, Xiaoqiang Zhu, Chenru Song, Ying Fan, Han Zhu, Xiao Ma, Yanghui Yan, Junqi Jin, Han Li, and Kun Gai. 2018. Deep Interest Network for Click-Through Rate Prediction. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 1059–1068.

[57] Xinhua Zhuang, Yan Huang, Kannappan Palaniappan, and Yunxin Zhao. 1996. Gaussian Mixture Density Modeling, Decomposition, and Applications. IEEE Transactions on Image Processing 5, 9 (1996), 1293–1302.

A Proofs

A.1 Long-Tailedness Inheritance of the Multiplicative Correction Factor

We provide a simple theoretical argument showing that the ratio-style multi...