KuaiLive: A Real-time Interactive Dataset for Live Streaming Recommendation
Pith reviewed 2026-05-18 23:59 UTC · model grok-4.3
The pith
KuaiLive supplies the first public dataset with exact live room start and end times plus multi-type real-time interactions from a major streaming platform.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
KuaiLive is the first real-time, interactive dataset collected from a leading live streaming platform that includes precise live room start and end timestamps, multiple types of real-time user interactions (click, comment, like, gift), and rich side information features for both users and streamers, enabling more realistic simulation of dynamic candidate items and better modeling of user and streamer behaviors.
What carries the argument
The KuaiLive dataset, built around precise live-room timestamps and four distinct real-time interaction types plus side information, supplies the concrete records that allow dynamic candidate simulation.
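Dynamic candidate simulation can be sketched in a few lines of Python. The sketch assumes a hypothetical room table with fields `live_id`, `start_ts`, and `end_ts` (Unix seconds). These names are illustrative and are not taken from the released KuaiLive schema. It also assumes that a room is a valid candidate at request time `t` only while it is on air, i.e. `start_ts <= t < end_ts`, with the end time treated as exclusive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LiveRoom:
    # Hypothetical record; field names are illustrative, not the official schema.
    live_id: int
    start_ts: int  # broadcast start, Unix seconds (inclusive)
    end_ts: int    # broadcast end, Unix seconds (exclusive, by assumption)


def live_candidates(rooms: list[LiveRoom], t: int) -> list[int]:
    """Return the sorted ids of rooms that are on air at request time t."""
    return sorted(r.live_id for r in rooms if r.start_ts <= t < r.end_ts)


if __name__ == "__main__":
    rooms = [
        LiveRoom(1, start_ts=100, end_ts=200),
        LiveRoom(2, start_ts=150, end_ts=400),
        LiveRoom(3, start_ts=250, end_ts=300),
    ]
    print(live_candidates(rooms, 160))  # [1, 2]
    print(live_candidates(rooms, 260))  # [2, 3]
```

A linear scan is enough for illustration. With many requests, rooms would more plausibly be sorted by start time or kept in an interval index, so that each lookup does not touch every room.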
If this is right
- Supports top-K recommendation, click-through rate prediction, watch-time prediction, and gift-price prediction under live conditions.
- Enables studies of multi-behavior modeling that combine clicks, comments, likes, and gifts.
- Allows multi-task learning experiments that jointly optimize several prediction targets.
- Provides side information suitable for fairness-aware recommendation research.
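A single interaction log can supply several prediction targets at once. The sketch below collapses hypothetical interaction events into one label record per (user, live room) pair. Each record holds binary click, comment, and like labels plus a summed gift amount, which could serve as a regression target for gift-price prediction. The event layout `(user_id, live_id, action, value)` and the action names are assumptions made for illustration, not the dataset's actual file format.

```python
from collections import defaultdict

# Hypothetical action vocabulary mirroring the four interaction types.
ACTIONS = ("click", "comment", "like", "gift")


def build_labels(events):
    """Aggregate (user_id, live_id, action, value) events into per-pair labels.

    Returns a dict mapping (user_id, live_id) to a dict with binary
    click/comment/like labels and the total gift value. The value field
    is only used for gift events and ignored otherwise.
    """
    labels = defaultdict(
        lambda: {"click": 0, "comment": 0, "like": 0, "gift_value": 0.0}
    )
    for user_id, live_id, action, value in events:
        if action not in ACTIONS:
            raise ValueError(f"unknown action: {action!r}")
        rec = labels[(user_id, live_id)]
        if action == "gift":
            rec["gift_value"] += value  # summed gift amount; units assumed
        else:
            rec[action] = 1  # binary: the behavior occurred at least once
    return dict(labels)
```

Each field of the resulting record could feed a separate head in a multi-task model. The binary labels could also be treated as distinct behavior types in multi-behavior modeling.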
Where Pith is reading between the lines
- The timestamped interaction sequences could be used to test algorithms that must adapt to sudden changes in available items.
- Similar fine-grained logs might help identify engagement patterns that appear only when live content evolves continuously.
- The dataset structure invites direct comparison of model robustness between live streaming and static video or product recommendation settings.
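One hedged way to measure how abruptly the available items change is to compare the on-air candidate sets at consecutive request times. The sketch below computes the Jaccard distance between two snapshots, where 0.0 means the snapshots are identical and 1.0 means they are disjoint. It then applies that distance along a sequence of snapshots. Both the metric and the snapshot representation (plain sets of room ids) are illustrative choices, not a procedure described in the paper.

```python
def candidate_turnover(before: set[int], after: set[int]) -> float:
    """Jaccard distance between two candidate snapshots (0.0 same, 1.0 disjoint)."""
    union = before | after
    if not union:
        return 0.0  # two empty candidate pools are treated as unchanged
    return 1.0 - len(before & after) / len(union)


def turnover_series(snapshots: list[set[int]]) -> list[float]:
    """Turnover between each pair of consecutive snapshots."""
    return [candidate_turnover(a, b) for a, b in zip(snapshots, snapshots[1:])]
```

Spikes in such a series would mark the moments where a model has to cope with a suddenly different item pool.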
Load-bearing premise
Data gathered from one Chinese platform across a 21-day window represents typical live-streaming behavior without large platform-specific biases or missing interactions.
What would settle it
Training a recommendation model on KuaiLive and then measuring a sharp drop in performance when the same model is tested on interaction logs from a different live-streaming service would indicate the dataset does not generalize.
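The proposed test amounts to comparing one metric across two test sets. The minimal sketch below assumes Recall@K as that metric. What counts as a "sharp" drop is left as a threshold the experimenter must choose. The function names are hypothetical, and no numbers here are results from the paper.

```python
def recall_at_k(ranked: list[int], relevant: set[int], k: int) -> float:
    """Fraction of relevant items that appear in the top-k of a ranked list."""
    if not relevant:
        raise ValueError("recall is undefined without relevant items")
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def relative_drop(in_domain: float, out_of_domain: float) -> float:
    """Relative loss when a model trained on one platform is tested on another."""
    if in_domain <= 0:
        raise ValueError("in-domain score must be positive")
    return (in_domain - out_of_domain) / in_domain
```

In practice, Recall@K would be averaged over all test requests on each platform before the two averages are compared.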
Original abstract
Live streaming platforms have become a dominant form of online content consumption, offering dynamically evolving content, real-time interactions, and highly engaging user experiences. These unique characteristics introduce new challenges that differentiate live streaming recommendation from traditional recommendation settings and have garnered increasing attention from industry in recent years. However, research progress in academia has been hindered by the lack of publicly available datasets that accurately reflect the dynamic nature of live streaming environments. To address this gap, we introduce KuaiLive, the first real-time, interactive dataset collected from Kuaishou, a leading live streaming platform in China with over 400 million daily active users. The dataset records the interaction logs of 23,772 users and 452,621 streamers over a 21-day period. Compared to existing datasets, KuaiLive offers several advantages: it includes precise live room start and end timestamps, multiple types of real-time user interactions (click, comment, like, gift), and rich side information features for both users and streamers. These features enable more realistic simulation of dynamic candidate items and better modeling of user and streamer behaviors. We conduct a thorough analysis of KuaiLive from multiple perspectives and evaluate several representative recommendation methods on it, establishing a strong benchmark for future research. KuaiLive can support a wide range of tasks in the live streaming domain, such as top-K recommendation, click-through rate prediction, watch time prediction, and gift price prediction. Moreover, its fine-grained behavioral data also enables research on multi-behavior modeling, multi-task learning, and fairness-aware recommendation. The dataset and related resources are publicly available at https://imgkkk574.github.io/KuaiLive.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces KuaiLive, a public dataset collected from the Kuaishou live streaming platform. It records interactions from 23,772 users and 452,621 streamers over 21 days, including precise live room start/end timestamps, multi-type real-time interactions (click, comment, like, gift), and rich side information for users and streamers. The authors provide multi-perspective analysis and benchmark representative recommendation methods to support tasks such as top-K recommendation, CTR prediction, watch time prediction, gift price prediction, multi-behavior modeling, multi-task learning, and fairness-aware recommendation.
Significance. If the collected data faithfully captures live streaming dynamics, KuaiLive fills a notable gap by supplying the first public resource with fine-grained temporal and multi-behavior features for this domain. The public release, combined with benchmark results and support for diverse tasks, positions it as a useful foundation for advancing research on dynamic candidate sets and real-time user/streamer behaviors.
major comments (1)
- Abstract: The central claim that the dataset 'enables more realistic simulation of dynamic candidate items' due to precise timestamps and real-time interactions is load-bearing but unsupported by direct evidence. The described benchmarks evaluate standard methods without an ablation or comparison demonstrating improved simulation fidelity attributable to these features versus prior datasets lacking them.
minor comments (2)
- Abstract: Add a brief limitations paragraph noting the 21-day single-platform collection window and any potential platform-specific biases in interaction patterns (e.g., gift or comment mechanics) to help readers assess generalizability.
- Dataset construction section: Provide additional details on data cleaning steps, filtering criteria, and coverage statistics (e.g., fraction of interactions retained) to allow verification of completeness and reproducibility.
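The coverage statistic requested above is a simple ratio of retained to raw interactions. The sketch below assumes a hypothetical cleaning rule that drops users with fewer than `min_interactions` events. Both the rule and the threshold are illustrative, since the paper's actual filtering criteria are exactly what the referee asks the authors to state.

```python
from collections import Counter


def filter_and_coverage(events, min_interactions=5):
    """Keep events from users with at least min_interactions events.

    events: list of (user_id, item_id) pairs (hypothetical layout).
    Returns (kept_events, fraction_of_events_retained).
    """
    if not events:
        raise ValueError("coverage is undefined for an empty log")
    per_user = Counter(user for user, _ in events)
    kept = [e for e in events if per_user[e[0]] >= min_interactions]
    return kept, len(kept) / len(events)
```

Reporting this fraction for each cleaning step would let readers see how much of the raw log each filter removes.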
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and the recommendation for minor revision. We address the single major comment point by point below, with an honest assessment of the manuscript's current support for the claim in question.
- Referee: Abstract: The central claim that the dataset 'enables more realistic simulation of dynamic candidate items' due to precise timestamps and real-time interactions is load-bearing but unsupported by direct evidence. The described benchmarks evaluate standard methods without an ablation or comparison demonstrating improved simulation fidelity attributable to these features versus prior datasets lacking them.
  Authors: We thank the referee for this observation. The abstract claim is grounded in the dataset's distinguishing characteristics: precise live-room start/end timestamps and multi-type timestamped interactions (click, comment, like, gift) that allow candidate sets to be reconstructed as they evolve in real time, in contrast to the static item pools typical of prior public datasets. The multi-perspective analysis and the reported benchmarks on tasks such as CTR prediction, watch-time prediction, and multi-behavior modeling demonstrate that these features can be directly exploited by standard methods. Nevertheless, we acknowledge that the current experiments do not contain an explicit ablation or cross-dataset comparison that quantifies an improvement in simulation fidelity. To address the concern without overstating the evidence, we will revise the abstract to replace 'enables more realistic simulation' with the more precise phrasing 'facilitates more realistic simulation of dynamic candidate items by providing precise timestamps and real-time interaction logs.' We will also add one clarifying sentence in Section 3 (Dataset Description) that explicitly ties the claim to the data properties rather than to new empirical results. These changes constitute a minor revision that directly responds to the comment.
  Revision: yes.
Circularity Check
No circularity: dataset release with independent empirical contribution
KuaiLive is a data-collection paper whose central claims rest on the described properties of the released logs (precise timestamps, multi-type interactions, side information) rather than any derivation, equation, or fitted parameter. No self-definitional loops, fitted-input predictions, or load-bearing self-citations appear in the provided text; the 'first' and 'more realistic' assertions are direct descriptions of the collected data, not reductions to prior author work or internal fits. The 21-day Kuaishou sample's representativeness is an external assumption open to falsification, not a circular step inside the paper's own chain.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: The interaction logs collected from Kuaishou accurately capture all relevant real-time user and streamer behaviors without material platform-specific bias or missing data.
Forward citations
Cited by 3 Pith papers
- Towards Real-world Human Behavior Simulation: Benchmarking Large Language Models on Long-horizon, Cross-scenario, Heterogeneous Behavior Traces
  The OmniBehavior benchmark shows that LLMs simulating real human behavior converge on hyperactive, positive "average" personas, losing long-tail individual differences.
- Towards Real-world Human Behavior Simulation: Benchmarking Large Language Models on Long-horizon, Cross-scenario, Heterogeneous Behavior Traces
  Introduces the OmniBehavior benchmark, built from real-world data, and shows that LLMs exhibit hyperactivity, persona homogenization, and utopian bias in behavior simulation.
- Dual-Rerank: Fusing Causality and Utility for Industrial Generative Reranking
  Dual-Rerank fuses autoregressive and non-autoregressive generative reranking via knowledge distillation and uses list-wise decoupled RL optimization to improve whole-page utility and cut latency in industrial video search.