arxiv: 2604.25834 · v1 · submitted 2026-04-28 · 💻 cs.AI · cs.IR

Recognition: unknown

Action-Aware Generative Sequence Modeling for Short Video Recommendation

Wenhao Li, Zihan Lin, Zhengxiao Guo, Jie Zhou, Shukai Liu, Yongqi Liu, Chuan Luo, Chaoyi Ma, Ruiming Tang, Han Li


Pith reviewed 2026-05-07 16:15 UTC · model grok-4.3

classification 💻 cs.AI cs.IR

keywords short video recommendation, action sequences, temporal patterns, generative modeling, context-aware attention, hierarchical encoding, autoregressive generation, user intention modeling


The pith

By chaining timed user actions into sequences, a generative network models nuanced preferences in short videos better than binary whole-video classifications.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Short videos have multiple segments that elicit different responses from users over time, but standard models classify each video as a single liked or disliked item. The paper shows through data analysis that the specific timing of actions signals distinct user intentions. It introduces a network that builds these actions into sequences, enriches them with context, encodes their patterns hierarchically, and generates predictions autoregressively. This unified sequential treatment improves recommendation quality, as measured in offline tests and live platform experiments.

Core claim

The paper establishes, through statistical analysis and examination of action patterns, that the timing of user actions can represent diverse intentions. It proposes the Action-Aware Generative Sequence Network (A2Gen), which refines user actions along the temporal dimension and chains them into sequences for unified processing and prediction. A2Gen has three components: a Context-aware Attention Module (CAM) that incorporates item-specific features, a Hierarchical Sequence Encoder (HSE) that learns temporal patterns, and an Action-seq Autoregressive Generator (AAG) that produces future action sequences.
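To make "chaining timed actions into a sequence" more concrete, here is a minimal sketch of one way a single video view could be turned into a time-ordered token sequence. The names `ActionEvent` and `build_action_sequence`, the action labels, the 5-second bucket width, and the `<bos>`/`<eos>` markers are all assumptions made for illustration; this page does not describe the paper's actual tokenization.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ActionEvent:
    action: str    # hypothetical label, e.g. "like", "follow", "comment"
    time_s: float  # seconds since playback started


def build_action_sequence(events: List[ActionEvent],
                          bucket_s: float = 5.0) -> List[Tuple[str, int]]:
    """Sort events by time and pair each action with a coarse time bucket."""
    ordered = sorted(events, key=lambda e: e.time_s)
    tokens: List[Tuple[str, int]] = [("<bos>", 0)]  # assumed start marker
    for e in ordered:
        tokens.append((e.action, int(e.time_s // bucket_s)))
    # The assumed end marker reuses the bucket of the last real action.
    tokens.append(("<eos>", tokens[-1][1]))
    return tokens


events = [
    ActionEvent("like", 12.4),
    ActionEvent("follow", 3.1),
    ActionEvent("comment", 27.0),
]
sequence = build_action_sequence(events)
print(sequence)
```

Bucketing the timestamps keeps the vocabulary finite, which is one plausible way to let a generator predict both the action type and its occurrence time at each position.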

What carries the argument

Action-Aware Generative Sequence Network (A2Gen), which builds and generates temporal sequences of user actions enriched by attention and hierarchical encoding to unify preference modeling and prediction.

Load-bearing premise

The timing of user actions represents diverse intentions rather than arising mainly from video length, random behavior, or platform effects.
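One way to probe this premise is to check whether timing patterns survive once video length is controlled for. The sketch below normalizes each Like timestamp by the video's duration and averages the relative position within length bins. The function names, the bin thresholds (15 s and 60 s), and the sample records are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def length_bin(duration_s: float) -> str:
    """Assign a video to a coarse length bin (thresholds are assumptions)."""
    if duration_s < 15:
        return "short"
    if duration_s < 60:
        return "medium"
    return "long"


def relative_like_position_by_bin(
    records: Iterable[Tuple[float, float]]
) -> Dict[str, float]:
    """records holds (like_time_s, video_duration_s) pairs.

    Returns the mean relative position (0 = start, 1 = end) of the Like
    action within each video-length bin.
    """
    grouped: Dict[str, List[float]] = defaultdict(list)
    for like_time_s, duration_s in records:
        if duration_s <= 0:
            continue  # skip malformed rows
        grouped[length_bin(duration_s)].append(min(like_time_s / duration_s, 1.0))
    return {name: mean(values) for name, values in grouped.items()}


# Made-up records for illustration only.
records = [(5.0, 10.0), (2.0, 10.0), (30.0, 40.0), (45.0, 90.0)]
result = relative_like_position_by_bin(records)
print(result)
```

If the distribution of relative positions looked the same in every bin, video length alone would be a weaker explanation for the raw timing patterns; if it differed sharply, length would need to be modeled explicitly.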

What would settle it

A controlled experiment in which action timestamps are randomly shuffled before feeding the model, yet prediction accuracy remains unchanged, would show that temporal order adds no value.
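The shuffling test described above can be sketched as follows. This is an evaluation-time version only, under the assumption that a model is any callable mapping a list of `(action, time_s)` events to a label; a stricter version would also retrain on shuffled data. `toy_model`, the dataset, and all function names are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

Event = Tuple[str, float]  # (action, time_s)
Example = Tuple[List[Event], int]


def shuffle_timestamps(events: Sequence[Event], rng: random.Random) -> List[Event]:
    """Keep the same actions and the same timestamps, but break their pairing."""
    actions = [a for a, _ in events]
    times = [t for _, t in events]
    rng.shuffle(times)
    return list(zip(actions, times))


def accuracy(model: Callable[[List[Event]], int], dataset: Sequence[Example]) -> float:
    correct = sum(1 for events, label in dataset if model(list(events)) == label)
    return correct / len(dataset)


def shuffle_ablation(model: Callable[[List[Event]], int],
                     dataset: Sequence[Example],
                     seed: int = 0) -> Tuple[float, float]:
    """Return (accuracy on original data, accuracy on timestamp-shuffled data)."""
    rng = random.Random(seed)
    shuffled = [(shuffle_timestamps(events, rng), label) for events, label in dataset]
    return accuracy(model, dataset), accuracy(model, shuffled)


def toy_model(events: List[Event]) -> int:
    """Hypothetical stand-in: predicts 1 when a like happens in the first 10 s."""
    return int(any(a == "like" and t < 10.0 for a, t in events))


dataset = [
    ([("like", 3.0), ("share", 40.0)], 1),
    ([("like", 35.0), ("share", 2.0)], 0),
]
base_acc, shuffled_acc = shuffle_ablation(toy_model, dataset)
print(base_acc, shuffled_acc)
```

A large gap between the two accuracies would suggest the model relies on which action happened when; no gap would support the objection that temporal order adds little.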

Figures

Figures reproduced from arXiv: 2604.25834 by Chaoyi Ma, Chuan Luo, Han Li, Jie Zhou, Ruiming Tang, Shukai Liu, Wenhao Li, Yongqi Liu, Zhengxiao Guo, Zihan Lin.

**Figure 1.** Consider a short video titled “Interview: Do you like Messi or Ronaldo?” A user gives the video a …

**Figure 3.** The distribution of *Like* action timing (x-axis: time in seconds; y-axis: occurrence rate). … probabilistically drops specific tasks during optimization to dynamically select beneficial knowledge to transfer. Wang Xu et al. proposed the HoME [26] model to address expert network collapse. Scene-wise adaptive networks [9] further tackle dynamic cold-start optimization in CTR pred…

**Figure 4.** The modeling process of A2Gen. For instance, if a user follows the author while watching a video, they are more likely to subsequently *Like* that video. Building on the above analysis, a model should precisely capture the segments that truly reflect users’ interests and exploit the hidden information contained in the sequence of actions during video consumption. Since watching a short video is inherently a…

**Figure 5.** Context-aware Attention Module (CAM). Section 4 (Approach), 4.1 Context-aware Attention Module: To effectively model users’ historical item sequences, action sequences, and the target action sequence, a general-purpose sequence processing module is required. Our sequences share similarities with text sequences: (1) Each position in the sequence involves multi-class predictions with a finite set of categories; (2) Th…

**Figure 6.** Hierarchical Sequence Encoder (HSE)

**Figure 7.** Action-seq Autoregressive Generator (AAG)

**Figure 8.** The overall architecture of A2Gen. … Vec_hist, the representation of user historical action sequences; (3) the ground-truth action sequence on the target item. Parallelized training is achieved by adopting the same masking mechanism as the Transformer [21], allowing the model to simultaneously compute the action types and occurrence times at all positions in the sequence: F_context = Concat(F_u, F_{x_target}, Vec_hist), (9…

**Figure 9.** Hyper-parameter analysis of the loss function on the …
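The Figure 8 excerpt mentions that parallelized training uses the same masking mechanism as the Transformer. As a minimal sketch of that general technique (not of A2Gen's implementation, which this page does not show), the code below builds a causal mask and applies it inside a softmax, so that position i only attends to positions j ≤ i. The function names and the uniform scores are assumptions for illustration.

```python
import math
from typing import List


def causal_mask(n: int) -> List[List[bool]]:
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


def masked_softmax(scores: List[List[float]],
                   mask: List[List[bool]]) -> List[List[float]]:
    """Row-wise softmax in which masked-out positions get zero weight."""
    weights = []
    for row, mask_row in zip(scores, mask):
        allowed = [s for s, m in zip(row, mask_row) if m]
        row_max = max(allowed)  # subtract the max for numerical stability
        exps = [math.exp(s - row_max) if m else 0.0 for s, m in zip(row, mask_row)]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


# Uniform scores make the effect of the mask easy to see.
scores = [[0.0] * 3 for _ in range(3)]
weights = masked_softmax(scores, causal_mask(3))
for row in weights:
    print(row)
```

Because every position sees only its own past, the losses for all positions in a ground-truth action sequence can be computed in one forward pass rather than one step at a time.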

Original abstract

With the rapid development of the Internet, users have increasingly higher expectations for the recommendation accuracy of online content consumption platforms. However, short videos often contain diverse segments, and users may not hold the same attitude toward all of them. Traditional binary-classification recommendation models, which treat a video as a single holistic entity, face limitations in accurately capturing such nuanced preferences. Considering that user consumption is a temporal process, this paper demonstrates that the timing of user actions can represent diverse intentions through statistical analysis and examination of action patterns. Based on this insight, we propose a novel modeling paradigm: Action-Aware Generative Sequence Network (A2Gen), which refines user actions along the temporal dimension and chains them into sequences for unified processing and prediction. First, we introduce the Context-aware Attention Module (CAM) to model action sequences enriched with item-specific contextual features. Building upon this, we develop the Hierarchical Sequence Encoder (HSE) to learn temporal action patterns from users' historical actions. Finally, through leveraging CAM, we design a module for action sequence generation: the Action-seq Autoregressive Generator (AAG). Extensive offline experiments on the Kuaishou's dataset and the Tmall public dataset demonstrate the superiority of our proposed model. Furthermore, through large-scale online A/B testing deployed on Kuaishou's platform, our model achieves significant improvements over baseline methods in multi-task prediction by leveraging sequential information. Specifically, it yields increases of 0.34% in user watch time, 8.1% in interaction rate, and 0.162% in overall user retention (LifeTime-7), leading to successful deployment across all traffic, serving over 400 million users every day.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper applies sequence modeling to action-level short video data with a generative twist and backs it with real online lifts at Kuaishou scale, but its key premise, that action timing is a proxy for intention, rests on observational patterns that may need tighter controls.


The main thing here is a model called A2Gen that breaks each short-video view into a sequence of timed actions instead of treating the video as one binary event. It adds a context-aware attention module, a hierarchical encoder for temporal patterns, and an autoregressive generator for the sequences. The online A/B test on the live Kuaishou platform is the part that matters most: the authors report gains of 0.34% in watch time, 8.1% in interaction rate, and 0.162% in seven-day retention (LifeTime-7), and the model was then rolled out to all traffic, serving over 400 million users a day. That kind of deployment evidence is concrete and worth noting for applied recommendation work.

Referee Report

2 major / 2 minor

Summary. The paper claims that the timing of user actions during short video consumption encodes diverse user intentions, as demonstrated via statistical analysis of action patterns. Motivated by this, it introduces the Action-Aware Generative Sequence Network (A2Gen) that refines actions temporally and processes them as sequences. The architecture comprises a Context-aware Attention Module (CAM) to incorporate item-specific context into action sequences, a Hierarchical Sequence Encoder (HSE) to capture temporal patterns from historical actions, and an Action-seq Autoregressive Generator (AAG) for sequence generation. Offline experiments on Kuaishou's dataset and the Tmall public dataset are reported to show superiority, while large-scale online A/B tests on the Kuaishou platform yield lifts of 0.34% in watch time, 8.1% in interaction rate, and 0.162% in LifeTime-7 retention, resulting in full deployment to over 400 million daily users.

Significance. If the core modeling premise holds and the reported lifts are robust to baseline choices and statistical controls, the work would offer a practical advance in sequential recommendation by unifying action timing, context, and generative prediction within a multi-task framework. The successful large-scale deployment provides concrete evidence of industrial impact, though the incremental benefit over prior attention-based and hierarchical sequence models requires clear differentiation.

major comments (2)

[Motivation and statistical analysis (pre-§3)] The central motivation—that 'the timing of user actions can represent diverse intentions through statistical analysis and examination of action patterns'—directly justifies the design of CAM, HSE, and AAG. However, the analysis appears to present raw observational correlations without conditioning on key confounders such as video length, item popularity, session duration, or user demographics. This leaves open the possibility that the patterns reflect overall engagement volume rather than intention diversity, weakening the load-bearing justification for the temporal refinement and generative components.
[§4] §4 (online experiments): The A/B test reports specific percentage improvements in multi-task metrics, but lacks details on the precise baseline models, the definition of the multi-task objectives, the duration of the test, or any statistical significance measures (e.g., p-values or confidence intervals). Without these, it is impossible to determine whether the gains are attributable to the proposed modules or to other factors.
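As an illustration of the kind of significance reporting the second major comment asks for, the sketch below computes a percentile bootstrap confidence interval for the relative lift of a per-user metric between control and treatment groups. All names are hypothetical and the data are synthetic; nothing here reproduces the paper's A/B test, whose design is not described on this page.

```python
import random
from statistics import mean
from typing import List, Tuple


def relative_lift(control: List[float], treatment: List[float]) -> float:
    """(mean(treatment) - mean(control)) / mean(control)."""
    base = mean(control)
    return (mean(treatment) - base) / base


def bootstrap_lift_ci(control: List[float],
                      treatment: List[float],
                      n_boot: int = 2000,
                      alpha: float = 0.05,
                      seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap CI, resampling users with replacement in each arm."""
    rng = random.Random(seed)
    lifts = []
    for _ in range(n_boot):
        c = rng.choices(control, k=len(control))
        t = rng.choices(treatment, k=len(treatment))
        lifts.append(relative_lift(c, t))
    lifts.sort()
    lower = lifts[round((alpha / 2) * n_boot)]
    upper = lifts[round((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Synthetic per-user watch times (seconds); values are made up for illustration.
gen = random.Random(1)
control = [gen.gauss(100.0, 10.0) for _ in range(500)]
treatment = [gen.gauss(101.0, 10.0) for _ in range(500)]

point = relative_lift(control, treatment)
ci_low, ci_high = bootstrap_lift_ci(control, treatment)
print(f"lift={point:.4f}, 95% CI=({ci_low:.4f}, {ci_high:.4f})")
```

An interval that excludes zero would support the claim that the lift is not noise; reporting it alongside the test duration and baseline would address most of the comment.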

minor comments (2)

[Abstract and §1] The abstract and introduction use the term 'multi-task prediction' without enumerating the tasks or loss functions; adding this clarification would improve readability.
[§3] Acronyms CAM, HSE, and AAG are introduced without a dedicated notation table or consistent first-use definitions, which can hinder quick reference.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address each major comment below and indicate the specific revisions we will incorporate to improve clarity and rigor.


Referee: [Motivation and statistical analysis (pre-§3)] The central motivation—that 'the timing of user actions can represent diverse intentions through statistical analysis and examination of action patterns'—directly justifies the design of CAM, HSE, and AAG. However, the analysis appears to present raw observational correlations without conditioning on key confounders such as video length, item popularity, session duration, or user demographics. This leaves open the possibility that the patterns reflect overall engagement volume rather than intention diversity, weakening the load-bearing justification for the temporal refinement and generative components.

Authors: We appreciate the referee's observation that the motivational analysis relies on observational patterns. The presented statistics were derived from large-scale platform logs to highlight variability in action timings, and similar trends held when examined across broad user activity strata. However, we acknowledge that explicit conditioning on confounders such as video length, item popularity, and session duration was not included in the original figures. To strengthen the justification for the temporal refinement and generative components, we will revise the motivation section to incorporate additional stratified and normalized analyses (e.g., action timing distributions conditioned on video length and popularity bins). These revisions will better isolate intention diversity from engagement volume. revision: yes
Referee: [§4] §4 (online experiments): The A/B test reports specific percentage improvements in multi-task metrics, but lacks details on the precise baseline models, the definition of the multi-task objectives, the duration of the test, or any statistical significance measures (e.g., p-values or confidence intervals). Without these, it is impossible to determine whether the gains are attributable to the proposed modules or to other factors.

Authors: We agree that additional experimental details are necessary for assessing robustness and reproducibility. In the revised manuscript, we will expand §4 to specify the exact baseline models (the production recommendation system deployed at the time of the test), define the multi-task objectives and associated loss functions, state the A/B test duration, and report statistical significance measures including p-values and confidence intervals for the lifts in watch time, interaction rate, and LifeTime-7 retention. These additions will clarify that the observed gains are attributable to the proposed modules. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation chain relies on external experimental validation


The paper motivates its A2Gen architecture (CAM, HSE, AAG) from an observational claim that action timing encodes diverse user intentions, demonstrated via statistical analysis of patterns in the manuscript. This insight is then used to design the model components for sequence modeling. However, the claimed superiority is established through independent offline experiments on Kuaishou and Tmall datasets plus large-scale online A/B testing measuring watch time, interaction rate, and retention lifts. No equations, fitted parameters, or predictions are shown to reduce by construction to the input assumptions or prior self-citations. The derivation remains self-contained against external benchmarks, with no load-bearing self-definitional steps or renamed known results.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 3 invented entities

The central claim rests on the domain assumption that action timing encodes distinct intentions and on the introduction of three new model components whose internal mechanics are not detailed in the abstract.

axioms (1)

domain assumption: User consumption is a temporal process where the timing of actions represents diverse intentions.
Stated explicitly in the abstract as the statistical and pattern-based foundation for the modeling paradigm.

invented entities (3)

Context-aware Attention Module (CAM) · no independent evidence
purpose: Model action sequences enriched with item-specific contextual features
New module introduced to process enriched sequences; no independent evidence outside the paper.

Hierarchical Sequence Encoder (HSE) · no independent evidence
purpose: Learn temporal action patterns from historical actions
New encoder component; details not provided in abstract.

Action-seq Autoregressive Generator (AAG) · no independent evidence
purpose: Generate action sequences for prediction
New generator module leveraging CAM; no external validation shown.

pith-pipeline@v0.9.0 · 5633 in / 1489 out tokens · 63328 ms · 2026-05-07T16:15:07.887341+00:00 · methodology


Reference graph

Works this paper leans on

49 extracted references · 7 canonical work pages · 1 internal anchor

[1] Jianxin Chang, Chenbin Zhang, Yiqun Hui, Dewei Leng, Yanan Niu, Yang Song, and Kun Gai. 2023. Pepnet: Parameter and embedding personalized network for infusing with personalized prior information. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 3795–3804.

[2] Qiwei Chen, Huan Zhao, Wei Li, Pipei Huang, and Wenwu Ou. 2019. Behavior sequence transformer for e-commerce recommendation in alibaba. In Proceedings of the 1st international workshop on deep learning practice for high-dimensional sparse data. 1–4.

[3] Yashar Deldjoo, Zhankui He, Julian McAuley, Anton Korikov, Scott Sanner, Arnau Ramisa, René Vidal, Maheswaran Sathiamoorthy, Atoosa Kasirzadeh, and Silvia Milano. 2024. A review of modern recommender systems using generative models (gen-recsys). In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 6448–6458.

[4] Yufei Feng, Fuyu Lv, Weichen Shen, Menghan Wang, Fei Sun, Yu Zhu, and Keping Yang. 2019. Deep session interest network for click-through rate prediction. In Proceedings of the 28th International Joint Conference on Artificial Intelligence. 2301–2307.

[5] Yun He, Xue Feng, Cheng Cheng, Geng Ji, Yunsong Guo, and James Caverlee. 2022. Metabalance: improving multi-task recommendations via adapting gradient magnitudes of auxiliary tasks. In Proceedings of the ACM Web Conference 2022. 2205–2215.

[6] Diederik P Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In Proceedings of the 3rd International Conference on Learning Representations (ICLR).

[7] Hyeyoung Ko, Suyeon Lee, Yoonseo Park, and Anna Choi. 2022. A survey of recommendation systems: recommendation models, techniques, and application fields. Electronics 11, 1 (2022), 141.

[8] Pengcheng Li, Runze Li, Qing Da, An-Xiang Zeng, and Lijun Zhang. 2020. Improving multi-scenario learning to rank in e-commerce by exploiting task relationships in the label space. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management. 2605–2612.

[9] Wenhao Li, Jie Zhou, Chuan Luo, Chao Tang, Kun Zhang, and Shixiong Zhao. 2024. Scene-wise adaptive network for dynamic cold-start scenes optimization in ctr prediction. In Proceedings of the 18th ACM Conference on Recommender Systems. 370–379.

[10] Yongqi Li, Xinyu Lin, Wenjie Wang, Fuli Feng, Liang Pang, Wenjie Li, Liqiang Nie, Xiangnan He, and Tat-Seng Chua. 2024. A survey of generative search and recommendation in the era of large language models. arXiv preprint arXiv:2404.16924 (2024).

[11] Shangsong Liang, Zhou Pan, Wei Liu, Jian Yin, and Maarten de Rijke. 2024. A Survey on Variational Autoencoders in Recommender Systems. Comput. Surveys (2024).

[12] Jiayi Liao, Sihang Li, Zhengyi Yang, Jiancan Wu, Yancheng Yuan, Xiang Wang, and Xiangnan He. 2023. Llara: Aligning large language models with sequential recommenders. arXiv preprint arXiv:2312.02445 (2023).

[13] Jianghao Lin, Jiaqi Liu, Jiachen Zhu, Yunjia Xi, Chengkai Liu, Yangtian Zhang, Yong Yu, and Weinan Zhang. 2024. A Survey on Diffusion Models for Recommender Systems. arXiv preprint arXiv:2409.05033 (2024).

[14] Xiaofan Liu, Qinglin Jia, Chuhan Wu, Jingjie Li, Dai Quanyu, Lin Bo, Rui Zhang, and Ruiming Tang. 2023. Task adaptive multi-learner network for joint CTR and CVR estimation. In Companion Proceedings of the ACM Web Conference 2023. 490–494.

[15] Sichun Luo, Yuxuan Yao, Bowei He, Yinya Huang, Aojun Zhou, Xinyi Zhang, Yuanzhang Xiao, Mingjie Zhan, and Linqi Song. 2024. Integrating large language models into recommendation via mutual augmentation and adaptive aggregation. arXiv preprint arXiv:2401.13870 (2024).

[16] Jiaqi Ma, Zhe Zhao, Xinyang Yi, Jilin Chen, Lichan Hong, and Ed H Chi. 2018. Modeling task relationships in multi-task learning with multi-gate mixture-of-experts. In Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining. 1930–1939.

[17] Aakarsh Malhotra, Mayank Vatsa, and Richa Singh. 2022. Dropped scheduled task: Mitigating negative transfer in multi-task learning using dynamic task dropping. Transactions on Machine Learning Research (2022).

[18] Xiang-Rong Sheng, Liqin Zhao, Guorui Zhou, Xinyao Ding, Binding Dai, Qiang Luo, Siran Yang, Jingshan Lv, Chi Zhang, Hongbo Deng, et al. 2021. One model to serve all: Star topology adaptive recommender for multi-domain ctr prediction. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management. 4104–4113.

[19] Liangcai Su, Junwei Pan, Ximei Wang, Xi Xiao, Shijie Quan, Xihua Chen, and Jie Jiang. 2024. STEM: unleashing the power of embeddings for multi-task recommendation. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 38. 9002–9010.

[20] Hongyan Tang, Junning Liu, Ming Zhao, and Xudong Gong. 2020. Progressive layered extraction (ple): A novel multi-task learning (mtl) model for personalized recommendations. In Proceedings of the 14th ACM Conference on Recommender Systems. 269–278.

[21] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems, Vol. 30.

[22] Nelson Vithayathil Varghese and Qusay H Mahmoud. 2020. A survey of multi-task deep reinforcement learning. Electronics 9, 9 (2020), 1363.

[23] Ruize Wang, Hui Xu, Ying Cheng, Qi He, Xing Zhou, Rui Feng, Wei Xu, Lei Huang, and Jie Jiang. 2024. ADSNet: Cross-Domain LTV Prediction with an Adaptive Siamese Network in Advertising. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 5872–5881.

[24] Wenjie Wang, Xinyu Lin, Fuli Feng, Xiangnan He, and Tat-Seng Chua. 2023. Generative recommendation: Towards next-generation recommender paradigm. arXiv preprint arXiv:2304.03516 (2023).

[25] Wenjie Wang, Yiyan Xu, Fuli Feng, Xinyu Lin, Xiangnan He, and Tat-Seng Chua. 2023. Diffusion recommender model. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval. 832–841.

[26] Xu Wang, Jiangxia Cao, Zhiyi Fu, Kun Gai, and Guorui Zhou. 2025. Home: Hierarchy of multi-gate experts for multi-task learning at kuaishou. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V

[27] Yuhao Wang, Ha Tsz Lam, Yi Wong, Ziru Liu, Xiangyu Zhao, Yichao Wang, Bo Chen, Huifeng Guo, and Ruiming Tang. 2024. Multi-task deep recommender systems: A survey. IEEE Transactions on Knowledge and Data Engineering 36, 5 (2024), 2038–2057.

[28] Shen Xin, Martin Ester, Jiajun Bu, Chengwei Yao, Zhao Li, Xun Zhou, Yizhou Ye, and Can Wang. 2019. Multi-task based sales predictions for online promotions. In Proceedings of the 28th ACM international conference on information and knowledge management. 2823–2831.

[29] Enneng Yang, Junwei Pan, Ximei Wang, Haibin Yu, Li Shen, Xihua Chen, Lei Xiao, Jie Jiang, and Guibing Guo. 2023. Adatask: A task-aware adaptive learning rate approach to multi-task learning. In Proceedings of the AAAI conference on artificial intelligence, Vol. 37. 10745–10753.

[30] Jiaqi Zhai, Lucy Liao, Xing Liu, Yueming Wang, Rui Li, Xuan Cao, Leon Gao, Zhaojie Gong, Fangda Gu, Jiayuan He, et al. 2024. Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations. In International Conference on Machine Learning. PMLR, 58484–58509.

[31] Junjie Zhang, Ruobing Xie, Yupeng Hou, Xin Zhao, Leyu Lin, and Ji-Rong Wen. 2023. Recommendation as instruction following: A large language model empowered recommendation approach. ACM Transactions on Information Systems (2023).

[32] Shuai Zhang, Lina Yao, Aixin Sun, and Yi Tay. 2019. Deep learning based recommender system: A survey and new perspectives. ACM computing surveys (CSUR) 52, 1 (2019), 1–38.

[33] Yuyu Zhang, Liang Pang, Lei Shi, and Bin Wang. 2014. Large scale purchase prediction with historical user actions on B2C online retail platform. arXiv preprint arXiv:1408.6515 (2014).

[34] Zijian Zhang, Shuchang Liu, Jiaao Yu, Qingpeng Cai, Xiangyu Zhao, Chunxu Zhang, Ziru Liu, Qidong Liu, Hongwei Zhao, Lantao Hu, et al. 2024. M3oe: Multi-domain multi-task mixture-of experts recommendation framework. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval. 893–902.

[35] Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong, et al. 2023. A survey of large language models. arXiv preprint arXiv:2303.18223 (2023).

[36] Bowen Zheng, Yupeng Hou, Hongyu Lu, Yu Chen, Wayne Xin Zhao, Ming Chen, and Ji-Rong Wen. 2024. Adapting large language models by integrating collaborative semantics for recommendation. In 2024 IEEE 40th International Conference on Data Engineering (ICDE). IEEE, 1435–1448.

[37] Kai Zheng, Xianjun Yang, Yilei Wang, Yingjie Wu, and Xianghan Zheng. 2020. Collaborative filtering recommendation algorithm based on variational inference. International Journal of Crowd Science 4, 1 (2020), 31–44.

[38] Wenliang Zhong, Rong Jin, Cheng Yang, Xiaowei Yan, Qi Zhang, and Qiang Li. 2015. Stock constrained recommendation in tmall. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2287–2296.

[39] Guorui Zhou, Na Mou, Ying Fan, Qi Pi, Weijie Bian, Chang Zhou, Xiaoqiang Zhu, and Kun Gai. 2019. Deep interest evolution network for click-through rate prediction. In Proceedings of the AAAI conference on artificial intelligence, Vol. 33. 5941–5948.

[40] Guorui Zhou, Xiaoqiang Zhu, Chenru Song, Ying Fan, Han Zhu, Xiao Ma, Yanghui Yan, Junqi Jin, Han Li, and Kun Gai. 2018. Deep interest network for click-through rate prediction. In Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining. 1059–1068.

[41] Jie Zhou, Xianshuai Cao, Wenhao Li, Lin Bo, Kun Zhang, Chuan Luo, and Qian Yu. 2023. Hinet: Novel multi-scenario & multi-task learning with hierarchical information extraction. In 2023 IEEE 39th International Conference on Data Engineering (ICDE). IEEE, 2969–2975.

[42] Jieming Zhu, Qinglin Jia, Guohao Cai, Quanyu Dai, Jingjie Li, Zhenhua Dong, Ruiming Tang, and Rui Zhang. 2023. Final: Factorized interaction layer for ctr prediction. In Proceedings of the 46th International ACM SIGIR conference on research and development in information retrieval. 2006–2010.

[43] Yaochen Zhu, Liang Wu, Qi Guo, Liangjie Hong, and Jundong Li. 2024. Collaborative large language model for recommender systems. In Proceedings of the ACM on Web Conference 2024. 3162–3172.

Paper running header: Action-Aware Generative Sequence Modeling for Short Video Recommendation, SIGIR ’26, July 20–24, 2026, Melbourne, VIC, Australia.