pith. machine review for the scientific record.

arxiv: 2604.08933 · v1 · submitted 2026-04-10 · 💻 cs.IR

Recognition: unknown

IAT: Instance-As-Token Compression for Historical User Sequence Modeling in Industrial Recommender Systems

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 17:16 UTC · model grok-4.3

classification 💻 cs.IR
keywords recommender systems · sequence modeling · instance compression · user behavior modeling · industrial recommender · transferability

The pith

Compressing each historical interaction into one token enables better sequence modeling in recommender systems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper proposes a two-stage Instance-As-Token framework to overcome the information limits of hand-crafted features in user sequence modeling for recommendations. The first stage compresses every past interaction's features into a single unified embedding that serves as an informative token. The second stage applies standard sequence models to sequences of these tokens to learn long-range user preferences. A reader would care because the authors report more accurate recommendations and better transfer across domains, demonstrated in industrial deployments that improved business metrics.

Core claim

The central claim is that compressing all features of each historical interaction into a unified instance embedding (stage one, via temporal-order or user-order schemes) and then applying standard sequence modeling to the resulting tokens (stage two) significantly outperforms state-of-the-art methods at modeling long-range preferences, with superior transferability.

What carries the argument

The two-stage Instance-As-Token (IAT) compression mechanism that reduces each multi-feature interaction to a single compact token for efficient sequence processing.
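As a rough illustration of the two stages (a sketch, not the paper's implementation: the mean-pool fusion, dimensions, and function names here are stand-ins for IAT's learned compression and downstream model):

```python
# Hedged sketch of the two-stage Instance-As-Token idea.
# Stage 1: fuse each interaction's feature embeddings into one token.
# Stage 2: run standard attention over the token sequence.
import math
import random

random.seed(0)
DIM = 8

def compress_instance(feature_vectors):
    """Stage 1 (stand-in): fuse all feature embeddings of one interaction
    into a single instance token. IAT learns this fusion; mean-pooling
    here only illustrates the one-token-per-instance shape."""
    n = len(feature_vectors)
    return [sum(v[d] for v in feature_vectors) / n for d in range(DIM)]

def attention_pool(tokens, query):
    """Stage 2 (stand-in): dot-product attention over the compressed
    token sequence, summarizing long-range preferences for a query."""
    scores = [sum(q * t for q, t in zip(query, tok)) for tok in tokens]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    weights = [w / z for w in weights]
    return [sum(w * tok[d] for w, tok in zip(weights, tokens))
            for d in range(DIM)]

# A user history of 5 interactions, each with 3 raw feature embeddings.
history = [[[random.gauss(0, 1) for _ in range(DIM)] for _ in range(3)]
           for _ in range(5)]
tokens = [compress_instance(inst) for inst in history]  # one token each
query = [random.gauss(0, 1) for _ in range(DIM)]        # candidate item
summary = attention_pool(tokens, query)
print(len(tokens), len(summary))  # prints: 5 8
```

The point of the shape: whatever the real compressor is, stage two only ever sees one fixed-size vector per historical instance.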

Load-bearing premise

That compressing all features of each historical interaction instance into a single unified instance embedding preserves sufficient information for downstream sequence modeling without critical loss.
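A toy check of why this premise is non-trivial, using a naive mean-pool compressor as a stand-in for the paper's learned one: reconstructing individual features from a single pooled token necessarily loses detail, so the learned compressor must do better than this baseline for the premise to hold.

```python
# Illustrative probe of the compression premise (hypothetical setup,
# not the paper's experiment).
import random

random.seed(1)

def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

# One interaction instance: 3 feature embeddings of dimension 4.
instance = [[random.gauss(0, 1) for _ in range(4)] for _ in range(3)]

# Lossy compression: mean-pool the features into a single 4-dim token.
token = [sum(f[d] for f in instance) / 3 for d in range(4)]

# Naive reconstruction: predict every feature as the token itself.
recon_err = sum(mse(f, token) for f in instance) / 3
print(recon_err > 0.0)  # prints: True — pooling discards per-feature detail
```

This is exactly the kind of reconstruction-error evidence the referee report below asks for.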

What would settle it

Observing that a model using uncompressed raw features or alternative compression methods achieves equal or higher performance on the same industrial datasets would indicate that the IAT compression does not provide the claimed benefits.

Figures

Figures reproduced from arXiv: 2604.08933 by Daiye Hou, Fei Qin, Fei Teng, Heng Shi, Huizhi Yang, Lele Yu, Linlan Chen, Ning Zhang, Qianqian Yang, Wenlin Zhao, Xinchun Li, Yaocheng Tan, Yixin Wu, Zhen Wang.

Figure 1. The motivation of IAT. Hand-crafted sequence fea…
Figure 2. The overall two-stage framework of IAT. The IAT…
Figure 3. The training paradigms of IAT source models. The…
Figure 4. The storage architecture design of IAT. InsID…
Figure 5. The further analysis of the performance improve…
Figure 7. The scaling law of downstream models. Models with the user-order IAT consistently achieve a relative improvement in AUC (up to 0.31%) and reductions in LogLoss (up to −0.67%), accompanied by only a modest increase in parameters and FLOPs. These experiments demonstrate that the IAT sequence is effective in various sequential modeling paradigms. We observe that aligning the IAT sequence modeling architectu…
Figure 9. An enhancement training approach for the user…
Figure 8. Different IAT modeling architecture choices studied…
Original abstract

Although sophisticated sequence modeling paradigms have achieved remarkable success in recommender systems, the information capacity of hand-crafted sequential features constrains the performance upper bound. To better enhance user experience by encoding historical interaction patterns, this paper presents a novel two-stage sequence modeling framework termed Instance-As-Token (IAT). The first stage of IAT compresses all features of each historical interaction instance into a unified instance embedding, which encodes the interaction characteristics in a compact yet informative token. Both temporal-order and user-order compression schemes are proposed, with the latter better aligning with the demands of downstream sequence modeling. The second stage involves the downstream task fetching fixed-length compressed instance tokens via timestamps and adopting standard sequence modeling approaches to learn long-range preference patterns. Extensive experiments demonstrate that IAT significantly outperforms state-of-the-art methods and exhibits superior in-domain and cross-domain transferability. IAT has been successfully deployed in real-world industrial recommender systems, including e-commerce advertising, shopping mall marketing, and live-streaming e-commerce, delivering substantial improvements in key business metrics.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes Instance-As-Token (IAT), a two-stage framework for historical user sequence modeling in recommender systems. Stage 1 compresses all features of each historical interaction instance into a single unified instance embedding (token) via temporal-order or user-order schemes; Stage 2 feeds the resulting fixed-length token sequence to standard sequence models for long-range preference learning. The authors claim that IAT significantly outperforms state-of-the-art methods, shows superior in-domain and cross-domain transferability, and has been deployed in industrial systems (e-commerce advertising, shopping mall marketing, live-streaming e-commerce) with substantial business-metric gains.

Significance. If the first-stage compression demonstrably preserves intra-instance feature interactions, IAT would offer a practical route to higher-capacity sequence modeling in industrial recommenders by converting variable-length, multi-feature histories into compact, fixed-length token sequences while improving both accuracy and transfer. The reported real-world deployments constitute a strong practical strength if supported by the experimental details.

major comments (2)
  1. [Abstract / first-stage description] The central claim that the unified instance embedding 'encodes the interaction characteristics in a compact yet informative token' is load-bearing for all performance and deployment assertions, yet no reconstruction error, mutual-information, or per-feature ablation results are supplied to test whether cross-feature dependencies within a single interaction are retained. Without such evidence the observed lifts could arise from downstream architecture choices or data differences rather than the IAT compression itself.
  2. [Experiments and deployment claims] The abstract asserts 'extensive experiments' and successful industrial deployment with 'substantial improvements in key business metrics,' but the provided text supplies neither baseline details, data-split protocols, statistical significance tests, nor comparisons against richer multi-feature sequence models. These omissions prevent verification that the reported gains are robust and attributable to IAT.
minor comments (1)
  1. [Method description] The distinction between the temporal-order and user-order compression schemes would be clearer if the paper included a short pseudocode or explicit algorithmic description of how timestamps and user-ordering are used to produce the fixed-length token sequence.
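The mechanics the minor comment asks for can be guessed at from the abstract alone; a hedged sketch of the two fetch schemes (function names and the store layout are illustrative, not the paper's API):

```python
# Hypothetical sketch of the two token-fetch schemes described in the
# abstract. token_store is [(timestamp, token), ...] sorted by timestamp.
from bisect import bisect_right

def fetch_temporal(token_store, request_ts, k):
    """Temporal-order (sketch): take the k most recent tokens whose
    timestamps precede the request timestamp."""
    idx = bisect_right([ts for ts, _ in token_store], request_ts)
    window = token_store[max(0, idx - k):idx]
    return [tok for _, tok in window]

def fetch_user_order(token_store, k):
    """User-order (sketch): index by the user's interaction ordinal
    rather than wall-clock time, so the downstream model sees a
    position-aligned fixed-length sequence."""
    return [tok for _, tok in token_store[-k:]]

store = [(t, f"tok{t}") for t in (10, 20, 30, 40, 50)]
print(fetch_temporal(store, 45, 3))  # ['tok20', 'tok30', 'tok40']
print(fetch_user_order(store, 3))    # ['tok30', 'tok40', 'tok50']
```

Whether the paper's user-order scheme differs from this in more than its index key is exactly what the requested pseudocode would settle.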

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below with clarifications and proposed revisions to strengthen the presentation of evidence for the IAT compression and experimental details.

Point-by-point responses
  1. Referee: [Abstract / first-stage description] The central claim that the unified instance embedding 'encodes the interaction characteristics in a compact yet informative token' is load-bearing for all performance and deployment assertions, yet no reconstruction error, mutual-information, or per-feature ablation results are supplied to test whether cross-feature dependencies within a single interaction are retained. Without such evidence the observed lifts could arise from downstream architecture choices or data differences rather than the IAT compression itself.

    Authors: We acknowledge that explicit metrics such as reconstruction error or mutual information for the instance embeddings are not reported. The downstream performance gains across multiple datasets and the industrial results provide indirect support that the unified tokens retain key interaction characteristics, particularly under the user-order scheme which aligns features for sequence modeling. To directly address this, we will add per-feature ablation studies and comparisons against models that retain individual features without compression in the revised manuscript. revision: yes

  2. Referee: [Experiments and deployment claims] The abstract asserts 'extensive experiments' and successful industrial deployment with 'substantial improvements in key business metrics,' but the provided text supplies neither baseline details, data-split protocols, statistical significance tests, nor comparisons against richer multi-feature sequence models. These omissions prevent verification that the reported gains are robust and attributable to IAT.

    Authors: The full manuscript details the experimental protocols, including baselines (e.g., DIN, DIEN, and other sequence models), chronological data splits, and statistical significance via paired t-tests. However, to improve clarity and verifiability, we will expand the experiments section with explicit comparisons to richer multi-feature sequence models and additional specifics on the industrial A/B tests, including exact business metrics and deployment settings. revision: partial

Circularity Check

0 steps flagged

No significant circularity in IAT derivation or claims

full rationale

The paper introduces an empirical two-stage framework: stage one compresses per-interaction features into a single instance embedding via temporal or user-order schemes, and stage two feeds the resulting token sequence into standard sequence models for preference learning. No equations, loss functions, or performance metrics are shown to reduce by construction to the compression step itself or to any fitted parameters renamed as predictions. Claims of outperformance and industrial deployment rest on experimental results rather than self-referential definitions or load-bearing self-citations that collapse the central result. The framework is self-contained against external benchmarks with no detected self-definitional, fitted-input, or ansatz-smuggling patterns.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that instance-level compression can encode interaction characteristics compactly yet informatively; no explicit free parameters or invented entities are named in the abstract, though training of the embeddings will involve learned weights.

free parameters (1)
  • instance embedding dimension and compression parameters
    The unified instance embedding is produced by a learned compression step whose exact architecture and hyperparameters are not detailed in the abstract.
axioms (1)
  • domain assumption All features of a historical interaction instance can be compressed into a single token that retains the interaction characteristics needed for downstream preference modeling
    Invoked in the description of the first stage of IAT.

pith-pipeline@v0.9.0 · 5523 in / 1281 out tokens · 43145 ms · 2026-05-10T17:16:31.969350+00:00 · methodology

discussion (0)


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Sample Is Feature: Beyond Item-Level, Toward Sample-Level Tokens for Unified Large Recommender Models

    cs.IR · 2026-04 · unverdicted · novelty 7.0

    SIF encodes full historical raw samples as tokens via hierarchical quantization to preserve sample context and unify sequential/non-sequential features in large recommender models.

Reference graph

Works this paper leans on

53 extracted references · 21 canonical work pages · cited by 1 Pith paper · 5 internal anchors

  [1] Zheng Chai, Qin Ren, Xijun Xiao, Huizhi Yang, Bo Han, Sijun Zhang, Di Chen, Hui Lu, Wenlin Zhao, Lele Yu, et al. 2025. Longer: Scaling up long sequence modeling in industrial recommenders. In Proceedings of the Nineteenth ACM Conference on Recommender Systems. 247–256.
  [2] Jianxin Chang, Chenbin Zhang, Zhiyi Fu, Xiaoxue Zang, Lin Guan, Jing Lu, Yiqun Hui, Dewei Leng, Yanan Niu, Yang Song, et al. 2023. TWIN: TWo-stage interest network for lifelong user behavior modeling in CTR prediction at Kuaishou. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 3785–3794.
  [3] Junyi Chen, Lu Chi, Bingyue Peng, and Zehuan Yuan. 2024. HLLM: Enhancing sequential recommendations via hierarchical large language models for item and user modeling. arXiv preprint arXiv:2409.12740 (2024).
  [4] Jiaxin Deng, Shiyao Wang, Kuo Cai, Lejian Ren, Qigen Hu, Weifeng Ding, Qiang Luo, and Guorui Zhou. 2025. OneRec: Unifying retrieve and rank with generative recommender and iterative preference alignment. arXiv preprint arXiv:2502.18965 (2025).
  [5] Kairui Fu, Tao Zhang, Shuwen Xiao, Ziyang Wang, Xinming Zhang, Chenchi Zhang, Yuliang Yan, Junjun Zheng, Yu Li, Zhihong Chen, et al. 2025. Forge: Forming semantic identifiers for generative retrieval in industrial datasets. arXiv preprint arXiv:2509.20904 (2025).
  [6] Lin Guan, Jia-Qi Yang, Zhishan Zhao, Beichuan Zhang, Bo Sun, Xuanyuan Luo, Jinan Ni, Xiaowen Li, Yuhang Qi, Zhifang Fan, et al. 2025. Make It Long, Keep It Fast: End-to-End 10k-Sequence Modeling at Billion Scale on Douyin. arXiv preprint arXiv:2511.06077 (2025).
  [7] Huan Gui, Ruoxi Wang, Ke Yin, Long Jin, Maciej Kula, Taibai Xu, Lichan Hong, and Ed H. Chi. 2023. Hiformer: Heterogeneous feature interactions learning with transformers for recommender systems. arXiv preprint arXiv:2311.05884 (2023).
  [8] Ruidong Han, Bin Yin, Shangyu Chen, He Jiang, Fei Jiang, Xiang Li, Chi Ma, Mincong Huang, Xiaoguang Li, Chunzhen Jing, et al. 2025. MTGR: Industrial-scale generative recommendation framework in Meituan. In Proceedings of the 34th ACM International Conference on Information and Knowledge Management. 5731–5738.
  [9] Xintian Han, Honggang Chen, Quan Lin, Jingyue Gao, Xiangyuan Ren, Lifei Zhu, Zhisheng Ye, Shikang Wu, XiongHang Xie, Xiaochu Gan, et al. 2025. LEMUR: Large scale End-to-end MUltimodal Recommendation. arXiv preprint arXiv:2511.10962 (2025).
  [10] Zhicheng He, Weiwen Liu, Wei Guo, Jiarui Qin, Yingxue Zhang, Yaochen Hu, and Ruiming Tang. 2023. A survey on user behavior modeling in recommender systems. arXiv preprint arXiv:2302.11087 (2023).
  [11] Dan Hendrycks and Kevin Gimpel. 2016. Gaussian error linear units (GELUs). arXiv preprint arXiv:1606.08415 (2016).
  [12] Yupeng Hou, Shanlei Mu, Wayne Xin Zhao, Yaliang Li, Bolin Ding, and Ji-Rong Wen. 2022. Towards universal sequence representation learning for recommender systems. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 585–593.
  [13] Yunwen Huang, Shiyong Hong, Xijun Xiao, Jinqiu Jin, Xuanyuan Luo, Zhe Wang, Zheng Chai, Shikang Wu, Yuchao Zheng, and Jingjian Lin. 2026. HyFormer: Revisiting the Roles of Sequence Modeling and Feature Interaction in CTR Prediction. arXiv preprint arXiv:2601.12681 (2026).
  [14] Andrew Jaegle, Felix Gimeno, Andy Brock, Oriol Vinyals, Andrew Zisserman, and Joao Carreira. 2021. Perceiver: General perception with iterative attention. In International Conference on Machine Learning. PMLR, 4651–4664.
  [15] Pengyue Jia, Yejing Wang, Zhaocheng Du, Xiangyu Zhao, Yichao Wang, Bo Chen, Wanyu Wang, Huifeng Guo, and Ruiming Tang. 2024. ERASE: Benchmarking feature selection methods for deep recommender systems. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 5194–5205.
  [16] Yuchen Jiang, Jie Zhu, Xintian Han, Hui Lu, Kunmin Bai, Mingyu Yang, Shikang Wu, Ruihao Zhang, Wenlin Zhao, Shipeng Bai, et al. 2026. TokenMixer-Large: Scaling Up Large Ranking Models in Industrial Recommenders. arXiv preprint arXiv:2602.06563 (2026).
  [17] Wang-Cheng Kang and Julian McAuley. 2018. Self-attentive sequential recommendation. In 2018 IEEE International Conference on Data Mining (ICDM). IEEE, 197–206.
  [18] Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei. 2020. Scaling laws for neural language models. arXiv preprint arXiv:2001.08361 (2020).
  [19] Kirill Khrylchenko, Artem Matveev, Sergei Makeev, and Vladimir Baikalov. 2025. Scaling recommender transformers to one billion parameters. arXiv preprint arXiv:2507.15994 (2025).
  [20] Weijiang Lai, Beihong Jin, Jiongyan Zhang, Yiyuan Zheng, Jian Dong, Jia Cheng, Jun Lei, and Xingxing Wang. 2025. Exploring Scaling Laws of CTR Model for Online Performance Improvement. In Proceedings of the Nineteenth ACM Conference on Recommender Systems. 114–123.
  [21] Kaiyuan Li, Yongxiang Tang, Yanhua Cheng, Yong Bai, Yanxiang Zeng, Chao Wang, Xialong Liu, and Peng Jiang. 2025. VQL: An End-to-End Context-Aware Vector Quantization Attention for Ultra-Long User Behavior Modeling. arXiv preprint arXiv:2508.17125 (2025).
  [22] Zhiwei Liu, Ziwei Fan, Yu Wang, and Philip S. Yu. 2021. Augmenting sequential recommendation with pseudo-prior items via reversely pre-training transformer. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. 1608–1612.
  [23] Xiao Lv, Jiangxia Cao, Shijie Guan, Xiaoyou Zhou, Zhiguang Qi, Yaqiang Zang, Ben Wang, and Guorui Zhou. 2025. MARM: Unlocking the Recommendation Cache Scaling-Law through Memory Augmentation and Scalable Complexity. In Proceedings of the 34th ACM International Conference on Information and Knowledge Management. 2022–2031.
  [24] Wenhan Lyu, Devashish Tyagi, Yihang Yang, Ziwei Li, Ajay Somani, Karthikeyan Shanmugasundaram, Nikola Andrejevic, Ferdi Adeputra, Curtis Zeng, Arun K. Singh, et al. 2025. DV365: Extremely Long User History Modeling at Instagram. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V. 2. 4717–4727.
  [25–26] Shashank Rajput, Nikhil Mehta, Anima Singh, Raghunandan Hulikal Keshavan, Trung Vu, Lukasz Heldt, Lichan Hong, Yi Tay, Vinh Tran, Jonah Samost, et al. 2023. Recommender systems with generative retrieval. Advances in Neural Information Processing Systems 36 (2023), 10299–10315.
  [27] Ads Recommendation. 2025. External Large Foundation Model: How to Efficiently Serve Trillions of Parameters for Online Ads Recommendation. arXiv preprint arXiv:2502.17494 (2025).
  [28] Benedikt Schifferer, Chris Deotte, and Even Oldridge. 2020. Tutorial: Feature engineering for recommender systems. In Proceedings of the 14th ACM Conference on Recommender Systems. 754–755.
  [29] Zihua Si, Lin Guan, ZhongXiang Sun, Xiaoxue Zang, Jing Lu, Yiqun Hui, Xingchao Cao, Zeyu Yang, Yichen Zheng, Dewei Leng, et al. 2024. TWIN V2: Scaling ultra-long user behavior sequence modeling for enhanced CTR prediction at Kuaishou. In Proceedings of the 33rd ACM International Conference on Information and Knowledge Management. 4890–4897.
  [30] Uriel Singer, Haggai Roitman, Yotam Eshel, Alexander Nus, Ido Guy, Or Levi, Idan Hasson, and Eliyahu Kiperwasser. 2022. Sequential modeling with multiple attributes for watchlist recommendation in e-commerce. In Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining. 937–946.
  [31] Xin Song, Xiaochen Li, Jinxin Hu, Hong Wen, Zulong Chen, Yu Zhang, Xiaoyi Zeng, and Jing Zhang. 2025. LREA: Low-rank efficient attention on modeling long-term user behaviors for CTR prediction. In Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2843–2847.
  [32] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. Advances in Neural Information Processing Systems 30 (2017).
  [33–34] Hu Wan, Yun Huang, Shuhan Bai, Xuan Sun, Tei-Wei Kuo, and Chun Jason Xue. Rabbitail: A Tail Latency-Aware Scheduler for Deep Learning Recommendation Systems with Hierarchical Embedding Storage. In Proceedings of the 40th ACM/SIGAPP Symposium on Applied Computing. 279–287.
  [35] Shuhan Wang, Bin Shen, Xu Min, Yong He, Xiaolu Zhang, Liang Zhang, Jun Zhou, and Linjian Mo. 2024. Aligned side information fusion method for sequential recommendation. In Companion Proceedings of the ACM Web Conference 2024. 112–120.
  [36] Zhuoxing Wei, Qi Liu, and Qingchen Xie. 2025. Deep Multiple Quantization Network on Long Behavior Sequence for Click-Through Rate Prediction. In Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval. 3090–3094.
  [37] Bin Wu, Feifan Yang, Zhangming Chan, Yu-Ran Gu, Jiawei Feng, Chao Yi, Xiang-Rong Sheng, Han Zhu, Jian Xu, Mang Ye, et al. 2025. MUSE: A Simple Yet Effective Multimodal Search-Based Framework for Lifelong User Interest Modeling. arXiv preprint arXiv:2512.07216 (2025).
  [38] Xue Xia, Saurabh Joshi, Kousik Rajesh, Kangnan Li, Yangyi Lu, Nikil Pancha, Dhruvil Badani, Jiajing Xu, and Pong Eksombatchai. 2025. TransAct V2: Lifelong User Action Sequence Modeling on Pinterest Recommendation. In Proceedings of the 34th ACM International Conference on Information and Knowledge Management. 6881–6882.
  [39] Yueqi Xie, Peilin Zhou, and Sunghun Kim. 2022. Decoupled side information fusion for sequential recommendation. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval. 1611–1621.
  [40] Lee Xiong, Zhirong Chen, Rahul Mayuranath, Shangran Qiu, Arda Ozdemir, Lu Li, Yang Hu, Dave Li, Jingtao Ren, Howard Cheng, et al. 2026. LLaTTE: Scaling Laws for Multi-Stage Sequence Modeling in Large-Scale Ads Recommendation. arXiv preprint arXiv:2601.20083 (2026).
  [41] Songpei Xu, Shijia Wang, Da Guo, Xianwen Guo, Qiang Xiao, Bin Huang, Guanlin Wu, and Chuanjiang Luo. 2025. Climber: Toward efficient scaling laws for large recommendation models. In Proceedings of the 34th ACM International Conference on Information and Knowledge Management. 6193–6200.
  [42] Jinho Yang, Ji-Hoon Kim, and Joo-Young Kim. 2025. SCRec: A Scalable Computational Storage System with Statistical Sharding and Tensor-train Decomposition for Recommendation Models. IEEE Trans. Comput. (2025).
  [43] Jingyang Yuan, Huazuo Gao, Damai Dai, Junyu Luo, Liang Zhao, Zhengyan Zhang, Zhenda Xie, Yuxing Wei, Lean Wang, Zhiping Xiao, et al. 2025. Native sparse attention: Hardware-aligned and natively trainable sparse attention. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 23078–23097.
  [44] Kun Yuan, Junyu Bi, Daixuan Cheng, Changfa Wu, Shuwen Xiao, Binbin Cao, Jian Wu, and Yuning Jiang. 2026. HiSAC: Hierarchical Sparse Activation Compression for Ultra-long Sequence Modeling in Recommenders. arXiv preprint arXiv:2602.21009 (2026).
  [45] Zhichen Zeng, Xiaolong Liu, Mengyue Hang, Xiaoyi Liu, Qinghai Zhou, Chaofei Yang, Yiqun Liu, Yichen Ruan, Laming Chen, Yuxin Chen, et al. 2025. InterFormer: Effective Heterogeneous Interaction Learning for Click-Through Rate Prediction. In Proceedings of the 34th ACM International Conference on Information and Knowledge Management. 6225–6233.
  [46] Jiaqi Zhai, Lucy Liao, Xing Liu, Yueming Wang, Rui Li, Xuan Cao, Leon Gao, Zhaojie Gong, Fangda Gu, Michael He, et al. 2024. Actions speak louder than words: Trillion-parameter sequential transducers for generative recommendations. arXiv preprint arXiv:2402.17152 (2024).
  [47] Buyun Zhang, Liang Luo, Yuxin Chen, Jade Nie, Xi Liu, Daifeng Guo, Yanli Zhao, Shen Li, Yuchen Hao, Yantao Yao, et al. 2024. Wukong: Towards a scaling law for large-scale recommendation. arXiv preprint arXiv:2403.02545 (2024).
  [48] Kun Zhang, Jingming Zhang, Wei Cheng, Yansong Cheng, Jiaqi Zhang, Hao Lu, Xu Zhang, Haixiang Gan, Jiangxia Cao, Tenglong Wang, et al. 2026. OneMall: One Model, More Scenarios – End-to-End Generative Recommender Family at Kuaishou E-Commerce. arXiv preprint arXiv:2601.21770 (2026).
  [49] Tingting Zhang, Pengpeng Zhao, Yanchi Liu, Victor S. Sheng, Jiajie Xu, Deqing Wang, Guanfeng Liu, Xiaofang Zhou, et al. 2019. Feature-level deeper self-attention network for sequential recommendation. In IJCAI. 4320–4326.
  [50] Zhaoqi Zhang, Haolei Pei, Jun Guo, Tianyu Wang, Yufei Feng, Hui Sun, Shaowei Liu, and Aixin Sun. 2025. OneTrans: Unified Feature Interaction and Sequence Modeling with One Transformer in Industrial Recommender. arXiv preprint arXiv:2510.26104 (2025).
  [51] Guorui Zhou, Na Mou, Ying Fan, Qi Pi, Weijie Bian, Chang Zhou, Xiaoqiang Zhu, and Kun Gai. 2019. Deep interest evolution network for click-through rate prediction. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33. 5941–5948.
  [52] Wen-Ji Zhou, Yuhang Zheng, Yinfu Feng, Yunan Ye, Rong Xiao, Long Chen, Xiaosong Yang, and Jun Xiao. 2024. ENCODE: Breaking the trade-off between performance and efficiency in long-term user behavior modeling. IEEE Transactions on Knowledge and Data Engineering 37, 1 (2024), 265–277.
  [53] Jie Zhu, Zhifang Fan, Xiaoxie Zhu, Yuchen Jiang, Hangyu Wang, Xintian Han, Haoran Ding, Xinmin Wang, Wenlin Zhao, Zhen Gong, et al. 2025. RankMixer: Scaling up ranking models in industrial recommenders. In Proceedings of the 34th ACM International Conference on Information and Knowledge Management. 6309–6316.