BlossomRec: Block-level Fused Sparse Attention Mechanism for Sequential Recommendations

Jingtong Gao; Mengyang Ma; Pengyue Jia; Wanyu Wang; Weihong Luo; Xiangyu Zhao; Xiao Han; Xiaopeng Li; Yiqi Wang; Yunpeng Weng

arxiv: 2512.13368 · v3 · pith:NV3FJKBWnew · submitted 2025-12-15 · 💻 cs.IR

Mengyang Ma, Xiaopeng Li, Wanyu Wang, Zhaocheng Du, Jingtong Gao, Pengyue Jia, Yuyang Ye, Yiqi Wang, Yunpeng Weng, Weihong Luo, Xiao Han, Xiangyu Zhao

Pith reviewed 2026-05-25 07:18 UTC · model grok-4.3

classification 💻 cs.IR

keywords sequential recommendations · sparse attention · transformer models · long-term interests · short-term interests · memory efficiency · recommender systems · attention mechanism

The pith

BlossomRec applies two sparse attention patterns for long-term and short-term interests to match full attention performance with far less memory in sequential recommenders.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces BlossomRec to address the growing computational and memory costs in Transformer-based sequential recommender systems as user histories lengthen. It separates user interests into long-term and short-term categories and computes each with a dedicated sparse attention pattern before merging them through a learnable gate. The design targets stable results on sequences of any length while cutting the number of attention interactions. If the approach holds, it would allow existing Transformer recommenders to scale to longer histories without proportional increases in resource demands.

Core claim

BlossomRec categorizes user interests into long-term and short-term, computes them using two distinct sparse attention patterns, and combines the results through a learnable gated output. This significantly reduces the number of interactions participating in attention computation. When integrated with state-of-the-art Transformer-based models, it achieves comparable or even superior performance on four public datasets while significantly reducing memory usage.

What carries the argument

BlossomRec, the block-level fused sparse attention mechanism that applies two distinct sparse patterns for long-term and short-term interests and fuses their outputs with a learnable gate.
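
The gated merge described above can be illustrated with a minimal sketch. It assumes a scalar sigmoid gate computed from the position's input features and a convex combination of the two branch outputs. This page does not state the paper's exact gate form, so `gated_fuse`, `w_gate`, and `b_gate` are hypothetical names, and the toy vectors merely stand in for the long-term and short-term attention outputs.

```python
import math


def sigmoid(z):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def gated_fuse(x, o_long, o_short, w_gate, b_gate=0.0):
    """Blend two attention outputs for one position with a scalar gate.

    Assumption: the gate is a sigmoid of a linear function of the input
    features x. The real mechanism may use a different gate (for example
    per-dimension or per-head); this is only an illustration.
    """
    g = sigmoid(sum(wi * xi for wi, xi in zip(w_gate, x)) + b_gate)
    fused = [g * l + (1.0 - g) * s for l, s in zip(o_long, o_short)]
    return fused, g


# Toy inputs: features of one position and the two branch outputs.
x = [0.5, -1.0]
o_long = [1.0, 0.0]    # stand-in for the long-term branch output
o_short = [0.0, 1.0]   # stand-in for the short-term branch output

# With all-zero gate weights the gate is exactly 0.5, so both branches
# contribute equally.
fused, g = gated_fuse(x, o_long, o_short, w_gate=[0.0, 0.0])
```

In a trained model the gate weights would be learned, so the balance between the two branches could differ from position to position.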

If this is right

Transformer models augmented with BlossomRec maintain or exceed baseline recommendation accuracy.
Memory usage drops substantially as user interaction sequences grow longer.
Performance stays stable across both short and long sequences, unlike some other efficient attention methods.
The theoretical cut in attention interactions translates to measurable efficiency gains in practice.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same two-pattern fusion could be tested on other sequence tasks where quadratic attention becomes prohibitive.
Adjusting the sparse patterns themselves might produce further memory savings on specific datasets.
The learnable gate opens a route to dynamic weighting of multiple interest types in broader recommender designs.
Production systems with real-time constraints would need separate validation beyond the public dataset results.

Load-bearing premise

That two fixed sparse attention patterns combined by a learnable gate can capture all relevant user interest interactions without needing the cross terms from standard full attention.

What would settle it

Direct side-by-side runs on the four public datasets showing whether the BlossomRec-integrated models drop below baseline Transformer accuracy or fail to deliver substantial memory reduction.

Figures

Figures reproduced from arXiv: 2512.13368 by Jingtong Gao, Mengyang Ma, Pengyue Jia, Wanyu Wang, Weihong Luo, Xiangyu Zhao, Xiao Han, Xiaopeng Li, Yiqi Wang, Yunpeng Weng, Yuyang Ye, Zhaocheng Du.

**Figure 1.** Overview of the BlossomRec framework.

… query heads into $g$ groups, each sharing the same key and value projections. This can be formulated as:

$$\mathrm{GQA}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\, W^O \tag{4}$$

$$\mathrm{head}_i = \mathrm{Attn}(Q_i, K_{g(i)}, V_{g(i)}) \tag{5}$$

where $h$ is the number of query heads, $g(i) = \lfloor i/(h/g) \rfloor$ is the KV group index for head $i$, and $g$ is the number of KV groups ($g < h$).
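
The head-to-group indexing in the grouped-query attention excerpt above can be checked with a short sketch. It assumes 0-based head indices and that the number of query heads is divisible by the number of KV groups; `kv_group` is a hypothetical helper name, not something taken from the paper's code.

```python
def kv_group(i, h, g):
    """Return the KV group index for query head i.

    Implements g(i) = floor(i / (h / g)) under the assumption that
    heads are numbered 0..h-1 and that h is divisible by g.
    """
    if h % g != 0:
        raise ValueError("h must be divisible by g in this sketch")
    heads_per_group = h // g
    return i // heads_per_group


# Example: 8 query heads sharing 2 key/value groups.
mapping = [kv_group(i, h=8, g=2) for i in range(8)]
```

Here the first four query heads read from KV group 0 and the last four from KV group 1, so only two key/value projections are stored instead of eight.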

**Figure 2.** Efficiency analysis.

… roughly one-seventh of SASRec's. The sparsity structure, therefore, alleviates not only computational but also memory bottlenecks at serving time, facilitating deployment in resource-constrained environments.

**Figure 4.** Feature-map visualization of different models.

**Figure 5.** Case study of the interaction sequence of user 566 from ML-1M.

Original abstract

Transformer structures have been widely used in sequential recommender systems (SRS). However, as user interaction histories increase, computational time and memory requirements also grow. This is mainly caused by the standard attention mechanism. Although there exist many methods employing efficient attention and SSM-based models, these approaches struggle to effectively model long sequences and may exhibit unstable performance on short sequences. To address these challenges, we design a sparse attention mechanism, BlossomRec, which models both long-term and short-term user interests through attention computation to achieve stable performance across sequences of varying lengths. Specifically, we categorize user interests in recommendation systems into long-term and short-term interests, and compute them using two distinct sparse attention patterns, with the results combined through a learnable gated output. Theoretically, it significantly reduces the number of interactions participating in attention computation. Extensive experiments on four public datasets demonstrate that BlossomRec, when integrated with state-of-the-art Transformer-based models, achieves comparable or even superior performance while significantly reducing memory usage, providing strong evidence of BlossomRec's efficiency and effectiveness. The code is available at https://github.com/Applied-Machine-Learning-Lab/WWW2026_BlossomRec.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

BlossomRec gives a concrete block-level sparse attention split for long and short-term interests with a gate, and the full experiments plus ablations hold up without load-bearing gaps.

BlossomRec introduces a block-level fused sparse attention that separates long-term and short-term user interests, computes each with its own sparse pattern, and merges the outputs through a learnable gate. This specific combination at the block level is not in the prior efficient-attention or SSM work they cite, and the full manuscript supplies the ablations, complexity analysis, and experimental details that were missing from the abstract. On four public datasets the integrated models match or beat the baselines while cutting memory, and the code release makes it straightforward to check the numbers. The central empirical claim therefore rests on reproducible comparisons rather than circular fitting.

The soft spots are modest and proportional. The field of efficiency tweaks for sequential recommenders is already dense, so the advance is incremental rather than conceptual. The choice of the two particular sparse patterns and block sizes still feels somewhat heuristic even after the ablations, though nothing in the reported results suggests the fusion fails to capture the needed interactions. No internal contradictions or unverified assumptions appear load-bearing once the full text is examined.

This paper is aimed at engineers and researchers who need to scale Transformer-based sequential models to longer histories under memory constraints. A reader working on production deployments would get direct value from the memory numbers and the released implementation. It deserves a serious referee because the claims are grounded, the method is reproducible, and the experiments address the main practical questions.

Referee Report

0 major / 2 minor

Summary. The paper proposes BlossomRec, a block-level fused sparse attention mechanism for sequential recommender systems. It models long-term and short-term user interests via two distinct sparse attention patterns whose outputs are combined by a learnable gate, claiming to reduce the number of attention interactions while achieving comparable or superior performance to standard attention when plugged into Transformer-based models, with extensive experiments on four public datasets and open-sourced code.

Significance. If the empirical results hold, the work offers a practical, memory-efficient alternative to quadratic attention and SSM-based models for handling variable-length user histories in sequential recommendation, addressing a core scalability bottleneck. The provision of code and complexity analysis strengthens its potential utility.

minor comments (2)

[Abstract] The four public datasets are not named; naming them would give readers immediate context.
[§3] The description of the two sparse patterns and gate fusion would benefit from an explicit statement of their computational complexity relative to standard attention (e.g., O(n) vs O(n²)) in the main text.
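
The complexity comparison requested in the second comment can be made concrete with a back-of-the-envelope count. The sketch below assumes causal attention and a fixed per-query budget made of a recent window of `w` items plus `k` selected blocks of `b` items each. These parameters and the helper names are hypothetical illustrations, not the configuration or the complexity analysis reported in the paper.

```python
def full_causal_pairs(n):
    """Query-key pairs in standard causal attention: n(n+1)/2, i.e. O(n^2)."""
    return sum(i + 1 for i in range(n))


def sparse_pairs_upper_bound(n, w, k, b):
    """Upper bound on query-key pairs with a fixed per-query budget.

    Assumption: each query attends to at most w recent items plus k blocks
    of b items, and never to more items than exist in its causal past.
    Overlap between the window and the blocks is ignored, so this is an
    upper bound. For a fixed budget it grows as O(n * (w + k * b)).
    """
    budget = w + k * b
    return sum(min(i + 1, budget) for i in range(n))


n = 1000
full = full_causal_pairs(n)
sparse = sparse_pairs_upper_bound(n, w=16, k=2, b=16)
ratio = sparse / full
```

With these illustrative settings the sparse count is under a tenth of the full count, and the gap widens as `n` grows because the per-query budget stays fixed.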

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary, recognition of the work's potential utility, and recommendation for minor revision. No specific major comments were provided in the report.

Circularity Check

0 steps flagged

No significant circularity detected

The paper's contribution is an empirical sparse attention design (two block-level patterns plus learnable gate) whose performance claims rest on experiments across four datasets, ablations, and complexity analysis rather than any closed mathematical derivation. No equations are presented that reduce a claimed result to a fitted parameter or self-citation by construction; the design choices are motivated by domain considerations and externally validated. This matches the most common honest outcome for applied systems papers.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The mechanism introduces a small number of architectural choices (block size, two sparsity masks, gate network). The block size is chosen or tuned as a hyperparameter, while the gate weights are learned during training; no new physical or mathematical entities are postulated.

free parameters (2)

block size
Determines the granularity of the sparse patterns and must be chosen or tuned per dataset.
gate network weights
Learned parameters that combine the two attention outputs.

axioms (1)

[standard math] Standard scaled dot-product attention formula remains valid when restricted to the chosen sparse masks.
Invoked implicitly when defining the two sparse patterns.
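
As an illustration of the axiom above, the following sketch applies standard scaled dot-product attention to one query while restricting the softmax to an allowed set of positions. The causal sliding window used to build that set is an assumed example pattern, not necessarily either of the paper's two patterns, and `masked_attention` and `window_indices` are hypothetical names.

```python
import math


def masked_attention(q, keys, values, allowed):
    """Scaled dot-product attention for one query over `allowed` indices only."""
    d = len(q)
    scores = [
        sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(d) for j in allowed
    ]
    # Numerically stable softmax over the allowed positions.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    dim_v = len(values[0])
    out = [
        sum(w * values[j][c] for w, j in zip(weights, allowed))
        for c in range(dim_v)
    ]
    return out, weights


def window_indices(i, w):
    """Causal sliding window: the last w positions up to and including i."""
    return list(range(max(0, i - w + 1), i + 1))


keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
values = [[1.0], [2.0], [3.0], [4.0]]
q = [0.0, 0.0]  # zero query, so every allowed key gets the same score

allowed = window_indices(3, 2)
out, weights = masked_attention(q, keys, values, allowed)
```

Because the softmax is renormalized over the allowed positions only, the weights still sum to one, which is the sense in which the standard formula carries over to a masked setting.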

pith-pipeline@v0.9.0 · 5772 in / 1254 out tokens · 35493 ms · 2026-05-25T07:18:10.621667+00:00 · methodology

Venue: WWW '26, April 13–17, 2026, Dubai, United Arab Emirates.

Appendix A (Observations), excerpt: To investigate whether interaction sequences can be processed in a block-wise pattern, we extracted the complete interaction sequence of user #566 fro...

work page arXiv 2023

[11] [11]

Hanwen Du, Hui Shi, Pengpeng Zhao, Deqing Wang, Victor S Sheng, Yanchi Liu, Guanfeng Liu, and Lei Zhao. 2022. Contrastive learning with bidirectional transformers for sequential recommendation. InProceedings of the 31st ACM International Conference on Information & Knowledge Management. 396–405

work page 2022

[12] [12]

Ningya Feng, Junwei Pan, Jialong Wu, Baixu Chen, Ximei Wang, Qian Li, Xian Hu, Jie Jiang, and Mingsheng Long. 2024. Long-Sequence Recommendation Models Need Decoupled Embeddings.arXiv preprint arXiv:2410.02604(2024)

work page arXiv 2024

[13] [13]

Yongrui Fu, Jian Liu, Tao Li, Zonggang Wu, Shouke Qin, and Hanmeng Liu

work page

[14] [14]

Multimodal Fusion And Sparse Attention-based Alignment Model for Long Sequential Recommendation.arXiv preprint arXiv:2508.09664(2025)

work page arXiv 2025

[15] [15]

Zichuan Fu, Xiangyang Li, Chuhan Wu, Yichao Wang, Kuicai Dong, Xiangyu Zhao, Mengchen Zhao, Huifeng Guo, and Ruiming Tang. 2023. A unified frame- work for multi-domain ctr prediction via large language models.ACM Transac- tions on Information Systems(2023)

work page 2023

[16] [16]

Jingtong Gao, Bo Chen, Menghui Zhu, Xiangyu Zhao, Xiaopeng Li, Yuhao Wang, Yichao Wang, Huifeng Guo, and Ruiming Tang. 2024. Hierrec: Scenario-aware hierarchical modeling for multi-scenario recommendations. InProceedings of the 33rd ACM International Conference on Information and Knowledge Management. 653–662

work page 2024

[17] [17]

Jingtong Gao, Zhaocheng Du, Xiaopeng Li, Yichao Wang, Xiangyang Li, Huifeng Guo, Ruiming Tang, and Xiangyu Zhao. 2025. SampleLLM: Optimizing Tabular Data Synthesis in Recommendations. InCompanion Proceedings of the ACM on Web Conference 2025. 211–220

work page 2025

[18] [18]

Jingtong Gao, Xiangyu Zhao, Muyang Li, Minghao Zhao, Runze Wu, Ruocheng Guo, Yiding Liu, and Dawei Yin. 2024. Smlp4rec: An efficient all-mlp architecture for sequential recommendations.ACM Transactions on Information Systems42, 3 (2024), 1–23

work page 2024

[19] [19]

Binzong Geng, Zhaoxin Huan, Xiaolu Zhang, Yong He, Liang Zhang, Fajie Yuan, Jun Zhou, and Linjian Mo. 2024. Breaking the length barrier: Llm-enhanced CTR prediction in long textual user behaviors. InProceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2311–2315

work page 2024

[20] [20]

Albert Gu and Tri Dao. 2023. Mamba: Linear-time sequence modeling with selective state spaces.arXiv preprint arXiv:2312.00752(2023)

work page internal anchor Pith review Pith/arXiv arXiv 2023

[21] [21]

Qipeng Guo, Xipeng Qiu, Pengfei Liu, Yunfan Shao, Xiangyang Xue, and Zheng Zhang. 2019. Star-transformer.arXiv preprint arXiv:1902.09113(2019)

work page arXiv 2019

[22] [22]

Xiaowen Huang, Shengsheng Qian, Quan Fang, Jitao Sang, and Changsheng Xu

work page

[23] [23]

InProceedings of the 26th ACM international conference on Multimedia

Csan: Contextual self-attention network for user sequential recommen- dation. InProceedings of the 26th ACM international conference on Multimedia. 447–455

work page

[24] [24]

Dietmar Jannach and Malte Ludewig. 2017. When recurrent neural networks meet the neighborhood for session-based recommendation. InProceedings of the eleventh ACM conference on recommender systems. 306–310

work page 2017

[25] [25]

Pengyue Jia, Zhaocheng Du, Yichao Wang, Xiangyu Zhao, Xiaopeng Li, Yuhao Wang, Qidong Liu, Huifeng Guo, and Ruiming Tang. 2025. SELF: Surrogate- light Feature Selection with Large Language Models in Deep Recommender Systems. InProceedings of the 34th ACM International Conference on Information and Knowledge Management. 1145–1155

work page 2025

[26] [26]

Pengyue Jia, Yiding Liu, Xiaopeng Li, Xiangyu Zhao, Yuhao Wang, Yantong Du, Xiao Han, Xuetao Wei, Shuaiqiang Wang, and Dawei Yin. 2024. G3: an effective and adaptive framework for worldwide geolocalization using large multi-modality models.Advances in Neural Information Processing Systems37 (2024), 53198–53221

work page 2024

[27] [27]

Pengyue Jia, Yichao Wang, Shanru Lin, Xiaopeng Li, Xiangyu Zhao, Huifeng Guo, and Ruiming Tang. 2024. D3: A methodological exploration of domain division, modeling, and balance in multi-domain recommendations. InProceedings of the AAAI Conference on Artificial Intelligence, Vol. 38. 8553–8561

work page 2024

[28] [28]

Albert Q. Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, De- vendra Singh Chaplot, de las Diego Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lucile Saulnier, Lélio Renard Lavaud, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, and William El Sayed. 2023. Mistral 7B.arxiv:2310.0682...

work page internal anchor Pith review Pith/arXiv arXiv 2023

[29] [29]

Huiqiang Jiang, Yucheng Li, Chengruidong Zhang, Qianhui Wu, Xufang Luo, Surin Ahn, Zhenhua Han, Amir H Abdi, Dongsheng Li, Chin-Yew Lin, et al

work page

[30] [30]

Minference 1.0: Accelerating pre-filling for long-context llms via dynamic sparse attention.Advances in Neural Information Processing Systems37 (2024), 52481–52515

work page 2024

[31] [31]

Wang-Cheng Kang and Julian McAuley. 2018. Self-attentive sequential recom- mendation. In2018 IEEE international conference on data mining (ICDM). IEEE, 197–206

work page 2018

[32] [32]

Chengxi Li, Yejing Wang, Qidong Liu, Xiangyu Zhao, Wanyu Wang, Yiqi Wang, Lixin Zou, Wenqi Fan, and Qing Li. 2023. STRec: Sparse transformer for sequential recommendations. InProceedings of the 17th ACM conference on recommender systems. 101–111

work page 2023

[33] [33]

Jingyu Li, Zhaocheng Du, Qianhui Zhu, Zhicheng Zhang, Song-Li Wu, Chaolang Li, Pengwen Dai, et al. 2026. CollectiveKV: Decoupling and Sharing Collaborative Information in Sequential Recommendation.arXiv preprint arXiv:2601.19178 (2026)

work page arXiv 2026

[34] [34]

Jiacheng Li, Yujie Wang, and Julian McAuley. 2020. Time interval aware self- attention for sequential recommendation. InProceedings of the 13th international conference on web search and data mining. 322–330

work page 2020

[35] [35]

Muyang Li, Zijian Zhang, Xiangyu Zhao, Wanyu Wang, Minghao Zhao, Runze Wu, and Ruocheng Guo. 2023. Automlp: Automated mlp for sequential recom- mendations. InProceedings of the ACM web conference 2023. 1190–1198

work page 2023

[36] [36]

Muyang Li, Xiangyu Zhao, Chuan Lyu, Minghao Zhao, Runze Wu, and Ruocheng Guo. 2022. MLP4Rec: A pure MLP architecture for sequential recommendations. arXiv preprint arXiv:2204.11510(2022)

work page arXiv 2022

[37] [37]

Shiyang Li, Xiaoyong Jin, Yao Xuan, Xiyou Zhou, Wenhu Chen, Yu-Xiang Wang, and Xifeng Yan. 2019. Enhancing the locality and breaking the memory bottle- neck of transformer on time series forecasting.Advances in neural information processing systems32 (2019)

work page 2019

[38] [38]

Xiaopeng Li, Bo Chen, Junda She, Shiteng Cao, You Wang, Qinlin Jia, Haiying He, Zheli Zhou, Zhao Liu, Ji Liu, et al. 2025. A Survey of Generative Recommendation from a Tri-Decoupled Perspective: Tokenization, Architecture, and Optimization. (2025)

work page 2025

[39] [39]

Xinhang Li, Zhaopeng Qiu, Xiangyu Zhao, Zihao Wang, Yong Zhang, Chunxiao Xing, and Xian Wu. 2022. Gromov-wasserstein guided representation learning for cross-domain recommendation. InProceedings of the 31st ACM International Conference on Information & Knowledge Management. 1199–1208

work page 2022

[40] [40]

Xiaopeng Li, Lixin Su, Pengyue Jia, Xiangyu Zhao, Suqi Cheng, Junfeng Wang, and Dawei Yin. 2023. Agent4ranking: Semantic robust ranking via personalized query rewriting using multi-agent llm.arXiv preprint arXiv:2312.15450(2023)

work page arXiv 2023

[41] [41]

Xiaopeng Li, Fan Yan, Xiangyu Zhao, Yichao Wang, Bo Chen, Huifeng Guo, and Ruiming Tang. 2023. Hamur: Hyper adapter for multi-domain recommendation. InProceedings of the 32nd ACM International Conference on Information and Knowledge Management. 1268–1277

work page 2023

[42] [42]

Xiaopeng Li, Yuanjin Zheng, Wanyu Wang, Pengyue Jia, Yiqi Wang, Maolin Wang, Xuetao Wei, Xiangyu Zhao, et al. 2025. MTA: A Merge-then-Adapt Framework for Personalized Large Language Model.arXiv preprint arXiv:2511.20072(2025)

work page arXiv 2025

[43] [43]

Aixin Liu, Bei Feng, Bing Xue, Bingxuan Wang, Bochao Wu, Chengda Lu, Cheng- gang Zhao, Chengqi Deng, Chenyu Zhang, Chong Ruan, et al. 2024. Deepseek-v3 technical report.arXiv preprint arXiv:2412.19437(2024)

work page internal anchor Pith review Pith/arXiv arXiv 2024

[44] [44]

Chengkai Liu, Jianghao Lin, Hanzhou Liu, Jianling Wang, and James Caverlee

work page

[45] [45]

InProceedings of the 33rd ACM international conference on information and knowledge management

Behavior-dependent linear recurrent units for efficient sequential recom- mendation. InProceedings of the 33rd ACM international conference on information and knowledge management. 1430–1440

work page

[46] [46]

Chengkai Liu, Jianghao Lin, Jianling Wang, Hanzhou Liu, and James Caverlee

work page

[47] [47]

Mamba4rec: Towards efficient sequential recommendation with selective state space models.arXiv preprint arXiv:2403.03900(2024)

work page arXiv 2024

[48] [48]

Langming Liu, Liu Cai, Chi Zhang, Xiangyu Zhao, Jingtong Gao, Wanyu Wang, Yifu Lv, Wenqi Fan, Yiqi Wang, Ming He, et al. 2023. Linrec: Linear attention WWW ’26, April 13–17, 2026, Dubai, United Arab Emirates Mengyang Ma et al. mechanism for long-term sequential recommender systems. InProceedings of the 46th International ACM SIGIR Conference on Research a...

work page 2023

[49] [49]

Qidong Liu, Xian Wu, Yejing Wang, Zijian Zhang, Feng Tian, Yefeng Zheng, and Xiangyu Zhao. 2024. Llm-esr: Large language models enhancement for long- tailed sequential recommendation.Advances in Neural Information Processing Systems37 (2024), 26701–26727

work page 2024

[50] [50]

Qidong Liu, Xiangyu Zhao, Yejing Wang, Zijian Zhang, Howard Zhong, Chong Chen, Xiang Li, Wei Huang, and Feng Tian. 2025. Bridge the Domains: Large Lan- guage Models Enhanced Cross-domain Sequential Recommendation. InProceed- ings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval. 1582–1592

work page 2025

[51] [51]

Shuchang Liu, Qingpeng Cai, Bowen Sun, Yuhao Wang, Ji Jiang, Dong Zheng, Peng Jiang, Kun Gai, Xiangyu Zhao, and Yongfeng Zhang. 2023. Exploration and regularization of the latent action space in recommendation. InProceedings of the ACM Web Conference 2023. 833–844

work page 2023

[52] [52]

Ziwei Liu, Qidong Liu, Yejing Wang, Wanyu Wang, Pengyue Jia, Maolin Wang, Zitao Liu, Yi Chang, and Xiangyu Zhao. 2025. SIGMA: Selective Gated Mamba for Sequential Recommendation. InProceedings of the AAAI Conference on Artificial Intelligence, Vol. 39. 12264–12272

work page 2025

[53] [53]

Enzhe Lu, Zhejun Jiang, Jingyuan Liu, Yulun Du, Tao Jiang, Chao Hong, Shaowei Liu, Weiran He, Enming Yuan, Yuzhi Wang, et al. 2025. Moba: Mixture of block attention for long-context llms.arXiv preprint arXiv:2502.13189(2025)

work page internal anchor Pith review Pith/arXiv arXiv 2025

[54] [54]

Yucheng Lu, Jiangxia Cao, Xu Kuan, Wei Cheng, Wei Jiang, Jiaming Zhang, Yang Shuang, Liu Zhaojie, and Liyin Hong. 2025. LiveForesighter: Generating Future Information for Live-Streaming Recommendations at Kuaishou.arXiv preprint arXiv:2502.06557(2025)

work page arXiv 2025

[55] [55]

Fuyu Lv, Taiwei Jin, Changlong Yu, Fei Sun, Quan Lin, Keping Yang, and Wil- fred Ng. 2019. SDM: Sequential deep matching model for online large-scale recommender system. InProceedings of the 28th ACM international conference on information and knowledge management. 2635–2643

work page 2019

[56] [56]

Dongyang Ma, Yan Wang, and Lan Tian. 2024. Block-attention for efficient prefilling.arXiv preprint arXiv:2409.15355(2024)

work page arXiv 2024

[57] [57]

Qijie Shen, Hong Wen, Jing Zhang, and Qi Rao. 2022. Hierarchically fusing long and short-term user interests for click-through rate prediction in product search. InProceedings of the 31st ACM International Conference on Information & Knowledge Management. 1767–1776

work page 2022

[58] [58]

Enxin Song, Wenhao Chai, Shusheng Yang, Ethan Armand, Xiaojun Shan, Haiyang Xu, Jianwen Xie, and Zhuowen Tu. 2025. Videonsa: Native sparse attention scales video understanding.arXiv preprint arXiv:2510.02295(2025)

work page arXiv 2025

[59] [59]

Jianlin Su, Murtadha Ahmed, Yu Lu, Shengfeng Pan, Wen Bo, and Yunfeng Liu. 2024. Roformer: Enhanced transformer with rotary position embedding. Neurocomputing568 (2024), 127063

work page 2024

[60] [60]

Fei Sun, Jun Liu, Jian Wu, Changhua Pei, Xiao Lin, Wenwu Ou, and Peng Jiang

work page

[61] [61]

InProceedings of the 28th ACM international conference on information and knowledge management

BERT4Rec: Sequential recommendation with bidirectional encoder rep- resentations from transformer. InProceedings of the 28th ACM international conference on information and knowledge management. 1441–1450

work page

[62] [62]

Philippe Tillet, Hsiang-Tsung Kung, and David Cox. 2019. Triton: an intermediate language and compiler for tiled neural network computations. InProceedings of the 3rd ACM SIGPLAN International Workshop on Machine Learning and Program- ming Languages. 10–19

work page 2019

[63] [63]

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need.Advances in neural information processing systems30 (2017)

work page 2017

[64] [64]

Yuhao Wang, Xiaopeng Li, Cheng Gong, Ziru Liu, Suiyun Zhang, Rui Liu, and Xiangyu Zhao. 2025. Efficient Reasoning via Reward Model.arXiv preprint arXiv:2511.09158(2025)

work page arXiv 2025

[65] [65]

Yuhao Wang, Xiangyu Zhao, Bo Chen, Qidong Liu, Huifeng Guo, Huanshuo Liu, Yichao Wang, Rui Zhang, and Ruiming Tang. 2023. PLATE: A prompt-enhanced paradigm for multi-scenario recommendations. InProceedings of the 46th In- ternational ACM SIGIR Conference on Research and Development in Information Retrieval. 1498–1507

work page 2023

[66] [66]

Qihang Yu, Kairui Fu, Zhaocheng Du, Yuxuan Si, Kaiyuan Li, Weihao Zhao, Zhicheng Zhang, Jieming Zhu, Quanyu Dai, Zhenhua Dong, et al. 2026. MAL- LOC: Benchmarking the Memory-aware Long Sequence Compression for Large Sequential Recommendation.arXiv preprint arXiv:2601.20234(2026)

work page arXiv 2026

[67] [67]

Jingyang Yuan, Huazuo Gao, Damai Dai, Junyu Luo, Liang Zhao, Zhengyan Zhang, Zhenda Xie, YX Wei, Lean Wang, Zhiping Xiao, et al . 2025. Native sparse attention: Hardware-aligned and natively trainable sparse attention.arXiv preprint arXiv:2502.11089(2025)

work page internal anchor Pith review Pith/arXiv arXiv 2025

[68] [68]

Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, et al. 2020. Big bird: Transformers for longer sequences.Advances in neural information processing systems33 (2020), 17283–17297

work page 2020

[69] [69]

Jiaqi Zhai, Lucy Liao, Xing Liu, Yueming Wang, Rui Li, Xuan Cao, Leon Gao, Zhao- jie Gong, Fangda Gu, Michael He, et al. 2024. Actions speak louder than words: Trillion-parameter sequential transducers for generative recommendations.arXiv preprint arXiv:2402.17152(2024)

work page internal anchor Pith review Pith/arXiv arXiv 2024

[70] [70]

Chao Zhang, Haoxin Zhang, Shiwei Wu, Di Wu, Tong Xu, Xiangyu Zhao, Yan Gao, Yao Hu, and Enhong Chen. 2024. Notellm-2: Multimodal large representation models for recommendation.arXiv preprint arXiv:2405.16789(2024)

work page arXiv 2024

[71] [71]

Qianru Zhang, Liang Qu, Honggang Wen, Dong Huang, Siu-Ming Yiu, Nguyen Quoc Viet Hung, and Hongzhi Yin. 2025. M2Rec: Multi-scale Mamba for Efficient Sequential Recommendation.arXiv preprint arXiv:2505.04445(2025)

work page arXiv 2025

[72] [72]

Sheng Zhang, Maolin Wang, Wanyu Wang, Jingtong Gao, Xiangyu Zhao, Yu Yang, Xuetao Wei, Zitao Liu, and Tong Xu. 2025. Glint-ru: Gated lightweight intelligent recurrent units for sequential recommender systems. InProceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V

work page 2025

[73] [73]

Zhicheng Zhang, Zhaocheng Du, Jieming Zhu, Jiwei Tang, Fengyuan Lu, Wang Jiaheng, Song-Li Wu, Qianhui Zhu, Jingyu Li, Hai-Tao Zheng, et al. 2026. Length- Adaptive Interest Network for Balancing Long and Short Sequence Modeling in CTR Prediction.arXiv preprint arXiv:2601.19142(2026)

work page arXiv 2026

[74] [74]

Xiangyu Zhao, Yichao Wang, Bo Chen, Jingtong Gao, Yuhao Wang, Xiaopeng Li, Pengyue Jia, Qidong Liu, Huifeng Guo, and Ruiming Tang. 2025. Joint Modeling in Recommendations: A Survey.arXiv preprint arXiv:2502.21195(2025)

work page arXiv 2025

[75] [75]

Xiangyu Zhao, Long Xia, Liang Zhang, Zhuoye Ding, Dawei Yin, and Jiliang Tang. 2018. Deep reinforcement learning for page-wise recommendations. In Proceedings of the 12th ACM conference on recommender systems. 95–103

work page 2018

[76] [76]

Xiangyu Zhao, Liang Zhang, Zhuoye Ding, Long Xia, Jiliang Tang, and Dawei Yin

work page

[77] [77]

InProceedings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining

Recommendations with negative feedback via pairwise deep reinforcement learning. InProceedings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining. 1040–1048. A Observations To investigate whether interaction sequences can be processed in a block-wise pattern, we extracted the complete interaction sequence of user #566 fro...

work page 2000