Pay Attention to Sequence Split: Uncovering the Impacts of Sub-Sequence Splitting on Sequential Recommendation Models
Pith reviewed 2026-05-10 19:56 UTC · model grok-4.3
The pith
Sub-sequence splitting interferes with the evaluation of sequential recommendation models by inflating their measured performance.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Sub-sequence splitting may interfere with the evaluation of the model's actual performance. Many state-of-the-art sequential recommendation models apply it during the data reading stage without disclosure in their papers. Removing the operation leads to large performance drops that place recent models below classical ones. The technique produces gains only when combined with specific splitting methods, target selection strategies, and loss functions; other combinations can harm results. It achieves these effects by evening out the distribution of training examples and by increasing the probability that different items are chosen as targets.
What carries the argument
Sub-sequence splitting, the operation that breaks each raw user interaction sequence into multiple shorter sub-sequences before model training.
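To make the operation concrete, here is a minimal sketch of one common SSS variant, prefix splitting, in which every prefix of the raw sequence becomes a training example. The `prefix_split` name, the `min_len` parameter, and the toy sequence are illustrative assumptions, not the paper's implementation; other splitting methods (sliding windows, random crops) differ in detail.

```python
# Minimal sketch of prefix splitting, one common SSS variant: a raw
# sequence [i1, ..., in] yields the (history, target) pairs
# ([i1], i2), ([i1, i2], i3), ..., ([i1..i(n-1)], in).
from typing import List, Tuple

def prefix_split(sequence: List[int], min_len: int = 2) -> List[Tuple[List[int], int]]:
    """Split one raw interaction sequence into (history, target) pairs."""
    examples = []
    for end in range(min_len, len(sequence) + 1):
        examples.append((sequence[:end - 1], sequence[end - 1]))
    return examples

raw = [3, 7, 7, 12, 5]
print(prefix_split(raw))
# [([3], 7), ([3, 7], 7), ([3, 7, 7], 12), ([3, 7, 7, 12], 5)]
# Without SSS the same sequence would contribute only ([3, 7, 7, 12], 5).
```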
If this is right
- Recent sequential recommendation models lose much of their reported advantage once the unreported split is removed.
- Only particular combinations of splitting method, target strategy, and loss function allow the split to improve results.
- The split balances the frequency of training examples across items and raises the chance that varied items become targets (see the toy count after this list).
- Evaluation practices must disclose all data-preprocessing steps to allow fair comparison across models.
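A hedged illustration of the target-distribution point: under last-item-only training, each user contributes a single target, so popular sequence-final items dominate; under prefix splitting, every non-initial item becomes a target at least once. The toy sequences below are assumed for illustration.

```python
# Toy comparison of target-item counts with and without prefix
# splitting; the sequences are illustrative, not from the paper.
from collections import Counter

sequences = [[1, 2, 3, 9], [4, 2, 9], [5, 2, 3, 9]]

# Without SSS: only the final item of each sequence is a target.
no_sss = Counter(seq[-1] for seq in sequences)

# With prefix splitting: every non-initial item becomes a target.
with_sss = Counter(item for seq in sequences for item in seq[1:])

print(no_sss)    # Counter({9: 3})
print(with_sss)  # Counter({2: 3, 9: 3, 3: 2})
```

Three distinct items now receive gradient signal as targets instead of one, which is the evening-out effect the paper describes.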
Where Pith is reading between the lines
- Papers should release the exact data-reading code so that others can verify whether splits or other hidden steps are present.
- Data-augmentation methods in sequential recommendation may need separate testing with and without sequence splitting to isolate genuine gains.
- New models could be required to report results both with and without common preprocessing operations to show robustness.
Load-bearing premise
The performance declines seen after removing the split are caused mainly by the splitting step itself rather than by other unreported implementation choices in the tested models.
What would settle it
Re-running the recent state-of-the-art models on standard datasets with the sub-sequence splitting step disabled and checking whether accuracy falls below that of earlier classical models.
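A minimal sketch of that ablation follows. `build_training_set` and its `apply_sss` flag are assumed names for illustration, not a reference to any model's released data-reading code; training each model on both outputs, with the rest of the pipeline frozen, would isolate the effect of the split.

```python
# Illustrative ablation harness: build the training set with and
# without SSS while holding everything else in the pipeline fixed.
from typing import Dict, List, Tuple

def build_training_set(
    user_sequences: Dict[int, List[int]], apply_sss: bool
) -> List[Tuple[List[int], int]]:
    """Return (history, target) training examples, with SSS on or off."""
    examples: List[Tuple[List[int], int]] = []
    for seq in user_sequences.values():
        if len(seq) < 2:
            continue  # need at least one history item and one target
        if apply_sss:
            # with SSS: every prefix of the sequence yields an example
            examples.extend((seq[:k], seq[k]) for k in range(1, len(seq)))
        else:
            # without SSS: one example per user (full history, last item)
            examples.append((seq[:-1], seq[-1]))
    return examples

users = {0: [3, 7, 12, 5], 1: [4, 9]}
print(len(build_training_set(users, apply_sss=True)))   # 4
print(len(build_training_set(users, apply_sss=False)))  # 2
```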
Original abstract
Sub-sequence splitting (SSS) has been demonstrated as an effective approach to mitigate data sparsity in sequential recommendation (SR) by splitting a raw user interaction sequence into multiple sub-sequences. Previous studies have demonstrated its ability to enhance the performance of SR models significantly. However, in this work, we discover that (i) SSS may interfere with the evaluation of the model's actual performance. We observed that many recent state-of-the-art SR models employ SSS during the data reading stage (not mentioned in the papers). When we removed this operation, performance significantly declined, even falling below that of earlier classical SR models. The varying improvements achieved by SSS and different splitting methods across different models prompt us to analyze further when SSS proves effective. We find that (ii) SSS demonstrates strong capabilities only when specific splitting methods, target strategies, and loss functions are used together. Inappropriate combinations may even harm performance. Furthermore, we analyze why sub-sequence splitting yields such remarkable performance gains and find that (iii) it evens out the distribution of training data while increasing the likelihood that different items are targeted. Finally, we provide suggestions for overcoming SSS interference, along with a discussion on data augmentation methods and future directions. We hope this work will prompt the broader community to re-examine the impact of data splitting on SR and promote fairer, more rigorous model evaluation. All analysis code and data will be made available upon acceptance. We provide a simple, anonymous implementation at https://github.com/KingGugu/SSS4SR.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that sub-sequence splitting (SSS) is an undocumented but widely used operation in the data pipelines of many recent state-of-the-art sequential recommendation (SR) models. Removing SSS from these models causes large performance drops, sometimes below classical baselines. SSS is effective only under specific combinations of splitting method, target strategy, and loss function; it works by evening the training-data distribution and raising the chance that different items become targets. The authors supply suggestions for avoiding SSS interference and release analysis code.
Significance. If the empirical observations hold after proper controls, the work is significant for the SR community because it identifies a hidden implementation choice that can distort model comparisons and inflate reported gains. The provision of reproducible code and the explicit call for re-examination of data-handling practices are positive contributions that could improve evaluation rigor.
Major comments (3)
- [Experimental Setup / Results] The central claim that performance declines are caused by the removal of SSS (rather than by other uncontrolled implementation differences) requires verification that the 'with SSS' re-implementations reproduce the original papers' reported metrics. Without side-by-side tables comparing the authors' re-runs against the numbers published in the source papers for each SOTA model, it remains possible that negative-sampling, truncation, or loss-scaling discrepancies drive the observed drops.
- [Introduction / Related Work] The statement that 'many recent state-of-the-art SR models employ SSS during the data reading stage (not mentioned in the papers)' needs explicit enumeration: which models were inspected, how the presence of SSS was detected in their released code, and whether the detection was performed on the exact versions used in the original publications.
- [Analysis of SSS Effectiveness] The finding that SSS is effective only under particular combinations of splitting method, target strategy, and loss function is load-bearing for claim (ii). The paper should report the full factorial ablation (all combinations) with statistical significance tests rather than selected examples, so readers can judge the scope of the interaction effect.
Minor comments (2)
- [Abstract] The GitHub link is described as 'anonymous'; confirm that the repository does not contain author-identifying metadata before public release.
- [Analysis of Why SSS Works] Clarify the exact definition of 'target item' and 'loss function' used in the distribution analysis (iii) so that the claimed increase in target-item likelihood can be reproduced from the released code.
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We address each major point below and will revise the manuscript accordingly to improve clarity and rigor.
Point-by-point responses
- Referee: [Experimental Setup / Results] The central claim that performance declines are caused by the removal of SSS (rather than by other uncontrolled implementation differences) requires verification that the 'with SSS' re-implementations reproduce the original papers' reported metrics. Without side-by-side tables comparing the authors' re-runs against the numbers published in the source papers for each SOTA model, it remains possible that negative-sampling, truncation, or loss-scaling discrepancies drive the observed drops.
  Authors: We agree that confirming reproduction of the original reported metrics is necessary to isolate the effect of SSS. In the revised manuscript, we will add side-by-side tables in the Experimental Setup section comparing our re-implemented 'with SSS' results to the metrics published in the source papers for each SOTA model. This will help verify that performance drops stem from SSS removal rather than other implementation factors. (Revision: yes)
- Referee: [Introduction / Related Work] The statement that 'many recent state-of-the-art SR models employ SSS during the data reading stage (not mentioned in the papers)' needs explicit enumeration: which models were inspected, how the presence of SSS was detected in their released code, and whether the detection was performed on the exact versions used in the original publications.
  Authors: We will revise the Introduction and Related Work to include an explicit enumeration of the inspected models, a description of how SSS was identified by examining the data reading pipelines in the released code, and confirmation of the code versions used. This added detail will provide the requested transparency. (Revision: yes)
- Referee: [Analysis of SSS Effectiveness] The finding that SSS is effective only under particular combinations of splitting method, target strategy, and loss function is load-bearing for claim (ii). The paper should report the full factorial ablation (all combinations) with statistical significance tests rather than selected examples, so readers can judge the scope of the interaction effect.
  Authors: We acknowledge that selected examples limit the ability to assess the full scope. The revised analysis section will present the complete factorial ablation across all combinations of splitting methods, target strategies, and loss functions, accompanied by statistical significance tests to substantiate the interaction effects. (Revision: yes)
Circularity Check
No circularity: empirical observations on data preprocessing effects
Full rationale
The paper reports experimental findings that removing an undocumented sub-sequence splitting (SSS) step from re-implemented SOTA models causes performance drops. No equations, fitted parameters, or derivations are present in the provided text. The central claims rest on direct before/after comparisons rather than any self-referential definition, prediction-from-fit, or self-citation chain. The analysis is therefore self-contained against external benchmarks and receives the default non-circularity finding.