CAST: Modeling Semantic-Level Transitions for Complementary-Aware Sequential Recommendation
Pith reviewed 2026-05-10 02:03 UTC · model grok-4.3
The pith
CAST models dynamic transitions directly in discrete semantic code space to capture fine-grained item complementarity beyond co-occurrence statistics.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The CAST framework introduces semantic-level transitions that model dynamic changes directly in the discrete semantic code space, thereby capturing fine-grained semantic dependencies that are lost when codes are aggregated into item representations; a complementary prior injection module then incorporates LLM-verified complementary priors into the attention mechanism to prioritize true complementary patterns over spurious co-occurrence statistics.
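To make the contrast between item-level aggregation and code-level modeling concrete, here is a minimal Python sketch. It is not the paper's module: the item names, the code tuples, and the helper `code_transitions` are all hypothetical and chosen only for illustration. It assumes each item is represented by a tuple of discrete codes, one per level, and simply tallies per-level transitions between consecutive items in a sequence.

```python
from collections import Counter

# Hypothetical catalogue: each item maps to a tuple of discrete semantic codes,
# one code per level (all values are made up for illustration).
ITEM_CODES = {
    "camera_body": (3, 17, 5),
    "lens_50mm": (3, 17, 9),
    "phone_case": (8, 2, 5),
}


def code_transitions(sequence, item_codes):
    """Count (level, from_code, to_code) transitions between consecutive items."""
    counts = Counter()
    for prev_item, next_item in zip(sequence, sequence[1:]):
        prev_codes = item_codes[prev_item]
        next_codes = item_codes[next_item]
        for level, (a, b) in enumerate(zip(prev_codes, next_codes)):
            counts[(level, a, b)] += 1
    return counts


transitions = code_transitions(["camera_body", "lens_50mm", "phone_case"], ITEM_CODES)
```

An item-level view of this sequence records only two item-to-item transitions. The code-level tally additionally shows that the first pair shares its first two codes and differs only at the last level, which is the kind of detail the core claim says aggregation would blur.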
What carries the argument
The semantic-level transition module that operates directly on discrete semantic codes to track fine-grained dependencies, paired with the complementary prior injection module that biases attention toward verified complementary relations.
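The abstract says the priors are incorporated into the attention mechanism but does not say how. One common way to do this is an additive bias on the attention logits before the softmax; the sketch below assumes that form. The function `attention_with_prior`, the binary prior vector, and the `strength` parameter are illustrative assumptions, not details taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_with_prior(logits, prior, strength=1.0):
    """Add a prior bias to raw attention logits, then normalise.

    prior[i] is 1.0 where a complementary relation was verified and 0.0
    otherwise; `strength` is a hypothetical scaling knob, not from the paper.
    """
    biased = [logit + strength * p for logit, p in zip(logits, prior)]
    return softmax(biased)


# Three history positions with equal raw scores; only position 1 is flagged.
plain = softmax([0.0, 0.0, 0.0])
biased = attention_with_prior([0.0, 0.0, 0.0], [0.0, 1.0, 0.0], strength=2.0)
```

With equal raw scores, the flagged position receives more attention mass than the uniform case while the weights still sum to one. Setting `strength` to zero recovers plain attention, which is one way an ablation of the prior could be expressed under this assumed design.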
If this is right
- CAST achieves up to 17.6% higher Recall and 16.0% higher NDCG than state-of-the-art sequential recommenders on multiple e-commerce datasets.
- The framework reports a 65x training acceleration alongside these accuracy improvements; the abstract does not name the baseline or hardware behind that figure.
- Direct modeling in semantic code space uncovers latent complementarity that co-purchase statistics alone cannot reliably detect.
- The approach reduces reliance on aggregated item representations that blur specific semantic details required for complementarity.
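For readers unfamiliar with the two metrics named above, here is a minimal sketch of Recall@K and NDCG@K in the common next-item setting, where each test sequence has a single held-out target. The paper's exact evaluation protocol is not described in the abstract, so the single-target setup and the helper names are assumptions. `relative_gain` shows how a percentage improvement would be computed if the reported gains are relative, which the abstract does not spell out.

```python
import math


def recall_at_k(ranked_items, target, k):
    """1.0 if the held-out target appears in the top-k list, else 0.0."""
    return 1.0 if target in ranked_items[:k] else 0.0


def ndcg_at_k(ranked_items, target, k):
    """NDCG with a single relevant item: 1 / log2(rank + 1), rank is 1-based."""
    top_k = ranked_items[:k]
    if target not in top_k:
        return 0.0
    rank = top_k.index(target) + 1
    return 1.0 / math.log2(rank + 1)


def relative_gain(new, old):
    """Relative improvement of `new` over `old`, as a percentage."""
    return 100.0 * (new - old) / old


# Toy ranking with hypothetical item ids; the target "c" sits at rank 3.
ranking = ["a", "b", "c", "d"]
r = recall_at_k(ranking, "c", k=3)
n = ndcg_at_k(ranking, "c", k=3)
```

Recall@K only asks whether the target made the cut, while NDCG@K also rewards placing it higher in the list. In practice both are averaged over all test sequences.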
Where Pith is reading between the lines
- If the underlying semantic codes are generated by large language models, any biases in those models could propagate into the learned transitions and recommendations.
- The same semantic-transition idea could be tested on non-e-commerce domains such as music playlists or article reading sequences where relations are semantic rather than transactional.
- Because transitions are tracked at the code level, the model might naturally yield more interpretable explanations by highlighting which semantic attributes drive a recommendation.
- Replacing the LLM-verified priors with priors derived from other sources would test how sensitive the performance gains are to the quality of the complementary signals.
Load-bearing premise
LLM-verified complementary priors accurately reflect genuine item complementarity without adding new biases, and transitions modeled in discrete semantic code space retain the necessary fine-grained dependencies.
What would settle it
An experiment on a dataset containing expert-annotated ground-truth complementary item pairs in which CAST fails to outperform strong statistic-based baselines on next-item prediction would falsify the claim.
Original abstract
Sequential Recommendation (SR) aims to predict the next interaction of a user based on their behavior sequence, where complementary relations often provide essential signals for predicting the next item. However, mainstream models relying on sparse co-purchase statistics often mistake spurious correlations (e.g., due to popularity bias) for true complementary relations. Identifying true complementary relations requires capturing the fine-grained item semantics (e.g., specifications) that simple co-occurrence statistics would be unable to model. While recent semantics-based methods utilize discrete semantic codes to represent items, they typically aggregate semantic codes into coarse item representations. This aggregation process blurs specific semantic details required to identify complementarity. To address these critical limitations and effectively leverage semantics for capturing reliable complementary relations, we propose a Complementary-Aware Semantic Transition (CAST) framework that introduces a new modeling paradigm built upon semantic-level transitions. Specifically, a semantic-level transition module is designed to model dynamic transitions directly in the discrete semantic code space, effectively capturing fine-grained semantic dependencies often lost in aggregated item representations. Then, a complementary prior injection module is designed to incorporate LLM-verified complementary priors into the attention mechanism, thereby prioritizing complementary patterns over co-occurrence statistics. Experiments on multiple e-commerce datasets demonstrate that CAST consistently outperforms the state-of-the-art approaches, achieving up to 17.6% Recall and 16.0% NDCG gains with 65x training acceleration. This validates its effectiveness and efficiency in uncovering latent item complementarity beyond statistics. The code will be released upon acceptance.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes the CAST framework for sequential recommendation. It introduces a semantic-level transition module to model dynamic transitions directly in discrete semantic code space (capturing fine-grained dependencies lost in aggregated representations) and a complementary prior injection module that incorporates LLM-verified complementary priors into the attention mechanism to prioritize true complementarity over co-occurrence statistics. Experiments on multiple e-commerce datasets are reported to show consistent outperformance over state-of-the-art methods, with gains up to 17.6% Recall and 16.0% NDCG plus 65x training acceleration.
Significance. If the results hold under rigorous validation, the work could meaningfully advance complementary-aware sequential recommendation by showing how semantic code transitions combined with external LLM priors can reduce reliance on spurious co-purchase statistics. The reported efficiency gains are a notable practical strength, and the planned code release would aid reproducibility in the IR community.
major comments (3)
- [§4.2] The central performance claims rest on the complementary prior injection module (§4.2). The manuscript does not provide dataset-specific validation (e.g., human evaluation or held-out complementarity labels) showing that the LLM-verified priors accurately reflect latent item complementarity in the target e-commerce data rather than injecting general-knowledge or popularity biases; without this, gains cannot be confidently attributed to the proposed modeling paradigm.
- [§5] §5 Experiments: the reported gains lack accompanying statistical significance tests, standard error bars across multiple runs, and full ablation studies that isolate the semantic-level transition module from the prior injection module. This makes it difficult to confirm that the 17.6% Recall / 16.0% NDCG improvements are robust and load-bearing for the framework's novelty.
- [Table 2] Table 2 (or equivalent results table): while relative improvements are highlighted, the comparison set does not include recent semantics-based sequential models that also operate in discrete code space; this leaves open whether the gains derive specifically from transition modeling in code space or from other implementation choices.
minor comments (2)
- [Abstract] Abstract: the claim of '65x training acceleration' should briefly note the hardware and baseline implementation details for context.
- [§3] Notation in §3: the distinction between item-level embeddings and per-code semantic representations could be clarified with an explicit equation or diagram to avoid reader confusion when reading the transition module.
Simulated Author's Rebuttal
We thank the referee for the thoughtful and constructive comments, which have helped us identify areas to strengthen the manuscript. We address each major comment point-by-point below and outline planned revisions to improve clarity, rigor, and completeness.
- Referee: [§4.2] The central performance claims rest on the complementary prior injection module (§4.2). The manuscript does not provide dataset-specific validation (e.g., human evaluation or held-out complementarity labels) showing that the LLM-verified priors accurately reflect latent item complementarity in the target e-commerce data rather than injecting general-knowledge or popularity biases; without this, gains cannot be confidently attributed to the proposed modeling paradigm.
  Authors: We agree that direct validation of the LLM-derived priors against target-domain complementarity is important for attributing gains specifically to the injection mechanism rather than external knowledge. The current work derives priors from LLM analysis of item metadata to distinguish complementarity from co-occurrence, with ablations showing performance degradation when the module is removed. However, we did not include human evaluation or held-out labels. In the revision we will add a dedicated subsection with qualitative case studies on sampled items from each dataset, explicit discussion of potential LLM biases (e.g., popularity or general-knowledge effects), and any available proxy checks using existing metadata. This will better support the claim that gains stem from the modeling paradigm.
  Revision: yes
- Referee: [§5] §5 Experiments: the reported gains lack accompanying statistical significance tests, standard error bars across multiple runs, and full ablation studies that isolate the semantic-level transition module from the prior injection module. This makes it difficult to confirm that the 17.6% Recall / 16.0% NDCG improvements are robust and load-bearing for the framework's novelty.
  Authors: We acknowledge the value of statistical rigor and component isolation. The reported results are from single runs without error bars or significance tests, and the ablations combine both modules. In the revised version we will rerun all experiments across five random seeds, report means with standard deviations, include paired t-tests against baselines for the main metrics, and expand §5 with separate ablations: one disabling only the semantic-level transition module and one disabling only the prior injection module. These additions will clarify the individual contributions and robustness of the 17.6% / 16.0% gains.
  Revision: yes
- Referee: [Table 2] Table 2 (or equivalent results table): while relative improvements are highlighted, the comparison set does not include recent semantics-based sequential models that also operate in discrete code space; this leaves open whether the gains derive specifically from transition modeling in code space or from other implementation choices.
  Authors: The current baselines focus on strong sequential and complementary-aware models, but we recognize that recent semantics-based approaches using discrete codes are relevant for isolating the benefit of direct transition modeling in code space. In the revision we will expand Table 2 (and the corresponding text) to include additional recent semantics-based sequential models that operate in discrete code spaces, re-running or citing their reported results where possible. This will help demonstrate that the observed improvements are attributable to the semantic-level transition design rather than other factors.
  Revision: partial
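The second response above promises means with standard deviations over five seeds and paired t-tests against baselines. A minimal sketch of that analysis, using only the Python standard library, might look as follows. The score lists are made-up placeholders, not results from the paper, and the helper name `paired_t_statistic` is an assumption of this sketch.

```python
import math
import statistics


def paired_t_statistic(model_scores, baseline_scores):
    """t statistic for paired samples (same seeds for both systems)."""
    diffs = [m - b for m, b in zip(model_scores, baseline_scores)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    std_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    return mean_diff / (std_diff / math.sqrt(n))


# Made-up Recall values for five seeds; NOT results from the paper.
model = [0.112, 0.115, 0.110, 0.114, 0.113]
baseline = [0.100, 0.101, 0.099, 0.102, 0.100]

t_stat = paired_t_statistic(model, baseline)
summary = (statistics.mean(model), statistics.stdev(model))
```

With five seeds the test has four degrees of freedom, so a two-sided test at the 0.05 level compares the absolute value of `t_stat` against the critical value 2.776. Pairing by seed matters because it removes seed-to-seed variance shared by both systems.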
Circularity Check
Empirical modeling framework with no circular derivation chain
full rationale
The paper introduces CAST as a new framework with two modules: a semantic-level transition module operating in discrete code space and a complementary prior injection module using LLM-verified priors. Performance claims (up to 17.6% Recall and 16.0% NDCG gains) are presented as outcomes of experiments on e-commerce datasets rather than any first-principles derivation or prediction that reduces to fitted parameters by construction. No equations are shown that equate outputs to inputs via self-definition, renaming, or self-citation load-bearing steps. The central claims rest on empirical validation and external LLM priors, which are independent of the model's internal fitting process and do not create a closed loop. This is a standard engineering contribution without the enumerated circularity patterns.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Discrete semantic codes capture fine-grained item specifications sufficient to identify complementarity when transitions are modeled directly.
- domain assumption: LLM-verified complementary priors are reliable signals that can be injected into attention without introducing new biases.
invented entities (2)
- semantic-level transition module: no independent evidence
- complementary prior injection module: no independent evidence