
arxiv: 2604.19414 · v1 · submitted 2026-04-21 · 💻 cs.IR · cs.LG

CAST: Modeling Semantic-Level Transitions for Complementary-Aware Sequential Recommendation

Pith reviewed 2026-05-10 02:03 UTC · model grok-4.3

classification 💻 cs.IR cs.LG
keywords sequential recommendation · complementary relations · semantic codes · transition modeling · LLM priors · attention mechanism · e-commerce datasets · user behavior sequences

The pith

CAST models dynamic transitions directly in discrete semantic code space to capture fine-grained item complementarity beyond co-occurrence statistics.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Sequential recommendation systems typically depend on sparse co-purchase statistics that often reflect popularity biases rather than genuine complementary relations between items. The paper argues that aggregating semantic codes into coarse item representations blurs the specific details needed to identify true complementarity. CAST instead models transitions directly between discrete semantic codes and injects LLM-verified complementary priors into the attention mechanism to prioritize reliable patterns. On e-commerce datasets this yields gains of up to 17.6 percent in Recall and 16.0 percent in NDCG, along with 65 times faster training. A sympathetic reader would care because the method offers a concrete way to leverage item semantics for more accurate next-item predictions without relying on potentially misleading statistics.
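To make the popularity-bias point concrete, here is a minimal sketch on hypothetical toy baskets (item names and counts are invented for illustration; this is not part of CAST). A popular accessory co-occurs with almost everything, so ranking an item's partners by raw co-occurrence picks it, whereas lift, P(a, b) / (P(a) P(b)), which discounts each item's overall frequency, picks the genuinely paired item. The paper's argument goes further: even normalized statistics cannot see item specifications, which is what the semantic codes are meant to supply.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets: "usb_cable" is bought by almost everyone (popular),
# while "phone_case" is the item genuinely paired with "phone".
baskets = [
    {"phone", "phone_case", "usb_cable"},
    {"phone", "phone_case", "usb_cable"},
    {"phone", "usb_cable"},
    {"phone", "usb_cable"},
    {"laptop", "usb_cable"},
    {"laptop", "usb_cable"},
    {"headphones", "usb_cable"},
    {"headphones", "usb_cable"},
    {"phone", "phone_case"},
]
n = len(baskets)
item_count = Counter(item for basket in baskets for item in basket)
pair_count = Counter(
    frozenset(pair) for basket in baskets for pair in combinations(sorted(basket), 2)
)

def cooccurrence(a, b):
    """Raw number of baskets containing both items."""
    return pair_count[frozenset((a, b))]

def lift(a, b):
    """P(a, b) / (P(a) P(b)); values above 1 mean 'together more often than chance'."""
    return cooccurrence(a, b) * n / (item_count[a] * item_count[b])

def top_partner(anchor, score):
    """Best-scoring item among those that ever share a basket with `anchor`."""
    candidates = {b for basket in baskets if anchor in basket for b in basket if b != anchor}
    return max(candidates, key=lambda b: score(anchor, b))

print(top_partner("phone", cooccurrence))  # usb_cable (popularity wins)
print(top_partner("phone", lift))          # phone_case
```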

Core claim

The CAST framework introduces semantic-level transitions that model dynamic changes directly in the discrete semantic code space, thereby capturing fine-grained semantic dependencies that are lost when codes are aggregated into item representations; a complementary prior injection module then incorporates LLM-verified complementary priors into the attention mechanism to prioritize true complementary patterns over spurious co-occurrence statistics.
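One way to picture a transition "in the discrete semantic code space" is the count-based sketch below. It assumes each item is a short tuple of discrete codes, one per quantization level (in the style of product or residual quantization); the codes, items and sequences are hypothetical, and CAST's actual module is learned inside the model rather than counted. Because transitions are recorded per code level, an item never seen in training (camera_C) still gets informative scores from the codes it shares with seen items, such as a specification code 41 standing in for, say, a lens mount, whereas an item-ID transition table would have no entry for it at all.

```python
from collections import Counter, defaultdict

# Hypothetical 3-level semantic codes: (category, sub-type, specification).
item_codes = {
    "camera_A": (3, 10, 41),
    "camera_B": (3, 10, 42),
    "camera_C": (3, 11, 41),  # never appears in the training sequences
    "lens_A": (5, 20, 41),
    "lens_B": (5, 20, 42),
    "tripod": (5, 21, 7),
}
sequences = [
    ["camera_A", "lens_A"],
    ["camera_A", "lens_A"],
    ["camera_B", "lens_B"],
    ["camera_A", "tripod"],
]
NUM_LEVELS = 3

# transitions[level][src_code] counts the destination codes observed at that level.
transitions = [defaultdict(Counter) for _ in range(NUM_LEVELS)]
for seq in sequences:
    for prev_item, next_item in zip(seq, seq[1:]):
        for level in range(NUM_LEVELS):
            src = item_codes[prev_item][level]
            dst = item_codes[next_item][level]
            transitions[level][src][dst] += 1

def transition_score(prev_item, candidate):
    """Mean over levels of empirical P(candidate code | previous code); unseen source codes add 0."""
    total = 0.0
    for level in range(NUM_LEVELS):
        counts = transitions[level].get(item_codes[prev_item][level])
        if counts:
            total += counts[item_codes[candidate][level]] / sum(counts.values())
    return total / NUM_LEVELS

ranked = sorted(["lens_A", "lens_B", "tripod"],
                key=lambda c: transition_score("camera_C", c), reverse=True)
print(ranked)  # ['lens_A', 'tripod', 'lens_B']
```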

What carries the argument

The semantic-level transition module that operates directly on discrete semantic codes to track fine-grained dependencies, paired with the complementary prior injection module that biases attention toward verified complementary relations.
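The abstract says only that the LLM-verified priors are incorporated "into the attention mechanism"; the exact form is not given here. A common pattern, assumed in this sketch, is an additive bias on the attention logits, softmax(QKᵀ/√d + λP), where P[i, j] is a hypothetical prior score (for example 1.0 if an LLM judged item j complementary to item i) and λ (`lam`) is a hypothetical strength parameter. A causal mask keeps each position from attending to later items, as in SASRec-style recommenders.

```python
import numpy as np

def prior_biased_attention(q, k, v, prior, lam=1.0, causal=True):
    """Scaled dot-product attention with an additive prior bias on the logits.

    q, k, v: (seq_len, d) arrays for one user sequence.
    prior:   (seq_len, seq_len) array of hypothetical LLM-verified complementarity scores.
    lam:     hypothetical weight of the prior term.
    """
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d) + lam * prior
    if causal:
        # Position i may only attend to positions j <= i.
        allowed = np.tril(np.ones(logits.shape, dtype=bool))
        logits = np.where(allowed, logits, -np.inf)
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(3, 4)) for _ in range(3))
prior = np.zeros((3, 3))
prior[2, 0] = 1.0  # hypothetical: item 0 judged complementary to item 2

_, w_plain = prior_biased_attention(q, k, v, prior, lam=0.0)
_, w_prior = prior_biased_attention(q, k, v, prior, lam=2.0)
print(w_plain[2, 0] < w_prior[2, 0])  # True
```

When P[i, j] is the only nonzero prior in row i, raising `lam` strictly increases that weight, since softmax is monotone in each logit. Whether the shift helps ranking depends entirely on how reliable P is, which is the load-bearing premise discussed below.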

If this is right

  • CAST achieves up to 17.6% higher Recall and 16.0% higher NDCG than state-of-the-art sequential recommenders on multiple e-commerce datasets.
  • The framework trains 65 times faster while delivering these accuracy improvements.
  • Direct modeling in semantic code space uncovers latent complementarity that co-purchase statistics alone cannot reliably detect.
  • The approach reduces reliance on aggregated item representations that blur specific semantic details required for complementarity.
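For readers checking the headline numbers, the sketch below computes Recall@K and NDCG@K in the usual leave-one-out setting with a single held-out next item per user, where NDCG@K reduces to 1 / log2(rank + 1) for a hit at 1-indexed rank ≤ K. It also computes relative gain; reading the paper's "up to 17.6%" and "16.0%" as relative improvements over the best baseline is an assumption here, and all inputs are hypothetical. The sketch ranks the full candidate list, and sampled-candidate evaluation can change the numbers.

```python
import math

def recall_ndcg_at_k(ranked_lists, targets, k=10):
    """Recall@K and NDCG@K with one held-out target item per user (so IDCG = 1)."""
    recall = ndcg = 0.0
    for ranked, target in zip(ranked_lists, targets):
        top_k = ranked[:k]
        if target in top_k:
            rank = top_k.index(target) + 1  # 1-indexed position of the hit
            recall += 1.0
            ndcg += 1.0 / math.log2(rank + 1)
    return recall / len(targets), ndcg / len(targets)

def relative_gain(new, base):
    """Relative improvement of `new` over `base`, e.g. 0.176 for +17.6%."""
    return (new - base) / base

# Hypothetical rankings for three users whose true next item is "a".
ranked_lists = [["a", "b", "c"], ["c", "a", "b"], ["b", "c", "a"]]
targets = ["a", "a", "a"]
recall, ndcg = recall_ndcg_at_k(ranked_lists, targets, k=2)
print(recall, ndcg)  # recall = 2/3, NDCG = (1 + 1/log2(3)) / 3
print(relative_gain(0.0588, 0.0500))  # roughly 0.176
```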

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the underlying semantic codes are generated by large language models, any biases in those models could propagate into the learned transitions and recommendations.
  • The same semantic-transition idea could be tested on non-e-commerce domains such as music playlists or article reading sequences where relations are semantic rather than transactional.
  • Because transitions are tracked at the code level, the model might naturally yield more interpretable explanations by highlighting which semantic attributes drive a recommendation.
  • Replacing the LLM-verified priors with priors derived from other sources would test how sensitive the performance gains are to the quality of the complementary signals.

Load-bearing premise

LLM-verified complementary priors accurately reflect genuine item complementarity without adding new biases, and transitions modeled in discrete semantic code space retain the necessary fine-grained dependencies.

What would settle it

An experiment on a dataset containing expert-annotated ground-truth complementary item pairs in which CAST fails to outperform strong statistic-based baselines on next-item prediction would falsify the claim.
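Both the load-bearing premise and this test depend on annotated complementary pairs. With such labels available, a more direct check scores the prior set itself before any model is trained: the precision and recall of LLM-verified pairs against the gold pairs, compared with pairs selected by co-occurrence. The sketch below uses hypothetical pairs and treats them as ordered (anchor, complement), on the assumption that complementarity can be asymmetric.

```python
def pair_precision_recall(predicted, gold):
    """Precision and recall of predicted (anchor, complement) pairs against gold pairs."""
    predicted, gold = set(predicted), set(gold)
    hits = len(predicted & gold)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall

# Hypothetical expert-annotated pairs and two candidate prior sets.
gold = {("phone", "phone_case"), ("camera", "lens"), ("printer", "ink")}
llm_pairs = {("phone", "phone_case"), ("camera", "lens"), ("phone", "usb_cable")}
cooccurrence_pairs = {("phone", "usb_cable"), ("laptop", "usb_cable"), ("camera", "lens")}

print("LLM prior:", pair_precision_recall(llm_pairs, gold))
print("co-occurrence:", pair_precision_recall(cooccurrence_pairs, gold))
```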

Figures

Figures reproduced from arXiv: 2604.19414 by Haibo Zhang, Jeremiah D. Deng, Lech Szymanski, Qian Zhang.

Figure 1. Example of why co-purchase does not equal com…
Figure 2. The prompt template utilized for inferring comple…
Figure 3. Overview of the CAST framework. CAST integrates LLM-verified complementary relations …
Figure 4. Hyper-parameter analysis of CAST on three datasets in terms of NDCG@10.
Figure 5. Distribution of semantic transition scores.
Original abstract

Sequential Recommendation (SR) aims to predict the next interaction of a user based on their behavior sequence, where complementary relations often provide essential signals for predicting the next item. However, mainstream models relying on sparse co-purchase statistics often mistake spurious correlations (e.g., due to popularity bias) for true complementary relations. Identifying true complementary relations requires capturing the fine-grained item semantics (e.g., specifications) that simple co-occurrence statistics would be unable to model. While recent semantics-based methods utilize discrete semantic codes to represent items, they typically aggregate semantic codes into coarse item representations. This aggregation process blurs specific semantic details required to identify complementarity. To address these critical limitations and effectively leverage semantics for capturing reliable complementary relations, we propose a Complementary-Aware Semantic Transition (CAST) framework that introduces a new modeling paradigm built upon semantic-level transitions. Specifically, a semantic-level transition module is designed to model dynamic transitions directly in the discrete semantic code space, effectively capturing fine-grained semantic dependencies often lost in aggregated item representations. Then, a complementary prior injection module is designed to incorporate LLM-verified complementary priors into the attention mechanism, thereby prioritizing complementary patterns over co-occurrence statistics. Experiments on multiple e-commerce datasets demonstrate that CAST consistently outperforms the state-of-the-art approaches, achieving up to 17.6% Recall and 16.0% NDCG gains with 65x training acceleration. This validates its effectiveness and efficiency in uncovering latent item complementarity beyond statistics. The code will be released upon acceptance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes the CAST framework for sequential recommendation. It introduces a semantic-level transition module to model dynamic transitions directly in discrete semantic code space (capturing fine-grained dependencies lost in aggregated representations) and a complementary prior injection module that incorporates LLM-verified complementary priors into the attention mechanism to prioritize true complementarity over co-occurrence statistics. Experiments on multiple e-commerce datasets are reported to show consistent outperformance over state-of-the-art methods, with gains up to 17.6% Recall and 16.0% NDCG plus 65x training acceleration.

Significance. If the results hold under rigorous validation, the work could meaningfully advance complementary-aware sequential recommendation by showing how semantic code transitions combined with external LLM priors can reduce reliance on spurious co-purchase statistics. The reported efficiency gains are a notable practical strength, and the planned code release would aid reproducibility in the IR community.

major comments (3)
  1. [§4.2] The central performance claims rest on the complementary prior injection module (§4.2). The manuscript does not provide dataset-specific validation (e.g., human evaluation or held-out complementarity labels) showing that the LLM-verified priors accurately reflect latent item complementarity in the target e-commerce data rather than injecting general-knowledge or popularity biases; without this, gains cannot be confidently attributed to the proposed modeling paradigm.
  2. [§5] §5 Experiments: the reported gains lack accompanying statistical significance tests, standard error bars across multiple runs, and full ablation studies that isolate the semantic-level transition module from the prior injection module. This makes it difficult to confirm that the 17.6% Recall / 16.0% NDCG improvements are robust and load-bearing for the framework's novelty.
  3. [Table 2] Table 2 (or equivalent results table): while relative improvements are highlighted, the comparison set does not include recent semantics-based sequential models that also operate in discrete code space; this leaves open whether the gains derive specifically from transition modeling in code space or from other implementation choices.
minor comments (2)
  1. [Abstract] Abstract: the claim of '65x training acceleration' should briefly note the hardware and baseline implementation details for context.
  2. [§3] Notation in §3: the distinction between item-level embeddings and per-code semantic representations could be clarified with an explicit equation or diagram to avoid reader confusion when reading the transition module.
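Major comment 2, and the authors' planned response below (five seeds, means with standard deviations, paired t-tests), amount to the following computation. This stdlib-only sketch pairs runs by seed, which assumes both models share the same seeds and data splits, and compares the t statistic with the two-sided 5% critical value for 4 degrees of freedom (about 2.776); `scipy.stats.ttest_rel` returns the same statistic together with a p-value. The per-seed scores are placeholders, not results from the paper.

```python
import math
import statistics

# Two-sided 5% critical value of Student's t with n - 1 = 4 degrees of freedom.
T_CRIT_DF4 = 2.776

def summarize(scores):
    """Mean and sample standard deviation across seeds."""
    return statistics.mean(scores), statistics.stdev(scores)

def paired_t(model_scores, baseline_scores):
    """Paired t statistic over per-seed score differences."""
    diffs = [m - b for m, b in zip(model_scores, baseline_scores)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))  # standard error of the mean difference
    return statistics.mean(diffs) / se

# Placeholder NDCG@10 values for five seeds (illustrative only).
model = [0.061, 0.063, 0.060, 0.062, 0.064]
baseline = [0.055, 0.056, 0.054, 0.057, 0.055]

mean, std = summarize(model)
t = paired_t(model, baseline)
print(f"model NDCG@10 = {mean:.4f} ± {std:.4f}, t = {t:.2f}, significant: {abs(t) > T_CRIT_DF4}")
```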

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the thoughtful and constructive comments, which have helped us identify areas to strengthen the manuscript. We address each major comment point-by-point below and outline planned revisions to improve clarity, rigor, and completeness.

Point-by-point responses
  1. Referee: [§4.2] The central performance claims rest on the complementary prior injection module (§4.2). The manuscript does not provide dataset-specific validation (e.g., human evaluation or held-out complementarity labels) showing that the LLM-verified priors accurately reflect latent item complementarity in the target e-commerce data rather than injecting general-knowledge or popularity biases; without this, gains cannot be confidently attributed to the proposed modeling paradigm.

    Authors: We agree that direct validation of the LLM-derived priors against target-domain complementarity is important for attributing gains specifically to the injection mechanism rather than external knowledge. The current work derives priors from LLM analysis of item metadata to distinguish complementarity from co-occurrence, with ablations showing performance degradation when the module is removed. However, we did not include human evaluation or held-out labels. In the revision we will add a dedicated subsection with qualitative case studies on sampled items from each dataset, explicit discussion of potential LLM biases (e.g., popularity or general-knowledge effects), and any available proxy checks using existing metadata. This will better support the claim that gains stem from the modeling paradigm. revision: yes

  2. Referee: [§5] §5 Experiments: the reported gains lack accompanying statistical significance tests, standard error bars across multiple runs, and full ablation studies that isolate the semantic-level transition module from the prior injection module. This makes it difficult to confirm that the 17.6% Recall / 16.0% NDCG improvements are robust and load-bearing for the framework's novelty.

    Authors: We acknowledge the value of statistical rigor and component isolation. The reported results are from single runs without error bars or significance tests, and the ablations combine both modules. In the revised version we will rerun all experiments across five random seeds, report means with standard deviations, include paired t-tests against baselines for the main metrics, and expand §5 with separate ablations: one disabling only the semantic-level transition module and one disabling only the prior injection module. These additions will clarify the individual contributions and robustness of the 17.6% / 16.0% gains. revision: yes

  3. Referee: [Table 2] Table 2 (or equivalent results table): while relative improvements are highlighted, the comparison set does not include recent semantics-based sequential models that also operate in discrete code space; this leaves open whether the gains derive specifically from transition modeling in code space or from other implementation choices.

    Authors: The current baselines focus on strong sequential and complementary-aware models, but we recognize that recent semantics-based approaches using discrete codes are relevant for isolating the benefit of direct transition modeling in code space. In the revision we will expand Table 2 (and the corresponding text) to include additional recent semantics-based sequential models that operate in discrete code spaces, re-running or citing their reported results where possible. This will help demonstrate that the observed improvements are attributable to the semantic-level transition design rather than other factors. revision: partial

Circularity Check

0 steps flagged

Empirical modeling framework with no circular derivation chain

full rationale

The paper introduces CAST as a new framework with two modules: a semantic-level transition module operating in discrete code space and a complementary prior injection module using LLM-verified priors. Performance claims (up to 17.6% Recall and 16.0% NDCG gains) are presented as outcomes of experiments on e-commerce datasets rather than any first-principles derivation or prediction that reduces to fitted parameters by construction. No equations are shown that equate outputs to inputs via self-definition, renaming, or load-bearing self-citation. The central claims rest on empirical validation and external LLM priors, which are independent of the model's internal fitting process and do not create a closed loop. This is a standard engineering contribution without the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 2 invented entities

The central claim rests on two newly introduced modules whose effectiveness is asserted via experiments whose details are absent from the abstract. No independent evidence for the modules is supplied beyond the reported gains.

axioms (2)
  • domain assumption Discrete semantic codes capture fine-grained item specifications sufficient to identify complementarity when transitions are modeled directly
    Invoked to justify the semantic-level transition module over aggregated representations.
  • domain assumption LLM-verified complementary priors are reliable signals that can be injected into attention without introducing new biases
    Invoked to justify the complementary prior injection module.
invented entities (2)
  • semantic-level transition module no independent evidence
    purpose: Model dynamic transitions directly in discrete semantic code space
    New component introduced to avoid loss of detail from aggregation.
  • complementary prior injection module no independent evidence
    purpose: Incorporate LLM-verified complementary priors into the attention mechanism
    New component to prioritize true complementarity over co-occurrence statistics.

pith-pipeline@v0.9.0 · 5571 in / 1650 out tokens · 85310 ms · 2026-05-10T02:03:29.991691+00:00 · methodology

