Learning Decomposed Contextual Token Representations from Pretrained and Collaborative Signals for Generative Recommendation
Pith reviewed 2026-05-18 20:59 UTC · model grok-4.3
The pith
DECOR addresses objective misalignment in generative recommenders by using contextual token composition and decomposed embedding fusion to retain pretrained semantics while adapting to user interactions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that a unified framework called DECOR learns decomposed contextual token representations by first applying contextualized token composition to refine embeddings based on user interaction context and then performing decomposed embedding fusion to integrate pretrained codebook embeddings with newly learned collaborative embeddings, thereby preserving pretrained semantics and improving token adaptability for next-item generation.
What carries the argument
DECOR framework, built on contextualized token composition that refines embeddings from interaction context plus decomposed embedding fusion that merges pretrained codebook vectors with collaborative embeddings.
If this is right
- Token assignments become dynamic and reflect specific user contexts rather than remaining fixed after pretraining.
- Pretrained knowledge from language models survives recommender training instead of being overwritten by interaction data.
- Sequence-to-sequence generation for next items benefits from both semantic stability and collaborative adaptability.
- The same two-stage pipeline can be retained while closing the performance gap to models trained end-to-end on interactions alone.
Where Pith is reading between the lines
- The same decomposition pattern could be tested in other generative pipelines where a pretrained encoder is later adapted to a different objective, such as dialogue or code generation.
- If the fusion mechanism proves stable, it might reduce the need for separate semantic ID retraining when new interaction data arrives.
- The approach suggests a general template for keeping auxiliary pretraining signals alive during task-specific fine-tuning without full retraining.
Load-bearing premise
The assumption that objective misalignment between semantic reconstruction and user interaction modeling is the main source of poor tokenization and lost semantics, and that adding contextual composition and decomposed fusion fixes both problems without creating offsetting drawbacks.
What would settle it
A controlled ablation on the same three datasets showing that removing either the contextual composition step or the decomposed fusion step produces recommendation metrics no better than the strongest prior two-stage baselines.
Figures
read the original abstract
Recent advances in generative recommenders adopt a two-stage paradigm: items are first tokenized into semantic IDs using a pretrained tokenizer, and then large language models (LLMs) are trained to generate the next item via sequence-to-sequence modeling. However, these two stages are optimized for different objectives: semantic reconstruction during tokenizer pretraining versus user interaction modeling during recommender training. This objective misalignment leads to two key limitations: (i) suboptimal static tokenization, where fixed token assignments fail to reflect diverse usage contexts; and (ii) discarded pretrained semantics, where pretrained knowledge - typically from language model embeddings - is overwritten during recommender training on user interactions. To address these limitations, we propose to learn $\underline{DE}$composed $\underline{CO}$ntextual Token $\underline{R}$epresentations (DECOR), a unified framework that preserves pretrained semantics while enhancing the adaptability of token embeddings. DECOR introduces contextualized token composition to refine token embeddings based on user interaction context, and decomposed embedding fusion that integrates pretrained codebook embeddings with newly learned collaborative embeddings. Experiments on three real-world datasets demonstrate that DECOR consistently outperforms state-of-the-art baselines in recommendation performance.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes DECOR, a unified framework for generative recommendation that learns decomposed contextual token representations. It identifies objective misalignment between semantic reconstruction in pretrained tokenizers and user-interaction modeling in LLM-based recommenders as the source of two limitations: suboptimal static tokenization and overwriting of pretrained semantics. DECOR introduces contextualized token composition to adapt token embeddings to usage context and decomposed embedding fusion to combine pretrained codebook embeddings with newly learned collaborative embeddings. Experiments on three real-world datasets are reported to show consistent outperformance over state-of-the-art baselines.
Significance. If the empirical gains are shown to arise specifically from semantic preservation rather than added capacity or altered optimization, the approach could offer a practical route to retain useful pretrained knowledge while adapting to collaborative signals in generative recommenders.
major comments (2)
- [Method (decomposed embedding fusion) and Experiments] The central claim that decomposed embedding fusion preserves pretrained semantics (rather than allowing collaborative signals to overwrite them) is load-bearing for the motivation and contribution. No quantitative verification—such as cosine similarity, reconstruction fidelity, or nearest-neighbor analysis between final token representations and the original pretrained embeddings—is described, leaving open the possibility that gains stem from extra parameters or training dynamics instead.
- [Experiments] Table or results section: the abstract asserts consistent outperformance on three datasets, yet the provided description supplies no concrete metrics (NDCG@K, Recall@K), baseline list, statistical significance tests, or ablation isolating contextual composition versus decomposed fusion. These details are required to evaluate whether the reported gains are reliable and attributable to the proposed mechanisms.
minor comments (1)
- [Abstract] The abstract introduces the acronym DECOR via underlined letters; ensure the full manuscript uses consistent acronym formatting and expands it on first use.
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We address each major point below and will revise the manuscript to strengthen the evidence and clarity of the experimental results.
read point-by-point responses
-
Referee: [Method (decomposed embedding fusion) and Experiments] The central claim that decomposed embedding fusion preserves pretrained semantics (rather than allowing collaborative signals to overwrite them) is load-bearing for the motivation and contribution. No quantitative verification—such as cosine similarity, reconstruction fidelity, or nearest-neighbor analysis between final token representations and the original pretrained embeddings—is described, leaving open the possibility that gains stem from extra parameters or training dynamics instead.
Authors: We agree that direct quantitative verification would strengthen the central claim and help rule out alternative explanations. While the manuscript emphasizes end-to-end recommendation gains, we will add a new analysis subsection in the revision that reports cosine similarity between the final fused embeddings and the original pretrained codebook embeddings, along with nearest-neighbor semantic retention examples. This will provide explicit evidence that collaborative adaptation occurs without overwriting pretrained semantics. revision: yes
-
Referee: [Experiments] Table or results section: the abstract asserts consistent outperformance on three datasets, yet the provided description supplies no concrete metrics (NDCG@K, Recall@K), baseline list, statistical significance tests, or ablation isolating contextual composition versus decomposed fusion. These details are required to evaluate whether the reported gains are reliable and attributable to the proposed mechanisms.
Authors: The full manuscript contains these details in Section 4 and the associated tables. We report NDCG@10 and Recall@10 on the three datasets, with comparisons against baselines including TIGER, P5, and SASRec; paired t-tests establish statistical significance; and ablation studies in Table 3 isolate contextualized token composition from decomposed embedding fusion. We will revise the abstract and introduction to explicitly reference these tables and metrics for improved accessibility. revision: partial
Circularity Check
No significant circularity in derivation chain
full rationale
The paper proposes new architectural components (contextualized token composition and decomposed embedding fusion) to address objective misalignment between semantic reconstruction and interaction modeling. These are described as independent mechanisms that integrate pretrained codebook embeddings with learned collaborative ones, without any equations or steps in the abstract reducing the claimed semantic preservation or performance gains to a redefinition, fit, or self-citation of the inputs by construction. The framework is presented as self-contained with novel design choices whose benefits are to be validated empirically on external datasets.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Pretrained tokenizers produce semantically meaningful codebook embeddings that remain useful when fused with collaborative signals.
invented entities (2)
-
Contextualized token composition
no independent evidence
-
Decomposed embedding fusion
no independent evidence
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
DECOR introduces contextualized token composition to refine token embeddings based on user interaction context, and decomposed embedding fusion that integrates pretrained codebook embeddings with newly learned collaborative embeddings.
-
IndisputableMonolith/Foundation/RealityFromDistinction.leanreality_from_one_distinction unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
We retain the pretrained codebooks from the RQ-VAE tokenizer as frozen semantic embeddings and introduce separate, learnable collaborative embeddings.
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
, " * write output.state after.block = add.period write newline
ENTRY address archivePrefix author booktitle chapter edition editor eid eprint howpublished institution isbn journal key month note number organization pages publisher school series title type volume year label extra.label sort.label short.list INTEGERS output.state before.all mid.sentence after.sentence after.block FUNCTION init.state.consts #0 'before.a...
-
[2]
" write newline "" before.all 'output.state := FUNCTION n.dashify 't := "" t empty not t #1 #1 substring "-" = t #1 #2 substring "--" = not "--" * t #2 global.max substring 't := t #1 #1 substring "-" = "-" * t #2 global.max substring 't := while if t #1 #1 substring * t #2 global.max substring 't := if while FUNCTION word.in bbl.in capitalize " " * FUNCT...
-
[3]
Bao, K.; Zhang, J.; Wang, W.; Zhang, Y.; Yang, Z.; Luo, Y.; Chen, C.; Feng, F.; and Tian, Q. 2025. A bi-step grounding paradigm for large language models in recommendation systems. ACM Transactions on Recommender Systems, 3(4): 1--27
work page 2025
- [4]
-
[5]
Dai, S.; Shao, N.; Zhao, H.; Yu, W.; Si, Z.; Xu, C.; Sun, Z.; Zhang, X.; and Xu, J. 2023. Uncovering chatgpt’s capabilities in recommender systems. In Proceedings of the 17th ACM Conference on Recommender Systems, 1126--1132
work page 2023
-
[6]
Deldjoo, Y.; He, Z.; McAuley, J.; Korikov, A.; Sanner, S.; Ramisa, A.; Vidal, R.; Sathiamoorthy, M.; Kasirzadeh, A.; and Milano, S. 2024. A Review of Modern Recommender Systems Using Generative Models (Gen-RecSys). In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD '24, 6448–6458. New York, NY, USA: Association fo...
work page 2024
-
[7]
Deng, J.; Wang, S.; Cai, K.; Ren, L.; Hu, Q.; Ding, W.; Luo, Q.; and Zhou, G. 2025. Onerec: Unifying retrieve and rank with generative recommender and iterative preference alignment. arXiv preprint arXiv:2502.18965
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[8]
Geng, S.; Liu, S.; Fu, Z.; Ge, Y.; and Zhang, Y. 2022. Recommendation as language processing (rlp): A unified pretrain, personalized prompt & predict paradigm (p5). In Proceedings of the 16th ACM conference on recommender systems, 299--315
work page 2022
-
[9]
Hou, Y.; He, Z.; McAuley, J.; and Zhao, W. X. 2023. Learning Vector-Quantized Item Representation for Transferable Sequential Recommenders. WWW '23, 1162–1171. New York, NY, USA: Association for Computing Machinery. ISBN 9781450394161
work page 2023
-
[10]
Hou, Y.; Li, J.; He, Z.; Yan, A.; Chen, X.; and McAuley, J. 2024. Bridging Language and Items for Retrieval and Recommendation. arXiv preprint arXiv:2403.03952
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[11]
Hua, W.; Xu, S.; Ge, Y.; and Zhang, Y. 2023. How to index item ids for recommendation foundation models. In Proceedings of the Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region, 195--204
work page 2023
-
[12]
Jannach, D.; and Ludewig, M. 2017. When recurrent neural networks meet the neighborhood for session-based recommendation. In Proceedings of the eleventh ACM conference on recommender systems, 306--310
work page 2017
-
[13]
Kang, W.-C.; and McAuley, J. 2018. Self-attentive sequential recommendation. In 2018 IEEE international conference on data mining (ICDM), 197--206. IEEE
work page 2018
-
[14]
Lee, D.; Kim, C.; Kim, S.; Cho, M.; and Han, W.-S. 2022. Autoregressive image generation using residual quantization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 11523--11532
work page 2022
-
[15]
Li, Y.; Yang, N.; Wang, L.; Wei, F.; and Li, W. 2023. Generative retrieval for conversational question answering. Information Processing & Management, 60(5): 103475
work page 2023
-
[16]
Liao, J.; Li, S.; Yang, Z.; Wu, J.; Yuan, Y.; and Wang, X. 2023. Llara: Aligning large language models with sequential recommenders. CoRR
work page 2023
-
[17]
Liao, J.; Li, S.; Yang, Z.; Wu, J.; Yuan, Y.; Wang, X.; and He, X. 2024. Llara: Large language-recommendation assistant. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 1785--1795
work page 2024
- [18]
-
[19]
Raffel, C.; Shazeer, N.; Roberts, A.; Lee, K.; Narang, S.; Matena, M.; Zhou, Y.; Li, W.; and Liu, P. J. 2020. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. Journal of Machine Learning Research, 21(140): 1--67
work page 2020
-
[20]
Q.; Samost, J.; Kula, M.; Chi, E
Rajput, S.; Mehta, N.; Singh, A.; Keshavan, R.; Vu, T.; Heidt, L.; Hong, L.; Tay, Y.; Tran, V. Q.; Samost, J.; Kula, M.; Chi, E. H.; and Sathiamoorthy, M. 2023. Recommender systems with generative retrieval. In Proceedings of the 37th International Conference on Neural Information Processing Systems, NIPS '23. Red Hook, NY, USA: Curran Associates Inc
work page 2023
-
[21]
Sun, F.; Liu, J.; Wu, J.; Pei, C.; Lin, X.; Ou, W.; and Jiang, P. 2019. BERT4Rec: Sequential recommendation with bidirectional encoder representations from transformer. In Proceedings of the 28th ACM international conference on information and knowledge management, 1441--1450
work page 2019
-
[22]
Tang, J.; and Wang, K. 2018. Personalized top-n sequential recommendation via convolutional sequence embedding. In Proceedings of the eleventh ACM international conference on web search and data mining, 565--573
work page 2018
-
[23]
van den Oord, A.; Vinyals, O.; and Kavukcuoglu, K. 2017. Neural discrete representation learning. In Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS'17, 6309–6318. Red Hook, NY, USA: Curran Associates Inc. ISBN 9781510860964
work page 2017
-
[24]
Wang, W.; Bao, H.; Lin, X.; Zhang, J.; Li, Y.; Feng, F.; Ng, S.-K.; and Chua, T.-S. 2024 a . Learnable Item Tokenization for Generative Recommendation. In Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, CIKM '24, 2400–2409. New York, NY, USA: Association for Computing Machinery. ISBN 9798400704369
work page 2024
-
[25]
Wang, Y.; Ren, Z.; Sun, W.; Yang, J.; Liang, Z.; Chen, X.; Xie, R.; Yan, S.; Zhang, X.; Ren, P.; et al. 2024 b . Enhanced generative recommendation via content and collaboration integration. CoRR
work page 2024
-
[26]
Wang, Y.; Xun, J.; Hong, M.; Zhu, J.; Jin, T.; Lin, W.; Li, H.; Li, L.; Xia, Y.; Zhao, Z.; et al. 2024 c . Eager: Two-stream generative recommender with behavior-semantic collaboration. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 3245--3254
work page 2024
-
[27]
Yin, J.; Zeng, Z.; Li, M.; Yan, H.; Li, C.; Han, W.; Zhang, J.; Liu, R.; Sun, H.; Deng, W.; Sun, F.; Zhang, Q.; Pan, S.; and Wang, S. 2025. Unleash LLM s Potential for Sequential Recommendation by Coordinating Dual Dynamic Index Mechanism. In THE WEB CONFERENCE 2025
work page 2025
-
[28]
Zhang, J.; Xie, R.; Hou, Y.; Zhao, X.; Lin, L.; and Wen, J.-R. 2025. Recommendation as instruction following: A large language model empowered recommendation approach. ACM Transactions on Information Systems, 43(5): 1--37
work page 2025
-
[29]
Zhang, Y.; Ding, H.; Shui, Z.; Ma, Y.; Zou, J.; Deoras, A.; and Wang, H. 2021. Language models as recommender systems: Evaluations and limitations
work page 2021
-
[30]
Zheng, B.; Hou, Y.; Lu, H.; Chen, Y.; Zhao, W. X.; Chen, M.; and Wen, J.-R. 2024. Adapting Large Language Models by Integrating Collaborative Semantics for Recommendation. In 2024 IEEE 40th International Conference on Data Engineering (ICDE), 1435--1448
work page 2024
-
[31]
X.; Zhu, Y.; Wang, S.; Zhang, F.; Wang, Z.; and Wen, J.-R
Zhou, K.; Wang, H.; Zhao, W. X.; Zhu, Y.; Wang, S.; Zhang, F.; Wang, Z.; and Wen, J.-R. 2020. S3-rec: Self-supervised learning for sequential recommendation with mutual information maximization. In Proceedings of the 29th ACM international conference on information & knowledge management, 1893--1902
work page 2020
- [32]
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.