arxiv: 2509.10468 · v2 · submitted 2025-08-22 · 💻 cs.IR · cs.AI· cs.CL

Learning Decomposed Contextual Token Representations from Pretrained and Collaborative Signals for Generative Recommendation

Yifan Liu , Yaokun Liu , Zelin Li , Zhenrui Yue , Gyuseok Lee , Ruichen Yao , Yang Zhang , Dong Wang This is my paper

Pith reviewed 2026-05-18 20:59 UTC · model grok-4.3

classification 💻 cs.IR cs.AIcs.CL

keywords generative recommendationtoken representationspretrained semanticscontextual compositiondecomposed embedding fusionsemantic IDsuser interaction modeling

0 comments p. Extension

The pith

DECOR addresses objective misalignment in generative recommenders by using contextual token composition and decomposed embedding fusion to retain pretrained semantics while adapting to user interactions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper identifies a core conflict in current generative recommendation systems where item tokenization is pretrained for semantic reconstruction while the subsequent LLM training optimizes for user interaction sequences. This misalignment produces static tokens that ignore varying contexts and overwrites useful pretrained knowledge during recommender fine-tuning. DECOR introduces contextualized composition to adjust token embeddings according to interaction history and decomposed fusion to combine original codebook vectors with newly learned collaborative signals. Experiments across three datasets show consistent gains over existing baselines. A reader should care because the approach offers a direct way to keep the strengths of language-model pretraining inside recommendation pipelines without discarding them.

Core claim

The central claim is that a unified framework called DECOR learns decomposed contextual token representations by first applying contextualized token composition to refine embeddings based on user interaction context and then performing decomposed embedding fusion to integrate pretrained codebook embeddings with newly learned collaborative embeddings, thereby preserving pretrained semantics and improving token adaptability for next-item generation.

What carries the argument

DECOR framework, built on contextualized token composition that refines embeddings from interaction context plus decomposed embedding fusion that merges pretrained codebook vectors with collaborative embeddings.

If this is right

Token assignments become dynamic and reflect specific user contexts rather than remaining fixed after pretraining.
Pretrained knowledge from language models survives recommender training instead of being overwritten by interaction data.
Sequence-to-sequence generation for next items benefits from both semantic stability and collaborative adaptability.
The same two-stage pipeline can be retained while closing the performance gap to models trained end-to-end on interactions alone.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same decomposition pattern could be tested in other generative pipelines where a pretrained encoder is later adapted to a different objective, such as dialogue or code generation.
If the fusion mechanism proves stable, it might reduce the need for separate semantic ID retraining when new interaction data arrives.
The approach suggests a general template for keeping auxiliary pretraining signals alive during task-specific fine-tuning without full retraining.

Load-bearing premise

The assumption that objective misalignment between semantic reconstruction and user interaction modeling is the main source of poor tokenization and lost semantics, and that adding contextual composition and decomposed fusion fixes both problems without creating offsetting drawbacks.

What would settle it

A controlled ablation on the same three datasets showing that removing either the contextual composition step or the decomposed fusion step produces recommendation metrics no better than the strongest prior two-stage baselines.

Figures

Figures reproduced from arXiv: 2509.10468 by Dong Wang, Gyuseok Lee, Ruichen Yao, Yang Zhang, Yaokun Liu, Yifan Liu, Zelin Li, Zhenrui Yue.

**Figure 1.** Figure 1: Suboptimal static tokenization assigns identical prefix tokens to semantically distinct items (e.g., noisecanceling headphones for office use, workout, or sleep), leading to ambiguous representations that fail to reflect diverse user interaction contexts. Our method addresses this by contextually adapting token representations through compositions, which increases embedding utilization across quantization… view at source ↗

**Figure 2.** Figure 2: DECOR enhances generative recommendation via two components: decomposed embedding fusion integrates frozen [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗

**Figure 3.** Figure 3: Parameter analysis of DECOR on contextual token composition weight [PITH_FULL_IMAGE:figures/full_fig_p007_3.png] view at source ↗

**Figure 4.** Figure 4: Case study for prefix ambiguity on the Scien [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗

read the original abstract

Recent advances in generative recommenders adopt a two-stage paradigm: items are first tokenized into semantic IDs using a pretrained tokenizer, and then large language models (LLMs) are trained to generate the next item via sequence-to-sequence modeling. However, these two stages are optimized for different objectives: semantic reconstruction during tokenizer pretraining versus user interaction modeling during recommender training. This objective misalignment leads to two key limitations: (i) suboptimal static tokenization, where fixed token assignments fail to reflect diverse usage contexts; and (ii) discarded pretrained semantics, where pretrained knowledge - typically from language model embeddings - is overwritten during recommender training on user interactions. To address these limitations, we propose to learn $\underline{DE}$composed $\underline{CO}$ntextual Token $\underline{R}$epresentations (DECOR), a unified framework that preserves pretrained semantics while enhancing the adaptability of token embeddings. DECOR introduces contextualized token composition to refine token embeddings based on user interaction context, and decomposed embedding fusion that integrates pretrained codebook embeddings with newly learned collaborative embeddings. Experiments on three real-world datasets demonstrate that DECOR consistently outperforms state-of-the-art baselines in recommendation performance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes DECOR, a unified framework for generative recommendation that learns decomposed contextual token representations. It identifies objective misalignment between semantic reconstruction in pretrained tokenizers and user-interaction modeling in LLM-based recommenders as the source of two limitations: suboptimal static tokenization and overwriting of pretrained semantics. DECOR introduces contextualized token composition to adapt token embeddings to usage context and decomposed embedding fusion to combine pretrained codebook embeddings with newly learned collaborative embeddings. Experiments on three real-world datasets are reported to show consistent outperformance over state-of-the-art baselines.

Significance. If the empirical gains are shown to arise specifically from semantic preservation rather than added capacity or altered optimization, the approach could offer a practical route to retain useful pretrained knowledge while adapting to collaborative signals in generative recommenders.

major comments (2)

[Method (decomposed embedding fusion) and Experiments] The central claim that decomposed embedding fusion preserves pretrained semantics (rather than allowing collaborative signals to overwrite them) is load-bearing for the motivation and contribution. No quantitative verification—such as cosine similarity, reconstruction fidelity, or nearest-neighbor analysis between final token representations and the original pretrained embeddings—is described, leaving open the possibility that gains stem from extra parameters or training dynamics instead.
[Experiments] Table or results section: the abstract asserts consistent outperformance on three datasets, yet the provided description supplies no concrete metrics (NDCG@K, Recall@K), baseline list, statistical significance tests, or ablation isolating contextual composition versus decomposed fusion. These details are required to evaluate whether the reported gains are reliable and attributable to the proposed mechanisms.

minor comments (1)

[Abstract] The abstract introduces the acronym DECOR via underlined letters; ensure the full manuscript uses consistent acronym formatting and expands it on first use.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below and will revise the manuscript to strengthen the evidence and clarity of the experimental results.

read point-by-point responses

Referee: [Method (decomposed embedding fusion) and Experiments] The central claim that decomposed embedding fusion preserves pretrained semantics (rather than allowing collaborative signals to overwrite them) is load-bearing for the motivation and contribution. No quantitative verification—such as cosine similarity, reconstruction fidelity, or nearest-neighbor analysis between final token representations and the original pretrained embeddings—is described, leaving open the possibility that gains stem from extra parameters or training dynamics instead.

Authors: We agree that direct quantitative verification would strengthen the central claim and help rule out alternative explanations. While the manuscript emphasizes end-to-end recommendation gains, we will add a new analysis subsection in the revision that reports cosine similarity between the final fused embeddings and the original pretrained codebook embeddings, along with nearest-neighbor semantic retention examples. This will provide explicit evidence that collaborative adaptation occurs without overwriting pretrained semantics. revision: yes
Referee: [Experiments] Table or results section: the abstract asserts consistent outperformance on three datasets, yet the provided description supplies no concrete metrics (NDCG@K, Recall@K), baseline list, statistical significance tests, or ablation isolating contextual composition versus decomposed fusion. These details are required to evaluate whether the reported gains are reliable and attributable to the proposed mechanisms.

Authors: The full manuscript contains these details in Section 4 and the associated tables. We report NDCG@10 and Recall@10 on the three datasets, with comparisons against baselines including TIGER, P5, and SASRec; paired t-tests establish statistical significance; and ablation studies in Table 3 isolate contextualized token composition from decomposed embedding fusion. We will revise the abstract and introduction to explicitly reference these tables and metrics for improved accessibility. revision: partial

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper proposes new architectural components (contextualized token composition and decomposed embedding fusion) to address objective misalignment between semantic reconstruction and interaction modeling. These are described as independent mechanisms that integrate pretrained codebook embeddings with learned collaborative ones, without any equations or steps in the abstract reducing the claimed semantic preservation or performance gains to a redefinition, fit, or self-citation of the inputs by construction. The framework is presented as self-contained with novel design choices whose benefits are to be validated empirically on external datasets.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 2 invented entities

Abstract-only review limits visibility into exact hyperparameters or assumptions; the framework implicitly relies on standard transformer training assumptions and the existence of useful pretrained codebooks.

axioms (1)

domain assumption Pretrained tokenizers produce semantically meaningful codebook embeddings that remain useful when fused with collaborative signals.
Invoked in the description of decomposed embedding fusion as a way to preserve pretrained semantics.

invented entities (2)

Contextualized token composition no independent evidence
purpose: Refine token embeddings dynamically based on user interaction context
New mechanism introduced to address static tokenization limitation.
Decomposed embedding fusion no independent evidence
purpose: Integrate pretrained codebook embeddings with newly learned collaborative embeddings
New mechanism introduced to prevent overwriting of pretrained knowledge.

pith-pipeline@v0.9.0 · 5760 in / 1334 out tokens · 30074 ms · 2026-05-18T20:59:50.288987+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

DECOR introduces contextualized token composition to refine token embeddings based on user interaction context, and decomposed embedding fusion that integrates pretrained codebook embeddings with newly learned collaborative embeddings.
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We retain the pretrained codebooks from the RQ-VAE tokenizer as frozen semantic embeddings and introduce separate, learnable collaborative embeddings.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

32 extracted references · 32 canonical work pages · 2 internal anchors

[1]

, " * write output.state after.block = add.period write newline

ENTRY address archivePrefix author booktitle chapter edition editor eid eprint howpublished institution isbn journal key month note number organization pages publisher school series title type volume year label extra.label sort.label short.list INTEGERS output.state before.all mid.sentence after.sentence after.block FUNCTION init.state.consts #0 'before.a...

work page
[2]

write newline

" write newline "" before.all 'output.state := FUNCTION n.dashify 't := "" t empty not t #1 #1 substring "-" = t #1 #2 substring "--" = not "--" * t #2 global.max substring 't := t #1 #1 substring "-" = "-" * t #2 global.max substring 't := while if t #1 #1 substring * t #2 global.max substring 't := if while FUNCTION word.in bbl.in capitalize " " * FUNCT...

work page
[3]

Bao, K.; Zhang, J.; Wang, W.; Zhang, Y.; Yang, Z.; Luo, Y.; Chen, C.; Feng, F.; and Tian, Q. 2025. A bi-step grounding paradigm for large language models in recommendation systems. ACM Transactions on Recommender Systems, 3(4): 1--27

work page 2025
[4]

Chu, Z.; Hao, H.; Ouyang, X.; Wang, S.; Wang, Y.; Shen, Y.; Gu, J.; Cui, Q.; Li, L.; Xue, S.; et al. 2023. Leveraging large language models for pre-trained recommender systems. arXiv preprint arXiv:2308.10837

work page arXiv 2023
[5]

Dai, S.; Shao, N.; Zhao, H.; Yu, W.; Si, Z.; Xu, C.; Sun, Z.; Zhang, X.; and Xu, J. 2023. Uncovering chatgpt’s capabilities in recommender systems. In Proceedings of the 17th ACM Conference on Recommender Systems, 1126--1132

work page 2023
[6]

Deldjoo, Y.; He, Z.; McAuley, J.; Korikov, A.; Sanner, S.; Ramisa, A.; Vidal, R.; Sathiamoorthy, M.; Kasirzadeh, A.; and Milano, S. 2024. A Review of Modern Recommender Systems Using Generative Models (Gen-RecSys). In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD '24, 6448–6458. New York, NY, USA: Association fo...

work page 2024
[7]

Deng, J.; Wang, S.; Cai, K.; Ren, L.; Hu, Q.; Ding, W.; Luo, Q.; and Zhou, G. 2025. Onerec: Unifying retrieve and rank with generative recommender and iterative preference alignment. arXiv preprint arXiv:2502.18965

work page internal anchor Pith review Pith/arXiv arXiv 2025
[8]

Geng, S.; Liu, S.; Fu, Z.; Ge, Y.; and Zhang, Y. 2022. Recommendation as language processing (rlp): A unified pretrain, personalized prompt & predict paradigm (p5). In Proceedings of the 16th ACM conference on recommender systems, 299--315

work page 2022
[9]

Hou, Y.; He, Z.; McAuley, J.; and Zhao, W. X. 2023. Learning Vector-Quantized Item Representation for Transferable Sequential Recommenders. WWW '23, 1162–1171. New York, NY, USA: Association for Computing Machinery. ISBN 9781450394161

work page 2023
[10]

Hou, Y.; Li, J.; He, Z.; Yan, A.; Chen, X.; and McAuley, J. 2024. Bridging Language and Items for Retrieval and Recommendation. arXiv preprint arXiv:2403.03952

work page internal anchor Pith review Pith/arXiv arXiv 2024
[11]

Hua, W.; Xu, S.; Ge, Y.; and Zhang, Y. 2023. How to index item ids for recommendation foundation models. In Proceedings of the Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region, 195--204

work page 2023
[12]

Jannach, D.; and Ludewig, M. 2017. When recurrent neural networks meet the neighborhood for session-based recommendation. In Proceedings of the eleventh ACM conference on recommender systems, 306--310

work page 2017
[13]

Kang, W.-C.; and McAuley, J. 2018. Self-attentive sequential recommendation. In 2018 IEEE international conference on data mining (ICDM), 197--206. IEEE

work page 2018
[14]

Lee, D.; Kim, C.; Kim, S.; Cho, M.; and Han, W.-S. 2022. Autoregressive image generation using residual quantization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 11523--11532

work page 2022
[15]

Li, Y.; Yang, N.; Wang, L.; Wei, F.; and Li, W. 2023. Generative retrieval for conversational question answering. Information Processing & Management, 60(5): 103475

work page 2023
[16]

Liao, J.; Li, S.; Yang, Z.; Wu, J.; Yuan, Y.; and Wang, X. 2023. Llara: Aligning large language models with sequential recommenders. CoRR

work page 2023
[17]

Liao, J.; Li, S.; Yang, Z.; Wu, J.; Yuan, Y.; Wang, X.; and He, X. 2024. Llara: Large language-recommendation assistant. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 1785--1795

work page 2024
[18]

Liu, E.; Zheng, B.; Ling, C.; Hu, L.; Li, H.; and Zhao, W. X. 2025. Generative Recommender with End-to-End Learnable Item Tokenization. arXiv:2409.05546

work page arXiv 2025
[19]

Raffel, C.; Shazeer, N.; Roberts, A.; Lee, K.; Narang, S.; Matena, M.; Zhou, Y.; Li, W.; and Liu, P. J. 2020. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. Journal of Machine Learning Research, 21(140): 1--67

work page 2020
[20]

Q.; Samost, J.; Kula, M.; Chi, E

Rajput, S.; Mehta, N.; Singh, A.; Keshavan, R.; Vu, T.; Heidt, L.; Hong, L.; Tay, Y.; Tran, V. Q.; Samost, J.; Kula, M.; Chi, E. H.; and Sathiamoorthy, M. 2023. Recommender systems with generative retrieval. In Proceedings of the 37th International Conference on Neural Information Processing Systems, NIPS '23. Red Hook, NY, USA: Curran Associates Inc

work page 2023
[21]

Sun, F.; Liu, J.; Wu, J.; Pei, C.; Lin, X.; Ou, W.; and Jiang, P. 2019. BERT4Rec: Sequential recommendation with bidirectional encoder representations from transformer. In Proceedings of the 28th ACM international conference on information and knowledge management, 1441--1450

work page 2019
[22]

Tang, J.; and Wang, K. 2018. Personalized top-n sequential recommendation via convolutional sequence embedding. In Proceedings of the eleventh ACM international conference on web search and data mining, 565--573

work page 2018
[23]

van den Oord, A.; Vinyals, O.; and Kavukcuoglu, K. 2017. Neural discrete representation learning. In Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS'17, 6309–6318. Red Hook, NY, USA: Curran Associates Inc. ISBN 9781510860964

work page 2017
[24]

Wang, W.; Bao, H.; Lin, X.; Zhang, J.; Li, Y.; Feng, F.; Ng, S.-K.; and Chua, T.-S. 2024 a . Learnable Item Tokenization for Generative Recommendation. In Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, CIKM '24, 2400–2409. New York, NY, USA: Association for Computing Machinery. ISBN 9798400704369

work page 2024
[25]

Wang, Y.; Ren, Z.; Sun, W.; Yang, J.; Liang, Z.; Chen, X.; Xie, R.; Yan, S.; Zhang, X.; Ren, P.; et al. 2024 b . Enhanced generative recommendation via content and collaboration integration. CoRR

work page 2024
[26]

Wang, Y.; Xun, J.; Hong, M.; Zhu, J.; Jin, T.; Lin, W.; Li, H.; Li, L.; Xia, Y.; Zhao, Z.; et al. 2024 c . Eager: Two-stream generative recommender with behavior-semantic collaboration. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 3245--3254

work page 2024
[27]

Yin, J.; Zeng, Z.; Li, M.; Yan, H.; Li, C.; Han, W.; Zhang, J.; Liu, R.; Sun, H.; Deng, W.; Sun, F.; Zhang, Q.; Pan, S.; and Wang, S. 2025. Unleash LLM s Potential for Sequential Recommendation by Coordinating Dual Dynamic Index Mechanism. In THE WEB CONFERENCE 2025

work page 2025
[28]

Zhang, J.; Xie, R.; Hou, Y.; Zhao, X.; Lin, L.; and Wen, J.-R. 2025. Recommendation as instruction following: A large language model empowered recommendation approach. ACM Transactions on Information Systems, 43(5): 1--37

work page 2025
[29]

Zhang, Y.; Ding, H.; Shui, Z.; Ma, Y.; Zou, J.; Deoras, A.; and Wang, H. 2021. Language models as recommender systems: Evaluations and limitations

work page 2021
[30]

X.; Chen, M.; and Wen, J.-R

Zheng, B.; Hou, Y.; Lu, H.; Chen, Y.; Zhao, W. X.; Chen, M.; and Wen, J.-R. 2024. Adapting Large Language Models by Integrating Collaborative Semantics for Recommendation. In 2024 IEEE 40th International Conference on Data Engineering (ICDE), 1435--1448

work page 2024
[31]

X.; Zhu, Y.; Wang, S.; Zhang, F.; Wang, Z.; and Wen, J.-R

Zhou, K.; Wang, H.; Zhao, W. X.; Zhu, Y.; Wang, S.; Zhang, F.; Wang, Z.; and Wen, J.-R. 2020. S3-rec: Self-supervised learning for sequential recommendation with mutual information maximization. In Proceedings of the 29th ACM international conference on information & knowledge management, 1893--1902

work page 2020
[32]

Zhu, J.; Jin, M.; Liu, Q.; Qiu, Z.; Dong, Z.; and Li, X. 2024. CoST: Contrastive Quantization based Semantic Tokenization for Generative Recommendation. arXiv:2404.14774

work page arXiv 2024