arxiv: 2605.17267 · v1 · pith:QBQNMUWYnew · submitted 2026-05-17 · 💻 cs.IR

RAGR: Review-Augmented Generative Recommendation

Pith reviewed 2026-05-19 23:16 UTC · model grok-4.3

classification 💻 cs.IR
keywords generative recommendation · sequential recommendation · review feedback · semantic IDs · direct preference optimization · user sequence modeling · next-item prediction

The pith

Interleaving review semantic IDs into item sequences lets generative models use explanatory feedback to improve next-item predictions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Traditional generative recommendation models treat user history as sequences of items only, which misses the reasons for choices that reviews often reveal. RAGR addresses this by building a single mixed sequence that alternates item semantic IDs with review semantic IDs in time order so that review information can shape the autoregressive predictions of future items. An additional alignment step based on direct preference optimization steers the model to output item tokens rather than review tokens at the positions where recommendations are needed. Experiments across three real-world datasets report consistent gains over strong generative baselines on standard ranking metrics.

Core claim

RAGR incorporates review feedback directly into the generative user sequence by interleaving item semantic IDs and review semantic IDs in chronological order, enabling review signals to participate in autoregressive next-token generation, while an Item-Centric Task Generation Alignment based on direct preference optimization preserves the pure recommendation objective by encouraging item tokens over review tokens at prediction positions.
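The interleaving described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the `Interaction` record, the token strings such as `<i_a_3>`, and the choice to place each review's semantic ID immediately after its item are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Interaction:
    """One user event (hypothetical record layout, for illustration only)."""
    timestamp: int
    item_sid: Tuple[str, ...]                      # item semantic ID tokens
    review_sid: Optional[Tuple[str, ...]] = None   # None when no review exists


def build_mixed_sequence(history: List[Interaction]) -> List[str]:
    """Interleave item and review semantic IDs in chronological order.

    Assumption: a review's tokens directly follow the tokens of the item
    it was written for.
    """
    tokens: List[str] = []
    for event in sorted(history, key=lambda e: e.timestamp):
        tokens.extend(event.item_sid)
        if event.review_sid is not None:
            tokens.extend(event.review_sid)
    return tokens


history = [
    Interaction(20, ("<i_a_5>", "<i_b_1>")),                        # no review
    Interaction(10, ("<i_a_3>", "<i_b_7>"), ("<r_a_2>", "<r_b_9>")),
]
mixed = build_mixed_sequence(history)
```

Feeding `mixed` to an autoregressive model means the review tokens of the earlier interaction sit in the context when the later item tokens are predicted, which is the mechanism the core claim relies on.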

What carries the argument

Review-Augmented User Sequence Modeling that interleaves item and review semantic IDs chronologically, paired with Item-Centric Task Generation Alignment via direct preference optimization to keep the model focused on item prediction.

If this is right

  • Review signals become first-class participants in the token-level generation process rather than separate auxiliary inputs.
  • The shared sequence allows review tokens to condition later item predictions through standard autoregressive attention.
  • Direct preference optimization can enforce task separation inside a unified generative space without retraining separate heads.
  • The resulting model produces higher accuracy on next-item prediction while still generating coherent mixed sequences.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same interleaving pattern could be tested with other explanatory signals such as star ratings or free-text tags.
  • Extending the mixed sequence to include review timestamps might allow the model to weigh recent explanations more heavily.
  • The approach hints that enriching the vocabulary with causal or evaluative tokens may help other sequential decision tasks beyond recommendation.

Load-bearing premise

Interleaving review semantic IDs lets review signals directly shape autoregressive item predictions, and the DPO alignment step keeps the recommendation focus without adding new biases or lowering item quality.

What would settle it

Rerunning the same three datasets with the review tokens removed or the DPO alignment disabled, and observing no drop (or an improvement) in next-item ranking metrics, would falsify the central claim.
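Such a comparison comes down to computing the same ranking metrics for each variant. The caption of Figure 7 names HIT@K and NDCG@K; the sketch below uses their common single-target forms for next-item prediction. The paper's exact metric definitions are not given on this page, so treat these formulas and the function names as assumptions.

```python
import math
from typing import Callable, Sequence


def hit_at_k(ranked: Sequence[str], target: str, k: int) -> float:
    """1.0 if the held-out item appears in the top-k list, else 0.0."""
    return 1.0 if target in ranked[:k] else 0.0


def ndcg_at_k(ranked: Sequence[str], target: str, k: int) -> float:
    """Single-target NDCG: 1 / log2(rank + 1) with a 1-indexed rank."""
    top = list(ranked[:k])
    if target not in top:
        return 0.0
    rank = top.index(target) + 1
    return 1.0 / math.log2(rank + 1)


def mean_metric(
    metric: Callable[[Sequence[str], str, int], float],
    rankings: Sequence[Sequence[str]],
    targets: Sequence[str],
    k: int,
) -> float:
    """Average a per-user metric over all test users."""
    return sum(metric(r, t, k) for r, t in zip(rankings, targets)) / len(targets)


# Toy example: two users, one hit at rank 2 and one miss.
rankings = [["a", "b", "c"], ["x", "y", "z"]]
targets = ["b", "q"]
hit2 = mean_metric(hit_at_k, rankings, targets, k=2)
ndcg2 = mean_metric(ndcg_at_k, rankings, targets, k=2)
```

Running the full model and each ablated variant through the same functions, on the same test split, gives the numbers the falsification test would compare.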

Figures

Figures reproduced from arXiv: 2605.17267 by Junyi Li, Sheng Zhang, Wenlin Zhang, Xiangyu Zhao, Xianneng Li, Xiaowei Qian, Yejing Wang, Yichao Wang, Yingyi Zhang, Yong Liu, Yue Feng.

Figure 1: Comparison Between Existing and Review-Augmented …
Figure 2: The overall framework of the proposed RAGR, which consists of three main stages: Tokenizer Training, Review-Augmented User Sequence Modeling, and Item-Centric Task Generation Alignment.
Figure 3: Illustration of the progressively enhanced training paradigms in the ablation study. Starting from the original item-only …
Figure 4: Comparison of Top-10 SID frequency distributions between item and review.
Figure 5: Illustration of the three tokenizers: training RQ-VAE on item text only, review text only, and both item and review text.
Figure 6: Impact of different tokenizer training strategies on downstream recommendation performance. We compare tokenizers …
Figure 7: Impact of the number of semantic ID tokens on recommendation performance. We report HIT@K and NDCG@K …
Figure 8: Sensitivity analysis of the DPO-based task alignment with respect to the preference coefficient …
Original abstract

Sequential recommendation (SR) is traditionally formulated as next-item prediction over a chronological sequence of interacted items. Although recent generative recommendation (GR) methods introduce new machinery, such as semantic IDs, autoregressive decoding, and unified token spaces, they largely inherit the same item-only modeling assumption. We argue that this design constitutes a structural bottleneck, because user decision-making is not purely behavioral: while item interactions reveal what users choose, review feedback often explains why they choose it by exposing latent evaluative factors. Motivated by this observation, we propose Review-Augmented Generative Recommendation (RAGR), a novel GR framework that incorporates review feedback directly into the generative user sequence rather than treating reviews as auxiliary side information. Specifically, RAGR introduces a Review-Augmented User Sequence Modeling mechanism that interleaves item semantic IDs and review semantic IDs in chronological order to construct a mixed behavioral-semantic sequence, enabling review signals to participate directly in autoregressive next-token generation. To preserve the recommendation objective, we further introduce an Item-Centric Task Generation Alignment strategy based on direct preference optimization (DPO), which encourages the model to favor item tokens over review tokens at prediction positions. Experiments on three real-world datasets show that RAGR yields consistent and significant gains over strong GR backbones across all metrics. Our code and data are available at https://github.com/Zhang-Yingyi/TKDE_RAGR.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes Review-Augmented Generative Recommendation (RAGR), which augments generative recommendation by interleaving review semantic IDs with item semantic IDs in chronological user sequences to enable direct participation of review signals in autoregressive next-token prediction. An Item-Centric Task Generation Alignment based on DPO is introduced to favor item tokens over review tokens at prediction positions and thereby preserve the core recommendation objective. Experiments on three real-world datasets are reported to yield consistent and significant gains over strong generative recommendation baselines across all metrics.

Significance. If the central modeling claim holds under rigorous validation, the work would be significant for generative recommendation: it directly challenges the item-only assumption by injecting evaluative review semantics into the generative process itself rather than as post-hoc side information. The open release of code and data is a clear strength that supports reproducibility and follow-on work.

major comments (2)
  1. [Experiments] Experiments section: the abstract and results claim 'consistent and significant gains' over strong GR backbones, yet no dataset statistics (user/item counts, sequence lengths), baseline reproduction details, metric definitions, ablation studies, or statistical significance tests are provided. Without these, it is impossible to verify that observed improvements arise from review-signal injection rather than sequence-length effects or altered regularization.
  2. [Method] Item-Centric Task Generation Alignment (DPO) subsection: the construction of chosen/rejected preference pairs and the precise loss weighting used to counteract review-token leakage in interleaved sequences are not specified. Because interleaving changes the token distribution seen during next-token training, any mis-specification here would make the reported metric gains ambiguous between genuine review semantics and unintended side effects of the alignment procedure.
minor comments (2)
  1. [Abstract] Abstract: the phrase 'strong GR backbones' is used without naming the specific models or citing their original papers; adding these references would improve immediate clarity.
  2. [Method] Notation: semantic ID construction for reviews is described at a high level; a short example or pseudocode would help readers understand how review text is tokenized into the shared vocabulary.
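To make the second minor comment concrete, here is a minimal sketch of how residual quantization can turn a text embedding into a short semantic ID. Figure 5's caption indicates the paper trains an RQ-VAE tokenizer; this sketch skips the learned encoder and uses fixed toy codebooks, and the token format `<r_level_code>` is a hypothetical naming scheme, not the paper's.

```python
from typing import List, Sequence

Vector = Sequence[float]


def nearest(codebook: Sequence[Vector], x: Vector) -> int:
    """Index of the codeword closest to x in squared Euclidean distance."""
    def sq_dist(c: Vector) -> float:
        return sum((a - b) ** 2 for a, b in zip(c, x))
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))


def residual_quantize(x: Vector, codebooks: Sequence[Sequence[Vector]]) -> List[int]:
    """Quantize x level by level; each level encodes what the previous left over."""
    residual = list(x)
    codes: List[int] = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes


def to_tokens(codes: Sequence[int], prefix: str) -> List[str]:
    """Map code indices to vocabulary tokens (hypothetical naming scheme)."""
    return [f"<{prefix}_{level}_{code}>" for level, code in enumerate(codes)]


# Toy setup: two levels, two codewords each, 2-dimensional embeddings.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 0.0], [0.5, -0.5]],
]
review_embedding = [1.4, 0.6]   # assumed output of some text encoder
codes = residual_quantize(review_embedding, codebooks)
review_tokens = to_tokens(codes, prefix="r")
```

Using a distinct prefix for review tokens and item tokens is one way to keep the two kinds of ID distinguishable inside a shared vocabulary; whether RAGR does this is not stated here.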

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We have addressed each major comment below with clarifications and revisions to strengthen the presentation of experimental details and methodological specifications.

  1. Referee: [Experiments] Experiments section: the abstract and results claim 'consistent and significant gains' over strong GR backbones, yet no dataset statistics (user/item counts, sequence lengths), baseline reproduction details, metric definitions, ablation studies, or statistical significance tests are provided. Without these, it is impossible to verify that observed improvements arise from review-signal injection rather than sequence-length effects or altered regularization.

    Authors: We agree that these elements are necessary for full verification. In the revised manuscript we have added a new 'Experimental Setup' subsection reporting: dataset statistics (user/item/review counts and average sequence lengths for all three datasets), baseline reproduction details including exact hyperparameters and training protocols, formal metric definitions, ablation studies that isolate the review interleaving component, and paired statistical significance tests (p < 0.05) against the GR baselines. These additions confirm that performance gains are attributable to review-signal participation rather than sequence-length or regularization artifacts. revision: yes

  2. Referee: [Method] Item-Centric Task Generation Alignment (DPO) subsection: the construction of chosen/rejected preference pairs and the precise loss weighting used to counteract review-token leakage in interleaved sequences are not specified. Because interleaving changes the token distribution seen during next-token training, any mis-specification here would make the reported metric gains ambiguous between genuine review semantics and unintended side effects of the alignment procedure.

    Authors: We acknowledge the original description was too concise. The revised subsection now explicitly defines the preference-pair construction: at each prediction position the chosen response is the ground-truth item semantic ID while rejected responses are the review semantic IDs (and other non-item tokens) that appear in the interleaved sequence. The DPO loss is applied with an explicit weighting hyper-parameter β that balances item-token preference against the standard autoregressive objective on the mixed sequence. This formulation directly counters review-token leakage while preserving the generative recommendation goal. revision: yes
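The pair construction in the second response can be sketched as follows, together with the standard DPO loss from Rafailov et al. This is an illustration under stated assumptions: `build_pairs` and its argument names are hypothetical, the log-probabilities are placeholders for sums of token log-probabilities from a policy and a frozen reference model, and β is used in its standard DPO role of scaling the log-ratio margin. How RAGR combines this term with the autoregressive loss is not specified on this page.

```python
import math
from typing import List, Sequence, Tuple

Tokens = Tuple[str, ...]


def build_pairs(
    prefix: Sequence[str],
    target_item_sid: Tokens,
    review_sids: Sequence[Tokens],
) -> List[Tuple[Tuple[str, ...], Tokens, Tokens]]:
    """One (context, chosen, rejected) triple per rejected review semantic ID.

    Chosen is the ground-truth item semantic ID at the prediction position;
    rejected candidates are review semantic IDs from the interleaved sequence.
    """
    return [(tuple(prefix), target_item_sid, rejected) for rejected in review_sids]


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> float:
    """Standard DPO loss: -log(sigmoid(beta * margin)) for a single pair."""
    margin = beta * (
        (policy_chosen_logp - ref_chosen_logp)
        - (policy_rejected_logp - ref_rejected_logp)
    )
    # -log(sigmoid(m)) equals softplus(-m); this form avoids overflow.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


pairs = build_pairs(
    prefix=["<i_a_3>", "<i_b_7>", "<r_a_2>", "<r_b_9>"],
    target_item_sid=("<i_a_5>", "<i_b_1>"),
    review_sids=[("<r_a_2>", "<r_b_9>")],
)
baseline = dpo_loss(-1.0, -2.0, -1.0, -2.0)   # policy equals reference
improved = dpo_loss(-0.5, -2.0, -1.0, -2.0)   # policy now prefers the item
```

When the policy matches the reference the loss is log 2, and it falls as the policy assigns relatively more probability to the item tokens than to the review tokens, which is the direction the alignment step is meant to push.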

Circularity Check

0 steps flagged

No circularity: empirical proposal validated on external datasets

The paper introduces RAGR as an empirical framework that interleaves review semantic IDs into generative sequences and applies DPO-based alignment to preserve the item-prediction objective. No equations, derivations, or parameter-fitting steps are presented that would reduce the claimed performance gains to quantities defined by the method's own inputs or self-referential normalizations. Validation rests on experiments across three real-world datasets against external GR backbones, rendering the central claims falsifiable outside any internal construction and free of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The framework rests on the domain assumption that reviews expose latent evaluative factors that can be usefully encoded as semantic IDs and interleaved without breaking the autoregressive recommendation objective; no new physical entities or free parameters are introduced in the abstract.

axioms (1)
  • domain assumption: User decision-making is not purely behavioral, and review feedback exposes latent evaluative factors that explain choices.
    Explicitly stated in the motivation sentences of the abstract.

pith-pipeline@v0.9.0 · 5808 in / 1223 out tokens · 46351 ms · 2026-05-19T23:16:31.126607+00:00 · methodology
