
arxiv: 2511.18740 · v2 · submitted 2025-11-24 · 💻 cs.IR

Multimodal Large Language Models with Adaptive Preference Optimization for Sequential Recommendation

Pith reviewed 2026-05-17 06:12 UTC · model grok-4.3

classification 💻 cs.IR
keywords multimodal large language models · sequential recommendation · preference optimization · hardness-aware learning · noise regularization · cross-modal bias · direct preference optimization

The pith

HaNoRec aims to improve multimodal LLM recommendation by up-weighting harder training samples and adding Gaussian noise to output logits to reduce modality bias.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a multimodal LLM framework called HaNoRec for sequential recommendation that combines supervised fine-tuning with an adapted form of direct preference optimization. It targets two problems in existing approaches: random negative samples cause the model to overfit easy cases while under-training on difficult ones, and fixed reference models in standard DPO lock in cross-modal misalignments between text and images. HaNoRec counters the first issue by estimating sample hardness on the fly and scaling optimization weights accordingly, while the second is addressed through Gaussian perturbations applied to the policy model's output logits. A sympathetic reader would care because this setup lets visual signals such as product images shape user preference modeling more reliably than text-only methods.
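For orientation, the standard DPO objective that the framework adapts (Rafailov et al., 2023) can be written as follows. The recommendation-specific reading of the symbols is an assumption made here for illustration: $x$ is the prompt built from a user's interaction history, $y_w$ is the ground-truth next item, and $y_l$ is a sampled negative. HaNoRec's own modified loss is not reproduced on this page, so this is only the baseline it departs from.

```latex
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}})
  = -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}}
    \left[ \log \sigma\!\left(
      \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
      - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
    \right) \right]
```

Here $\pi_{\mathrm{ref}}$ is frozen, which is the property the paper identifies as locking in cross-modal misalignment, and every pair $(y_w, y_l)$ contributes with the same nominal weight regardless of how hard it is.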

Core claim

HaNoRec integrates hardness-aware and noise-regularized preference optimization into multimodal LLMs for sequential recommendation. Optimization weights are adjusted dynamically according to the estimated hardness of each training sample and the policy model's current responsiveness, so harder examples receive more emphasis during training. Gaussian-perturbed distribution optimization is then applied to the output logits to strengthen cross-modal semantic consistency and reduce the modality bias inherited from the reference model.
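The reweighting idea can be illustrated with a minimal sketch. The weighting rule below (`hardness_weights`, a sigmoid of the negative DPO margin rescaled to mean 1) and the temperature `tau` are assumptions chosen for illustration; the page does not state the paper's actual estimator, and this should not be read as HaNoRec's loss.

```python
import math

def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def dpo_margin(pol_w, pol_l, ref_w, ref_l, beta=0.1):
    """Implicit reward margin: beta * (chosen log-ratio - rejected log-ratio)."""
    return beta * ((pol_w - ref_w) - (pol_l - ref_l))

def dpo_loss(margin: float) -> float:
    """Standard per-sample DPO loss: -log(sigmoid(margin))."""
    return -math.log(sigmoid(margin))

def hardness_weights(margins, tau=1.0):
    """ASSUMED scheme (not from the paper): weight = sigmoid(-margin / tau),
    rescaled so the batch mean is 1. Small or negative margins get more weight."""
    raw = [sigmoid(-m / tau) for m in margins]
    mean = sum(raw) / len(raw)
    return [r / mean for r in raw]

def weighted_dpo_loss(batch, beta=0.1, tau=1.0):
    """batch: list of (pol_w, pol_l, ref_w, ref_l) sequence log-probabilities."""
    margins = [dpo_margin(*sample, beta=beta) for sample in batch]
    weights = hardness_weights(margins, tau)  # would be treated as constants in training
    losses = [dpo_loss(m) for m in margins]
    total = sum(w * l for w, l in zip(weights, losses)) / len(batch)
    return total, weights

# Hypothetical log-probabilities for two preference pairs.
batch = [
    (-2.0, -9.0, -3.0, -6.0),  # easy: the policy already prefers the chosen item
    (-5.0, -4.0, -4.5, -5.0),  # hard: the policy currently prefers the rejected item
]
loss, weights = weighted_dpo_loss(batch, beta=1.0)
print("weights:", weights)
print("weighted loss:", loss)
```

One design note: the gradient of the standard DPO loss with respect to the margin already has magnitude `sigmoid(-margin)`, so a weight of this form amplifies an emphasis that DPO applies implicitly. Whether the paper's estimator behaves this way is not stated here.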

What carries the argument

HaNoRec, a hardness-aware and noise-regularized preference optimization method that reweights samples by estimated difficulty and applies Gaussian perturbations to output logits.

If this is right

  • Prioritizing harder examples during optimization reduces overfitting to easy, randomly sampled negatives.
  • Gaussian perturbations on output logits produce more consistent alignments between textual descriptions and visual item features.
  • The policy model becomes less constrained by biases in the fixed reference model, especially across longer user sequences.
  • Recommendation performance improves when visual signals like product images or movie posters are incorporated into preference learning.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same hardness estimation and noise regularization could be tested on non-recommendation tasks that involve long multimodal sequences.
  • If the method scales, it may reduce the need for extensive negative sampling strategies in other preference optimization pipelines.
  • Cross-modal consistency gains might translate to better handling of noisy or missing visual data in real-world recommendation settings.

Load-bearing premise

The hardness of each training sample can be estimated reliably during training, and Gaussian perturbations on logits reduce cross-modal bias without creating new instabilities or lowering overall recommendation quality.
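One rough way to probe the second half of this premise is to measure how much isotropic Gaussian noise on the logits shifts the output distribution and how often it changes the top-ranked candidate. The sketch below is an illustration under assumptions: the candidate logits, the noise scales, and the helper names (`perturb`, `noise_report`) are hypothetical, and the paper's actual perturbation scheme and variance schedule are not given on this page.

```python
import math
import random

def softmax(logits):
    """Convert logits to probabilities with the usual max-shift for stability."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def perturb(logits, sigma, rng):
    """Add isotropic Gaussian noise N(0, sigma^2) to every logit."""
    return [v + rng.gauss(0.0, sigma) for v in logits]

def argmax(values):
    return max(range(len(values)), key=values.__getitem__)

def noise_report(logits, sigma, trials=200, seed=0):
    """Return (mean KL(clean || noisy), fraction of trials where top-1 changes)."""
    rng = random.Random(seed)
    clean = softmax(logits)
    top = argmax(logits)
    kl_sum, flips = 0.0, 0
    for _ in range(trials):
        noisy_logits = perturb(logits, sigma, rng)
        kl_sum += kl_divergence(clean, softmax(noisy_logits))
        flips += int(argmax(noisy_logits) != top)
    return kl_sum / trials, flips / trials

# Hypothetical scores for four candidate items.
candidate_logits = [2.0, 1.5, 0.3, -1.0]
for sigma in (0.0, 0.1, 1.0):
    mean_kl, flip_rate = noise_report(candidate_logits, sigma)
    print(f"sigma={sigma}: mean KL={mean_kl:.4f}, top-1 flip rate={flip_rate:.2f}")
```

A high flip rate at the chosen noise scale would suggest the perturbation is large enough to disturb ranking rather than only smooth the distribution, which is the instability this premise rules out.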

What would settle it

An ablation study that removes either the hardness-based reweighting or the Gaussian logit perturbations and measures whether recommendation metrics such as NDCG or Hit Rate on standard sequential datasets remain unchanged or degrade.
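The two metrics named above can be computed as in the following sketch. It assumes a leave-one-out style evaluation with exactly one ground-truth next item per user, which is common in sequential recommendation but is not confirmed for this paper; the item identifiers and the two example ranking lists are hypothetical.

```python
import math

def hit_rate_at_k(ranked_items, target, k):
    """1.0 if the ground-truth item appears in the top-k list, else 0.0."""
    return 1.0 if target in ranked_items[:k] else 0.0

def ndcg_at_k(ranked_items, target, k):
    """With a single relevant item the ideal DCG is 1,
    so NDCG reduces to 1 / log2(rank + 1) when the item is in the top k."""
    top_k = ranked_items[:k]
    if target not in top_k:
        return 0.0
    rank = top_k.index(target) + 1  # 1-based position
    return 1.0 / math.log2(rank + 1)

def evaluate(predictions, k):
    """predictions: list of (ranked_items, target) pairs, one per test user."""
    n = len(predictions)
    hr = sum(hit_rate_at_k(r, t, k) for r, t in predictions) / n
    ndcg = sum(ndcg_at_k(r, t, k) for r, t in predictions) / n
    return hr, ndcg

# Hypothetical outputs of a full model and an ablated variant for two users.
full_model = [(["i3", "i7", "i1"], "i3"), (["i2", "i9", "i4"], "i9")]
ablated = [(["i7", "i1", "i3"], "i3"), (["i2", "i4", "i5"], "i9")]
print("full   :", evaluate(full_model, k=3))
print("ablated:", evaluate(ablated, k=3))
```

Running `evaluate` on each ablated variant against the same test users gives the comparison the settling experiment calls for.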

Figures

Figures reproduced from arXiv: 2511.18740 by Fei Liu, Le Wu, Richang Hong, Yi Zhang, Yonghui Yang, Yu Wang.

Figure 1: (a) Existing paradigms and performance comparisons of large model-based sequential recommendation; (b) Data …
Figure 2: Illustration of HaNoRec. Hardness-aware Reweighting Strategy (HaRS) scales DPO weight proportionally to sample hardness, spotlighting challenging cases. Noise-regularized Distribution Optimization (NoDO) adds Gaussian noise and adaptive KL to align title–image semantics and curb modality bias.
… where $(\mathbf{h}_j + \mathbf{x}_j)$ denotes the embedding of item $v_j$, and $(\mathbf{h}_{y^*} + \mathbf{x}_{y^*})$ represents the embedding of the corresponding …
Figure 3: Ablation studies of model variants on the all …
Figure 4: Case study on MovieLens for user 4126.
… 2, showing that NoDO effectively mitigates title–image mismatch caused by MLLM modality bias. Finally, we conduct a blind test using GPT-4o [19], asking it to judge which recommendation list better matches the user based solely on historical interactions, without access to ground truth. As shown in the gray region of the figure, GPT-4o finds that HaRS’s recommendations be…
Figure 5: Performance comparison w.r.t. different hyperpa…
Figure 6: The prompt format for SFT paradigm
Original abstract

Recent advances in Large Language Models (LLMs) have opened new avenues for sequential recommendation by enabling natural language reasoning over user behavior sequences. A common approach formulates recommendation as a language modeling task, where interaction histories are transformed into prompts and user preferences are learned via supervised fine-tuning. However, these methods operate solely in the textual modality and often miss users' fine-grained interests, especially when shaped by rich visual signals such as product images or movie posters. Multimodal Large Language Models (MLLMs) offer a promising alternative by aligning text and vision in a shared semantic space. A prevalent training paradigm applies Supervised Fine-Tuning (SFT) followed by Direct Preference Optimization (DPO) to model user preferences. Yet, two core challenges remain: 1) Imbalanced sample hardness, where random negative sampling causes overfitting on easy examples and under-training on hard ones; 2) Cross-modal semantic bias, where the fixed reference model in DPO prevents the policy model from correcting modality misalignments--especially over long sequences. To address these issues, we propose a Multimodal LLM framework that integrates Hardness-aware and Noise-regularized preference optimization for Recommendation (HaNoRec). Specifically, HaNoRec dynamically adjusts optimization weights based on both the estimated hardness of each training sample and the policy model's real-time responsiveness, prioritizing harder examples. It further introduces Gaussian-perturbed distribution optimization on output logits to enhance cross-modal semantic consistency and reduce modality bias inherited from the reference model.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes HaNoRec, a Multimodal Large Language Model framework for sequential recommendation. It integrates hardness-aware preference optimization that dynamically reweights DPO-style updates according to per-sample hardness estimates and the policy model's real-time responsiveness, together with noise-regularized optimization that applies Gaussian perturbations to output logits in order to improve cross-modal semantic consistency and reduce modality bias inherited from the fixed reference model.

Significance. If the adaptive reweighting and logit perturbation mechanisms can be shown to operate as described without introducing instabilities or circular dependencies, the work would address two practically relevant limitations of standard SFT+DPO pipelines for MLLM-based recommendation and could improve performance on visually rich item sequences.

major comments (2)
  1. [Abstract] The hardness-aware component is described as dynamically adjusting weights 'based on both the estimated hardness of each training sample and the policy model's real-time responsiveness,' yet no metric, estimator, or update rule is supplied. Because this reweighting is load-bearing for the claim of prioritizing harder examples, the absence of a concrete definition leaves open the risk that responsiveness correlates with quantities already shaped by the current policy (e.g., loss or logit magnitude), potentially creating a self-reinforcing loop rather than correcting intrinsic hardness.
  2. [Abstract] The noise-regularized component asserts that 'Gaussian-perturbed distribution optimization on output logits' will 'enhance cross-modal semantic consistency and reduce modality bias inherited from the reference model.' No derivation, variance schedule, or propagation analysis through the multimodal alignment layers is given, so it remains unclear whether the isotropic perturbation isolates and corrects modality misalignment or simply adds generic smoothing that may degrade ranking precision.
minor comments (1)
  1. The abstract would be strengthened by a single high-level equation or pseudocode block that defines the combined loss or the responsiveness measure.

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for your valuable comments on our paper. We have reviewed the concerns about the descriptions in the abstract and will update the manuscript accordingly to provide more concrete information.

Point-by-point responses
  1. Referee: [Abstract] The hardness-aware component is described as dynamically adjusting weights 'based on both the estimated hardness of each training sample and the policy model's real-time responsiveness,' yet no metric, estimator, or update rule is supplied. Because this reweighting is load-bearing for the claim of prioritizing harder examples, the absence of a concrete definition leaves open the risk that responsiveness correlates with quantities already shaped by the current policy (e.g., loss or logit magnitude), potentially creating a self-reinforcing loop rather than correcting intrinsic hardness.

    Authors: We acknowledge the need for more specificity in the abstract. The full paper provides the hardness estimator as the per-sample DPO loss and the responsiveness as the policy model's update magnitude on the sample. The reweighting is formulated to use a lagged version of the responsiveness to break any potential self-reinforcing loop. We will revise the abstract to briefly describe these elements and refer readers to Section 3 for the full details and analysis demonstrating the absence of circular dependencies. revision: yes

  2. Referee: [Abstract] The noise-regularized component asserts that 'Gaussian-perturbed distribution optimization on output logits' will 'enhance cross-modal semantic consistency and reduce modality bias inherited from the reference model.' No derivation, variance schedule, or propagation analysis through the multimodal alignment layers is given, so it remains unclear whether the isotropic perturbation isolates and corrects modality misalignment or simply adds generic smoothing that may degrade ranking precision.

    Authors: We agree that additional technical details would strengthen the abstract. In the revised version, we will include a short description of the Gaussian perturbation variance schedule and note that the analysis in Section 3.3 shows the perturbation is propagated through the multimodal layers in a way that specifically encourages correction of modality bias. We will also clarify that empirical results indicate no degradation in ranking precision. The full derivation is available in the appendix. revision: yes
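The lagging idea in the first simulated response can be sketched as follows. This is a hypothetical illustration only: the class name `LaggedReweighter`, the multiplicative combination of hardness and responsiveness, and the mean-1 normalization are all assumptions, since neither the abstract nor the simulated rebuttal gives the actual update rule.

```python
from collections import defaultdict

class LaggedReweighter:
    """Combines the CURRENT hardness of a sample with the responsiveness
    recorded the PREVIOUS time that sample was visited (assumed scheme)."""

    def __init__(self, default_responsiveness=1.0):
        # Responsiveness stored from earlier steps, keyed by sample id.
        self._lagged = defaultdict(lambda: default_responsiveness)

    def weights(self, sample_ids, hardness):
        """Weights depend only on stored (lagged) responsiveness, never on the
        value measured in the current step."""
        raw = [h * self._lagged[s] for s, h in zip(sample_ids, hardness)]
        mean = sum(raw) / len(raw)
        if mean <= 0:
            return [1.0] * len(raw)
        return [r / mean for r in raw]

    def update(self, sample_ids, responsiveness):
        """Call AFTER the optimizer step to store values for the next visit."""
        for s, r in zip(sample_ids, responsiveness):
            self._lagged[s] = r

# Hypothetical two-sample batch visited twice.
rw = LaggedReweighter()
ids = ["u1", "u2"]
hardness = [0.2, 1.0]             # e.g. per-sample DPO loss
w_first = rw.weights(ids, hardness)
rw.update(ids, [2.0, 0.5])        # e.g. measured update magnitude this step
w_second = rw.weights(ids, hardness)
print(w_first, w_second)
```

Because `weights` is called before `update`, the responsiveness measured in a step cannot influence that same step's weights, which is the loop-breaking property the response describes.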

Circularity Check

0 steps flagged

HaNoRec framework introduces adaptive weighting and logit perturbations without reducing claims to self-referential inputs or fitted predictions.

full rationale

The paper outlines a multimodal LLM approach extending SFT and DPO with hardness-aware reweighting based on sample difficulty and policy responsiveness plus Gaussian perturbations on output logits to address cross-modal bias. No equations or derivations are presented that equate a claimed prediction or optimization outcome directly to its own estimation procedure by construction. Hardness and responsiveness are described as dynamic quantities computed from training data and model behavior, but without shown reductions that make the reweighting tautological or force the noise regularization to be equivalent to the bias it targets. The approach cites standard preference optimization literature without load-bearing self-citations that import uniqueness theorems or smuggle ansatzes. The derivation remains self-contained with independent methodological content.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no explicit free parameters, axioms, or invented entities; the method description implies standard LLM fine-tuning assumptions plus two new optimization heuristics whose details and justification are not supplied.

pith-pipeline@v0.9.0 · 5573 in / 1200 out tokens · 49642 ms · 2026-05-17T06:12:29.767425+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?

  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. ProMax: Exploring the Potential of LLM-derived Profiles with Distribution Shaping for Recommender Systems

    cs.IR 2026-04 unverdicted novelty 7.0

    ProMax uses dense retrieval and dual distribution reshaping on LLM-derived profiles to guide recommender models toward preferences for unseen items, substantially boosting base model performance on public datasets.

  2. DIAURec: Dual-Intent Space Representation Optimization for Recommendation

    cs.IR 2026-04 unverdicted novelty 5.0

    DIAURec unifies intent and language modeling to reconstruct and optimize representations in prototype and distribution spaces, outperforming baselines on three datasets.

Reference graph

Works this paper leans on

76 extracted references · 76 canonical work pages · cited by 2 Pith papers · 13 internal anchors

  [1] Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al. 2023. GPT-4 Technical Report. arXiv preprint arXiv:2303.08774 (2023)
  [2] Alexander A Alemi, Ian Fischer, Joshua V Dillon, and Kevin Murphy. 2017. Deep Variational Information Bottleneck. In ICLR
  [3] Shuai Bai, Keqin Chen, Xuejing Liu, Jialin Wang, Wenbin Ge, Sibo Song, Kai Dang, Peng Wang, Shijie Wang, Jun Tang, Humen Zhong, Yuanzhi Zhu, Mingkun Yang, Zhaohai Li, Jianqiang Wan, Pengfei Wang, Wei Ding, Zheren Fu, Yiheng Xu, Jiabo Ye, Xi Zhang, Tianbao Xie, Zesen Cheng, Hang Zhang, Zhibo Yang, Haiyang Xu, and Junyang Lin. 2025. Qwen2.5-VL Technical Rep...
  [4] Zechen Bai, Pichao Wang, Tianjun Xiao, Tong He, Zongbo Han, Zheng Zhang, and Mike Zheng Shou. 2024. Hallucination of Multimodal Large Language Models: A Survey. arXiv preprint arXiv:2404.18930 (2024)
  [5] Keqin Bao, Jizhi Zhang, Wenjie Wang, Yang Zhang, Zhengyi Yang, Yanchen Luo, Chong Chen, Fuli Feng, and Qi Tian. 2025. A Bi-Step Grounding Paradigm for Large Language Models in Recommendation Systems. ACM Transactions on Recommender Systems (TORS) 3, 4 (2025), 1–27
  [6] Keqin Bao, Jizhi Zhang, Yang Zhang, Wenjie Wang, Fuli Feng, and Xiangnan He. 2023. TALLRec: An Effective and Efficient Tuning Framework to Align Large Language Model with Recommendation. In RecSys. 1007–1014
  [7] Jiawei Chen, Hongyu Lin, Xianpei Han, and Le Sun. 2024. Benchmarking Large Language Models in Retrieval-Augmented Generation. In AAAI, Vol. 38. 17754–17762
  [8] Junyoung Chung, Caglar Gulcehre, KyungHyun Cho, and Yoshua Bengio. 2014. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. arXiv preprint arXiv:1412.3555 (2014)
  [9] Sunhao Dai, Chen Xu, Shicheng Xu, Liang Pang, Zhenhua Dong, and Jun Xu. Bias and Unfairness in Information Retrieval Systems: New Challenges in the LLM Era. In KDD. 6437–6447
  [10] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In NAACL. 4171–4186
  [11] Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, et al. 2024. The Llama 3 Herd of Models. arXiv preprint arXiv:2407.21783 (2024)
  [12] Chongming Gao, Ruijun Chen, Shuai Yuan, Kexin Huang, Yuanqing Yu, and Xiangnan He. 2025. SPRec: Self-Play to Debias LLM-based Recommendation. In WWW. 5075–5084
  [13] Yunfan Gao, Tao Sheng, Youlin Xiang, Yun Xiong, Haofen Wang, and Jiawei Zhang. 2023. Chat-Rec: Towards Interactive and Explainable LLMs-augmented Recommender System. arXiv preprint arXiv:2303.14524 (2023)
  [14] Xiangnan He, Kuan Deng, Xiang Wang, Yan Li, Yongdong Zhang, and Meng Wang. 2020. LightGCN: Simplifying and Powering Graph Convolution Network for Recommendation. In SIGIR. 639–648
  [15] Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, and Tat-Seng Chua. 2017. Neural Collaborative Filtering. In WWW. 173–182
  [16] Balázs Hidasi, Alexandros Karatzoglou, Linas Baltrunas, and Domonkos Tikk. Session-based Recommendations with Recurrent Neural Networks. In ICLR
  [17] Yupeng Hou, Jiacheng Li, Zhankui He, An Yan, Xiusi Chen, and Julian McAuley. Bridging language and items for retrieval and recommendation. arXiv preprint arXiv:2403.03952 (2024)
  [18] Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, et al. 2022. LoRA: Low-rank Adaptation of Large Language Models. In ICLR
  [19] Aaron Hurst, Adam Lerer, Adam P Goucher, Adam Perelman, Aditya Ramesh, Aidan Clark, AJ Ostrow, Akila Welihinda, Alan Hayes, Alec Radford, et al. 2024. GPT-4o System Card. arXiv preprint arXiv:2410.21276 (2024)
  [20] Jiaming Ji, Mickel Liu, Josef Dai, Xuehai Pan, Chi Zhang, Ce Bian, Boyuan Chen, Ruiyang Sun, Yizhou Wang, and Yaodong Yang. 2023. BeaverTails: Towards Improved Safety Alignment of LLM via a Human-Preference Dataset. In NeurIPS, Vol. 36. 24678–24704
  [21] Wang-Cheng Kang and Julian McAuley. 2018. Self-attentive Sequential Recommendation. In ICDM. 197–206
  [22] Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei. 2020. Scaling Laws for Neural Language Models. arXiv preprint arXiv:2001.08361 (2020)
  [23] Feng Li, Renrui Zhang, Hao Zhang, Yuanhan Zhang, Bo Li, Wei Li, Zejun Ma, and Chunyuan Li. 2024. LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models. arXiv preprint arXiv:2407.07895 (2024)
  [24] Lei Li, Zhihui Xie, Mukai Li, Shunian Chen, Peiyi Wang, Liang Chen, Yazheng Yang, Benyou Wang, Lingpeng Kong, and Qi Liu. 2024. VLFeedback: A Large-Scale AI Feedback Dataset for Large Vision-Language Models Alignment. arXiv preprint arXiv:2410.09421 (2024)
  [25] Jiayi Liao, Xiangnan He, Ruobing Xie, Jiancan Wu, Yancheng Yuan, Xingwu Sun, Zhanhui Kang, and Xiang Wang. 2024. RosePO: Aligning LLM-based Recommenders with Human Values. arXiv preprint arXiv:2410.12519 (2024)
  [26] Jiayi Liao, Sihang Li, Zhengyi Yang, Jiancan Wu, Yancheng Yuan, Xiang Wang, and Xiangnan He. 2024. LLaRA: Large Language-Recommendation Assistant. In SIGIR. 1785–1795
  [27] Yuqing Liu, Yu Wang, Lichao Sun, and Philip S Yu. 2024. Rec-GPT4V: Multimodal Recommendation with Large Vision-Language Models. arXiv preprint arXiv:2402.08670 (2024)
  [28] Jinda Lu, Junkang Wu, Jinghan Li, Xiaojun Jia, Shuo Wang, YiFan Zhang, Junfeng Fang, Xiang Wang, and Xiangnan He. 2025. DAMO: Data-and Model-aware Alignment of Multi-modal LLMs. In ICML
  [29] Yu Meng, Mengzhou Xia, and Danqi Chen. 2024. SimPO: Simple Preference Optimization with a Reference-Free Reward. In NeurIPS, Vol. 37. 124198–124235
  [30] Yongxin Ni, Yu Cheng, Xiangyan Liu, Junchen Fu, Youhua Li, Xiangnan He, Yongfeng Zhang, and Fajie Yuan. 2023. A Content-driven Micro-video Recommendation Dataset at Scale. arXiv preprint arXiv:2309.15379 (2023)
  [31] Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul F Christiano, Jan Leike, and Ryan Lowe. 2022. Training Language Models to Follow Instructions with Hum...
  [32] Yingtao Peng, Chen Gao, Yu Zhang, Tangpeng Dan, Xiaoyi Du, Hengliang Luo, Yong Li, and Xiaofeng Meng. 2025. Denoising alignment with large language model for recommendation. ACM Transactions on Information Systems (TOIS) 43, 2 (2025), 1–35
  [33] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning Transferable Visual Models From Natural Language Supervision. In ICML. 8748–8763
  [34] Rafael Rafailov, Archit Sharma, Eric Mitchell, Christopher D Manning, Stefano Ermon, and Chelsea Finn. 2023. Direct Preference Optimization: Your Language Model is Secretly a Reward Model. In NeurIPS, Vol. 36. 53728–53741
  [35] Xubin Ren and Chao Huang. 2024. EasyRec: Simple yet Effective Language Models for Recommendation. arXiv preprint arXiv:2408.08821 (2024)
  [36] Xubin Ren, Jiabin Tang, Dawei Yin, Nitesh Chawla, and Chao Huang. 2024. A Survey of Large Language Models for Graphs. In KDD. 6616–6626
  [37] Xubin Ren, Wei Wei, Lianghao Xia, Lixin Su, Suqi Cheng, Junfeng Wang, Dawei Yin, and Chao Huang. 2024. Representation Learning with Large Language Models for Recommendation. In WWW. 3464–3475
  [38] Francesco Ricci, Lior Rokach, and Bracha Shapira. 2010. Introduction to Recommender Systems Handbook. In Recommender Systems Handbook. Springer, 1–35
  [39] Lei Sang, Yu Wang, Yi Zhang, Yiwen Zhang, and Xindong Wu. 2025. Intent-guided Heterogeneous Graph Contrastive Learning for Recommendation. IEEE Transactions on Knowledge and Data Engineering (TKDE) 37, 4 (2025), 1915–1929
  [40] John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal Policy Optimization Algorithms. arXiv preprint arXiv:1707.06347 (2017)
  [41] Leheng Sheng, An Zhang, Yi Zhang, Yuxin Chen, Xiang Wang, and Tat-Seng Chua. Language Representations Can be What Recommenders Need: Findings and Potentials. In ICLR
  [42] Fei Sun, Jun Liu, Jian Wu, Changhua Pei, Xiao Lin, Wenwu Ou, and Peng Jiang. BERT4Rec: Sequential Recommendation with Bidirectional Encoder Representations from Transformer. In CIKM. 1441–1450
  [43] Zhiqing Sun, Sheng Shen, Shengcao Cao, Haotian Liu, Chunyuan Li, Yikang Shen, Chuang Gan, Liang-Yan Gui, Yu-Xiong Wang, Yiming Yang, et al. 2024. Aligning Large Multimodal Models with Factually Augmented RLHF. In ACL
  [44] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is All you Need. In NeurIPS, Vol. 30
  [45] Fei Wang, Wenxuan Zhou, James Y Huang, Nan Xu, Sheng Zhang, Hoifung Poon, and Muhao Chen. 2024. mDPO: Conditional Preference Optimization for Multimodal Large Language Models. In EMNLP. 8078–8088
  [46] Peng Wang, Shuai Bai, Sinan Tan, Shijie Wang, Zhihao Fan, Jinze Bai, Keqin Chen, Xuejing Liu, Jialin Wang, Wenbin Ge, et al. 2024. Qwen2-VL: Enhancing Vision-Language Model’s Perception of the World at Any Resolution. arXiv preprint arXiv:2409.12191 (2024)
  [47] Shoujin Wang, Liang Hu, Yan Wang, Longbing Cao, Quan Z Sheng, and Mehmet Orgun. 2019. Sequential Recommender Systems: Challenges, Progress and Prospects. In IJCAI. 6332–6338
  [48] Yu Wang, Lei Sang, Yi Zhang, and Yiwen Zhang. 2025. Intent Representation Learning with Large Language Model for Recommendation. In SIGIR. 1870–1879
  [49] Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V Le, Denny Zhou, et al. 2022. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. In NeurIPS, Vol. 35. 24824–24837
  [50] Wei Wei, Xubin Ren, Jiabin Tang, Qinyong Wang, Lixin Su, Suqi Cheng, Junfeng Wang, Dawei Yin, and Chao Huang. 2024. LLMRec: Large Language Models with Graph Augmentation for Recommendation. In WSDM. 806–815
  [51] Jiayang Wu, Wensheng Gan, Zefeng Chen, Shicheng Wan, and Philip S Yu. 2023. Multimodal Large Language Models: A Survey. In BigData. IEEE, 2247–2256
  [52] Xu Xie, Fei Sun, Zhaoyang Liu, Shiwen Wu, Jinyang Gao, Jiandong Zhang, Bolin Ding, and Bin Cui. 2022. Contrastive Learning for Sequential Recommendation. In ICDE. 1259–1273
  [53] Zhihe Yang, Xufang Luo, Dongqi Han, Yunjian Xu, and Dongsheng Li. 2025. Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key. In CVPR. 10610–10620
  [54] Yuyang Ye, Zhi Zheng, Yishan Shen, Tianshu Wang, Hengruo Zhang, Peijun Zhu, Runlong Yu, Kai Zhang, and Hui Xiong. 2025. Harnessing Multimodal Large Language Models for Multimodal Sequential Recommendation. In AAAI, Vol. 39. 13069–13077
  [55] Tianyu Yu, Yuan Yao, Haoye Zhang, Taiwen He, Yifeng Han, Ganqu Cui, Jinyi Hu, Zhiyuan Liu, Hai-Tao Zheng, Maosong Sun, and Tat-Seng Chua. 2024. RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback. In CVPR. 13807–13816
  [56] Zheng Yuan, Fajie Yuan, Yu Song, Youhua Li, Junchen Fu, Fei Yang, Yunzhu Pan, and Yongxin Ni. 2023. Where to Go Next for Recommender Systems? ID- vs. Modality-based Recommender Models Revisited. In SIGIR. 2639–2649
  [57] Zhenrui Yue, Sara Rabhi, Gabriel de Souza Pereira Moreira, Dong Wang, and Even Oldridge. 2023. LlamaRec: Two-Stage Recommendation using Large Language Models for Ranking. In CIKM Workshop on Personalized Generative AI
  [58] Dan Zhang, Yangliao Geng, Wenwen Gong, Zhongang Qi, Zhiyu Chen, Xing Tang, Ying Shan, Yuxiao Dong, and Jie Tang. 2024. RecDCL: Dual Contrastive Learning for Recommendation. In WWW. 3655–3666
  [59] Jizhi Zhang, Keqin Bao, Yang Zhang, Wenjie Wang, Fuli Feng, and Xiangnan He. 2023. Is ChatGPT Fair for Recommendation? Evaluating Fairness in Large Language Model Recommendation. In RecSys. 993–999
  [60] Jinghao Zhang, Yanqiao Zhu, Qiang Liu, Shu Wu, Shuhui Wang, and Liang Wang. 2021. Mining Latent Structures for Multimedia Recommendation. In MM. 3872–3880
  [61] Yang Zhang, Fuli Feng, Jizhi Zhang, Keqin Bao, Qifan Wang, and Xiangnan He. CoLLM: Integrating Collaborative Embeddings Into Large Language Models for Recommendation. IEEE Transactions on Knowledge and Data Engineering (TKDE) 37, 5 (2025), 2329–2340
  [62] Yi Zhang, Yiwen Zhang, Yu Wang, Tong Chen, and Hongzhi Yin. 2025. Towards Distribution Matching between Collaborative and Language Spaces for Generative Recommendation. In SIGIR. 2006–2016
  [63] Zizhuo Zhang and Bang Wang. 2023. Prompt Learning for News Recommendation. In SIGIR. 227–237
  [64] Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong, et al. 2023. A Survey of Large Language Models. arXiv preprint arXiv:2303.18223 (2023)
  [65] Zhiyuan Zhao, Bin Wang, Linke Ouyang, Xiaoyi Dong, Jiaqi Wang, and Conghui He. 2023. Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization. arXiv preprint arXiv:2311.16839 (2023)
  [66] Bowen Zheng, Yupeng Hou, Hongyu Lu, Yu Chen, Wayne Xin Zhao, Ming Chen, and Ji-Rong Wen. 2024. Adapting Large Language Models by Integrating Collaborative Semantics for Recommendation. In ICDE. 1435–1448
  [67] Peilin Zhou, Chao Liu, Jing Ren, Xinfeng Zhou, Yueqi Xie, Meng Cao, Zhongtao Rao, You-Liang Huang, Dading Chong, Junling Liu, Jae Boum Kim, Shoujin Wang, Raymond Chi-Wing Wong, and Sunghun Kim. 2025. When Large Vision Language Models Meet Multimodal Sequential Recommendation: An Empirical Study. In WWW. 275–292
  [68] Yiyang Zhou, Chenhang Cui, Rafael Rafailov, Chelsea Finn, and Huaxiu Yao. 2024. Aligning Modalities in Vision Large Language Models via Preference Fine-tuning. In ICLR Workshop on Reliable and Responsible Foundation Models

Appendix

In the Appendix, we first present the pseudo-code for the complete training of the proposed HaNoRec. Subsequently, we provide...