Multimodal Large Language Models with Adaptive Preference Optimization for Sequential Recommendation

Fei Liu; Le Wu; Richang Hong; Yi Zhang; Yonghui Yang; Yu Wang

arxiv: 2511.18740 · v2 · submitted 2025-11-24 · 💻 cs.IR

Yu Wang, Yonghui Yang, Le Wu, Yi Zhang, Fei Liu, Richang Hong

Pith reviewed 2026-05-17 06:12 UTC · model grok-4.3

keywords multimodal large language models · sequential recommendation · preference optimization · hardness-aware learning · noise regularization · cross-modal bias · direct preference optimization

The pith

HaNoRec improves multimodal LLM recommendations by weighting harder training samples and adding Gaussian noise to correct modality biases.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a multimodal LLM framework called HaNoRec for sequential recommendation that combines supervised fine-tuning with an adapted form of direct preference optimization. It targets two problems in existing approaches: random negative samples cause the model to overfit easy cases while under-training on difficult ones, and fixed reference models in standard DPO lock in cross-modal misalignments between text and images. HaNoRec counters the first issue by estimating sample hardness on the fly and scaling optimization weights accordingly, while the second is addressed through Gaussian perturbations applied to the policy model's output logits. A sympathetic reader would care because this setup lets visual signals such as product images shape user preference modeling more reliably than text-only methods.

Core claim

HaNoRec integrates hardness-aware and noise-regularized preference optimization into multimodal LLMs for sequential recommendation. Optimization weights are adjusted dynamically according to the estimated hardness of each training sample and the policy model's current responsiveness, so harder examples receive more emphasis during training. Gaussian-perturbed distribution optimization is then applied to the output logits to strengthen cross-modal semantic consistency and reduce the modality bias inherited from the reference model.
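
The reweighting idea can be illustrated with a minimal sketch. The abstract does not give the paper's hardness estimator or weighting rule, so everything below is an assumption made for illustration: hardness is approximated by the standard DPO margin (a small or negative margin means a hard pair), and the names `hardness_weights`, `weighted_dpo_loss`, and the `temperature` parameter are hypothetical.

```python
import math

def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))

def dpo_margin(pol_w, pol_l, ref_w, ref_l, beta=0.1):
    """Standard DPO implicit reward margin for one preference pair.
    Inputs are sequence log-probabilities of the chosen (w) and rejected (l)
    items under the policy (pol) and the frozen reference model (ref)."""
    return beta * ((pol_w - ref_w) - (pol_l - ref_l))

def hardness_weights(margins, temperature=1.0):
    """ASSUMED hardness proxy, not the paper's estimator: a softmax over
    negative margins, rescaled so the weights average to 1."""
    scores = [-m / temperature for m in margins]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [len(margins) * e / total for e in exps]

def weighted_dpo_loss(pairs, beta=0.1, temperature=1.0):
    """Mean of per-pair DPO losses, each scaled by its hardness weight.
    In a real training loop the weights would be treated as constants."""
    margins = [dpo_margin(*p, beta=beta) for p in pairs]
    weights = hardness_weights(margins, temperature)
    losses = [-log_sigmoid(m) for m in margins]
    loss = sum(w * l for w, l in zip(weights, losses)) / len(pairs)
    return loss, weights

# Each tuple: (policy chosen, policy rejected, reference chosen, reference rejected)
pairs = [(-1.0, -5.0, -2.0, -2.0),   # easy: policy already prefers the chosen item
         (-3.0, -2.5, -2.0, -2.0)]   # hard: policy prefers the rejected item
loss, weights = weighted_dpo_loss(pairs)
```

In this toy input the second pair receives the larger weight because the policy currently ranks the rejected item higher. Whether the paper uses a margin-based proxy or some other signal is not stated in the abstract.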

What carries the argument

HaNoRec, a hardness-aware and noise-regularized preference optimization method that reweights samples by estimated difficulty and applies Gaussian perturbations to output logits.
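
To make the noise-regularization half concrete, here is a hedged sketch of a generic consistency penalty: Gaussian noise is added to a logit vector and the KL divergence between the clean and perturbed softmax distributions is averaged over a few draws. This is an illustrative stand-in, not the paper's NoDO objective, whose exact form and variance schedule are not given in the abstract; `noise_consistency_penalty`, `sigma`, and `n_samples` are assumed names.

```python
import math
import random

def softmax(logits):
    """Softmax with max-subtraction for numerical stability."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def perturb_logits(logits, sigma, rng):
    """Add independent zero-mean Gaussian noise (std sigma) to each logit."""
    return [z + rng.gauss(0.0, sigma) for z in logits]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def noise_consistency_penalty(logits, sigma=0.5, n_samples=8, seed=0):
    """Average KL between the clean distribution and noisy versions of it.
    ASSUMPTION: a generic smoothing regularizer used only for illustration."""
    rng = random.Random(seed)
    clean = softmax(logits)
    total = 0.0
    for _ in range(n_samples):
        noisy = softmax(perturb_logits(logits, sigma, rng))
        total += kl_divergence(clean, noisy)
    return total / n_samples

penalty = noise_consistency_penalty([2.0, 0.5, -1.0], sigma=0.5)
```

A penalty of this kind rewards predictions that stay stable under small logit perturbations. Whether such smoothing specifically targets modality bias, rather than acting as a general regularizer, is the question the referee report raises.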

If this is right

Prioritizing harder examples during optimization reduces overfitting to easy negatives sampled from user histories.
Gaussian perturbations on output logits produce more consistent alignments between textual descriptions and visual item features.
The policy model becomes less constrained by biases in the fixed reference model, especially across longer user sequences.
Recommendation performance improves when visual signals like product images or movie posters are incorporated into preference learning.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same hardness estimation and noise regularization could be tested on non-recommendation tasks that involve long multimodal sequences.
If the method scales, it may reduce the need for extensive negative sampling strategies in other preference optimization pipelines.
Cross-modal consistency gains might translate to better handling of noisy or missing visual data in real-world recommendation settings.

Load-bearing premise

Hardness of training samples can be estimated reliably during training and Gaussian perturbations on logits will reduce cross-modal bias without creating new instabilities or lowering overall recommendation quality.

What would settle it

An ablation study that removes either the hardness-based reweighting or the Gaussian logit perturbations and measures whether recommendation metrics such as NDCG or Hit Rate on standard sequential datasets remain unchanged or degrade.
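
For reference, the two metrics named above can be computed as follows in the common next-item setting with one ground-truth item per user. This is a minimal sketch; the function names and the input layout (a ranked list plus a single target) are assumptions, and the paper's exact evaluation protocol is not described here.

```python
import math

def hit_rate_at_k(ranked_items, target, k):
    """1.0 if the ground-truth item appears in the top-k list, else 0.0."""
    return 1.0 if target in ranked_items[:k] else 0.0

def ndcg_at_k(ranked_items, target, k):
    """NDCG@k with a single relevant item: the ideal DCG is 1,
    so the score reduces to 1 / log2(rank + 1)."""
    top_k = ranked_items[:k]
    if target not in top_k:
        return 0.0
    rank = top_k.index(target) + 1  # 1-based position
    return 1.0 / math.log2(rank + 1)

def evaluate(predictions, k=10):
    """predictions: list of (ranked_items, target) pairs, one per user."""
    n = len(predictions)
    hr = sum(hit_rate_at_k(r, t, k) for r, t in predictions) / n
    ndcg = sum(ndcg_at_k(r, t, k) for r, t in predictions) / n
    return hr, ndcg

# Toy comparison of a full model against a hypothetical ablated variant.
full = [(["a", "b", "c"], "a"), (["d", "e", "f"], "e")]
ablated = [(["b", "a", "c"], "a"), (["d", "f", "e"], "e")]
hr_full, ndcg_full = evaluate(full, k=2)
hr_abl, ndcg_abl = evaluate(ablated, k=2)
```

An ablation would compare these numbers between the full model and variants with one component removed, on the same test split and the same k.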

Figures

Figures reproduced from arXiv: 2511.18740 by Fei Liu, Le Wu, Richang Hong, Yi Zhang, Yonghui Yang, Yu Wang.

**Figure 1.** (a) Existing paradigms and performance comparisons of large model-based sequential recommendation; (b) Data …

**Figure 2.** Illustration of HaNoRec. Hardness-aware Reweighting Strategy (HaRS) scales DPO weight proportionally to sample hardness, spotlighting challenging cases. Noise-regularized Distribution Optimization (NoDO) adds Gaussian noise and adaptive KL to align title–image semantics and curb modality bias. where (h_j + x_j) denotes the embedding of item v_j, and (h_y* + x_y*) represents the embedding of the corresponding …

**Figure 3.** Ablation studies of model variants on the all …

**Figure 4.** Case study on Movielens for user 4126. … 2, showing that NoDO effectively mitigates title–image mismatch caused by MLLM modality bias. Finally, we conduct a blind test using GPT-4o [19], asking it to judge which recommendation list better matches the user based solely on historical interactions, without access to ground truth. As shown in the gray region of figure, GPT-4o finds that HaRS’s recommendations be…

**Figure 5.** Performance comparison w.r.t. different hyperpa…

**Figure 6.** The prompt format for SFT paradigm.

Original abstract

Recent advances in Large Language Models (LLMs) have opened new avenues for sequential recommendation by enabling natural language reasoning over user behavior sequences. A common approach formulates recommendation as a language modeling task, where interaction histories are transformed into prompts and user preferences are learned via supervised fine-tuning. However, these methods operate solely in the textual modality and often miss users' fine-grained interests, especially when shaped by rich visual signals such as product images or movie posters. Multimodal Large Language Models (MLLMs) offer a promising alternative by aligning text and vision in a shared semantic space. A prevalent training paradigm applies Supervised Fine-Tuning (SFT) followed by Direct Preference Optimization (DPO) to model user preferences. Yet, two core challenges remain: 1) Imbalanced sample hardness, where random negative sampling causes overfitting on easy examples and under-training on hard ones; 2) Cross-modal semantic bias, where the fixed reference model in DPO prevents the policy model from correcting modality misalignments--especially over long sequences. To address these issues, we propose a Multimodal LLM framework that integrates Hardness-aware and Noise-regularized preference optimization for Recommendation (HaNoRec). Specifically, HaNoRec dynamically adjusts optimization weights based on both the estimated hardness of each training sample and the policy model's real-time responsiveness, prioritizing harder examples. It further introduces Gaussian-perturbed distribution optimization on output logits to enhance cross-modal semantic consistency and reduce modality bias inherited from the reference model.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

HaNoRec layers hardness-aware reweighting and Gaussian logit perturbations onto DPO for multimodal sequential recommendation, but the dynamic estimation and noise steps rest on assumptions that need tighter checks.

The main point is that HaNoRec combines hardness-aware weighting and Gaussian-perturbed logit optimization with DPO to tackle imbalanced training samples and cross-modal bias in multimodal LLM sequential recommendation.

The paper does well in spotting two practical bottlenecks. Standard SFT plus DPO on MLLMs can overfit to easy negatives from random sampling and get stuck with biases from the reference model, especially over long sequences with visual data like images. The dynamic adjustment of weights using both sample hardness and the policy model's responsiveness is a sensible way to focus on harder examples. Adding Gaussian perturbations to the output logits is an attempt to improve semantic consistency across modalities.

The soft spots come from how these mechanisms are supposed to work. Estimating hardness on the fly based on responsiveness risks a self-reinforcing cycle, where the model keeps emphasizing its current errors instead of fixed difficulty, and visual features can make this worse in sequential data. The Gaussian noise is a general regularizer that may not specifically reduce inherited modality bias; it could degrade ranking quality or fail to propagate fixes through the alignment layers. Without clear ablations or error analysis isolating these effects, it's hard to see if the gains are real or just from extra tuning.

This paper is for people building or improving preference optimization methods for multimodal recommendation systems. A reader working on similar DPO adaptations would find the ideas worth considering, even if the execution needs more scrutiny. It deserves a serious referee because the targeted problems are relevant and the proposed solutions are specific enough to evaluate. I would recommend peer review, asking for more details on the hardness estimation process and experiments that test the noise perturbation's impact on bias correction versus general performance.

Referee Report

2 major / 1 minor

Summary. The paper proposes HaNoRec, a Multimodal Large Language Model framework for sequential recommendation. It integrates hardness-aware preference optimization that dynamically reweights DPO-style updates according to per-sample hardness estimates and the policy model's real-time responsiveness, together with noise-regularized optimization that applies Gaussian perturbations to output logits in order to improve cross-modal semantic consistency and reduce modality bias inherited from the fixed reference model.

Significance. If the adaptive reweighting and logit perturbation mechanisms can be shown to operate as described without introducing instabilities or circular dependencies, the work would address two practically relevant limitations of standard SFT+DPO pipelines for MLLM-based recommendation and could improve performance on visually rich item sequences.

major comments (2)

[Abstract] Abstract: the hardness-aware component is described as dynamically adjusting weights 'based on both the estimated hardness of each training sample and the policy model's real-time responsiveness,' yet no metric, estimator, or update rule is supplied. Because this reweighting is load-bearing for the claim of prioritizing harder examples, the absence of a concrete definition leaves open the risk that responsiveness correlates with quantities already shaped by the current policy (e.g., loss or logit magnitude), potentially creating a self-reinforcing loop rather than correcting intrinsic hardness.
[Abstract] Abstract: the noise-regularized component asserts that 'Gaussian-perturbed distribution optimization on output logits' will 'enhance cross-modal semantic consistency and reduce modality bias inherited from the reference model.' No derivation, variance schedule, or propagation analysis through the multimodal alignment layers is given, so it remains unclear whether the isotropic perturbation isolates and corrects modality misalignment or simply adds generic smoothing that may degrade ranking precision.

minor comments (1)

The abstract would be strengthened by a single high-level equation or pseudocode block that defines the combined loss or the responsiveness measure.
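
As an illustration of what such an equation might look like, the standard DPO objective (Rafailov et al., 2023) can be written first and then extended with a per-sample weight. The weight w_i in the second line is a placeholder assumption; the abstract does not define the paper's actual weighting or responsiveness measure.

```latex
\mathcal{L}_{\mathrm{DPO}}(\theta)
  = -\,\mathbb{E}_{(x,\, y_w,\, y_l)}
    \left[ \log \sigma\!\left(
      \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
      - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
    \right) \right]

\mathcal{L}_{\mathrm{weighted}}(\theta)
  = -\frac{1}{N} \sum_{i=1}^{N} w_i \,
    \log \sigma\!\left(
      \beta \log \frac{\pi_\theta(y_w^{(i)} \mid x^{(i)})}{\pi_{\mathrm{ref}}(y_w^{(i)} \mid x^{(i)})}
      - \beta \log \frac{\pi_\theta(y_l^{(i)} \mid x^{(i)})}{\pi_{\mathrm{ref}}(y_l^{(i)} \mid x^{(i)})}
    \right),
  \qquad w_i \ge 0
```

Here y_w and y_l are the preferred and rejected items, π_ref is the frozen reference model, and β controls how far the policy may drift from it. Setting every w_i to 1 recovers the first line averaged over the N training pairs.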

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for your valuable comments on our paper. We have reviewed the concerns about the descriptions in the abstract and will update the manuscript accordingly to provide more concrete information.

Referee: [Abstract] Abstract: the hardness-aware component is described as dynamically adjusting weights 'based on both the estimated hardness of each training sample and the policy model's real-time responsiveness,' yet no metric, estimator, or update rule is supplied. Because this reweighting is load-bearing for the claim of prioritizing harder examples, the absence of a concrete definition leaves open the risk that responsiveness correlates with quantities already shaped by the current policy (e.g., loss or logit magnitude), potentially creating a self-reinforcing loop rather than correcting intrinsic hardness.

Authors: We acknowledge the need for more specificity in the abstract. The full paper provides the hardness estimator as the per-sample DPO loss and the responsiveness as the policy model's update magnitude on the sample. The reweighting is formulated to use a lagged version of the responsiveness to break any potential self-reinforcing loop. We will revise the abstract to briefly describe these elements and refer readers to Section 3 for the full details and analysis demonstrating the absence of circular dependencies. revision: yes
Referee: [Abstract] Abstract: the noise-regularized component asserts that 'Gaussian-perturbed distribution optimization on output logits' will 'enhance cross-modal semantic consistency and reduce modality bias inherited from the reference model.' No derivation, variance schedule, or propagation analysis through the multimodal alignment layers is given, so it remains unclear whether the isotropic perturbation isolates and corrects modality misalignment or simply adds generic smoothing that may degrade ranking precision.

Authors: We agree that additional technical details would strengthen the abstract. In the revised version, we will include a short description of the Gaussian perturbation variance schedule and note that the analysis in Section 3.3 shows the perturbation is propagated through the multimodal layers in a way that specifically encourages correction of modality bias. We will also clarify that empirical results indicate no degradation in ranking precision. The full derivation is available in the appendix. revision: yes
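
The lagged reweighting mentioned in the first response could be sketched roughly as follows. This is a hypothetical illustration of the general idea only: weights for a sample are read from statistics recorded at earlier steps, never from the step currently being optimized. The class name `LaggedHardness`, the exponential moving average, and the `momentum` and `default` parameters are all assumptions and are not taken from the paper.

```python
class LaggedHardness:
    """Tracks a per-sample hardness signal from PREVIOUS steps only.

    ASSUMPTION: hardness is summarized by an exponential moving average of
    past per-sample losses; the paper's actual rule is not reproduced here.
    """

    def __init__(self, momentum=0.9, default=1.0):
        self.momentum = momentum
        self.default = default
        self.state = {}  # sample_id -> smoothed loss from earlier steps

    def weight(self, sample_id):
        """Weight for the current step, based only on earlier observations."""
        return self.state.get(sample_id, self.default)

    def update(self, sample_id, loss):
        """Record the loss observed at this step for use in later steps."""
        prev = self.state.get(sample_id)
        if prev is None:
            self.state[sample_id] = loss
        else:
            self.state[sample_id] = self.momentum * prev + (1.0 - self.momentum) * loss


tracker = LaggedHardness(momentum=0.5)
w_first = tracker.weight("user_1")   # unseen sample: falls back to the default
tracker.update("user_1", 2.0)        # loss observed after the step completes
w_second = tracker.weight("user_1")  # next visit uses the stored value
```

Calling `weight` before `update` within a step is what keeps the weight independent of the current step's loss. Whether this is enough to rule out the feedback loop the referee describes would still need to be shown empirically.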

Circularity Check

0 steps flagged

HaNoRec framework introduces adaptive weighting and logit perturbations without reducing claims to self-referential inputs or fitted predictions.

The paper outlines a multimodal LLM approach extending SFT and DPO with hardness-aware reweighting based on sample difficulty and policy responsiveness plus Gaussian perturbations on output logits to address cross-modal bias. No equations or derivations are presented that equate a claimed prediction or optimization outcome directly to its own estimation procedure by construction. Hardness and responsiveness are described as dynamic quantities computed from training data and model behavior, but without shown reductions that make the reweighting tautological or force the noise regularization to be equivalent to the bias it targets. The approach cites standard preference optimization literature without load-bearing self-citations that import uniqueness theorems or smuggle ansatzes. The derivation remains self-contained with independent methodological content.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no explicit free parameters, axioms, or invented entities; the method description implies standard LLM fine-tuning assumptions plus two new optimization heuristics whose details and justification are not supplied.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

HaRS dynamically adjusts optimization weights based on both the estimated hardness of each training sample and the policy model's real-time responsiveness... Gaussian-perturbed distribution optimization on output logits

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

We propose a Multimodal LLM framework that integrates Hardness-aware and Noise-regularized preference optimization for Recommendation (HaNoRec)

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

ProMax: Exploring the Potential of LLM-derived Profiles with Distribution Shaping for Recommender Systems
cs.IR · 2026-04 · unverdicted · novelty 7.0

ProMax uses dense retrieval and dual distribution reshaping on LLM-derived profiles to guide recommender models toward preferences for unseen items, substantially boosting base model performance on public datasets.

DIAURec: Dual-Intent Space Representation Optimization for Recommendation
cs.IR · 2026-04 · unverdicted · novelty 5.0

DIAURec unifies intent and language modeling to reconstruct and optimize representations in prototype and distribution spaces, outperforming baselines on three datasets.

Reference graph

Works this paper leans on

76 extracted references · 76 canonical work pages · cited by 2 Pith papers · 13 internal anchors

[1]

Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Floren- cia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al. 2023. Gpt-4 Technical Report.arXiv preprint arXiv:2303.08774 (2023)

work page internal anchor Pith review Pith/arXiv arXiv 2023
[2]

Alexander A Alemi, Ian Fischer, Joshua V Dillon, and Kevin Murphy. 2017. Deep Variational Information Bottleneck. InICLR

work page 2017
[3]

Shuai Bai, Keqin Chen, Xuejing Liu, Jialin Wang, Wenbin Ge, Sibo Song, Kai Dang, Peng Wang, Shijie Wang, Jun Tang, Humen Zhong, Yuanzhi Zhu, Mingkun Yang, Zhaohai Li, Jianqiang Wan, Pengfei Wang, Wei Ding, Zheren Fu, Yiheng Xu, Jiabo Ye, Xi Zhang, Tianbao Xie, Zesen Cheng, Hang Zhang, Zhibo Yang, Haiyang Xu, and Junyang Lin. 2025. Qwen2.5-VL Technical Rep...

work page internal anchor Pith review Pith/arXiv arXiv 2025
[4]

Zechen Bai, Pichao Wang, Tianjun Xiao, Tong He, Zongbo Han, Zheng Zhang, and Mike Zheng Shou. 2024. Hallucination of Multimodal Large Language Models: A Survey.arXiv preprint arXiv:2404.18930(2024)

work page internal anchor Pith review Pith/arXiv arXiv 2024
[5]

Keqin Bao, Jizhi Zhang, Wenjie Wang, Yang Zhang, Zhengyi Yang, Yanchen Luo, Chong Chen, Fuli Feng, and Qi Tian. 2025. A Bi-Step Grounding Paradigm for Large Language Models in Recommendation Systems.ACM Transactions on Recommender Systems (TORS)3, 4 (2025), 1–27

work page 2025
[6]

Keqin Bao, Jizhi Zhang, Yang Zhang, Wenjie Wang, Fuli Feng, and Xiangnan He. 2023. TallRec: An Effective and Efficient Tuning Framework to Align lLarge Language Model with Recommendation. InRecSys. 1007–1014

work page 2023
[7]

Jiawei Chen, Hongyu Lin, Xianpei Han, and Le Sun. 2024. Benchmarking Large Language Models in Retrieval-Augmented Generation. InAAAI, Vol. 38. 17754– 17762

work page 2024
[8]

Junyoung Chung, Caglar Gulcehre, KyungHyun Cho, and Yoshua Bengio. 2014. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. arXiv preprint arXiv:1412.3555(2014)

work page internal anchor Pith review Pith/arXiv arXiv 2014
[9]

Sunhao Dai, Chen Xu, Shicheng Xu, Liang Pang, Zhenhua Dong, and Jun Xu

work page
[10]

Bias and Unfairness in Information Retrieval Systems: New Challenges in the LLM Era. InKDD. 6437–6447

work page
[11]

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. Bert: Pre-training of Deep Bidirectional Transformers for Language Understanding. In NAACL. 4171–4186

work page 2019
[12]

Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, et al. 2024. The Llama 3 Herd of Models.arXiv preprint arXiv:2407.21783(2024)

work page internal anchor Pith review Pith/arXiv arXiv 2024
[13]

Chongming Gao, Ruijun Chen, Shuai Yuan, Kexin Huang, Yuanqing Yu, and Xiangnan He. 2025. SPRec: Self-Play to Debias LLM-based Recommendation. In WWW. 5075–5084

work page 2025
[14]

Yunfan Gao, Tao Sheng, Youlin Xiang, Yun Xiong, Haofen Wang, and Jiawei Zhang. 2023. Chat-Rec: Towards Interactive and Explainable LLMs-augmented Recommender System.arXiv preprint arXiv:2303.14524(2023)

work page arXiv 2023
[15]

Xiangnan He, Kuan Deng, Xiang Wang, Yan Li, Yongdong Zhang, and Meng Wang. 2020. Lightgcn: Simplifying and Powering Graph Convolution Network for Recommendation. InSIGIR. 639–648

work page 2020
[16]

Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, and Tat-Seng Chua. 2017. Neural Collaborative Filtering. InWWW. 173–182

work page 2017
[17]

Balázs Hidasi, Alexandros Karatzoglou, Linas Baltrunas, and Domonkos Tikk

work page
[18]

Session-based Recommendations with Recurrent Neural Networks. In ICLR

work page
[19]

Yupeng Hou, Jiacheng Li, Zhankui He, An Yan, Xiusi Chen, and Julian McAuley

work page
[20]

Bridging language and items for retrieval and recommendation.arXiv preprint arXiv:2403.03952(2024)

work page internal anchor Pith review Pith/arXiv arXiv 2024
[21]

Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, et al. 2022. Lora: Low-rank Adaptation of Large Language Models.. InICLR

work page 2022
[22]

Aaron Hurst, Adam Lerer, Adam P Goucher, Adam Perelman, Aditya Ramesh, Aidan Clark, AJ Ostrow, Akila Welihinda, Alan Hayes, Alec Radford, et al. 2024. GPT-4o System Card.arXiv preprint arXiv:2410.21276(2024)

work page internal anchor Pith review Pith/arXiv arXiv 2024
[23]

Jiaming Ji, Mickel Liu, Josef Dai, Xuehai Pan, Chi Zhang, Ce Bian, Boyuan Chen, Ruiyang Sun, Yizhou Wang, and Yaodong Yang. 2023. Beavertails: Towards Improved Safety Alignment of LLM via a Human-Preference Dataset. InNeurIPS, Vol. 36. 24678–24704

work page 2023
[24]

Wang-Cheng Kang and Julian McAuley. 2018. Self-attentive Sequential Recom- mendation. InICDM. 197–206

work page 2018
[25]

Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei. 2020. Scaling Laws for Neural Language Models.arXiv preprint arXiv:2001.08361(2020)

work page internal anchor Pith review Pith/arXiv arXiv 2020
[26]

Feng Li, Renrui Zhang, Hao Zhang, Yuanhan Zhang, Bo Li, Wei Li, Zejun Ma, and Chunyuan Li. 2024. LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models.arXiv preprint arXiv:2407.07895(2024)

work page internal anchor Pith review Pith/arXiv arXiv 2024
[27]

Lei Li, Zhihui Xie, Mukai Li, Shunian Chen, Peiyi Wang, Liang Chen, Yazheng Yang, Benyou Wang, Lingpeng Kong, and Qi Liu. 2024. VLFeedback: A Large- Scale AI Feedback Dataset for Large Vision-Language Models Alignment.arXiv preprint arXiv:2410.09421(2024)

work page arXiv 2024
[28]

Jiayi Liao, Xiangnan He, Ruobing Xie, Jiancan Wu, Yancheng Yuan, Xingwu Sun, Zhanhui Kang, and Xiang Wang. 2024. RosePO: Aligning LLM-based Recom- menders with Human Values.arXiv preprint arXiv:2410.12519(2024)

work page arXiv 2024
[29]

Jiayi Liao, Sihang Li, Zhengyi Yang, Jiancan Wu, Yancheng Yuan, Xiang Wang, and Xiangnan He. 2024. Llara: Large Language-Recommendation Assistant. In SIGIR. 1785–1795

work page 2024
[30]

Yuqing Liu, Yu Wang, Lichao Sun, and Philip S Yu. 2024. Rec-GPT4V: Multi- modal Recommendation with Large Vision-Language Models.arXiv preprint arXiv:2402.08670(2024)

work page arXiv 2024
[31]

Jinda Lu, Junkang Wu, Jinghan Li, Xiaojun Jia, Shuo Wang, YiFan Zhang, Junfeng Fang, Xiang Wang, and Xiangnan He. 2025. DAMO: Data-and Model-aware Alignment of Multi-modal LLMs. InICML

work page 2025
[32]

Yu Meng, Mengzhou Xia, and Danqi Chen. 2024. SimPO: Simple Preference Optimization with a Reference-Free Reward. InNeurIPS, Vol. 37. 124198–124235

work page 2024
[33]

Yongxin Ni, Yu Cheng, Xiangyan Liu, Junchen Fu, Youhua Li, Xiangnan He, Yongfeng Zhang, and Fajie Yuan. 2023. A Content-driven Micro-video Recom- mendation Dataset at Scale.arXiv preprint arXiv:2309.15379(2023)

work page arXiv 2023
[34]

Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schul- man, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul F Christiano, Jan Leike, and Ryan Lowe. 2022. Training Lan- guage Models to Follow Instructions with Hum...

work page 2022
[35]

Yingtao Peng, Chen Gao, Yu Zhang, Tangpeng Dan, Xiaoyi Du, Hengliang Luo, Yong Li, and Xiaofeng Meng. 2025. Denoising alignment with large language model for recommendation.ACM Transactions on Information Systems (TOIS)43, 2 (2025), 1–35

work page 2025
[36]

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sand- hini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al

work page
[37]

Learning Transferable Visual Models From Natural Language Supervision. InICML. 8748–8763

work page
[38]

Rafael Rafailov, Archit Sharma, Eric Mitchell, Christopher D Manning, Stefano Ermon, and Chelsea Finn. 2023. Direct Preference Optimization: Your Language Model is Secretly a Reward Model. InNeurIPS, Vol. 36. 53728–53741

work page 2023
[39]

Xubin Ren and Chao Huang. 2024. EasyRec: Simple yet Effective Language Models for Recommendation.arXiv preprint arXiv:2408.08821(2024)

work page arXiv 2024
[40]

Xubin Ren, Jiabin Tang, Dawei Yin, Nitesh Chawla, and Chao Huang. 2024. A Survey of Large Language Models for Graphs. InKDD. 6616–6626

work page 2024
[41]

Xubin Ren, Wei Wei, Lianghao Xia, Lixin Su, Suqi Cheng, Junfeng Wang, Dawei Yin, and Chao Huang. 2024. Representation Learning with Large Language Models for Recommendation. InWWW. 3464–3475

work page 2024
[42]

Francesco Ricci, Lior Rokach, and Bracha Shapira. 2010. Introduction to Rec- ommender Systems Handbook. InRecommender Systems Handbook. Springer, 1–35

work page 2010
[43]

Lei Sang, Yu Wang, Yi Zhang, Yiwen Zhang, and Xindong Wu. 2025. Intent- guided Heterogeneous Graph Contrastive Learning for Recommendation.IEEE Transactions on Knowledge and Data Engineering (TKDE)37, 4 (2025), 1915–1929

work page 2025
[44]

John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov

work page
[45]

Proximal Policy Optimization Algorithms.arXiv preprint arXiv:1707.06347 (2017)

work page internal anchor Pith review Pith/arXiv arXiv 2017
[46]

Leheng Sheng, An Zhang, Yi Zhang, Yuxin Chen, Xiang Wang, and Tat-Seng Chua

work page
[47]

Language Representations Can be What Recommenders Need: Findings and Potentials. InICLR

work page
[48]

Fei Sun, Jun Liu, Jian Wu, Changhua Pei, Xiao Lin, Wenwu Ou, and Peng Jiang

work page
[49]

BERT4Rec: Sequential Recommendation with Bidirectional Encoder Repre- sentations from Transformer. InCIKM. 1441–1450

work page
[50]

Zhiqing Sun, Sheng Shen, Shengcao Cao, Haotian Liu, Chunyuan Li, Yikang Shen, Chuang Gan, Liang-Yan Gui, Yu-Xiong Wang, Yiming Yang, et al. 2024. Aligning Large Multimodal Models with Factually Augmented RLHF. InACL

work page 2024
[51]

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Ł ukasz Kaiser, and Illia Polosukhin. 2017. Attention is All you Need. InNeurIPS, Vol. 30

work page 2017
[52]

Fei Wang, Wenxuan Zhou, James Y Huang, Nan Xu, Sheng Zhang, Hoifung Poon, and Muhao Chen. 2024. mDPO: Conditional Preference Optimization for Multimodal Large Language Models. InEMNLP. 8078–8088

work page 2024
[53]

Peng Wang, Shuai Bai, Sinan Tan, Shijie Wang, Zhihao Fan, Jinze Bai, Keqin Chen, Xuejing Liu, Jialin Wang, Wenbin Ge, et al. 2024. Qwen2-VL: Enhancing Vision-Language Model’s Perception of the World at Any Resolution.arXiv preprint arXiv:2409.12191(2024). Conference acronym ’XX, June 03–05,2018, Woodstock, NY Yu Wang, Yonghui Yang, Le Wu, Yi Zhang, and Ri...

work page internal anchor Pith review Pith/arXiv arXiv 2024
[54]

Shoujin Wang, Liang Hu, Yan Wang, Longbing Cao, Quan Z Sheng, and Mehmet Orgun. 2019. Sequential Recommender Systems: Challenges, Progress and Prospects. InIJCAI. 6332–6338

work page 2019
[55]

Yu Wang, Lei Sang, Yi Zhang, and Yiwen Zhang. 2025. Intent Representation Learning with Large Language Model for Recommendation. InSIGIR. 1870–1879

work page 2025
[56]

Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V Le, Denny Zhou, et al. 2022. Chain-of-Thought Prompting Elicits Rea- soning in Large Language Models. InNeurIPS, Vol. 35. 24824–24837

work page 2022
[57]

Wei Wei, Xubin Ren, Jiabin Tang, Qinyong Wang, Lixin Su, Suqi Cheng, Junfeng Wang, Dawei Yin, and Chao Huang. 2024. LLMRec: Large Language Models with Graph Augmentation for Recommendation. InWSDM. 806–815

work page 2024
[58]

Jiayang Wu, Wensheng Gan, Zefeng Chen, Shicheng Wan, and Philip S Yu. 2023. Multimodal Large Language Models: A Survey. InBigData. IEEE, 2247–2256

work page 2023
[59]

Xu Xie, Fei Sun, Zhaoyang Liu, Shiwen Wu, Jinyang Gao, Jiandong Zhang, Bolin Ding, and Bin Cui. 2022. Contrastive Learning for Sequential Recommendation. InICDE. 1259–1273

work page 2022
[60]

Zhihe Yang, Xufang Luo, Dongqi Han, Yunjian Xu, and Dongsheng Li. 2025. Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key. InCVPR. 10610–10620

work page 2025
[61]

Yuyang Ye, Zhi Zheng, Yishan Shen, Tianshu Wang, Hengruo Zhang, Peijun Zhu, Runlong Yu, Kai Zhang, and Hui Xiong. 2025. Harnessing Multimodal Large Language Models for Multimodal Sequential Recommendation. InAAAI, Vol. 39. 13069–13077

work page 2025
[62]

Tianyu Yu, Yuan Yao, Haoye Zhang, Taiwen He, Yifeng Han, Ganqu Cui, Jinyi Hu, Zhiyuan Liu, Hai-Tao Zheng, Maosong Sun, and Tat-Seng Chua. 2024. RLHF- V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback. InCVPR. 13807–13816

work page 2024
[63]

Zheng Yuan, Fajie Yuan, Yu Song, Youhua Li, Junchen Fu, Fei Yang, Yunzhu Pan, and Yongxin Ni. 2023. Where to Go Next for Recommender Systems? ID-vs. Modality-based Recommender Models Revisited. InSIGIR. 2639–2649

work page 2023
[64]

Zhenrui Yue, Sara Rabhi, Gabriel de Souza Pereira Moreira, Dong Wang, and Even Oldridge. 2023. LlamaRec: Two-Stage Recommendation using Large Language Models for Ranking. InCIKM Workshop on Personalized Generative AI

work page 2023
[65]

Dan Zhang, Yangliao Geng, Wenwen Gong, Zhongang Qi, Zhiyu Chen, Xing Tang, Ying Shan, Yuxiao Dong, and Jie Tang. 2024. RecDCL: Dual Contrastive Learning for Recommendation. In WWW. 3655–3666.
[66]

Jizhi Zhang, Keqin Bao, Yang Zhang, Wenjie Wang, Fuli Feng, and Xiangnan He. 2023. Is ChatGPT Fair for Recommendation? Evaluating Fairness in Large Language Model Recommendation. In RecSys. 993–999.
[67]

Jinghao Zhang, Yanqiao Zhu, Qiang Liu, Shu Wu, Shuhui Wang, and Liang Wang. 2021. Mining Latent Structures for Multimedia Recommendation. In MM. 3872–3880.
[68]

Yang Zhang, Fuli Feng, Jizhi Zhang, Keqin Bao, Qifan Wang, and Xiangnan He. 2025. CoLLM: Integrating Collaborative Embeddings Into Large Language Models for Recommendation. IEEE Transactions on Knowledge and Data Engineering (TKDE) 37, 5 (2025), 2329–2340.
[70]

Yi Zhang, Yiwen Zhang, Yu Wang, Tong Chen, and Hongzhi Yin. 2025. Towards Distribution Matching between Collaborative and Language Spaces for Generative Recommendation. In SIGIR. 2006–2016.
[71]

Zizhuo Zhang and Bang Wang. 2023. Prompt Learning for News Recommendation. In SIGIR. 227–237.
[72]

Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong, et al. 2023. A Survey of Large Language Models. arXiv preprint arXiv:2303.18223 (2023).
[73]

Zhiyuan Zhao, Bin Wang, Linke Ouyang, Xiaoyi Dong, Jiaqi Wang, and Conghui He. 2023. Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization. arXiv preprint arXiv:2311.16839 (2023).
[74]

Bowen Zheng, Yupeng Hou, Hongyu Lu, Yu Chen, Wayne Xin Zhao, Ming Chen, and Ji-Rong Wen. 2024. Adapting Large Language Models by Integrating Collaborative Semantics for Recommendation. In ICDE. 1435–1448.
[75]

Peilin Zhou, Chao Liu, Jing Ren, Xinfeng Zhou, Yueqi Xie, Meng Cao, Zhongtao Rao, You-Liang Huang, Dading Chong, Junling Liu, Jae Boum Kim, Shoujin Wang, Raymond Chi-Wing Wong, and Sunghun Kim. 2025. When Large Vision Language Models Meet Multimodal Sequential Recommendation: An Empirical Study. In WWW. 275–292.
[76]

Yiyang Zhou, Chenhang Cui, Rafael Rafailov, Chelsea Finn, and Huaxiu Yao. 2024. Aligning Modalities in Vision Large Language Models via Preference Fine-tuning. In ICLR Workshop on Reliable and Responsible Foundation Models.

Appendix

In the Appendix, we first present the pseudo-code for the complete training of the proposed HaNoRec. Subsequently, we provide...
