RecGOAT: Graph Optimal Adaptive Transport for LLM-Enhanced Multimodal Recommendation with Dual Semantic Alignment

Chi Lu; Hengwei Ju; Kun Gai; Peng Jiang; Wei Yang; Yuecheng Li; Zeyu Song

arxiv: 2602.00682 · v2 · pith:B4JTIBYHnew · submitted 2026-01-31 · 💻 cs.IR · cs.AI

Yuecheng Li, Hengwei Ju, Zeyu Song, Wei Yang, Chi Lu, Peng Jiang, Kun Gai

keywords: alignment, recommendation, representations, recgoat, multimodal, optimal, semantic, transport

Integrating large language model (LLM) representations into multimodal recommendation has shown promise, yet a fundamental challenge remains largely overlooked: the semantic heterogeneity between generative LM representations and the ID-based collaborative signals that recommendation systems rely on. Naively injecting LM features without alignment degrades recommendation performance rather than improving it. To resolve this, we propose RecGOAT, a dual-granularity semantic alignment framework built on graph neural networks and optimal transport theory. RecGOAT first enriches collaborative semantics through multimodal attentive graphs that capture item-item, user-item, and user-user relationships, initializing user representations via LLM-inferred behavioral preferences. It then aligns LM-derived modality representations with recommendation IDs at two complementary granularities: (1) instance-level alignment via cross-modal contrastive learning (CMCL), which produces discriminative per-sample representations; and (2) distribution-level alignment via optimal adaptive transport (OAT), which minimizes the 1-Wasserstein distance between ID distributions and LLM semantics to produce a unified, consistently aligned feature space. Theoretically, we prove that the unified representation achieves strictly lower target error than any single-modality representation, with the gap bounded by the Wasserstein distance and the InfoNCE loss, providing rigorous guarantees for both alignment consistency and fusion comprehensiveness. Extensive experiments on three public benchmarks demonstrate state-of-the-art performance. Deployment on a large-scale online advertising platform further validates RecGOAT's industrial scalability. Our code is available at https://github.com/6lyc/RecGOAT-LLM4Rec.
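The two alignment granularities named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the linked repository for that). The function names `info_nce` and `sinkhorn_cost`, the temperature, the regularization strength `eps`, and the uniform weighting of samples are all assumptions made for illustration. In particular, the distribution-level term here uses entropy-regularized Sinkhorn iterations as a stand-in approximation of the 1-Wasserstein distance, which may differ from the paper's OAT solver.

```python
import numpy as np

def info_nce(id_emb, lm_emb, temperature=0.1):
    """Instance-level alignment (hypothetical helper).

    Row i of id_emb and row i of lm_emb are treated as a positive pair;
    every other row in the batch serves as a negative.
    """
    a = id_emb / np.linalg.norm(id_emb, axis=1, keepdims=True)
    b = lm_emb / np.linalg.norm(lm_emb, axis=1, keepdims=True)
    logits = a @ b.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))

def sinkhorn_cost(x, y, eps=0.1, n_iters=200):
    """Distribution-level alignment (hypothetical helper).

    Approximates the 1-Wasserstein distance between two uniformly weighted
    point clouds with entropy-regularized optimal transport (Sinkhorn).
    Assumption: Euclidean ground cost and uniform marginals.
    """
    n, m = len(x), len(y)
    cost = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=2)
    kernel = np.exp(-cost / eps)
    a = np.full(n, 1.0 / n)  # uniform source marginal
    b = np.full(m, 1.0 / m)  # uniform target marginal
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iters):
        u = a / (kernel @ v)
        v = b / (kernel.T @ u)
    plan = u[:, None] * kernel * v[None, :]  # approximate transport plan
    return float((plan * cost).sum())

# Toy data: four well-separated "ID" embeddings and shifted "LM" embeddings.
id_emb = 10.0 * np.eye(4)
lm_emb = id_emb + np.array([1.0, 0.0, 0.0, 0.0])
instance_loss = info_nce(id_emb, lm_emb)
distribution_loss = sinkhorn_cost(id_emb, lm_emb)
```

In a training loop, the two terms would typically be weighted and added to the recommendation loss; the weighting scheme is left out here because the abstract does not specify it.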

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

TimeMM: Time-as-Operator Spectral Filtering for Dynamic Multimodal Recommendation
cs.IR · 2026-04 · unverdicted · novelty 6.0

TimeMM proposes a time-as-operator spectral filtering framework with adaptive mixing and modality routing to model non-stationary multimodal user preferences in recommendation systems.

Behavior-Guided Candidate Calibration for Multimodal Recommendation
cs.IR · 2026-05 · unverdicted · novelty 5.0

Behavior-guided calibration converts co-user overlap into signed evidence applied only to multimodal recommender shortlists and yields consistent gains on Amazon Baby, Sports, and Electronics datasets.