RecGOAT: Graph Optimal Adaptive Transport for LLM-Enhanced Multimodal Recommendation with Dual Semantic Alignment
Integrating large language model (LLM) representations into multimodal recommendation has shown promise, yet a fundamental challenge remains largely overlooked: the semantic heterogeneity between generative LM representations and the ID-based collaborative signals that recommendation systems rely on. Naively injecting LM features without alignment degrades recommendation performance rather than improving it. To resolve this, we propose RecGOAT, a dual-granularity semantic alignment framework built on graph neural networks and optimal transport theory. RecGOAT first enriches collaborative semantics through multimodal attentive graphs that capture item-item, user-item, and user-user relationships, initializing user representations via LLM-inferred behavioral preferences. It then aligns LM-derived modality representations with recommendation IDs at two complementary granularities: (1) instance-level alignment via cross-modal contrastive learning (CMCL), which produces discriminative per-sample representations; and (2) distribution-level alignment via optimal adaptive transport (OAT), which minimizes the 1-Wasserstein distance between ID distributions and LLM semantics to produce a unified, consistently aligned feature space. Theoretically, we prove that the unified representation achieves strictly lower target error than any single-modality representation, with the gap bounded by the Wasserstein distance and the InfoNCE loss, providing rigorous guarantees for both alignment consistency and fusion comprehensiveness. Extensive experiments on three public benchmarks demonstrate state-of-the-art performance. Deployment on a large-scale online advertising platform further validates RecGOAT's industrial scalability. Our code is available at https://github.com/6lyc/RecGOAT-LLM4Rec.
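The two alignment granularities described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see their repository for that). It assumes paired ID and LM embeddings given as plain Python lists, a cosine-similarity InfoNCE loss for the instance level, and a Sinkhorn (entropy-regularized) transport cost standing in for the 1-Wasserstein distance at the distribution level. The function names `info_nce` and `sinkhorn_cost`, the parameters `temperature`, `eps`, and `n_iters`, and the toy data are all hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def l2_normalize(v):
    # Assumes v is not the zero vector.
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def info_nce(id_emb, lm_emb, temperature=0.1):
    """Mean InfoNCE loss; id_emb[i] and lm_emb[i] form the positive pair."""
    ids = [l2_normalize(v) for v in id_emb]
    lms = [l2_normalize(v) for v in lm_emb]
    total = 0.0
    for i, anchor in enumerate(ids):
        logits = [dot(anchor, other) / temperature for other in lms]
        top = max(logits)  # subtract the max for a stable log-sum-exp
        log_denom = top + math.log(sum(math.exp(x - top) for x in logits))
        total += log_denom - logits[i]
    return total / len(ids)


def sinkhorn_cost(xs, ys, eps=0.1, n_iters=200):
    """Entropy-regularized approximation of the 1-Wasserstein distance
    between two uniformly weighted point sets (Euclidean ground cost).
    Assumption: eps is large enough that exp(-cost / eps) does not underflow."""
    n, m = len(xs), len(ys)
    cost = [[math.dist(x, y) for y in ys] for x in xs]
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the uniform marginals.
        u = [(1.0 / n) / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [(1.0 / m) / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport plan is u_i * K_ij * v_j; return its total cost.
    return sum(u[i] * kernel[i][j] * v[j] * cost[i][j]
               for i in range(n) for j in range(m))


# Toy data (invented for illustration only).
id_emb = [[1.0, 0.0], [0.0, 1.0]]
lm_aligned = [[0.9, 0.1], [0.1, 0.9]]
lm_swapped = [[0.1, 0.9], [0.9, 0.1]]
points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
shifted = [[x + 2.0, y + 2.0] for x, y in points]

if __name__ == "__main__":
    print("InfoNCE aligned:", info_nce(id_emb, lm_aligned))
    print("InfoNCE swapped:", info_nce(id_emb, lm_swapped))
    print("Sinkhorn same:  ", sinkhorn_cost(points, points))
    print("Sinkhorn shift: ", sinkhorn_cost(points, shifted))
```

In this sketch the contrastive term rewards each ID embedding for matching its own LM embedding, while the transport term compares the two embedding clouds as a whole, which mirrors the instance-level and distribution-level split in the abstract. How the two terms are weighted and combined in RecGOAT is not stated here and is left out.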
Forward citations
Cited by 2 Pith papers
- TimeMM: Time-as-Operator Spectral Filtering for Dynamic Multimodal Recommendation
  TimeMM proposes a time-as-operator spectral filtering framework with adaptive mixing and modality routing to model non-stationary multimodal user preferences in recommendation systems.
- Behavior-Guided Candidate Calibration for Multimodal Recommendation
  Behavior-guided calibration converts co-user overlap into signed evidence applied only to multimodal recommender shortlists and yields consistent gains on Amazon Baby, Sports, and Electronics datasets.