pith. sign in

arxiv: 2412.02142 · v1 · pith:XDFWNYP3new · submitted 2024-12-03 · 💻 cs.CV · cs.AI· cs.CL· cs.IR

Personalized Multimodal Large Language Models: A Survey

classification 💻 cs.CV cs.AIcs.CLcs.IR
keywords languagelargemodelsmultimodalpersonalizedmllmssurveytechniques
0
0 comments X
read the original abstract

Multimodal Large Language Models (MLLMs) have become increasingly important due to their state-of-the-art performance and ability to integrate multiple data modalities, such as text, images, and audio, to perform complex tasks with high accuracy. This paper presents a comprehensive survey on personalized multimodal large language models, focusing on their architecture, training methods, and applications. We propose an intuitive taxonomy for categorizing the techniques used to personalize MLLMs to individual users, and discuss the techniques accordingly. Furthermore, we discuss how such techniques can be combined or adapted when appropriate, highlighting their advantages and underlying rationale. We also provide a succinct summary of personalization tasks investigated in existing research, along with the evaluation metrics commonly used. Additionally, we summarize the datasets that are useful for benchmarking personalized MLLMs. Finally, we outline critical open challenges. This survey aims to serve as a valuable resource for researchers and practitioners seeking to understand and advance the development of personalized multimodal large language models.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 8 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. AlpsBench: An LLM Personalization Benchmark for Real-Dialogue Memorization and Preference Alignment

    cs.CL 2026-03 unverdicted novelty 8.0

    AlpsBench supplies 2500 real-dialogue sequences with verified memories to benchmark LLM extraction, updating, retrieval, and utilization of personalized information.

  2. Personal Visual Memory from Explicit and Implicit Evidence

    cs.CV 2026-05 unverdicted novelty 7.0

    VisualMem augments text memory with a visual module that resolves identity and durable user facts from images, outperforming prior systems on a new benchmark for explicit and implicit personal visual evidence.

  3. F-GRPO: Factorized Group-Relative Policy Optimization for Unified Candidate Generation and Ranking

    cs.LG 2026-05 unverdicted novelty 7.0

    F-GRPO factorizes group-relative policy optimization into generation and ranking phases within one autoregressive sequence, using order-invariant coverage and position-aware utility rewards to improve top-ranked perfo...

  4. OLIVIA: Online Learning via Inference-time Action Adaptation for Decision Making in LLM ReAct Agents

    cs.AI 2026-05 unverdicted novelty 7.0

    OLIVIA treats LLM agent action selection as a contextual linear bandit over frozen hidden states and applies UCB exploration to adapt online, yielding consistent gains over static ReAct and prompt-based baselines on f...

  5. Skill-CMIB: Multimodal Agent Skill for Consistent Action via Conditional Multimodal Information Bottleneck

    cs.LG 2026-05 unverdicted novelty 7.0

    CMIB uses a conditional multimodal information bottleneck to create reusable agent skills that separate verbalizable text content from predictive perceptual residuals, improving execution stability.

  6. Generate, Filter, Control, Replay: A Comprehensive Survey of Rollout Strategies for LLM Reinforcement Learning

    cs.LG 2026-04 unverdicted novelty 7.0

    This survey introduces the Generate-Filter-Control-Replay (GFCR) taxonomy to structure rollout pipelines for RL-based post-training of reasoning LLMs.

  7. PersonaVLM: Long-Term Personalized Multimodal LLMs

    cs.CL 2026-03 unverdicted novelty 6.0

    PersonaVLM adds memory extraction, multi-turn retrieval-based reasoning, and personality inference to multimodal LLMs, yielding 22.4% gains on a new long-term personalization benchmark and outperforming GPT-4o.

  8. GRADE: Graph Representation of LLM Agent Dependency and Execution

    cs.LG 2026-06 unverdicted novelty 5.0

    GRADE models any LLM agent run as a graph with execution and graded dependency edge layers to enable failure prediction and fault localization across tool, coding, and web agent corpora.