A Survey of Personalized Large Language Models: Progress and Future Directions

2025 · arXiv:2502.11528

Canonical reference. 80% of citing Pith papers cite this work as background.

21 Pith papers citing it

citation-role summary

background 5

citation-polarity summary

background 4 · support 1

representative citing papers

AlpsBench: An LLM Personalization Benchmark for Real-Dialogue Memorization and Preference Alignment

cs.CL · 2026-03-09 · unverdicted · novelty 8.0

AlpsBench supplies 2500 real-dialogue sequences with verified memories to benchmark LLM extraction, updating, retrieval, and utilization of personalized information.

VitaBench 2.0: Evaluating Personalized and Proactive Agents in Long-Term User Interactions

cs.AI · 2026-05-26 · unverdicted · novelty 7.0

VitaBench 2.0 introduces a benchmark for long-term personalized and proactive agent behavior, with results indicating substantial gaps in current frontier LLMs.

Test-Time Personalization: A Diagnostic Framework and Probabilistic Fix for Scaling Failures

cs.LG · 2026-05-09 · unverdicted · novelty 7.0

Test-time scaling for personalized LLMs follows a logarithmic utility curve under oracle selection but standard reward models suffer user-level collapse and query-level hacking; a probabilistic reward model with learned variance enables consistent scaling.

ReLay: Personalized LLM-Generated Plain-Language Summaries for Better Understanding, but at What Cost?

cs.CL · 2026-05-01 · unverdicted · novelty 7.0

Personalized LLM-generated plain language summaries improve lay readers' comprehension and quality ratings but increase risks of reinforcing biases and introducing hallucinations compared to static expert summaries.

Response-Aware User Memory Selection for LLM Personalization

cs.AI · 2026-04-15 · unverdicted · novelty 7.0

RUMS selects LLM user memory via mutual information with model outputs to reduce response uncertainty, outperforming similarity-based methods in human alignment and response quality with up to 95% lower cost.

Retrieval-Augmented Personalization with Foundation Models for Wearable Stress Detection

cs.LG · 2026-06-23 · unverdicted · novelty 6.0

Retrieval from out-of-domain foundation models enables personalization of a lightweight transformer for stress detection, yielding +3.92% accuracy and +4.76% F1 gains on WESAD without user labels.

When Should Memory Stay Silent: Measuring Memory-Use Boundaries in Memory-Augmented Conversational Agents

cs.AI · 2026-06-04 · unverdicted · novelty 6.0

RBI-Eval shows LLMs integrate sensitive memory under benign prompts at rates 8.9-82.9% higher than no-memory baselines, with retrieval systems reducing but not eliminating the effect.

Beyond Similarity: Trustworthy Memory Search for Personal AI Agents

cs.AI · 2026-06-04 · unverdicted · novelty 6.0

MemGate is a 9M-parameter neural gate inserted between vector memory and LLM that converts similarity search into task-conditioned admission, reducing memory-induced threats across agent frameworks while preserving utility.

A Reflective Storytelling Agent for Older Adults: Integrating Argumentation Schemes and Argument Mining in LLM-Based Personalised Narratives

cs.AI · 2026-05-11 · unverdicted · novelty 6.0

The work develops a reflective LLM-based storytelling agent for older adults that integrates argumentation schemes and argument mining with knowledge graphs and user modeling to generate and inspect personalized health narratives, evaluated through expert design and user studies showing recognition,

An Annotation Scheme and Classifier for Personal Facts in Dialogue

cs.CL · 2026-05-11 · accept · novelty 6.0

An extended annotation scheme with new categories and attributes plus a Gemma-300M-based multi-head classifier achieves 81.6% macro F1 on personal fact classification, outperforming few-shot LLM baselines by nearly 9 points with lower compute.

A Survey on LLM-based Conversational User Simulation

cs.CL · 2026-04-27 · unverdicted · novelty 6.0

A survey that introduces a taxonomy for LLM-based conversational user simulation, analyzes core techniques and evaluation methods, and identifies open challenges in the field.

Alignment has a Fantasia Problem

cs.AI · 2026-04-23 · unverdicted · novelty 6.0

AI alignment must move beyond assuming users have fully formed goals and instead provide active cognitive support to help form and refine intent over time.

Enhancing Zero-shot Personalized Image Aesthetics Assessment with Profile-aware Multimodal LLM

cs.CV · 2026-04-19 · unverdicted · novelty 6.0

P-MLLM augments a frozen LLM with selective fusion modules to incorporate visual information in a profile-conditioned manner for competitive zero-shot PIAA performance.

Mobile GUI Agent Privacy Personalization with Trajectory Induced Preference Optimization

cs.AI · 2026-04-13 · unverdicted · novelty 6.0

TIPO applies preference-intensity weighting and padding gating to stabilize preference optimization for privacy personalization in mobile GUI agents, yielding higher alignment and distinction metrics than prior methods.

TSUBASA: Improving Long-Horizon Personalization via Evolving Memory and Self-Learning with Context Distillation

cs.CL · 2026-04-09 · unverdicted · novelty 6.0

TSUBASA improves long-horizon personalization in LLMs via dynamic memory evolution for writing and context-distillation self-learning for reading, outperforming Mem0 and Memory-R1 on Qwen-3 benchmarks while reducing token use.

PersonaVLM: Long-Term Personalized Multimodal LLMs

cs.CL · 2026-03-20 · unverdicted · novelty 6.0

PersonaVLM adds memory extraction, multi-turn retrieval-based reasoning, and personality inference to multimodal LLMs, yielding 22.4% gains on a new long-term personalization benchmark and outperforming GPT-4o.

AI and My Values: User Perceptions of LLMs' Ability to Extract, Embody, and Explain Human Values from Casual Conversations

cs.HC · 2026-01-30 · unverdicted · novelty 6.0

13 participants became convinced AI understands human values after chatbot interactions evaluated with the VAPT toolkit.

Beyond Retrieval: Learning Compact User Representations for Scalable LLM Personalization

cs.IR · 2026-06-03 · unverdicted · novelty 5.0

TAP-PER encodes user preferences as lightweight learnable prefix embeddings that outperform prompt-based and adapter-based baselines on LaMP tasks with 130x fewer per-user parameters.

The Personalization Paradox: Semantic Loss vs. Reasoning Gains in Agentic AI Q&A

cs.IR · 2025-12-04 · unverdicted · novelty 5.0

Personalization in an agentic RAG advising system boosts reasoning quality and grounding while reducing semantic metric scores due to the inability of current metrics to accommodate user-specific responses.

Fair Agents: Balancing Multistakeholder Alignment in Multi-Agent Personalization Systems

cs.IR · 2026-05-04 · unverdicted · novelty 4.0

The authors propose a conceptual framework integrating stakeholder-LLM alignment methods, social choice-based aggregation for collective decisions, and stakeholder-centric evaluations to achieve fair multi-agent personalization.

Towards trustworthy agentic AI: a comprehensive survey of safety, robustness, privacy, and system security

cs.AI · 2026-05-17 · unverdicted · novelty 2.0

A survey that maps risks along the agent workflow and consolidates metrics and benchmarks for safety, robustness, privacy, and security in agentic AI.

citing papers explorer

Showing 20 of 20 citing papers after filters.

AlpsBench: An LLM Personalization Benchmark for Real-Dialogue Memorization and Preference Alignment · cs.CL · 2026-03-09 · unverdicted · none · ref 23
VitaBench 2.0: Evaluating Personalized and Proactive Agents in Long-Term User Interactions · cs.AI · 2026-05-26 · unverdicted · none · ref 8
Test-Time Personalization: A Diagnostic Framework and Probabilistic Fix for Scaling Failures · cs.LG · 2026-05-09 · unverdicted · none · ref 1
ReLay: Personalized LLM-Generated Plain-Language Summaries for Better Understanding, but at What Cost? · cs.CL · 2026-05-01 · unverdicted · none · ref 54
Response-Aware User Memory Selection for LLM Personalization · cs.AI · 2026-04-15 · unverdicted · none · ref 5
Retrieval-Augmented Personalization with Foundation Models for Wearable Stress Detection · cs.LG · 2026-06-23 · unverdicted · none · ref 117
When Should Memory Stay Silent: Measuring Memory-Use Boundaries in Memory-Augmented Conversational Agents · cs.AI · 2026-06-04 · unverdicted · none · ref 26
Beyond Similarity: Trustworthy Memory Search for Personal AI Agents · cs.AI · 2026-06-04 · unverdicted · none · ref 1
A Reflective Storytelling Agent for Older Adults: Integrating Argumentation Schemes and Argument Mining in LLM-Based Personalised Narratives · cs.AI · 2026-05-11 · unverdicted · none · ref 42
A Survey on LLM-based Conversational User Simulation · cs.CL · 2026-04-27 · unverdicted · none · ref 14
Alignment has a Fantasia Problem · cs.AI · 2026-04-23 · unverdicted · none · ref 14
Enhancing Zero-shot Personalized Image Aesthetics Assessment with Profile-aware Multimodal LLM · cs.CV · 2026-04-19 · unverdicted · none · ref 11
Mobile GUI Agent Privacy Personalization with Trajectory Induced Preference Optimization · cs.AI · 2026-04-13 · unverdicted · none · ref 18
TSUBASA: Improving Long-Horizon Personalization via Evolving Memory and Self-Learning with Context Distillation · cs.CL · 2026-04-09 · unverdicted · none · ref 31
PersonaVLM: Long-Term Personalized Multimodal LLMs · cs.CL · 2026-03-20 · unverdicted · none · ref 24
AI and My Values: User Perceptions of LLMs' Ability to Extract, Embody, and Explain Human Values from Casual Conversations · cs.HC · 2026-01-30 · unverdicted · none · ref 60
Beyond Retrieval: Learning Compact User Representations for Scalable LLM Personalization · cs.IR · 2026-06-03 · unverdicted · none · ref 107
The Personalization Paradox: Semantic Loss vs. Reasoning Gains in Agentic AI Q&A · cs.IR · 2025-12-04 · unverdicted · none · ref 18
Fair Agents: Balancing Multistakeholder Alignment in Multi-Agent Personalization Systems · cs.IR · 2026-05-04 · unverdicted · none · ref 7
Towards trustworthy agentic AI: a comprehensive survey of safety, robustness, privacy, and system security · cs.AI · 2026-05-17 · unverdicted · none · ref 203
