LaMP: When Large Language Models Meet Personalization

2024 · DOI 10.18653/v1/2024.acl-long.399

13 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

baseline: 1 · dataset: 1

citation-polarity summary

baseline: 1 · use dataset: 1

representative citing papers

Test-Time Personalization: A Diagnostic Framework and Probabilistic Fix for Scaling Failures

cs.LG · 2026-05-09 · unverdicted · novelty 7.0

Test-time scaling for personalized LLMs follows a logarithmic utility curve under oracle selection but standard reward models suffer user-level collapse and query-level hacking; a probabilistic reward model with learned variance enables consistent scaling.

HorizonBench: Long-Horizon Personalization with Evolving Preferences

cs.CL · 2026-04-19 · unverdicted · novelty 7.0

HorizonBench generates 6-month conversation histories from structured mental state graphs to test AI models on tracking evolving user preferences, finding that frontier models mostly fail at belief updates and perform near or below chance.

PERCEIVE: A Benchmark for Personalized Emotion and Communication Behavior Understanding on Social Media

cs.SI · 2026-04-10 · unverdicted · novelty 7.0

PERCEIVE is the first bilingual benchmark integrating author content, reader emotions from comments, communication behavior, user attributes, and social graphs for personalized social media emotion understanding.

Improving General Role-Playing Agents via Psychology-Grounded Reasoning and Role-Aware Policy Optimization

cs.CL · 2026-06-25 · unverdicted · novelty 6.0

Psy-CoT decomposes reasoning into Interaction Perception, Psychological Empathy, and Logical Construction while RAPO asymmetrically weights role-specific tokens during policy optimization, outperforming prior CoT and GRPO baselines on role-playing benchmarks.

ClusterRAG: Cluster-Based Collaborative Filtering for Personalized Retrieval-Augmented Generation

cs.IR · 2026-04-14 · unverdicted · novelty 6.0

ClusterRAG applies density-based clustering to user profiles for collaborative retrieval in personalized RAG and reports best performance on LaMP tasks by combining target and similar-user profiles.

TSUBASA: Improving Long-Horizon Personalization via Evolving Memory and Self-Learning with Context Distillation

cs.CL · 2026-04-09 · unverdicted · novelty 6.0

TSUBASA improves long-horizon personalization in LLMs via dynamic memory evolution for writing and context-distillation self-learning for reading, outperforming Mem0 and Memory-R1 on Qwen-3 benchmarks while reducing token use.

PeReGrINE: Evaluating Personalized Review Fidelity with User Item Graph Context

cs.IR · 2026-04-09 · unverdicted · novelty 6.0

PeReGrINE is a graph-based benchmark that restructures Amazon Reviews 2023 with temporal cutoffs and introduces dissonance analysis to measure how well retrieval-conditioned models match user style and product consensus.

Controlling Distributional Bias in Multi-Round LLM Generation via KL-Optimized Fine-Tuning

cs.CL · 2026-04-07 · unverdicted · novelty 6.0

A hybrid fine-tuning objective using KL divergence for token calibration and Kahneman-Tversky optimization for semantic binding enables LLMs to produce outputs that match desired attribute distributions across repeated prompts.

Always-On Agents: A Survey of Persistent Memory, State, and Governance in LLM Agents

cs.MA · 2026-06-29 · unverdicted · novelty 5.0

Survey mapping persistent state in LLM agents along six axes and proposing the AOEP-v0 protocol to evaluate governance and recovery obligations.

Exploring Cross-Scenario Generality of Agentic Memory Systems: Diagnostics and a Strong Baseline

cs.AI · 2026-06-03 · unverdicted · novelty 5.0

An agentic harness letting the LLM self-manage flat text-file storage via tool calls outperforms eight prior memory systems on cross-scenario generality across QA, chat, trajectory, stress-test, and long-horizon tasks.

Training LLMs with Reinforcement Learning for Intent-Aware Personalized Question Answering

cs.CL · 2026-05-12 · unverdicted · novelty 5.0

IAP uses RL to train LLMs to explicitly infer and apply implicit user intent in single-turn personalized QA, achieving ~7.5% average macro-score gains over baselines on LaMP-QA.

Quantifying and Predicting Disagreement in Graded Human Ratings

cs.CL · 2026-05-01 · unverdicted · novelty 5.0

Annotation disagreement on toxic language can be moderately predicted from textual features, with high-opposition items proving harder for models to estimate accurately.

Modeling Human Perspectives with Socio-Demographic Representations

cs.CL · 2026-04-20 · unverdicted · novelty 5.0

Socio-Contrastive Learning jointly learns socio-demographic representations and textual features via contrastive objectives to predict annotator perspectives more accurately than concatenation baselines.

citing papers explorer

Test-Time Personalization: A Diagnostic Framework and Probabilistic Fix for Scaling Failures cs.LG · 2026-05-09 · unverdicted · none · ref 3
Test-time scaling for personalized LLMs follows a logarithmic utility curve under oracle selection but standard reward models suffer user-level collapse and query-level hacking; a probabilistic reward model with learned variance enables consistent scaling.
HorizonBench: Long-Horizon Personalization with Evolving Preferences cs.CL · 2026-04-19 · unverdicted · none · ref 32
HorizonBench generates 6-month conversation histories from structured mental state graphs to test AI models on tracking evolving user preferences, finding that frontier models mostly fail at belief updates and perform near or below chance.
PERCEIVE: A Benchmark for Personalized Emotion and Communication Behavior Understanding on Social Media cs.SI · 2026-04-10 · unverdicted · none · ref 21
PERCEIVE is the first bilingual benchmark integrating author content, reader emotions from comments, communication behavior, user attributes, and social graphs for personalized social media emotion understanding.
Improving General Role-Playing Agents via Psychology-Grounded Reasoning and Role-Aware Policy Optimization cs.CL · 2026-06-25 · unverdicted · none · ref 112
Psy-CoT decomposes reasoning into Interaction Perception, Psychological Empathy, and Logical Construction while RAPO asymmetrically weights role-specific tokens during policy optimization, outperforming prior CoT and GRPO baselines on role-playing benchmarks.
ClusterRAG: Cluster-Based Collaborative Filtering for Personalized Retrieval-Augmented Generation cs.IR · 2026-04-14 · unverdicted · none · ref 89
ClusterRAG applies density-based clustering to user profiles for collaborative retrieval in personalized RAG and reports best performance on LaMP tasks by combining target and similar-user profiles.
TSUBASA: Improving Long-Horizon Personalization via Evolving Memory and Self-Learning with Context Distillation cs.CL · 2026-04-09 · unverdicted · none · ref 54
TSUBASA improves long-horizon personalization in LLMs via dynamic memory evolution for writing and context-distillation self-learning for reading, outperforming Mem0 and Memory-R1 on Qwen-3 benchmarks while reducing token use.
PeReGrINE: Evaluating Personalized Review Fidelity with User Item Graph Context cs.IR · 2026-04-09 · unverdicted · none · ref 7
PeReGrINE is a graph-based benchmark that restructures Amazon Reviews 2023 with temporal cutoffs and introduces dissonance analysis to measure how well retrieval-conditioned models match user style and product consensus.
Controlling Distributional Bias in Multi-Round LLM Generation via KL-Optimized Fine-Tuning cs.CL · 2026-04-07 · unverdicted · none · ref 32
A hybrid fine-tuning objective using KL divergence for token calibration and Kahneman-Tversky optimization for semantic binding enables LLMs to produce outputs that match desired attribute distributions across repeated prompts.
Always-On Agents: A Survey of Persistent Memory, State, and Governance in LLM Agents cs.MA · 2026-06-29 · unverdicted · none · ref 36
Survey mapping persistent state in LLM agents along six axes and proposing the AOEP-v0 protocol to evaluate governance and recovery obligations.
Exploring Cross-Scenario Generality of Agentic Memory Systems: Diagnostics and a Strong Baseline cs.AI · 2026-06-03 · unverdicted · none · ref 30
An agentic harness letting the LLM self-manage flat text-file storage via tool calls outperforms eight prior memory systems on cross-scenario generality across QA, chat, trajectory, stress-test, and long-horizon tasks.
Training LLMs with Reinforcement Learning for Intent-Aware Personalized Question Answering cs.CL · 2026-05-12 · unverdicted · none · ref 24
IAP uses RL to train LLMs to explicitly infer and apply implicit user intent in single-turn personalized QA, achieving ~7.5% average macro-score gains over baselines on LaMP-QA.
Quantifying and Predicting Disagreement in Graded Human Ratings cs.CL · 2026-05-01 · unverdicted · none · ref 52
Annotation disagreement on toxic language can be moderately predicted from textual features, with high-opposition items proving harder for models to estimate accurately.
Modeling Human Perspectives with Socio-Demographic Representations cs.CL · 2026-04-20 · unverdicted · none · ref 47
Socio-Contrastive Learning jointly learns socio-demographic representations and textual features via contrastive objectives to predict annotator perspectives more accurately than concatenation baselines.