Latent Preference Modeling for Cross-Session Personalized Tool Calling

2026 · cs.CL · arXiv 2604.17886

abstract

Users often omit essential details in their requests to LLM-based agents, resulting in under-specified inputs for tool use. This poses a fundamental challenge for tool-augmented agents, as API execution typically requires complete arguments, highlighting the need for personalized tool calling. To study this problem, we introduce MPT, a benchmark comprising 265 multi-session dialogues that cover three challenges: Preference Recall, Preference Induction, and Preference Transfer. We also propose PRefine, a test-time memory-augmented method that represents user preferences as evolving hypotheses. Through a generate-verify-refine loop, it extracts reusable constraints from history and improves tool-calling accuracy while using only 1.24% of the tokens required by full-history prompting. These results indicate that robust personalization in agentic systems depends on memory that captures the reasons behind user choices, not just the choices themselves.

representative citing papers

Ask Now, Use Later: Benchmarking the Proactivity Gap in Long-Lived LLM Agents

cs.CL · 2026-05-27 · unverdicted · novelty 8.0

ATRBench is the first benchmark for the Ask-to-Remember task, showing that eight frontier LLM agents fall at least 62 points below an oracle that receives the relevant preference, and that prompting closes little of the gap.
