HorizonBench generates 6-month conversation histories from structured mental state graphs to test AI models on tracking evolving user preferences, finding that frontier models mostly fail at belief updates and perform near or below chance.
Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
fields
cs.CL 2verdicts
UNVERDICTED 2representative citing papers
The paper maps agent memory research via three forms (token-level, parametric, latent), three functions (factual, experiential, working), and dynamics of formation/evolution/retrieval, plus benchmarks and future directions.
citing papers explorer
-
HorizonBench: Long-Horizon Personalization with Evolving Preferences
HorizonBench generates 6-month conversation histories from structured mental state graphs to test AI models on tracking evolving user preferences, finding that frontier models mostly fail at belief updates and perform near or below chance.
-
Memory in the Age of AI Agents
The paper maps agent memory research via three forms (token-level, parametric, latent), three functions (factual, experiential, working), and dynamics of formation/evolution/retrieval, plus benchmarks and future directions.