arXiv preprint arXiv:2503.07018

Xintong Li, Jalend Bantupalli, Ria Dharmani, Yuwei Zhang, Jingbo Shang · arXiv 2503.07018

2 Pith papers cite this work.

representative citing papers

HorizonBench: Long-Horizon Personalization with Evolving Preferences

cs.CL · 2026-04-19 · unverdicted · novelty 7.0

HorizonBench generates 6-month conversation histories from structured mental state graphs to test how well AI models track evolving user preferences. It finds that frontier models mostly fail at belief updates, performing near or below chance.

LMEB: Long-horizon Memory Embedding Benchmark

cs.CL · 2026-03-13 · unverdicted · novelty 7.0

The LMEB benchmark shows that embedding models' performance on traditional retrieval does not transfer to long-horizon memory tasks and that larger models do not always perform better. LMEB also measures capabilities orthogonal to those measured by MTEB.

citing papers explorer

HorizonBench: Long-Horizon Personalization with Evolving Preferences cs.CL · 2026-04-19 · unverdicted · none · ref 6
LMEB: Long-horizon Memory Embedding Benchmark cs.CL · 2026-03-13 · unverdicted · none · ref 18