Personalized deep research systems need evaluation with real users because LLM judges overlook nuanced errors that matter to researchers.
In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations), pages 513–523, Vienna, Austria
1 Pith paper cites this work. Polarity classification is still indexing.
fields: cs.CL (1)
years: 2026 (1)
verdicts: CONDITIONAL (1)
representative citing papers
Language Models Don't Know What You Want: Evaluating Personalization in Deep Research Needs Real Users