Personalized deep research systems need evaluation with real users because LLM judges overlook nuanced errors that matter to researchers.
In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 735–744
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CL (1)
Years: 2026 (1)
Verdicts: CONDITIONAL (1)
Representative citing papers
Language Models Don't Know What You Want: Evaluating Personalization in Deep Research Needs Real Users