Augmenting LLM search judges with historical QRI cards improves Spearman correlation with user preferences by ~5% overall (91% relative on disagreements) and 15% in multilingual settings, with better alignment to live A/B test outcomes.
Title resolution pending
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.IR (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
- As It Was: Aligning LLM Search Evaluation with Historical User Preferences
Augmenting LLM search judges with historical QRI cards improves Spearman correlation with user preferences by ~5% overall (91% relative on disagreements) and 15% in multilingual settings, with better alignment to live A/B test outcomes.