LLMs consistently overrate relevance of inadequate passages in IR evaluations due to biases toward length and lexical features rather than true content match.
Damessie, Thao P
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers:
A neural sparse retrieval system with granular subword tokenization (max 3 chars) achieves 91.4% recall@10 on a 6M music document corpus versus 57.7% for trigrams, with improved HCI exploration efficiency and zero added query latency.
citing papers explorer
- When LLM Judges Inflate Scores: Exploring Overrating in Relevance Assessment
  LLMs consistently overrate relevance of inadequate passages in IR evaluations due to biases toward length and lexical features rather than true content match.
- Surface-Form Neural Sparse Retrieval: Robust Fuzzy Matching for Industrial Music Search
  A neural sparse retrieval system with granular subword tokenization (max 3 chars) achieves 91.4% recall@10 on a 6M music document corpus versus 57.7% for trigrams, with improved HCI exploration efficiency and zero added query latency.