Relevance Context Learning generates explicit relevance narratives from judged examples to guide LLM assessors, outperforming zero-shot and standard in-context learning for IR relevance judgments.
In: Proceedings of the 2023 ACM SIGIR International Conference on Theory of Information Retrieval
6 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
fields: cs.IR 6
years: 2026 6
roles: background 3
polarities: background 3
representative citing papers
MIRA is a new benchmark for multi-category integrated retrieval built from real queries on a social science platform, with LLM assistance for topic descriptions and relevance labeling across four item categories.
Synthetically formalizing information needs into topics with descriptions and narratives improves LLM relevance assessor agreement with humans and reduces over-labeling of relevant documents on TREC Deep Learning and Robust04.
LLMs consistently overrate relevance of inadequate passages in IR evaluations due to biases toward length and lexical features rather than true content match.
LLM-generated reference documents enable dynamic ranked list truncation and adaptive batching for listwise reranking, outperforming prior RLT methods and accelerating processing by up to 66% on TREC benchmarks.
LLMs judge document relevance at a level comparable to humans but frequently highlight different passages, indicating they are often not right for the right reasons and cannot fully replace human assessors.
citing papers explorer
- Hybrid Pooling with LLMs via Relevance Context Learning
  Relevance Context Learning generates explicit relevance narratives from judged examples to guide LLM assessors, outperforming zero-shot and standard in-context learning for IR relevance judgments.
- MIRA: An LLM-Assisted Benchmark for Multi-Category Integrated Retrieval
  MIRA is a new benchmark for multi-category integrated retrieval built from real queries on a social science platform, with LLM assistance for topic descriptions and relevance labeling across four item categories.
- Formalized Information Needs Improve Large-Language-Model Relevance Judgments
  Synthetically formalizing information needs into topics with descriptions and narratives improves LLM relevance assessor agreement with humans and reduces over-labeling of relevant documents on TREC Deep Learning and Robust04.
- When LLM Judges Inflate Scores: Exploring Overrating in Relevance Assessment
  LLMs consistently overrate relevance of inadequate passages in IR evaluations due to biases toward length and lexical features rather than true content match.
- Dynamic Ranked List Truncation for Reranking Pipelines via LLM-generated Reference-Documents
  LLM-generated reference documents enable dynamic ranked list truncation and adaptive batching for listwise reranking, outperforming prior RLT methods and accelerating processing by up to 66% on TREC benchmarks.
- LLMs as Assessors: Right for the Right Reason?
  LLMs judge document relevance at a level comparable to humans but frequently highlight different passages, indicating they are often not right for the right reasons and cannot fully replace human assessors.