CoRR abs/2411.08275 (2024)
4 Pith papers cite this work.
citation-role summary
citation-polarity summary
years: 2026 (4)
roles: method (1)
polarities: use method (1)
representative citing papers
DoGMaTiQ automates QA-nugget creation via document-grounded generation, paraphrase clustering, and quality-based subselection, yielding strong rank correlations with human judgments on cross-lingual TREC tasks.
Synthetically formalizing information needs into topics with descriptions and narratives improves LLM relevance assessor agreement with humans and reduces over-labeling of relevant documents on TREC Deep Learning and Robust04.
LLMs consistently overrate relevance of inadequate passages in IR evaluations due to biases toward length and lexical features rather than true content match.
citing papers explorer
- Hybrid Pooling with LLMs via Relevance Context Learning
  Relevance Context Learning generates explicit relevance narratives from judged examples to guide LLM assessors, outperforming zero-shot and standard in-context learning for IR relevance judgments.
- DoGMaTiQ: Automated Generation of Question-and-Answer Nuggets for Report Evaluation
  DoGMaTiQ automates QA-nugget creation via document-grounded generation, paraphrase clustering, and quality-based subselection, yielding strong rank correlations with human judgments on cross-lingual TREC tasks.
- Formalized Information Needs Improve Large-Language-Model Relevance Judgments
  Synthetically formalizing information needs into topics with descriptions and narratives improves LLM relevance assessor agreement with humans and reduces over-labeling of relevant documents on TREC Deep Learning and Robust04.
- When LLM Judges Inflate Scores: Exploring Overrating in Relevance Assessment
  LLMs consistently overrate relevance of inadequate passages in IR evaluations due to biases toward length and lexical features rather than true content match.