Relevance Context Learning generates explicit relevance narratives from judged examples to guide LLM assessors, outperforming zero-shot and standard in-context learning for IR relevance judgments.
Title resolution pending
4 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 1polarities
background 1representative citing papers
Synthetically formalizing information needs into topics with descriptions and narratives improves LLM relevance assessor agreement with humans and reduces over-labeling of relevant documents on TREC Deep Learning and Robust04.
A group of 22 researchers proposes seven study types and eight guidelines for empirical software engineering studies involving LLMs to enhance reproducibility and replicability.
LLMs judge document relevance at a level comparable to humans but frequently highlight different passages, indicating they are often not right for the right reasons and cannot fully replace human assessors.
citing papers explorer
-
Hybrid Pooling with LLMs via Relevance Context Learning
Relevance Context Learning generates explicit relevance narratives from judged examples to guide LLM assessors, outperforming zero-shot and standard in-context learning for IR relevance judgments.
-
Formalized Information Needs Improve Large-Language-Model Relevance Judgments
Synthetically formalizing information needs into topics with descriptions and narratives improves LLM relevance assessor agreement with humans and reduces over-labeling of relevant documents on TREC Deep Learning and Robust04.
-
Guidelines for Empirical Studies in Software Engineering involving Large Language Models
A group of 22 researchers proposes seven study types and eight guidelines for empirical software engineering studies involving LLMs to enhance reproducibility and replicability.
-
LLMs as Assessors: Right for the Right Reason?
LLMs judge document relevance at a level comparable to humans but frequently highlight different passages, indicating they are often not right for the right reasons and cannot fully replace human assessors.