LLMs judge document relevance at a level comparable to humans but frequently highlight different passages, indicating they are often not right for the right reasons and cannot fully replace human assessors.
3 Pith papers cite this work. Polarity classification is still indexing.
Verdicts: UNVERDICTED 3
Roles: background 1
Polarities: background 1
Representative citing papers
REFLEX improves explainable fact-checking by using verdict-anchored style control and self-disagreement signals to disentangle fact from style in LLM outputs, achieving SOTA results with minimal self-refined samples.
A survey categorizing scaling in LLM reasoning across input size, steps, rounds, training, and future directions, noting that scaling can negatively affect performance.
citing papers explorer
- LLMs as Assessors: Right for the Right Reason?
  LLMs judge document relevance at a level comparable to humans but frequently highlight different passages, indicating they are often not right for the right reasons and cannot fully replace human assessors.
- REFLEX: Self-Refining Explainable Fact-Checking via Verdict-Anchored Style Control
  REFLEX improves explainable fact-checking by using verdict-anchored style control and self-disagreement signals to disentangle fact from style in LLM outputs, achieving SOTA results with minimal self-refined samples.
- A Survey of Scaling in Large Language Model Reasoning
  A survey categorizing scaling in LLM reasoning across input size, steps, rounds, training, and future directions, noting that scaling can negatively affect performance.