Fine-tuned LLMs produce critiques that improve human detection of errors in summaries, with larger models showing better self-critique and refinement capabilities.
For example, we could search for critiques (see Section D). Recall that in Section 5 we found a negative CD gap for the Addition, Alphabetize, and RACE synthetic tasks.
1 Pith paper cites this work.
Fields: cs.CL (1). Years: 2022 (1). Verdicts: CONDITIONAL (1).
Representative citing paper:
Self-critiquing models for assisting human evaluators