Reinforcement Learning for Compositional Generalization with Outcome-Level Optimization
Outcome-level reinforcement learning (RL) with binary or composite rewards improves compositional generalization over supervised fine-tuning: because only the final outcome is rewarded, the model is not pushed to imitate, and thus overfit to, the most frequent patterns in the training data.
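To make the reward distinction concrete, here is a minimal sketch of the two reward types named above. The function names (`outcome_reward`, `composite_reward`), the exact-match correctness check, and the weightings are illustrative assumptions, not the paper's implementation: a binary reward scores only whether the final answer is correct, while a composite reward mixes that outcome signal with an auxiliary term (here, output well-formedness).

```python
def outcome_reward(prediction: str, target: str) -> float:
    """Binary outcome-level reward: 1.0 iff the final answer matches the target.

    Exact string match is a placeholder; a real task would use its own checker.
    """
    return 1.0 if prediction.strip() == target.strip() else 0.0


def composite_reward(prediction: str, target: str, well_formed: bool,
                     w_correct: float = 0.8, w_format: float = 0.2) -> float:
    """Composite reward: weighted mix of correctness and format compliance.

    The weights here are arbitrary illustrative values.
    """
    correctness = outcome_reward(prediction, target)
    formatting = 1.0 if well_formed else 0.0
    return w_correct * correctness + w_format * formatting


# A fully correct, well-formed answer earns the maximum reward;
# a wrong but well-formed answer earns only the format component.
print(composite_reward("42", "42", well_formed=True))   # 0.8*1.0 + 0.2*1.0
print(composite_reward("41", "42", well_formed=True))   # 0.8*0.0 + 0.2*1.0
```

Under either reward, the policy gradient is driven by whole-trajectory outcomes rather than per-token targets, which is the property the summary credits for avoiding overfitting to frequent training patterns.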