RLAAR applies competence-gated curriculum RL with mixed accuracy and abstention rewards to reduce Lost-in-Conversation degradation, raising benchmark accuracy from 62.6% to 75.1% and calibrated abstention from 33.5% to 73.4%.
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CL (1)
Years: 2025 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
Mitigating Lost in Multi-turn Conversation via Curriculum RL with Verifiable Accuracy and Abstention Rewards