RTT bridges response-level rubrics to token-level rewards via a relevance discriminator and intra-sample group normalization, yielding higher instruction and rubric accuracy than baselines.
D Inter-Sample Token Group Normalization We first provide a formal definition ofInter-Sample Token Group Normalization, the baseline strategy discussed in Section 3.2
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
citation-role summary
background 1
citation-polarity summary
fields
cs.CL 1years
2026 1verdicts
UNVERDICTED 1roles
background 1polarities
unclear 1representative citing papers
citing papers explorer
-
Rubrics to Tokens: Bridging Response-level Rubrics and Token-level Rewards in Instruction Following Tasks
RTT bridges response-level rubrics to token-level rewards via a relevance discriminator and intra-sample group normalization, yielding higher instruction and rubric accuracy than baselines.