Accepted reflections are inserted into the rubric bank, and the scored reflection samples are placed in a deferred buffer

Meanwhile, reflection scoring completes in the background

1 Pith paper cites this work. Polarity classification is still indexing.


representative citing papers

RubricEM: Meta-RL with Rubric-guided Policy Decomposition beyond Verifiable Rewards

cs.CL · 2026-05-11 · unverdicted · novelty 6.0

RubricEM uses rubric-guided stagewise policy decomposition and reflection-based meta-policy evolution to improve long-horizon research agents beyond verifiable rewards.

citing papers explorer

Showing 1 of 1 citing paper.

RubricEM: Meta-RL with Rubric-guided Policy Decomposition beyond Verifiable Rewards

cs.CL · 2026-05-11 · unverdicted · none · ref 33

