Scores are normalized so that 0 = baseline model and 1 = r(y*).

A Appendix

Table header (column groups: Inference-time, RL): Reward Function, π(·|y*), |Y_S| = 2, |Y_S| = 4, |Y_S| = 8, |Y_S| = 16, |Y_S| = 32, |Y_S| = 64, RL@500, RL@1000, RL@2000

representative citing papers

Reward Weighted Classifier-Free Guidance as Policy Improvement in Autoregressive Models

cs.LG · 2026-04-16 · unverdicted · novelty 6.0

Reward-weighted classifier-free guidance approximates Q-function policy improvement in autoregressive models, enabling test-time reward optimization and faster RL convergence via distillation.


