Reinforcement Learning for Reasoning Tasks

Proposed that effective process rewards should measure progress by evaluating the change in likelihood before and after each reasoning step · 2025
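The summary above is compact, so the following is a minimal sketch of one way to read it. It assumes that the reward for reasoning step t is the difference in log-probability of a reference answer after the step versus before it. `answer_prob` is a hypothetical stand-in for a language model's scoring function, and the toy probability table is invented purely for illustration. The cited work's exact formulation may differ.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical scorer: given a prompt and the reasoning steps taken so far,
# return the probability the model assigns to the reference answer.
# In practice this would be computed by a language model (assumption).
AnswerProb = Callable[[str, Sequence[str]], float]


def progress_rewards(prompt: str, steps: List[str], answer_prob: AnswerProb) -> List[float]:
    """Per-step reward = log P(answer | steps[:t]) - log P(answer | steps[:t-1])."""
    rewards: List[float] = []
    # Log-likelihood of the answer before any reasoning step is taken.
    prev = math.log(answer_prob(prompt, []))
    for t in range(1, len(steps) + 1):
        # Log-likelihood of the answer after appending step t.
        curr = math.log(answer_prob(prompt, steps[:t]))
        # Positive if the step made the answer more likely, negative otherwise.
        rewards.append(curr - prev)
        prev = curr
    return rewards


# Toy, made-up probabilities indexed by the number of steps taken so far.
_TOY_TABLE = {0: 0.2, 1: 0.5, 2: 0.4, 3: 0.9}


def toy_answer_prob(prompt: str, steps_so_far: Sequence[str]) -> float:
    """Hypothetical scorer that ignores the text and looks up a fixed table."""
    return _TOY_TABLE[len(steps_so_far)]


steps = ["3 * 4 = 12", "12 + 2 = 15", "correction: 12 + 2 = 14"]
rewards = progress_rewards("What is 2 + 3 * 4?", steps, toy_answer_prob)
```

Here the erroneous second step lowers the answer's likelihood and so receives a negative reward, while the other two steps are rewarded positively. One property of this difference-based design is that the per-step rewards telescope: their sum equals the log-likelihood after the final step minus the log-likelihood before the first.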

1 Pith paper cites this work. Polarity classification is still being indexed.

representative citing papers

Taming Extreme Tokens: Covariance-Aware GRPO with Gaussian-Kernel Advantage Reweighting

cs.CL · 2026-05-12 · unverdicted · novelty 6.0

Covariance-weighted GRPO with Gaussian-kernel reweighting tames extreme tokens to stabilize training and boost reasoning performance over standard GRPO.

citing papers explorer

Taming Extreme Tokens: Covariance-Aware GRPO with Gaussian-Kernel Advantage Reweighting cs.CL · 2026-05-12 · unverdicted · none · ref 15