On the Direction of RLVR Updates for LLM Reasoning: Identification and Exploitation

8 Pith papers cite this work. Polarity classification is still indexing.

Fields: cs.LG (7), cs.AI (1)

Years: 2026 (8)

Verdicts: unverdicted (8)

representative citing papers

Not only where, But when: Temporal Scheduling for RLVR

cs.LG · 2026-05-25 · unverdicted · novelty 7.0

Temporal scheduling of credit allocation criteria over RLVR training, using trajectory percentiles to target heterogeneous behaviors, yields more stable policy entropy and better reasoning benchmark results than static allocation.

APPO: Agentic Procedural Policy Optimization

cs.LG · 2026-06-10 · unverdicted · novelty 6.0

APPO refines branching and credit assignment in agentic RL via a Branching Score and procedure-level scaling, improving over baselines by nearly 4 points on 13 benchmarks.

Entropy Polarity in Reinforcement Fine-Tuning: Direction, Asymmetry, and Control

cs.LG · 2026-05-12 · unverdicted · novelty 6.0 · 2 refs

Entropy polarity is a signed token-level quantity derived from a first-order approximation of entropy change that predicts whether RL updates expand or contract policy entropy in LLM fine-tuning, revealing an asymmetry between high- and low-probability tokens.

Mechanism-Guided Selective Unlearning for RLVR-Induced Reasoning

cs.LG · 2026-06-17 · unverdicted · novelty 5.0

MAST ranks attention-projection tensors by off-principal energy, update magnitude, and forget-gradient coupling to selectively unlearn RLVR-induced reasoning. It achieves significant forgetting on MATH while preserving performance on GSM8K and on the MATH retain set, unlike full-parameter updates.

One-Way Policy Optimization for Self-Evolving LLMs

cs.LG · 2026-05-21 · unverdicted · novelty 5.0

OWPO decouples optimization direction from magnitude via asymmetric reweighting (Accelerated Alignment for inferior deviations, Gain Locking for superior ones) and uses iterative references to create a ratchet effect for continuous LLM improvement.

citing papers explorer

Showing 8 of 8 citing papers after filters.