pith. machine review for the scientific record.
sign in

hub Canonical reference

The Invisible Leash: Why RLVR may or may not escape its origin

Canonical reference. 80% of citing Pith papers cite this work as background.

12 Pith papers citing it
Background 80% of classified citations

hub tools

citation-role summary

background 4 method 1

citation-polarity summary

years

2026 10 2025 2

representative citing papers

Reinforcement Learning via Value Gradient Flow

cs.LG · 2026-04-15 · unverdicted · novelty 7.0

VGF solves behavior-regularized RL by transporting particles from a reference distribution to the value-induced optimal policy via discrete value-guided gradient flow.

Policy Improvement Reinforcement Learning

cs.LG · 2026-04-01 · unverdicted · novelty 6.0

PIRL maximizes cumulative policy improvement across iterations instead of surrogate rewards and is proven aligned with final performance; PIPO implements it via retrospective verification for stable closed-loop optimization.

Polychromic Objectives for Reinforcement Learning

cs.LG · 2025-09-29 · unverdicted · novelty 5.0

Introduces polychromic objectives adapted into PPO via vine sampling and modified advantages, showing higher success rates and better coverage under perturbations on BabyAI, Minigrid, and algorithmic tasks.

citing papers explorer

Showing 12 of 12 citing papers.