Canonical reference

JustRL: Scaling a 1.5B LLM with a Simple RL Recipe

Canonical reference. 80% of citing Pith papers cite this work as background.

9 Pith papers citing it
Background 80% of classified citations

citation-role summary

background: 4 · method: 1

years

2026: 9

representative citing papers

Entropy Polarity in Reinforcement Fine-Tuning: Direction, Asymmetry, and Control

cs.LG · 2026-05-12 · unverdicted · novelty 6.0 · 2 refs

Entropy polarity is a signed token-level quantity derived from a first-order approximation of entropy change that predicts whether RL updates expand or contract policy entropy in LLM fine-tuning, revealing an asymmetry between high- and low-probability tokens.

On-Policy Distillation with Best-of-N Teacher Rollout Selection

cs.CV · 2026-05-10 · unverdicted · novelty 5.0 · 2 refs

BRTS improves on-policy distillation by sampling multiple teacher rollouts and selecting the best one via a correctness-first then alignment priority rule, yielding gains on AIME and AMC math benchmarks.
