Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models

17 Pith papers cite this work.

citation-role summary

background 3

representative citing papers

Self-Policy Distillation via Capability-Selective Subspace Projection

cs.CL · 2026-05-21 · unverdicted · novelty 7.0

Self-Policy Distillation extracts a capability subspace from model gradients on correctness tokens, projects KV activations into that subspace for self-generation, and fine-tunes LLMs on the result, reporting gains of up to 13-16% over baselines without external signals.

rePIRL: Learn PRM with Inverse RL for LLM Reasoning

cs.LG · 2026-02-08 · unverdicted · novelty 6.0

rePIRL learns effective process reward models for LLM reasoning via a dual policy-PRM update process inspired by inverse RL, unifying online and offline methods with reported gains over prior approaches on math and coding datasets.

Demystifying Long Chain-of-Thought Reasoning in LLMs

cs.CL · 2025-02-05 · unverdicted · novelty 4.0

Experiments show that long CoT reasoning in LLMs emerges with more training compute when reward shaping is used properly, and scaling verifiable rewards from noisy data helps especially on out-of-distribution tasks.