pith. sign in

GitHub repository , howpublished =

5 Pith papers cite this work. Polarity classification is still indexing.

5 Pith papers citing it

fields

cs.LG 4 cs.CL 1

years

2026 5

verdicts

UNVERDICTED 5

representative citing papers

Self-Supervised On-Policy Distillation for Reasoning Language Models

cs.LG · 2026-05-17 · unverdicted · novelty 6.0

SSOPD converts intra-group correct-wrong contrast into process supervision by distilling a teacher distribution from the shortest correct completion into prefixes of the longest wrong completion, improving GRPO on AIME and HMMT benchmarks.

torchtune: PyTorch native post-training library

cs.LG · 2026-05-20 · unverdicted · novelty 5.0

torchtune is a modular PyTorch library for LLM post-training that delivers competitive performance and memory efficiency while supporting rapid research iteration through hackable components.

ReMedi: Reasoner for Medical Clinical Prediction

cs.CL · 2026-05-02 · unverdicted · novelty 5.0

ReMedi boosts LLM performance on EHR clinical predictions by up to 19.9% F1 through ground-truth-guided rationale regeneration and fine-tuning.

LEPO: Latent Reasoning Policy Optimization for Large Language Models

cs.LG · 2026-04-20 · unverdicted · novelty 5.0

LEPO applies RL to continuous latent representations in LLMs by injecting Gumbel-Softmax stochasticity for diverse trajectory sampling and unified gradient estimation, outperforming existing discrete and latent RL methods.

citing papers explorer

Showing 5 of 5 citing papers.

  • Self-Supervised On-Policy Distillation for Reasoning Language Models cs.LG · 2026-05-17 · unverdicted · none · ref 83

    SSOPD converts intra-group correct-wrong contrast into process supervision by distilling a teacher distribution from the shortest correct completion into prefixes of the longest wrong completion, improving GRPO on AIME and HMMT benchmarks.

  • SAGE: Shaping Anchors for Guided Exploration in RLVR of LLMs cs.LG · 2026-05-15 · unverdicted · none · ref 26

    SAGE reshapes the reverse-KL anchor via guide function q(x,y) for controllable empirical support expansion, yielding gains in both pass@1 and pass@k on math reasoning benchmarks.

  • torchtune: PyTorch native post-training library cs.LG · 2026-05-20 · unverdicted · none · ref 10

    torchtune is a modular PyTorch library for LLM post-training that delivers competitive performance and memory efficiency while supporting rapid research iteration through hackable components.

  • ReMedi: Reasoner for Medical Clinical Prediction cs.CL · 2026-05-02 · unverdicted · none · ref 49

    ReMedi boosts LLM performance on EHR clinical predictions by up to 19.9% F1 through ground-truth-guided rationale regeneration and fine-tuning.

  • LEPO: Latent Reasoning Policy Optimization for Large Language Models cs.LG · 2026-04-20 · unverdicted · none · ref 37

    LEPO applies RL to continuous latent representations in LLMs by injecting Gumbel-Softmax stochasticity for diverse trajectory sampling and unified gradient estimation, outperforming existing discrete and latent RL methods.