Hdpo: Hybrid distillation policy optimization via privileged self-distillation

Ken Ding · 2026 · arXiv 2603.23871

9 Pith papers cite this work. Polarity classification is still indexing.

9 Pith papers citing it

read on arXiv browse 9 citing papers

citation-role summary

background 3

citation-polarity summary

background 3

representative citing papers

CEPO: RLVR Self-Distillation using Contrastive Evidence Policy Optimization

cs.LG · 2026-05-19 · conditional · novelty 7.0

CEPO sharpens token credit in RLVR by requiring tokens to be favored by the correct answer and disfavored by wrong answers drawn from rejected rollouts, delivering accuracy gains on five multimodal math benchmarks.

Multi-Rollout On-Policy Distillation via Peer Successes and Failures

cs.LG · 2026-05-12 · unverdicted · novelty 7.0

MOPD improves on-policy distillation for LLMs by using peer successes for positive patterns and failures for negative examples to create more informative teacher signals.

Preference-Based Self-Distillation: Beyond KL Matching via Reward Regularization

cs.LG · 2026-05-06 · unverdicted · novelty 7.0

PBSD derives a reward-reweighted teacher distribution as the analytic optimum of a reward-regularized objective, yielding better stability and performance than KL-based self-distillation on math reasoning and tool-use tasks.

Prefix Teach, Suffix Fade: Local Teachability Collapse in Strong-to-Weak On-Policy Distillation

cs.CL · 2026-05-13 · unverdicted · novelty 6.0

Local teachability collapse in trajectory suffixes makes uniform dense supervision suboptimal in strong-to-weak OPD; truncating at BIC-style change points on teacher margin improves performance.

Adaptive Teacher Exposure for Self-Distillation in LLM Reasoning

cs.AI · 2026-05-12 · unverdicted · novelty 6.0

ATESD makes teacher exposure to reference reasoning a learnable control variable via a Beta-policy optimized on future student improvement, yielding gains of up to +2.33 points over fixed-exposure self-distillation on AIME and HMMT math benchmarks.

Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe

cs.LG · 2026-04-14 · unverdicted · novelty 6.0

On-policy distillation works when student and teacher models share thinking patterns and the teacher adds new capabilities, with success tied to alignment on a small set of high-probability tokens.

On-Policy Distillation with Best-of-N Teacher Rollout Selection

cs.CV · 2026-05-10 · unverdicted · novelty 5.0 · 2 refs

BRTS improves on-policy distillation by sampling multiple teacher rollouts and selecting the best one via a correctness-first then alignment priority rule, yielding gains on AIME and AMC math benchmarks.

A Brief Overview: On-Policy Self-Distillation In Large Language Models

cs.HC · 2026-05-18 · unverdicted · novelty 2.0 · 2 refs

This overview paper explains the conceptual foundations and design principles of On-Policy Self-Distillation for large language models from a beginner's perspective.

GenEvolve: Self-Evolving Image Generation Agents via Tool-Orchestrated Visual Experience Distillation

cs.CV · 2026-05-20

citing papers explorer

Showing 9 of 9 citing papers.

CEPO: RLVR Self-Distillation using Contrastive Evidence Policy Optimization cs.LG · 2026-05-19 · conditional · none · ref 3
CEPO sharpens token credit in RLVR by requiring tokens to be favored by the correct answer and disfavored by wrong answers drawn from rejected rollouts, delivering accuracy gains on five multimodal math benchmarks.
Multi-Rollout On-Policy Distillation via Peer Successes and Failures cs.LG · 2026-05-12 · unverdicted · none · ref 29
MOPD improves on-policy distillation for LLMs by using peer successes for positive patterns and failures for negative examples to create more informative teacher signals.
Preference-Based Self-Distillation: Beyond KL Matching via Reward Regularization cs.LG · 2026-05-06 · unverdicted · none · ref 4
PBSD derives a reward-reweighted teacher distribution as the analytic optimum of a reward-regularized objective, yielding better stability and performance than KL-based self-distillation on math reasoning and tool-use tasks.
Prefix Teach, Suffix Fade: Local Teachability Collapse in Strong-to-Weak On-Policy Distillation cs.CL · 2026-05-13 · unverdicted · none · ref 11
Local teachability collapse in trajectory suffixes makes uniform dense supervision suboptimal in strong-to-weak OPD; truncating at BIC-style change points on teacher margin improves performance.
Adaptive Teacher Exposure for Self-Distillation in LLM Reasoning cs.AI · 2026-05-12 · unverdicted · none · ref 6
ATESD makes teacher exposure to reference reasoning a learnable control variable via a Beta-policy optimized on future student improvement, yielding gains of up to +2.33 points over fixed-exposure self-distillation on AIME and HMMT math benchmarks.
Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe cs.LG · 2026-04-14 · unverdicted · none · ref 3
On-policy distillation works when student and teacher models share thinking patterns and the teacher adds new capabilities, with success tied to alignment on a small set of high-probability tokens.
On-Policy Distillation with Best-of-N Teacher Rollout Selection cs.CV · 2026-05-10 · unverdicted · none · ref 7 · 2 links
BRTS improves on-policy distillation by sampling multiple teacher rollouts and selecting the best one via a correctness-first then alignment priority rule, yielding gains on AIME and AMC math benchmarks.
A Brief Overview: On-Policy Self-Distillation In Large Language Models cs.HC · 2026-05-18 · unverdicted · none · ref 7 · 2 links
This overview paper explains the conceptual foundations and design principles of On-Policy Self-Distillation for large language models from a beginner's perspective.
GenEvolve: Self-Evolving Image Generation Agents via Tool-Orchestrated Visual Experience Distillation cs.CV · 2026-05-20 · unreviewed · ref 12

Hdpo: Hybrid distillation policy optimization via privileged self-distillation

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer