Heal: Hindsight entropy-assisted learning for reasoning distillation. arXiv preprint arXiv:2603.10359

Wenjing Zhang, Jiangze Yan, Jieyun Huang, Yi Shen, Shuming Shi, Ping Chen, Ning Wang, Zhaoxiang Liu, Kai Wang, Shiguo Lian · arXiv 2603.10359

1 Pith paper cites this work. Polarity classification is still indexing.


representative citing papers

Teaching the Way, Not the Answer: Privileged Tutoring Distillation for Multimodal Policy Optimization

cs.AI · 2026-06-05 · unverdicted · novelty 6.0

PTD-PO supplies step-wise token-distribution supervision to student policies via in-context privileged hints derived from spatial attention and intermediate reasoning, while keeping the student in an answer-free context and using Top-K Jensen-Shannon divergence for stable alignment.

citing papers explorer

Showing 1 of 1 citing paper.

Teaching the Way, Not the Answer: Privileged Tutoring Distillation for Multimodal Policy Optimization

cs.AI · 2026-06-05 · unverdicted · none · ref 23
