Overconfident errors need stronger correction: Asymmetric confidence penalties for reinforcement learning. CoRR, abs/2602.21420

5 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary: other 1

citation-polarity summary

fields: cs.LG 3 · cs.AI 2

years: 2026 5

roles: other 1

polarities: unclear 1

representative citing papers

TIP: Token Importance in On-Policy Distillation

cs.LG · 2026-04-15 · unverdicted · novelty 6.0 · 3 refs

A two-axis taxonomy of student entropy and teacher-student divergence identifies informative tokens in on-policy distillation, allowing near-full performance with 10-50% of tokens.

Reasoning Compression with Mixed-Policy Distillation

cs.AI · 2026-05-09 · unverdicted · novelty 5.0

Mixed-Policy Distillation transfers concise reasoning behavior from larger to smaller LLMs by having the teacher compress student-generated trajectories, cutting token usage by up to 27% while raising benchmark scores.
