CUD reshapes the teacher's predictive distribution before distillation so that students receive calibrated uncertainty signals alongside accuracy, yielding more robust and better-calibrated models on high-cardinality and distribution-shift benchmarks.
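For readers new to the setup, the sketch below shows where a "reshape the teacher before distillation" step would sit in a distillation objective. It assumes the standard temperature-softened knowledge-distillation loss: KL divergence from the softened teacher to the softened student, plus hard-label cross-entropy. The `reshape` hook is hypothetical. It stands in for CUD's reshaping step, which this summary does not specify, so the hook defaults to the identity and is not CUD's method.

```python
import math
from typing import Callable, List, Sequence

def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max logit for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for discrete distributions; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distillation_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    label: int,
    temperature: float = 4.0,
    alpha: float = 0.5,
    reshape: Callable[[List[float]], List[float]] = lambda p: p,
) -> float:
    # Soften the teacher, then apply the hypothetical reshaping hook before distilling.
    p_teacher = reshape(softmax(teacher_logits, temperature))
    p_student = softmax(student_logits, temperature)
    # Soft-target term; the T^2 factor keeps its gradient scale comparable across temperatures.
    soft = kl_divergence(p_teacher, p_student) * temperature ** 2
    # Ordinary cross-entropy against the hard label at temperature 1.
    hard = -math.log(softmax(student_logits)[label])
    return alpha * soft + (1 - alpha) * hard

# Usage with made-up logits for a three-class problem.
print(distillation_loss([2.0, 0.5, -1.0], [2.5, 1.0, -0.5], label=0))
```

Keeping the reshaping as a separate callable makes the design point visible: whatever calibration-oriented transform is used, it changes only the soft targets the student sees, not the student's hard-label term.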
While these techniques can improve expected calibration, they may also blur the meaningful inter-class geometry that knowledge distillation (KD) is meant to pass on.
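To make the concern concrete, the sketch below assumes label smoothing as a representative calibration technique. The quoted passage does not name which techniques it means. It applies the smoothing to a hypothetical teacher distribution over three classes and tracks the ratio between two non-target classes as a crude proxy for inter-class geometry. The class names and probabilities are invented for illustration.

```python
from typing import List, Sequence

def label_smooth(probs: Sequence[float], epsilon: float) -> List[float]:
    """Mix a distribution with the uniform distribution: (1 - eps) * p + eps / K."""
    k = len(probs)
    return [(1 - epsilon) * p + epsilon / k for p in probs]

def non_target_ratio(probs: Sequence[float], i: int, j: int) -> float:
    """Relative preference between two non-target classes (a crude geometry proxy)."""
    return probs[i] / probs[j]

# Hypothetical teacher output for a "cat" image: dog is far more plausible than truck.
teacher = [0.80, 0.19, 0.01]  # [cat, dog, truck]

for eps in (0.0, 0.1, 0.3):
    smoothed = label_smooth(teacher, eps)
    # The dog/truck ratio shrinks toward 1 as eps grows.
    print(eps, round(non_target_ratio(smoothed, 1, 2), 2))
```

The dog/truck ratio falls from 19 to about 4.8 at eps = 0.1 and about 2.2 at eps = 0.3. The target class stays on top throughout, but the "dog is much closer to cat than truck is" signal that a student would otherwise learn from the teacher is largely flattened.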
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.LG (1) · Years: 2026 (1) · Verdicts: UNVERDICTED (1)
Representative citing papers:
Trust the uncertain teacher: distilling dark knowledge via calibrated uncertainty