Hard Labels In! Rethinking the Role of Hard Labels in Mitigating Local Semantic Drift
read the original abstract
Soft labels from teacher models are a de facto practice for knowledge transfer and large-scale dataset distillation (e.g., SRe2L, LPLD). However, when we limit the number of crops per image to reduce the substantial cost of storing precomputed soft labels, these methods suffer severely from local semantic drift: visually ambiguous crops can cause soft supervision to deviate from the image-level ground-truth semantics, leading to persistent errors and a train-test distribution mismatch. We revisit the overlooked role of hard labels and show that, when properly integrated, they can act as a content-invariant semantic anchor that calibrates such drift. We theoretically analyze the emergence of drift under sparse soft-label supervision and demonstrate that hybridizing hard and soft labels restores alignment between visual content and semantic supervision. Building on this insight, we propose a new training paradigm, Hard Label for Alleviating Local Semantic Drift (HALD), which uses hard labels as intermediate corrective signals while preserving the fine-grained benefits of soft labels. Extensive experiments on dataset distillation and large-scale classification benchmarks show consistent generalization improvements. On ImageNet-1K, our method achieves 42.7% accuracy with only 285M soft-label storage (reduces by 100X), outperforming prior state-of-the-art LPLD 9.0%.
This paper has not been read by Pith yet.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.