Learning efficient vision transformers via fine-grained manifold distillation
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CV (1)
Years: 2025 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
From Per-Image Low-Rank to Encoding Mismatch: Rethinking Feature Distillation in Vision Transformers
The paper traces an encoding mismatch in ViT feature distillation to the contrast between per-image compressibility and dataset-level subspace rotations and broad spectral energy patterns. It proposes two remedies, Lift and WideLast, that improve DeiT-Tiny accuracy from 74.86% to 77.53-78.23% on ImageNet-1K.