{"paper":{"title":"From Per-Image Low-Rank to Encoding Mismatch: Rethinking Feature Distillation in Vision Transformers","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"An encoding mismatch from per-image low-rank features and rotating dataset subspaces blocks feature-map distillation for compressing Vision Transformers.","cross_cats":[],"primary_cat":"cs.CV","authors_text":"Bonan Xu, Huiyuan Tian, Shijian Li","submitted_at":"2025-11-19T16:03:21Z","abstract_excerpt":"Feature-map knowledge distillation (KD) transfers internal representations well between comparably sized Vision Transformers (ViTs), but it often fails in compression. We revisit this failure and uncover a paradox. Sample-wise SVD shows that each image is highly compressible, which seems to suggest that a narrow student with a linear projector should match the teacher \"in principle\". However, a dataset-level view contradicts this intuition: PCA shows that the teacher is a union of low-rank subspaces with significant subspace rotation across inputs. We further introduce token-level Spectral Ene"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"We refer to this combined phenomenon as an encoding mismatch. We propose two minimal remedies, Lift or WideLast... On ImageNet-1K, these fixes revive feature KD for ViT compression, improving DeiT-Tiny distilled from CaiT-S24 from 74.86% to 77.53%/78.23% top-1 accuracy.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"The assumption that the observed per-image low-rank structure, dataset-level subspace rotations, and token-level spectral energy patterns are the primary and causal reasons for feature-map KD failure in compression, rather than other factors such as optimization dynamics or capacity limits.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"The paper identifies an encoding mismatch in ViT feature distillation from per-image compressibility versus dataset subspace rotations and broad spectral energy patterns, proposing Lift and WideLast remedies that improve DeiT-Tiny accuracy from 74.86% to 77.53-78.23% on ImageNet-1K.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"An encoding mismatch from per-image low-rank features and rotating dataset subspaces blocks feature-map distillation for compressing Vision Transformers.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"131f4edf674c8156b3ccbdc07db66a596225e0a2b54b99c187b1abe371a97035"},"source":{"id":"2511.15572","kind":"arxiv","version":3},"verdict":{"id":"279a9c32-b2cf-4cb6-8506-5765e92914fc","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-17T20:19:20.348005Z","strongest_claim":"We refer to this combined phenomenon as an encoding mismatch. We propose two minimal remedies, Lift or WideLast... On ImageNet-1K, these fixes revive feature KD for ViT compression, improving DeiT-Tiny distilled from CaiT-S24 from 74.86% to 77.53%/78.23% top-1 accuracy.","one_line_summary":"The paper identifies an encoding mismatch in ViT feature distillation from per-image compressibility versus dataset subspace rotations and broad spectral energy patterns, proposing Lift and WideLast remedies that improve DeiT-Tiny accuracy from 74.86% to 77.53-78.23% on ImageNet-1K.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"The assumption that the observed per-image low-rank structure, dataset-level subspace rotations, and token-level spectral energy patterns are the primary and causal reasons for feature-map KD failure in compression, rather than other factors such as optimization dynamics or capacity limits.","pith_extraction_headline":"An encoding mismatch from per-image low-rank features and rotating dataset subspaces blocks feature-map distillation for compressing Vision Transformers."},"integrity":{"clean":true,"summary":{"advisory":0,"critical":0,"by_detector":{},"informational":0},"endpoint":"/pith/2511.15572/integrity.json","findings":[],"available":true,"detectors_run":[],"snapshot_sha256":"c28c3603d3b5d939e8dc4c7e95fa8dfce3d595e45f758748cecf8e644a296938"},"references":{"count":0,"sample":[],"resolved_work":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57","internal_anchors":0},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}