{"paper":{"title":"Rethinking the Good Enough Embedding for Easy Few-Shot Learning","license":"http://creativecommons.org/licenses/by-nc-sa/4.0/","headline":"A frozen DINOv2 embedding paired with k-nearest neighbor classification reaches state-of-the-art few-shot accuracy without any fine-tuning.","cross_cats":[],"primary_cat":"cs.CV","authors_text":"Alper Yilmaz, Michael Karnes","submitted_at":"2026-05-13T21:52:05Z","abstract_excerpt":"The field of deep visual recognition is undergoing a paradigm shift toward universal representations. The Platonic Representation Hypothesis suggests that diverse architectures trained on massive datasets are converging toward a shared, \"ideal\" latent space. This again raises a critical question: is a \"Good Embedding All You Need?\" In this paper, we leverage this convergence to demonstrate that off-the-shelf embeddings are inherently \"good enough\" for complex tasks, rendering intensive task-specific fine-tuning unnecessary. We explore this hypothesis within the few-shot learning framework, pro"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"By utilizing a k-Nearest Neighbor classifier on frozen DINOv2-L features, we conduct a layer-wise characterization to identify an optimal feature extraction. We further demonstrate that manifold refinement via PCA and ICA provides a beneficial regularizing effect. Our results across four major benchmarks demonstrate that our approach consistently surpasses sophisticated meta-learning algorithms, achieving state-of-the-art performance.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"The assumption that DINOv2 features already form a sufficiently universal and task-agnostic representation so that no task-specific adaptation or learned metric is required; this is invoked when claiming the frozen embedding plus k-NN is 'good enough' for complex tasks.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"Frozen DINOv2-L features with k-NN classification and PCA/ICA refinement achieve state-of-the-art few-shot performance on four benchmarks without any backpropagation or fine-tuning.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"A frozen DINOv2 embedding paired with k-nearest neighbor classification reaches state-of-the-art few-shot accuracy without any fine-tuning.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"1d31205e12aa52549e2b7d680ec6c14c850bafb7d2abcab36429f5d28d2d3258"},"source":{"id":"2605.14145","kind":"arxiv","version":1},"verdict":{"id":"b85ad5ee-4c7c-4f2b-9f2c-a45d9f6eaf33","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-15T04:51:53.330394Z","strongest_claim":"By utilizing a k-Nearest Neighbor classifier on frozen DINOv2-L features, we conduct a layer-wise characterization to identify an optimal feature extraction. We further demonstrate that manifold refinement via PCA and ICA provides a beneficial regularizing effect. Our results across four major benchmarks demonstrate that our approach consistently surpasses sophisticated meta-learning algorithms, achieving state-of-the-art performance.","one_line_summary":"Frozen DINOv2-L features with k-NN classification and PCA/ICA refinement achieve state-of-the-art few-shot performance on four benchmarks without any backpropagation or fine-tuning.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"The assumption that DINOv2 features already form a sufficiently universal and task-agnostic representation so that no task-specific adaptation or learned metric is required; this is invoked when claiming the frozen embedding plus k-NN is 'good enough' for complex tasks.","pith_extraction_headline":"A frozen DINOv2 embedding paired with k-nearest neighbor classification reaches state-of-the-art few-shot accuracy without any fine-tuning."},"references":{"count":46,"sample":[{"doi":"","year":2020,"title":"Bateni, P., Goyal, R., Masrani, V., Wood, F., Sigal, L.: Improved few-shot visual classification (2020),https://arxiv.org/abs/1912.03432","work_id":"d602bd10-34f9-4848-b606-06d55a828d21","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2023,"title":"Chen, W., Si, C., Zhang, Z., Wang, L., Wang, Z., Tan, T.: Semantic prompt for few-shot image recognition (2023),https://arxiv.org/abs/2303.14123","work_id":"92fee499-ca27-44f5-b3d0-35f0c3dbf56e","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2021,"title":"Chen, Y., Liu, Z., Xu, H., Darrell, T., Wang, X.: Meta-baseline: Exploring simple meta-learning for few-shot learning (2021),https://arxiv.org/abs/2003.04390","work_id":"ad9fe5ef-a9d3-4289-8b62-96cdc0ccecb0","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"10.1109/iccv51070","year":2023,"title":"st + 1 K KX k=1 vt k # =E t","work_id":"c82688eb-98da-44ce-8f90-cf2ac44cd408","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"10.1007/978-3-031-20044-1_19","year":2022,"title":"In: Computer Vision – ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XX","work_id":"4aae9c62-970d-4340-a3c5-572c624c6f52","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":46,"snapshot_sha256":"ebb29b23ce0d2521dcbf4b3715549b28bf655c2941b3ffc4643221cd328ede64","internal_anchors":4},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}