Recognition: no theorem link
Rethinking the Good Enough Embedding for Easy Few-Shot Learning
Pith reviewed 2026-05-15 04:51 UTC · model grok-4.3
The pith
A frozen DINOv2 embedding paired with k-nearest neighbor classification reaches state-of-the-art few-shot accuracy without any fine-tuning.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By keeping the DINOv2-L weights fixed and feeding its layer activations into a k-nearest neighbor classifier, the method records higher accuracy than meta-learning baselines on four major few-shot benchmarks. Performance peaks at one particular layer; applying PCA and then ICA to the extracted features supplies a measurable regularization benefit that further lifts results.
What carries the argument
Frozen DINOv2-L features with layer selection and PCA-ICA manifold refinement inside a k-nearest neighbor classifier.
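The pipeline named above can be sketched with scikit-learn primitives. This is a minimal sketch, not the paper's implementation: random vectors stand in for frozen DINOv2-L features, and the component counts and neighbor count are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)

# Stand-ins for frozen DINOv2-L features (1024-d) from a 5-way 5-shot episode.
# In the real pipeline these would come from a fixed, pretrained backbone.
support_feats = rng.normal(size=(25, 1024))   # 5 classes x 5 shots
support_labels = np.repeat(np.arange(5), 5)
query_feats = rng.normal(size=(75, 1024))     # 15 queries per class

# Manifold refinement (PCA then ICA) followed by k-NN; no gradients anywhere.
clf = make_pipeline(
    PCA(n_components=20),                     # illustrative dimensionality
    FastICA(n_components=10, random_state=0),
    KNeighborsClassifier(n_neighbors=1),
)
clf.fit(support_feats, support_labels)
preds = clf.predict(query_feats)
print(preds.shape)  # one predicted label per query image
```

The only "training" is fitting PCA/ICA statistics and storing support features, which is why the method bypasses backpropagation entirely.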
If this is right
- Task-specific fine-tuning or learned metrics become unnecessary once a sufficiently universal embedding is available.
- Layer choice inside a frozen network can be determined by cross-validation on the support set alone.
- Simple linear operations such as PCA and ICA act as effective regularizers for high-dimensional frozen features.
- Episodic training and backpropagation can be bypassed while still exceeding the accuracy of current meta-learning methods.
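The second bullet, selecting a layer from the support set alone, can be sketched as leave-one-out cross-validation over candidate layers. The layer features below are random stand-ins and `candidate_layers` is a hypothetical name; the paper's actual selection criterion is not specified here.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(1)
support_labels = np.repeat(np.arange(5), 5)  # 5-way 5-shot episode

# Hypothetical frozen features from a few candidate transformer layers.
candidate_layers = {f"layer_{i}": rng.normal(size=(25, 256)) for i in (18, 21, 24)}

def select_layer(layer_feats, labels):
    """Pick the layer whose features give the best leave-one-out 1-NN accuracy."""
    scores = {
        name: cross_val_score(
            KNeighborsClassifier(n_neighbors=1), X, labels, cv=LeaveOneOut()
        ).mean()
        for name, X in layer_feats.items()
    }
    return max(scores, key=scores.get), scores

best, scores = select_layer(candidate_layers, support_labels)
print(best, {k: round(v, 3) for k, v in scores.items()})
```

Because only the support set is used, no query labels leak into the layer choice.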
Where Pith is reading between the lines
- Continued scaling of self-supervised pretraining may eventually eliminate the need for any few-shot adaptation step.
- The same frozen-embedding plus nearest-neighbor recipe could be tested on few-shot problems outside vision.
- Practitioners gain a lightweight alternative that removes the computational overhead of meta-learning pipelines.
Load-bearing premise
The DINOv2 representation already encodes enough task-agnostic structure that nearest-neighbor lookup on frozen features matches the accuracy of models trained specifically for each new few-shot episode.
What would settle it
Running the identical frozen DINOv2 plus k-NN pipeline on a new few-shot benchmark where it falls below a standard meta-learning baseline would falsify the central claim.
Original abstract
The field of deep visual recognition is undergoing a paradigm shift toward universal representations. The Platonic Representation Hypothesis suggests that diverse architectures trained on massive datasets are converging toward a shared, "ideal" latent space. This again raises a critical question: is a "Good Embedding All You Need?" In this paper, we leverage this convergence to demonstrate that off-the-shelf embeddings are inherently "good enough" for complex tasks, rendering intensive task-specific fine-tuning unnecessary. We explore this hypothesis within the few-shot learning framework, proposing a straightforward, non-parametric pipeline that entirely bypasses backpropagation. By utilizing a k-Nearest Neighbor classifier on frozen DINOv2-L features, we conduct a layer-wise characterization to identify an optimal feature extraction. We further demonstrate that manifold refinement via PCA and ICA provides a beneficial regularizing effect. Our results across four major benchmarks demonstrate that our approach consistently surpasses sophisticated meta-learning algorithms, achieving state-of-the-art performance.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript claims that a simple non-parametric pipeline—k-nearest-neighbor classification on frozen DINOv2-L features, optionally refined by PCA or ICA—achieves state-of-the-art few-shot performance on four standard benchmarks and consistently surpasses meta-learning methods, supporting the hypothesis that high-quality pretrained embeddings render task-specific adaptation unnecessary.
Significance. If the performance gains hold under controlled comparisons, the result would indicate that strong, task-agnostic embeddings can simplify few-shot learning pipelines and reduce reliance on gradient-based meta-learning, providing empirical support for the Platonic Representation Hypothesis in the few-shot regime.
major comments (2)
- [Experiments] Experiments section: meta-learning baselines (ProtoNet, MAML, etc.) are compared against their originally published numbers that use ResNet-12 or similar backbones trained on ImageNet-scale data, while the proposed method uses DINOv2-L pretrained on hundreds of millions of images. Without re-evaluating the baselines on identical DINOv2-L features, the reported gains cannot be attributed to the k-NN + PCA/ICA pipeline rather than embedding strength; this directly undermines the central claim of surpassing meta-learning.
- [Results] Results tables: no error bars, standard deviations, or details on the number of independent runs, random seeds, or episode sampling protocol are provided for the reported accuracies. This makes it impossible to assess whether the claimed improvements over baselines are statistically reliable.
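The statistical reporting the referee asks for is standard in few-shot evaluation: mean accuracy with a 95% confidence interval over many sampled episodes. A minimal sketch, using simulated per-episode accuracies in place of real results:

```python
import numpy as np

def mean_ci95(episode_accs):
    """Mean accuracy and 95% confidence-interval half-width over episodes."""
    accs = np.asarray(episode_accs, dtype=float)
    mean = accs.mean()
    # 1.96 * standard error; a normal approximation, adequate for 600+ episodes.
    half_width = 1.96 * accs.std(ddof=1) / np.sqrt(len(accs))
    return mean, half_width

rng = np.random.default_rng(42)
accs = rng.normal(loc=0.85, scale=0.05, size=600)  # simulated episode accuracies
m, ci = mean_ci95(accs)
print(f"{m * 100:.2f} ± {ci * 100:.2f}")
```

Reporting the interval alongside the mean is what would let readers judge whether gaps to baselines exceed sampling noise.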
minor comments (2)
- [Abstract] Abstract: the four major benchmarks are not named explicitly; naming them would let readers immediately gauge the scope of the evaluation.
- [Method] Section 3: the precise criterion used to select the 'optimal' DINOv2 layer across tasks should be stated explicitly (e.g., validation accuracy on a held-out split) to clarify whether the choice is task-specific or truly universal.
Simulated Author's Rebuttal
We are grateful to the referee for the thoughtful and constructive comments, which help clarify the scope and rigor of our claims. We address each major comment below and describe the revisions we will incorporate.
Point-by-point responses
- Referee: [Experiments] Experiments section: meta-learning baselines (ProtoNet, MAML, etc.) are compared against their originally published numbers that use ResNet-12 or similar backbones trained on ImageNet-scale data, while the proposed method uses DINOv2-L pretrained on hundreds of millions of images. Without re-evaluating the baselines on identical DINOv2-L features, the reported gains cannot be attributed to the k-NN + PCA/ICA pipeline rather than embedding strength; this directly undermines the central claim of surpassing meta-learning.
  Authors: We thank the referee for this precise observation. The manuscript's central hypothesis is that high-quality, task-agnostic embeddings (exemplified by DINOv2-L) render meta-learning adaptations unnecessary, consistent with the Platonic Representation Hypothesis. The comparison to originally published meta-learning numbers is therefore meant to illustrate that modern embeddings enable strong performance with a simple non-parametric method, rather than to isolate the contribution of k-NN + PCA/ICA on equal footing with weaker backbones. To address the concern directly, we will add controlled experiments in the revision that apply representative meta-learning algorithms (e.g., ProtoNet) to the same frozen DINOv2-L features; preliminary results indicate that meta-learning yields no meaningful improvement over k-NN, reinforcing our claim. We will also add explicit discussion clarifying the role of embedding strength versus the classifier. Revision: partial.
- Referee: [Results] Results tables: no error bars, standard deviations, or details on the number of independent runs, random seeds, or episode sampling protocol are provided for the reported accuracies. This makes it impossible to assess whether the claimed improvements over baselines are statistically reliable.
  Authors: We agree that the absence of statistical details is a limitation. In the revised manuscript we will update all result tables to report mean accuracy together with standard deviation computed over five independent runs that use distinct random seeds for episode sampling. We will also expand the experimental setup section with a precise description of the episode generation protocol (number of episodes, way-shot configuration, and seed handling) to enable readers to judge statistical reliability. Revision: yes.
- Not undertaken: complete re-evaluation of every meta-learning baseline on DINOv2-L features, which would require re-implementing and running multiple complex meta-learning pipelines on a large-scale backbone and exceeds the computational resources available for this study.
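The episode generation protocol promised in the responses above can be made reproducible with explicit seeding. The sketch below samples N-way K-shot episodes from a labeled pool; all names and settings are illustrative, not the paper's.

```python
import numpy as np

def sample_episode(labels, n_way=5, k_shot=5, n_query=15, seed=0):
    """Return support/query index arrays for one seeded N-way K-shot episode."""
    rng = np.random.default_rng(seed)
    classes = rng.choice(np.unique(labels), size=n_way, replace=False)
    support, query = [], []
    for c in classes:
        idx = rng.permutation(np.flatnonzero(labels == c))
        support.extend(idx[:k_shot])
        query.extend(idx[k_shot:k_shot + n_query])
    return np.array(support), np.array(query)

labels = np.repeat(np.arange(20), 40)  # toy pool: 20 classes, 40 images each
s, q = sample_episode(labels, seed=7)
print(len(s), len(q))  # 25 support, 75 query indices
```

Fixing the seed per episode, as here, is what makes "five independent runs with distinct random seeds" a well-defined protocol.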
Circularity Check
No circularity; empirical evaluation on fixed external embedding
Full rationale
The paper's derivation chain consists of taking a pre-trained frozen DINOv2-L model (external to the paper), extracting features from specific layers, applying k-NN classification, and optionally refining with PCA/ICA before reporting benchmark accuracies. No equations define a quantity in terms of itself, no parameters are fitted on the target few-shot data and then relabeled as predictions, and no uniqueness theorems or ansatzes are imported via self-citation. The central claim is supported by direct numerical comparisons against published baselines rather than by any reduction to the paper's own inputs.
Axiom & Free-Parameter Ledger
free parameters (1)
- selected DINOv2 layer
axioms (1)
- Domain assumption: diverse architectures trained on massive datasets converge toward a shared, ideal latent space (the Platonic Representation Hypothesis).
Reference graph
Works this paper leans on
- [1]
- [2]
- [3]
- [4] Cheng, H., Yang, S., Zhou, J.T., Guo, L., Wen, B.: Frequency guidance matters in few-shot learning. In: 2023 IEEE/CVF International Conference on Computer Vision (ICCV). pp. 11780–11790 (2023). https://doi.org/10.1109/ICCV51070.2023.01085
- [5] Dong, B., Zhou, P., Yan, S., Zuo, W.: Self-promoted supervision for few-shot transformer. In: Computer Vision – ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XX. pp. 329–347. Springer-Verlag, Berlin, Heidelberg (2022). https://doi.org/10.1007/978-3-031-20044-1_19
- [6] Finn, C., Abbeel, P., Levine, S.: Model-agnostic meta-learning for fast adaptation of deep networks. In: Proceedings of the 34th International Conference on Machine Learning - Volume 70. pp. 1126–1135. ICML'17, JMLR.org (2017)
- [7] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pp. 770–778 (2016). https://doi.org/10.1109/CVPR.2016.90
- [8] Hiller, M., Ma, R., Harandi, M., Drummond, T.: Rethinking generalization in few-shot classification. In: Proceedings of the 36th International Conference on Neural Information Processing Systems. NIPS '22, Curran Associates Inc., Red Hook, NY, USA (2022)
- [9] Huh, M., Cheung, B., Wang, T., Isola, P.: The platonic representation hypothesis (2024), https://arxiv.org/abs/2405.07987
- [10] Khadse, S., Gourshettiwar, P., Pawar, A.: A review on meta-learning: How artificial intelligence and machine learning can learn to adapt quickly. In: 2025 International Conference on Electronics and Renewable Systems (ICEARS). pp. 2038–2043 (2025). https://doi.org/10.1109/ICEARS64219.2025.10941123
- [11] Krizhevsky, A.: Learning multiple layers of features from tiny images. Tech. rep., University of Toronto (2009), https://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf
- [12] Krizhevsky, A.: cifar100.zip (May 2023). https://doi.org/10.5281/zenodo.7978538
- [13] Lee, K., Maji, S., Ravichandran, A., Soatto, S.: Meta-learning with differentiable convex optimization (2019), https://arxiv.org/abs/1904.03758
- [14] Li, Z., Tang, H., Peng, Z., Qi, G.J., Tang, J.: Knowledge-guided semantic transfer network for few-shot image recognition. IEEE Transactions on Neural Networks and Learning Systems 36(11), 19474–19488 (2025). https://doi.org/10.1109/TNNLS.2023.3240195
- [15]
- [16] Liu, Y.: Meta-transfer learning (MTL) project download page. https://yaoyaoliu.web.illinois.edu/projects/mtl/download/Lmzjm9tX.html (2019), accessed: 2026-03-05
- [17] Lu, J., Wang, S., Zhang, X., Hao, Y., He, X.: Semantic-based selection, synthesis, and supervision for few-shot learning. In: Proceedings of the 31st ACM International Conference on Multimedia. pp. 3569–3578. MM '23, Association for Computing Machinery, New York, NY, USA (2023). https://doi.org/10.1145/3581783.3611784
- [18] Luo, X., Wu, H., Zhang, J., Gao, L., Xu, J., Song, J.: A closer look at few-shot classification again. In: Proceedings of the 40th International Conference on Machine Learning. ICML'23, JMLR.org (2023)
- [19] Miller, G.A.: WordNet: a lexical database for English. Commun. ACM 38(11), 39–41 (Nov 1995). https://doi.org/10.1145/219717.219748
- [20] Neumann, H., Pessoa, L., Hansen, T.: Visual filling-in for computing perceptual surface properties. Biological Cybernetics 85, 355–369 (11 2001). https://doi.org/10.1007/s004220100258
- [21] Oquab, M., Darcet, T., Moutakanni, T., Vo, H.V., Szafraniec, M., Khalidov, V., Fernandez, P., Haziza, D., Massa, F., El-Nouby, A., Assran, M., Ballas, N., Galuba, W., Howes, R., Huang, P.Y., Li, S.W., Misra, I., Rabbat, M., Sharma, V., Synnaeve, G., Xu, H., Jegou, H., Mairal, J., Labatut, P., Joulin, A., Bojanowski, P.: DINOv2: Learning robust visual feat...
- [22] Oreshkin, B.N., Rodriguez, P., Lacoste, A.: Tadam: task dependent adaptive metric for improved few-shot learning. In: Proceedings of the 32nd International Conference on Neural Information Processing Systems. pp. 719–729. NIPS'18, Curran Associates Inc., Red Hook, NY, USA (2018)
- [23] Peng, Z., Li, Z., Zhang, J., Li, Y., Qi, G.J., Tang, J.: Few-shot image recognition with knowledge transfer. In: 2019 IEEE/CVF International Conference on Computer Vision (ICCV). pp. 441–449 (2019). https://doi.org/10.1109/ICCV.2019.00053
- [24] Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., Krueger, G., Sutskever, I.: Learning transferable visual models from natural language supervision (2021), https://arxiv.org/abs/2103.00020
- [25] Ravi, S., Larochelle, H.: Optimization as a model for few-shot learning. In: International Conference on Learning Representations (2017), https://openreview.net/forum?id=rJY0-Kcll
- [26] Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A.C., Fei-Fei, L.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vision 115(3), 211–252 (Dec 2015). https://doi.org/10.1007/s11263-015-0816-y
- [27] Rusu, A.A., Rao, D., Sygnowski, J., Vinyals, O., Pascanu, R., Osindero, S., Hadsell, R.: Meta-learning with latent embedding optimization (2019), https://arxiv.org/abs/1807.05960
- [28] Santoro, A., Bartunov, S., Botvinick, M., Wierstra, D., Lillicrap, T.: Meta-learning with memory-augmented neural networks. In: Proceedings of the 33rd International Conference on International Conference on Machine Learning - Volume 48. pp. 1842–1850. ICML'16, JMLR.org (2016)
- [29] Satorras, V.G., Estrach, J.B.: Few-shot learning with graph neural networks. In: International Conference on Learning Representations (2018), https://openreview.net/forum?id=BJj6qGbRW
- [30] Singh, Y., Hathaway, Q.A., Keishing, V., Salehi, S., Wei, Y., Horvat, N., Vera-Garcia, D.V., Choudhary, A., Mula Kh, A., Quaia, E., Andersen, J.B.: Beyond post hoc explanations: A comprehensive framework for accountable ai in medical imaging through transparency, interpretability, and explainability. Bioengineering 12(8) (2025). https://doi.org/10.3390...
- [31] Snell, J., Swersky, K., Zemel, R.: Prototypical networks for few-shot learning. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. pp. 4080–4090. NIPS'17, Curran Associates Inc., Red Hook, NY, USA (2017)
- [32] Sun, Q., Liu, Y., Chua, T.S., Schiele, B.: Meta-transfer learning for few-shot learning. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 403–412 (2019), https://yaoyaoliu.web.illinois.edu/projects/mtl/
- [33] Sung, F., Yang, Y., Zhang, L., Xiang, T., Torr, P.H.S., Hospedales, T.M.: Learning to compare: Relation network for few-shot learning (2018), https://arxiv.org/abs/1711.06025
- [34] Tang, H., He, S., Qin, J.: Connecting giants: synergistic knowledge transfer of large multimodal models for few-shot learning. In: Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence. IJCAI '25 (2025). https://doi.org/10.24963/ijcai.2025/693
- [35] Tian, Y., Wang, Y., Krishnan, D., Tenenbaum, J.B., Isola, P.: Rethinking few-shot image classification: A good embedding is all you need? In: Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XIV. pp. 266–282. Springer-Verlag, Berlin, Heidelberg (2020). https://doi.org/10.1007/978-3-030-58568-6_16
- [36] Vinyals, O., Blundell, C., Lillicrap, T., Kavukcuoglu, K., Wierstra, D.: Matching networks for one shot learning. In: Proceedings of the 30th International Conference on Neural Information Processing Systems. pp. 3637–3645. NIPS'16, Curran Associates Inc., Red Hook, NY, USA (2016)
- [37] Wang, X., Ye, Y., Gupta, A.: Zero-shot recognition via semantic embeddings and knowledge graphs. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 6857–6866 (2018). https://doi.org/10.1109/CVPR.2018.00717
- [38] Woo, J.M., Ju, S.H., Sung, J.H., Seo, K.M.: Meta-learning-based lstm-autoencoder for low-data anomaly detection in retrofitted cnc machine using multi-machine datasets. Systems 13(7) (2025). https://doi.org/10.3390/systems13070534
- [39] Xing, C., Rostamzadeh, N., Oreshkin, B.N., Pinheiro, P.O.: Adaptive cross-modal few-shot learning. Curran Associates Inc., Red Hook, NY, USA (2019)
- [40] Xu, Z., Shi, Z., Wei, J., Li, Y., Liang, Y.: Improving foundation models for few-shot learning via multitask finetuning. In: ICLR 2023 Workshop on Mathematical and Empirical Understanding of Foundation Models (2023), https://openreview.net/forum?id=szNb8Hp3d3
- [41]
- [42] Yang, F., Wang, R., Chen, X.: Semantic guided latent parts embedding for few-shot learning. In: 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). pp. 5436–5446 (2023). https://doi.org/10.1109/WACV56688.2023.00541
- [43]
- [44]
- [45]
- [46] Zhang, H., Xu, J., Jiang, S., He, Z.: Simple semantic-aided few-shot learning (2024), https://arxiv.org/abs/2311.18649
The paper's PDF continues with "Rethinking the Good Enough Embedding for Easy Few-Shot Learning: Supplemental Material", Michael Karnes and Alper Yilmaz, The Ohio State University, Columbus, OH 43210, USA.