pith. machine review for the scientific record.

arxiv: 2605.14145 · v1 · submitted 2026-05-13 · 💻 cs.CV

Recognition: no theorem link

Rethinking the Good Enough Embedding for Easy Few-Shot Learning


Pith reviewed 2026-05-15 04:51 UTC · model grok-4.3

classification 💻 cs.CV
keywords: few-shot learning · DINOv2 · frozen embeddings · k-nearest neighbors · manifold refinement · universal representations · non-parametric classification

The pith

A frozen DINOv2 embedding paired with k-nearest neighbor classification reaches state-of-the-art few-shot accuracy without any fine-tuning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether large self-supervised vision models have converged on representations that are already universal enough for few-shot visual tasks. It freezes the DINOv2-L network, extracts features from a selected intermediate layer, and classifies new examples with a plain k-nearest neighbor lookup. Linear manifold-refinement steps, PCA followed by ICA, are shown to regularize the feature space and raise accuracy further. On four standard few-shot benchmarks this non-parametric pipeline exceeds the results of current meta-learning algorithms that require backpropagation and episodic training. The work therefore asks whether a sufficiently good embedding removes the need for task-specific adaptation.
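The pipeline described can be sketched end to end. This is a minimal illustration, not the authors' code: the frozen DINOv2-L feature extraction is replaced by synthetic 1024-dimensional vectors, and the component count (20) and k = 1 are illustrative choices, not values taken from the paper.

```python
# Hedged sketch of the non-parametric pipeline:
# frozen-backbone features -> PCA -> ICA -> k-NN, no backpropagation.
# DINOv2-L extraction is stubbed with synthetic class-clustered vectors.
import numpy as np
from sklearn.decomposition import PCA, FastICA
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Stand-in for frozen 1024-d features of a 5-way 5-shot episode.
n_way, n_shot, n_query, dim = 5, 5, 15, 1024
means = rng.normal(size=(n_way, dim)) * 3.0          # synthetic class centers
support = means.repeat(n_shot, axis=0) + rng.normal(size=(n_way * n_shot, dim))
support_y = np.repeat(np.arange(n_way), n_shot)
query = means.repeat(n_query, axis=0) + rng.normal(size=(n_way * n_query, dim))
query_y = np.repeat(np.arange(n_way), n_query)

# Manifold refinement: PCA to a reduced space, then ICA on top.
pca = PCA(n_components=20, whiten=True).fit(support)
ica = FastICA(n_components=20, random_state=0).fit(pca.transform(support))
refine = lambda x: ica.transform(pca.transform(x))

# Plain k-NN lookup on the refined features.
knn = KNeighborsClassifier(n_neighbors=1).fit(refine(support), support_y)
acc = (knn.predict(refine(query)) == query_y).mean()
print(f"episode accuracy: {acc:.2f}")
```

With well-separated synthetic classes the episode accuracy is high; the point is only that the whole classifier is a fit-free lookup once the embedding is fixed.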

Core claim

By keeping the DINOv2-L weights fixed and feeding its layer activations into a k-nearest neighbor classifier, the method records higher accuracy than meta-learning baselines on four major few-shot benchmarks. Performance peaks at one particular layer; applying PCA and then ICA to the extracted features supplies a measurable regularization benefit that further lifts results.

What carries the argument

Frozen DINOv2-L features with layer selection and PCA-ICA manifold refinement inside a k-nearest neighbor classifier.

If this is right

  • Task-specific fine-tuning or learned metrics become unnecessary once a sufficiently universal embedding is available.
  • Layer choice inside a frozen network can be determined by cross-validation on the support set alone.
  • Simple linear operations such as PCA and ICA act as effective regularizers for high-dimensional frozen features.
  • Episodic training and backpropagation can be bypassed while still exceeding the accuracy of current meta-learning methods.
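The second bullet, selecting the layer by cross-validation on the support set alone, can be sketched as leave-one-out 1-NN accuracy per candidate layer. Everything below is hypothetical: per-layer features are simulated with a quality curve that peaks late in the stack, standing in for real frozen DINOv2-L activations.

```python
# Hedged sketch: pick the backbone layer by leave-one-out 1-NN accuracy
# on the support set only (query labels are never touched).
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(1)
n_way, n_shot, dim = 5, 5, 64
support_y = np.repeat(np.arange(n_way), n_shot)

def fake_layer_features(sep):
    """Synthetic stand-in: class separation grows with `sep`."""
    means = rng.normal(size=(n_way, dim)) * sep
    return means.repeat(n_shot, axis=0) + rng.normal(size=(n_way * n_shot, dim))

# Simulate 24 layers whose feature quality peaks late in the stack,
# mirroring the sigmoidal maturation the paper reports.
layers = {
    l: fake_layer_features(0.2 + 3.0 * np.exp(-((l - 18) / 6.0) ** 2))
    for l in range(24)
}

def loo_acc(x, y):
    # Leave-one-out accuracy of a 1-NN classifier on the support set.
    return cross_val_score(KNeighborsClassifier(n_neighbors=1), x, y,
                           cv=LeaveOneOut()).mean()

best = max(layers, key=lambda l: loo_acc(layers[l], support_y))
print("selected layer:", best)
```

Whether this support-only criterion recovers the paper's per-benchmark optimal layers is exactly the question the referee's minor comment 2 raises.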

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Continued scaling of self-supervised pretraining may eventually eliminate the need for any few-shot adaptation step.
  • The same frozen-embedding plus nearest-neighbor recipe could be tested on few-shot problems outside vision.
  • Practitioners gain a lightweight alternative that removes the computational overhead of meta-learning pipelines.

Load-bearing premise

The DINOv2 representation already encodes enough task-agnostic structure that nearest-neighbor lookup on frozen features matches the accuracy of models trained specifically for each new few-shot episode.

What would settle it

Running the identical frozen DINOv2 plus k-NN pipeline on a new few-shot benchmark where it falls below a standard meta-learning baseline would falsify the central claim.

Figures

Figures reproduced from arXiv: 2605.14145 by Alper Yilmaz, Michael Karnes.

Figure 1: The proposed feature extraction and classification pipeline, where p_{i,l} represents the i-th patch token. The Principal Component Analysis (PCA) or Independent Component Analysis (ICA) projection is then trained from this set of embedded unlabeled images. The PCA or ICA is then used to project the centered latent vector into a reduced space s ∈ R^{d′} via s = W^T(z − µ) (Eq. 7). Stage 2: Supervised Synthesis of …
Figure 2: Layer-wise Accuracy Trends and Logistic Fits. (A) Raw classification performance across the layers of the DINOv2-L backbone. Stars indicate the peak accuracy layer for each dataset. (B) Logistic function fits f(x) = L / (1 + e^{−k(x − x0)}) applied to the accuracy data, where the high R² values (> 0.98) suggest that feature maturation follows a sigmoidal progression toward a semantic plateau …
Figure 3: Illustrates the resilience of the identified optimal layers and their immediate neighbors under varying levels of PCA-driven compression. By projecting the latent spaces into 512, 256, 128, and 64 components, we quantify the trade-off between representational efficiency and classification precision. Across all four benchmarks, we observe that accuracy remains remarkably stable when reducing the dimensiona…
Figure 4: Qualitatively evaluates the CIFAR-FS and FC100 latent manifolds generated by our framework across raw, PCA, and ICA representations. For CIFAR-FS, both the raw and PCA-512 embeddings maintained distinct, well-separated clusters with significant inter-class distances. In contrast, the ICA-512 manifold exhibited reduced class spacing and more dispersed clusters. While FC100 proved significantly more challeng…
Figure 1: Few-Shot Learning Performance Across Backbone Layers. Mean accuracies are reported for 5-way tasks on the CIFAR-FS dataset across all 24 layer blocks of the frozen DINOv2-L backbone, comparing the performance of raw encodings in both 1-shot and 5-shot scenarios.
Figure 2: Mean PCA performances aggregated across all four datasets using many-way classification (64 images per class). The results demonstrate that while the latent manifold is remarkably resilient to moderate compression (512 to 256 components), a significant performance decline exists at 64 components.
Figure 3: Detailed PCA component progression for individual datasets (CIFAR-FS, FC100, miniImageNet, and tieredImageNet) under the many-way characterization setup. The sigmoidal maturation of features is preserved across 512, 256, and 128 dimensions, whereas the 64-component sweep collapses toward a near-random baseline, indicating a fundamental loss of semantic structure when using extreme compression.
Figure 4: t-SNE manifold comparison for CIFAR-FS. The clusters remain well-separated across all PCA and ICA reduction levels, with highly cohesive query-to-support mapping.
Figure 5: t-SNE manifold comparison for the challenging FC100 dataset. Despite substantial class overlap in the raw latent space, the 5-shot ICA refinement notably improves inter-class discriminability.
Figure 6: t-SNE manifold comparison for miniImageNet. The manifold exhibits exceptional resilience to linear projection, maintaining high cluster purity even at 128 components.
Figure 7: t-SNE manifold comparison for tieredImageNet. The visualizations reinforce the observation that the Platonic feature structure is a robust, cross-domain backbone property.
Figure 8: This table provides the granular numerical results for the many-way characterization across all 24 layers (shown in …).
Figure 9: A comparative numerical view of the optimal layer blocks and neighbors across four compression levels (shown in …).
Figure 10: A comparison of k-NN versus centroid classification under Mahalanobis, Euclidean, and Cosine similarity metrics across the four considered datasets. Mahalanobis distance with k-NN classification provided the highest accuracies for all 5-way 5-shot scenarios. Cosine similarity with centroids provided the highest accuracies for the 5-way 1-shot scenarios. The most significant effect of backbone selection…
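The logistic maturation fit described in Figure 2's caption, f(x) = L / (1 + e^{−k(x − x0)}), can be reproduced in outline. The per-layer accuracy curve below is synthetic (generated from known parameters plus noise); the paper fits measured accuracies, so the numbers here are illustrative only.

```python
# Hedged sketch of the sigmoidal layer-maturation fit from Figure 2:
# fit f(x) = L / (1 + exp(-k (x - x0))) to a layer-vs-accuracy curve.
import numpy as np
from scipy.optimize import curve_fit

def logistic(x, L, k, x0):
    return L / (1.0 + np.exp(-k * (x - x0)))

# Synthetic stand-in for accuracy across 24 backbone layers.
layers = np.arange(1, 25)
rng = np.random.default_rng(2)
acc = logistic(layers, L=0.92, k=0.6, x0=12.0) \
      + rng.normal(scale=0.01, size=layers.size)

popt, _ = curve_fit(logistic, layers, acc, p0=[1.0, 0.5, 12.0])
pred = logistic(layers, *popt)
r2 = 1.0 - np.sum((acc - pred) ** 2) / np.sum((acc - acc.mean()) ** 2)
print(f"L={popt[0]:.2f} k={popt[1]:.2f} x0={popt[2]:.1f} R^2={r2:.3f}")
```

On clean sigmoidal data the R² lands near 1, matching the > 0.98 fits the caption reports for the real curves.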
Original abstract

The field of deep visual recognition is undergoing a paradigm shift toward universal representations. The Platonic Representation Hypothesis suggests that diverse architectures trained on massive datasets are converging toward a shared, "ideal" latent space. This again raises a critical question: is a "Good Embedding All You Need?" In this paper, we leverage this convergence to demonstrate that off-the-shelf embeddings are inherently "good enough" for complex tasks, rendering intensive task-specific fine-tuning unnecessary. We explore this hypothesis within the few-shot learning framework, proposing a straightforward, non-parametric pipeline that entirely bypasses backpropagation. By utilizing a k-Nearest Neighbor classifier on frozen DINOv2-L features, we conduct a layer-wise characterization to identify an optimal feature extraction. We further demonstrate that manifold refinement via PCA and ICA provides a beneficial regularizing effect. Our results across four major benchmarks demonstrate that our approach consistently surpasses sophisticated meta-learning algorithms, achieving state-of-the-art performance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript claims that a simple non-parametric pipeline—k-nearest-neighbor classification on frozen DINOv2-L features, optionally refined by PCA or ICA—achieves state-of-the-art few-shot performance on four standard benchmarks and consistently surpasses meta-learning methods, supporting the hypothesis that high-quality pretrained embeddings render task-specific adaptation unnecessary.

Significance. If the performance gains hold under controlled comparisons, the result would indicate that strong, task-agnostic embeddings can simplify few-shot learning pipelines and reduce reliance on gradient-based meta-learning, providing empirical support for the Platonic Representation Hypothesis in the few-shot regime.

major comments (2)
  1. [Experiments] Experiments section: meta-learning baselines (ProtoNet, MAML, etc.) are compared against their originally published numbers that use ResNet-12 or similar backbones trained on ImageNet-scale data, while the proposed method uses DINOv2-L pretrained on hundreds of millions of images. Without re-evaluating the baselines on identical DINOv2-L features, the reported gains cannot be attributed to the k-NN + PCA/ICA pipeline rather than embedding strength; this directly undermines the central claim of surpassing meta-learning.
  2. [Results] Results tables: no error bars, standard deviations, or details on the number of independent runs, random seeds, or episode sampling protocol are provided for the reported accuracies. This makes it impossible to assess whether the claimed improvements over baselines are statistically reliable.
minor comments (2)
  1. [Abstract] Abstract: the four major benchmarks are not named explicitly, which would allow readers to immediately gauge the scope of the evaluation.
  2. [Method] Section 3: the precise criterion used to select the 'optimal' DINOv2 layer across tasks should be stated explicitly (e.g., validation accuracy on a held-out split) to clarify whether the choice is task-specific or truly universal.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We are grateful to the referee for the thoughtful and constructive comments, which help clarify the scope and rigor of our claims. We address each major comment below and describe the revisions we will incorporate.

Point-by-point responses
  1. Referee: [Experiments] Experiments section: meta-learning baselines (ProtoNet, MAML, etc.) are compared against their originally published numbers that use ResNet-12 or similar backbones trained on ImageNet-scale data, while the proposed method uses DINOv2-L pretrained on hundreds of millions of images. Without re-evaluating the baselines on identical DINOv2-L features, the reported gains cannot be attributed to the k-NN + PCA/ICA pipeline rather than embedding strength; this directly undermines the central claim of surpassing meta-learning.

    Authors: We thank the referee for this precise observation. The manuscript's central hypothesis is that high-quality, task-agnostic embeddings (exemplified by DINOv2-L) render meta-learning adaptations unnecessary, consistent with the Platonic Representation Hypothesis. The comparison to originally published meta-learning numbers is therefore meant to illustrate that modern embeddings enable strong performance with a simple non-parametric method, rather than to isolate the contribution of k-NN + PCA/ICA on equal footing with weaker backbones. To address the concern directly, we will add controlled experiments in the revision that apply representative meta-learning algorithms (e.g., ProtoNet) to the same frozen DINOv2-L features; preliminary results indicate that meta-learning yields no meaningful improvement over k-NN, reinforcing our claim. We will also add explicit discussion clarifying the role of embedding strength versus the classifier. revision: partial

  2. Referee: [Results] Results tables: no error bars, standard deviations, or details on the number of independent runs, random seeds, or episode sampling protocol are provided for the reported accuracies. This makes it impossible to assess whether the claimed improvements over baselines are statistically reliable.

    Authors: We agree that the absence of statistical details is a limitation. In the revised manuscript we will update all result tables to report mean accuracy together with standard deviation computed over five independent runs that use distinct random seeds for episode sampling. We will also expand the experimental setup section with a precise description of the episode generation protocol (number of episodes, way-shot configuration, and seed handling) to enable readers to judge statistical reliability. revision: yes
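For a concrete sense of the reporting promised here, a minimal sketch of mean accuracy with a 95% confidence interval over independently sampled episodes. The episode accuracies below are simulated stand-ins, not the paper's results, and the episode count is an arbitrary choice.

```python
# Hedged sketch: mean episode accuracy with a 95% confidence interval,
# the kind of statistic the revised tables would need to report.
import numpy as np

rng = np.random.default_rng(3)
n_episodes = 600                              # illustrative episode count
episode_acc = rng.normal(loc=0.85, scale=0.05, size=n_episodes).clip(0, 1)

mean = episode_acc.mean()
# 95% CI half-width under a normal approximation: 1.96 * s / sqrt(n).
ci95 = 1.96 * episode_acc.std(ddof=1) / np.sqrt(n_episodes)
print(f"accuracy: {mean:.2%} ± {ci95:.2%}")
```

With hundreds of episodes the half-width shrinks well below one percentage point, which is the resolution needed to separate the closely ranked baselines the referee points to.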

standing simulated objections not resolved
  • Complete re-evaluation of every meta-learning baseline on DINOv2-L features, which would require re-implementing and running multiple complex meta-learning pipelines on a large-scale backbone and exceeds available computational resources for this study.

Circularity Check

0 steps flagged

No circularity; empirical evaluation on fixed external embedding

full rationale

The paper's derivation chain consists of taking a pre-trained frozen DINOv2-L model (external to the paper), extracting features from specific layers, applying k-NN classification, and optionally refining with PCA/ICA before reporting benchmark accuracies. No equations define a quantity in terms of itself, no parameters are fitted on the target few-shot data and then relabeled as predictions, and no uniqueness theorems or ansatzes are imported via self-citation. The central claim is supported by direct numerical comparisons against published baselines rather than by any reduction to the paper's own inputs.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the Platonic Representation Hypothesis as a domain assumption and on the empirical observation that k-NN suffices once good features are chosen; no new entities are postulated and no free parameters are fitted beyond the choice of layer and optional dimensionality reduction.

free parameters (1)
  • selected DINOv2 layer
    Chosen after layer-wise characterization on the target benchmarks
axioms (1)
  • domain assumption Diverse architectures trained on massive datasets converge to a shared ideal latent space (Platonic Representation Hypothesis)
    Invoked in the introduction to justify using off-the-shelf frozen embeddings without adaptation

pith-pipeline@v0.9.0 · 5454 in / 1301 out tokens · 26744 ms · 2026-05-15T04:51:53.330394+00:00 · methodology

