arxiv: 2507.22136 · v3 · pith:35HDZVJJnew · submitted 2025-07-29 · 💻 cs.CV

Color as the Impetus: Transforming Few-Shot Learner

Pith reviewed 2026-05-21 23:52 UTC · model grok-4.3

classification 💻 cs.CV
keywords: few-shot learning, color perception, meta-learning, inter-channel interaction, bio-inspired learning, knowledge distillation, cross-domain transfer

The pith

Simulating human color perception through channel interactions improves few-shot classification by extracting stronger intra-class commonalities and inter-class differences.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that conventional meta-learning overlooks color as a basic visual cue that humans use for rapid categorization. It introduces the ColorSense Learner, which processes inter-channel color information to filter noise and highlight discriminative traits. A companion ColorSense Distiller adds teacher knowledge via distillation to further boost the student's meta-learning. Comprehensive tests across eleven benchmarks in coarse, fine-grained, and cross-domain settings are used to show gains in generalization and robustness.

Core claim

We pioneer an innovative viewpoint on few-shot learning by simulating human color perception mechanisms. We propose the ColorSense Learner, a bio-inspired meta-learning framework that capitalizes on inter-channel feature extraction and interactive learning. By strategically emphasizing distinct color information across different channels, our approach effectively filters irrelevant features while capturing discriminative characteristics, enabling better intra-class commonality extraction and larger inter-class differences. We further introduce a meta-distiller, the ColorSense Distiller, which incorporates prior teacher knowledge to augment the student network's meta-learning capacity.
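The distillation step is only described at a high level here. As a point of reference, the sketch below shows the standard temperature-softened teacher-student loss that is commonly used in knowledge distillation. It is an assumption that the ColorSense Distiller uses a loss of this form; the function names and the temperature of 4.0 are hypothetical and chosen only for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Generic soft-target loss, NOT taken from the paper (assumption).
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    return kl * temperature ** 2

# The loss is zero when the student matches the teacher and positive otherwise.
same = distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
different = distillation_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0])
```

In a meta-learning setting this term would typically be added to the usual episode classification loss with a weighting factor, which is a further design choice not specified in the text above.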

What carries the argument

The ColorSense Learner framework, which performs inter-channel feature extraction and interactive learning to emphasize distinct color information across channels for filtering and discrimination.
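To make "inter-channel" concrete, here is a toy sketch of the two operations named in the figure captions: separating an image into its three color channels, and fusing channels with a 1x1 convolution, which amounts to a per-pixel weighted sum. The function names and the fusion weights are hypothetical, and the sketch works on raw pixel values rather than the learned feature maps the paper's modules operate on.

```python
def split_channels(image):
    """Split an H x W x 3 nested-list image into three H x W planes."""
    return [[[pixel[c] for pixel in row] for row in image] for c in range(3)]

def fuse_1x1(planes, weights, bias=0.0):
    """Per-pixel weighted sum across planes.

    This is what a 1x1 convolution with a single output channel computes.
    """
    height, width = len(planes[0]), len(planes[0][0])
    return [[sum(w * plane[y][x] for w, plane in zip(weights, planes)) + bias
             for x in range(width)]
            for y in range(height)]

# A 2 x 2 toy image: red, green, blue, and white pixels.
image = [[(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
         [(0.0, 0.0, 1.0), (1.0, 1.0, 1.0)]]

planes = split_channels(image)
# Hypothetical weights: the first channel acts as the "core" channel and
# the other two are merged into it with smaller weights.
fused = fuse_1x1(planes, weights=(0.5, 0.25, 0.25))
# fused == [[0.5, 0.25], [0.25, 1.0]]
```

In a trained network the weights would be learned parameters; fixing them by hand here only shows the shape of the computation.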

If this is right

  • The method achieves strong performance on coarse-grained, fine-grained, and cross-domain few-shot classification tasks.
  • It demonstrates improved robustness and transferability across eleven standard benchmarks.
  • The approach handles few-shot classification by leveraging color perception to enhance meta-learning capacity.
  • The ColorSense Distiller augments student networks with prior teacher knowledge for better results.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If color-channel emphasis works here, similar low-level cue emphasis might help in other visual meta-learning settings like object detection or segmentation.
  • A direct test on datasets where color is deliberately uninformative would clarify whether the gains stem specifically from color or from added architectural capacity.
  • The framework's emphasis on intuitive features could inspire hybrid models that combine color with other perceptual priors such as texture or motion.

Load-bearing premise

The assumption that color information is the most intuitive and under-used visual feature and that its inter-channel interactions will reliably yield larger inter-class differences than standard abstract feature methods.

What would settle it

Run the same few-shot benchmarks on grayscale or color-ablated versions of the datasets and observe whether the reported gains in accuracy and generalization disappear or reverse.
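A minimal sketch of the grayscale control, assuming images are nested lists of RGB tuples with values in [0, 1]. Each pixel is replaced by its luma, computed with the ITU-R BT.601 weights, and replicated across the three channels so the input shape seen by the network is unchanged. The function name is hypothetical, and the paper's own preprocessing pipeline is not described here.

```python
def to_grayscale_rgb(image):
    """Replace each RGB pixel with its luma, replicated across three channels."""
    out = []
    for row in image:
        new_row = []
        for r, g, b in row:
            # ITU-R BT.601 luma weights
            y = 0.299 * r + 0.587 * g + 0.114 * b
            new_row.append((y, y, y))
        out.append(new_row)
    return out

# Toy usage: a white pixel stays (approximately) white, a black pixel stays black.
gray = to_grayscale_rgb([[(1.0, 1.0, 1.0), (0.0, 0.0, 0.0)]])
```

Keeping three identical channels rather than a single channel means the architecture and parameter count stay fixed, so any accuracy change can be attributed to the missing color information.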

Figures

Figures reproduced from arXiv: 2507.22136 by Chaofei Qi, Jianbin Qiu, Zhitai Liu.

Figure 1. 5-way 1-shot 1-query scenario of few-shot learning. Unlike conventional meta-learning focusing on abstract feature extraction, we focus on cognitive-inspired few-shot learners based on human color perception. By means of the episode training for color perception, our work closes the machine-human chromatic discrimination gap. Bio-inspired strategy can tackle few-shot scenarios via color cognition principle…
Figure 2. Schematic diagram of ColorSense Learner, namely CoSeLearner. Structurally, it consists of three primary components: Color Shunt for channel separation, Feature Echelon for sub-channel feature extraction, and Color Pattern for color feature perception. In light of varying depths and functions, Feature Echelon module is further subdivided into five distinct sub-modules: Sentinels, Integrators, Abstractors, D…
Figure 3. Structure diagrams of the ColorSense Attention and Color Pattern Module. The attention block aims to facilitate shallow attention perception among three groups of feature maps before feature projection, and color pattern module fuses the embedded features. Channel fusion encompasses a 1x1 convolutional layer. We specify I-th channel as the core channel and merge the real-time features of II-th and III-th …
Figure 4. Knowledge distillation process of the ColorSense Distiller. The Feature Echelon of the student network abandons the ColorSense Attention layer, and d indicates the pattern depth of color distillation. The teacher model can transfer the pre-frozen color-perception knowledge to student network. Through distillation, the student network can achieve rapid training, and enhance color recognition ability beyond …
Figure 6. Ablation experiments of color pattern depth g and distilling depth d on tiered-ImageNet.
Figure 7. More ablation experiments of the color pattern generation g for CoSeLearner and ColorSense distilling depth d for CoSeDistiller on the mini-ImageNet, CUB-200-2011, meta-iNat, and tiered-meta-iNat benchmarks.
Figure 8. Qualitative comparative experiments of CoSeLearner against four MSA [3, 38, 84] methods and four MNR [18, 25, 52, 78] methods on the mini-ImageNet, CIFAR-FS, tiered-ImageNet, and FC-100 benchmarks.
Figure 9. Comparative experiments of our CoSeDistiller compared with knowledge distillation-based methods on the mini-ImageNet and tiered-ImageNet datasets. Among them, the CSL backbone indicates our CoSeLearner.
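The "5-way 1-shot 1-query" episode named in the Figure 1 caption can be sketched as follows. This is a generic episode sampler under the usual few-shot convention (N classes, K support items and Q query items per class), not the authors' data loader; the function name, the dictionary layout, and the toy dataset are all hypothetical.

```python
import random

def sample_episode(labels_to_items, n_way=5, k_shot=1, n_query=1, rng=None):
    """Sample one N-way K-shot episode with Q query items per class.

    labels_to_items maps a class name to a list of items (e.g. image paths).
    Returns (support, query), each a list of (item, episode_label) pairs,
    where episode_label runs from 0 to n_way - 1.
    """
    if rng is None:
        rng = random.Random()
    classes = rng.sample(sorted(labels_to_items), n_way)
    support, query = [], []
    for episode_label, cls in enumerate(classes):
        # Draw support and query items without overlap within the class.
        picked = rng.sample(labels_to_items[cls], k_shot + n_query)
        support += [(item, episode_label) for item in picked[:k_shot]]
        query += [(item, episode_label) for item in picked[k_shot:]]
    return support, query

# Toy dataset: 8 classes with 4 placeholder items each.
data = {f"class_{i}": [f"img_{i}_{j}" for j in range(4)] for i in range(8)}
support, query = sample_episode(data, rng=random.Random(0))
```

With the defaults, each episode contains five support items and five query items, one of each per sampled class.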
Original abstract

Humans possess innate meta-learning capabilities, partly attributable to their exceptional color perception. In this paper, we pioneer an innovative viewpoint on few-shot learning by simulating human color perception mechanisms. We propose the ColorSense Learner, a bio-inspired meta-learning framework that capitalizes on inter-channel feature extraction and interactive learning. By strategically emphasizing distinct color information across different channels, our approach effectively filters irrelevant features while capturing discriminative characteristics. Color information represents the most intuitive visual feature, yet conventional meta-learning methods have predominantly neglected this aspect, focusing instead on abstract feature differentiation across categories. Our framework bridges the gap via synergistic color-channel interactions, enabling better intra-class commonality extraction and larger inter-class differences. Furthermore, we introduce a meta-distiller based on knowledge distillation, ColorSense Distiller, which incorporates prior teacher knowledge to augment the student network's meta-learning capacity. We've conducted comprehensive coarse/fine-grained and cross-domain experiments on eleven few-shot benchmarks for validation. Numerous experiments reveal that our methods have extremely strong generalization ability, robustness, and transferability, and effortlessly handle few-shot classification from the perspective of color perception.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper proposes the ColorSense Learner, a bio-inspired meta-learning framework for few-shot classification that simulates human color perception through inter-channel feature extraction and interactive learning. By emphasizing distinct color information across channels, it claims to filter irrelevant features, extract better intra-class commonalities, and produce larger inter-class differences than conventional abstract-feature meta-learning methods. The work also introduces the ColorSense Distiller, a knowledge-distillation-based meta-distiller that incorporates teacher priors to boost the student network. Comprehensive experiments on eleven few-shot benchmarks (coarse/fine-grained and cross-domain) are reported to demonstrate strong generalization, robustness, and transferability.

Significance. If the central performance gains can be rigorously attributed to the color-channel mechanism rather than auxiliary components, the work would provide a novel perspective on underutilized visual cues in meta-learning. The bio-inspired framing and distillation component could stimulate further research on modality-specific inductive biases in few-shot settings, particularly where color is discriminative.

major comments (1)
  1. [Abstract and Experimental Validation] The load-bearing claim that 'synergistic color-channel interactions' produce larger inter-class differences and better intra-class commonality (Abstract) is not supported by isolating ablations. No color-ablated, grayscale, or channel-permuted controls are described that would rule out gains arising instead from the ColorSense Distiller, added parameters, or training protocol changes. Without such controls, the superiority over 'conventional meta-learning methods' cannot be attributed to the color emphasis.
minor comments (2)
  1. [Abstract] The abstract introduces 'ColorSense Learner' and 'ColorSense Distiller' as new entities without a concise architectural overview or pseudocode that would allow readers to understand the inter-channel interaction implementation at a glance.
  2. [Introduction (implied)] The statement that color is 'the most intuitive visual feature' yet 'predominantly neglected' would benefit from a brief citation to prior color-aware few-shot or meta-learning works to clarify the novelty gap.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on our work. The concern about isolating the contribution of color-channel interactions is well-taken, and we address it directly below along with our plans for revision.

Point-by-point responses
  1. Referee: [Abstract and Experimental Validation] The load-bearing claim that 'synergistic color-channel interactions' produce larger inter-class differences and better intra-class commonality (Abstract) is not supported by isolating ablations. No color-ablated, grayscale, or channel-permuted controls are described that would rule out gains arising instead from the ColorSense Distiller, added parameters, or training protocol changes. Without such controls, the superiority over 'conventional meta-learning methods' cannot be attributed to the color emphasis.

    Authors: We agree that the current set of experiments does not include the specific isolating controls mentioned (grayscale inputs, channel-permuted variants, or explicit color-ablated baselines). Our existing ablations focus on the inter-channel interaction modules within the ColorSense Learner and the addition of the Distiller, showing performance drops when these are removed. However, these do not fully rule out contributions from parameter count or protocol differences. To strengthen attribution to the color-perception mechanism, we will add the requested controls in the revised manuscript: (1) grayscale versions of the same benchmarks, (2) channel-permuted inputs while keeping network architecture fixed, and (3) direct comparisons of the full model versus the Learner alone (without the Distiller). These will be reported alongside the existing results to clarify the source of the observed gains.

    Revision: yes
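The channel-permuted control in item (2) can be illustrated with a short sketch. It assumes images are nested lists of three-value pixel tuples; the helper name and the constant listing the orderings are hypothetical, and nothing here reflects how the authors would implement the control.

```python
from itertools import permutations

def permute_channels(image, order):
    """Reorder the channels of every pixel.

    order=(2, 0, 1) maps an (R, G, B) pixel to (B, R, G).
    """
    return [[tuple(pixel[i] for i in order) for pixel in row] for row in image]

# All five non-identity orderings of three channels.
NON_IDENTITY_ORDERS = [p for p in permutations(range(3)) if p != (0, 1, 2)]

# Toy usage on a single pixel with distinct channel values.
swapped = permute_channels([[(1, 2, 3)]], (2, 0, 1))
# swapped == [[(3, 1, 2)]]
```

Because a permutation preserves all pixel information and leaves the architecture untouched, a large accuracy drop under permutation at test time would suggest the model relies on channel-specific color cues rather than on added capacity alone.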

Circularity Check

0 steps flagged

No circularity: framework presented as novel bio-inspired construction without reduction to inputs or self-citations

full rationale

The paper introduces the ColorSense Learner as a new meta-learning framework that simulates human color perception via inter-channel feature extraction and interactions, along with a ColorSense Distiller for knowledge distillation. No equations, derivation steps, or load-bearing self-citations appear in the abstract or described text. The central claims about filtering irrelevant features and achieving larger inter-class differences through color emphasis are presented as an innovative viewpoint and construction, validated by experiments on eleven benchmarks, rather than any mathematical reduction that equates outputs to fitted inputs or prior author results by definition. This is a standard case of a self-contained proposed architecture.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 2 invented entities

The central claim rests on the untested domain assumption that color is the dominant neglected cue for intra-class commonality in few-shot image tasks and that channel interaction will automatically enlarge inter-class margins without introducing new failure modes.

axioms (1)
  • domain assumption: Color information is the most intuitive visual feature and has been neglected by prior meta-learning methods
    Stated directly in the abstract as the motivation for the framework.
invented entities (2)
  • ColorSense Learner (no independent evidence)
    purpose: Bio-inspired meta-learning framework using inter-channel feature extraction and interactive learning
    New named architecture introduced to operationalize color perception simulation.
  • ColorSense Distiller (no independent evidence)
    purpose: Meta-distiller based on knowledge distillation that incorporates prior teacher knowledge
    New component added to augment the student network's meta-learning capacity.

pith-pipeline@v0.9.0 · 5719 in / 1355 out tokens · 38689 ms · 2026-05-21T23:52:49.819846+00:00 · methodology

Reference graph

Works this paper leans on

96 extracted references · 96 canonical work pages

  1. [1]

    Flamingo: a visual language model for few-shot learning

    Jean-Baptiste Alayrac, Jeff Donahue, Pauline Luc, Antoine Miech, Iain Barr, Yana Hasson, Karel Lenc, Arthur Mensch, Katie Millican, Malcolm Reynolds, Roman Ring, Eliza Rutherford, Serkan Cabi, Tengda Han, Zhitao Gong, Sina Samangooei, Marianne Monteiro, Jacob Menick, Sebastian Borgeaud, Andy Brock, Aida Nematzadeh, Sahand Sharifzadeh, Mikolaj Binkowski, R...

  2. [2]

    Frozen feature augmentation for few-shot image classification

    Andreas Bär, Neil Houlsby, Mostafa Dehghani, and Manoj Kumar. Frozen feature augmentation for few-shot image classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 16046–16057, 2024. 2

  3. [3]

    Semantic prompt for few-shot image recognition

    Wentao Chen, Chenyang Si, Zhang Zhang, Liangdao Wang, Zilei Wang, and Tien-Ping Tan. Semantic prompt for few-shot image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 23581–23591, 2023. 2, 3, 7, 16

  4. [4]

    Meta-adapter: An online few-shot learner for vision-language model

    Cheng Cheng, Lin Song, Ruoyi Xue, Hang Wang, Hongbin Sun, Yixiao Ge, and Ying Shan. Meta-adapter: An online few-shot learner for vision-language model. In Neural Information Processing Systems, 2023. 2, 14

  5. [5]

    Color appearance and the end of Hering's opponent-colors theory

    Bevil R. Conway, Saima Malik-Moraleda, and Edward Gibson. Color appearance and the end of Hering's opponent-colors theory. Trends in Cognitive Sciences, 27:791–804, 2023. 2

  6. [6]

    Bert: Pre-training of deep bidirectional transformers for language understanding

    Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. In North American Chapter of the Association for Computational Linguistics, pages 4171–4186, 2019. 2

  7. [7]

    Self-promoted supervision for few-shot transformer

    Bowen Dong, Pan Zhou, Shuicheng Yan, and Wangmeng Zuo. Self-promoted supervision for few-shot transformer. In Proceedings of the European Conference on Computer Vision, pages 329–347, 2022. 4, 16

  8. [8]

    An image is worth 16x16 words: Transformers for image recognition at scale

    Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale. In International Conference on Learning Representations, 2021. 2, 14

  9. [9]

    Human-like few-shot learning via bayesian reasoning over natural language

    Kevin Ellis. Human-like few-shot learning via bayesian reasoning over natural language. In Neural Information Processing Systems, 2023. 2

  10. [10]

    Context-aware meta-learning

    Christopher Fifty, Dennis Duan, Ronald G. Junkins, Ehsan Amid, Jurij Leskovec, Christopher Ré, and Sebastian Thrun. Context-aware meta-learning. In International Conference on Learning Representations,

  11. [11]

    Instance-based max-margin for practical few-shot recognition

    Minghao Fu, Kevin Zhu, and Jianxin Wu. Instance-based max-margin for practical few-shot recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 28674–28683,

  12. [12]

    Styleadv: Meta style adversarial training for cross-domain few-shot learning

    Yu Fu, Yu Xie, Yanwei Fu, and Yugang Jiang. Styleadv: Meta style adversarial training for cross-domain few-shot learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 24575–24584, 2023. 2, 8, 17

  13. [13]

    Wave-san: Wavelet based style augmentation network for cross-domain few-shot learning

    Yuqian Fu, Yu Xie, Yanwei Fu, Jingjing Chen, and Yu-Gang Jiang. Wave-san: Wavelet based style augmentation network for cross-domain few-shot learning. arXiv preprint, 2022. 2, 8, 17

  14. [14]

    Predictive coding: a more cognitive process than we thought?

    Kaitlyn M. Gabhart, Yihan (Sophy) Xiong, and André M. Bastos. Predictive coding: a more cognitive process than we thought? Trends in cognitive sciences, 2025. 2

  15. [15]

    Meta-learning approaches for few-shot learning: A survey of recent advances

    Hassan Gharoun, Fereshteh Momenifar, Fang Chen, and Amir H. Gandomi. Meta-learning approaches for few-shot learning: A survey of recent advances. ACM Computing Surveys, 56:1–41, 2024. 2, 14

  16. [16]

    Signal switching may enhance processing power of the brain

    Jennifer M. Groh, Meredith N. Schmehl, Valeria C. Caruso, and Surya T. Tokdar. Signal switching may enhance processing power of the brain. Trends in Cognitive Sciences, 28:600–613, 2024. 2

  17. [17]

    Understanding episode hardness in few-shot learning

    Yurong Guo, Ruoyi Du, Aneeshan Sain, Kongming Liang, Yuan Dong, Yi-Zhe Song, and Zhanyu Ma. Understanding episode hardness in few-shot learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 47:616–633, 2024. 2

  18. [18]

    Class-aware patch embedding adaptation for few-shot image classification

    Fusheng Hao, Fengxiang He, Liu Liu, Fuxiang Wu, Dacheng Tao, and Jun Cheng. Class-aware patch embedding adaptation for few-shot image classification. In Proceedings of the IEEE International Conference on Computer Vision, pages 18859–18869, 2023. 2, 4, 7, 16

  19. [19]

    Cross-level distillation and feature denoising for cross-domain few-shot classification

    Hao Zheng, Runqi Wang, Jianzhuang Liu, and Asako Kanezaki. Cross-level distillation and feature denoising for cross-domain few-shot classification. In International Conference on Learning Representations, 2023. 2, 4

  20. [20]

    Deep residual learning for image recognition

    Kaiming He, X. Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016. 2, 14

  21. [21]

    Apseg: Auto-prompt network for cross-domain few-shot semantic segmentation

    Weizhao He, Yang Zhang, Wei Zhuo, LinLin Shen, Jiaqi Yang, Songhe Deng, and Liang Sun. Apseg: Auto-prompt network for cross-domain few-shot semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 23762–23772, 2024. 2

  22. [22]

    Attribute surrogates learning and spectral tokens pooling in transformers for few-shot learning

    Yang He, Weihan Liang, Dongyang Zhao, Hong-Yu Zhou, Weifeng Ge, Yizhou Yu, and Wenqiang Zhang. Attribute surrogates learning and spectral tokens pooling in transformers for few-shot learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 9109–9119,

  23. [23]

    Eurosat: A novel dataset and deep learning benchmark for land use and land cover classification

    Patrick Helber, Benjamin Bischke, Andreas Dengel, and Damian Borth. Eurosat: A novel dataset and deep learning benchmark for land use and land cover classification. IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens., 12:2217–2226, 2019. 6

  24. [24]

    Adapt before comparison: A new perspective on cross-domain few-shot segmentation

    Jonas Herzog. Adapt before comparison: A new perspective on cross-domain few-shot segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 23605–23615,

  25. [25]

    Rethinking generalization in few-shot classification

    Markus Hiller, Rongkai Ma, Mehrtash Harandi, and Tom Drummond. Rethinking generalization in few-shot classification. In Neural Information Processing Systems, 2022. 4, 7, 16

  26. [26]

    Meta-learning in neural networks: A survey

    Timothy M. Hospedales, Antreas Antoniou, Paul Micaelli, and Amos J. Storkey. Meta-learning in neural networks: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44:5149–5169,

  27. [27]

    A closer look at prototype classifier for few-shot image classification

    Mingcheng Hou and Issei Sato. A closer look at prototype classifier for few-shot image classification. In Neural Information Processing Systems, 2022. 2

  28. [28]

    Adversarial feature augmentation for cross-domain few-shot classification

    Yan Hu and Andy Jinhua Ma. Adversarial feature augmentation for cross-domain few-shot classification. In Proceedings of the European Conference on Computer Vision, pages 20–37, 2022. 2, 8, 17

  29. [29]

    Masked distillation with receptive tokens

    Tao Huang, Yuan Zhang, Shan You, Fei Wang, Chen Qian, Jian Cao, and Chang Xu. Masked distillation with receptive tokens. In International Conference on Learning Representations, 2023. 4

  30. [30]

    Relational embedding for few-shot classification

    Dahyun Kang, Heeseung Kwon, Juhong Min, and Minsu Cho. Relational embedding for few-shot classification. In Proceedings of the IEEE International Conference on Computer Vision, pages 8802–8813,

  31. [31]

    Accelerating convergence in bayesian few-shot classification

    Tianjun Ke, Haoqun Cao, and Feng Zhou. Accelerating convergence in bayesian few-shot classification. In International Conference on Machine Learning, 2024. 2

  32. [32]

    A hierarchical bayesian model for few-shot meta learning

    Minyoung Kim and Timothy M. Hospedales. A hierarchical bayesian model for few-shot meta learning. In International Conference on Learning Representations, 2024. 3, 16

  33. [33]

    3d object representations for fine-grained categorization

    Jonathan Krause, Michael Stark, Jia Deng, and Fei-Fei Li. 3d object representations for fine-grained categorization. In Proceedings of the IEEE International Conference on Computer Vision Workshops, pages 554–561, 2013. 6

  34. [34]

    Meta-learning with differentiable convex optimization

    Kwonjoon Lee, Subhransu Maji, Avinash Ravichandran, and Stefano Soatto. Meta-learning with differentiable convex optimization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 10657–10665, 2019. 6

  35. [35]

    Task-oriented channel attention for fine-grained few-shot classification

    Subeen Lee, WonJun Moon, Hyun Seok Seong, and Jae-Pil Heo. Task-oriented channel attention for fine-grained few-shot classification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 47: 1448–1463, 2025. 2, 7

  36. [36]

    Blip: Bootstrapping language-image pre-training for unified vision-language understanding and generation

    Junnan Li, Dongxu Li, Caiming Xiong, and Steven C. H. Hoi. Blip: Bootstrapping language-image pre-training for unified vision-language understanding and generation. In International Conference on Machine Learning, pages 12888–12900, 2022. 2

  37. [37]

    Revisiting local descriptor based image-to-class measure for few-shot learning

    Wenbin Li, Lei Wang, Jinglin Xu, Jing Huo, Yang Gao, and Jiebo Luo. Revisiting local descriptor based image-to-class measure for few-shot learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7253–7260, 2019. 7

  38. [38]

    Fewvs: A vision-semantics integration framework for few-shot image classification

    Zhuoling Li, Yong Wang, and Kaitong Li. Fewvs: A vision-semantics integration framework for few-shot image classification. In Proceedings of the ACM International Conference on Multimedia, pages 1341–1350, 2024. 3, 7, 16, 18

  39. [39]

    Supervised masked knowledge distillation for few-shot transformers

    Hanxi Lin, Guangxing Han, Jiawei Ma, Shiyuan Huang, Xudong Lin, and Shih-Fu Chang. Supervised masked knowledge distillation for few-shot transformers. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 19649–19659, 2023. 4, 16

  40. [40]

    Learning to affiliate: Mutual centralized learning for few-shot classification

    Yang Liu, Weifeng Zhang, Chao Xiang, Tu Zheng, and Deng Cai. Learning to affiliate: Mutual centralized learning for few-shot classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 14391–14400, 2022. 7

  41. [41]

    Swin transformer: Hierarchical vision transformer using shifted windows

    Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE International Conference on Computer Vision, pages 9992–10002, 2021. 2

  42. [42]

    Vilbert: Pretraining task-agnostic visiolinguistic representations for vision-and-language tasks

    Jiasen Lu, Dhruv Batra, Devi Parikh, and Stefan Lee. Vilbert: Pretraining task-agnostic visiolinguistic representations for vision-and-language tasks. In Neural Information Processing Systems, 2019. 2

  43. [43]

    Self-supervision can be a good few-shot learner

    Yuning Lu, Liangjiang Wen, Jianzhuang Liu, Yajing Liu, and Xinmei Tian. Self-supervision can be a good few-shot learner. In Proceedings of the European Conference on Computer Vision, pages 740–758, 2022. 4, 16

  44. [44]

    Partner-assisted learning for few-shot image classification

    Jiawei Ma, Hanchen Xie, Guangxing Han, Shih-Fu Chang, A. G. Galstyan, and Wael AbdAlmageed. Partner-assisted learning for few-shot image classification. In Proceedings of the IEEE International Conference on Computer Vision, pages 10553–10562, 2021. 16

  45. [45]

    Cross-layer and cross-sample feature optimization network for few-shot fine-grained image classification

    Zhen-Xiang Ma, Zhen-Duo Chen, Li-Jun Zhao, Ziya Zhang, Xin Luo, and Xin-Shun Xu. Cross-layer and cross-sample feature optimization network for few-shot fine-grained image classification. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 4136–4144, 2024. 2, 7, 18

  46. [46]

    Bi-directional task-guided network for few-shot fine-grained image classification

    Zhen-Xiang Ma, Zhen-Duo Chen, Li-Jun Zhao, Ziya Zhang, Tai Zheng, Xin Luo, and Xin-Shun Xu. Bi-directional task-guided network for few-shot fine-grained image classification. In Proceedings of the ACM International Conference on Multimedia, pages 8277–8286, 2024. 2, 7

  47. [47]

    Fine-grained visual classification of aircraft

    Subhransu Maji, Esa Rahtu, Juho Kannala, Matthew B. Blaschko, and Andrea Vedaldi. Fine-grained visual classification of aircraft. arXiv preprint, 2013. 6

  48. [48]

    Why do primates have view cells instead of place cells?

    Julio Martinez-Trujillo. Why do primates have view cells instead of place cells? Trends in Cognitive Sciences, 29:226–229, 2025. 2

  49. [49]

    Using deep learning for image-based plant disease detection

    Sharada P Mohanty, David P Hughes, and Marcel Salathe. Using deep learning for image-based plant disease detection. Frontiers in Plant Science, 7, 2016. 6

  50. [50]

    Meta learning to bridge vision and language models for multimodal few-shot learning

    Ivona Najdenkoska, Xiantong Zhen, and Marcel Worring. Meta learning to bridge vision and language models for multimodal few-shot learning. In International Conference on Learning Representations, 2023. 2, 3

  51. [51]

    A survey on stability of learning with limited labelled data and its sensitivity to the effects of randomness

    Branislav Pecher, Ivan Srba, and Mária Bieliková. A survey on stability of learning with limited labelled data and its sensitivity to the effects of randomness. ACM Computing Surveys, 57:1–40, 2025. 2

  52. [52]

    Beclr: Batch enhanced contrastive few-shot learning

    Stylianos Poulakakis-Daktylidis and Hadi Jamali Rad. Beclr: Batch enhanced contrastive few-shot learning. In International Conference on Learning Representations, 2024. 4, 7, 16, 18

  53. [53]

    Improving language understanding by generative pre-training

    Alec Radford and Karthik Narasimhan. Improving language understanding by generative pre-training

  54. [54]

    Self-supervised knowledge distillation for few-shot learning

    Jathushan Rajasegaran, Salman Hameed Khan, Munawar Hayat, Fahad Shahbaz Khan, and Mubarak Shah. Self-supervised knowledge distillation for few-shot learning. In British Machine Vision Conference, 2021. 16

  55. [55]

    Meta-learning for semi-supervised few-shot classification

    Mengye Ren, Eleni Triantafillou, Sachin Ravi, Jake Snell, Kevin Swersky, Joshua B. Tenenbaum, H. Larochelle, and Richard S. Zemel. Meta-learning for semi-supervised few-shot classification. In International Conference on Learning Representations, 2018. 6, 17

  56. [56]

    Few-shot learning with graph neural networks

    Victor Garcia Satorras and Joan Bruna. Few-shot learning with graph neural networks. In International Conference on Learning Representations, 2018. 8, 17

  57. [57]

    Adaptive subspaces for few-shot learning

    Christian Simon, Piotr Koniusz, Richard Nock, and Mehrtash Harandi. Adaptive subspaces for few-shot learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4135–4144, 2020. 7

  58. [58]

    Prototypical networks for few-shot learning

    Jake Snell, Kevin Swersky, and Richard Zemel. Prototypical networks for few-shot learning. In Neural Information Processing Systems, 2017. 3, 14, 18

  59. [59]

    A comprehensive survey of few-shot learning: Evolution, applications, challenges, and opportunities

    Yisheng Song, Ting-Yuan Wang, Puyu Cai, Subrota Kumar Mondal, and Jyoti Prakash Sahoo. A comprehensive survey of few-shot learning: Evolution, applications, challenges, and opportunities. ACM Computing Surveys, 55:1–40, 2023. 2

  60. [60]

    Explanation-guided training for cross-domain few-shot classification

    Jiamei Sun, Sebastian Lapuschkin, Wojciech Samek, Yunqing Zhao, Ngai-Man Cheung, and Alexander Binder. Explanation-guided training for cross-domain few-shot classification. In Proceedings of the International Conference on Pattern Recognition, pages 7609–7616, 2020. 2, 8, 17

  61. [61]

    Meta-adam: An meta-learned adaptive optimizer with momentum for few-shot learning

    Siyuan Sun and Hongyang Gao. Meta-adam: An meta-learned adaptive optimizer with momentum for few-shot learning. In Neural Information Processing Systems, 2023. 2, 3, 16

  62. [62]

    Meta-learning with self-improving momentum target

    Jihoon Tack, Jongjin Park, Hankook Lee, Jaeho Lee, and Jinwoo Shin. Meta-learning with self-improving momentum target. In Neural Information Processing Systems, 2022. 3

  63. [63]

    The role of color in high-level vision

    James Tanaka, Daniel Weiskopf, and Pepper Williams. The role of color in high-level vision. Trends in Cognitive Sciences, 5:211–215, 2001. 2, 14

  64. [64]

    Amu-tuning: Effective logit bias for clip-based few-shot learning

    Yuwei Tang, Zhenyi Lin, Qilong Wang, Pengfei Zhu, and Qinghua Hu. Amu-tuning: Effective logit bias for clip-based few-shot learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 23323–23333, 2024. 2, 3

  65. [65]

    Mokd: Cross-domain finetuning for few-shot classification via maximizing optimized kernel dependence

    Hongduan Tian, Feng Liu, Tongliang Liu, Bo Du, Yiu-ming Cheung, and Bo Han. Mokd: Cross-domain finetuning for few-shot classification via maximizing optimized kernel dependence. In International Conference on Machine Learning, 2024. 2

  66. [66]

    Rethinking few-shot image classification: a good embedding is all you need?

    Yonglong Tian, Yue Wang, Dilip Krishnan, Joshua B. Tenenbaum, and Phillip Isola. Rethinking few-shot image classification: a good embedding is all you need? In Proceedings of the European Conference on Computer Vision, pages 266–282, 2021. 16

  67. [67]

    Cross-domain few-shot classification via learned feature-wise transformation

    Hung-Yu Tseng, Hsin-Ying Lee, Jia-Bin Huang, and Ming-Hsuan Yang. Cross-domain few-shot classification via learned feature-wise transformation. In International Conference on Learning Representations,

  68. [68]

    Matching networks for one shot learning

    Oriol Vinyals, Charles Blundell, Timothy P. Lillicrap, Koray Kavukcuoglu, and Daan Wierstra. Matching networks for one shot learning. In Neural Information Processing Systems, 2016. 6, 17

  69. [69]

    Focus your attention when few-shot classification

    Haoqing Wang, Shibo Jie, and Zhi-Hong Deng. Focus your attention when few-shot classification. In Neural Information Processing Systems, 2023. 2

  70. [70]

    How to trust unlabeled data? instance credibility inference for few-shot learning

    Yikai Wang, Li Zhang, Yuan Yao, and Yanwei Fu. How to trust unlabeled data? instance credibility inference for few-shot learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44:6240–6253, 2022. 7

  71. [71]

    Negatives make a positive: An embarrassingly simple approach to semi-supervised few-shot learning

    Xiu-Shen Wei, Hesheng Xu, Zhiwen Yang, Chenlong Duan, and Yuxin Peng. Negatives make a positive: An embarrassingly simple approach to semi-supervised few-shot learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46:2091–2103, 2024. 4, 7, 16

  72. [72]

    Caltech-UCSD birds 200

    Peter Welinder, Steve Branson, Takeshi Mita, Catherine Wah, Florian Schroff, Serge Belongie, and Pietro Perona. Caltech-UCSD birds 200. 2010. 6

  73. [73]

    Few-shot learning with localization in realistic settings

    Davis Wertheimer and Bharath Hariharan. Few-shot learning with localization in realistic settings. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 6558–6567, 2019. 6, 17

  74. [74]

    Fine-grained few-shot classification with feature map reconstruction networks

    Davis Wertheimer, Luming Tang, and Bharath Hariharan. Fine-grained few-shot classification with feature map reconstruction networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 8012–8021, 2021. 2, 7, 18

  75. [75]

    Bi-directional ensemble feature reconstruction network for few-shot fine-grained classification

    Jijie Wu, Dongliang Chang, Aneeshan Sain, Xiaoxu Li, Zhanyu Ma, Jie Cao, Jun Guo, and Yi-Zhe Song. Bi-directional ensemble feature reconstruction network for few-shot fine-grained classification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46:6082–6096, 2024. 2, 7

  76. [76]

    Variational feature disentangling for fine-grained few-shot classification

    Jingyi Xu, Hieu M. Le, Mingzhen Huang, ShahRukh Athar, and Dimitris Samaras. Variational feature disentangling for fine-grained few-shot classification. In Proceedings of the IEEE International Conference on Computer Vision, pages 8792–8801, 2021. 7

  77. [77]

    Channel-spatial support-query cross-attention for fine-grained few-shot image classification

    Shicheng Yang, Xiaoxu Li, Dongliang Chang, Zhanyu Ma, and Jing-Hao Xue. Channel-spatial support-query cross-attention for fine-grained few-shot image classification. In Proceedings of the ACM International Conference on Multimedia, pages 9175–9183, 2024. 2, 7

  78. [78]

    One meta-tuned transformer is what you need for few-shot learning

    Xuehan Yang, Huaxiu Yao, and Ying Wei. One meta-tuned transformer is what you need for few-shot learning. In International Conference on Machine Learning, 2024. 4, 7, 16, 18

  79. [79]

    Xlnet: Generalized autoregressive pretraining for language understanding

    Zhilin Yang, Zihang Dai, Yiming Yang, Jaime G. Carbonell, Ruslan Salakhutdinov, and Quoc V. Le. Xlnet: Generalized autoregressive pretraining for language understanding. In Neural Information Processing Systems, 2019. 2

  80. [80]

    Few-shot learning with a strong teacher

    Han-Jia Ye, Lu Ming, De-Chuan Zhan, and Wei-Lun Chao. Few-shot learning with a strong teacher. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46:1425–1440, 2024. 4

Showing first 80 references.