arxiv: 2402.03038 · v2 · submitted 2024-02-05 · 💻 cs.LG · cs.AI · cs.CL

Automatic Combination of Sample Selection Strategies for Few-Shot Learning

Pith reviewed 2026-05-24 03:24 UTC · model grok-4.3

classification 💻 cs.LG cs.AI cs.CL
keywords few-shot learning, sample selection, in-context learning, meta-learning, few-shot fine-tuning, automatic combination

The pith

The ACSESS method automatically combines 23 sample selection strategies to improve few-shot learning across models and datasets.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper proposes ACSESS as a way to automatically combine various sample selection strategies rather than relying on one. It evaluates the approach using 23 strategies on 5 in-context learning models and 3 few-shot learning methods across 6 text and 8 image datasets. The combination consistently beats individual strategies and matches or surpasses specialized baselines. Benefits are largest when selecting very few shots and hold even for smaller datasets.

Core claim

The combination of strategies through the ACSESS method consistently outperforms all individual selection strategies and performs on par with or exceeds the in-context-learning-specific baselines.

What carries the argument

ACSESS, a method for automatic combination of sample selection strategies that leverages their complementarity.
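
As a rough illustration of what combining selection strategies can look like, the sketch below min-max normalises per-strategy sample scores, mixes them with either uniform or caller-supplied weights, and keeps the top-k samples. It is a toy under stated assumptions, not the ACSESS algorithm: the function names (`min_max`, `combine_scores`, `select_top_k`), the strategy labels, the scores, and the normalisation step are all hypothetical and do not come from the paper.

```python
from typing import Dict, List, Optional


def min_max(scores: List[float]) -> List[float]:
    """Rescale scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def combine_scores(
    strategy_scores: Dict[str, List[float]],
    weights: Optional[Dict[str, float]] = None,
) -> List[float]:
    """Weighted sum of normalised per-strategy scores (uniform if weights is None)."""
    names = sorted(strategy_scores)
    if weights is None:
        weights = {name: 1.0 / len(names) for name in names}
    n = len(strategy_scores[names[0]])
    combined = [0.0] * n
    for name in names:
        normed = min_max(strategy_scores[name])
        for i, s in enumerate(normed):
            combined[i] += weights[name] * s
    return combined


def select_top_k(combined: List[float], k: int) -> List[int]:
    """Indices of the k highest combined scores (ties broken by lower index)."""
    order = sorted(range(len(combined)), key=lambda i: (-combined[i], i))
    return order[:k]


# Hypothetical scores for four candidate samples under two made-up strategies.
scores = {
    "similarity": [0.9, 0.1, 0.5, 0.3],
    "diversity": [0.2, 0.8, 0.6, 0.1],
}
uniform_pick = select_top_k(combine_scores(scores), k=2)
weighted_pick = select_top_k(
    combine_scores(scores, {"similarity": 0.9, "diversity": 0.1}), k=2
)
```

With these made-up numbers the uniform mix ranks sample 2 first, while the similarity-heavy weighting ranks sample 0 first, which shows how the choice of weights can change the selected shots.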

If this is right

  • Sample selection remains effective even on smaller datasets.
  • Greatest benefits occur when only a few shots are selected.
  • The advantage diminishes as the number of shots increases.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Applying automatic combination could reduce reliance on hand-crafted strategies for new few-shot tasks.
  • The method might extend to other selection problems in machine learning where multiple heuristics exist.

Load-bearing premise

The 23 sample selection strategies possess complementary strengths whose automatic combination generalizes across models, datasets, and few-shot paradigms without dataset-specific overfitting or extensive per-task tuning.

What would settle it

An experiment on a new dataset or model where the ACSESS combination fails to outperform the best single strategy or the ICL baselines would disprove the main result.

Figures

Figures reproduced from arXiv: 2402.03038 by Branislav Pecher, Ivan Srba, Joaquin Vanschoren, Maria Bielikova.

Figure 1: Benefit of the different selection strategies, calculated as the difference in accuracy between the specific strategy and the classic few-shot selection, aggregated over the image and text datasets (boxplots show the distribution of results across the datasets). The performance of the classic selection is represented as the red dashed line (zero value). The consistently beneficial selection strategies depe…
Figure 2: Comparison of the LENS method (Li & Qiu, 2023) to our proposed automatic combination of selection strategies (ACSESS). … Even though the uniform combination is computationally less expensive than the weighted combination, its performance increase is only slightly lower (average difference of 0.10 − 0.25). As such, the uniform weighting represents a good trade-o…
Figure 3: Effect of the number of shots on the benefit of selection strategies, comparing ACSESS and random selection, aggregated over datasets. The benefit is calculated as a difference to the classic selection at 5-shots. The benefit of sample selection is more significant at lower number of shots. At larger number of shots, the benefit and the performance boost from further increase of shots becomes negligible …
Figure 4: Standard deviation introduced by multiple runs of the different selection strategies. The results for Prototypical Networks, MAML and Few-Shot Fine-Tuning are aggregated over both the image and text datasets, while the results for Mistral and Zephyr are only from text datasets. The ACSESS method shows lower sensitivity to repeated runs compared to the majority of the strategies included in the combination. …
Figure 5: Distribution of the selection strategies benefit (calculated as difference to the classic few-shot selection) for the 5-shot and 10-shot setting. The benefit of the different selection strategies is more significant at lower number of shots.
Figure 6: Benefit of the different selection strategies calculated as the difference to the classic few-shot selection strategy for the LR AM.DOG dataset. The performance of the classic selection is represented as the zero value.
Figure 7: Benefit of the different selection strategies calculated as the difference to the classic few-shot selection strategy for the LR AM.AWA dataset. The performance of the classic selection is represented as the zero value.
Figure 8: Benefit of the different selection strategies calculated as the difference to the classic few-shot selection strategy for the VCL.APL dataset. The performance of the classic selection is represented as the zero value.
Figure 9: Benefit of the different selection strategies calculated as the difference to the classic few-shot selection strategy for the HUM ACT.ACT 410 dataset. The performance of the classic selection is represented as the zero value.
Figure 10: Benefit of the different selection strategies calculated as the difference to the classic few-shot selection strategy for the MNF.TEX DTD dataset. The performance of the classic selection is represented as the zero value.
Figure 11: Benefit of the different selection strategies calculated as the difference to the classic few-shot selection strategy for the MCR.PRT dataset. The performance of the classic selection is represented as the zero value.
Figure 12: Benefit of the different selection strategies calculated as the difference to the classic few-shot selection strategy for the MCR.PNU dataset. The performance of the classic selection is represented as the zero value.
Figure 13: Benefit of the different selection strategies calculated as the difference to the classic few-shot selection strategy for the PLT.FLW dataset. The performance of the classic selection is represented as the zero value.
Figure 14: Benefit of the different selection strategies calculated as the difference to the classic few-shot selection strategy for the 20 News Group dataset. The performance of the classic selection is represented as the zero value.
Figure 15: Benefit of the different selection strategies calculated as the difference to the classic few-shot selection strategy for the News Category dataset. The performance of the classic selection is represented as the zero value.
Figure 16: Benefit of the different selection strategies calculated as the difference to the classic few-shot selection strategy for the ATIS dataset. The performance of the classic selection is represented as the zero value.
Figure 17: Benefit of the different selection strategies calculated as the difference to the classic few-shot selection strategy for the Facebook dataset. The performance of the classic selection is represented as the zero value.
Figure 18: Benefit of the different selection strategies calculated as the difference to the classic few-shot selection strategy for the HWU-64 dataset. The performance of the classic selection is represented as the zero value.
Figure 19: Benefit of the different selection strategies calculated as the difference to the classic few-shot selection strategy for the SNIPS dataset. The performance of the classic selection is represented as the zero value.
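
The captions above define a strategy's benefit as its accuracy minus the accuracy of the classic few-shot selection, with the classic selection sitting at zero. A minimal sketch of that bookkeeping follows; the accuracy numbers, the strategy and dataset labels, and the helper name `benefit_per_dataset` are invented for illustration and are not results from the paper.

```python
from statistics import mean, median
from typing import Dict


def benefit_per_dataset(
    accuracy: Dict[str, Dict[str, float]], baseline: str = "classic"
) -> Dict[str, Dict[str, float]]:
    """Per-dataset accuracy difference of every strategy to the baseline strategy."""
    base = accuracy[baseline]
    return {
        strategy: {ds: acc - base[ds] for ds, acc in per_ds.items()}
        for strategy, per_ds in accuracy.items()
        if strategy != baseline
    }


# Made-up accuracies in percent; "classic" is the zero line in the plots.
accuracy = {
    "classic": {"dataset_a": 70.0, "dataset_b": 80.0, "dataset_c": 60.0},
    "strategy_x": {"dataset_a": 72.5, "dataset_b": 79.0, "dataset_c": 63.0},
}

benefits = benefit_per_dataset(accuracy)
x = list(benefits["strategy_x"].values())

# Aggregates of the kind a boxplot across datasets would summarise.
mean_benefit = mean(x)
median_benefit = median(x)
datasets_improved = sum(1 for b in x if b > 0)
```

A negative entry (here `dataset_b`) corresponds to a box or whisker dipping below the zero line, meaning the strategy did worse than classic selection on that dataset.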
Original abstract

In few-shot learning, the selection of samples has a significant impact on the performance of the model. While effective sample selection strategies are well-established in supervised settings, research on large language models largely overlooks them, favouring strategies specifically tailored to individual in-context learning settings. In this paper, we propose a new method for Automatic Combination of SamplE Selection Strategies (ACSESS) to leverage the strengths and complementarity of various well-established selection objectives. We investigate and compare the impact of 23 sample selection strategies on the performance of 5 in-context learning models and 3 few-shot learning approaches (meta-learning, few-shot fine-tuning) over 6 text and 8 image datasets. The experimental results show that the combination of strategies through the ACSESS method consistently outperforms all individual selection strategies and performs on par or exceeds the in-context learning specific baselines. Lastly, we demonstrate that sample selection remains effective even on smaller datasets, yielding the greatest benefits when only a few shots are selected, while its advantage diminishes as the number of shots increases.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes ACSESS, a method for automatically combining 23 established sample selection strategies to improve few-shot learning. It evaluates the approach on 5 in-context learning models and 3 few-shot paradigms (meta-learning, few-shot fine-tuning) across 6 text and 8 image datasets, claiming that the combination consistently outperforms all individual strategies and matches or exceeds ICL-specific baselines. Additional results indicate that selection benefits are largest at low shot counts and remain effective on smaller datasets.

Significance. If the experimental claims hold after detailed verification, the work would be significant because it demonstrates that automatic combination of general-purpose selection objectives can leverage complementarity to match or surpass specialized ICL methods across modalities and paradigms. This bridges supervised learning literature with LLM few-shot settings and suggests reduced need for per-task ICL tuning.

major comments (2)
  1. [Experimental results] The central experimental claim (abstract and results section) of consistent outperformance rests on comparisons across 23 strategies, 5 models, 3 paradigms, and 14 datasets, yet the provided description contains no mention of statistical significance tests, variance across random seeds, or error bars; this reporting gap is load-bearing for the 'consistently outperforms' assertion.
  2. [Methods] The description of the ACSESS combination mechanism (methods section) is not detailed enough to assess whether the automatic weighting or selection process introduces hidden per-dataset tuning or risks of overfitting to the 14 evaluation sets, which directly affects the generalizability claim.
minor comments (1)
  1. [Methods] Clarify the exact definition and implementation of the 23 strategies and the 3 few-shot approaches to allow replication.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback highlighting gaps in experimental reporting and methods clarity. We will revise the manuscript to strengthen both areas while preserving the core contributions.

  1. Referee: [Experimental results] The central experimental claim (abstract and results section) of consistent outperformance rests on comparisons across 23 strategies, 5 models, 3 paradigms, and 14 datasets, yet the provided description contains no mention of statistical significance tests, variance across random seeds, or error bars; this reporting gap is load-bearing for the 'consistently outperforms' assertion.

    Authors: We agree that the lack of statistical tests, seed variance, and error bars limits the robustness of the 'consistently outperforms' claim. In the revision we will recompute all main results over at least five random seeds, report mean ± standard deviation, add error bars to figures, and include paired statistical significance tests (Wilcoxon signed-rank) between ACSESS and each baseline. These additions will appear in the results section, tables, and appendix. revision: yes

  2. Referee: [Methods] The description of the ACSESS combination mechanism (methods section) is not detailed enough to assess whether the automatic weighting or selection process introduces hidden per-dataset tuning or risks of overfitting to the 14 evaluation sets, which directly affects the generalizability claim.

    Authors: The combination weights in ACSESS are obtained via a single meta-optimization run whose objective and hyperparameters are identical for every dataset; no per-dataset search or validation-set tuning occurs. To make this explicit we will expand the methods section with the full algorithmic pseudocode, the precise objective function, and a statement that all hyperparameters remain fixed across the 14 datasets. We will also add a short paragraph discussing the risk of overfitting to the chosen evaluation collection and note that future work should test on additional held-out domains. revision: yes
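
The first response above promises mean ± standard deviation over seeds and paired Wilcoxon signed-rank tests. The sketch below shows one way such reporting could be computed with the standard library only. It implements the exact two-sided test by enumerating sign assignments, which is practical only for small numbers of pairs. The accuracy values and the names `mean_std` and `wilcoxon_signed_rank` are hypothetical and are not taken from the paper or its revision.

```python
from itertools import product
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def mean_std(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation across repeated runs."""
    return mean(values), stdev(values)


def wilcoxon_signed_rank(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Exact two-sided Wilcoxon signed-rank test for a small paired sample.

    Zero differences are dropped and tied absolute differences share their
    average rank. Returns (W, p) with W = min(W+, W-).
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks: List[float] = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)
    # Null distribution: every rank is positive or negative with probability 1/2.
    total = sum(ranks)
    hits = 0
    for signs in product((0, 1), repeat=n):
        wp = sum(r for r, s in zip(ranks, signs) if s)
        if min(wp, total - wp) <= w:
            hits += 1
    return w, hits / 2 ** n


# Made-up paired accuracies (for example one pair per seed or per dataset).
acsess_acc = [82, 85, 79, 88, 84]
baseline_acc = [80, 82, 80, 83, 80]

m, s = mean_std(acsess_acc)
w, p = wilcoxon_signed_rank(acsess_acc, baseline_acc)
```

One design point worth noting: with only five pairs, the smallest two-sided p-value the exact test can produce is 2/32 = 0.0625, so five seeds alone cannot reach the conventional 0.05 level. Pairing over more seeds or over seed-dataset combinations would be needed.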

Circularity Check

0 steps flagged

No significant circularity detected

The paper presents an empirical method (ACSESS) for combining 23 sample selection strategies and evaluates it across 5 ICL models, 3 few-shot paradigms, 14 datasets (6 text, 8 image), and varying shot counts. No derivation chain, equations, or self-citations are invoked to justify core claims; performance results are grounded in direct experimental comparisons on independent benchmarks rather than reducing to fitted parameters or prior self-referential results by construction. The central claim (combination outperforms individuals) is externally falsifiable via the reported multi-dataset, multi-model evaluation and does not rely on self-definitional loops or uniqueness theorems from the authors' prior work.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based solely on the abstract, no free parameters, axioms, or invented entities are specified. The method presumably involves some weighting or selection mechanism for combination, but details are absent.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Medical Incident Causal Factors and Preventive Measures Generation Using Tag-based Example Selection in Few-shot Learning

    cs.CL 2026-05 unverdicted novelty 4.0

    Tag-based few-shot selection yields higher precision and stability than random or similarity-based methods when using LLMs to analyze medical incidents.

Reference graph

Works this paper leans on

19 extracted references · 19 canonical work pages · cited by 1 Pith paper · 3 internal anchors

  1. [1]

    findings-acl.564

    URL https://aclanthology.org/2023.findings-acl.564. Aimen, A., Ladrecha, B., Sidheekh, S., and Krishnan, N. C. Leveraging task variability in meta-learning. SN Computer Science, 4(5):539, 2023. An, S., Zhou, B., Lin, Z., Fu, Q., Chen, B., Zheng, N., Chen, W., and Lou, J.-G. Skill-based few-shot selection for in-context learning. In Bouamor, H., Pino,...

  2. [2]

    doi: 10.18653/v1/2023.emnlp-main.831

    Association for Computational Linguistics. doi: 10.18653/v1/2023.emnlp-main.831. URL https://aclanthology.org/2023.emnlp-main.831. Chang, T.-Y. and Jia, R. Data curation alone can stabilize in-context learning. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 8123–8144, Toronto,...

  3. [3]

    Snips Voice Platform: an embedded Spoken Language Understanding system for private-by-design voice interfaces

    URL https://openreview.net/forum?id=HJg2b0VYDr. Coucke, A., Saade, A., Ball, A., Bluche, T., Caulier, A., Leroy, D., Doumouro, C., Gisselbrecht, T., Caltagirone, F., Lavril, T., et al. Snips voice platform: an embedded spoken language understanding system for private-by-design voice interfaces. arXiv preprint arXiv:1805.10190, 2018. Deng, J., Dong, W., ...

  4. [4]

    nlp4convai-1.15

    URL https://aclanthology.org/2022.nlp4convai-1.15. He, K., Zhang, X., Ren, S., and Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778, 2016. Hemphill, C. T., Godfrey, J. J., and Doddington, G. R. The ATIS spoken language systems pilot corpus. In Speech and ...

  5. [5]

    URL https://proceedings.mlr.press/v132/iyer21a.html. Iyer, R. K. and Bilmes, J. A. Submodular optimization with submodular cover and submodular knapsack constraints. In Burges, C., Bottou, L., Welling, M., Ghahramani, Z., and Weinberger, K. (eds.), Advances in Neural Information Processing Systems, volume 26. Curran Associates, Inc., ...

  6. [6]

    Mistral 7B

    URL https://proceedings.neurips.cc/paper_files/paper/2013/file/a1d50185e7426cbb0acad1e6ca74b9aa-Paper.pdf. Jiang, A. Q., Sablayrolles, A., Mensch, A., Bamford, C., Chaplot, D. S., Casas, D. d. l., Bressand, F., Lengyel, G., Lample, G., Saulnier, L., et al. Mistral 7b. arXiv preprint arXiv:2310.06825, 2023. Jundi, I. and Lapesa, G. How to translate your...

  7. [7]

    findings-emnlp.411

    URL https://aclanthology.org/2023.findings-emnlp.411. Li, X., Lv, K., Yan, H., Lin, T., Zhu, W., Ni, Y., Xie, G., Wang, X., and Qiu, X. Unified demonstration retriever for in-context learning. In Rogers, A., Boyd-Graber, J., and Okazaki, N. (eds.), Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Lon...

  8. [8]

    doi: 10.18653/v1/2021.emnlp-main.607

    Association for Computational Linguistics. doi: 10.18653/v1/2021.emnlp-main.607. URL https://aclanthology.org/2021.emnlp-main.607. Paul, M., Ganguli, S., and Dziugaite, G. K. Deep learning on a data diet: Finding important examples early in training. In Ranzato, M., Beygelzimer, A., Dauphin, Y., Liang, P., and Vaughan, J. W. (eds.), Advances in Neural I...

  9. [9]

    Gupta, Xiaojiang Chen, and Xin Wang

    URL https://proceedings.neurips.cc/paper_files/paper/2021/file/ac56f8fe9eea3e4a365f29f0f1957c55-Paper.pdf. Pecher, B., Srba, I., and Bielikova, M. On the effects of randomness on stability of learning with limited labelled data: A systematic literature review. arXiv preprint arXiv:2312.01082, 2023. Qin, C., Zhang, A., Dagar, A., and Ye, W. In-context ...

  10. [10]

    naacl-main.191

    URL https://aclanthology.org/2022.naacl-main.191. Scarlatos, A. and Lan, A. Reticl: Sequential retrieval of in-context examples with reinforcement learning. arXiv preprint arXiv:2305.14502, 2023. Schröder, C., Niekler, A., and Potthast, M. Revisiting uncertainty-based query strategies for active learning with transformers. In Muresan, S., Nakov, P., and...

  11. [11]

    findings-acl.172

    URL https://aclanthology.org/2022.findings-acl.172. Schuster, S., Gupta, S., Shah, R., and Lewis, M. Cross-lingual transfer learning for multilingual task oriented dialog. In Burstein, J., Doran, C., and Solorio, T. (eds.), Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language T...

  12. [12]

    emnlp-main.608

    URL https://aclanthology.org/2021.emnlp-main.608. Shum, K., Diao, S., and Zhang, T. Automatic prompt augmentation and selection with chain-of-thought from labeled data. In Bouamor, H., Pino, J., and Bali, K. (eds.), Findings of the Association for Computational Linguistics: EMNLP 2023, pp. 12113–12139, Singapore, December 2023. Association for Computa...

  13. [13]

    findings-emnlp.811

    URL https://aclanthology.org/2023.findings-emnlp.811. Snell, J., Swersky, K., and Zemel, R. Prototypical networks for few-shot learning. Advances in neural information processing systems, 30, 2017. Song, Y., Wang, T., Cai, P., Mondal, S. K., and Sahoo, J. P. A comprehensive survey of few-shot learning: Evolution, applications, challenges, and opportunit...

  14. [14]

    Zephyr: Direct Distillation of LM Alignment

    Association for Computational Linguistics. doi: 10.18653/v1/2020.emnlp-main.746. URL https://aclanthology.org/2020.emnlp-main.746. Tibshirani, R. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society Series B: Statistical Methodology, 58(1):267–288, 1996. Toneva, M., Sordoni, A., des Combes, R. T., Trischler, A., Beng...

  15. [15]

    acl-long.859

    URL https://aclanthology.org/2023.acl-long.859. Wang, J., Song, Z., Su, X., Si, L., Dong, H., Qiang, W., and Zheng, C. Learning to sample tasks for meta learning, 2023. Wang, S., Xu, Y., Fang, Y., Liu, Y., Sun, S., Xu, R., Zhu, C., and Zeng, M. Training data is more valuable than you think: A simple and effective method by retrieving from training d...

  16. [16]

    emnlp-main.308/

    URL https://aclanthology.org/2023.findings-acl.273. Yu, R., Liu, S., and Wang, X. Dataset distillation: A comprehensive review. arXiv preprint arXiv:2301.07014, 2023. Zemlyanskiy, Y., de Jong, M., Ainslie, J., Pasupat, P., Shaw, P., Qiu, L., Sanghai, S., and Sha, F. Generate-and-retrieve: Use your predictions to improve retrieval for semantic parsing...

  17. [17]

    The number of such forgetting events is used as the sample score and samples that are forgotten the most often (or least often) are selected

    simply calculates how often the specific sample is incorrectly classified after being classified correctly in the previous epoch. The number of such forgetting events is used as the sample score and samples that are forgotten the most often (or least often) are selected. In our evaluation of single-property strategies, we consider only the setting of most...

  18. [18]

    [Input] [Output] Intent classification Determine intent of the sentence using following options: 1) [Option 1] 2) [Option 2] 3) [Option 3]

    [Option 4] 5) [Option 5]. [Input] [Output] Intent classification Determine intent of the sentence using following options: 1) [Option 1] 2) [Option 2] 3) [Option 3]

  19. [19]

    [Option 4] 5) [Option 5]. [Input] [Output] Dataset Verbaliser 20 News Group {IBM, Middle East Politics, Windows XP, Motorcycles, Medicine, For Sale, Religion, MS Windows, Baseball, Auto, Hockey, Mac, Graphics, Christianity, Guns, Electronics, Space, Crypto, Atheism, Politics} News Category {Politics, World News, Parenting, Money, Wellness, Business, Weddi...
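
Entry 17 above describes a forgetting-based score: count how often a sample is misclassified right after having been classified correctly in the previous epoch, then select the samples with the most (or fewest) such events. A minimal sketch of that counting, assuming a per-epoch record of correctness is available, could look like the following. The data layout, the toy history, and the names `forgetting_counts` and `most_forgotten` are hypothetical; the choice to pick the most-forgotten samples follows the truncated sentence in that entry only as far as it can be read.

```python
from typing import List, Sequence


def forgetting_counts(correct_by_epoch: Sequence[Sequence[bool]]) -> List[int]:
    """Count forgetting events per sample.

    correct_by_epoch[e][i] is True when sample i was classified correctly in
    epoch e. A forgetting event is a correct -> incorrect transition between
    two consecutive epochs.
    """
    if not correct_by_epoch:
        return []
    n = len(correct_by_epoch[0])
    counts = [0] * n
    for prev, curr in zip(correct_by_epoch, correct_by_epoch[1:]):
        for i in range(n):
            if prev[i] and not curr[i]:
                counts[i] += 1
    return counts


def most_forgotten(counts: Sequence[int], k: int) -> List[int]:
    """Indices of the k samples with the most forgetting events (ties by index)."""
    order = sorted(range(len(counts)), key=lambda i: (-counts[i], i))
    return order[:k]


# Toy training history for three samples over four epochs.
history = [
    [True, False, True],   # epoch 0
    [False, True, True],   # epoch 1
    [True, True, True],    # epoch 2
    [False, False, True],  # epoch 3
]
counts = forgetting_counts(history)
picked = most_forgotten(counts, k=2)
```

Selecting the least-forgotten samples instead would only require sorting by `counts[i]` in ascending order.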