Automatic Combination of Sample Selection Strategies for Few-Shot Learning

Branislav Pecher; Ivan Srba; Joaquin Vanschoren; Maria Bielikova

arXiv: 2402.03038 · v2 · submitted 2024-02-05 · cs.LG · cs.AI · cs.CL

Pith reviewed 2026-05-24 03:24 UTC · model grok-4.3

classification cs.LG · cs.AI · cs.CL

keywords few-shot learning · sample selection · in-context learning · meta-learning · few-shot fine-tuning · automatic combination

The pith

The ACSESS method automatically combines 23 sample selection strategies to improve few-shot learning across models and datasets.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper proposes ACSESS as a way to automatically combine various sample selection strategies rather than relying on one. It evaluates the approach using 23 strategies on 5 in-context learning models and 3 few-shot learning methods across 6 text and 8 image datasets. The combination consistently beats individual strategies and matches or surpasses specialized baselines. Benefits are largest when selecting very few shots and hold even for smaller datasets.

Core claim

The combination of strategies through the ACSESS method consistently outperforms all individual selection strategies and performs on par or exceeds the in-context learning specific baselines.

What carries the argument

ACSESS, a method for automatic combination of sample selection strategies that leverages their complementarity.
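
This page does not spell out how ACSESS combines strategies, so the following is only a rough sketch of one way a combination of selection scores could work, not the authors' method. It assumes each strategy assigns a score to every candidate sample, rescales the scores to a common range, and picks the top-k samples by a weighted sum. The strategy names (`similarity`, `diversity`), the scores, the weights, and the helper names are all hypothetical. The text accompanying Figure 2 mentions both a uniform and a weighted combination, which is why both appear here.

```python
from typing import Dict, List, Sequence


def min_max_normalize(scores: Sequence[float]) -> List[float]:
    """Rescale scores to [0, 1]; a constant score vector maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def combine_and_select(
    strategy_scores: Dict[str, Sequence[float]],
    weights: Dict[str, float],
    k: int,
) -> List[int]:
    """Return indices of the k candidates with the highest weighted score sum."""
    if not strategy_scores:
        raise ValueError("at least one strategy is required")
    n = len(next(iter(strategy_scores.values())))
    combined = [0.0] * n
    for name, scores in strategy_scores.items():
        if len(scores) != n:
            raise ValueError(f"strategy {name!r} scored a different number of samples")
        weight = weights.get(name, 0.0)  # strategies without a weight are ignored
        for i, s in enumerate(min_max_normalize(scores)):
            combined[i] += weight * s
    ranked = sorted(range(n), key=lambda i: combined[i], reverse=True)
    return ranked[:k]


# Hypothetical scores for five candidate samples (illustrative values only).
scores = {
    "similarity": [0.9, 0.1, 0.5, 0.7, 0.3],
    "diversity": [0.2, 0.8, 0.6, 0.4, 0.9],
}

# Uniform combination: every strategy gets the same weight.
uniform_weights = {name: 1.0 / len(scores) for name in scores}
selected_uniform = combine_and_select(scores, uniform_weights, k=2)

# Weighted combination: weights here are made up, not learned.
selected_weighted = combine_and_select(
    scores, {"similarity": 0.8, "diversity": 0.2}, k=2
)
```

With these made-up inputs the two weightings pick different samples, which is the kind of complementarity the combination is meant to exploit. How the real weights are obtained is exactly the detail the referee report asks the authors to document.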

If this is right

Sample selection remains effective even on smaller datasets.
Greatest benefits occur when only a few shots are selected.
The advantage diminishes as the number of shots increases.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Applying automatic combination could reduce reliance on hand-crafted strategies for new few-shot tasks.
The method might extend to other selection problems in machine learning where multiple heuristics exist.

Load-bearing premise

The 23 sample selection strategies possess complementary strengths whose automatic combination generalizes across models, datasets, and few-shot paradigms without dataset-specific overfitting or extensive per-task tuning.

What would settle it

An experiment on a new dataset or model where the ACSESS combination fails to outperform the best single strategy or the ICL baselines would disprove the main result.

Figures

Figures reproduced from arXiv: 2402.03038 by Branislav Pecher, Ivan Srba, Joaquin Vanschoren, Maria Bielikova.

**Figure 1.** Benefit of the different selection strategies, calculated as the difference in accuracy between the specific strategy and the classic few-shot selection, aggregated over the image and text datasets (boxplots show the distribution of results across the datasets). The performance of the classic selection is represented as the red dashed line (zero value). The consistently beneficial selection strategies depe…

**Figure 2.** Comparison of the LENS method (Li & Qiu, 2023) to our proposed automatic combination of selection strategies (ACSESS). Even though the uniform combination is computationally less expensive than the weighted combination, its performance increase is only slightly lower (average difference of 0.10–0.25). As such, the uniform weighting represents a good tradeo…

**Figure 3.** Effect of the number of shots on the benefit of selection strategies, comparing ACSESS and random selection, aggregated over datasets. The benefit is calculated as a difference to the classic selection at 5 shots. The benefit of sample selection is more significant at lower numbers of shots. At larger numbers of shots, the benefit and the performance boost from a further increase of shots become negligible.

**Figure 4.** Standard deviation introduced by multiple runs of the different selection strategies. The results for Prototypical Networks, MAML and Few-Shot Fine-Tuning are aggregated over both the image and text datasets, while the results for Mistral and Zephyr are only from text datasets. The ACSESS method shows lower sensitivity to repeated runs compared to the majority of the strategies included in the combination.…

**Figure 5.** Distribution of the selection strategies' benefit (calculated as the difference to the classic few-shot selection) for the 5-shot and 10-shot settings. The benefit of the different selection strategies is more significant at lower numbers of shots.

**Figure 6.** Benefit of the different selection strategies calculated as the difference to the classic few-shot selection strategy for the LR_AM.DOG dataset. The performance of the classic selection is represented as the zero value.

**Figure 7.** Benefit of the different selection strategies calculated as the difference to the classic few-shot selection strategy for the LR_AM.AWA dataset. The performance of the classic selection is represented as the zero value.

**Figure 8.** Benefit of the different selection strategies calculated as the difference to the classic few-shot selection strategy for the VCL.APL dataset. The performance of the classic selection is represented as the zero value.

**Figure 9.** Benefit of the different selection strategies calculated as the difference to the classic few-shot selection strategy for the HUM_ACT.ACT_410 dataset. The performance of the classic selection is represented as the zero value.

**Figure 10.** Benefit of the different selection strategies calculated as the difference to the classic few-shot selection strategy for the MNF.TEX_DTD dataset. The performance of the classic selection is represented as the zero value.

**Figure 11.** Benefit of the different selection strategies calculated as the difference to the classic few-shot selection strategy for the MCR.PRT dataset. The performance of the classic selection is represented as the zero value.

**Figure 12.** Benefit of the different selection strategies calculated as the difference to the classic few-shot selection strategy for the MCR.PNU dataset. The performance of the classic selection is represented as the zero value.

**Figure 13.** Benefit of the different selection strategies calculated as the difference to the classic few-shot selection strategy for the PLT.FLW dataset. The performance of the classic selection is represented as the zero value.

**Figure 14.** Benefit of the different selection strategies calculated as the difference to the classic few-shot selection strategy for the 20 News Group dataset. The performance of the classic selection is represented as the zero value.

**Figure 15.** Benefit of the different selection strategies calculated as the difference to the classic few-shot selection strategy for the News Category dataset. The performance of the classic selection is represented as the zero value.

**Figure 16.** Benefit of the different selection strategies calculated as the difference to the classic few-shot selection strategy for the ATIS dataset. The performance of the classic selection is represented as the zero value.

**Figure 17.** Benefit of the different selection strategies calculated as the difference to the classic few-shot selection strategy for the Facebook dataset. The performance of the classic selection is represented as the zero value.

**Figure 18.** Benefit of the different selection strategies calculated as the difference to the classic few-shot selection strategy for the HWU-64 dataset. The performance of the classic selection is represented as the zero value.

**Figure 19.** Benefit of the different selection strategies calculated as the difference to the classic few-shot selection strategy for the SNIPS dataset. The performance of the classic selection is represented as the zero value.
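
The "benefit" plotted in these figures is described in the captions as the accuracy of a strategy minus the accuracy of the classic few-shot selection, so zero means no change. As a small illustration of that arithmetic, assuming accuracies are stored per dataset in plain dictionaries, it could be computed as below. The dataset names and accuracy values are placeholders, not results from the paper, and the use of a median as the aggregate is only one possible summary of the distribution the boxplots show.

```python
from statistics import median
from typing import Dict


def benefit_per_dataset(
    strategy_acc: Dict[str, float], classic_acc: Dict[str, float]
) -> Dict[str, float]:
    """Benefit = strategy accuracy minus classic-selection accuracy, per dataset."""
    missing = set(classic_acc) - set(strategy_acc)
    if missing:
        raise KeyError(f"strategy has no result for: {sorted(missing)}")
    return {name: strategy_acc[name] - classic_acc[name] for name in classic_acc}


# Placeholder accuracies in percent (hypothetical values, not from the paper).
classic = {"dataset_a": 70.0, "dataset_b": 80.0, "dataset_c": 65.0}
strategy = {"dataset_a": 72.5, "dataset_b": 79.0, "dataset_c": 68.0}

benefits = benefit_per_dataset(strategy, classic)
median_benefit = median(benefits.values())  # one possible aggregate across datasets
```

A positive value means the strategy helped on that dataset, a negative value means it hurt, matching the zero line that represents the classic selection in the plots.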

Original abstract

In few-shot learning, the selection of samples has a significant impact on the performance of the model. While effective sample selection strategies are well-established in supervised settings, research on large language models largely overlooks them, favouring strategies specifically tailored to individual in-context learning settings. In this paper, we propose a new method for Automatic Combination of SamplE Selection Strategies (ACSESS) to leverage the strengths and complementarity of various well-established selection objectives. We investigate and compare the impact of 23 sample selection strategies on the performance of 5 in-context learning models and 3 few-shot learning approaches (meta-learning, few-shot fine-tuning) over 6 text and 8 image datasets. The experimental results show that the combination of strategies through the ACSESS method consistently outperforms all individual selection strategies and performs on par or exceeds the in-context learning specific baselines. Lastly, we demonstrate that sample selection remains effective even on smaller datasets, yielding the greatest benefits when only a few shots are selected, while its advantage diminishes as the number of shots increases.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

ACSESS combines existing selection strategies automatically and beats the singles in broad few-shot tests, but the exact combination rule needs clearer exposition.

The main point is that ACSESS takes 23 established sample selection strategies, combines them automatically, and gets better results than any one of them alone across 14 datasets, 5 ICL models, and 3 few-shot setups. The gains show up most clearly when the number of shots is very small and shrink as more data becomes available, which lines up with how selection usually works. They also test on both text and image data, so the pattern is not limited to one domain.

This is the useful part: it shows that you can reuse supervised selection ideas in LLM and meta-learning contexts without starting from scratch. The scale of the comparison gives the result some weight; it is not a narrow study on one model or task. The new element is the automatic combination step tailored to these low-data regimes rather than hand-picking or using a single objective.

On the soft side, the abstract leaves the mechanics of the combination itself thin. It is not obvious whether ACSESS learns weights per task, uses a fixed rule, or does something else, and that matters for judging whether the method will transfer or just fits the tested collection. The claim of consistent outperformance is strong, so the paper will need solid ablations and statistical checks to hold up.

This work is aimed at people who actually pick examples for few-shot or in-context tasks and want a practical way to improve without new theory. The experimental breadth is enough that a serious referee should look at it; the core result is checkable and the limitations are the usual ones around method transparency rather than a broken setup. I would send it to review.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes ACSESS, a method for automatically combining 23 established sample selection strategies to improve few-shot learning. It evaluates the approach on 5 in-context learning models and 3 few-shot paradigms (meta-learning, few-shot fine-tuning) across 6 text and 8 image datasets, claiming that the combination consistently outperforms all individual strategies and matches or exceeds ICL-specific baselines. Additional results indicate that selection benefits are largest at low shot counts and remain effective on smaller datasets.

Significance. If the experimental claims hold after detailed verification, the work would be significant because it demonstrates that automatic combination of general-purpose selection objectives can leverage complementarity to match or surpass specialized ICL methods across modalities and paradigms. This bridges supervised learning literature with LLM few-shot settings and suggests reduced need for per-task ICL tuning.

major comments (2)

[Experimental results] The central experimental claim (abstract and results section) of consistent outperformance rests on comparisons across 23 strategies, 5 models, 3 paradigms, and 14 datasets, yet the provided description contains no mention of statistical significance tests, variance across random seeds, or error bars; this reporting gap is load-bearing for the 'consistently outperforms' assertion.
[Methods] The description of the ACSESS combination mechanism (methods section) is not detailed enough to assess whether the automatic weighting or selection process introduces hidden per-dataset tuning or risks of overfitting to the 14 evaluation sets, which directly affects the generalizability claim.

minor comments (1)

[Methods] Clarify the exact definition and implementation of the 23 strategies and the 3 few-shot approaches to allow replication.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback highlighting gaps in experimental reporting and methods clarity. We will revise the manuscript to strengthen both areas while preserving the core contributions.

Referee: [Experimental results] The central experimental claim (abstract and results section) of consistent outperformance rests on comparisons across 23 strategies, 5 models, 3 paradigms, and 14 datasets, yet the provided description contains no mention of statistical significance tests, variance across random seeds, or error bars; this reporting gap is load-bearing for the 'consistently outperforms' assertion.

Authors: We agree that the lack of statistical tests, seed variance, and error bars limits the robustness of the 'consistently outperforms' claim. In the revision we will recompute all main results over at least five random seeds, report mean ± standard deviation, add error bars to figures, and include paired statistical significance tests (Wilcoxon signed-rank) between ACSESS and each baseline. These additions will appear in the results section, tables, and appendix.
Revision: yes

Referee: [Methods] The description of the ACSESS combination mechanism (methods section) is not detailed enough to assess whether the automatic weighting or selection process introduces hidden per-dataset tuning or risks of overfitting to the 14 evaluation sets, which directly affects the generalizability claim.

Authors: The combination weights in ACSESS are obtained via a single meta-optimization run whose objective and hyperparameters are identical for every dataset; no per-dataset search or validation-set tuning occurs. To make this explicit we will expand the methods section with the full algorithmic pseudocode, the precise objective function, and a statement that all hyperparameters remain fixed across the 14 datasets. We will also add a short paragraph discussing the risk of overfitting to the chosen evaluation collection and note that future work should test on additional held-out domains.
Revision: yes
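
The reporting promised in the first response (mean ± standard deviation over seeds, plus a paired Wilcoxon signed-rank test) can be sketched in a few lines. What follows is a minimal pure-Python illustration, not the authors' evaluation code. The function names and every accuracy value are hypothetical. Tied absolute differences receive average ranks, zero differences are dropped, and the two-sided p-value is computed by exact enumeration of sign assignments, which is only practical for a small number of pairs.

```python
from itertools import product
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def mean_std(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation across repeated runs (seeds)."""
    return mean(values), stdev(values)


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks


def wilcoxon_signed_rank(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Exact two-sided Wilcoxon signed-rank test; returns (W_plus, p_value)."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences dropped
    if not diffs:
        raise ValueError("all paired differences are zero")
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    expected = sum(ranks) / 2  # mean of W_plus under the null hypothesis
    observed_dev = abs(w_plus - expected)
    extreme = total = 0
    # Under the null, each difference is positive or negative with equal chance.
    for signs in product((0, 1), repeat=len(ranks)):
        w = sum(r for r, s in zip(ranks, signs) if s)
        total += 1
        if abs(w - expected) >= observed_dev - 1e-12:
            extreme += 1
    return w_plus, extreme / total


# Hypothetical accuracies of one configuration over five seeds.
seed_mean, seed_std = mean_std([70.0, 72.0, 71.0, 69.0, 73.0])

# Hypothetical per-dataset accuracies for two methods (paired by dataset).
combined_method = [71.0, 66.0, 83.0, 59.0, 75.0, 90.0]
single_strategy = [70.0, 64.0, 80.0, 55.0, 70.0, 84.0]
w_plus, p_value = wilcoxon_signed_rank(combined_method, single_strategy)
```

In this toy example every paired difference favours the first method, so `w_plus` takes its maximum value and the exact p-value is the smallest one six pairs can produce. With more pairs, a library routine with a normal approximation would be the more practical choice.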

Circularity Check

0 steps flagged

No significant circularity detected

The paper presents an empirical method (ACSESS) for combining 23 sample selection strategies and evaluates it across 5 ICL models, 3 few-shot paradigms, 14 datasets (6 text, 8 image), and varying shot counts. No derivation chain, equations, or self-citations are invoked to justify core claims; performance results are grounded in direct experimental comparisons on independent benchmarks rather than reducing to fitted parameters or prior self-referential results by construction. The central claim (combination outperforms individuals) is externally falsifiable via the reported multi-dataset, multi-model evaluation and does not rely on self-definitional loops or uniqueness theorems from the authors' prior work.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based solely on the abstract, no free parameters, axioms, or invented entities are specified. The method presumably involves some weighting or selection mechanism for combination, but details are absent.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · reality_from_one_distinction · relation between the paper passage and the cited Recognition theorem: unclear

We propose a new method for Automatic Combination of SamplE Selection Strategies (ACSESS) ... combination of strategies through the ACSESS method consistently outperforms all individual selection strategies

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Medical Incident Causal Factors and Preventive Measures Generation Using Tag-based Example Selection in Few-shot Learning
cs.CL 2026-05 unverdicted novelty 4.0

Tag-based few-shot selection yields higher precision and stability than random or similarity-based methods when using LLMs to analyze medical incidents.

Reference graph

Works this paper leans on

19 extracted references · 19 canonical work pages · cited by 1 Pith paper · 3 internal anchors

[1]

findings-acl.564

URL https://aclanthology.org/2023.findings-acl.564. Aimen, A., Ladrecha, B., Sidheekh, S., and Krishnan, N. C. Leveraging task variability in meta-learning. SN Computer Science, 4(5):539, 2023. An, S., Zhou, B., Lin, Z., Fu, Q., Chen, B., Zheng, N., Chen, W., and Lou, J.-G. Skill-based few-shot selection for in-context learning. In Bouamor, H., Pino,...

[2]

doi: 10.18653/v1/2023.emnlp-main.831

Association for Computational Linguistics. doi: 10.18653/v1/2023.emnlp-main.831. URL https://aclanthology.org/2023.emnlp-main.831. Chang, T.-Y. and Jia, R. Data curation alone can stabilize in-context learning. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 8123–8144, Toronto,...

[3]

Snips Voice Platform: an embedded Spoken Language Understanding system for private-by-design voice interfaces

URL https://openreview.net/forum?id=HJg2b0VYDr. Coucke, A., Saade, A., Ball, A., Bluche, T., Caulier, A., Leroy, D., Doumouro, C., Gisselbrecht, T., Caltagirone, F., Lavril, T., et al. Snips voice platform: an embedded spoken language understanding system for private-by-design voice interfaces. arXiv preprint arXiv:1805.10190, 2018. Deng, J., Dong, W., ...

[4]

nlp4convai-1.15

URL https://aclanthology.org/2022.nlp4convai-1.15. He, K., Zhang, X., Ren, S., and Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778, 2016. Hemphill, C. T., Godfrey, J. J., and Doddington, G. R. The ATIS spoken language systems pilot corpus. In Speech and ...

[5]

URL https://proceedings.mlr.press/v132/iyer21a.html. Iyer, R. K. and Bilmes, J. A. Submodular optimization with submodular cover and submodular knapsack constraints. In Burges, C., Bottou, L., Welling, M., Ghahramani, Z., and Weinberger, K. (eds.), Advances in Neural Information Processing Systems, volume 26. Curran Associates, Inc.,...

[6]

Mistral 7B

URL https://proceedings.neurips.cc/paper_files/paper/2013/file/a1d50185e7426cbb0acad1e6ca74b9aa-Paper.pdf. Jiang, A. Q., Sablayrolles, A., Mensch, A., Bamford, C., Chaplot, D. S., Casas, D. d. l., Bressand, F., Lengyel, G., Lample, G., Saulnier, L., et al. Mistral 7b. arXiv preprint arXiv:2310.06825, 2023. Jundi, I. and Lapesa, G. How to translate your...

[7]

findings-emnlp.411

URL https://aclanthology.org/2023.findings-emnlp.411. Li, X., Lv, K., Yan, H., Lin, T., Zhu, W., Ni, Y., Xie, G., Wang, X., and Qiu, X. Unified demonstration retriever for in-context learning. In Rogers, A., Boyd-Graber, J., and Okazaki, N. (eds.), Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Lon...

[8]

doi: 10.18653/v1/2021.emnlp-main.607

Association for Computational Linguistics. doi: 10.18653/v1/2021.emnlp-main.607. URL https://aclanthology.org/2021.emnlp-main.607. Paul, M., Ganguli, S., and Dziugaite, G. K. Deep learning on a data diet: Finding important examples early in training. In Ranzato, M., Beygelzimer, A., Dauphin, Y., Liang, P., and Vaughan, J. W. (eds.), Advances in Neural I...

[9]

Gupta, Xiaojiang Chen, and Xin Wang

URL https://proceedings.neurips.cc/paper_files/paper/2021/file/ac56f8fe9eea3e4a365f29f0f1957c55-Paper.pdf. Pecher, B., Srba, I., and Bielikova, M. On the effects of randomness on stability of learning with limited labelled data: A systematic literature review. arXiv preprint arXiv:2312.01082, 2023. Qin, C., Zhang, A., Dagar, A., and Ye, W. In-context ...

[10]

naacl-main.191

URL https://aclanthology.org/2022.naacl-main.191. Scarlatos, A. and Lan, A. Reticl: Sequential retrieval of in-context examples with reinforcement learning. arXiv preprint arXiv:2305.14502, 2023. Schröder, C., Niekler, A., and Potthast, M. Revisiting uncertainty-based query strategies for active learning with transformers. In Muresan, S., Nakov, P., and...

[11]

findings-acl.172

URL https://aclanthology.org/2022.findings-acl.172. Schuster, S., Gupta, S., Shah, R., and Lewis, M. Cross-lingual transfer learning for multilingual task oriented dialog. In Burstein, J., Doran, C., and Solorio, T. (eds.), Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language T...

[12]

emnlp-main.608

URL https://aclanthology.org/2021.emnlp-main.608. Shum, K., Diao, S., and Zhang, T. Automatic prompt augmentation and selection with chain-of-thought from labeled data. In Bouamor, H., Pino, J., and Bali, K. (eds.), Findings of the Association for Computational Linguistics: EMNLP 2023, pp. 12113–12139, Singapore, December 2023. Association for Computa...

[13]

findings-emnlp.811

URL https://aclanthology.org/2023.findings-emnlp.811. Snell, J., Swersky, K., and Zemel, R. Prototypical networks for few-shot learning. Advances in neural information processing systems, 30, 2017. Song, Y., Wang, T., Cai, P., Mondal, S. K., and Sahoo, J. P. A comprehensive survey of few-shot learning: Evolution, applications, challenges, and opportunit...

[14]

Zephyr: Direct Distillation of LM Alignment

Association for Computational Linguistics. doi: 10.18653/v1/2020.emnlp-main.746. URL https://aclanthology.org/2020.emnlp-main.746. Tibshirani, R. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society Series B: Statistical Methodology, 58(1):267–288, 1996. Toneva, M., Sordoni, A., des Combes, R. T., Trischler, A., Beng...

[15]

acl-long.859

URL https://aclanthology.org/2023.acl-long.859. Wang, J., Song, Z., Su, X., Si, L., Dong, H., Qiang, W., and Zheng, C. Learning to sample tasks for meta learning, 2023. Wang, S., Xu, Y., Fang, Y., Liu, Y., Sun, S., Xu, R., Zhu, C., and Zeng, M. Training data is more valuable than you think: A simple and effective method by retrieving from training d...

[16]

emnlp-main.308/

URL https://aclanthology.org/2023.findings-acl.273. Yu, R., Liu, S., and Wang, X. Dataset distillation: A comprehensive review. arXiv preprint arXiv:2301.07014, 2023. Zemlyanskiy, Y., de Jong, M., Ainslie, J., Pasupat, P., Shaw, P., Qiu, L., Sanghai, S., and Sha, F. Generate-and-retrieve: Use your predictions to improve retrieval for semantic parsing...

[17]

The number of such forgetting events is used as the sample score and samples that are forgotten the most often (or least often) are selected

simply calculates how often the specific sample is incorrectly classified after being classified correctly in the previous epoch. The number of such forgetting events is used as the sample score and samples that are forgotten the most often (or least often) are selected. In our evaluation of single-property strategies, we consider only the setting of most...

[18]

[Input] [Output] Intent classification Determine intent of the sentence using following options: 1) [Option 1] 2) [Option 2] 3) [Option 3]

[Option 4] 5) [Option 5]. [Input] [Output] Intent classification Determine intent of the sentence using following options: 1) [Option 1] 2) [Option 2] 3) [Option 3]

[19]

[Option 4] 5) [Option 5]. [Input] [Output] Dataset Verbaliser 20 News Group {IBM, Middle East Politics, Windows XP, Motorcycles, Medicine, For Sale, Religion, MS Windows, Baseball, Auto, Hockey, Mac, Graphics, Christianity, Guns, Electronics, Space, Crypto, Atheism, Politics} News Category {Politics, World News, Parenting, Money, Wellness, Business, Weddi...

work page 2023

[1] [1]

findings-acl.564

URL https://aclanthology.org/2023. findings-acl.564. Aimen, A., Ladrecha, B., Sidheekh, S., and Krishnan, N. C. Leveraging task variability in meta-learning. SN Com- puter Science, 4(5):539, 2023. An, S., Zhou, B., Lin, Z., Fu, Q., Chen, B., Zheng, N., Chen, W., and Lou, J.-G. Skill-based few-shot selec- tion for in-context learning. In Bouamor, H., Pino,...

work page 2023

[2] [2]

doi: 10.18653/v1/2023.emnlp-main.831

Association for Computational Linguistics. doi: 10.18653/v1/2023.emnlp-main.831. URL https:// aclanthology.org/2023.emnlp-main.831. Chang, T.-Y . and Jia, R. Data curation alone can stabilize in- context learning. In Proceedings of the 61st Annual Meet- ing of the Association for Computational Linguistics (Vol- ume 1: Long Papers), pp. 8123–8144, Toronto,...

work page doi:10.18653/v1/2023.emnlp-main.831 2023

[3] [3]

Snips Voice Platform: an embedded Spoken Language Understanding system for private-by-design voice interfaces

URL https://openreview.net/forum? id=HJg2b0VYDr. Coucke, A., Saade, A., Ball, A., Bluche, T., Caulier, A., Leroy, D., Doumouro, C., Gisselbrecht, T., Caltagirone, F., Lavril, T., et al. Snips voice platform: an embedded spoken language understanding system for private-by- design voice interfaces. arXiv preprint arXiv:1805.10190, 2018. Deng, J., Dong, W., ...

work page internal anchor Pith review Pith/arXiv arXiv doi:10.18653/v1/n19-1423 2018

[4] [4]

nlp4convai-1.15

URL https://aclanthology.org/2022. nlp4convai-1.15. He, K., Zhang, X., Ren, S., and Sun, J. Deep residual learn- ing for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778, 2016. Hemphill, C. T., Godfrey, J. J., and Doddington, G. R. The ATIS spoken language systems pilot corpus. In Speech and ...

work page 2022

[5] [5]

URL https://proceedings.mlr.press/ v132/iyer21a.html. Iyer, R. K. and Bilmes, J. A. Submodular opti- mization with submodular cover and submodular knapsack constraints. In Burges, C., Bottou, L., Welling, M., Ghahramani, Z., and Weinberger, K. (eds.), Advances in Neural Information Process- ing Systems , volume 26. Curran Associates, Inc., 10 Automatic Co...

work page

[6] [6]

Mistral 7B

URL https://proceedings.neurips. cc/paper_files/paper/2013/file/ a1d50185e7426cbb0acad1e6ca74b9aa-Paper. pdf. Jiang, A. Q., Sablayrolles, A., Mensch, A., Bamford, C., Chaplot, D. S., Casas, D. d. l., Bressand, F., Lengyel, G., Lample, G., Saulnier, L., et al. Mistral 7b. arXiv preprint arXiv:2310.06825, 2023. Jundi, I. and Lapesa, G. How to translate your...

work page internal anchor Pith review Pith/arXiv arXiv doi:10.18653/v1/2022 2013

[7] [7]

findings-emnlp.411

URL https://aclanthology.org/2023. findings-emnlp.411. Li, X., Lv, K., Yan, H., Lin, T., Zhu, W., Ni, Y ., Xie, G., Wang, X., and Qiu, X. Unified demonstration retriever for in-context learning. In Rogers, A., Boyd-Graber, J., and Okazaki, N. (eds.), Proceedings of the 61st Annual Meet- ing of the Association for Computational Linguistics (Vol- ume 1: Lon...

work page doi:10.18653/v1/2023.acl-long.256 2023

[8] [8]

doi: 10.18653/v1/2021.emnlp-main.607

Association for Computational Linguistics. doi: 10.18653/v1/2021.emnlp-main.607. URL https:// aclanthology.org/2021.emnlp-main.607. Paul, M., Ganguli, S., and Dziugaite, G. K. Deep learning on a data diet: Finding important examples early in training. In Ranzato, M., Beygelzimer, A., Dauphin, Y ., Liang, P., and Vaughan, J. W. (eds.), Advances in Neural I...

work page doi:10.18653/v1/2021.emnlp-main.607 2021

[9] [9]

Gupta, Xiaojiang Chen, and Xin Wang

URL https://proceedings.neurips. cc/paper_files/paper/2021/file/ ac56f8fe9eea3e4a365f29f0f1957c55-Paper. pdf. Pecher, B., Srba, I., and Bielikova, M. On the effects of randomness on stability of learning with limited la- belled data: A systematic literature review.arXiv preprint arXiv:2312.01082, 2023. Qin, C., Zhang, A., Dagar, A., and Ye, W. In-context ...

doi:10.1145/3472291 2021

[10]

naacl-main.191

URL https://aclanthology.org/2022.naacl-main.191. Scarlatos, A. and Lan, A. Reticl: Sequential retrieval of in-context examples with reinforcement learning. arXiv preprint arXiv:2305.14502, 2023. Schröder, C., Niekler, A., and Potthast, M. Revisiting uncertainty-based query strategies for active learning with transformers. In Muresan, S., Nakov, P., and...

doi:10.18653/v1/2022.findings-acl 2022

[11]

findings-acl.172

URL https://aclanthology.org/2022.findings-acl.172. Schuster, S., Gupta, S., Shah, R., and Lewis, M. Cross-lingual transfer learning for multilingual task oriented dialog. In Burstein, J., Doran, C., and Solorio, T. (eds.), Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language T...

doi:10.18653/v1/n19-1380 2022

[12]

emnlp-main.608

URL https://aclanthology.org/2021.emnlp-main.608. Shum, K., Diao, S., and Zhang, T. Automatic prompt augmentation and selection with chain-of-thought from labeled data. In Bouamor, H., Pino, J., and Bali, K. (eds.), Findings of the Association for Computational Linguistics: EMNLP 2023, pp. 12113–12139, Singapore, December 2023. Association for Computa...

doi:10.18653/v1/2023.findings-emnlp 2021

[13]

findings-emnlp.811

URL https://aclanthology.org/2023.findings-emnlp.811. Snell, J., Swersky, K., and Zemel, R. Prototypical networks for few-shot learning. Advances in neural information processing systems, 30, 2017. Song, Y., Wang, T., Cai, P., Mondal, S. K., and Sahoo, J. P. A comprehensive survey of few-shot learning: Evolution, applications, challenges, and opportunit...

doi:10.1145/3582688 2023

[14]

Zephyr: Direct Distillation of LM Alignment

Association for Computational Linguistics. doi: 10.18653/v1/2020.emnlp-main.746. URL https://aclanthology.org/2020.emnlp-main.746. Tibshirani, R. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society Series B: Statistical Methodology, 58(1):267–288, 1996. Toneva, M., Sordoni, A., des Combes, R. T., Trischler, A., Beng...

doi:10.18653/v1/2020.emnlp-main.746 2020

[15]

acl-long.859

URL https://aclanthology.org/2023.acl-long.859. Wang, J., Song, Z., Su, X., Si, L., Dong, H., Qiang, W., and Zheng, C. Learning to sample tasks for meta learning, 2023. Wang, S., Xu, Y., Fang, Y., Liu, Y., Sun, S., Xu, R., Zhu, C., and Zeng, M. Training data is more valuable than you think: A simple and effective method by retrieving from training d...

doi:10.18653/v1/2022.acl-long.226 2023

[16]

emnlp-main.308/

URL https://aclanthology.org/2023.findings-acl.273. Yu, R., Liu, S., and Wang, X. Dataset distillation: A comprehensive review. arXiv preprint arXiv:2301.07014, 2023. Zemlyanskiy, Y., de Jong, M., Ainslie, J., Pasupat, P., Shaw, P., Qiu, L., Sanghai, S., and Sha, F. Generate-and-retrieve: Use your predictions to improve retrieval for semantic parsing...

doi:10.18653/v1/2021 2023

[17]

The number of such forgetting events is used as the sample score and samples that are forgotten the most often (or least often) are selected

simply calculates how often the specific sample is incorrectly classified after being classified correctly in the previous epoch. The number of such forgetting events is used as the sample score and samples that are forgotten the most often (or least often) are selected. In our evaluation of single-property strategies, we consider only the setting of most...
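The forgetting score described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes per-epoch correctness has already been recorded as a list of boolean rows (one row per epoch, one entry per sample), and the names `forgetting_counts` and `select_by_forgetting` are hypothetical.

```python
from typing import List, Sequence


def forgetting_counts(correct: Sequence[Sequence[bool]]) -> List[int]:
    """Count forgetting events per sample.

    correct[t][i] is True if sample i was classified correctly in epoch t
    (assumed input layout). A forgetting event occurs when a sample is
    correct in epoch t-1 and incorrect in epoch t.
    """
    if not correct:
        return []
    counts = [0] * len(correct[0])
    for prev, curr in zip(correct, correct[1:]):
        for i, (was_ok, is_ok) in enumerate(zip(prev, curr)):
            if was_ok and not is_ok:
                counts[i] += 1
    return counts


def select_by_forgetting(
    correct: Sequence[Sequence[bool]], k: int, most_forgotten: bool = True
) -> List[int]:
    """Return indices of the k samples forgotten most (or least) often."""
    counts = forgetting_counts(correct)
    sign = -1 if most_forgotten else 1
    # sorted() is stable, so ties keep their original sample order.
    ranked = sorted(range(len(counts)), key=lambda i: sign * counts[i])
    return ranked[:k]
```

Passing `most_forgotten=False` gives the "least often" variant mentioned in the text; how ties are broken is a choice made here for illustration only.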

work page 2013

[18] [18]

[Input] [Output] Intent classification Determine intent of the sentence using following options: 1) [Option 1] 2) [Option 2] 3) [Option 3]

[Option 4] 5) [Option 5]. [Input] [Output] Intent classification Determine intent of the sentence using following options: 1) [Option 1] 2) [Option 2] 3) [Option 3]
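As a rough sketch of how the intent-classification prompt format above might be filled in, the snippet below numbers the options, appends the selected demonstrations as input/output pairs, and ends with the query input. The function names, the newline separators, and the placement of demonstrations are assumptions made for illustration; the source only shows the placeholder layout.

```python
from typing import List, Tuple

# Instruction text as it appears in the prompt format.
INSTRUCTION = "Determine intent of the sentence using following options:"


def format_options(options: List[str]) -> str:
    """Render options as '1) A 2) B ... n) Z.' (numbering starts at 1)."""
    return " ".join(f"{i}) {opt}" for i, opt in enumerate(options, start=1)) + "."


def build_prompt(
    options: List[str], demonstrations: List[Tuple[str, str]], query: str
) -> str:
    """Assemble instruction, in-context demonstrations, and the query.

    Each demonstration is an (input, output) pair; the newline-based layout
    is an assumed convention, not taken from the paper.
    """
    lines = [f"{INSTRUCTION} {format_options(options)}"]
    for text, label in demonstrations:
        lines.append(text)
        lines.append(label)
    lines.append(query)
    return "\n".join(lines)
```

The demonstrations passed in would be the samples chosen by a selection strategy, so swapping strategies only changes the `demonstrations` argument.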

[19]

[Option 4] 5) [Option 5]. [Input] [Output]

Dataset: Verbaliser
20 News Group: {IBM, Middle East Politics, Windows XP, Motorcycles, Medicine, For Sale, Religion, MS Windows, Baseball, Auto, Hockey, Mac, Graphics, Christianity, Guns, Electronics, Space, Crypto, Atheism, Politics}
News Category: {Politics, World News, Parenting, Money, Wellness, Business, Weddi...

work page 2023