Self-Improving In-Context Learning
Pith reviewed 2026-05-25 04:53 UTC · model grok-4.3
The pith
Optimizing the continuous embeddings of a fixed few-shot prompt at test time improves in-context learning by maximizing a log-probability proxy on the demonstrations.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that the log-probabilities assigned to demonstrated outputs, available from a single forward pass, constitute a reliable optimization signal for in-context learning; maximizing a formal bounded confidence proxy derived from them via zeroth-order search over prompt embeddings yields better task performance on unseen inputs from the same fixed demonstrations.
What carries the argument
A bounded self-supervised confidence proxy derived from the log-probabilities of demonstrated outputs, maximized over continuous prompt embeddings via zeroth-order optimization.
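As a rough illustration of what "zeroth-order optimization" means here, the sketch below maximizes a bounded objective using only function evaluations, via a two-point random-direction gradient estimate. Everything in it is an assumption made for illustration: `toy_proxy` is a hypothetical stand-in for the paper's proxy (which would require a language-model forward pass over the demonstrations), the three-dimensional vector stands in for prompt embeddings, and the step size, perturbation scale, and step count are arbitrary. The paper's actual estimator and hyperparameters are not given in this summary.

```python
import random


def toy_proxy(x):
    """Hypothetical stand-in for the confidence proxy; bounded in (0, 1]."""
    target = [0.5, -1.0, 2.0]  # arbitrary optimum chosen for illustration
    dist_sq = sum((a - b) ** 2 for a, b in zip(x, target))
    return 1.0 / (1.0 + dist_sq)


def zo_maximize(objective, x0, steps=2000, mu=1e-3, lr=0.1, seed=0):
    """Maximize `objective` using two-point zeroth-order gradient estimates."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(steps):
        # Sample a random Gaussian direction in parameter space.
        u = [rng.gauss(0.0, 1.0) for _ in x]
        x_plus = [xi + mu * ui for xi, ui in zip(x, u)]
        x_minus = [xi - mu * ui for xi, ui in zip(x, u)]
        # Central finite difference: slope of the objective along u.
        slope = (objective(x_plus) - objective(x_minus)) / (2.0 * mu)
        # Ascent step along u, scaled by the estimated slope.
        x = [xi + lr * slope * ui for xi, ui in zip(x, u)]
    return x


start = [0.0, 0.0, 0.0]
tuned = zo_maximize(toy_proxy, start)
print(toy_proxy(start), toy_proxy(tuned))
```

Each step costs two objective evaluations and no backward pass, which is why this family of methods fits a setting where only forward passes are used.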
If this is right
- The calibration procedure matches or improves base-model performance across a range of ICL tasks.
- It outperforms classification-specific baselines on most evaluated tasks.
- A statistically significant correlation exists between improvement in the proxy and gains in downstream accuracy.
- The same procedure applies without modification to both classification and free-form generation tasks.
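The correlation claim in the list above can be checked with a standard recipe; the sketch below computes a Pearson coefficient and a permutation p-value. The summary does not say which correlation statistic or significance test the paper uses, so this choice is an assumption, and the numbers at the bottom are synthetic placeholders, not results from the paper.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def permutation_p_value(xs, ys, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the observed |r|."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any pairing between xs and ys
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    # Add-one smoothing keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


# Synthetic placeholder values, NOT results from the paper.
proxy_gain = [0.01, 0.03, 0.02, 0.08, 0.05, 0.10]
accuracy_gain = [0.0, 0.5, 0.2, 1.9, 1.1, 2.4]
print(pearson_r(proxy_gain, accuracy_gain))
print(permutation_p_value(proxy_gain, accuracy_gain))
```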
Where Pith is reading between the lines
- A natural follow-up is to test whether the same proxy can guide the selection or reweighting of the demonstrations included in the prompt.
- It implies that prompt embeddings can be treated as continuous parameters for inference-time adaptation even when the underlying model weights stay frozen.
- Real-time deployment on streaming inputs might become feasible if the zeroth-order steps can be limited to a small fixed budget per query.
Load-bearing premise
Maximizing the log-probability proxy computed on the fixed demonstrations will produce better predictions on unseen test inputs rather than merely fitting the demonstrations more closely.
What would settle it
If optimizing the proxy yields no increase, or a decrease, in accuracy on held-out test examples relative to the unchanged, unoptimized base prompt, the claim that the proxy encodes a reliable downstream signal would be falsified.
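One way to run such a check is a paired comparison on the same held-out examples, counting only the examples whose correctness changes between the base and optimized prompts. The sketch below uses an exact two-sided sign test on those discordant pairs; the choice of test and the function names are assumptions for illustration, not the paper's evaluation protocol.

```python
from math import comb


def paired_outcomes(base_correct, tuned_correct):
    """Count test examples fixed and broken by the optimized prompt."""
    fixed = sum((not b) and t for b, t in zip(base_correct, tuned_correct))
    broken = sum(b and (not t) for b, t in zip(base_correct, tuned_correct))
    return fixed, broken


def exact_sign_test(n_fixed, n_broken):
    """Two-sided exact binomial p-value under 'fixes and breaks equally likely'."""
    n = n_fixed + n_broken
    if n == 0:
        return 1.0  # no discordant examples, so no evidence either way
    k = min(n_fixed, n_broken)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)
```

A small p-value with more examples broken than fixed would count against the claim; a large p-value means the held-out data cannot distinguish the optimized prompt from the base prompt.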
Original abstract
We propose to improve in-context learning (ICL) by optimizing the continuous embeddings of a fixed few-shot prompt at test time. The key observation is that the log-probabilities a model assigns to its demonstrated outputs (available from a single forward pass without generating any tokens) provide a meaningful signal for how well the model has inferred the task from its demonstrations. We formalize this signal as a bounded, self-supervised confidence proxy and maximize it via zeroth-order optimization over the prompt embeddings, yielding a test-time calibration procedure. The approach requires no finetuning, no token generation, no predefined label set, and no external data, making it equally applicable to both classification and free-form generation tasks. Across a comprehensive suite of ICL tasks, the proposed calibration consistently matches or improves upon the base model and outperforms classification-specific baselines on most tasks. The statistically significant correlation between proxy improvement and downstream accuracy gain confirms that the proposed proxy encodes a reliable optimization signal for in-context learning.
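The abstract does not give the proxy's formula. As a hedged sketch of what a bounded confidence score built from demonstration log-probabilities could look like, the code below averages the geometric-mean token probability of each demonstrated output. This particular form, and the names `demonstration_confidence` and `confidence_proxy`, are assumptions for illustration; the paper's definition may differ. The input is assumed to be per-token log-probabilities already extracted from a forward pass.

```python
import math


def demonstration_confidence(token_logprobs):
    """Geometric-mean token probability of one demonstrated output, in (0, 1]."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    # Log-probabilities are <= 0, so the exponentiated mean is at most 1.
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def confidence_proxy(demo_logprobs):
    """Average per-demonstration confidence; also bounded in (0, 1]."""
    scores = [demonstration_confidence(lp) for lp in demo_logprobs]
    return sum(scores) / len(scores)


# Two demonstrations: one scored at probability 0.5 per token, one at 1.0.
example = [[math.log(0.5)] * 3, [0.0]]
print(confidence_proxy(example))
```

Averaging per-token log-probabilities before exponentiating keeps long and short demonstrated outputs on a comparable scale, which matters if the same score is to cover both single-token labels and free-form generations.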
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes optimizing the continuous embeddings of a fixed few-shot prompt at test time to maximize a bounded self-supervised confidence proxy based on the model's log-probabilities assigned to the demonstrated outputs (available from a single forward pass). This zeroth-order optimization yields a calibration procedure requiring no finetuning, token generation, predefined label sets, or external data, applicable to both classification and free-form generation. Across ICL tasks the method is reported to match or improve base-model performance, outperform classification-specific baselines on most tasks, and exhibit a statistically significant correlation between proxy improvement and downstream accuracy gains.
Significance. If the central claim holds, the work supplies a lightweight, general test-time adaptation technique for ICL that relies solely on the model's internal signals and extends to generation tasks. The reported correlation supplies empirical grounding for the proxy as an optimization signal. No machine-checked proofs or parameter-free derivations are claimed, but the absence of external data or generation steps is a practical strength.
major comments (2)
- [Abstract] Abstract: the claim that the statistically significant correlation 'confirms that the proposed proxy encodes a reliable optimization signal' does not address whether maximizing the demonstration log-prob proxy changes conditional behavior on unseen test inputs or merely increases probability mass on the fixed demonstration tokens; this distinction is load-bearing for the generalization claim.
- [Method] Method (zeroth-order optimization description): because the objective is defined exclusively on the fixed demonstrations and never observes test inputs, the manuscript must supply analysis or controls showing that embedding adjustments alter predictions outside the demonstration set rather than overfitting the demonstrated outputs; the current correlation evidence alone leaves this open.
minor comments (1)
- Notation for the bounded proxy could be introduced with an explicit equation rather than prose description to improve reproducibility.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback, which helps clarify the generalization properties of our test-time optimization procedure. We address each major comment below and will revise the manuscript accordingly.
Point-by-point responses
- Referee: [Abstract] Abstract: the claim that the statistically significant correlation 'confirms that the proposed proxy encodes a reliable optimization signal' does not address whether maximizing the demonstration log-prob proxy changes conditional behavior on unseen test inputs or merely increases probability mass on the fixed demonstration tokens; this distinction is load-bearing for the generalization claim.
  Authors: We agree the abstract phrasing should be more precise on this point. All downstream accuracy results are measured on held-out test inputs never seen during optimization, and the reported correlation is specifically between proxy gains on the demonstrations and accuracy improvements on those unseen test examples. This already provides evidence that the embedding adjustments affect conditional behavior beyond the fixed demonstrations. We will revise the abstract to explicitly state that the correlation is with test-set accuracy gains, thereby underscoring the generalization aspect.
  Revision: yes
- Referee: [Method] Method (zeroth-order optimization description): because the objective is defined exclusively on the fixed demonstrations and never observes test inputs, the manuscript must supply analysis or controls showing that embedding adjustments alter predictions outside the demonstration set rather than overfitting the demonstrated outputs; the current correlation evidence alone leaves this open.
  Authors: We concur that dedicated controls would strengthen the presentation. While the statistically significant correlation with test accuracy (measured on inputs outside the optimization set) already indicates that the adjustments influence predictions on unseen data, we will add a new analysis subsection. This will include quantitative comparisons of model output distributions on test inputs before versus after optimization, along with discussion of the bounded, small-magnitude nature of the zeroth-order updates to address potential overfitting concerns.
  Revision: yes
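The before-versus-after comparison promised in the rebuttal could take several forms. The sketch below shows one plausible version, under the assumption that each test input yields a probability distribution over a small set of candidate outputs: it reports the mean KL divergence from the pre-optimization to the post-optimization distribution and the fraction of test inputs whose top prediction changes. The metric choice and function names are illustrative, not taken from the paper.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same outcomes."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0.0
    )


def argmax(dist):
    """Index of the most probable outcome."""
    return max(range(len(dist)), key=dist.__getitem__)


def shift_report(before, after):
    """Summarize how test-input predictions moved after optimization."""
    kls = [kl_divergence(a, b) for b, a in zip(before, after)]
    flips = sum(argmax(b) != argmax(a) for b, a in zip(before, after))
    return {
        "mean_kl": sum(kls) / len(kls),
        "flip_rate": flips / len(before),
    }
```

A mean KL near zero together with a zero flip rate would suggest the optimization only reshaped probabilities on the demonstration tokens, which is the overfitting scenario the referee raises.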
Circularity Check
No significant circularity; derivation remains self-contained against external test accuracy
full rationale
The central procedure optimizes prompt embeddings to maximize a log-probability proxy computed solely on the fixed demonstration outputs. This proxy is explicitly defined from the model's forward pass on those demonstrations, but the claimed benefit is measured on independent test inputs whose labels are never observed during optimization. The reported statistically significant correlation between proxy improvement and downstream accuracy gain constitutes an external empirical check rather than a definitional reduction. No equations equate the optimized proxy directly to test accuracy by construction, no self-citation chains bear the load, and no uniqueness theorems or ansatzes are smuggled in. The method is therefore not forced to succeed; any observed lift on unseen data stands as a genuine empirical claim.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Log-probabilities assigned to demonstrated outputs provide a meaningful signal for how well the model has inferred the task.
invented entities (1)
- bounded self-supervised confidence proxy: no independent evidence