Ultra-Low-Dimensional Prompt Tuning via Random Projection
Pith reviewed 2026-05-23 03:27 UTC · model grok-4.3
The pith
Prompts learned in 2D space and lifted by a frozen random matrix match full prompt tuning performance with 98 percent fewer trainable parameters.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
ULPT optimizes prompt embeddings inside a low-dimensional space such as two dimensions and multiplies them by a frozen random matrix to reach the model's hidden dimension, thereby cutting trainable parameters by 98 percent while preserving downstream performance on more than twenty NLP tasks.
What carries the argument
A frozen random up-projection matrix maps the low-dimensional prompt vectors to the model's full hidden dimensionality.
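The mechanism described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function names `make_frozen_projection` and `up_project`, the Gaussian initialization, the seed, and the toy sizes (4 prompt tokens, 2 trainable dimensions, hidden size 8) are all assumptions made for the example.

```python
import random

def make_frozen_projection(low_dim, hidden_dim, seed):
    """Build a fixed low_dim x hidden_dim matrix of Gaussian entries from a seed.
    The Gaussian choice is an assumption; the review only says 'random'."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(hidden_dim)]
            for _ in range(low_dim)]

def up_project(low_prompts, projection):
    """Map each low-dimensional prompt vector to the hidden dimension (Z times R)."""
    hidden_dim = len(projection[0])
    return [
        [sum(z_k * row[j] for z_k, row in zip(z, projection))
         for j in range(hidden_dim)]
        for z in low_prompts
    ]

# Hypothetical toy sizes, chosen only so the example is easy to read.
R = make_frozen_projection(low_dim=2, hidden_dim=8, seed=0)  # frozen, never trained
Z = [[0.1, -0.2], [0.3, 0.0], [0.0, 0.5], [-0.4, 0.2]]       # the only trainable values
P = up_project(Z, R)                                         # prompt embeddings fed to the model
print(len(P), len(P[0]))  # 4 8
```

In a real training loop only `Z` would receive gradient updates; `R` stays fixed throughout.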
If this is right
- ULPT requires far fewer trainable parameters than other recent parameter-efficient tuning techniques.
- Performance on more than twenty NLP tasks remains comparable to vanilla prompt tuning.
- The approach enables storage of many more task-specific prompts for the same memory budget.
- Prompt optimization becomes feasible in spaces as small as two dimensions.
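The storage argument in the bullets above comes down to simple counting, sketched here. The sizes (100 prompt tokens, hidden size 768, a budget of one million stored values) and the helper names are hypothetical. The review does not break down how the 98 percent figure is computed, so this sketch counts only the low-dimensional prompt values and its ratio should not be read as a reproduction of that figure.

```python
def prompt_params(num_tokens, dim):
    """Trainable values in a prompt of num_tokens vectors, each of size dim."""
    return num_tokens * dim

def prompts_per_budget(budget_values, num_tokens, dim):
    """How many task prompts fit when budget_values numbers can be stored."""
    return budget_values // prompt_params(num_tokens, dim)

# Hypothetical sizes, not taken from the paper.
num_tokens, hidden_dim, low_dim = 100, 768, 2
full = prompt_params(num_tokens, hidden_dim)  # vanilla prompt tuning
low = prompt_params(num_tokens, low_dim)      # low-dimensional prompt only
budget = 1_000_000

print(full, low)                                           # 76800 200
print(prompts_per_budget(budget, num_tokens, hidden_dim))  # 13
print(prompts_per_budget(budget, num_tokens, low_dim))     # 5000
```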
Where Pith is reading between the lines
- The storage reduction could let a single device hold separate prompts for thousands of downstream tasks.
- Similar random-projection compression might be applied to other adapter-based tuning methods.
- Task-specific prompts could be transmitted or updated with very small communication cost.
Load-bearing premise
A fixed random matrix from the low-dimensional prompt space to the model's hidden dimension keeps enough task information for performance to stay close to full prompt tuning.
What would settle it
A controlled test on additional tasks in which ULPT accuracy falls markedly below standard prompt tuning while using the same low dimension would show the random projection loses critical information.
Original abstract
Large language models achieve state-of-the-art performance but are increasingly costly to fine-tune. Prompt tuning is a parameter-efficient fine-tuning method that addresses parameter-efficiency by learning prompt embeddings, but these embeddings are typically tied to the model's hidden dimensionality, limiting parameter saving. In this paper, we propose Ultra-Low-dimensional Prompt Tuning (ULPT), a simple yet effective method that optimizes prompts in a low-dimensional space (e.g., 2D) and uses a frozen random matrix for up-projection. ULPT can achieve 98% reduction in the training parameters compared to vanilla prompt tuning while preserving performance. Our extensive experiments across over 20 NLP tasks demonstrate that ULPT consistently outperforms recent parameter-efficient tuning methods using significantly fewer parameters, making it well-suited as a storage-efficient framework for massive LLM customization.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes Ultra-Low-Dimensional Prompt Tuning (ULPT), which learns prompt vectors in a low-dimensional space (e.g., dimension 2) and maps them to the LLM hidden dimension via a fixed random projection matrix. It claims this yields a 98% reduction in trainable parameters relative to standard prompt tuning while matching or exceeding performance on more than 20 NLP tasks and outperforming other PEFT baselines.
Significance. If the empirical results are robust, the work would be significant for storage-efficient LLM adaptation, as the drastic parameter reduction enables maintaining large numbers of task-specific prompts. The simplicity of the fixed random up-projection is a methodological strength, and the scale of evaluation across 20+ tasks provides a broad empirical test.
major comments (3)
- [§3] §3 (Method): The central modeling assumption—that a fixed random 2D subspace suffices for near-optimal prompts on arbitrary tasks—is stated without a supporting probabilistic argument, Johnson-Lindenstrauss-style bound in the up-projection direction, or comparison to a data-driven basis; this assumption is load-bearing for the 98% reduction claim.
- [§4] §4 (Experiments): Results are reported for a single random projection matrix per task with no ablation over multiple random seeds or variance statistics; without this, it is impossible to determine whether reported gains are stable or depend on fortunate random draws.
- [Table 2] Table 2 (main results): ULPT is compared only against other PEFT methods but not against a learned low-rank projection or PCA-based basis of the same dimension; this omission leaves open whether randomness itself is essential or merely convenient.
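The seed ablation requested in the §4 comment could be organized roughly as follows. This is a sketch under stated assumptions: `summarize_over_seeds` and `dummy_run` are hypothetical names, and `dummy_run` returns a made-up number purely so the example executes. It stands in for "train with a projection matrix drawn from this seed, then evaluate" and is not a result from the paper.

```python
import statistics

def summarize_over_seeds(run_once, seeds):
    """Run one experiment per seed and return (mean, sample standard deviation)."""
    scores = [run_once(seed) for seed in seeds]
    return statistics.mean(scores), statistics.stdev(scores)

def dummy_run(seed):
    """Placeholder for a real train-and-evaluate call.
    The returned value is invented so the sketch runs; it is not a real score."""
    return 0.80 + 0.01 * (seed % 3)

mean, std = summarize_over_seeds(dummy_run, seeds=[0, 1, 2, 3, 4])
print(f"{mean:.3f} +/- {std:.3f}")
```

Reporting the pair per task makes it visible whether a gain over a baseline exceeds the spread caused by the random draw alone.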
minor comments (2)
- [Abstract] The abstract states 'consistently outperforms' but the text does not specify the exact statistical test or multiple-comparison correction used across 20+ tasks.
- [§3, §4] Notation for the random matrix R (shape and initialization) is introduced in §3 but not restated when results are discussed in §4, making cross-reference cumbersome.
Simulated Author's Rebuttal
Thank you for the constructive feedback on our manuscript. We address each major comment point by point below, indicating where revisions will be incorporated.
- Referee: [§3] §3 (Method): The central modeling assumption, that a fixed random 2D subspace suffices for near-optimal prompts on arbitrary tasks, is stated without a supporting probabilistic argument, Johnson-Lindenstrauss-style bound in the up-projection direction, or comparison to a data-driven basis; this assumption is load-bearing for the 98% reduction claim.
  Authors: We acknowledge the value of a formal bound. The standard Johnson-Lindenstrauss lemma already guarantees that random projections approximately preserve distances and norms when mapping a finite set of points from a high-dimensional space to a lower-dimensional one whose dimension is logarithmic in the number of points. Our method relies on this known property for the up-projection step. We will add a short discussion paragraph in §3 explicitly connecting ULPT to the JL lemma and clarifying that the sufficiency claim is primarily empirical, supported by results across more than 20 tasks. A data-driven basis comparison is not included because it would require per-task storage of the basis vectors, undermining the storage-efficiency goal that enables the 98% reduction.
  Revision: partial
- Referee: [§4] §4 (Experiments): Results are reported for a single random projection matrix per task with no ablation over multiple random seeds or variance statistics; without this, it is impossible to determine whether reported gains are stable or depend on fortunate random draws.
  Authors: We agree that reporting variance over random seeds strengthens the empirical claims. In the revised version we will rerun the main experiments on a representative subset of tasks using at least five independent random projection matrices per task and report mean performance together with standard deviation.
  Revision: yes
- Referee: [Table 2] Table 2 (main results): ULPT is compared only against other PEFT methods but not against a learned low-rank projection or PCA-based basis of the same dimension; this omission leaves open whether randomness itself is essential or merely convenient.
  Authors: We maintain that the relevant baselines are existing PEFT methods, as these are the methods practitioners would otherwise use. The core advantage of the fixed random projection is that it incurs zero additional per-task storage for the up-projection matrix itself; any learned or PCA-derived basis would need to be stored (or recomputed) per task, eroding the storage benefit that allows maintaining thousands of task-specific prompts. Randomness is therefore not merely convenient but essential to the storage-efficiency claim. We therefore do not plan to add such comparisons.
  Revision: no
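The storage argument in the last response can be illustrated with a small sketch: if the projection matrix is regenerated from a seed, only the seed and the low-dimensional prompts need to be kept. Whether the paper stores a seed, shares one matrix across tasks, or uses this initialization is not stated in the review, so `projection_from_seed`, `SHARED_SEED`, the task names, and the sizes below are all assumptions.

```python
import random

def projection_from_seed(seed, low_dim, hidden_dim):
    """Rebuild the same Gaussian matrix on demand instead of storing it."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(hidden_dim)]
            for _ in range(low_dim)]

SHARED_SEED = 1234  # hypothetical: one integer stored once, not per task

# Hypothetical per-task state: one 2-dimensional prompt vector per task.
task_prompts = {"task_a": [[0.1, 0.2]], "task_b": [[-0.3, 0.4]]}

R_first = projection_from_seed(SHARED_SEED, low_dim=2, hidden_dim=8)
R_again = projection_from_seed(SHARED_SEED, low_dim=2, hidden_dim=8)
print(R_first == R_again)  # True

# Values that must actually be stored across both tasks.
stored_values = sum(len(z) for prompts in task_prompts.values() for z in prompts)
print(stored_values)  # 4
```

A learned or PCA-derived basis has no such shortcut, since its entries depend on data and cannot be recovered from a seed. Regeneration is only reproducible when the same generator implementation is used on every machine.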
Circularity Check
No circularity: empirical proposal with no self-referential derivation
The paper introduces ULPT as a direct empirical method: optimize a low-dimensional prompt vector and up-project via a fixed random matrix. No equations, theorems, or claims reduce the performance result to a fitted quantity defined by the method itself, nor rely on self-citation chains for load-bearing uniqueness or ansatzes. The central claim rests on experimental results across tasks rather than any closed-loop derivation. This matches the default expectation of a non-circular empirical contribution.
Axiom & Free-Parameter Ledger
free parameters (1)
- prompt dimension d