Ultra-Low-Dimensional Prompt Tuning via Random Projection

Lili Mou; Yongchang Hao; Zijun Wu

arxiv: 2502.04501 · v3 · submitted 2025-02-06 · 💻 cs.CL

Pith reviewed 2026-05-23 03:27 UTC · model grok-4.3

keywords prompt tuning · parameter-efficient fine-tuning · random projection · low-dimensional optimization · large language models · natural language processing

The pith

Prompts learned in 2D space and lifted by a frozen random matrix match full prompt tuning performance with 98 percent fewer trainable parameters.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a method that learns prompt vectors in an extremely low-dimensional space rather than tying them to the full hidden size of the language model. A single fixed random matrix then projects these short vectors up to the required embedding dimension. This yields a 98 percent drop in the number of parameters that must be trained and stored. Experiments across more than twenty NLP tasks show accuracy stays comparable to standard prompt tuning and exceeds other recent efficient-tuning baselines that still use more parameters.

Core claim

ULPT optimizes prompt embeddings inside a low-dimensional space such as two dimensions and multiplies them by a frozen random matrix to reach the model's hidden dimension, thereby cutting trainable parameters by 98 percent while preserving downstream performance on more than twenty NLP tasks.

What carries the argument

Frozen random up-projection matrix that maps low-dimensional prompt vectors to the model's full hidden dimensionality.
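The up-projection described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the sizes `n`, `r`, `d`, the Gaussian initialization with a configurable `std`, and the choice to share one `scale` and one `shift` vector of length `d` across all prompt tokens are assumptions made here for illustration. The figure captions on this page mention learnable shift and scale embeddings, but their exact shapes are not given.

```python
import random

def make_projection(r, d, seed, std=1.0):
    """Frozen random matrix P of shape (r, d). Gaussian init is an assumption."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, std) for _ in range(d)] for _ in range(r)]

def up_project(Z, P, scale, shift):
    """Map n low-dimensional prompt vectors (n x r) to n embeddings (n x d).

    Computes scale * (z @ P) + shift for every row z of Z.
    Only Z, scale and shift would be trained; P stays fixed.
    """
    d = len(P[0])
    out = []
    for z in Z:
        row = [sum(z_k * P[k][j] for k, z_k in enumerate(z)) for j in range(d)]
        out.append([scale[j] * row[j] + shift[j] for j in range(d)])
    return out

# Illustrative sizes (assumed): 100 prompt tokens, rank 2, hidden size 768.
n, r, d = 100, 2, 768
P = make_projection(r, d, seed=0)

rng = random.Random(1)
Z = [[rng.gauss(0.0, 1.0) for _ in range(r)] for _ in range(n)]  # trainable
scale = [1.0] * d  # trainable, identity at start (assumed init)
shift = [0.0] * d  # trainable, zero at start (assumed init)

E = up_project(Z, P, scale, shift)
print(len(E), len(E[0]))  # 100 768
```

The resulting `E` has the same shape as a vanilla soft prompt, so it could be prepended to the input embeddings in the same way.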

If this is right

ULPT requires far fewer trainable parameters than other recent parameter-efficient tuning techniques.
Performance on more than twenty NLP tasks remains comparable to vanilla prompt tuning.
The approach enables storage of many more task-specific prompts for the same memory budget.
Prompt optimization becomes feasible in spaces as small as two dimensions.
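The parameter arithmetic behind these consequences can be checked with a short calculation. The configuration below (100 prompt tokens, hidden size 768, rank 2) and the accounting of one scale and one shift vector of length `d` are assumptions chosen for illustration. This page does not state the paper's exact setup, so the result should be read as an order-of-magnitude check rather than a reproduction of the reported figure.

```python
def vanilla_params(n, d):
    """Vanilla prompt tuning trains one d-dimensional vector per prompt token."""
    return n * d

def ulpt_params(n, r, d):
    """Assumed ULPT count: n low-dim vectors plus one scale and one shift vector."""
    return n * r + 2 * d

n, d, r = 100, 768, 2  # assumed illustrative configuration
v = vanilla_params(n, d)
u = ulpt_params(n, r, d)
reduction = 1 - u / v

print(v, u)                # 76800 1736
print(f"{reduction:.1%}")  # 97.7%
print(v // u)              # 44 low-dimensional prompts fit in one vanilla prompt's footprint
```

Under these assumptions the reduction is close to the 98 percent quoted above, and the last line makes the storage consequence concrete.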

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The storage reduction could let a single device hold separate prompts for thousands of downstream tasks.
Similar random-projection compression might be applied to other adapter-based tuning methods.
Task-specific prompts could be transmitted or updated with very small communication cost.
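One way to picture the small communication cost is to serialize only the trainable pieces together with the seed of the frozen matrix, and to regenerate the matrix on the receiving side. The sketch below is hypothetical: the wire format, the helper names `pack_task` and `make_projection`, the float32 encoding, and the idea of shipping a seed instead of the matrix are all assumptions made here, not details taken from the paper.

```python
import random
import struct

def make_projection(r, d, seed):
    """Regenerate the frozen random matrix deterministically from a seed."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(r)]

def pack_task(Z, scale, shift, seed):
    """Pack a task prompt as: int64 seed, then float32 values of Z, scale, shift."""
    flat = [x for row in Z for x in row] + list(scale) + list(shift)
    return struct.pack(f"<q{len(flat)}f", seed, *flat)

n, r, d, seed = 100, 2, 768, 1234  # assumed illustrative configuration
Z = [[0.0] * r for _ in range(n)]
scale = [1.0] * d
shift = [0.0] * d

payload = pack_task(Z, scale, shift, seed)
full_prompt_bytes = 4 * n * d  # a vanilla float32 prompt of the same length

print(len(payload), full_prompt_bytes)  # 6952 307200

# Sender and receiver obtain the same matrix from the same seed.
same = make_projection(r, d, seed) == make_projection(r, d, seed)
print(same)  # True
```

Reproducibility across machines depends on both sides using the same generator and version, which is an operational assumption this sketch does not address.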

Load-bearing premise

A fixed random matrix from the low-dimensional prompt space to the model's hidden dimension keeps enough task information for performance to stay close to full prompt tuning.

What would settle it

A controlled test on additional tasks in which ULPT accuracy falls markedly below standard prompt tuning while using the same low dimension would show the random projection loses critical information.

Figures

Figures reproduced from arXiv: 2502.04501 by Lili Mou, Yongchang Hao, Zijun Wu.

**Figure 1.** Overview of our approach. (a) ULPT up-projects ultra-low-dimensional embeddings with a random but fixed matrix. (b) ULPT can significantly reduce parameter storage for LLM customization.

**Figure 2.** Distribution of prompt embedding values over …

**Figure 3.** Results with controlled numbers of trainable …

**Figure 5.** Left: Training loss curves comparing ULPT with no alignment (dotted), with learnable shift only (dashed), and with both shift and scale (solid). Right: Evaluation accuracy curves for ULPT at r = 2. Adding shift significantly improves optimization and accuracy, while adding scale yields further gains. Trends are consistent across ranks.

**Figure 6.** Left: Shift embeddings learned with different ranks are highly similar, suggesting a general alignment role. Right: Scale embeddings vary significantly, indicating their dependence on frozen random projections.

**Figure 7.** Distribution of randomly selected dimensions …

Original abstract

Large language models achieve state-of-the-art performance but are increasingly costly to fine-tune. Prompt tuning is a parameter-efficient fine-tuning method that addresses parameter-efficiency by learning prompt embeddings, but these embeddings are typically tied to the model's hidden dimensionality, limiting parameter saving. In this paper, we propose Ultra-Low-dimensional Prompt Tuning (ULPT), a simple yet effective method that optimizes prompts in a low-dimensional space (e.g., 2D) and uses a frozen random matrix for up-projection. ULPT can achieve 98% reduction in the training parameters compared to vanilla prompt tuning while preserving performance. Our extensive experiments across over 20 NLP tasks demonstrate that ULPT consistently outperforms recent parameter-efficient tuning methods using significantly fewer parameters, making it well-suited as a storage-efficient framework for massive LLM customization.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes Ultra-Low-Dimensional Prompt Tuning (ULPT), which learns prompt vectors in a low-dimensional space (e.g., dimension 2) and maps them to the LLM hidden dimension via a fixed random projection matrix. It claims this yields a 98% reduction in trainable parameters relative to standard prompt tuning while matching or exceeding performance on more than 20 NLP tasks and outperforming other PEFT baselines.

Significance. If the empirical results are robust, the work would be significant for storage-efficient LLM adaptation, as the drastic parameter reduction enables maintaining large numbers of task-specific prompts. The simplicity of the fixed random up-projection is a methodological strength, and the scale of evaluation across 20+ tasks provides a broad empirical test.

major comments (3)

[§3] §3 (Method): The central modeling assumption—that a fixed random 2D subspace suffices for near-optimal prompts on arbitrary tasks—is stated without a supporting probabilistic argument, Johnson-Lindenstrauss-style bound in the up-projection direction, or comparison to a data-driven basis; this assumption is load-bearing for the 98% reduction claim.
[§4] §4 (Experiments): Results are reported for a single random projection matrix per task with no ablation over multiple random seeds or variance statistics; without this, it is impossible to determine whether reported gains are stable or depend on fortunate random draws.
[Table 2] Table 2 (main results): ULPT is compared only against other PEFT methods but not against a learned low-rank projection or PCA-based basis of the same dimension; this omission leaves open whether randomness itself is essential or merely convenient.
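The seed ablation requested in the §4 comment amounts to repeating the same run with several independently drawn projection matrices and reporting the mean and the spread. A minimal harness might look like the sketch below. The function `run_seed_ablation` and the stand-in `placeholder_eval` are hypothetical names introduced here, and the stand-in returns an arbitrary seeded number purely so the script runs. It is not a ULPT result and should be replaced by a real train-and-evaluate routine.

```python
import random
import statistics

def run_seed_ablation(evaluate, seeds):
    """Call evaluate(seed) once per seed; return (mean, sample std, raw scores)."""
    scores = [evaluate(seed) for seed in seeds]
    mean = statistics.mean(scores)
    std = statistics.stdev(scores) if len(scores) > 1 else 0.0
    return mean, std, scores

def placeholder_eval(seed):
    """Stand-in for 'draw projection with this seed, train, return accuracy'.

    Returns an arbitrary seeded number in [0, 1); NOT an experimental result.
    """
    return random.Random(seed).random()

seeds = [0, 1, 2, 3, 4]  # five independent projection seeds
mean, std, scores = run_seed_ablation(placeholder_eval, seeds)
print(f"mean={mean:.3f} std={std:.3f} over {len(scores)} seeds")
```

Using the sample standard deviation (`statistics.stdev`) rather than the population version is the conventional choice when only a handful of seeds are available.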

minor comments (2)

[Abstract] The abstract states 'consistently outperforms' but the text does not specify the exact statistical test or multiple-comparison correction used across 20+ tasks.
[§3, §4] Notation for the random matrix R (shape and initialization) is introduced in §3 but not restated when results are discussed in §4, making cross-reference cumbersome.

Simulated Author's Rebuttal

3 responses · 0 unresolved

Thank you for the constructive feedback on our manuscript. We address each major comment point by point below, indicating where revisions will be incorporated.

Referee: [§3] §3 (Method): The central modeling assumption—that a fixed random 2D subspace suffices for near-optimal prompts on arbitrary tasks—is stated without a supporting probabilistic argument, Johnson-Lindenstrauss-style bound in the up-projection direction, or comparison to a data-driven basis; this assumption is load-bearing for the 98% reduction claim.

Authors: We acknowledge the value of a formal bound. The standard Johnson-Lindenstrauss lemma guarantees that a random projection into a lower-dimensional space approximately preserves pairwise distances among a finite set of points, provided the target dimension grows logarithmically with the number of points. The lemma is stated for down-projection, so it motivates the up-projection step used in our method rather than proving it sufficient. We will add a short discussion paragraph in §3 explicitly connecting ULPT to the JL lemma and clarifying that the sufficiency claim is primarily empirical, supported by results across more than 20 tasks. A data-driven basis comparison is not included because it would require per-task storage of the basis vectors, undermining the storage-efficiency goal that enables the 98% reduction. revision: partial
Referee: [§4] §4 (Experiments): Results are reported for a single random projection matrix per task with no ablation over multiple random seeds or variance statistics; without this, it is impossible to determine whether reported gains are stable or depend on fortunate random draws.

Authors: We agree that reporting variance over random seeds strengthens the empirical claims. In the revised version we will rerun the main experiments on a representative subset of tasks using at least five independent random projection matrices per task and report mean performance together with standard deviation. revision: yes
Referee: [Table 2] Table 2 (main results): ULPT is compared only against other PEFT methods but not against a learned low-rank projection or PCA-based basis of the same dimension; this omission leaves open whether randomness itself is essential or merely convenient.

Authors: We maintain that the relevant baselines are existing PEFT methods, as these are the methods practitioners would otherwise use. The core advantage of the fixed random projection is that it incurs zero additional per-task storage for the up-projection matrix itself; any learned or PCA-derived basis would need to be stored (or recomputed) per task, eroding the storage benefit that allows maintaining thousands of task-specific prompts. Randomness is therefore not merely convenient but essential to the storage-efficiency claim. We therefore do not plan to add such comparisons. revision: no
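The norm-preservation intuition invoked in the first exchange can be checked numerically for the up-projection direction. The sketch below assumes a Gaussian matrix with entries drawn from N(0, 1/d), a scaling chosen here so that squared norms are preserved in expectation. The paper's actual initialization is not given on this page. The check illustrates concentration of norms only. It is not the Johnson-Lindenstrauss lemma itself and says nothing about downstream task accuracy.

```python
import random

def norm_ratio(z, P):
    """Return ||z @ P||^2 / ||z||^2 for a row vector z and matrix P (r x d)."""
    d = len(P[0])
    up = [sum(z_k * P[k][j] for k, z_k in enumerate(z)) for j in range(d)]
    return sum(x * x for x in up) / sum(x * x for x in z)

r, d = 2, 768  # assumed illustrative sizes
rng = random.Random(0)
std = (1.0 / d) ** 0.5  # assumed scaling: entries ~ N(0, 1/d)
P = [[rng.gauss(0.0, std) for _ in range(d)] for _ in range(r)]

# Ratios close to 1 mean the up-projection roughly preserves vector norms.
ratios = [
    norm_ratio([rng.gauss(0.0, 1.0) for _ in range(r)], P)
    for _ in range(20)
]
print(f"min={min(ratios):.3f} max={max(ratios):.3f}")
```

For a single vector the ratio has standard deviation of roughly sqrt(2/d), about 0.05 at d = 768, which is why the printed values stay near 1.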

Circularity Check

0 steps flagged

No circularity: empirical proposal with no self-referential derivation

The paper introduces ULPT as a direct empirical method: optimize a low-dimensional prompt vector and up-project via a fixed random matrix. No equations, theorems, or claims reduce the performance result to a fitted quantity defined by the method itself, nor rely on self-citation chains for load-bearing uniqueness or ansatzes. The central claim rests on experimental results across tasks rather than any closed-loop derivation. This matches the default expectation of a non-circular empirical contribution.

Axiom & Free-Parameter Ledger

1 free parameter · 0 axioms · 0 invented entities

The central claim rests on the untested premise that random projection from 2D preserves task performance; no free parameters are explicitly fitted beyond the low dimension choice itself.

free parameters (1)

prompt dimension d
Chosen as a small integer (example 2) to achieve the reported parameter reduction; its value directly controls the claimed savings.

Reference graph

Works this paper leans on

70 extracted references · 70 canonical work pages · 4 internal anchors

[1]

Intrinsic dimensionality explains the effectiveness of language model fine-tuning

Armen Aghajanyan, Sonal Gupta, and Luke Zettlemoyer. Intrinsic dimensionality explains the effectiveness of language model fine-tuning. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, pages 7319--7328, 2021. URL https://aclanthology.org/2...

work page 2021
[2]

ATTEMPT : Parameter-efficient multi-task tuning via attentional mixtures of soft prompts

Akari Asai, Mohammadreza Salehi, Matthew Peters, and Hannaneh Hajishirzi. ATTEMPT : Parameter-efficient multi-task tuning via attentional mixtures of soft prompts. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 6655--6672, 2022. URL https://aclanthology.org/2022.emnlp-main.446

work page 2022
[3]

B it F it: Simple parameter-efficient fine-tuning for transformer-based masked language-models

Elad Ben Zaken, Yoav Goldberg, and Shauli Ravfogel. B it F it: Simple parameter-efficient fine-tuning for transformer-based masked language-models. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, pages 1--9, 2022. URL https://aclanthology.org/2022.acl-short.1

work page 2022
[4]

Random projection in dimensionality reduction: applications to image and text data

Ella Bingham and Heikki Mannila. Random projection in dimensionality reduction: applications to image and text data. In Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, page 245–250, 2001. URL https://doi.org/10.1145/502512.502546

work page doi:10.1145/502512.502546 2001
[5]

Language models are few-shot learners

Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel Ziegler, Jeffrey Wu, Clemens Winter, Chris Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gr...

work page 1901
[6]

S em E val-2017 task 1: Semantic textual similarity multilingual and crosslingual focused evaluation

Daniel Cer, Mona Diab, Eneko Agirre, I \ n igo Lopez-Gazpio, and Lucia Specia. S em E val-2017 task 1: Semantic textual similarity multilingual and crosslingual focused evaluation. In Proceedings of the 11th International Workshop on Semantic Evaluation, pages 1--14, 2017. URL https://aclanthology.org/S17-2001/

work page 2017
[7]

SM o P : Towards efficient and effective prompt tuning with sparse mixture-of-prompts

Joon-Young Choi, Junho Kim, Jun-Hyung Park, Wing-Lam Mok, and SangKeun Lee. SM o P : Towards efficient and effective prompt tuning with sparse mixture-of-prompts. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 14306--14316, 2023. URL https://aclanthology.org/2023.emnlp-main.884

work page 2023
[8]

B ool Q : Exploring the surprising difficulty of natural yes/no questions

Christopher Clark, Kenton Lee, Ming-Wei Chang, Tom Kwiatkowski, Michael Collins, and Kristina Toutanova. B ool Q : Exploring the surprising difficulty of natural yes/no questions. In Proceedings of the 2019 Conference of the North A merican Chapter of the Association for Computational Linguistics: Human Language Technologies , pages 2924--2936, 2019. URL ...

work page 2019
[9]

An elementary proof of a theorem of J ohnson and L indenstrauss

Sanjoy Dasgupta and Anupam Gupta. An elementary proof of a theorem of J ohnson and L indenstrauss. Random Structures & Algorithms, 22 0 (1): 0 60--65, 2003. URL https://doi.org/10.1002/rsa.10073

work page doi:10.1002/rsa.10073 2003
[10]

The commitmentbank: Investigating projection in naturally occurring discourse

Marie-Catherine De Marneffe, Mandy Simons, and Judith Tonhauser. The commitmentbank: Investigating projection in naturally occurring discourse. In Proceedings of Sinn und Bedeutung, volume 23, pages 107--124, 2019. URL https://semanticsarchive.net/Archive/Tg3ZGI2M/Marneffe.pdf

work page 2019
[11]

Transforming Question Answering Datasets Into Natural Language Inference Datasets

Dorottya Demszky, Kelvin Guu, and Percy Liang. Transforming question answering datasets into natural language inference datasets, 2018. URL https://arxiv.org/abs/1809.02922

work page internal anchor Pith review Pith/arXiv arXiv 2018
[12]

Dolan and Chris Brockett

William B. Dolan and Chris Brockett. Automatically constructing a corpus of sentential paraphrases. In Proceedings of the Third International Workshop on Paraphrasing, 2005. URL https://aclanthology.org/I05-5002/

work page 2005
[13]

A survey on in-context learning

Qingxiu Dong, Lei Li, Damai Dai, Ce Zheng, Jingyuan Ma, Rui Li, Heming Xia, Jingjing Xu, Zhiyong Wu, Baobao Chang, Xu Sun, Lei Li, and Zhifang Sui. A survey on in-context learning. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 1107--1128, 2024. URL https://aclanthology.org/2024.emnlp-main.64

work page 2024
[14]

SearchQA: A New Q&A Dataset Augmented with Context from a Search Engine

Matthew Dunn, Levent Sagun, Mike Higgins, V Ugur Guney, Volkan Cirik, and Kyunghyun Cho. Search QA : A new Q & A dataset augmented with context from a search engine. arXiv preprint arXiv:1704.05179, 2017. URL https://arxiv.org/abs/1704.05179

work page internal anchor Pith review Pith/arXiv arXiv 2017
[15]

MRQA 2019 shared task: Evaluating generalization in reading comprehension

Adam Fisch, Alon Talmor, Robin Jia, Minjoon Seo, Eunsol Choi, and Danqi Chen. MRQA 2019 shared task: Evaluating generalization in reading comprehension. In Proceedings of the 2nd Workshop on Machine Reading for Question Answering, pages 1--13, 2019. URL https://aclanthology.org/D19-5801

work page 2019
[16]

The third PASCAL recognizing textual entailment challenge

Danilo Giampiccolo, Bernardo Magnini, Ido Dagan, and Bill Dolan. The third PASCAL recognizing textual entailment challenge. In Proceedings of the ACL - PASCAL Workshop on Textual Entailment and Paraphrasing , pages 1--9, 2007. URL https://aclanthology.org/W07-1401

work page 2007
[17]

Lo PT : Low-rank prompt tuning for parameter efficient language models, 2024

Shouchang Guo, Sonam Damani, and Keng hao Chang. Lo PT : Low-rank prompt tuning for parameter efficient language models, 2024. URL https://arxiv.org/abs/2406.19486

work page arXiv 2024
[18]

Flora: Low-rank adapters are secretly gradient compressors

Yongchang Hao, Yanshuai Cao, and Lili Mou. Flora: Low-rank adapters are secretly gradient compressors. In Proceedings of the 41st International Conference on Machine Learning, 2024. URL https://proceedings.mlr.press/v235/hao24a.html

work page 2024
[19]

Lo RA +: Efficient low rank adaptation of large models

Soufiane Hayou, Nikhil Ghosh, and Bin Yu. Lo RA +: Efficient low rank adaptation of large models. In Proceedings of the 41st International Conference on Machine Learning, 2024. URL https://proceedings.mlr.press/v235/hayou24a.html

work page 2024
[20]

Parameter-efficient transfer learning for NLP

Neil Houlsby, Andrei Giurgiu, Stanislaw Jastrzebski, Bruna Morrone, Quentin De Laroussilhe, Andrea Gesmundo, Mona Attariyan, and Sylvain Gelly. Parameter-efficient transfer learning for NLP . In Proceedings of the 36th International Conference on Machine Learning, pages 2790--2799, 2019. URL https://proceedings.mlr.press/v97/houlsby19a.html

work page 2019
[21]

Lo RA : Low-rank adaptation of large language models

Edward J Hu, yelong shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. Lo RA : Low-rank adaptation of large language models. In International Conference on Learning Representations, 2022. URL https://openreview.net/forum?id=nZeVKeeFYf9

work page 2022
[22]

Approximate nearest neighbors: towards removing the curse of dimensionality

Piotr Indyk and Rajeev Motwani. Approximate nearest neighbors: towards removing the curse of dimensionality. In Proceedings of the thirtieth annual ACM symposium on Theory of computing, pages 604--613, 1998. URL https://dl.acm.org/doi/10.1145/276698.276876

work page doi:10.1145/276698.276876 1998
[23]

Hyperdecoders: Instance-specific decoders for multi-task NLP

Hamish Ivison and Matthew Peters. Hyperdecoders: Instance-specific decoders for multi-task NLP . In Findings of the Association for Computational Linguistics: EMNLP, pages 1715--1730, 2022. URL https://aclanthology.org/2022.findings-emnlp.124

work page 2022
[24]

Parameter-efficient multi-task fine-tuning for transformers via shared hypernetworks

Rabeeh Karimi Mahabadi, Sebastian Ruder, Mostafa Dehghani, and James Henderson. Parameter-efficient multi-task fine-tuning for transformers via shared hypernetworks. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, pages 565--576, 2021. UR...

work page 2021
[25]

Looking beyond the surface: A challenge set for reading comprehension over multiple sentences

Daniel Khashabi, Snigdha Chaturvedi, Michael Roth, Shyam Upadhyay, and Dan Roth. Looking beyond the surface: A challenge set for reading comprehension over multiple sentences. In Proceedings of the 2018 Conference of the North A merican Chapter of the Association for Computational Linguistics: Human Language Technologies , pages 252--262, 2018. URL https:...

work page 2018
[26]

Scitail: A textual entailment dataset from science question answering

Tushar Khot, Ashish Sabharwal, and Peter Clark. Scitail: A textual entailment dataset from science question answering. Proceedings of the AAAI Conference on Artificial Intelligence, 2018. URL https://ojs.aaai.org/index.php/AAAI/article/view/12022

work page 2018
[27]

Large language models are zero-shot reasoners

Takeshi Kojima, Shixiang (Shane) Gu, Machel Reid, Yutaka Matsuo, and Yusuke Iwasawa. Large language models are zero-shot reasoners. In Advances in Neural Information Processing Systems, pages 22199--22213, 2022. URL https://openreview.net/pdf?id=e2TBb5y0yFf

work page 2022
[28]

Ve RA : Vector-based random matrix adaptation

Dawid Jan Kopiczko, Tijmen Blankevoort, and Yuki M Asano. Ve RA : Vector-based random matrix adaptation. In The Twelfth International Conference on Learning Representations, 2024. URL https://openreview.net/forum?id=NjNfLdxr3A

work page 2024
[29]

Dai, Jakob Uszkoreit, Quoc Le, and Slav Petrov

Tom Kwiatkowski, Jennimaria Palomaki, Olivia Redfield, Michael Collins, Ankur Parikh, Chris Alberti, Danielle Epstein, Illia Polosukhin, Jacob Devlin, Kenton Lee, Kristina Toutanova, Llion Jones, Matthew Kelcey, Ming-Wei Chang, Andrew M. Dai, Jakob Uszkoreit, Quoc Le, and Slav Petrov. Natural questions: A benchmark for question answering research. Transac...

work page 2019
[30]

The power of scale for parameter-efficient prompt tuning

Brian Lester, Rami Al-Rfou, and Noah Constant. The power of scale for parameter-efficient prompt tuning. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 3045--3059, 2021. URL https://aclanthology.org/2021.emnlp-main.243

work page 2021
[31]

The winograd schema challenge

Hector Levesque, Ernest Davis, and Leora Morgenstern. The winograd schema challenge. In Preceddings of the 13th International Conference on the Principles of Knowledge Representation and Reasoning, 2012. URL https://cdn.aaai.org/ocs/4492/4492-21843-1-PB.pdf

work page 2012
[32]

Prefix- T uning: Optimizing continuous prompts for generation

Xiang Lisa Li and Percy Liang. Prefix- T uning: Optimizing continuous prompts for generation. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, pages 4582--4597, 2021. URL https://aclanthology.org/2021.acl-long.353

work page 2021
[33]

Relo RA : High-rank training through low-rank updates

Vladislav Lialin, Sherin Muckatira, Namrata Shivagunde, and Anna Rumshisky. Relo RA : High-rank training through low-rank updates. In The Twelfth International Conference on Learning Representations, 2024. URL https://openreview.net/forum?id=DLJznSp6X3

work page 2024
[34]

P- T uning: Prompt tuning can be comparable to fine-tuning across scales and tasks

Xiao Liu, Kaixuan Ji, Yicheng Fu, Weng Tam, Zhengxiao Du, Zhilin Yang, and Jie Tang. P- T uning: Prompt tuning can be comparable to fine-tuning across scales and tasks. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, pages 61--68, 2022. URL https://aclanthology.org/2022.acl-short.8

work page 2022
[35]

GPT understands, too

Xiao Liu, Yanan Zheng, Zhengxiao Du, Ming Ding, Yujie Qian, Zhilin Yang, and Jie Tang. GPT understands, too. AI Open, 5: 0 208--215, 2024. URL https://www.sciencedirect.com/science/article/pii/S2666651023000141

work page 2024
[36]

PEFT : State-of-the-art parameter-efficient fine-tuning methods

Sourab Mangrulkar, Sylvain Gugger, Lysandre Debut, Younes Belkada, Sayak Paul, and Benjamin Bossan. PEFT : State-of-the-art parameter-efficient fine-tuning methods. https://github.com/huggingface/peft, 2022

work page 2022
[37]

On variants of the J ohnson-- L indenstrauss lemma

Ji r \' Matou s ek. On variants of the J ohnson-- L indenstrauss lemma. Random Structures & Algorithms, 33 0 (2): 0 142--156, 2008. URL https://doi.org/10.1002/rsa.20218

work page doi:10.1002/rsa.20218 2008
[38]

Crosslingual generalization through multitask finetuning

Niklas Muennighoff, Thomas Wang, Lintang Sutawika, Adam Roberts, Stella Biderman, Teven Le Scao, M Saiful Bari, Sheng Shen, Zheng Xin Yong, Hailey Schoelkopf, Xiangru Tang, Dragomir Radev, Alham Fikri Aji, Khalid Almubarak, Samuel Albanie, Zaid Alyafeai, Albert Webson, Edward Raff, and Colin Raffel. Crosslingual generalization through multitask finetuning...

work page 2023
[39]

Prompting a pretrained transformer can be a universal approximator

Aleksandar Petrov, Philip Torr, and Adel Bibi. Prompting a pretrained transformer can be a universal approximator. In Proceedings of the 41st International Conference on Machine Learning, 2024 a . URL https://proceedings.mlr.press/v235/petrov24a.html

work page 2024
[40]

When do prompting and prefix-tuning work? a theory of capabilities and limitations

Aleksandar Petrov, Philip Torr, and Adel Bibi. When do prompting and prefix-tuning work? a theory of capabilities and limitations. In The Twelfth International Conference on Learning Representations, 2024 b . URL https://openreview.net/forum?id=JewzobRhay

work page 2024
[41]

W i C : The word-in-context dataset for evaluating context-sensitive meaning representations

Mohammad Taher Pilehvar and Jose Camacho-Collados. W i C : The word-in-context dataset for evaluating context-sensitive meaning representations. In Proceedings of the 2019 Conference of the North A merican Chapter of the Association for Computational Linguistics: Human Language Technologies , 2019. URL https://aclanthology.org/N19-1128

work page 2019
[42]

Exploring universal intrinsic task subspace via prompt tuning, 2022

Yujia Qin, Xiaozhi Wang, Yusheng Su, Yankai Lin, Ning Ding, Jing Yi, Weize Chen, Zhiyuan Liu, Juanzi Li, Lei Hou, Peng Li, Maosong Sun, and Jie Zhou. Exploring universal intrinsic task subspace via prompt tuning, 2022. URL https://arxiv.org/abs/2110.07867

work page arXiv 2022
[43]

Exploring the limits of transfer learning with a unified text-to-text transformer

Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J Liu. Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research, pages 1--67, 2020. URL https://jmlr.org/papers/v21/20-074.html

work page 2020
[44]

Residual P rompt T uning: improving prompt tuning with residual reparameterization

Anastasiia Razdaibiedina, Yuning Mao, Madian Khabsa, Mike Lewis, Rui Hou, Jimmy Ba, and Amjad Almahairi. Residual P rompt T uning: improving prompt tuning with residual reparameterization. In Findings of the Association for Computational Linguistics: ACL, pages 6740--6757, 2023. URL https://aclanthology.org/2023.findings-acl.421

work page 2023
[45]

AdapterDrop : O n the efficiency of adapters in transformers

Andreas R \"u ckl \'e , Gregor Geigle, Max Glockner, Tilman Beck, Jonas Pfeiffer, Nils Reimers, and Iryna Gurevych. AdapterDrop : O n the efficiency of adapters in transformers. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021. URL https://aclanthology.org/2021.emnlp-main.626

work page 2021
[46]

Wino G rande: An adversarial winograd schema challenge at scale

Keisuke Sakaguchi, Ronan Le Bras, Chandra Bhagavatula, and Yejin Choi. Wino G rande: An adversarial winograd schema challenge at scale. Communications of the ACM, page 99–106, 2021. URL https://doi.org/10.1145/3474381

work page doi:10.1145/3474381 2021
[47]

De PT : Decomposed prompt tuning for parameter-efficient fine-tuning

Zhengxiang Shi and Aldo Lipani. De PT : Decomposed prompt tuning for parameter-efficient fine-tuning. In The Twelfth International Conference on Learning Representations, 2024. URL https://openreview.net/forum?id=KjegfPGRde

work page 2024
[48]

Logan IV, Eric Wallace, and Sameer Singh

Taylor Shin, Yasaman Razeghi, Robert L. Logan IV, Eric Wallace, and Sameer Singh. A uto P rompt: E liciting K nowledge from L anguage M odels with A utomatically G enerated P rompts. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, pages 4222--4235, 2020. URL https://aclanthology.org/2020.emnlp-main.346

work page 2020
[49]

Manning, Andrew Ng, and Christopher Potts

Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher D. Manning, Andrew Ng, and Christopher Potts. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 1631--1642, 2013. URL https://aclanthology.org/D13-1170/

work page 2013
[50]

LST : Ladder side-tuning for parameter and memory efficient transfer learning

Yi-Lin Sung, Jaemin Cho, and Mohit Bansal. LST : Ladder side-tuning for parameter and memory efficient transfer learning. In Advances in Neural Information Processing Systems, 2022. URL https://openreview.net/forum?id=isPnnaTZaP5

work page 2022
[51]

LLaMA: Open and Efficient Foundation Language Models

Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timoth \'e e Lacroix, Baptiste Rozi \`e re, Naman Goyal, Eric Hambro, Faisal Azhar, et al. Llama: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971, 2023. URL https://arxiv.org/abs/2302.13971

work page internal anchor Pith review Pith/arXiv arXiv 2023
[52]

N ews QA : A machine comprehension dataset

Adam Trischler, Tong Wang, Xingdi Yuan, Justin Harris, Alessandro Sordoni, Philip Bachman, and Kaheer Suleman. N ews QA : A machine comprehension dataset. In Proceedings of the 2nd Workshop on Representation Learning for NLP , pages 191--200, 2017. URL https://aclanthology.org/W17-2623

work page 2017
[53]

SP o T : Better frozen model adaptation through soft prompt transfer

Tu Vu, Brian Lester, Noah Constant, Rami Al-Rfou ' , and Daniel Cer. SP o T : Better frozen model adaptation through soft prompt transfer. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, pages 5039--5059, 2022. URL https://aclanthology.org/2022.acl-long.346

[54]

Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel Bowman. GLUE: A multi-task benchmark and analysis platform for natural language understanding. In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pages 353–355, 2018. URL https://aclanthology.org/W18-5446

[55]

Alex Wang, Yada Pruksachatkun, Nikita Nangia, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R. Bowman. SuperGLUE: A stickier benchmark for general-purpose language understanding systems. arXiv preprint arXiv:1905.00537, 2019. URL http://arxiv.org/abs/1905.00537

[56]

Zhen Wang, Rameswar Panda, Leonid Karlinsky, Rogerio Feris, Huan Sun, and Yoon Kim. Multitask prompt tuning enables parameter-efficient transfer learning. In The Eleventh International Conference on Learning Representations, 2023. URL https://openreview.net/forum?id=Nk2pDtuhTq

[57]

Zifeng Wang, Zizhao Zhang, Chen-Yu Lee, Han Zhang, Ruoxi Sun, Xiaoqi Ren, Guolong Su, Vincent Perot, Jennifer Dy, and Tomas Pfister. Learning to prompt for continual learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 139–149, June 2022. URL https://openaccess.thecvf.com/content/CVPR2022/html/Wang_Learnin...

[58]

Alex Warstadt, Amanpreet Singh, and Samuel R. Bowman. Neural network acceptability judgments. Transactions of the Association for Computational Linguistics, 2019. URL https://aclanthology.org/Q19-1040

[59]

Jason Wei, Maarten Bosma, Vincent Zhao, Kelvin Guu, Adams Wei Yu, Brian Lester, Nan Du, Andrew M. Dai, and Quoc V Le. Finetuned language models are zero-shot learners. In International Conference on Learning Representations, 2022a. URL https://openreview.net/forum?id=gEZrGCozdqR

[60]

Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc V Le, and Denny Zhou. Chain-of-thought prompting elicits reasoning in large language models. In Advances in Neural Information Processing Systems, pages 24824–24837, 2022b. URL https://openreview.net/pdf?id=_VjQlMeSB_J

[61]

Adina Williams, Nikita Nangia, and Samuel Bowman. A broad-coverage challenge corpus for sentence understanding through inference. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1112–1122, 2018. URL https://aclanthology.org/N18-1101/

[62]

Xun Wu, Shaohan Huang, and Furu Wei. Mixture of LoRA experts. In The Twelfth International Conference on Learning Representations, 2024a. URL https://openreview.net/forum?id=uWvKBCYh4S

[63]

Zhengxuan Wu, Aryaman Arora, Zheng Wang, Atticus Geiger, Dan Jurafsky, Christopher D Manning, and Christopher Potts. ReFT: Representation finetuning for language models. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024b. URL https://openreview.net/forum?id=fykjplMc0V

[64]

Zijun Wu, Yongkang Wu, and Lili Mou. Zero-shot continuous prompt transfer: Generalizing task semantics across language models. In The Twelfth International Conference on Learning Representations, 2024c. URL https://openreview.net/forum?id=26XphugOcS

[65]

Yao Xiao, Lu Xu, Jiaxi Li, Wei Lu, and Xiaoli Li. Decomposed prompt tuning via low-rank reparameterization. In Findings of the Association for Computational Linguistics: EMNLP, pages 13335–13347, 2023. URL https://aclanthology.org/2023.findings-emnlp.890

[66]

Zhilin Yang, Peng Qi, Saizheng Zhang, Yoshua Bengio, William Cohen, Ruslan Salakhutdinov, and Christopher D. Manning. HotpotQA: A dataset for diverse, explainable multi-hop question answering. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2369–2380, 2018. URL https://aclanthology.org/D18-1259/

[67]

Fangcong Yin, Xi Ye, and Greg Durrett. LoFiT: Localized fine-tuning on LLM representations. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024. URL https://openreview.net/forum?id=dfiXFbECSZ

[68]

Xiang Zhang, Junbo Zhao, and Yann LeCun. Character-level convolutional networks for text classification. In Proceedings of the 28th International Conference on Neural Information Processing Systems, pages 649–657, 2015. URL https://proceedings.neurips.cc/paper/2015/file/250cf8b51c773f3f8dc8b4be867a9a02-Paper.pdf

[69]

Yuan Zhang, Jason Baldridge, and Luheng He. PAWS: Paraphrase adversaries from word scrambling. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1298–1308, 2019. URL https://aclanthology.org/N19-1131

[70]

Bingchen Zhao, Haoqin Tu, Chen Wei, Jieru Mei, and Cihang Xie. Tuning layernorm in attention: Towards efficient multi-modal LLM finetuning. In The Twelfth International Conference on Learning Representations, 2024. URL https://openreview.net/forum?id=YR3ETaElNK

