arxiv: 2605.14055 · v1 · submitted 2026-05-13 · 💻 cs.CL · cs.AI

Recognition: 2 theorem links

· Lean Theorem

PEML: Parameter-efficient Multi-Task Learning with Optimized Continuous Prompts

Anjir Ahmed Chowdhury , Syed Zawad , Xiaolong Ma , Xu Dong , Feng Yan

Authors on Pith no claims yet

Pith reviewed 2026-05-15 05:22 UTC · model grok-4.3

classification 💻 cs.CL cs.AI

keywords parameter-efficient fine-tuningmulti-task learningcontinuous promptslow-rank adaptationprompt optimizationGLUE benchmarkSuperGLUElarge language models

0 comments

The pith

PEML jointly optimizes continuous prompts via neural architecture engineering and low-rank model adaptation to improve multi-task LLM performance.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes PEML to overcome limits of single-task methods like LoRA and Prefix Tuning when handling multiple tasks at once. Existing approaches either focus only on model weights or use simple prompt structures that restrict adaptation. PEML instead co-optimizes prompt tuning with a dedicated neural architecture and applies low-rank adaptation to the model weights. This shared setup exploits common features across tasks to cut data needs and deployment costs. Tests on GLUE, SuperGLUE, MMLU, and commonsense benchmarks show gains over MTL-LoRA, MultiLoRA, C-Poly, and MoE.

Core claim

PEML employs a neural architecture engineering method for optimizing the continuous prompts while also performing low-rank adaption for model weights. On the GLUE, SuperGLUE, Massive Multitask Language Understanding, and commonsense reasoning benchmarks it delivers an average accuracy improvement of up to 6.67 percent, with individual tasks reaching peak gains of up to 10.75 percent over state-of-the-art multi-task baselines.

What carries the argument

PEML framework that pairs neural-architecture prompt optimization with low-rank adaptation of model weights to co-optimize prompt and model adaptation for multi-task learning.

If this is right

Outperforms MTL-LoRA, MultiLoRA, C-Poly, and MoE on GLUE, SuperGLUE, MMLU, and commonsense reasoning tasks.
Delivers up to 6.67 percent average accuracy gain and up to 10.75 percent peak gains on individual tasks.
Supports resource consolidation by fine-tuning one LLM for multiple tasks instead of separate models.
Lowers overall data requirements for fine-tuning by leveraging shared features across tasks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same prompt-plus-adaptation pattern could be tested on non-language domains such as vision-language models to check transfer.
If the gains hold at larger scales, PEML-style co-optimization might become a default template for any multi-task LLM deployment.
Developers could explore whether the automated prompt-optimization framework reduces the need for task-specific prompt engineering expertise.

Load-bearing premise

The assumption that the neural architecture for prompt optimization combined with low-rank adaptation will consistently outperform existing methods across diverse tasks without introducing new overfitting risks or requiring extensive hyperparameter tuning.

What would settle it

Evaluating PEML on a fresh collection of multi-task benchmarks outside the original GLUE, SuperGLUE, MMLU, and commonsense sets and checking whether the reported accuracy margins disappear.

Figures

Figures reproduced from arXiv: 2605.14055 by Anjir Ahmed Chowdhury, Feng Yan, Syed Zawad, Xiaolong Ma, Xu Dong.

**Figure 1.** Figure 1: Overview of PEML. Hidden states are projected into queries, keys, and values using WQ, WK and WV , with LoRA applied to key and value projections. Learnable prefix vectors PK and PV are prepended to the key and value sequences, enabling taskspecific conditioning during multi-head attention. LoRA Hu et al. [2021] and Prefix Tuning Li and Liang [2021] are among prominent PEFT methods for adapting models t… view at source ↗

**Figure 2.** Figure 2: Unified view of PEML method. The left side illustrates LoRA, where matrix B is set to 0 and matrix A follows a normal distribution N(0, σ 2 ). The right side represents PrefixNAS, showcasing the optimal architecture derived from the search process. 4 [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗

**Figure 3.** Figure 3: The performance of PEML on SuperGLUE benchmark with different sensitive hyperparameter configurations. 6 Conclusion In this research, we introduced PEML, a novel approach designed to overcome the limitations of existing state-of-the-art (SOTA) PEFT methods in multi-task learning. SOTA techniques such as MTL, MultiLoRA, C-Poly, and MoE often struggle with challenges such as prompt misalignment, static task … view at source ↗

**Figure 4.** Figure 4: Trade-off between HellaSwag specialization and overall average accuracy as task sampling weight γ varies. The default configuration (γ=0.1) optimizes for overall average performance. We interpret these findings as reflecting the inherent trade-off in multi-task PEFT models between learning shared representations for generalization and retaining task-specific flexibility for optimal adaptation. This is al… view at source ↗

**Figure 5.** Figure 5: Computational cost and GLUE performance of PEFT methods. Another challenge in comparing PEML to baselines such as MTL-LoRA is that these methods report only the final training time, despite requiring substantial hyperparameter sweeps to achieve competitive performance. For instance, MTL-LoRA evaluates robustness across multiple hyperparameters, including the number of up-projection matrices (n), the tempe… view at source ↗

**Figure 6.** Figure 6: Compute accuracy tradeoff for PrefixNAS vs Two-stage vs surrogate-based NAS. Exploring two-stage or surrogate-based strategies for PrefixNAS can moderately reduce GPU usage by performing the architecture search on a smaller proxy or using an approximate performance predictor. As shown in [PITH_FULL_IMAGE:figures/full_fig_p022_6.png] view at source ↗

**Figure 7.** Figure 7: (a) Throughput and (b) peak VRAM usage benchmarked when training LLaMA-7B with sequences of 1024 tokens. n × r on horizontal axis indicates total rank of MultiLoRA and PEML. 7.11 Stability of the Differentiable–Discrete Transition PrefixNAS conducts architecture search using a continuous relaxation and finalizes the architecture via an argmax operation 8. To evaluate the stability of this transition, we co… view at source ↗

**Figure 8.** Figure 8: (a) KDE contours of architectural gradient norms during search. Softmax + Argmax shows tighter concentration, indicating stable optimization. and (b) Distribution of architectural gradient norms during search. Softmax + Argmax exhibits tighter variance, indicating more stable optimization. In addition, we provide a box plot 8 of architectural gradient norms to illustrate the distributional stability during… view at source ↗

read the original abstract

Parameter-Efficient Fine-Tuning (PEFT) is widely used for adapting Large Language Models (LLMs) for various tasks. Recently, there has been an increasing demand for fine-tuning a single LLM for multiple tasks because it requires overall less data for fine-tuning thanks to the common features shared among tasks. More importantly, LLMs are resource demanding and deploying a single model for multiple tasks facilitates resource consolidation and consumes significantly less resources compared to deploying individual large model for each task. Existing PEFT methods like LoRA and Prefix Tuning are designed to adapt LLMs to a specific task. LoRA and its variation focus on aligning the model itself for tasks, overlooking the importance of prompt tuning in multi-task learning while Prefix Tuning only adopts a simple architecture to optimize prompts, which limits the adaption capabilities for multi-task. To enable efficient fine-tuning for multi-task learning, it is important to co-optimize prompt optimization and model adaptation. In this work, we propose a Parameter-Efficient Multi-task Learning (\PM), which employs a neural architecture engineering method for optimizing the continuous prompts while also performing low-rank adaption for model weights. We prototype PEML by creating an automated framework for optimizing the continuous prompts and adapting model weights. We evaluate PEML against state-of-the-arts multi-task learning methods MTL-LoRA, MultiLoRa, C-Poly, and MoE, on the GLUE, SuperGLUE, Massive Multitask Language Understanding, and commonsense reasoning benchmarks. The evaluation results present an average accuracy improvement of up to 6.67%, with individual tasks showing peak gains of up to 10.75%.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

PEML pairs a neural prompt optimizer with LoRA for multi-task PEFT and reports modest gains on GLUE-style suites, but the numbers are single-run point estimates with no error bars or ablations shown.

read the letter

The core idea here is straightforward: instead of hand-designed prefix tuning or plain LoRA, they train a small neural network to produce the continuous prompts while also applying low-rank updates to the model weights, all in a shared multi-task setup. That combination is the main novelty relative to prior PEFT work. The paper shows the method on GLUE, SuperGLUE, MMLU, and commonsense tasks, claiming an average lift of 6.67% and peaks near 10.75% over MTL-LoRA, MultiLoRA, C-Poly, and MoE baselines. If the full experiments back this up with proper controls, it could be a practical tweak for anyone already using LoRA on multiple tasks at once. What the work does cleanly is keep the parameter count low while trying to optimize both the prompt space and the weight space jointly, which is a reasonable direction given how prompt tuning and adapter methods have evolved separately. The soft spot is the evaluation. The abstract and stress-test note both point to single-run accuracy numbers with no reported variance, no multiple seeds, and no statistical tests. That makes it hard to know whether the deltas are stable or just the result of extra hyperparameter freedom in the prompt network. There is also no ablation that isolates the neural prompt optimizer from the LoRA part, so we cannot tell how much each piece contributes. The architecture details for the prompt network itself are not spelled out in the summary, which leaves open questions about initialization, training schedule, and whether the baselines received equivalent tuning effort. For a reader already deep in PEFT literature, the paper is worth a look to see the exact construction and whether the gains survive a re-run. It is not yet strong enough to cite without verification, but the question it asks is legitimate and the method is simple enough to reproduce. I would send it to peer review so referees can check the missing controls and ask for the ablations and error bars.

Referee Report

2 major / 2 minor

Summary. The paper proposes PEML, a parameter-efficient multi-task learning method that combines a neural architecture for optimizing continuous prompts with low-rank adaptation (LoRA) of model weights. It evaluates this approach against baselines including MTL-LoRA, MultiLoRA, C-Poly, and MoE on GLUE, SuperGLUE, MMLU, and commonsense reasoning benchmarks, claiming average accuracy improvements of up to 6.67% and peak per-task gains of up to 10.75%.

Significance. If the results prove robust under proper statistical validation and with full methodological disclosure, PEML could advance PEFT techniques by addressing the gap in prompt optimization for multi-task settings, enabling more resource-efficient adaptation of LLMs across tasks. The core idea of jointly engineering prompts and weights is a reasonable extension of existing work, but the current lack of architectural details and experimental rigor limits its immediate impact.

major comments (2)

[Abstract/Evaluation] Abstract and Evaluation section: The headline performance claims (average +6.67%, peak +10.75%) rest exclusively on single-run point estimates with no reported variance, standard deviations, multiple random seeds, or statistical significance tests. This directly undermines the central claim of consistent outperformance over MTL-LoRA/MultiLoRA/C-Poly/MoE, as the extra degrees of freedom in the prompt optimizer and joint training schedule could favor PEML under favorable hyperparameter choices.
[Methods] Methods section: The neural architecture for continuous prompt optimization is described only at a high level with no specifics on its structure, layer count, initialization, parameter count relative to baselines, or the exact joint optimization schedule with LoRA. Without these details the parameter-efficiency claim cannot be verified and reproduction is impossible.

minor comments (2)

[Abstract] Abstract: The acronym is introduced as Parameter-Efficient Multi-task Learning (PM) but rendered as PEML in the title; standardize notation and expand on first use.
[Abstract] Abstract: Typo 'adaption' should read 'adaptation' in the sentence describing Prefix Tuning limitations.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment point by point below and will revise the paper to incorporate the suggested improvements for greater rigor and reproducibility.

read point-by-point responses

Referee: [Abstract/Evaluation] Abstract and Evaluation section: The headline performance claims (average +6.67%, peak +10.75%) rest exclusively on single-run point estimates with no reported variance, standard deviations, multiple random seeds, or statistical significance tests. This directly undermines the central claim of consistent outperformance over MTL-LoRA/MultiLoRA/C-Poly/MoE, as the extra degrees of freedom in the prompt optimizer and joint training schedule could favor PEML under favorable hyperparameter choices.

Authors: We agree that reporting only single-run point estimates limits the strength of our performance claims. In the revised manuscript, we will rerun all experiments across multiple random seeds (minimum of five), report mean accuracies with standard deviations for PEML and all baselines, and include statistical significance tests (e.g., paired t-tests) to confirm that the observed gains of up to 6.67% average and 10.75% peak are robust rather than artifacts of a single favorable run. This will directly address the concern about extra degrees of freedom in the prompt optimizer. revision: yes
Referee: [Methods] Methods section: The neural architecture for continuous prompt optimization is described only at a high level with no specifics on its structure, layer count, initialization, parameter count relative to baselines, or the exact joint optimization schedule with LoRA. Without these details the parameter-efficiency claim cannot be verified and reproduction is impossible.

Authors: We acknowledge the description of the prompt optimization network is currently high-level. In the revised Methods section we will add complete specifications: the exact architecture (number of layers, hidden size, activations), initialization procedure, total parameter count of the prompt optimizer relative to LoRA and other baselines, and the full joint training schedule (learning rates, optimizer, number of epochs, and how prompt and LoRA parameters are co-optimized). These additions will enable verification of the parameter-efficiency claims and full reproducibility. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical method proposal with benchmark evaluation

full rationale

The paper proposes PEML as a neural prompt optimizer plus LoRA for multi-task PEFT and reports accuracy gains on GLUE/SuperGLUE/MMLU/commonsense suites versus MTL-LoRA, MultiLoRA, C-Poly, and MoE. No derivation chain, equations, or 'predictions' are present that reduce to fitted inputs or self-citations by construction. All claims rest on direct experimental comparisons; the architecture choices and optimization are described as novel contributions rather than derived from prior self-referential results.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only abstract available; no specific free parameters, axioms or invented entities identifiable from the provided text.

pith-pipeline@v0.9.0 · 5608 in / 982 out tokens · 47446 ms · 2026-05-15T05:22:17.529353+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

PrefixNAS generates candidate prefix architectures Ai(α) via continuous relaxation and softmax, then prunes to argmax operation.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

97 extracted references · 97 canonical work pages · 30 internal anchors

[1]

International conference on machine learning , pages=

Hyperprompt: Prompt-based task-conditioning of transformers , author=. International conference on machine learning , pages=. 2022 , organization=

work page 2022
[2]

Parameter-efficient multi-task fine-tuning for transformers via shared hypernetworks , author=. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) , pages=

work page
[3]

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

Unipelt: A unified framework for parameter-efficient language model tuning , author=. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

work page
[4]

H yper L o RA : Efficient Cross-task Generalization via Constrained Low-Rank Adapters Generation

Lv, Chuancheng and Li, Lei and Zhang, Shitou and Chen, Gang and Qi, Fanchao and Zhang, Ningyu and Zheng, Hai-Tao. H yper L o RA : Efficient Cross-task Generalization via Constrained Low-Rank Adapters Generation. Findings of the Association for Computational Linguistics: EMNLP 2024. 2024. doi:10.18653/v1/2024.findings-emnlp.956

work page doi:10.18653/v1/2024.findings-emnlp.956 2024
[5]

arXiv preprint arXiv:2307.13269 , year=

Lorahub: Efficient cross-task generalization via dynamic lora composition , author=. arXiv preprint arXiv:2307.13269 , year=

work page arXiv
[6]

arXiv preprint arXiv:2506.06105 , year=

Text-to-lora: Instant transformer adaption , author=. arXiv preprint arXiv:2506.06105 , year=

work page arXiv
[7]

Findings of the Association for Computational Linguistics: EMNLP 2024 , pages=

Hyperlora: Efficient cross-task generalization via constrained low-rank adapters generation , author=. Findings of the Association for Computational Linguistics: EMNLP 2024 , pages=

work page 2024
[8]

Advances in Neural Information Processing Systems , volume=

Hydralora: An asymmetric lora architecture for efficient fine-tuning , author=. Advances in Neural Information Processing Systems , volume=

work page
[9]

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

LoRAMoE: Alleviating world knowledge forgetting in large language models via MoE-style plugin , author=. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

work page
[10]

arXiv preprint arXiv:2501.06252 , year=

Transformer-squared: Self-adaptive llms , author=. arXiv preprint arXiv:2501.06252 , year=

work page arXiv
[11]

arXiv preprint arXiv:2407.01411 , year=

Hyperloader: Integrating hypernetwork-based lora and adapter layers into multi-task transformers for sequence labelling , author=. arXiv preprint arXiv:2407.01411 , year=

work page arXiv
[12]

SIAM Journal on Optimization , volume =

Ghadimi, Saeed and Lan, Guanghui , title =. SIAM Journal on Optimization , volume =

work page
[13]

The Fourteenth International Conference on Learning Representations , year =

PrefixMemory-Tuning: Modernizing Prefix-Tuning by Decoupling the Prefix from Attention , author=. The Fourteenth International Conference on Learning Representations , year =

work page
[14]

Advances in Neural Information Processing Systems , volume=

Bridging discrete and backpropagation: Straight-through and beyond , author=. Advances in Neural Information Processing Systems , volume=

work page
[15]

IEEE transactions on pattern analysis and machine intelligence , volume=

A review of the gumbel-max trick and its extensions for discrete stochasticity in machine learning , author=. IEEE transactions on pattern analysis and machine intelligence , volume=. 2022 , publisher=

work page 2022
[16]

SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems

Alex Wang and Yada Pruksachatkun and Nikita Nangia and Amanpreet Singh and Julian Michael and Felix Hill and Omer Levy and Samuel R. Bowman , title =. CoRR , volume =. 2019 , url =. 1905.00537 , timestamp =

work page internal anchor Pith review Pith/arXiv arXiv 2019
[17]

Proceedings of the International Conference on Learning Representations (ICLR) , year=

Measuring Massive Multitask Language Understanding , author=. Proceedings of the International Conference on Learning Representations (ICLR) , year=

work page
[18]

and Ng, Andrew and Potts, Christopher

Socher, Richard and Perelygin, Alex and Wu, Jean and Chuang, Jason and Manning, Christopher D. and Ng, Andrew and Potts, Christopher. Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank. Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. 2013

work page 2013
[19]

Proceedings of the AAAI Conference on Artificial Intelligence , volume=

Mtl-lora: Low-rank adaptation for multi-task learning , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=

work page
[20]

Training Verifiers to Solve Math Word Problems

Training verifiers to solve math word problems , author=. arXiv preprint arXiv:2110.14168 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[21]

Liu, and Christopher D

See, Abigail and Liu, Peter J. and Manning, Christopher D. Get To The Point: Summarization with Pointer-Generator Networks. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2017. doi:10.18653/v1/P17-1099

work page doi:10.18653/v1/p17-1099 2017
[22]

Quantifying the Carbon Emissions of Machine Learning

Quantifying the carbon emissions of machine learning , author=. arXiv preprint arXiv:1910.09700 , year=

work page internal anchor Pith review arXiv 1910
[23]

Proceedings of the AAAI Conference on Artificial Intelligence , volume=

PIQA: Reasoning about Physical Commonsense in Natural Language , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=

work page
[24]

Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing (EMNLP) , pages=

Social IQa: Commonsense Reasoning about Social Interactions , author=. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing (EMNLP) , pages=

work page 2019
[25]

Proceedings of the AAAI Conference on Artificial Intelligence , volume=

WinoGrande: An Adversarial Winograd Schema Challenge at Scale , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=

work page
[26]

Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP) , pages=

Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering , author=. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP) , pages=

work page 2018
[27]

Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL) , pages=

HellaSwag: Can a Machine Really Finish Your Sentence? , author=. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL) , pages=

work page
[28]

Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge

Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge , author=. arXiv preprint arXiv:1803.05457 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[29]

arXiv preprint arXiv:2304.11127 , year=

Tree-structured parzen estimator: Understanding its algorithm components and their roles for better empirical performance , author=. arXiv preprint arXiv:2304.11127 , year=

work page arXiv
[30]

arXiv preprint arXiv:2312.03248 , year=

Customizable combination of parameter-efficient modules for multi-task learning , author=. arXiv preprint arXiv:2312.03248 , year=

work page arXiv
[31]

SemEval-2017 Task 1: Semantic Textual Similarity - Multilingual and Cross-lingual Focused Evaluation

Daniel M. Cer and Mona T. Diab and Eneko Agirre and I. SemEval-2017 Task 1: Semantic Textual Similarity - Multilingual and Cross-lingual Focused Evaluation , journal =. 2017 , url =. 1708.00055 , timestamp =

work page internal anchor Pith review Pith/arXiv arXiv 2017
[32]

LLaMA: Open and Efficient Foundation Language Models

Llama: Open and efficient foundation language models , author=. arXiv preprint arXiv:2302.13971 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[33]

arXiv preprint arXiv:1805.12471 , year=

Neural Network Acceptability Judgments , author=. arXiv preprint arXiv:1805.12471 , year=

work page arXiv
[34]

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding

Alex Wang and Amanpreet Singh and Julian Michael and Felix Hill and Omer Levy and Samuel R. Bowman , title =. CoRR , volume =. 2018 , url =. 1804.07461 , timestamp =

work page internal anchor Pith review Pith/arXiv arXiv 2018
[35]

Llama 2: Open Foundation and Fine-Tuned Chat Models

Llama 2: Open foundation and fine-tuned chat models , author=. arXiv preprint arXiv:2307.09288 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[36]

Liu , title =

Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu , title =. Journal of Machine Learning Research , year =

work page
[37]

LoRA: Low-Rank Adaptation of Large Language Models

Edward J. Hu and Yelong Shen and Phillip Wallis and Zeyuan Allen. LoRA: Low-Rank Adaptation of Large Language Models , journal =. 2021 , url =. 2106.09685 , timestamp =

work page internal anchor Pith review Pith/arXiv arXiv 2021
[38]

DARTS: Differentiable Architecture Search

Hanxiao Liu and Karen Simonyan and Yiming Yang , title =. CoRR , volume =. 2018 , url =. 1806.09055 , timestamp =

work page internal anchor Pith review Pith/arXiv arXiv 2018
[39]

Optuna: A Next-generation Hyperparameter Optimization Framework

Takuya Akiba and Shotaro Sano and Toshihiko Yanase and Takeru Ohta and Masanori Koyama , title =. CoRR , volume =. 2019 , url =. 1907.10902 , timestamp =

work page internal anchor Pith review Pith/arXiv arXiv 2019
[40]

BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions

Christopher Clark and Kenton Lee and Ming. BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions , journal =. 2019 , url =. 1905.10044 , timestamp =

work page internal anchor Pith review Pith/arXiv arXiv 2019
[41]

2019 , url=

The CommitmentBank: Investigating projection in naturally occurring discourse , author=. 2019 , url=

work page 2019
[42]

AAAI Spring Symposium: Logical Formalizations of Commonsense Reasoning , year=

Choice of Plausible Alternatives: An Evaluation of Commonsense Causal Reasoning , author=. AAAI Spring Symposium: Logical Formalizations of Commonsense Reasoning , year=

work page
[43]

Looking Beyond the Surface: A Challenge Set for Reading Comprehension over Multiple Sentences

Khashabi, Daniel and Chaturvedi, Snigdha and Roth, Michael and Upadhyay, Shyam and Roth, Dan. Looking Beyond the Surface: A Challenge Set for Reading Comprehension over Multiple Sentences. Proceedings of the 2018 Conference of the North A merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)....

work page doi:10.18653/v1/n18-1023 2018
[44]

CoRR , volume =

Adam Poliak , title =. CoRR , volume =. 2020 , url =. 2010.03061 , timestamp =

work page arXiv 2020
[45]

WiC: the Word-in-Context Dataset for Evaluating Context-Sensitive Meaning Representations

Mohammad Taher Pilehvar and Jos. WiC: 10, 000 Example Pairs for Evaluating Context-Sensitive Representations , journal =. 2018 , url =. 1808.09121 , timestamp =

work page internal anchor Pith review Pith/arXiv arXiv 2018
[46]

and Davis, Ernest and Morgenstern, Leora , title =

Levesque, Hector J. and Davis, Ernest and Morgenstern, Leora , title =. Proceedings of the Thirteenth International Conference on Principles of Knowledge Representation and Reasoning , pages =. 2012 , isbn =

work page 2012
[47]

The PASCAL Recognising Textual Entailment Challenge

Dagan, Ido and Glickman, Oren and Magnini, Bernardo. The PASCAL Recognising Textual Entailment Challenge. Machine Learning Challenges. Evaluating Predictive Uncertainty, Visual Object Classification, and Recognising Tectual Entailment. 2006

work page 2006
[48]

, author =

Accelerate: Training and inference at scale made simple, efficient and adaptable. , author =

work page
[49]

Tune: A Research Platform for Distributed Model Selection and Training

Tune: A Research Platform for Distributed Model Selection and Training , author=. arXiv preprint arXiv:1807.05118 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[50]

Parameter-Efficient Transfer Learning for NLP

Neil Houlsby and Andrei Giurgiu and Stanislaw Jastrzebski and Bruna Morrone and Quentin de Laroussilhe and Andrea Gesmundo and Mona Attariyan and Sylvain Gelly , title =. CoRR , volume =. 2019 , url =. 1902.00751 , timestamp =

work page internal anchor Pith review Pith/arXiv arXiv 2019
[51]

Towards a Unified View of Parameter-Efficient Transfer Learning , journal =

Junxian He and Chunting Zhou and Xuezhe Ma and Taylor Berg. Towards a Unified View of Parameter-Efficient Transfer Learning , journal =. 2021 , url =. 2110.04366 , timestamp =

work page arXiv 2021
[52]

CoRR , volume =

Rabeeh Karimi Mahabadi and James Henderson and Sebastian Ruder , title =. CoRR , volume =. 2021 , url =. 2106.04647 , timestamp =

work page arXiv 2021
[53]

The Power of Scale for Parameter-Efficient Prompt Tuning

Brian Lester and Rami Al. The Power of Scale for Parameter-Efficient Prompt Tuning , journal =. 2021 , url =. 2104.08691 , timestamp =

work page internal anchor Pith review Pith/arXiv arXiv 2021
[54]

2023 , eprint=

Residual Prompt Tuning: Improving Prompt Tuning with Residual Americanization , author=. 2023 , eprint=

work page 2023
[55]

2023 , eprint=

Non-Intrusive Adaptation: Input-Centric Parameter-efficient Fine-Tuning for Versatile Multimodal Modeling , author=. 2023 , eprint=

work page 2023
[56]

Prefix-Tuning: Optimizing Continuous Prompts for Generation

Xiang Lisa Li and Percy Liang , title =. CoRR , volume =. 2021 , url =. 2101.00190 , timestamp =

work page internal anchor Pith review Pith/arXiv arXiv 2021
[57]

AdaLoRA: Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning

AdaLoRA: Adaptive budget allocation for parameter-efficient fine-tuning , author=. arXiv preprint arXiv:2303.10512 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[58]

2023 , eprint=

A Rank Stabilization Scaling Factor for Fine-Tuning with LoRA , author=. 2023 , eprint=

work page 2023
[59]

2024 , eprint=

DoRA: Weight-Decomposed Low-Rank Adaptation , author=. 2024 , eprint=

work page 2024
[60]

2024 , eprint=

Navigating Text-To-Image Customization: From LyCORIS Fine-Tuning to Model Evaluation , author=. 2024 , eprint=

work page 2024
[61]

2024 , eprint=

VeRA: Vector-based Random Matrix Adaptation , author=. 2024 , eprint=

work page 2024
[62]

CoRR , volume =

Yu Zhang and Qiang Yang , title =. CoRR , volume =. 2017 , url =. 1707.08114 , timestamp =

work page arXiv 2017
[63]

An Overview of Multi-Task Learning in Deep Neural Networks

Sebastian Ruder , title =. CoRR , volume =. 2017 , url =. 1706.05098 , timestamp =

work page internal anchor Pith review Pith/arXiv arXiv 2017
[64]

Proceedings of the 31st ACM International Conference on Information & Knowledge Management , pages=

Match-prompt: Improving multi-task generalization ability for neural text matching via prompt learning , author=. Proceedings of the 31st ACM International Conference on Information & Knowledge Management , pages=

work page
[65]

Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer

Noam Shazeer and Azalia Mirhoseini and Krzysztof Maziarz and Andy Davis and Quoc V. Le and Geoffrey E. Hinton and Jeff Dean , title =. CoRR , volume =. 2017 , url =. 1701.06538 , timestamp =

work page internal anchor Pith review Pith/arXiv arXiv 2017
[66]

Exploring and Predicting Transferability across

Tu Vu and Tong Wang and Tsendsuren Munkhdalai and Alessandro Sordoni and Adam Trischler and Andrew Mattarella. Exploring and Predicting Transferability across. CoRR , volume =. 2020 , url =. 2005.00770 , timestamp =

work page arXiv 2020
[67]

CoRR , volume =

Armen Aghajanyan and Anchit Gupta and Akshat Shrivastava and Xilun Chen and Luke Zettlemoyer and Sonal Gupta , title =. CoRR , volume =. 2021 , url =. 2101.11038 , timestamp =

work page arXiv 2021
[68]

Multitask Prompted Training Enables Zero-Shot Task Generalization

Victor Sanh and Albert Webson and Colin Raffel and Stephen H. Bach and Lintang Sutawika and Zaid Alyafeai and Antoine Chaffin and Arnaud Stiegler and Teven Le Scao and Arun Raja and Manan Dey and M. Saiful Bari and Canwen Xu and Urmish Thakker and Shanya Sharma and Eliza Szczechla and Taewoon Kim and Gunjan Chhablani and Nihal V. Nayak and Debajyoti Datta...

work page internal anchor Pith review Pith/arXiv arXiv 2021
[69]

2022 , eprint=

Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks , author=. 2022 , eprint=

work page 2022
[70]

2022 , eprint=

Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning , author=. 2022 , eprint=

work page 2022
[71]

Finetuned Language Models Are Zero-Shot Learners

Jason Wei and Maarten Bosma and Vincent Y. Zhao and Kelvin Guu and Adams Wei Yu and Brian Lester and Nan Du and Andrew M. Dai and Quoc V. Le , title =. CoRR , volume =. 2021 , url =. 2109.01652 , timestamp =

work page internal anchor Pith review Pith/arXiv arXiv 2021
[72]

AdaShare: Learning What To Share For Efficient Deep Multi-Task Learning , journal =

Ximeng Sun and Rameswar Panda and Rog. AdaShare: Learning What To Share For Efficient Deep Multi-Task Learning , journal =. 2019 , url =. 1911.12423 , timestamp =

work page arXiv 2019
[73]

Cross-stitch Networks for Multi-task Learning

Ishan Misra and Abhinav Shrivastava and Abhinav Gupta and Martial Hebert , title =. CoRR , volume =. 2016 , url =. 1604.03539 , timestamp =

work page internal anchor Pith review Pith/arXiv arXiv 2016
[74]

2023 , eprint=

Multitask Prompt Tuning Enables Parameter-Efficient Transfer Learning , author=. 2023 , eprint=

work page 2023
[75]

BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension

Mike Lewis and Yinhan Liu and Naman Goyal and Marjan Ghazvininejad and Abdelrahman Mohamed and Omer Levy and Veselin Stoyanov and Luke Zettlemoyer , title =. CoRR , volume =. 2019 , url =. 1910.13461 , timestamp =

work page internal anchor Pith review Pith/arXiv arXiv 2019
[76]

Joty and Steven C

Yue Wang and Weishi Wang and Shafiq R. Joty and Steven C. H. Hoi , title =. CoRR , volume =. 2021 , url =. 2109.00859 , timestamp =

work page arXiv 2021
[77]

2024 , eprint=

Improving In-context Learning via Bidirectional Alignment , author=. 2024 , eprint=

work page 2024
[78]

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Jacob Devlin and Ming. CoRR , volume =. 2018 , url =. 1810.04805 , timestamp =

work page internal anchor Pith review Pith/arXiv arXiv 2018
[79]

Universal Language Model Fine-tuning for Text Classification

Jeremy Howard and Sebastian Ruder , title =. CoRR , volume =. 2018 , url =. 1801.06146 , timestamp =

work page internal anchor Pith review Pith/arXiv arXiv 2018
[80]

2023 , eprint=

Sparse Low-rank Adaptation of Pre-trained Language Models , author=. 2023 , eprint=

work page 2023

Showing first 80 references.