pith. machine review for the scientific record. sign in

arxiv: 2605.14055 · v1 · submitted 2026-05-13 · 💻 cs.CL · cs.AI

Recognition: 2 theorem links

· Lean Theorem

PEML: Parameter-efficient Multi-Task Learning with Optimized Continuous Prompts

Authors on Pith no claims yet

Pith reviewed 2026-05-15 05:22 UTC · model grok-4.3

classification 💻 cs.CL cs.AI
keywords parameter-efficient fine-tuningmulti-task learningcontinuous promptslow-rank adaptationprompt optimizationGLUE benchmarkSuperGLUElarge language models
0
0 comments X

The pith

PEML jointly optimizes continuous prompts via neural architecture engineering and low-rank model adaptation to improve multi-task LLM performance.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes PEML to overcome limits of single-task methods like LoRA and Prefix Tuning when handling multiple tasks at once. Existing approaches either focus only on model weights or use simple prompt structures that restrict adaptation. PEML instead co-optimizes prompt tuning with a dedicated neural architecture and applies low-rank adaptation to the model weights. This shared setup exploits common features across tasks to cut data needs and deployment costs. Tests on GLUE, SuperGLUE, MMLU, and commonsense benchmarks show gains over MTL-LoRA, MultiLoRA, C-Poly, and MoE.

Core claim

PEML employs a neural architecture engineering method for optimizing the continuous prompts while also performing low-rank adaption for model weights. On the GLUE, SuperGLUE, Massive Multitask Language Understanding, and commonsense reasoning benchmarks it delivers an average accuracy improvement of up to 6.67 percent, with individual tasks reaching peak gains of up to 10.75 percent over state-of-the-art multi-task baselines.

What carries the argument

PEML framework that pairs neural-architecture prompt optimization with low-rank adaptation of model weights to co-optimize prompt and model adaptation for multi-task learning.

If this is right

  • Outperforms MTL-LoRA, MultiLoRA, C-Poly, and MoE on GLUE, SuperGLUE, MMLU, and commonsense reasoning tasks.
  • Delivers up to 6.67 percent average accuracy gain and up to 10.75 percent peak gains on individual tasks.
  • Supports resource consolidation by fine-tuning one LLM for multiple tasks instead of separate models.
  • Lowers overall data requirements for fine-tuning by leveraging shared features across tasks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same prompt-plus-adaptation pattern could be tested on non-language domains such as vision-language models to check transfer.
  • If the gains hold at larger scales, PEML-style co-optimization might become a default template for any multi-task LLM deployment.
  • Developers could explore whether the automated prompt-optimization framework reduces the need for task-specific prompt engineering expertise.

Load-bearing premise

The assumption that the neural architecture for prompt optimization combined with low-rank adaptation will consistently outperform existing methods across diverse tasks without introducing new overfitting risks or requiring extensive hyperparameter tuning.

What would settle it

Evaluating PEML on a fresh collection of multi-task benchmarks outside the original GLUE, SuperGLUE, MMLU, and commonsense sets and checking whether the reported accuracy margins disappear.

Figures

Figures reproduced from arXiv: 2605.14055 by Anjir Ahmed Chowdhury, Feng Yan, Syed Zawad, Xiaolong Ma, Xu Dong.

Figure 1
Figure 1. Figure 1: Overview of PEML. Hid￾den states are projected into queries, keys, and values using WQ, WK and WV , with LoRA applied to key and value projections. Learnable prefix vec￾tors PK and PV are prepended to the key and value sequences, enabling task￾specific conditioning during multi-head attention. LoRA Hu et al. [2021] and Prefix Tuning Li and Liang [2021] are among prominent PEFT methods for adapting models t… view at source ↗
Figure 2
Figure 2. Figure 2: Unified view of PEML method. The left side illustrates LoRA, where matrix B is set to 0 and matrix A follows a normal distribution N(0, σ 2 ). The right side represents PrefixNAS, showcasing the optimal architecture derived from the search process. 4 [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: The performance of PEML on SuperGLUE benchmark with different sensitive hyperparameter configurations. 6 Conclusion In this research, we introduced PEML, a novel approach designed to overcome the limitations of existing state-of-the-art (SOTA) PEFT methods in multi-task learning. SOTA techniques such as MTL, MultiLoRA, C-Poly, and MoE often struggle with challenges such as prompt misalignment, static task … view at source ↗
Figure 4
Figure 4. Figure 4: Trade-off between HellaSwag spe￾cialization and overall average accuracy as task sampling weight γ varies. The default config￾uration (γ=0.1) optimizes for overall average performance. We interpret these findings as reflecting the inherent trade-off in multi-task PEFT models between learning shared representations for generalization and retaining task-specific flexibility for optimal adaptation. This is al… view at source ↗
Figure 5
Figure 5. Figure 5: Computational cost and GLUE performance of PEFT methods. Another challenge in comparing PEML to baselines such as MTL-LoRA is that these methods report only the final training time, despite requiring substantial hyperparameter sweeps to achieve competi￾tive performance. For instance, MTL-LoRA evaluates robustness across multiple hyperparameters, including the number of up-projection matrices (n), the tempe… view at source ↗
Figure 6
Figure 6. Figure 6: Compute accuracy trade￾off for PrefixNAS vs Two-stage vs surrogate-based NAS. Exploring two-stage or surrogate-based strategies for Prefix￾NAS can moderately reduce GPU usage by performing the architecture search on a smaller proxy or using an approximate performance predictor. As shown in [PITH_FULL_IMAGE:figures/full_fig_p022_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: (a) Throughput and (b) peak VRAM usage benchmarked when training LLaMA-7B with sequences of 1024 tokens. n × r on horizontal axis indicates total rank of MultiLoRA and PEML. 7.11 Stability of the Differentiable–Discrete Transition PrefixNAS conducts architecture search using a continuous relaxation and finalizes the architecture via an argmax operation 8. To evaluate the stability of this transition, we co… view at source ↗
Figure 8
Figure 8. Figure 8: (a) KDE contours of architectural gradient norms during search. Softmax + Argmax shows tighter concentration, indicating stable optimization. and (b) Distribution of architectural gradient norms during search. Softmax + Argmax exhibits tighter variance, indicating more stable optimization. In addition, we provide a box plot 8 of architectural gradient norms to illustrate the distributional stability during… view at source ↗
read the original abstract

Parameter-Efficient Fine-Tuning (PEFT) is widely used for adapting Large Language Models (LLMs) for various tasks. Recently, there has been an increasing demand for fine-tuning a single LLM for multiple tasks because it requires overall less data for fine-tuning thanks to the common features shared among tasks. More importantly, LLMs are resource demanding and deploying a single model for multiple tasks facilitates resource consolidation and consumes significantly less resources compared to deploying individual large model for each task. Existing PEFT methods like LoRA and Prefix Tuning are designed to adapt LLMs to a specific task. LoRA and its variation focus on aligning the model itself for tasks, overlooking the importance of prompt tuning in multi-task learning while Prefix Tuning only adopts a simple architecture to optimize prompts, which limits the adaption capabilities for multi-task. To enable efficient fine-tuning for multi-task learning, it is important to co-optimize prompt optimization and model adaptation. In this work, we propose a Parameter-Efficient Multi-task Learning (\PM), which employs a neural architecture engineering method for optimizing the continuous prompts while also performing low-rank adaption for model weights. We prototype PEML by creating an automated framework for optimizing the continuous prompts and adapting model weights. We evaluate PEML against state-of-the-arts multi-task learning methods MTL-LoRA, MultiLoRa, C-Poly, and MoE, on the GLUE, SuperGLUE, Massive Multitask Language Understanding, and commonsense reasoning benchmarks. The evaluation results present an average accuracy improvement of up to 6.67%, with individual tasks showing peak gains of up to 10.75%.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes PEML, a parameter-efficient multi-task learning method that combines a neural architecture for optimizing continuous prompts with low-rank adaptation (LoRA) of model weights. It evaluates this approach against baselines including MTL-LoRA, MultiLoRA, C-Poly, and MoE on GLUE, SuperGLUE, MMLU, and commonsense reasoning benchmarks, claiming average accuracy improvements of up to 6.67% and peak per-task gains of up to 10.75%.

Significance. If the results prove robust under proper statistical validation and with full methodological disclosure, PEML could advance PEFT techniques by addressing the gap in prompt optimization for multi-task settings, enabling more resource-efficient adaptation of LLMs across tasks. The core idea of jointly engineering prompts and weights is a reasonable extension of existing work, but the current lack of architectural details and experimental rigor limits its immediate impact.

major comments (2)
  1. [Abstract/Evaluation] Abstract and Evaluation section: The headline performance claims (average +6.67%, peak +10.75%) rest exclusively on single-run point estimates with no reported variance, standard deviations, multiple random seeds, or statistical significance tests. This directly undermines the central claim of consistent outperformance over MTL-LoRA/MultiLoRA/C-Poly/MoE, as the extra degrees of freedom in the prompt optimizer and joint training schedule could favor PEML under favorable hyperparameter choices.
  2. [Methods] Methods section: The neural architecture for continuous prompt optimization is described only at a high level with no specifics on its structure, layer count, initialization, parameter count relative to baselines, or the exact joint optimization schedule with LoRA. Without these details the parameter-efficiency claim cannot be verified and reproduction is impossible.
minor comments (2)
  1. [Abstract] Abstract: The acronym is introduced as Parameter-Efficient Multi-task Learning (PM) but rendered as PEML in the title; standardize notation and expand on first use.
  2. [Abstract] Abstract: Typo 'adaption' should read 'adaptation' in the sentence describing Prefix Tuning limitations.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment point by point below and will revise the paper to incorporate the suggested improvements for greater rigor and reproducibility.

read point-by-point responses
  1. Referee: [Abstract/Evaluation] Abstract and Evaluation section: The headline performance claims (average +6.67%, peak +10.75%) rest exclusively on single-run point estimates with no reported variance, standard deviations, multiple random seeds, or statistical significance tests. This directly undermines the central claim of consistent outperformance over MTL-LoRA/MultiLoRA/C-Poly/MoE, as the extra degrees of freedom in the prompt optimizer and joint training schedule could favor PEML under favorable hyperparameter choices.

    Authors: We agree that reporting only single-run point estimates limits the strength of our performance claims. In the revised manuscript, we will rerun all experiments across multiple random seeds (minimum of five), report mean accuracies with standard deviations for PEML and all baselines, and include statistical significance tests (e.g., paired t-tests) to confirm that the observed gains of up to 6.67% average and 10.75% peak are robust rather than artifacts of a single favorable run. This will directly address the concern about extra degrees of freedom in the prompt optimizer. revision: yes

  2. Referee: [Methods] Methods section: The neural architecture for continuous prompt optimization is described only at a high level with no specifics on its structure, layer count, initialization, parameter count relative to baselines, or the exact joint optimization schedule with LoRA. Without these details the parameter-efficiency claim cannot be verified and reproduction is impossible.

    Authors: We acknowledge the description of the prompt optimization network is currently high-level. In the revised Methods section we will add complete specifications: the exact architecture (number of layers, hidden size, activations), initialization procedure, total parameter count of the prompt optimizer relative to LoRA and other baselines, and the full joint training schedule (learning rates, optimizer, number of epochs, and how prompt and LoRA parameters are co-optimized). These additions will enable verification of the parameter-efficiency claims and full reproducibility. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical method proposal with benchmark evaluation

full rationale

The paper proposes PEML as a neural prompt optimizer plus LoRA for multi-task PEFT and reports accuracy gains on GLUE/SuperGLUE/MMLU/commonsense suites versus MTL-LoRA, MultiLoRA, C-Poly, and MoE. No derivation chain, equations, or 'predictions' are present that reduce to fitted inputs or self-citations by construction. All claims rest on direct experimental comparisons; the architecture choices and optimization are described as novel contributions rather than derived from prior self-referential results.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only abstract available; no specific free parameters, axioms or invented entities identifiable from the provided text.

pith-pipeline@v0.9.0 · 5608 in / 982 out tokens · 47446 ms · 2026-05-15T05:22:17.529353+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

97 extracted references · 97 canonical work pages · 30 internal anchors

  1. [1]

    International conference on machine learning , pages=

    Hyperprompt: Prompt-based task-conditioning of transformers , author=. International conference on machine learning , pages=. 2022 , organization=

  2. [2]

    Parameter-efficient multi-task fine-tuning for transformers via shared hypernetworks , author=. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) , pages=

  3. [3]

    Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

    Unipelt: A unified framework for parameter-efficient language model tuning , author=. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

  4. [4]

    H yper L o RA : Efficient Cross-task Generalization via Constrained Low-Rank Adapters Generation

    Lv, Chuancheng and Li, Lei and Zhang, Shitou and Chen, Gang and Qi, Fanchao and Zhang, Ningyu and Zheng, Hai-Tao. H yper L o RA : Efficient Cross-task Generalization via Constrained Low-Rank Adapters Generation. Findings of the Association for Computational Linguistics: EMNLP 2024. 2024. doi:10.18653/v1/2024.findings-emnlp.956

  5. [5]

    arXiv preprint arXiv:2307.13269 , year=

    Lorahub: Efficient cross-task generalization via dynamic lora composition , author=. arXiv preprint arXiv:2307.13269 , year=

  6. [6]

    arXiv preprint arXiv:2506.06105 , year=

    Text-to-lora: Instant transformer adaption , author=. arXiv preprint arXiv:2506.06105 , year=

  7. [7]

    Findings of the Association for Computational Linguistics: EMNLP 2024 , pages=

    Hyperlora: Efficient cross-task generalization via constrained low-rank adapters generation , author=. Findings of the Association for Computational Linguistics: EMNLP 2024 , pages=

  8. [8]

    Advances in Neural Information Processing Systems , volume=

    Hydralora: An asymmetric lora architecture for efficient fine-tuning , author=. Advances in Neural Information Processing Systems , volume=

  9. [9]

    Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

    LoRAMoE: Alleviating world knowledge forgetting in large language models via MoE-style plugin , author=. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

  10. [10]

    arXiv preprint arXiv:2501.06252 , year=

    Transformer-squared: Self-adaptive llms , author=. arXiv preprint arXiv:2501.06252 , year=

  11. [11]

    arXiv preprint arXiv:2407.01411 , year=

    Hyperloader: Integrating hypernetwork-based lora and adapter layers into multi-task transformers for sequence labelling , author=. arXiv preprint arXiv:2407.01411 , year=

  12. [12]

    SIAM Journal on Optimization , volume =

    Ghadimi, Saeed and Lan, Guanghui , title =. SIAM Journal on Optimization , volume =

  13. [13]

    The Fourteenth International Conference on Learning Representations , year =

    PrefixMemory-Tuning: Modernizing Prefix-Tuning by Decoupling the Prefix from Attention , author=. The Fourteenth International Conference on Learning Representations , year =

  14. [14]

    Advances in Neural Information Processing Systems , volume=

    Bridging discrete and backpropagation: Straight-through and beyond , author=. Advances in Neural Information Processing Systems , volume=

  15. [15]

    IEEE transactions on pattern analysis and machine intelligence , volume=

    A review of the gumbel-max trick and its extensions for discrete stochasticity in machine learning , author=. IEEE transactions on pattern analysis and machine intelligence , volume=. 2022 , publisher=

  16. [16]

    SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems

    Alex Wang and Yada Pruksachatkun and Nikita Nangia and Amanpreet Singh and Julian Michael and Felix Hill and Omer Levy and Samuel R. Bowman , title =. CoRR , volume =. 2019 , url =. 1905.00537 , timestamp =

  17. [17]

    Proceedings of the International Conference on Learning Representations (ICLR) , year=

    Measuring Massive Multitask Language Understanding , author=. Proceedings of the International Conference on Learning Representations (ICLR) , year=

  18. [18]

    and Ng, Andrew and Potts, Christopher

    Socher, Richard and Perelygin, Alex and Wu, Jean and Chuang, Jason and Manning, Christopher D. and Ng, Andrew and Potts, Christopher. Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank. Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. 2013

  19. [19]

    Proceedings of the AAAI Conference on Artificial Intelligence , volume=

    Mtl-lora: Low-rank adaptation for multi-task learning , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=

  20. [20]

    Training Verifiers to Solve Math Word Problems

    Training verifiers to solve math word problems , author=. arXiv preprint arXiv:2110.14168 , year=

  21. [21]

    Liu, and Christopher D

    See, Abigail and Liu, Peter J. and Manning, Christopher D. Get To The Point: Summarization with Pointer-Generator Networks. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2017. doi:10.18653/v1/P17-1099

  22. [22]

    Quantifying the Carbon Emissions of Machine Learning

    Quantifying the carbon emissions of machine learning , author=. arXiv preprint arXiv:1910.09700 , year=

  23. [23]

    Proceedings of the AAAI Conference on Artificial Intelligence , volume=

    PIQA: Reasoning about Physical Commonsense in Natural Language , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=

  24. [24]

    Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing (EMNLP) , pages=

    Social IQa: Commonsense Reasoning about Social Interactions , author=. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing (EMNLP) , pages=

  25. [25]

    Proceedings of the AAAI Conference on Artificial Intelligence , volume=

    WinoGrande: An Adversarial Winograd Schema Challenge at Scale , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=

  26. [26]

    Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP) , pages=

    Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering , author=. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP) , pages=

  27. [27]

    Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL) , pages=

    HellaSwag: Can a Machine Really Finish Your Sentence? , author=. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL) , pages=

  28. [28]

    Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge

    Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge , author=. arXiv preprint arXiv:1803.05457 , year=

  29. [29]

    arXiv preprint arXiv:2304.11127 , year=

    Tree-structured parzen estimator: Understanding its algorithm components and their roles for better empirical performance , author=. arXiv preprint arXiv:2304.11127 , year=

  30. [30]

    arXiv preprint arXiv:2312.03248 , year=

    Customizable combination of parameter-efficient modules for multi-task learning , author=. arXiv preprint arXiv:2312.03248 , year=

  31. [31]

    SemEval-2017 Task 1: Semantic Textual Similarity - Multilingual and Cross-lingual Focused Evaluation

    Daniel M. Cer and Mona T. Diab and Eneko Agirre and I. SemEval-2017 Task 1: Semantic Textual Similarity - Multilingual and Cross-lingual Focused Evaluation , journal =. 2017 , url =. 1708.00055 , timestamp =

  32. [32]

    LLaMA: Open and Efficient Foundation Language Models

    Llama: Open and efficient foundation language models , author=. arXiv preprint arXiv:2302.13971 , year=

  33. [33]

    arXiv preprint arXiv:1805.12471 , year=

    Neural Network Acceptability Judgments , author=. arXiv preprint arXiv:1805.12471 , year=

  34. [34]

    GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding

    Alex Wang and Amanpreet Singh and Julian Michael and Felix Hill and Omer Levy and Samuel R. Bowman , title =. CoRR , volume =. 2018 , url =. 1804.07461 , timestamp =

  35. [35]

    Llama 2: Open Foundation and Fine-Tuned Chat Models

    Llama 2: Open foundation and fine-tuned chat models , author=. arXiv preprint arXiv:2307.09288 , year=

  36. [36]

    Liu , title =

    Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu , title =. Journal of Machine Learning Research , year =

  37. [37]

    LoRA: Low-Rank Adaptation of Large Language Models

    Edward J. Hu and Yelong Shen and Phillip Wallis and Zeyuan Allen. LoRA: Low-Rank Adaptation of Large Language Models , journal =. 2021 , url =. 2106.09685 , timestamp =

  38. [38]

    DARTS: Differentiable Architecture Search

    Hanxiao Liu and Karen Simonyan and Yiming Yang , title =. CoRR , volume =. 2018 , url =. 1806.09055 , timestamp =

  39. [39]

    Optuna: A Next-generation Hyperparameter Optimization Framework

    Takuya Akiba and Shotaro Sano and Toshihiko Yanase and Takeru Ohta and Masanori Koyama , title =. CoRR , volume =. 2019 , url =. 1907.10902 , timestamp =

  40. [40]

    BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions

    Christopher Clark and Kenton Lee and Ming. BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions , journal =. 2019 , url =. 1905.10044 , timestamp =

  41. [41]

    2019 , url=

    The CommitmentBank: Investigating projection in naturally occurring discourse , author=. 2019 , url=

  42. [42]

    AAAI Spring Symposium: Logical Formalizations of Commonsense Reasoning , year=

    Choice of Plausible Alternatives: An Evaluation of Commonsense Causal Reasoning , author=. AAAI Spring Symposium: Logical Formalizations of Commonsense Reasoning , year=

  43. [43]

    Looking Beyond the Surface: A Challenge Set for Reading Comprehension over Multiple Sentences

    Khashabi, Daniel and Chaturvedi, Snigdha and Roth, Michael and Upadhyay, Shyam and Roth, Dan. Looking Beyond the Surface: A Challenge Set for Reading Comprehension over Multiple Sentences. Proceedings of the 2018 Conference of the North A merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)....

  44. [44]

    CoRR , volume =

    Adam Poliak , title =. CoRR , volume =. 2020 , url =. 2010.03061 , timestamp =

  45. [45]

    WiC: the Word-in-Context Dataset for Evaluating Context-Sensitive Meaning Representations

    Mohammad Taher Pilehvar and Jos. WiC: 10, 000 Example Pairs for Evaluating Context-Sensitive Representations , journal =. 2018 , url =. 1808.09121 , timestamp =

  46. [46]

    and Davis, Ernest and Morgenstern, Leora , title =

    Levesque, Hector J. and Davis, Ernest and Morgenstern, Leora , title =. Proceedings of the Thirteenth International Conference on Principles of Knowledge Representation and Reasoning , pages =. 2012 , isbn =

  47. [47]

    The PASCAL Recognising Textual Entailment Challenge

    Dagan, Ido and Glickman, Oren and Magnini, Bernardo. The PASCAL Recognising Textual Entailment Challenge. Machine Learning Challenges. Evaluating Predictive Uncertainty, Visual Object Classification, and Recognising Tectual Entailment. 2006

  48. [48]

    , author =

    Accelerate: Training and inference at scale made simple, efficient and adaptable. , author =

  49. [49]

    Tune: A Research Platform for Distributed Model Selection and Training

    Tune: A Research Platform for Distributed Model Selection and Training , author=. arXiv preprint arXiv:1807.05118 , year=

  50. [50]

    Parameter-Efficient Transfer Learning for NLP

    Neil Houlsby and Andrei Giurgiu and Stanislaw Jastrzebski and Bruna Morrone and Quentin de Laroussilhe and Andrea Gesmundo and Mona Attariyan and Sylvain Gelly , title =. CoRR , volume =. 2019 , url =. 1902.00751 , timestamp =

  51. [51]

    Towards a Unified View of Parameter-Efficient Transfer Learning , journal =

    Junxian He and Chunting Zhou and Xuezhe Ma and Taylor Berg. Towards a Unified View of Parameter-Efficient Transfer Learning , journal =. 2021 , url =. 2110.04366 , timestamp =

  52. [52]

    CoRR , volume =

    Rabeeh Karimi Mahabadi and James Henderson and Sebastian Ruder , title =. CoRR , volume =. 2021 , url =. 2106.04647 , timestamp =

  53. [53]

    The Power of Scale for Parameter-Efficient Prompt Tuning

    Brian Lester and Rami Al. The Power of Scale for Parameter-Efficient Prompt Tuning , journal =. 2021 , url =. 2104.08691 , timestamp =

  54. [54]

    2023 , eprint=

    Residual Prompt Tuning: Improving Prompt Tuning with Residual Americanization , author=. 2023 , eprint=

  55. [55]

    2023 , eprint=

    Non-Intrusive Adaptation: Input-Centric Parameter-efficient Fine-Tuning for Versatile Multimodal Modeling , author=. 2023 , eprint=

  56. [56]

    Prefix-Tuning: Optimizing Continuous Prompts for Generation

    Xiang Lisa Li and Percy Liang , title =. CoRR , volume =. 2021 , url =. 2101.00190 , timestamp =

  57. [57]

    AdaLoRA: Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning

    AdaLoRA: Adaptive budget allocation for parameter-efficient fine-tuning , author=. arXiv preprint arXiv:2303.10512 , year=

  58. [58]

    2023 , eprint=

    A Rank Stabilization Scaling Factor for Fine-Tuning with LoRA , author=. 2023 , eprint=

  59. [59]

    2024 , eprint=

    DoRA: Weight-Decomposed Low-Rank Adaptation , author=. 2024 , eprint=

  60. [60]

    2024 , eprint=

    Navigating Text-To-Image Customization: From LyCORIS Fine-Tuning to Model Evaluation , author=. 2024 , eprint=

  61. [61]

    2024 , eprint=

    VeRA: Vector-based Random Matrix Adaptation , author=. 2024 , eprint=

  62. [62]

    CoRR , volume =

    Yu Zhang and Qiang Yang , title =. CoRR , volume =. 2017 , url =. 1707.08114 , timestamp =

  63. [63]

    An Overview of Multi-Task Learning in Deep Neural Networks

    Sebastian Ruder , title =. CoRR , volume =. 2017 , url =. 1706.05098 , timestamp =

  64. [64]

    Proceedings of the 31st ACM International Conference on Information & Knowledge Management , pages=

    Match-prompt: Improving multi-task generalization ability for neural text matching via prompt learning , author=. Proceedings of the 31st ACM International Conference on Information & Knowledge Management , pages=

  65. [65]

    Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer

    Noam Shazeer and Azalia Mirhoseini and Krzysztof Maziarz and Andy Davis and Quoc V. Le and Geoffrey E. Hinton and Jeff Dean , title =. CoRR , volume =. 2017 , url =. 1701.06538 , timestamp =

  66. [66]

    Exploring and Predicting Transferability across

    Tu Vu and Tong Wang and Tsendsuren Munkhdalai and Alessandro Sordoni and Adam Trischler and Andrew Mattarella. Exploring and Predicting Transferability across. CoRR , volume =. 2020 , url =. 2005.00770 , timestamp =

  67. [67]

    CoRR , volume =

    Armen Aghajanyan and Anchit Gupta and Akshat Shrivastava and Xilun Chen and Luke Zettlemoyer and Sonal Gupta , title =. CoRR , volume =. 2021 , url =. 2101.11038 , timestamp =

  68. [68]

    Multitask Prompted Training Enables Zero-Shot Task Generalization

    Victor Sanh and Albert Webson and Colin Raffel and Stephen H. Bach and Lintang Sutawika and Zaid Alyafeai and Antoine Chaffin and Arnaud Stiegler and Teven Le Scao and Arun Raja and Manan Dey and M. Saiful Bari and Canwen Xu and Urmish Thakker and Shanya Sharma and Eliza Szczechla and Taewoon Kim and Gunjan Chhablani and Nihal V. Nayak and Debajyoti Datta...

  69. [69]

    2022 , eprint=

    Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks , author=. 2022 , eprint=

  70. [70]

    2022 , eprint=

    Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning , author=. 2022 , eprint=

  71. [71]

    Finetuned Language Models Are Zero-Shot Learners

    Jason Wei and Maarten Bosma and Vincent Y. Zhao and Kelvin Guu and Adams Wei Yu and Brian Lester and Nan Du and Andrew M. Dai and Quoc V. Le , title =. CoRR , volume =. 2021 , url =. 2109.01652 , timestamp =

  72. [72]

    AdaShare: Learning What To Share For Efficient Deep Multi-Task Learning , journal =

    Ximeng Sun and Rameswar Panda and Rog. AdaShare: Learning What To Share For Efficient Deep Multi-Task Learning , journal =. 2019 , url =. 1911.12423 , timestamp =

  73. [73]

    Cross-stitch Networks for Multi-task Learning

    Ishan Misra and Abhinav Shrivastava and Abhinav Gupta and Martial Hebert , title =. CoRR , volume =. 2016 , url =. 1604.03539 , timestamp =

  74. [74]

    2023 , eprint=

    Multitask Prompt Tuning Enables Parameter-Efficient Transfer Learning , author=. 2023 , eprint=

  75. [75]

    BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension

    Mike Lewis and Yinhan Liu and Naman Goyal and Marjan Ghazvininejad and Abdelrahman Mohamed and Omer Levy and Veselin Stoyanov and Luke Zettlemoyer , title =. CoRR , volume =. 2019 , url =. 1910.13461 , timestamp =

  76. [76]

    Joty and Steven C

    Yue Wang and Weishi Wang and Shafiq R. Joty and Steven C. H. Hoi , title =. CoRR , volume =. 2021 , url =. 2109.00859 , timestamp =

  77. [77]

    2024 , eprint=

    Improving In-context Learning via Bidirectional Alignment , author=. 2024 , eprint=

  78. [78]

    BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

    Jacob Devlin and Ming. CoRR , volume =. 2018 , url =. 1810.04805 , timestamp =

  79. [79]

    Universal Language Model Fine-tuning for Text Classification

    Jeremy Howard and Sebastian Ruder , title =. CoRR , volume =. 2018 , url =. 1801.06146 , timestamp =

  80. [80]

    2023 , eprint=

    Sparse Low-rank Adaptation of Pre-trained Language Models , author=. 2023 , eprint=

Showing first 80 references.