Recognition: 2 theorem links
PEML: Parameter-efficient Multi-Task Learning with Optimized Continuous Prompts
Pith reviewed 2026-05-15 05:22 UTC · model grok-4.3
The pith
PEML co-optimizes continuous prompts, via neural architecture engineering, and model weights, via low-rank adaptation, to improve multi-task LLM performance.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
PEML employs a neural architecture engineering method to optimize continuous prompts while also performing low-rank adaptation of model weights. On the GLUE, SuperGLUE, Massive Multitask Language Understanding (MMLU), and commonsense reasoning benchmarks it delivers an average accuracy improvement of up to 6.67 percent, with individual tasks reaching peak gains of up to 10.75 percent over state-of-the-art multi-task baselines.
What carries the argument
A PEML framework that pairs neural-architecture-based prompt optimization with low-rank adaptation of model weights, co-optimizing prompts and weights for multi-task learning.
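The review does not reproduce PEML's actual architecture, so the following is a minimal sketch of the general pattern described here: trainable continuous prompt vectors prepended to the token sequence, combined with a LoRA-style low-rank update on a frozen projection. Class names, dimensions, and the single-layer setting are illustrative assumptions, not the paper's design; the prompt-architecture search itself is sketched separately under the Lean theorem entry below.

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update: W x + (alpha/r) * B A x."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                      # backbone weights stay frozen
        self.lora_a = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + (x @ self.lora_a.T @ self.lora_b.T) * self.scaling


class PromptedBlock(nn.Module):
    """Prepends trainable continuous prompts, then applies a LoRA-adapted projection."""

    def __init__(self, hidden: int = 768, prompt_len: int = 20):
        super().__init__()
        self.prompts = nn.Parameter(torch.randn(prompt_len, hidden) * 0.02)
        self.proj = LoRALinear(nn.Linear(hidden, hidden))

    def forward(self, token_embeds):                     # (batch, seq, hidden)
        batch = token_embeds.size(0)
        prompts = self.prompts.unsqueeze(0).expand(batch, -1, -1)
        x = torch.cat([prompts, token_embeds], dim=1)    # prompt tokens come first
        return self.proj(x)
```

Only the prompt vectors and the two low-rank matrices receive gradients, which is where the parameter-efficiency argument comes from.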
If this is right
- Outperforms MTL-LoRA, MultiLoRA, C-Poly, and MoE on GLUE, SuperGLUE, MMLU, and commonsense reasoning tasks.
- Delivers up to 6.67 percent average accuracy gain and up to 10.75 percent peak gains on individual tasks.
- Supports resource consolidation by fine-tuning one LLM for multiple tasks instead of separate models.
- Lowers overall data requirements for fine-tuning by leveraging shared features across tasks.
Where Pith is reading between the lines
- The same prompt-plus-adaptation pattern could be tested on non-language domains such as vision-language models to check transfer.
- If the gains hold at larger scales, PEML-style co-optimization might become a default template for any multi-task LLM deployment.
- Developers could explore whether the automated prompt-optimization framework reduces the need for task-specific prompt engineering expertise.
Load-bearing premise
The assumption that the neural architecture for prompt optimization combined with low-rank adaptation will consistently outperform existing methods across diverse tasks without introducing new overfitting risks or requiring extensive hyperparameter tuning.
What would settle it
Evaluating PEML on a fresh collection of multi-task benchmarks outside the original GLUE, SuperGLUE, MMLU, and commonsense sets and checking whether the reported accuracy margins disappear.
Original abstract
Parameter-Efficient Fine-Tuning (PEFT) is widely used for adapting Large Language Models (LLMs) for various tasks. Recently, there has been an increasing demand for fine-tuning a single LLM for multiple tasks because it requires overall less data for fine-tuning thanks to the common features shared among tasks. More importantly, LLMs are resource demanding and deploying a single model for multiple tasks facilitates resource consolidation and consumes significantly less resources compared to deploying individual large model for each task. Existing PEFT methods like LoRA and Prefix Tuning are designed to adapt LLMs to a specific task. LoRA and its variation focus on aligning the model itself for tasks, overlooking the importance of prompt tuning in multi-task learning while Prefix Tuning only adopts a simple architecture to optimize prompts, which limits the adaption capabilities for multi-task. To enable efficient fine-tuning for multi-task learning, it is important to co-optimize prompt optimization and model adaptation. In this work, we propose a Parameter-Efficient Multi-task Learning (\PM), which employs a neural architecture engineering method for optimizing the continuous prompts while also performing low-rank adaption for model weights. We prototype PEML by creating an automated framework for optimizing the continuous prompts and adapting model weights. We evaluate PEML against state-of-the-arts multi-task learning methods MTL-LoRA, MultiLoRa, C-Poly, and MoE, on the GLUE, SuperGLUE, Massive Multitask Language Understanding, and commonsense reasoning benchmarks. The evaluation results present an average accuracy improvement of up to 6.67%, with individual tasks showing peak gains of up to 10.75%.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes PEML, a parameter-efficient multi-task learning method that combines a neural architecture for optimizing continuous prompts with low-rank adaptation (LoRA) of model weights. It evaluates this approach against baselines including MTL-LoRA, MultiLoRA, C-Poly, and MoE on GLUE, SuperGLUE, MMLU, and commonsense reasoning benchmarks, claiming average accuracy improvements of up to 6.67% and peak per-task gains of up to 10.75%.
Significance. If the results prove robust under proper statistical validation and with full methodological disclosure, PEML could advance PEFT techniques by addressing the gap in prompt optimization for multi-task settings, enabling more resource-efficient adaptation of LLMs across tasks. The core idea of jointly engineering prompts and weights is a reasonable extension of existing work, but the current lack of architectural details and experimental rigor limits its immediate impact.
major comments (2)
- [Abstract/Evaluation] Abstract and Evaluation section: The headline performance claims (average +6.67%, peak +10.75%) rest exclusively on single-run point estimates with no reported variance, standard deviations, multiple random seeds, or statistical significance tests. This directly undermines the central claim of consistent outperformance over MTL-LoRA/MultiLoRA/C-Poly/MoE, as the extra degrees of freedom in the prompt optimizer and joint training schedule could favor PEML under favorable hyperparameter choices.
- [Methods] Methods section: The neural architecture for continuous prompt optimization is described only at a high level with no specifics on its structure, layer count, initialization, parameter count relative to baselines, or the exact joint optimization schedule with LoRA. Without these details the parameter-efficiency claim cannot be verified and reproduction is impossible.
minor comments (2)
- [Abstract] Abstract: The acronym is introduced as Parameter-Efficient Multi-task Learning (PM) but rendered as PEML in the title; standardize notation and expand on first use.
- [Abstract] Abstract: Typo 'adaption' should read 'adaptation' in the sentence describing Prefix Tuning limitations.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment point by point below and will revise the paper to incorporate the suggested improvements for greater rigor and reproducibility.
Point-by-point responses
-
Referee: [Abstract/Evaluation] Abstract and Evaluation section: The headline performance claims (average +6.67%, peak +10.75%) rest exclusively on single-run point estimates with no reported variance, standard deviations, multiple random seeds, or statistical significance tests. This directly undermines the central claim of consistent outperformance over MTL-LoRA/MultiLoRA/C-Poly/MoE, as the extra degrees of freedom in the prompt optimizer and joint training schedule could favor PEML under favorable hyperparameter choices.
Authors: We agree that reporting only single-run point estimates limits the strength of our performance claims. In the revised manuscript, we will rerun all experiments across multiple random seeds (minimum of five), report mean accuracies with standard deviations for PEML and all baselines, and include statistical significance tests (e.g., paired t-tests) to confirm that the observed gains of up to 6.67% average and 10.75% peak are robust rather than artifacts of a single favorable run. This will directly address the concern about extra degrees of freedom in the prompt optimizer. revision: yes
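A minimal sketch of the evaluation protocol the authors commit to here, assuming per-seed accuracies have already been collected for PEML and a baseline on the same seeds: report mean and standard deviation, then run a paired t-test. The function name and the choice of SciPy's paired test are illustrative; the paper does not specify its statistical procedure.

```python
import numpy as np
from scipy.stats import ttest_rel


def report_paired_comparison(peml_acc, baseline_acc, label="baseline"):
    """Report mean ± std over seeds and a paired t-test.

    `peml_acc` and `baseline_acc` are 1-D arrays of per-seed accuracies on the
    same task, with matching seed order (paired design).
    """
    peml = np.asarray(peml_acc, dtype=float)
    base = np.asarray(baseline_acc, dtype=float)
    t_stat, p_value = ttest_rel(peml, base)          # paired test: same seeds per run
    print(f"PEML     {peml.mean():.2f} ± {peml.std(ddof=1):.2f}")
    print(f"{label:8s} {base.mean():.2f} ± {base.std(ddof=1):.2f}")
    print(f"paired t = {t_stat:.2f}, p = {p_value:.4f}")
```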
-
Referee: [Methods] Methods section: The neural architecture for continuous prompt optimization is described only at a high level with no specifics on its structure, layer count, initialization, parameter count relative to baselines, or the exact joint optimization schedule with LoRA. Without these details the parameter-efficiency claim cannot be verified and reproduction is impossible.
Authors: We acknowledge the description of the prompt optimization network is currently high-level. In the revised Methods section we will add complete specifications: the exact architecture (number of layers, hidden size, activations), initialization procedure, total parameter count of the prompt optimizer relative to LoRA and other baselines, and the full joint training schedule (learning rates, optimizer, number of epochs, and how prompt and LoRA parameters are co-optimized). These additions will enable verification of the parameter-efficiency claims and full reproducibility. revision: yes
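One plausible way to implement the promised joint schedule is to give the prompt-optimizer parameters and the LoRA matrices separate optimizer groups while the backbone stays frozen, as sketched below. The learning rates, optimizer choice, and name-based grouping are assumptions for illustration, not the paper's reported settings.

```python
import torch


def build_optimizer(model: torch.nn.Module, prompt_lr: float = 1e-3, lora_lr: float = 5e-4):
    """Two parameter groups: continuous-prompt parameters and LoRA matrices.

    Assumes prompt parameters have 'prompt' in their name and LoRA parameters
    have 'lora' in their name; the naming convention is illustrative.
    """
    prompt_params, lora_params = [], []
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue                                 # skip frozen backbone weights
        if "prompt" in name:
            prompt_params.append(param)
        elif "lora" in name:
            lora_params.append(param)
    return torch.optim.AdamW([
        {"params": prompt_params, "lr": prompt_lr},
        {"params": lora_params, "lr": lora_lr},
    ])
```

With the PromptedBlock sketch above, whose trainable parameters are named `prompts`, `proj.lora_a`, and `proj.lora_b`, `build_optimizer(PromptedBlock())` places them in the intended groups.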
Circularity Check
No circularity: empirical method proposal with benchmark evaluation
full rationale
The paper proposes PEML as a neural prompt optimizer plus LoRA for multi-task PEFT and reports accuracy gains on GLUE/SuperGLUE/MMLU/commonsense suites versus MTL-LoRA, MultiLoRA, C-Poly, and MoE. No derivation chain, equations, or 'predictions' are present that reduce to fitted inputs or self-citations by construction. All claims rest on direct experimental comparisons; the architecture choices and optimization are described as novel contributions rather than derived from prior self-referential results.
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel (tagged: unclear)
Relation between the paper passage and the cited Recognition theorem is unclear.
Paper passage: PrefixNAS generates candidate prefix architectures A_i(α) via continuous relaxation and softmax, then prunes to the argmax operation.
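The quoted passage describes a DARTS-style continuous relaxation: a softmax over architecture parameters α mixes candidate operations during search, and only the highest-weight operation is kept afterwards. The sketch below illustrates that pattern with an assumed three-operation candidate set; it is not PEML's actual PrefixNAS search space.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MixedOp(nn.Module):
    """Softmax-weighted mixture over candidate ops, prunable to the argmax op."""

    def __init__(self, hidden: int = 768):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Identity(),                                      # assumed candidate ops
            nn.Linear(hidden, hidden),
            nn.Sequential(nn.Linear(hidden, hidden), nn.GELU()),
        ])
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))   # architecture parameters

    def forward(self, x):
        weights = F.softmax(self.alpha, dim=0)                  # continuous relaxation
        return sum(w * op(x) for w, op in zip(weights, self.ops))

    def prune(self):
        """After search, keep only the operation with the largest alpha weight."""
        return self.ops[int(torch.argmax(self.alpha))]
```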
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.