Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey
Pith reviewed 2026-05-13 11:28 UTC · model grok-4.3
The pith
Parameter-efficient fine-tuning adapts large pre-trained models to new tasks while adding only a small number of parameters.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
PEFT refers to the process of adjusting the parameters of a pre-trained large model to adapt it to a specific task or domain while minimizing the number of additional parameters introduced or computational resources required. The survey presents comprehensive studies of various PEFT algorithms, examining their performance and computational overhead, provides an overview of applications developed using different PEFT algorithms, discusses common techniques to mitigate computation costs, and examines various real-world system designs to investigate the implementation costs.
What carries the argument
Parameter-Efficient Fine-Tuning (PEFT) algorithms that minimize the number of additional parameters or computational resources when adapting pre-trained large models.
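To make "minimizing the number of additional parameters" concrete, here is a minimal sketch of one well-known PEFT family: a low-rank additive update to a frozen weight matrix, the idea behind LoRA-style methods. It is an illustration under stated assumptions, not code from the survey. The helper names (`lowrank_forward`, `count_params`), the toy matrices, and the layer sizes are all hypothetical.

```python
def transpose(M):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def lowrank_forward(x, W, A, B):
    """Compute x @ (W + B @ A)^T without forming the updated matrix.

    W (d_out x d_in) is the frozen pre-trained weight; only the small
    factors A (r x d_in) and B (d_out x r) would be trained.
    """
    base = matmul(x, transpose(W))
    update = matmul(matmul(x, transpose(A)), transpose(B))
    return [[b + u for b, u in zip(rb, ru)] for rb, ru in zip(base, update)]

def count_params(d_out, d_in, r):
    """Return (parameters in the full matrix, trainable low-rank parameters)."""
    return d_out * d_in, r * (d_in + d_out)

# Toy values, hypothetical and chosen only so the arithmetic is easy to follow.
W = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]   # frozen, 3 x 4
A = [[1, 1, 1, 1]]                               # trainable, rank r = 1
x = [[1, 2, 3, 4]]

y_frozen = lowrank_forward(x, W, A, [[0], [0], [0]])    # zero B: output unchanged
y_updated = lowrank_forward(x, W, A, [[1], [0], [0]])   # nonzero B shifts the output

full, trainable = count_params(d_out=1024, d_in=1024, r=8)
```

For the assumed 1024 x 1024 layer with rank 8, the update trains 16,384 values against 1,048,576 in the full matrix, roughly 1.6%. Other PEFT families (adapters, prompt or prefix tuning) reach a similar saving by different means.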
If this is right
- Large models become adaptable on hardware platforms with constrained computational capabilities.
- The computational costs associated with customizing large models for downstream tasks are significantly reduced.
- Broader applications of large models are enabled across various fields through efficient adaptation methods.
Where Pith is reading between the lines
- Standardized evaluation benchmarks could allow more reliable direct comparisons between different PEFT methods.
- PEFT techniques may generalize to other model types like vision transformers with similar efficiency benefits.
- Hardware architectures optimized for sparse or partial updates could amplify the advantages of PEFT approaches.
Load-bearing premise
That the selected PEFT methods and system designs are representative of the full current landscape, and that performance comparisons drawn from cited works are directly comparable across papers.
What would settle it
A unified benchmark experiment running all surveyed PEFT methods on the same set of tasks and hardware to verify or contradict the performance and overhead summaries.
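The shape of such an experiment can be sketched as a grid that runs every method on every task under one shared protocol. This is a hypothetical harness, not something the survey provides: `run_grid`, `dummy_method`, and the method and task names are placeholders, and the stand-in methods return random scores that carry no meaning.

```python
import random
from itertools import product

def run_grid(methods, tasks, seed=0):
    """Evaluate every method on every task under one shared protocol.

    `methods` maps a name to a callable(task, rng) returning a dict with
    keys "score" and "trainable_params"; `tasks` is a list of task names.
    """
    results = {}
    for (name, method), task in product(methods.items(), tasks):
        rng = random.Random(seed)  # identical seed for every cell of the grid
        results[(name, task)] = method(task, rng)
    return results

def dummy_method(trainable_params):
    """Build a stand-in method; a real harness would fine-tune and evaluate here."""
    def run(task, rng):
        return {"score": rng.random(), "trainable_params": trainable_params}
    return run

# Placeholder methods and tasks, invented for illustration only.
methods = {"method_a": dummy_method(10_000), "method_b": dummy_method(50_000)}
tasks = ["task_1", "task_2"]
results = run_grid(methods, tasks)
```

The point of the design is that nothing varies between cells except the method and the task. A real study would also have to fix the base model, data splits, and hardware, which this sketch does not model.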
Original abstract
Large models represent a groundbreaking advancement in multiple application fields, enabling remarkable achievements across various tasks. However, their unprecedented scale comes with significant computational costs. These models, often consisting of billions of parameters, require vast amounts of computational resources for execution. Especially, the expansive scale and computational demands pose considerable challenges when customizing them for particular downstream tasks, particularly over the hardware platforms constrained by computational capabilities. Parameter Efficient Fine-Tuning (PEFT) provides a practical solution by efficiently adjusting the large models over the various downstream tasks. In particular, PEFT refers to the process of adjusting the parameters of a pre-trained large model to adapt it to a specific task or domain while minimizing the number of additional parameters introduced or computational resources required. This approach is particularly important when dealing with large-scale language models with high parameter counts, as fine-tuning these models from scratch can be computationally expensive and resource-intensive, posing considerable challenges in the supporting system platform design. In this survey, we present comprehensive studies of various PEFT algorithms, examining their performance and computational overhead. Moreover, we provide an overview of applications developed using different PEFT algorithms and discuss common techniques employed to mitigate computation costs for PEFT. In addition to providing an extensive survey from an algorithmic standpoint, we also examine various real-world system designs to investigate the implementation costs associated with different PEFT approaches. This survey serves as a valuable resource for researchers aiming to understand both the PEFT algorithm and its system implementation, offering detailed ......
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper surveys Parameter-Efficient Fine-Tuning (PEFT) methods for large models, presenting comprehensive studies of various algorithms with examinations of their performance and computational overhead, an overview of applications developed using different PEFT approaches, common techniques to mitigate computation costs, and analysis of real-world system designs for implementation costs.
Significance. If the coverage is representative and comparisons are presented with appropriate caveats, this survey would serve as a useful reference for researchers working on efficient adaptation of large models, bridging algorithmic PEFT techniques with practical system-level considerations in resource-constrained settings.
major comments (2)
- [Performance and computational overhead examination sections] In the sections presenting performance and overhead comparisons (e.g., the algorithmic performance studies and system implementation analysis), aggregated metrics from cited works are contrasted directly. However, the source papers use varying base model families/sizes, training datasets, evaluation protocols, and hardware, without the survey re-implementing methods under a common benchmark or systematically normalizing for these differences. This assumption that cross-paper numbers are interchangeable is load-bearing for the central claim of examining relative performance and overhead.
- [Applications and mitigation techniques overview] The overview of applications and mitigation techniques would be strengthened by explicit discussion of how representative the selected methods are of the full landscape, including any gaps in coverage of recent variants or domain-specific adaptations.
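As an illustration of the normalization raised in the first major comment, one simple option is to express each reported score as a fraction of the same paper's own full fine-tuning baseline. This is a sketch of one possible normalization, not a procedure taken from the survey or the report. The function name `retained_fraction`, the record fields, and all numbers are placeholders invented for illustration.

```python
def retained_fraction(reports):
    """Express each reported score as a fraction of that paper's own
    full fine-tuning baseline, giving a common relative scale."""
    out = {}
    for r in reports:
        if r["full_ft_score"] <= 0:
            raise ValueError(f"non-positive baseline for {r['method']}")
        out[r["method"]] = r["peft_score"] / r["full_ft_score"]
    return out

# Placeholder numbers, not results from any paper.
reports = [
    {"method": "method_a", "peft_score": 80.0, "full_ft_score": 100.0},
    {"method": "method_b", "peft_score": 45.0, "full_ft_score": 50.0},
]
fractions = retained_fraction(reports)
```

In this made-up case the raw scores favor `method_a` (80 against 45), while the relative view favors `method_b` (0.9 against 0.8 of its baseline). A ratio like this does not remove differences in datasets, protocols, or hardware; it only shows how much the ranking can depend on the frame of comparison.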
minor comments (3)
- [Abstract] The abstract contains some repetitive phrasing in the definition of PEFT; tightening this would improve readability.
- [Figures and tables] Figures summarizing method taxonomies or comparison tables could include clearer captions noting the source of each reported metric to aid interpretation.
- [References] A few citations appear to reference preprints; confirming they have been updated to published versions where applicable would be helpful.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our survey. The comments have prompted us to strengthen the discussion of methodological caveats and coverage scope. We have revised the manuscript accordingly and believe these changes improve its utility as a reference.
Point-by-point responses
- Referee: In the sections presenting performance and overhead comparisons (e.g., the algorithmic performance studies and system implementation analysis), aggregated metrics from cited works are contrasted directly. However, the source papers use varying base model families/sizes, training datasets, evaluation protocols, and hardware, without the survey re-implementing methods under a common benchmark or systematically normalizing for these differences. This assumption that cross-paper numbers are interchangeable is load-bearing for the central claim of examining relative performance and overhead.
  Authors: We acknowledge the heterogeneity in experimental setups across the cited literature and agree that direct numerical comparisons carry inherent limitations. Our original manuscript already includes brief caveats in the performance tables and system analysis sections noting differences in base models and hardware. To address this more explicitly, we have added a new subsection titled 'Caveats in Cross-Paper Comparisons' that systematically discusses variations in model families, datasets, evaluation protocols, and hardware. We also added notes to key tables highlighting representative setups from source papers and emphasized that the compiled metrics are intended for qualitative trends rather than precise quantitative ranking. Re-implementing all methods under a unified benchmark exceeds the scope of a survey; such efforts are better suited to dedicated benchmark studies. We believe the expanded discussion now makes the load-bearing assumptions transparent while preserving the value of the aggregated overview.
  Revision: partial
- Referee: The overview of applications and mitigation techniques would be strengthened by explicit discussion of how representative the selected methods are of the full landscape, including any gaps in coverage of recent variants or domain-specific adaptations.
  Authors: We appreciate this suggestion for improving transparency. The original manuscript selected methods based on citation impact, recency, and diversity across algorithmic families. In the revised version, we have inserted a dedicated paragraph in the 'Applications' and 'Mitigation Techniques' sections (and a short 'Scope and Limitations' subsection) that explicitly states our selection criteria, notes that the rapidly evolving PEFT landscape means some recent variants (e.g., new adapter compositions or domain-specific adaptations in vision-language or RL settings) may not be exhaustively covered, and highlights potential gaps such as limited coverage of certain low-resource domains. This addition clarifies representativeness without altering the core content.
  Revision: yes
Circularity Check
No circularity: purely descriptive survey with no derivations or predictions
This paper is a literature survey that aggregates and describes existing PEFT algorithms, their reported performance, applications, and system implementations. It contains no original equations, derivations, fitted parameters, or predictions that could reduce to inputs by construction. The central claims are overviews of prior work rather than new results justified by self-referential steps. No self-citation chains, ansatzes, or uniqueness theorems are invoked to force conclusions. Performance comparisons are presented as reported in source papers without re-derivation, so no fitted-input-called-prediction pattern applies. The survey is self-contained as a descriptive resource and does not rely on any load-bearing circular reductions.
Axiom & Free-Parameter Ledger
Forward citations
Cited by 26 Pith papers
- LOFT: Low-Rank Orthogonal Fine-Tuning via Task-Aware Support Selection
  LOFT unifies orthogonal PEFT by treating adaptation as low-rank subspace rotation and adds task-aware support selection that improves efficiency under fixed budgets.
- Revisiting Privacy Leakage in Machine Unlearning: Membership Inference Beyond the Forgotten Set
  Unlearning increases privacy leakage for the retain set, and a new tri-class membership inference attack distinguishes forget, retain, and unseen data using pre- and post-unlearning model outputs.
- Rectification Difficulty and Optimal Sample Allocation in LLM-Augmented Surveys
  A method using predicted rectification difficulty for optimal human sample allocation in LLM-augmented surveys captures 61-79% of theoretical efficiency gains and reduces MSE by 11% on two datasets without pilot data.
- HeiSD: Hybrid Speculative Decoding for Embodied Vision-Language-Action Models with Kinematic Awareness
  HeiSD delivers up to 2.45x faster inference for embodied VLA models by hybridizing speculative decoding with kinematic boundary detection and error-mitigation tricks while preserving task success rates.
- Towards the Next Frontier of LLMs, Training on Private Data: A Cross-Domain Benchmark for Federated Fine-Tuning
  Federated PEFT on LLMs across healthcare and finance datasets performs close to centralized training and beats isolated local training under non-IID conditions.
- Black-box model classification under the discriminative factorization
  Discriminative factorization distinguishes high-quality query sets for black-box model classification, with chance-level error decaying exponentially in query budget and parameters predicting empirical decay rates on ...
- Pretraining Induces a Reusable Spectral Basis for Downstream Task Adaptation
  Pretraining induces stable leading singular vectors that form a reusable spectral basis inherited by downstream tasks, enabling competitive performance with 0.2% trainable parameters on GLUE.
- Direct-to-Event Spiking Neural Network Transfer
  This work provides the first systematic study of transferring direct-coded spiking neural networks to event-based representations while aiming to preserve accuracy and reduce energy use.
- From History to State: Constant-Context Skill Learning for LLM Agents
  Constant-context skill learning trains reusable task-family modules for LLM agents using a deterministic state block for progress tracking and subgoal rewards, achieving 89.6% unseen success on ALFWorld, 76.8% on WebS...
- You Snooze, You Lose: Automatic Safety Alignment Restoration through Neural Weight Translation
  NeWTral is a non-linear weight translation framework using MoE routing that reduces average attack success rate from 70% to 13% on unsafe domain adapters across Llama, Mistral, Qwen, and Gemma models up to 72B while r...
- TLoRA: Task-aware Low Rank Adaptation of Large Language Models
  TLoRA jointly optimizes LoRA initialization via task-data SVD and sensitivity-driven rank allocation, delivering stronger results than standard LoRA across NLU, reasoning, math, code, and chat tasks while using fewer ...
- BioTrain: Sub-MB, Sub-50mW On-Device Fine-Tuning for Edge-AI on Biosignals
  BioTrain enables full-network fine-tuning of biosignal AI models on edge MCUs with sub-MB memory and sub-50mW power, delivering up to 35% accuracy gains and 8.1x memory reduction.
- MP-ISMoE: Mixed-Precision Interactive Side Mixture-of-Experts for Efficient Transfer Learning
  MP-ISMoE uses Gaussian noise perturbed iterative quantization and interactive side mixture-of-experts to deliver higher accuracy than prior memory-efficient transfer learning methods while keeping similar parameter an...
- The Art of (Mis)alignment: How Fine-Tuning Methods Effectively Misalign and Realign LLMs in Post-Training
  ORPO is most effective at misaligning LLMs while DPO excels at realigning them, though it reduces utility, revealing an asymmetry between attack and defense methods.
- Pretrain-then-Adapt: Uncertainty-Aware Test-Time Adaptation for Text-based Person Search
  UATTA adapts pre-trained text-image models at test time without labels by using disagreement in bidirectional retrieval rankings to estimate and mitigate uncertainty for improved person search.
- Fine-Tuning Integrity for Modern Neural Networks: Structured Drift Proofs via Norm, Rank, and Sparsity Certificates
  Succinct Model Difference Proofs certify that a neural-network update stays inside a policy-defined drift class using zero-knowledge proofs whose cost depends only on the drift structure.
- BLK-Assist: A Methodological Framework for Artist-Led Co-Creation with Generative AI Models
  BLK-Assist is a three-part framework (Conceptor for sketches, Stencil for transparent assets, Upscale for high-res outputs) that fine-tunes public diffusion models on one artist's proprietary corpus for style-faithful...
- HiP-LoRA: Budgeted Spectral Plasticity for Robust Low-Rank Adaptation
  HiP-LoRA decomposes LoRA updates into principal and residual spectral channels with a singular-value-weighted stability budget to reduce forgetting and interference during foundation model adaptation.
- Cross-Lingual Attention Distillation with Personality-Informed Generative Augmentation for Multilingual Personality Recognition
  ADAM uses personality-guided LLM augmentation and cross-lingual attention distillation to raise balanced accuracy on multilingual personality recognition to 0.6332 on Essays and 0.7448 on Kaggle, outperforming standar...
- Assessment of RAG and Fine-Tuning for Industrial Question-Answering-Applications
  RAG is more effective and cost-efficient than fine-tuning for industrial QA adaptation on automotive datasets.
- Low-Rank Adaptation of Geospatial Foundation Models for Wildfire Mapping Using Sentinel-2 Data
  LoRA-adapted Prithvi-v2 achieves the highest accuracy and best cross-domain generalization for burned-area mapping on Sentinel-2 data compared to full fine-tuning across 3,820 wildfire events.
- From Weights to Activations: Is Steering the Next Frontier of Adaptation?
  Steering is positioned as a distinct adaptation paradigm that uses targeted activation interventions for local, reversible behavioral changes without parameter updates.
- Low-Rank Adaptation Redux for Large Models
  An overview revisits LoRA variants by categorizing advances in architectural design, efficient optimization, and applications while linking them to classical signal processing tools for principled fine-tuning.
- High-Dimensional Statistics: Reflections on Progress and Open Problems
  A survey synthesizing representative advances, common themes, and open problems in high-dimensional statistics while pointing to key entry-point works.
- NTIRE 2026 Challenge on Bitstream-Corrupted Video Restoration: Methods and Results
  The NTIRE 2026 Challenge establishes a benchmark for bitstream-corrupted video restoration and summarizes the top methods and observed trends from participating teams.
- Redefining End-of-Life: Intelligent Automation for Electronics Remanufacturing Systems
  A literature review of intelligent automation approaches using robotics, AI, and control for disassembly, inspection, sorting, and reprocessing of end-of-life electronics.
Reference graph
Works this paper leans on
-
[1]
Language mod- els are few-shot learners,
T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell et al., “Language mod- els are few-shot learners,” Advances in neural information processing systems, vol. 33, pp. 1877–1901, 2020
work page 1901
-
[2]
Toolqa: A dataset for llm question answering with external tools,
Y . Zhuang, Y . Yu, K. Wang, H. Sun, and C. Zhang, “Toolqa: A dataset for llm question answering with external tools,” arXiv preprint arXiv:2306.13304, 2023
-
[3]
W. Zhu, H. Liu, Q. Dong, J. Xu, L. Kong, J. Chen, L. Li, and S. Huang, “Multilingual machine translation with large language models: Empir- ical results and analysis,” arXiv preprint arXiv:2304.04675 , 2023
-
[4]
A survey on large language models: Applications, challenges, limitations, and practical usage,
M. U. Hadi, R. Qureshi, A. Shah, M. Irfan, A. Zafar, M. Shaikh, N. Akhtar, J. Wu, and S. Mirjalili, “A survey on large language models: Applications, challenges, limitations, and practical usage,” TechRxiv, 2023
work page 2023
-
[5]
Gentopia: A collaborative platform for tool-augmented llms,
B. Xu, X. Liu, H. Shen, Z. Han, Y . Li, M. Yue, Z. Peng, Y . Liu, Z. Yao, and D. Xu, “Gentopia: A collaborative platform for tool-augmented llms,” arXiv preprint arXiv:2308.04030 , 2023
-
[6]
Camel: Communicative agents for
G. Li, H. A. A. K. Hammoud, H. Itani, D. Khizbullin, and B. Ghanem, “Camel: Communicative agents for ”mind” exploration of large lan- guage model society,” in Thirty-seventh Conference on Neural Infor- mation Processing Systems , 2023
work page 2023
-
[7]
AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation
Q. Wu, G. Bansal, J. Zhang, Y . Wu, S. Zhang, E. Zhu, B. Li, L. Jiang, X. Zhang, and C. Wang, “Autogen: Enabling next-gen llm applications via multi-agent conversation framework,” arXiv preprint arXiv:2308.08155, 2023
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[8]
Summit: Iterative text summarization via chatgpt,
H. Zhang, X. Liu, and J. Zhang, “Summit: Iterative text summarization via chatgpt,” arXiv preprint arXiv:2305.14835 , 2023
-
[9]
Root mean square layer normalization,
B. Zhang and R. Sennrich, “Root mean square layer normalization,” Advances in Neural Information Processing Systems , vol. 32, 2019
work page 2019
-
[10]
RoFormer: Enhanced Transformer with Rotary Position Embedding
J. Su, Y . Lu, S. Pan, A. Murtadha, B. Wen, and Y . Liu, “Roformer: Enhanced transformer with rotary position embedding,” arXiv preprint arXiv:2104.09864, 2021
work page internal anchor Pith review Pith/arXiv arXiv 2021
-
[11]
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
A. Wang, A. Singh, J. Michael, F. Hill, O. Levy, and S. R. Bowman, “Glue: A multi-task benchmark and analysis platform for natural language understanding,” arXiv preprint arXiv:1804.07461 , 2018
work page internal anchor Pith review Pith/arXiv arXiv 2018
-
[12]
Can a suit of armor conduct electricity? a new dataset for open book question answering,
T. Mihaylov, P. Clark, T. Khot, and A. Sabharwal, “Can a suit of armor conduct electricity? a new dataset for open book question answering,” in EMNLP, 2018
work page 2018
-
[13]
Piqa: Reasoning about physical commonsense in natural language,
Y . Bisk, R. Zellers, R. L. Bras, J. Gao, and Y . Choi, “Piqa: Reasoning about physical commonsense in natural language,” in Thirty-Fourth AAAI Conference on Artificial Intelligence , 2020
work page 2020
-
[14]
SocialIQA: Commonsense Reasoning about Social Interactions
M. Sap, H. Rashkin, D. Chen, R. LeBras, and Y . Choi, “Socialiqa: Commonsense reasoning about social interactions,” arXiv preprint arXiv:1904.09728, 2019
work page internal anchor Pith review arXiv 1904
-
[15]
Hellaswag: Can a machine really finish your sentence?
R. Zellers, A. Holtzman, Y . Bisk, A. Farhadi, and Y . Choi, “Hellaswag: Can a machine really finish your sentence?” in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics , 2019
work page 2019
-
[16]
Boolq: Exploring the surprising difficulty of natural yes/no questions,
C. e. a. Clark, “Boolq: Exploring the surprising difficulty of natural yes/no questions,” in NAACL, 2019
work page 2019
-
[17]
Winogrande: An adversarial winograd schema challenge at scale,
K. Sakaguchi, R. L. Bras, C. Bhagavatula, and Y . Choi, “Winogrande: An adversarial winograd schema challenge at scale,” Communications of the ACM , vol. 64, no. 9, pp. 99–106, 2021
work page 2021
-
[18]
Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge
P. Clark, I. Cowhey, O. Etzioni, T. Khot, A. Sabharwal, C. Schoenick, and O. Tafjord, “Think you have solved question answering? try arc, the ai2 reasoning challenge,” arXiv:1803.05457v1, 2018
work page internal anchor Pith review Pith/arXiv arXiv 2018
-
[19]
The Kinetics Human Action Video Dataset
W. Kay, J. Carreira, K. Simonyan, B. Zhang, C. Hillier, S. Vijaya- narasimhan, F. Viola, T. Green, T. Back, P. Natsev et al., “The kinetics human action video dataset,” arXiv preprint arXiv:1705.06950 , 2017
work page internal anchor Pith review Pith/arXiv arXiv 2017
-
[20]
R. Goyal, S. Ebrahimi Kahou, V . Michalski, J. Materzynska, S. West- phal, H. Kim, V . Haenel, I. Fruend, P. Yianilos, M. Mueller-Freitag et al. , “The” something something” video database for learning and evaluating visual common sense,” in Proceedings of the IEEE interna- tional conference on computer vision , 2017, pp. 5842–5850
work page 2017
-
[21]
Hmdb: a large video database for human motion recognition,
H. Kuehne, H. Jhuang, E. Garrote, T. Poggio, and T. Serre, “Hmdb: a large video database for human motion recognition,” in 2011 Interna- tional conference on computer vision . IEEE, 2011, pp. 2556–2563
work page 2011
-
[22]
Microsoft coco: Common objects in context,
T.-Y . Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Doll ´ar, and C. L. Zitnick, “Microsoft coco: Common objects in context,” in Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13 . Springer, 2014, pp. 740–755
work page 2014
-
[23]
Scene parsing through ade20k dataset,
B. Zhou, H. Zhao, X. Puig, S. Fidler, A. Barriuso, and A. Torralba, “Scene parsing through ade20k dataset,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 633– 641
work page 2017
-
[24]
The pascal visual object classes (voc) challenge,
M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisser- man, “The pascal visual object classes (voc) challenge,” International journal of computer vision , vol. 88, pp. 303–338, 2010
work page 2010
-
[25]
Parameter-efficient fine-tuning of large-scale pre-trained language models,
N. Ding, Y . Qin, G. Yang, F. Wei, Z. Yang, Y . Su, S. Hu, Y . Chen, C.- M. Chan, W. Chen et al., “Parameter-efficient fine-tuning of large-scale pre-trained language models,” Nature Machine Intelligence , vol. 5, no. 3, pp. 220–235, 2023
work page 2023
-
[26]
L. Xu, H. Xie, S.-Z. J. Qin, X. Tao, and F. L. Wang, “Parameter- efficient fine-tuning methods for pretrained language models: A critical review and assessment,” arXiv preprint arXiv:2312.12148 , 2023
-
[27]
Empirical analysis of the strengths and weaknesses of peft techniques for llms,
G. Pu, A. Jain, J. Yin, and R. Kaplan, “Empirical analysis of the strengths and weaknesses of peft techniques for llms,” arXiv preprint arXiv:2304.14999, 2023
- [28]
-
[29]
Microsoft azure function trace,
Microsoft, “Microsoft azure function trace,” https://github.com/Azure/AzurePublicDataset, 2023
work page 2023
-
[30]
Analysis, modeling and simulation of workload patterns in a large-scale utility cloud,
I. S. Moreno, P. Garraghan, P. Townend, and J. Xu, “Analysis, modeling and simulation of workload patterns in a large-scale utility cloud,”IEEE Transactions on Cloud Computing , vol. 2, no. 2, pp. 208–221, 2014
work page 2014
-
[31]
Parameter-efficient transfer learning for nlp,
N. Houlsby, A. Giurgiu, S. Jastrzebski, B. Morrone, Q. De Laroussilhe, A. Gesmundo, M. Attariyan, and S. Gelly, “Parameter-efficient transfer learning for nlp,” in International Conference on Machine Learning . PMLR, 2019, pp. 2790–2799
work page 2019
-
[32]
Towards a unified view of parameter-efficient transfer learning
J. He, C. Zhou, X. Ma, T. Berg-Kirkpatrick, and G. Neubig, “Towards a unified view of parameter-efficient transfer learning,” arXiv preprint arXiv:2110.04366, 2021
-
[33]
Counter- interference adapter for multilingual machine translation,
Y . Zhu, J. Feng, C. Zhao, M. Wang, and L. Li, “Counter- interference adapter for multilingual machine translation,” arXiv preprint arXiv:2104.08154, 2021
-
[34]
Conditional adapters: Parameter-efficient transfer learning with fast inference,
T. Lei, J. Bai, S. Brahma, J. Ainslie, K. Lee, Y . Zhou, N. Du, V . Y . Zhao, Y . Wu, B. Li et al. , “Conditional adapters: Parameter-efficient transfer learning with fast inference,” arXiv preprint arXiv:2304.04947, 2023
-
[35]
AdapterFusion: Non-Destructive Task Composition for Transfer Learning , journal =
J. Pfeiffer, A. Kamath, A. R ¨uckl´e, K. Cho, and I. Gurevych, “Adapter- fusion: Non-destructive task composition for transfer learning,” arXiv preprint arXiv:2005.00247, 2020
-
[36]
Adamix: Mixture-of-adapter for parameter-efficient tuning of large language models,
Y . Wang, S. Mukherjee, X. Liu, J. Gao, A. H. Awadallah, and J. Gao, “Adamix: Mixture-of-adapter for parameter-efficient tuning of large language models,” arXiv preprint arXiv:2205.12410, vol. 1, no. 2, p. 4, 2022
-
[37]
Prototype-based hyperadapter for sample- efficient multi-task tuning,
H. Zhao, J. Fu, and Z. He, “Prototype-based hyperadapter for sample- efficient multi-task tuning,” arXiv preprint arXiv:2310.11670 , 2023
-
[38]
Adaptersoup: Weight averaging to improve generalization of pretrained language models,
A. Chronopoulou, M. E. Peters, A. Fraser, and J. Dodge, “Adaptersoup: Weight averaging to improve generalization of pretrained language models,” arXiv preprint arXiv:2302.07027 , 2023
-
[39]
Mera: Merging pretrained adapters for few-shot learning,
S. He, R.-Z. Fan, L. Ding, L. Shen, T. Zhou, and D. Tao, “Mera: Merging pretrained adapters for few-shot learning,” arXiv preprint arXiv:2308.15982, 2023
-
[40]
arXiv preprint arXiv:2106.04489 , year=
R. K. Mahabadi, S. Ruder, M. Dehghani, and J. Henderson, “Parameter- efficient multi-task fine-tuning for transformers via shared hypernet- works,” arXiv preprint arXiv:2106.04489 , 2021
-
[41]
Prefix-Tuning: Optimizing Continuous Prompts for Generation
X. L. Li and P. Liang, “Prefix-tuning: Optimizing continuous prompts for generation,” arXiv preprint arXiv:2101.00190 , 2021
work page internal anchor Pith review Pith/arXiv arXiv 2021
-
[42]
Prefix propaga- tion: Parameter-efficient tuning for long sequences,
J. Li, W. Aitken, R. Bhambhoria, and X. Zhu, “Prefix propaga- tion: Parameter-efficient tuning for long sequences,” arXiv preprint arXiv:2305.12086, 2023
-
[43]
arXiv preprint arXiv:2110.07602 , year=
X. Liu, K. Ji, Y . Fu, W. L. Tam, Z. Du, Z. Yang, and J. Tang, “P-tuning v2: Prompt tuning can be comparable to fine-tuning universally across scales and tasks,” arXiv preprint arXiv:2110.07602 , 2021. 21
-
[44]
Towards adaptive prefix tuning for parameter-efficient language model fine-tuning,
Z.-R. Zhang, C. Tan, H. Xu, C. Wang, J. Huang, and S. Huang, “Towards adaptive prefix tuning for parameter-efficient language model fine-tuning,” arXiv preprint arXiv:2305.15212 , 2023
-
[45]
arXiv preprint arXiv:2103.10385 , year=
X. Liu, Y . Zheng, Z. Du, M. Ding, Y . Qian, Z. Yang, and J. Tang, “Gpt understands, too,” arXiv preprint arXiv:2103.10385 , 2021
-
[46]
The Power of Scale for Parameter-Efficient Prompt Tuning
B. Lester, R. Al-Rfou, and N. Constant, “The power of scale for parameter-efficient prompt tuning,” arXiv preprint arXiv:2104.08691 , 2021
work page internal anchor Pith review Pith/arXiv arXiv 2021
-
[47]
Xprompt: Exploring the extreme of prompt tuning,
F. Ma, C. Zhang, L. Ren, J. Wang, Q. Wang, W. Wu, X. Quan, and D. Song, “Xprompt: Exploring the extreme of prompt tuning,” arXiv preprint arXiv:2210.04457, 2022
-
[48]
Idpg: An instance-dependent prompt generation method,
Z. Wu, S. Wang, J. Gu, R. Hou, Y . Dong, V . Vydiswaran, and H. Ma, “Idpg: An instance-dependent prompt generation method,” arXiv preprint arXiv:2204.04497 , 2022
-
[49]
Late prompt tuning: A late prompt could be better than many prompts,
X. Liu, T. Sun, X. Huang, and X. Qiu, “Late prompt tuning: A late prompt could be better than many prompts,” arXiv preprint arXiv:2210.11292, 2022
-
[50]
Spt: Learning to selectively insert prompts for better prompt tuning,
W. Zhu and M. Tan, “Spt: Learning to selectively insert prompts for better prompt tuning,” in Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023, pp. 11 862– 11 878
work page 2023
-
[51]
Aprompt: Attention prompt tuning for efficient adaptation of pre-trained language models,
Q. Wang, Y . Mao, J. Wang, H. Yu, S. Nie, S. Wang, F. Feng, L. Huang, X. Quan, Z. Xu et al., “Aprompt: Attention prompt tuning for efficient adaptation of pre-trained language models,” in Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing , 2023, pp. 9147–9160
work page 2023
-
[52]
Spot: Better frozen model adaptation through soft prompt transfer,
T. Vu, B. Lester, N. Constant, R. Al-Rfou, and D. Cer, “Spot: Better frozen model adaptation through soft prompt transfer,” arXiv preprint arXiv:2110.07904, 2021
-
[53]
On Transferability of Prompt Tuning for Natural Language Understanding , journal =
Y . Su, X. Wang, Y . Qin, C.-M. Chan, Y . Lin, H. Wang, K. Wen, Z. Liu, P. Li, J. Li et al. , “On transferability of prompt tuning for natural language processing,” arXiv preprint arXiv:2111.06719 , 2021
-
[54]
Infoprompt: Information-theoretic soft prompt tuning for natural language understanding,
J. Wu, T. Yu, R. Wang, Z. Song, R. Zhang, H. Zhao, C. Lu, S. Li, and R. Henao, “Infoprompt: Information-theoretic soft prompt tuning for natural language understanding,” arXiv preprint arXiv:2306.04933, 2023
-
[55]
Ptp: Boosting stability and performance of prompt tuning with perturbation-based regularizer,
L. Chen, H. Huang, and M. Cheng, “Ptp: Boosting stability and performance of prompt tuning with perturbation-based regularizer,” arXiv preprint arXiv:2305.02423, 2023
-
[56]
Exploring universal intrinsic task subspace via prompt tuning,
Y. Qin, X. Wang, Y. Su, Y. Lin, N. Ding, J. Yi, W. Chen, Z. Liu, J. Li, L. Hou et al., “Exploring universal intrinsic task subspace via prompt tuning,” arXiv preprint arXiv:2110.07867, 2021
-
[57]
Smop: Towards efficient and effective prompt tuning with sparse mixture-of-prompts,
J.-Y. Choi, J. Kim, J.-H. Park, W.-L. Mok, and S. Lee, “Smop: Towards efficient and effective prompt tuning with sparse mixture-of-prompts,” in Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023, pp. 14 306–14 316
-
[58]
Dept: Decomposed prompt tuning for parameter-efficient fine-tuning,
Z. Shi and A. Lipani, “Dept: Decomposed prompt tuning for parameter-efficient fine-tuning,” arXiv preprint arXiv:2309.05173, 2023
-
[59]
Few-shot parameter-efficient fine-tuning is better and cheaper than in-context learning,
H. Liu, D. Tam, M. Muqeeth, J. Mohta, T. Huang, M. Bansal, and C. A. Raffel, “Few-shot parameter-efficient fine-tuning is better and cheaper than in-context learning,” Advances in Neural Information Processing Systems, vol. 35, pp. 1950–1965, 2022
-
[60]
T. Zadouri, A. Üstün, A. Ahmadian, B. Ermiş, A. Locatelli, and S. Hooker, “Pushing mixture of experts to the limit: Extremely parameter efficient moe for instruction tuning,” arXiv preprint arXiv:2309.05444, 2023
-
[61]
Scaling & shifting your features: A new baseline for efficient model tuning,
D. Lian, D. Zhou, J. Feng, and X. Wang, “Scaling & shifting your features: A new baseline for efficient model tuning,” Advances in Neural Information Processing Systems, vol. 35, pp. 109–123, 2022
-
[62]
Inference-time policy adapters (ipa): Tailoring extreme-scale lms without fine-tuning,
X. Lu, F. Brahman, P. West, J. Jang, K. Chandu, A. Ravichander, L. Qin, P. Ammanabrolu, L. Jiang, S. Ramnath et al., “Inference-time policy adapters (ipa): Tailoring extreme-scale lms without fine-tuning,” arXiv preprint arXiv:2305.15065, 2023
-
[63]
Parameter-efficient transfer learning with diff pruning,
D. Guo, A. M. Rush, and Y. Kim, “Parameter-efficient transfer learning with diff pruning,” arXiv preprint arXiv:2012.07463, 2020
-
[64]
Neural architecture search for parameter-efficient fine-tuning of large pre-trained language models,
N. Lawton, A. Kumar, G. Thattai, A. Galstyan, and G. V. Steeg, “Neural architecture search for parameter-efficient fine-tuning of large pre-trained language models,” arXiv preprint arXiv:2305.16597, 2023
-
[65]
Parameter-efficient fine-tuning without introducing new latency,
B. Liao, Y. Meng, and C. Monz, “Parameter-efficient fine-tuning without introducing new latency,” arXiv preprint arXiv:2305.16742, 2023
-
[66]
Training neural networks with fixed sparse masks,
Y.-L. Sung, V. Nair, and C. A. Raffel, “Training neural networks with fixed sparse masks,” Advances in Neural Information Processing Systems, vol. 34, pp. 24 193–24 205, 2021
-
[67]
Unified low-resource sequence labeling by sample-aware dynamic sparse fine-tuning,
S. S. S. Das, R. H. Zhang, P. Shi, W. Yin, and R. Zhang, “Unified low-resource sequence labeling by sample-aware dynamic sparse fine-tuning,” arXiv preprint arXiv:2311.03748, 2023
-
[68]
Composable sparse fine-tuning for cross-lingual transfer,
A. Ansell, E. M. Ponti, A. Korhonen, and I. Vulić, “Composable sparse fine-tuning for cross-lingual transfer,” arXiv preprint arXiv:2110.07560, 2021
-
[69]
On the effectiveness of parameter-efficient fine-tuning,
Z. Fu, H. Yang, A. M.-C. So, W. Lam, L. Bing, and N. Collier, “On the effectiveness of parameter-efficient fine-tuning,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37, no. 11, 2023, pp. 12 799–12 807
-
[70]
Raise a child in large language model: Towards effective and generalizable fine-tuning,
R. Xu, F. Luo, Z. Zhang, C. Tan, B. Chang, S. Huang, and F. Huang, “Raise a child in large language model: Towards effective and generalizable fine-tuning,” arXiv preprint arXiv:2109.05687, 2021
-
[71]
Efficient fine-tuning of bert models on the edge,
D. Vucetic, M. Tayaranian, M. Ziaeefard, J. J. Clark, B. H. Meyer, and W. J. Gross, “Efficient fine-tuning of bert models on the edge,” in 2022 IEEE International Symposium on Circuits and Systems (ISCAS). IEEE, 2022, pp. 1838–1842
-
[72]
Bitfit: Simple parameter-efficient fine-tuning for transformer-based masked language-models
E. B. Zaken, S. Ravfogel, and Y. Goldberg, “Bitfit: Simple parameter-efficient fine-tuning for transformer-based masked language-models,” arXiv preprint arXiv:2106.10199, 2021
-
[73]
Cross-attention is all you need: Adapting pretrained transformers for machine translation,
M. Gheini, X. Ren, and J. May, “Cross-attention is all you need: Adapting pretrained transformers for machine translation,” arXiv preprint arXiv:2104.08771, 2021
-
[74]
Sensitivity-aware visual parameter-efficient fine-tuning,
H. He, J. Cai, J. Zhang, D. Tao, and B. Zhuang, “Sensitivity-aware visual parameter-efficient fine-tuning,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 11 825–11 835
-
[75]
Intrinsic dimensionality explains the effectiveness of language model fine-tuning,
A. Aghajanyan, L. Zettlemoyer, and S. Gupta, “Intrinsic dimensionality explains the effectiveness of language model fine-tuning,” arXiv preprint arXiv:2012.13255, 2020
-
[76]
LoRA: Low-Rank Adaptation of Large Language Models
E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, and W. Chen, “Lora: Low-rank adaptation of large language models,” arXiv preprint arXiv:2106.09685, 2021
-
[77]
Compacter: Efficient low-rank hypercomplex adapter layers,
R. Karimi Mahabadi, J. Henderson, and S. Ruder, “Compacter: Efficient low-rank hypercomplex adapter layers,” Advances in Neural Information Processing Systems, vol. 34, pp. 1022–1035, 2021
-
[78]
Krona: Parameter efficient tuning with kronecker adapter,
A. Edalati, M. Tahaei, I. Kobyzev, V. P. Nia, J. J. Clark, and M. Rezagholizadeh, “Krona: Parameter efficient tuning with kronecker adapter,” arXiv preprint arXiv:2212.10650, 2022
-
[79]
Parameter-efficient model adaptation for vision transformers,
X. He, C. Li, P. Zhang, J. Yang, and X. E. Wang, “Parameter-efficient model adaptation for vision transformers,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37, no. 1, 2023, pp. 817–825
-
[80]
Vera: Vector-based random matrix adaptation,
D. J. Kopiczko, T. Blankevoort, and Y. M. Asano, “Vera: Vector-based random matrix adaptation,” arXiv preprint arXiv:2310.11454, 2023