arxiv: 2403.14608 · v7 · submitted 2024-03-21 · 💻 cs.LG

Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey

Zeyu Han , Chao Gao , Jinyang Liu , Jeff Zhang , Sai Qian Zhang

Pith reviewed 2026-05-13 11:28 UTC · model grok-4.3

keywords: parameter-efficient fine-tuning, large language models, model adaptation, computational overhead, survey, fine-tuning methods

The pith

Parameter-efficient fine-tuning adapts large pre-trained models to new tasks while adding only a small number of parameters.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper surveys various parameter-efficient fine-tuning (PEFT) algorithms for adapting large pre-trained models. It examines their performance, computational overhead, and applications across different tasks. The survey also reviews system designs that help lower the costs of implementing these methods. This matters because full fine-tuning of models with billions of parameters is often too expensive for many hardware setups and tasks. By compiling this information, the work aims to help researchers select efficient ways to customize large models without prohibitive resource demands.

Core claim

PEFT refers to the process of adjusting the parameters of a pre-trained large model to adapt it to a specific task or domain while minimizing the number of additional parameters introduced or computational resources required. The survey presents comprehensive studies of various PEFT algorithms, examining their performance and computational overhead, provides an overview of applications developed using different PEFT algorithms, discusses common techniques to mitigate computation costs, and examines various real-world system designs to investigate the implementation costs.

What carries the argument

Parameter-Efficient Fine-Tuning (PEFT) algorithms that minimize the number of additional parameters or computational resources when adapting pre-trained large models.
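
To make "minimizing the number of additional parameters" concrete, the following is a minimal arithmetic sketch. It assumes a low-rank adapter in the style of LoRA, which is only one PEFT family and is used here purely as an illustration. The layer shape and rank are hypothetical values, not figures taken from the survey.

```python
# Illustrative sketch only: a LoRA-style low-rank update is ONE example of a
# PEFT method. The layer sizes and rank below are assumptions, not numbers
# reported in the survey.


def full_finetune_params(d_in: int, d_out: int) -> int:
    """Trainable parameters when the whole d_out x d_in weight matrix is updated."""
    return d_in * d_out


def low_rank_adapter_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters for a rank-r update B @ A (B: d_out x r, A: r x d_in)."""
    return rank * (d_in + d_out)


d_in, d_out, rank = 4096, 4096, 8  # hypothetical layer shape and adapter rank
full = full_finetune_params(d_in, d_out)
peft = low_rank_adapter_params(d_in, d_out, rank)
ratio = peft / full
print(f"full: {full:,}  adapter: {peft:,}  fraction trainable: {ratio:.4%}")
```

Under these assumed sizes the adapter trains 1/256 of the parameters that full fine-tuning of the same layer would. The frozen base weights still have to be stored and run in the forward pass, which is one reason the survey treats parameter count and computational overhead as separate questions.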

If this is right

Large models become adaptable on hardware platforms with constrained computational capabilities.
The computational costs associated with customizing large models for downstream tasks are significantly reduced.
Broader applications of large models are enabled across various fields through efficient adaptation methods.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Standardized evaluation benchmarks could allow more reliable direct comparisons between different PEFT methods.
PEFT techniques may generalize to other model types like vision transformers with similar efficiency benefits.
Hardware architectures optimized for sparse or partial updates could amplify the advantages of PEFT approaches.

Load-bearing premise

That the selected PEFT methods and system designs are representative of the full current landscape, and that performance comparisons drawn from cited works are directly comparable across papers.

What would settle it

A unified benchmark experiment running all surveyed PEFT methods on the same set of tasks and hardware to verify or contradict the performance and overhead summaries.
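
One way to picture such a unified protocol is a run grid in which the base model is fixed and every method is crossed with every task and seed. The sketch below is a hedged illustration of that bookkeeping only. `RunConfig`, `build_grid`, `run_grid`, and all method, task, and model names are hypothetical, and the evaluation function is a stub that produces no real measurements.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class RunConfig:
    """One cell of the benchmark grid; every field is held fixed or enumerated."""
    method: str
    task: str
    base_model: str
    seed: int


def build_grid(methods: List[str], tasks: List[str],
               base_model: str, seeds: List[int]) -> List[RunConfig]:
    """Cross every method with every task and seed on ONE shared base model."""
    return [RunConfig(m, t, base_model, s)
            for m, t, s in product(methods, tasks, seeds)]


def run_grid(grid: List[RunConfig],
             evaluate: Callable[[RunConfig], float]) -> Dict[Tuple[str, str], float]:
    """Average the score over seeds for each (method, task) pair."""
    scores: Dict[Tuple[str, str], List[float]] = {}
    for cfg in grid:
        scores.setdefault((cfg.method, cfg.task), []).append(evaluate(cfg))
    return {key: sum(vals) / len(vals) for key, vals in scores.items()}


# Hypothetical placeholders: substitute real method, task, and model names.
methods = ["method_a", "method_b"]
tasks = ["task_x", "task_y", "task_z"]
grid = build_grid(methods, tasks, base_model="shared_base_model", seeds=[0, 1])


def placeholder_evaluate(cfg: RunConfig) -> float:
    """Stub standing in for a real train-and-evaluate run; returns no real metric."""
    return 0.0


results = run_grid(grid, placeholder_evaluate)
print(len(grid), "runs ->", len(results), "(method, task) cells")
```

Because every run shares the same base model and the same task list, any difference between two cells in a column can be attributed to the method rather than to the setup. Cross-paper tables cannot guarantee that property.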

Original abstract

Large models represent a groundbreaking advancement in multiple application fields, enabling remarkable achievements across various tasks. However, their unprecedented scale comes with significant computational costs. These models, often consisting of billions of parameters, require vast amounts of computational resources for execution. Especially, the expansive scale and computational demands pose considerable challenges when customizing them for particular downstream tasks, particularly over the hardware platforms constrained by computational capabilities. Parameter Efficient Fine-Tuning (PEFT) provides a practical solution by efficiently adjusting the large models over the various downstream tasks. In particular, PEFT refers to the process of adjusting the parameters of a pre-trained large model to adapt it to a specific task or domain while minimizing the number of additional parameters introduced or computational resources required. This approach is particularly important when dealing with large-scale language models with high parameter counts, as fine-tuning these models from scratch can be computationally expensive and resource-intensive, posing considerable challenges in the supporting system platform design. In this survey, we present comprehensive studies of various PEFT algorithms, examining their performance and computational overhead. Moreover, we provide an overview of applications developed using different PEFT algorithms and discuss common techniques employed to mitigate computation costs for PEFT. In addition to providing an extensive survey from an algorithmic standpoint, we also examine various real-world system designs to investigate the implementation costs associated with different PEFT approaches. This survey serves as a valuable resource for researchers aiming to understand both the PEFT algorithm and its system implementation, offering detailed ......

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This survey organizes PEFT methods and system implications in a practical way, but its performance comparisons rest on non-standardized numbers from the original papers.

This survey organizes the main parameter-efficient fine-tuning approaches for large models, grouping them by type, looking at accuracy versus cost trade-offs, and extending into applications and actual system implementations. The system-level sections are the stronger part because they discuss hardware constraints and additional tricks like combined quantization that go beyond pure algorithm descriptions. That perspective is less common and directly relevant for people who have to run these models on limited resources. The coverage of applications in NLP, vision, and other areas also gives a sense of where each method has been tried.

The comparisons are the softer spot. The tables pull performance and overhead numbers straight from the cited works, which used different base models, datasets, metrics, and hardware. Without re-running the methods under one protocol, those rankings are only roughly indicative. The paper notes some of these differences but does not systematically normalize or flag every mismatch, so readers need to stay cautious when treating one method as clearly superior.

This is the kind of paper that helps researchers and engineers who are choosing or building on PEFT techniques get oriented quickly. It is not introducing new algorithms or proofs, so its value is in the synthesis rather than original results. I would bring it to a reading group to talk through the system angle and how future surveys could handle cross-paper comparisons better. It deserves peer review because the organization is clear and the topic is current, even if the comparison claims need tightening by referees.

Referee Report

2 major / 3 minor

Summary. The paper surveys Parameter-Efficient Fine-Tuning (PEFT) methods for large models, presenting comprehensive studies of various algorithms with examinations of their performance and computational overhead, an overview of applications developed using different PEFT approaches, common techniques to mitigate computation costs, and analysis of real-world system designs for implementation costs.

Significance. If the coverage is representative and comparisons are presented with appropriate caveats, this survey would serve as a useful reference for researchers working on efficient adaptation of large models, bridging algorithmic PEFT techniques with practical system-level considerations in resource-constrained settings.

major comments (2)

[Performance and computational overhead examination sections] In the sections presenting performance and overhead comparisons (e.g., the algorithmic performance studies and system implementation analysis), aggregated metrics from cited works are contrasted directly. However, the source papers use varying base model families/sizes, training datasets, evaluation protocols, and hardware, without the survey re-implementing methods under a common benchmark or systematically normalizing for these differences. This assumption that cross-paper numbers are interchangeable is load-bearing for the central claim of examining relative performance and overhead.
[Applications and mitigation techniques overview] The overview of applications and mitigation techniques would be strengthened by explicit discussion of how representative the selected methods are of the full landscape, including any gaps in coverage of recent variants or domain-specific adaptations.

minor comments (3)

[Abstract] The abstract contains some repetitive phrasing in the definition of PEFT; tightening this would improve readability.
[Figures and tables] Figures summarizing method taxonomies or comparison tables could include clearer captions noting the source of each reported metric to aid interpretation.
[References] A few citations appear to reference preprints; confirming they have been updated to published versions where applicable would be helpful.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our survey. The comments have prompted us to strengthen the discussion of methodological caveats and coverage scope. We have revised the manuscript accordingly and believe these changes improve its utility as a reference.

Referee: In the sections presenting performance and overhead comparisons (e.g., the algorithmic performance studies and system implementation analysis), aggregated metrics from cited works are contrasted directly. However, the source papers use varying base model families/sizes, training datasets, evaluation protocols, and hardware, without the survey re-implementing methods under a common benchmark or systematically normalizing for these differences. This assumption that cross-paper numbers are interchangeable is load-bearing for the central claim of examining relative performance and overhead.

Authors: We acknowledge the heterogeneity in experimental setups across the cited literature and agree that direct numerical comparisons carry inherent limitations. Our original manuscript already includes brief caveats in the performance tables and system analysis sections noting differences in base models and hardware. To address this more explicitly, we have added a new subsection titled 'Caveats in Cross-Paper Comparisons' that systematically discusses variations in model families, datasets, evaluation protocols, and hardware. We also added notes to key tables highlighting representative setups from source papers and emphasized that the compiled metrics are intended for qualitative trends rather than precise quantitative ranking. Re-implementing all methods under a unified benchmark exceeds the scope of a survey; such efforts are better suited to dedicated benchmark studies. We believe the expanded discussion now makes the load-bearing assumptions transparent while preserving the value of the aggregated overview.

revision: partial

Referee: The overview of applications and mitigation techniques would be strengthened by explicit discussion of how representative the selected methods are of the full landscape, including any gaps in coverage of recent variants or domain-specific adaptations.

Authors: We appreciate this suggestion for improving transparency. The original manuscript selected methods based on citation impact, recency, and diversity across algorithmic families. In the revised version, we have inserted a dedicated paragraph in the 'Applications' and 'Mitigation Techniques' sections (and a short 'Scope and Limitations' subsection) that explicitly states our selection criteria, notes that the rapidly evolving PEFT landscape means some recent variants (e.g., new adapter compositions or domain-specific adaptations in vision-language or RL settings) may not be exhaustively covered, and highlights potential gaps such as limited coverage of certain low-resource domains. This addition clarifies representativeness without altering the core content.

revision: yes

Circularity Check

0 steps flagged

No circularity: purely descriptive survey with no derivations or predictions

This paper is a literature survey that aggregates and describes existing PEFT algorithms, their reported performance, applications, and system implementations. It contains no original equations, derivations, fitted parameters, or predictions that could reduce to inputs by construction. The central claims are overviews of prior work rather than new results justified by self-referential steps. No self-citation chains, ansatzes, or uniqueness theorems are invoked to force conclusions. Performance comparisons are presented as reported in source papers without re-derivation, so no fitted-input-called-prediction pattern applies. The survey is self-contained as a descriptive resource and does not rely on any load-bearing circular reductions.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Survey paper introduces no free parameters, axioms, or invented entities; all content rests on cited prior work.

Forward citations

Cited by 26 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

LOFT: Low-Rank Orthogonal Fine-Tuning via Task-Aware Support Selection
cs.LG 2026-05 unverdicted novelty 7.0

LOFT unifies orthogonal PEFT by treating adaptation as low-rank subspace rotation and adds task-aware support selection that improves efficiency under fixed budgets.
Revisiting Privacy Leakage in Machine Unlearning: Membership Inference Beyond the Forgotten Set
cs.CR 2026-05 unverdicted novelty 7.0

Unlearning increases privacy leakage for the retain set, and a new tri-class membership inference attack distinguishes forget, retain, and unseen data using pre- and post-unlearning model outputs.
Rectification Difficulty and Optimal Sample Allocation in LLM-Augmented Surveys
cs.AI 2026-04 unverdicted novelty 7.0

A method using predicted rectification difficulty for optimal human sample allocation in LLM-augmented surveys captures 61-79% of theoretical efficiency gains and reduces MSE by 11% on two datasets without pilot data.
HeiSD: Hybrid Speculative Decoding for Embodied Vision-Language-Action Models with Kinematic Awareness
cs.RO 2026-03 unverdicted novelty 7.0

HeiSD delivers up to 2.45x faster inference for embodied VLA models by hybridizing speculative decoding with kinematic boundary detection and error-mitigation tricks while preserving task success rates.
Towards the Next Frontier of LLMs, Training on Private Data: A Cross-Domain Benchmark for Federated Fine-Tuning
cs.LG 2026-05 unverdicted novelty 6.0

Federated PEFT on LLMs across healthcare and finance datasets performs close to centralized training and beats isolated local training under non-IID conditions.
Black-box model classification under the discriminative factorization
cs.LG 2026-05 unverdicted novelty 6.0

Discriminative factorization distinguishes high-quality query sets for black-box model classification, with chance-level error decaying exponentially in query budget and parameters predicting empirical decay rates on ...
Pretraining Induces a Reusable Spectral Basis for Downstream Task Adaptation
cs.LG 2026-05 unverdicted novelty 6.0

Pretraining induces stable leading singular vectors that form a reusable spectral basis inherited by downstream tasks, enabling competitive performance with 0.2% trainable parameters on GLUE.
Direct-to-Event Spiking Neural Network Transfer
cs.NE 2026-05 unverdicted novelty 6.0

This work provides the first systematic study of transferring direct-coded spiking neural networks to event-based representations while aiming to preserve accuracy and reduce energy use.
From History to State: Constant-Context Skill Learning for LLM Agents
cs.AI 2026-05 unverdicted novelty 6.0

Constant-context skill learning trains reusable task-family modules for LLM agents using a deterministic state block for progress tracking and subgoal rewards, achieving 89.6% unseen success on ALFWorld, 76.8% on WebS...
You Snooze, You Lose: Automatic Safety Alignment Restoration through Neural Weight Translation
cs.CR 2026-05 unverdicted novelty 6.0

NeWTral is a non-linear weight translation framework using MoE routing that reduces average attack success rate from 70% to 13% on unsafe domain adapters across Llama, Mistral, Qwen, and Gemma models up to 72B while r...
TLoRA: Task-aware Low Rank Adaptation of Large Language Models
cs.CL 2026-04 unverdicted novelty 6.0

TLoRA jointly optimizes LoRA initialization via task-data SVD and sensitivity-driven rank allocation, delivering stronger results than standard LoRA across NLU, reasoning, math, code, and chat tasks while using fewer ...
BioTrain: Sub-MB, Sub-50mW On-Device Fine-Tuning for Edge-AI on Biosignals
cs.LG 2026-04 unverdicted novelty 6.0

BioTrain enables full-network fine-tuning of biosignal AI models on edge MCUs with sub-MB memory and sub-50mW power, delivering up to 35% accuracy gains and 8.1x memory reduction.
MP-ISMoE: Mixed-Precision Interactive Side Mixture-of-Experts for Efficient Transfer Learning
cs.LG 2026-04 unverdicted novelty 6.0

MP-ISMoE uses Gaussian noise perturbed iterative quantization and interactive side mixture-of-experts to deliver higher accuracy than prior memory-efficient transfer learning methods while keeping similar parameter an...
The Art of (Mis)alignment: How Fine-Tuning Methods Effectively Misalign and Realign LLMs in Post-Training
cs.CR 2026-04 unverdicted novelty 6.0

ORPO is most effective at misaligning LLMs while DPO excels at realigning them, though it reduces utility, revealing an asymmetry between attack and defense methods.
Pretrain-then-Adapt: Uncertainty-Aware Test-Time Adaptation for Text-based Person Search
cs.IR 2026-04 unverdicted novelty 6.0

UATTA adapts pre-trained text-image models at test time without labels by using disagreement in bidirectional retrieval rankings to estimate and mitigate uncertainty for improved person search.
Fine-Tuning Integrity for Modern Neural Networks: Structured Drift Proofs via Norm, Rank, and Sparsity Certificates
cs.CR 2026-04 unverdicted novelty 6.0

Succinct Model Difference Proofs certify that a neural-network update stays inside a policy-defined drift class using zero-knowledge proofs whose cost depends only on the drift structure.
BLK-Assist: A Methodological Framework for Artist-Led Co-Creation with Generative AI Models
cs.CY 2026-03 unverdicted novelty 6.0

BLK-Assist is a three-part framework (Conceptor for sketches, Stencil for transparent assets, Upscale for high-res outputs) that fine-tunes public diffusion models on one artist's proprietary corpus for style-faithful...
HiP-LoRA: Budgeted Spectral Plasticity for Robust Low-Rank Adaptation
cs.LG 2026-04 unverdicted novelty 5.0

HiP-LoRA decomposes LoRA updates into principal and residual spectral channels with a singular-value-weighted stability budget to reduce forgetting and interference during foundation model adaptation.
Cross-Lingual Attention Distillation with Personality-Informed Generative Augmentation for Multilingual Personality Recognition
cs.CL 2026-04 unverdicted novelty 5.0

ADAM uses personality-guided LLM augmentation and cross-lingual attention distillation to raise balanced accuracy on multilingual personality recognition to 0.6332 on Essays and 0.7448 on Kaggle, outperforming standar...
Assessment of RAG and Fine-Tuning for Industrial Question-Answering-Applications
cs.CL 2026-05 unverdicted novelty 4.0

RAG is more effective and cost-efficient than fine-tuning for industrial QA adaptation on automotive datasets.
Low-Rank Adaptation of Geospatial Foundation Models for Wildfire Mapping Using Sentinel-2 Data
cs.CV 2026-05 unverdicted novelty 4.0

LoRA-adapted Prithvi-v2 achieves the highest accuracy and best cross-domain generalization for burned-area mapping on Sentinel-2 data compared to full fine-tuning across 3,820 wildfire events.
From Weights to Activations: Is Steering the Next Frontier of Adaptation?
cs.CL 2026-04 unverdicted novelty 4.0

Steering is positioned as a distinct adaptation paradigm that uses targeted activation interventions for local, reversible behavioral changes without parameter updates.
Low-Rank Adaptation Redux for Large Models
cs.LG 2026-04 unverdicted novelty 3.0

An overview revisits LoRA variants by categorizing advances in architectural design, efficient optimization, and applications while linking them to classical signal processing tools for principled fine-tuning.
High-Dimensional Statistics: Reflections on Progress and Open Problems
math.ST 2026-05 unverdicted novelty 2.0

A survey synthesizing representative advances, common themes, and open problems in high-dimensional statistics while pointing to key entry-point works.
NTIRE 2026 Challenge on Bitstream-Corrupted Video Restoration: Methods and Results
cs.CV 2026-04 unverdicted novelty 2.0

The NTIRE 2026 Challenge establishes a benchmark for bitstream-corrupted video restoration and summarizes the top methods and observed trends from participating teams.
Redefining End-of-Life: Intelligent Automation for Electronics Remanufacturing Systems
eess.SY 2026-04 unverdicted novelty 2.0

A literature review of intelligent automation approaches using robotics, AI, and control for disassembly, inspection, sorting, and reprocessing of end-of-life electronics.

Reference graph

Works this paper leans on

260 extracted references · 260 canonical work pages · cited by 26 Pith papers · 22 internal anchors

[1]

Language mod- els are few-shot learners,

T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell et al., “Language mod- els are few-shot learners,” Advances in neural information processing systems, vol. 33, pp. 1877–1901, 2020

work page 1901
[2]

Toolqa: A dataset for llm question answering with external tools,

Y . Zhuang, Y . Yu, K. Wang, H. Sun, and C. Zhang, “Toolqa: A dataset for llm question answering with external tools,” arXiv preprint arXiv:2306.13304, 2023

work page arXiv 2023
[3]

Multilingual machine translation with large language models: Empirical results and analysis.arXiv preprint arXiv:2304.04675, 2023

W. Zhu, H. Liu, Q. Dong, J. Xu, L. Kong, J. Chen, L. Li, and S. Huang, “Multilingual machine translation with large language models: Empir- ical results and analysis,” arXiv preprint arXiv:2304.04675 , 2023

work page arXiv 2023
[4]

A survey on large language models: Applications, challenges, limitations, and practical usage,

M. U. Hadi, R. Qureshi, A. Shah, M. Irfan, A. Zafar, M. Shaikh, N. Akhtar, J. Wu, and S. Mirjalili, “A survey on large language models: Applications, challenges, limitations, and practical usage,” TechRxiv, 2023

work page 2023
[5]

Gentopia: A collaborative platform for tool-augmented llms,

B. Xu, X. Liu, H. Shen, Z. Han, Y . Li, M. Yue, Z. Peng, Y . Liu, Z. Yao, and D. Xu, “Gentopia: A collaborative platform for tool-augmented llms,” arXiv preprint arXiv:2308.04030 , 2023

work page arXiv 2023
[6]

Camel: Communicative agents for

G. Li, H. A. A. K. Hammoud, H. Itani, D. Khizbullin, and B. Ghanem, “Camel: Communicative agents for ”mind” exploration of large lan- guage model society,” in Thirty-seventh Conference on Neural Infor- mation Processing Systems , 2023

work page 2023
[7]

AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation

Q. Wu, G. Bansal, J. Zhang, Y . Wu, S. Zhang, E. Zhu, B. Li, L. Jiang, X. Zhang, and C. Wang, “Autogen: Enabling next-gen llm applications via multi-agent conversation framework,” arXiv preprint arXiv:2308.08155, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[8]

Summit: Iterative text summarization via chatgpt,

H. Zhang, X. Liu, and J. Zhang, “Summit: Iterative text summarization via chatgpt,” arXiv preprint arXiv:2305.14835 , 2023

work page arXiv 2023
[9]

Root mean square layer normalization,

B. Zhang and R. Sennrich, “Root mean square layer normalization,” Advances in Neural Information Processing Systems , vol. 32, 2019

work page 2019
[10]

RoFormer: Enhanced Transformer with Rotary Position Embedding

J. Su, Y . Lu, S. Pan, A. Murtadha, B. Wen, and Y . Liu, “Roformer: Enhanced transformer with rotary position embedding,” arXiv preprint arXiv:2104.09864, 2021

work page internal anchor Pith review Pith/arXiv arXiv 2021
[11]

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding

A. Wang, A. Singh, J. Michael, F. Hill, O. Levy, and S. R. Bowman, “Glue: A multi-task benchmark and analysis platform for natural language understanding,” arXiv preprint arXiv:1804.07461 , 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[12]

Can a suit of armor conduct electricity? a new dataset for open book question answering,

T. Mihaylov, P. Clark, T. Khot, and A. Sabharwal, “Can a suit of armor conduct electricity? a new dataset for open book question answering,” in EMNLP, 2018

work page 2018
[13]

Piqa: Reasoning about physical commonsense in natural language,

Y . Bisk, R. Zellers, R. L. Bras, J. Gao, and Y . Choi, “Piqa: Reasoning about physical commonsense in natural language,” in Thirty-Fourth AAAI Conference on Artificial Intelligence , 2020

work page 2020
[14]

SocialIQA: Commonsense Reasoning about Social Interactions

M. Sap, H. Rashkin, D. Chen, R. LeBras, and Y . Choi, “Socialiqa: Commonsense reasoning about social interactions,” arXiv preprint arXiv:1904.09728, 2019

work page internal anchor Pith review arXiv 1904
[15]

Hellaswag: Can a machine really finish your sentence?

R. Zellers, A. Holtzman, Y . Bisk, A. Farhadi, and Y . Choi, “Hellaswag: Can a machine really finish your sentence?” in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics , 2019

work page 2019
[16]

Boolq: Exploring the surprising difficulty of natural yes/no questions,

C. e. a. Clark, “Boolq: Exploring the surprising difficulty of natural yes/no questions,” in NAACL, 2019

work page 2019
[17]

Winogrande: An adversarial winograd schema challenge at scale,

K. Sakaguchi, R. L. Bras, C. Bhagavatula, and Y . Choi, “Winogrande: An adversarial winograd schema challenge at scale,” Communications of the ACM , vol. 64, no. 9, pp. 99–106, 2021

work page 2021
[18]

Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge

P. Clark, I. Cowhey, O. Etzioni, T. Khot, A. Sabharwal, C. Schoenick, and O. Tafjord, “Think you have solved question answering? try arc, the ai2 reasoning challenge,” arXiv:1803.05457v1, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[19]

The Kinetics Human Action Video Dataset

W. Kay, J. Carreira, K. Simonyan, B. Zhang, C. Hillier, S. Vijaya- narasimhan, F. Viola, T. Green, T. Back, P. Natsev et al., “The kinetics human action video dataset,” arXiv preprint arXiv:1705.06950 , 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[20]

The” something something

R. Goyal, S. Ebrahimi Kahou, V . Michalski, J. Materzynska, S. West- phal, H. Kim, V . Haenel, I. Fruend, P. Yianilos, M. Mueller-Freitag et al. , “The” something something” video database for learning and evaluating visual common sense,” in Proceedings of the IEEE interna- tional conference on computer vision , 2017, pp. 5842–5850

work page 2017
[21]

Hmdb: a large video database for human motion recognition,

H. Kuehne, H. Jhuang, E. Garrote, T. Poggio, and T. Serre, “Hmdb: a large video database for human motion recognition,” in 2011 Interna- tional conference on computer vision . IEEE, 2011, pp. 2556–2563

work page 2011
[22]

Microsoft coco: Common objects in context,

T.-Y . Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Doll ´ar, and C. L. Zitnick, “Microsoft coco: Common objects in context,” in Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13 . Springer, 2014, pp. 740–755

work page 2014
[23]

Scene parsing through ade20k dataset,

B. Zhou, H. Zhao, X. Puig, S. Fidler, A. Barriuso, and A. Torralba, “Scene parsing through ade20k dataset,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 633– 641

work page 2017
[24]

The pascal visual object classes (voc) challenge,

M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisser- man, “The pascal visual object classes (voc) challenge,” International journal of computer vision , vol. 88, pp. 303–338, 2010

work page 2010
[25]

Parameter-efficient fine-tuning of large-scale pre-trained language models,

N. Ding, Y . Qin, G. Yang, F. Wei, Z. Yang, Y . Su, S. Hu, Y . Chen, C.- M. Chan, W. Chen et al., “Parameter-efficient fine-tuning of large-scale pre-trained language models,” Nature Machine Intelligence , vol. 5, no. 3, pp. 220–235, 2023

work page 2023
[26]

Parameter-efficient fine-tuning methods for pretrained language models: A critical review and assessment

L. Xu, H. Xie, S.-Z. J. Qin, X. Tao, and F. L. Wang, “Parameter- efficient fine-tuning methods for pretrained language models: A critical review and assessment,” arXiv preprint arXiv:2312.12148 , 2023

work page arXiv 2023
[27]

Empirical analysis of the strengths and weaknesses of peft techniques for llms,

G. Pu, A. Jain, J. Yin, and R. Kaplan, “Empirical analysis of the strengths and weaknesses of peft techniques for llms,” arXiv preprint arXiv:2304.14999, 2023

work page arXiv 2023
[28]

Sharegpt,

OpenAI, “Sharegpt,” https://sharegpt.com/, 2023

work page 2023
[29]

Microsoft azure function trace,

Microsoft, “Microsoft azure function trace,” https://github.com/Azure/AzurePublicDataset, 2023

work page 2023
[30]

Analysis, modeling and simulation of workload patterns in a large-scale utility cloud,

I. S. Moreno, P. Garraghan, P. Townend, and J. Xu, “Analysis, modeling and simulation of workload patterns in a large-scale utility cloud,”IEEE Transactions on Cloud Computing , vol. 2, no. 2, pp. 208–221, 2014

work page 2014
[31]

Parameter-efficient transfer learning for nlp,

N. Houlsby, A. Giurgiu, S. Jastrzebski, B. Morrone, Q. De Laroussilhe, A. Gesmundo, M. Attariyan, and S. Gelly, “Parameter-efficient transfer learning for nlp,” in International Conference on Machine Learning . PMLR, 2019, pp. 2790–2799

work page 2019
[32]

Towards a unified view of parameter-efficient transfer learning

J. He, C. Zhou, X. Ma, T. Berg-Kirkpatrick, and G. Neubig, “Towards a unified view of parameter-efficient transfer learning,” arXiv preprint arXiv:2110.04366, 2021

work page arXiv 2021
[33]

Counter- interference adapter for multilingual machine translation,

Y . Zhu, J. Feng, C. Zhao, M. Wang, and L. Li, “Counter- interference adapter for multilingual machine translation,” arXiv preprint arXiv:2104.08154, 2021

work page arXiv 2021
[34]

Conditional adapters: Parameter-efficient transfer learning with fast inference,

T. Lei, J. Bai, S. Brahma, J. Ainslie, K. Lee, Y . Zhou, N. Du, V . Y . Zhao, Y . Wu, B. Li et al. , “Conditional adapters: Parameter-efficient transfer learning with fast inference,” arXiv preprint arXiv:2304.04947, 2023

work page arXiv 2023
[35]

AdapterFusion: Non-Destructive Task Composition for Transfer Learning

J. Pfeiffer, A. Kamath, A. Rücklé, K. Cho, and I. Gurevych, “Adapterfusion: Non-destructive task composition for transfer learning,” arXiv preprint arXiv:2005.00247, 2020

[36]

Adamix: Mixture-of-adapter for parameter-efficient tuning of large language models,

Y. Wang, S. Mukherjee, X. Liu, J. Gao, A. H. Awadallah, and J. Gao, “Adamix: Mixture-of-adapter for parameter-efficient tuning of large language models,” arXiv preprint arXiv:2205.12410, vol. 1, no. 2, p. 4, 2022

[37]

Prototype-based hyperadapter for sample-efficient multi-task tuning,

H. Zhao, J. Fu, and Z. He, “Prototype-based hyperadapter for sample-efficient multi-task tuning,” arXiv preprint arXiv:2310.11670, 2023

[38]

Adaptersoup: Weight averaging to improve generalization of pretrained language models,

A. Chronopoulou, M. E. Peters, A. Fraser, and J. Dodge, “Adaptersoup: Weight averaging to improve generalization of pretrained language models,” arXiv preprint arXiv:2302.07027, 2023

[39]

Mera: Merging pretrained adapters for few-shot learning,

S. He, R.-Z. Fan, L. Ding, L. Shen, T. Zhou, and D. Tao, “Mera: Merging pretrained adapters for few-shot learning,” arXiv preprint arXiv:2308.15982, 2023

[40]

Parameter-efficient multi-task fine-tuning for transformers via shared hypernetworks,

R. K. Mahabadi, S. Ruder, M. Dehghani, and J. Henderson, “Parameter-efficient multi-task fine-tuning for transformers via shared hypernetworks,” arXiv preprint arXiv:2106.04489, 2021

[41]

Prefix-Tuning: Optimizing Continuous Prompts for Generation

X. L. Li and P. Liang, “Prefix-tuning: Optimizing continuous prompts for generation,” arXiv preprint arXiv:2101.00190, 2021

[42]

Prefix propagation: Parameter-efficient tuning for long sequences,

J. Li, W. Aitken, R. Bhambhoria, and X. Zhu, “Prefix propagation: Parameter-efficient tuning for long sequences,” arXiv preprint arXiv:2305.12086, 2023

[43]

P-tuning v2: Prompt tuning can be comparable to fine-tuning universally across scales and tasks,

X. Liu, K. Ji, Y. Fu, W. L. Tam, Z. Du, Z. Yang, and J. Tang, “P-tuning v2: Prompt tuning can be comparable to fine-tuning universally across scales and tasks,” arXiv preprint arXiv:2110.07602, 2021

[44]

Towards adaptive prefix tuning for parameter-efficient language model fine-tuning,

Z.-R. Zhang, C. Tan, H. Xu, C. Wang, J. Huang, and S. Huang, “Towards adaptive prefix tuning for parameter-efficient language model fine-tuning,” arXiv preprint arXiv:2305.15212, 2023

[45]

Gpt understands, too,

X. Liu, Y. Zheng, Z. Du, M. Ding, Y. Qian, Z. Yang, and J. Tang, “Gpt understands, too,” arXiv preprint arXiv:2103.10385, 2021

[46]

The Power of Scale for Parameter-Efficient Prompt Tuning

B. Lester, R. Al-Rfou, and N. Constant, “The power of scale for parameter-efficient prompt tuning,” arXiv preprint arXiv:2104.08691, 2021

[47]

Xprompt: Exploring the extreme of prompt tuning,

F. Ma, C. Zhang, L. Ren, J. Wang, Q. Wang, W. Wu, X. Quan, and D. Song, “Xprompt: Exploring the extreme of prompt tuning,” arXiv preprint arXiv:2210.04457, 2022

[48]

Idpg: An instance-dependent prompt generation method,

Z. Wu, S. Wang, J. Gu, R. Hou, Y. Dong, V. Vydiswaran, and H. Ma, “Idpg: An instance-dependent prompt generation method,” arXiv preprint arXiv:2204.04497, 2022

[49]

Late prompt tuning: A late prompt could be better than many prompts,

X. Liu, T. Sun, X. Huang, and X. Qiu, “Late prompt tuning: A late prompt could be better than many prompts,” arXiv preprint arXiv:2210.11292, 2022

[50]

Spt: Learning to selectively insert prompts for better prompt tuning,

W. Zhu and M. Tan, “Spt: Learning to selectively insert prompts for better prompt tuning,” in Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023, pp. 11 862–11 878

[51]

Aprompt: Attention prompt tuning for efficient adaptation of pre-trained language models,

Q. Wang, Y. Mao, J. Wang, H. Yu, S. Nie, S. Wang, F. Feng, L. Huang, X. Quan, Z. Xu et al., “Aprompt: Attention prompt tuning for efficient adaptation of pre-trained language models,” in Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023, pp. 9147–9160

[52]

Spot: Better frozen model adaptation through soft prompt transfer,

T. Vu, B. Lester, N. Constant, R. Al-Rfou, and D. Cer, “Spot: Better frozen model adaptation through soft prompt transfer,” arXiv preprint arXiv:2110.07904, 2021

[53]

On Transferability of Prompt Tuning for Natural Language Understanding

Y. Su, X. Wang, Y. Qin, C.-M. Chan, Y. Lin, H. Wang, K. Wen, Z. Liu, P. Li, J. Li et al., “On transferability of prompt tuning for natural language processing,” arXiv preprint arXiv:2111.06719, 2021

[54]

Infoprompt: Information-theoretic soft prompt tuning for natural language understanding,

J. Wu, T. Yu, R. Wang, Z. Song, R. Zhang, H. Zhao, C. Lu, S. Li, and R. Henao, “Infoprompt: Information-theoretic soft prompt tuning for natural language understanding,” arXiv preprint arXiv:2306.04933, 2023

[55]

Ptp: Boosting stability and performance of prompt tuning with perturbation-based regularizer,

L. Chen, H. Huang, and M. Cheng, “Ptp: Boosting stability and performance of prompt tuning with perturbation-based regularizer,” arXiv preprint arXiv:2305.02423, 2023

[56]

Exploring universal intrinsic task subspace via prompt tuning,

Y. Qin, X. Wang, Y. Su, Y. Lin, N. Ding, J. Yi, W. Chen, Z. Liu, J. Li, L. Hou et al., “Exploring universal intrinsic task subspace via prompt tuning,” arXiv preprint arXiv:2110.07867, 2021

[57]

Smop: Towards efficient and effective prompt tuning with sparse mixture-of-prompts,

J.-Y. Choi, J. Kim, J.-H. Park, W.-L. Mok, and S. Lee, “Smop: Towards efficient and effective prompt tuning with sparse mixture-of-prompts,” in Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023, pp. 14 306–14 316

[58]

Dept: Decomposed prompt tuning for parameter-efficient fine-tuning,

Z. Shi and A. Lipani, “Dept: Decomposed prompt tuning for parameter-efficient fine-tuning,” arXiv preprint arXiv:2309.05173, 2023

[59]

Few-shot parameter-efficient fine-tuning is better and cheaper than in-context learning,

H. Liu, D. Tam, M. Muqeeth, J. Mohta, T. Huang, M. Bansal, and C. A. Raffel, “Few-shot parameter-efficient fine-tuning is better and cheaper than in-context learning,” Advances in Neural Information Processing Systems, vol. 35, pp. 1950–1965, 2022

[60]

Pushing mixture of experts to the limit: Extremely parameter efficient moe for instruction tuning,

T. Zadouri, A. Üstün, A. Ahmadian, B. Ermiş, A. Locatelli, and S. Hooker, “Pushing mixture of experts to the limit: Extremely parameter efficient moe for instruction tuning,” arXiv preprint arXiv:2309.05444, 2023

[61]

Scaling & shifting your features: A new baseline for efficient model tuning,

D. Lian, D. Zhou, J. Feng, and X. Wang, “Scaling & shifting your features: A new baseline for efficient model tuning,” Advances in Neural Information Processing Systems, vol. 35, pp. 109–123, 2022

[62]

Inference-time policy adapters (ipa): Tailoring extreme-scale lms without fine-tuning,

X. Lu, F. Brahman, P. West, J. Jang, K. Chandu, A. Ravichander, L. Qin, P. Ammanabrolu, L. Jiang, S. Ramnath et al., “Inference-time policy adapters (ipa): Tailoring extreme-scale lms without fine-tuning,” arXiv preprint arXiv:2305.15065, 2023

[63]

Parameter-efficient transfer learning with diff pruning,

D. Guo, A. M. Rush, and Y. Kim, “Parameter-efficient transfer learning with diff pruning,” arXiv preprint arXiv:2012.07463, 2020

[64]

Neural architecture search for parameter-efficient fine-tuning of large pre-trained language models,

N. Lawton, A. Kumar, G. Thattai, A. Galstyan, and G. V. Steeg, “Neural architecture search for parameter-efficient fine-tuning of large pre-trained language models,” arXiv preprint arXiv:2305.16597, 2023

[65]

Parameter-efficient fine-tuning without introducing new latency,

B. Liao, Y. Meng, and C. Monz, “Parameter-efficient fine-tuning without introducing new latency,” arXiv preprint arXiv:2305.16742, 2023

[66]

Training neural networks with fixed sparse masks,

Y.-L. Sung, V. Nair, and C. A. Raffel, “Training neural networks with fixed sparse masks,” Advances in Neural Information Processing Systems, vol. 34, pp. 24 193–24 205, 2021

[67]

Unified low-resource sequence labeling by sample-aware dynamic sparse fine-tuning,

S. S. S. Das, R. H. Zhang, P. Shi, W. Yin, and R. Zhang, “Unified low-resource sequence labeling by sample-aware dynamic sparse fine-tuning,” arXiv preprint arXiv:2311.03748, 2023

[68]

Composable sparse fine-tuning for cross-lingual transfer,

A. Ansell, E. M. Ponti, A. Korhonen, and I. Vulić, “Composable sparse fine-tuning for cross-lingual transfer,” arXiv preprint arXiv:2110.07560, 2021

[69]

On the effectiveness of parameter-efficient fine-tuning,

Z. Fu, H. Yang, A. M.-C. So, W. Lam, L. Bing, and N. Collier, “On the effectiveness of parameter-efficient fine-tuning,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37, no. 11, 2023, pp. 12 799–12 807

[70]

Raise a child in large language model: Towards effective and generalizable fine-tuning,

R. Xu, F. Luo, Z. Zhang, C. Tan, B. Chang, S. Huang, and F. Huang, “Raise a child in large language model: Towards effective and generalizable fine-tuning,” arXiv preprint arXiv:2109.05687, 2021

[71]

Efficient fine-tuning of bert models on the edge,

D. Vucetic, M. Tayaranian, M. Ziaeefard, J. J. Clark, B. H. Meyer, and W. J. Gross, “Efficient fine-tuning of bert models on the edge,” in 2022 IEEE International Symposium on Circuits and Systems (ISCAS). IEEE, 2022, pp. 1838–1842

[72]

Bitfit: Simple parameter-efficient fine-tuning for transformer-based masked language-models

E. B. Zaken, S. Ravfogel, and Y. Goldberg, “Bitfit: Simple parameter-efficient fine-tuning for transformer-based masked language-models,” arXiv preprint arXiv:2106.10199, 2021

[73]

Cross-attention is all you need: Adapting pretrained transformers for machine translation,

M. Gheini, X. Ren, and J. May, “Cross-attention is all you need: Adapting pretrained transformers for machine translation,” arXiv preprint arXiv:2104.08771, 2021

[74]

Sensitivity-aware visual parameter-efficient fine-tuning,

H. He, J. Cai, J. Zhang, D. Tao, and B. Zhuang, “Sensitivity-aware visual parameter-efficient fine-tuning,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 11 825–11 835

[75]

Intrinsic dimensionality explains the effectiveness of language model fine-tuning,

A. Aghajanyan, L. Zettlemoyer, and S. Gupta, “Intrinsic dimensionality explains the effectiveness of language model fine-tuning,” arXiv preprint arXiv:2012.13255, 2020

[76]

LoRA: Low-Rank Adaptation of Large Language Models

E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, and W. Chen, “Lora: Low-rank adaptation of large language models,” arXiv preprint arXiv:2106.09685, 2021

[77]

Compacter: Efficient low-rank hypercomplex adapter layers,

R. Karimi Mahabadi, J. Henderson, and S. Ruder, “Compacter: Efficient low-rank hypercomplex adapter layers,” Advances in Neural Information Processing Systems, vol. 34, pp. 1022–1035, 2021

[78]

Krona: Parameter efficient tuning with kronecker adapter,

A. Edalati, M. Tahaei, I. Kobyzev, V. P. Nia, J. J. Clark, and M. Rezagholizadeh, “Krona: Parameter efficient tuning with kronecker adapter,” arXiv preprint arXiv:2212.10650, 2022

[79]

Parameter-efficient model adaptation for vision transformers,

X. He, C. Li, P. Zhang, J. Yang, and X. E. Wang, “Parameter-efficient model adaptation for vision transformers,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37, no. 1, 2023, pp. 817–825

[80]

Vera: Vector-based random matrix adaptation,

D. J. Kopiczko, T. Blankevoort, and Y. M. Asano, “Vera: Vector-based random matrix adaptation,” arXiv preprint arXiv:2310.11454, 2023


Showing first 80 references.