Exploring Versatile Generative Language Model Via Parameter-Efficient Transfer Learning
Fine-tuning pre-trained generative language models on downstream language generation tasks has shown promising results. However, this comes at the cost of maintaining a separate large model for each task, which is not ideal in low-memory or low-power scenarios (e.g., mobile devices). In this paper, we propose an effective way to fine-tune multiple downstream generation tasks simultaneously using a single, large pre-trained model. Experiments on five diverse language generation tasks show that, using only 2-3% additional parameters per task, our model can maintain or even improve on the performance of fine-tuning the whole model.
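The abstract does not name the mechanism behind the 2-3% per-task parameter budget. A common way to get that kind of budget is to keep the shared model frozen and train a small residual "bottleneck adapter" per task in each layer. The sketch below assumes that design. The class `ResidualAdapter`, the helper `adapter_overhead`, the zero-initialised up-projection, and every dimension in the example are illustrative assumptions, not the paper's reported configuration.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector, b: Vector) -> Vector:
    # y = W x + b, with W stored row-wise as (out_dim x in_dim).
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def layer_norm(x: Vector, eps: float = 1e-5) -> Vector:
    # LayerNorm without a learnable gain/bias, to keep the sketch short.
    mean = sum(x) / len(x)
    var = sum((xi - mean) ** 2 for xi in x) / len(x)
    return [(xi - mean) / math.sqrt(var + eps) for xi in x]


class ResidualAdapter:
    """Hypothetical per-task adapter: h + W_up(ReLU(W_down(LN(h))))."""

    def __init__(self, d_model: int, bottleneck: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Down-projection d_model -> bottleneck, small random init.
        self.w_down = [[rng.gauss(0.0, 0.02) for _ in range(d_model)]
                       for _ in range(bottleneck)]
        self.b_down = [0.0] * bottleneck
        # Up-projection bottleneck -> d_model, zero-initialised (an assumption):
        # the adapter starts as the identity, so the frozen model is unchanged.
        self.w_up = [[0.0] * bottleneck for _ in range(d_model)]
        self.b_up = [0.0] * d_model

    def num_params(self) -> int:
        # Only these weights are trained per task; the base model stays frozen.
        return (sum(len(r) for r in self.w_down) + len(self.b_down)
                + sum(len(r) for r in self.w_up) + len(self.b_up))

    def __call__(self, h: Vector) -> Vector:
        z = [max(0.0, v) for v in matvec(self.w_down, layer_norm(h), self.b_down)]
        return [hi + ui for hi, ui in zip(h, matvec(self.w_up, z, self.b_up))]


def adapter_overhead(d_model: int, bottleneck: int, n_layers: int,
                     base_params: int) -> float:
    """Extra parameters per task, as a fraction of the frozen base model,
    assuming one adapter (two projections with biases) per layer."""
    per_layer = 2 * d_model * bottleneck + bottleneck + d_model
    return n_layers * per_layer / base_params


if __name__ == "__main__":
    adapter = ResidualAdapter(d_model=8, bottleneck=2)
    h = [0.1 * i for i in range(8)]
    assert adapter(h) == h            # identity at initialisation
    print(adapter.num_params())       # 42 = 2*8*2 + 2 + 2*8 + 8
    # Hypothetical 12-layer, width-768 model with ~124M frozen parameters.
    frac = adapter_overhead(768, 128, 12, 124_000_000)
    print(f"{frac:.2%}")              # 1.91%
```

Only the adapters are stored per task, so serving several tasks needs one copy of the large model plus a few small adapter sets. The overhead grows linearly with the bottleneck width, which is the knob that would set a budget such as the 2-3% quoted above.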
Forward citations
Cited by 3 Pith papers
- LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention
  LLaMA-Adapter turns frozen LLaMA 7B into a capable instruction follower using only 1.2M new parameters and zero-init attention, matching Alpaca while extending to image-conditioned reasoning on ScienceQA and COCO.
- Language Model Networks: Supervision-Efficient Learning through Dense Communication
  LMNet connects stripped LLMs as nodes with trainable seq2seq edges for dense vector exchange, supporting supervision-efficient learning through differentiable communication.
- State Space Models Meet Remote Sensing: A Survey
  A literature survey of State Space Model methods applied to remote sensing tasks, architectures, and challenges since their introduction to the field.