Exploring Versatile Generative Language Model Via Parameter-Efficient Transfer Learning

Andrea Madotto; Pascale Fung; Zhaojiang Lin

arXiv: 2004.03829 · v2 · pith:WA3NDLMGnew · submitted 2020-04-08 · cs.CL

Zhaojiang Lin, Andrea Madotto, Pascale Fung

classification: cs.CL

keywords: model, language, generation, tasks, down-stream, fine-tuning, generative, large

Abstract

Fine-tuning pre-trained generative language models on down-stream language generation tasks has shown promising results. However, this comes at the cost of keeping a separate large model for each task, which is not ideal in low-memory or low-power scenarios (e.g., mobile). In this paper, we propose an effective way to fine-tune multiple down-stream generation tasks simultaneously using a single, large pre-trained model. Experiments on five diverse language generation tasks show that, using just an additional 2-3% of parameters for each task, our model can maintain or even improve on the performance of fine-tuning the whole model.
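
The storage argument in the abstract can be made concrete with a little arithmetic. The sketch below compares keeping one fully fine-tuned copy of the model per task against keeping one shared model plus a 2-3% add-on per task. The base model size (`BASE_PARAMS`) and the function names are hypothetical, chosen only for illustration. Only the five tasks and the 2-3% range come from the abstract.

```python
def full_finetune_params(base_params: int, num_tasks: int) -> int:
    """Total parameters stored when every task keeps its own full model copy."""
    return base_params * num_tasks


def shared_model_params(base_params: int, num_tasks: int, extra_fraction: float) -> int:
    """Total parameters stored with one shared model plus a small per-task add-on."""
    per_task_extra = round(base_params * extra_fraction)
    return base_params + num_tasks * per_task_extra


BASE_PARAMS = 100_000_000  # hypothetical model size, not taken from the paper
NUM_TASKS = 5              # the abstract reports five generation tasks

full = full_finetune_params(BASE_PARAMS, NUM_TASKS)
for fraction in (0.02, 0.03):  # the 2-3% range quoted in the abstract
    shared = shared_model_params(BASE_PARAMS, NUM_TASKS, fraction)
    print(f"extra={fraction:.0%}: full={full:,} shared={shared:,} "
          f"ratio={shared / full:.2f}")
```

Under these assumed numbers, the shared setup stores between 110 and 115 million parameters for five tasks, versus 500 million for five full copies. The gap widens as more tasks are added, since each new task costs only the add-on.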

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention
cs.CV · 2023-03 · conditional · novelty 7.0

LLaMA-Adapter turns frozen LLaMA 7B into a capable instruction follower using only 1.2M new parameters and zero-init attention, matching Alpaca while extending to image-conditioned reasoning on ScienceQA and COCO.

Language Model Networks: Supervision-Efficient Learning through Dense Communication
cs.AI · 2025-05 · unverdicted · novelty 6.0

LMNet connects stripped LLMs as nodes with trainable seq2seq edges for dense vector exchange, supporting supervision-efficient learning through differentiable communication.

State Space Models Meet Remote Sensing: A Survey
cs.CV · 2026-06 · unverdicted · novelty 2.0

A literature survey of State Space Model methods applied to remote sensing tasks, architectures, and challenges since their introduction to the field.