arxiv: 2004.03829 · v2 · pith:WA3NDLMGnew · submitted 2020-04-08 · 💻 cs.CL

Exploring Versatile Generative Language Model Via Parameter-Efficient Transfer Learning

classification 💻 cs.CL
keywords model · language · generation · tasks · down-stream · fine-tuning · generative · large

Fine-tuning pre-trained generative language models to down-stream language generation tasks has shown promising results. However, this comes with the cost of having a single, large model for each task, which is not ideal in low-memory/power scenarios (e.g., mobile). In this paper, we propose an effective way to fine-tune multiple down-stream generation tasks simultaneously using a single, large pre-trained model. The experiments on five diverse language generation tasks show that by just using an additional 2-3% parameters for each task, our model can maintain or even improve the performance of fine-tuning the whole model.

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention

    cs.CV · 2023-03 · conditional · novelty 7.0

    LLaMA-Adapter turns frozen LLaMA 7B into a capable instruction follower using only 1.2M new parameters and zero-init attention, matching Alpaca while extending to image-conditioned reasoning on ScienceQA and COCO.

  2. Language Model Networks: Supervision-Efficient Learning through Dense Communication

    cs.AI · 2025-05 · unverdicted · novelty 6.0

    LMNet connects stripped LLMs as nodes with trainable seq2seq edges for dense vector exchange, supporting supervision-efficient learning through differentiable communication.

  3. State Space Models Meet Remote Sensing: A Survey

    cs.CV · 2026-06 · unverdicted · novelty 2.0

    A literature survey of State Space Model methods applied to remote sensing tasks, architectures, and challenges since their introduction to the field.