To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks

Matthew E. Peters; Noah A. Smith; Sebastian Ruder

arxiv: 1903.05987 · v2 · pith:NNSEKU5Wnew · submitted 2019-03-14 · 💻 cs.CL · cs.LG

keywords: pretrained, tasks, adaptation, diverse, extraction, feature, fine-tuning, model

While most previous work has focused on different pretraining objectives and architectures for transfer learning, we ask how to best adapt the pretrained model to a given target task. We focus on the two most common forms of adaptation, feature extraction (where the pretrained weights are frozen), and directly fine-tuning the pretrained model. Our empirical results across diverse NLP tasks with two state-of-the-art models show that the relative performance of fine-tuning vs. feature extraction depends on the similarity of the pretraining and target tasks. We explore possible explanations for this finding and provide a set of adaptation guidelines for the NLP practitioner.
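
To make the distinction between the two adaptation modes concrete, here is a minimal sketch in plain Python. It is a toy illustration, not the authors' experimental setup: the scalar "encoder" and "head", the class name `TinyModel`, the functions `sgd_step` and `adapt`, the initial weight 0.5, the learning rate, and the toy data are all assumptions made for this example. Under feature extraction only the task head is updated; under fine-tuning the pretrained weight is updated as well.

```python
from dataclasses import dataclass


@dataclass
class TinyModel:
    """Toy stand-in for a pretrained encoder plus a new task head (both scalars)."""
    encoder_w: float = 0.5  # hypothetical "pretrained" weight
    head_w: float = 0.0     # freshly initialised task-specific weight

    def predict(self, x: float) -> float:
        return self.head_w * (self.encoder_w * x)


def sgd_step(model: TinyModel, x: float, y: float, lr: float, tune_encoder: bool) -> None:
    """One SGD step on 0.5 * (prediction - y)**2.

    The head is always updated; the encoder only when tune_encoder is True.
    """
    feature = model.encoder_w * x
    error = model.head_w * feature - y
    grad_head = error * feature
    grad_encoder = error * model.head_w * x
    model.head_w -= lr * grad_head
    if tune_encoder:
        model.encoder_w -= lr * grad_encoder


def adapt(tune_encoder: bool, steps: int = 200, lr: float = 0.05) -> TinyModel:
    """Adapt the toy model to a toy target task, y = 2x."""
    model = TinyModel()
    data = [(1.0, 2.0), (2.0, 4.0)]
    for _ in range(steps):
        for x, y in data:
            sgd_step(model, x, y, lr, tune_encoder)
    return model


frozen = adapt(tune_encoder=False)  # feature extraction: encoder stays frozen
tuned = adapt(tune_encoder=True)    # fine-tuning: encoder is updated too
```

In this sketch both variants fit the toy task, but `frozen.encoder_w` keeps its initial value while `tuned.encoder_w` moves away from it. The toy says nothing about which mode performs better on real tasks; that is the empirical question the abstract addresses.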

Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

OPT: Open Pre-trained Transformer Language Models
cs.CL · 2022-05 · unverdicted · novelty 7.0

OPT releases open decoder-only transformers up to 175B parameters that match GPT-3 performance at one-seventh the carbon cost, along with code and training logs.

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
cs.LG · 2019-10 · unverdicted · novelty 7.0

T5 casts all NLP tasks as text-to-text generation, systematically explores pre-training choices, and reaches strong performance on summarization, QA, classification and other tasks via large-scale training on the Colo...

Transfer Learning for Risk Classification of Social Media Posts: Model Evaluation Study
cs.CL · 2019-07 · unverdicted · novelty 4.0

Fine-tuning GPT-1 on 150,000 unlabeled Reachout.com posts, then feeding the features into AutoML, yields a new state-of-the-art macro F1 of 0.572 for triaging risk in 1,588 labeled CLPsych 2017 posts without metadata or history.

To Tune or Not To Tune? How About the Best of Both Worlds?
cs.CL · 2019-07 · unverdicted · novelty 3.0

A sequential fine-tuning strategy for pre-trained language models reports modest accuracy gains of 4.7%, 0.99%, and 0.72% on semantic similarity, sequence labeling, and text classification tasks.