What Happens To BERT Embeddings During Fine-tuning?

Amil Merchant; Elahe Rahimtoroghi; Ellie Pavlick; Ian Tenney

arXiv: 2004.14448 · v1 · submitted 2020-04-29 · cs.CL

Keywords: fine-tuning, model, BERT, find, representations, affects, analysis, linguistic

Abstract

While there has been much recent work studying how linguistic information is encoded in pre-trained sentence representations, comparatively little is understood about how these models change when adapted to solve downstream tasks. Using a suite of analysis techniques (probing classifiers, Representational Similarity Analysis, and model ablations), we investigate how fine-tuning affects the representations of the BERT model. We find that while fine-tuning necessarily makes significant changes, it does not lead to catastrophic forgetting of linguistic phenomena. We instead find that fine-tuning primarily affects the top layers of BERT, but with noteworthy variation across tasks. In particular, dependency parsing reconfigures most of the model, whereas SQuAD and MNLI appear to involve much shallower processing. Finally, we also find that fine-tuning has a weaker effect on representations of out-of-domain sentences, suggesting room for improvement in model generalization.
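Representational Similarity Analysis (RSA), one of the techniques named in the abstract, compares two models by feeding both the same inputs and asking whether the pairwise similarity structure of their representations agrees. The sketch below is a minimal illustration, not the authors' implementation: it assumes cosine similarity between representation vectors and Pearson correlation between the two similarity structures, and the function names (`similarity_vector`, `rsa_score`) are hypothetical. The paper's exact similarity measure, correlation statistic, and choice of layers and tokens may differ.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_vector(reps):
    """Upper triangle (i < j) of the pairwise cosine-similarity matrix."""
    return [cosine_similarity(reps[i], reps[j])
            for i, j in combinations(range(len(reps)), 2)]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("similarity structure has zero variance")
    return cov / (sd_x * sd_y)


def rsa_score(reps_a, reps_b):
    """Correlate the similarity structures of two models on the same inputs.

    reps_a[k] and reps_b[k] must be representations of the same input k,
    e.g. one layer of pre-trained BERT vs. the same layer after fine-tuning
    (an assumed setup for illustration). At least three inputs are needed.
    """
    if len(reps_a) != len(reps_b):
        raise ValueError("both models must represent the same inputs")
    return pearson(similarity_vector(reps_a), similarity_vector(reps_b))
```

For example, `rsa_score([[1, 0], [0, 1], [1, 1]], [[2, 0], [0, 3], [5, 5]])` is 1 (up to rounding) because rescaling individual vectors leaves cosine similarities unchanged. Under this setup, a score near 1 for a layer suggests fine-tuning preserved that layer's geometry, while lower scores indicate larger changes. Note that the vectors within each model need not share a dimensionality with the other model, since only within-model similarities are compared; Spearman correlation is a common alternative to Pearson when only the rank order of similarities matters.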

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

EPnG: Adaptive Expert Prune-and-Grow for Parameter-Efficient MoE Fine-tuning
cs.LG · 2026-07 · unverdicted · novelty 6.0

EPnG reallocates LoRA capacity in MoE models by pruning experts with low router gate probabilities and expanding high-importance ones via rank growth, outperforming standard LoRA and nearing full fine-tuning performan...