Surgical Fine-Tuning Improves Adaptation to Distribution Shifts

Ananya Kumar; Annie S. Chen; Chelsea Finn; Fahim Tajwar; Huaxiu Yao; Percy Liang; Yoonho Lee

arxiv: 2210.11466 · v3 · pith:AE3UUO5Nnew · submitted 2022-10-20 · 💻 cs.LG · cs.AI

Yoonho Lee, Annie S. Chen, Fahim Tajwar, Ananya Kumar, Huaxiu Yao, Percy Liang, Chelsea Finn

keywords: fine-tuning, distribution, layers, shift, information, learned, shifts, subset

A common approach to transfer learning under distribution shift is to fine-tune the last few layers of a pre-trained model, preserving learned features while also adapting to the new task. This paper shows that in such settings, selectively fine-tuning a subset of layers (which we term surgical fine-tuning) matches or outperforms commonly used fine-tuning approaches. Moreover, the type of distribution shift influences which subset is more effective to tune: for example, for image corruptions, fine-tuning only the first few layers works best. We validate our findings systematically across seven real-world data tasks spanning three types of distribution shifts. Theoretically, we prove that for two-layer neural networks in an idealized setting, first-layer tuning can outperform fine-tuning all layers. Intuitively, fine-tuning more parameters on a small target dataset can cause information learned during pre-training to be forgotten, and the relevant information depends on the type of shift.
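
The idea of updating only a chosen subset of layers can be illustrated with a toy sketch. The example below is not the paper's experimental setup: it assumes a hypothetical scalar two-layer linear model y = w2 * (w1 * x), a made-up input-level shift (inputs scaled by 0.5), and a hypothetical helper `surgical_finetune` that applies gradient descent only to the parameters named in `tune`.

```python
def surgical_finetune(params, data, tune, lr=0.05, steps=200):
    """Gradient descent on y = w2 * (w1 * x), updating only names in `tune`."""
    p = dict(params)  # copy so the pre-trained weights are left untouched
    n = len(data)
    for _ in range(steps):
        g1 = g2 = 0.0
        for x, y in data:
            h = p["w1"] * x              # first-layer output
            err = p["w2"] * h - y        # prediction error
            g1 += 2 * err * p["w2"] * x / n  # d(MSE)/d(w1)
            g2 += 2 * err * h / n            # d(MSE)/d(w2)
        grads = {"w1": g1, "w2": g2}
        for name in tune:                # frozen parameters are skipped
            p[name] -= lr * grads[name]
    return p


def mse(p, data):
    """Mean squared error of the two-layer model on (x, y) pairs."""
    return sum((p["w2"] * p["w1"] * x - y) ** 2 for x, y in data) / len(data)


# Assumed pre-trained weights: the model fits y = 2 * x on clean inputs.
pretrained = {"w1": 1.0, "w2": 2.0}

# Assumed input-level shift: inputs arrive scaled by 0.5, labels unchanged.
target = [(0.5 * x, 2.0 * x) for x in (1.0, 2.0, 3.0)]

# Tune only the first layer; the second layer stays at its pre-trained value.
tuned = surgical_finetune(pretrained, target, tune={"w1"})
```

In this toy case the first layer alone can absorb the input rescaling, so `w2` is left exactly as pre-trained. Passing `tune={"w1", "w2"}` would correspond to fine-tuning all parameters, and `tune={"w2"}` to last-layer tuning.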

Forward citations

Cited by 5 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Sample-wise Targeted Adversarial Attacks on Test-time Adaptation
cs.LG 2026-05 unverdicted novelty 6.0

Proposes a meta-learning attack with priority-aware gradient alignment for sample-wise targeted attacks on TTA that maintain label distribution consistency with the no-attack baseline.

Selective, Regularized, and Calibrated: Harnessing Vision Foundation Models for Cross-Domain Few-Shot Semantic Segmentation
cs.CV 2026-05 unverdicted novelty 6.0

HERA is a select-regularize-calibrate framework adapting frozen vision foundation models for cross-domain few-shot semantic segmentation via hierarchical layer selection with ETR, prior-guided regularization, and pixe...

Intermediate Representations are Strong AI-Generated Image Detectors
cs.CV 2026-05 unverdicted novelty 6.0

Intermediate layer embedding sensitivity to perturbations distinguishes AI-generated images from real ones, yielding higher AUROC on GenImage and Forensics Small benchmarks than prior methods.

Task-agnostic Low-rank Residual Adaptation for Efficient Federated Continual Fine-Tuning
cs.LG 2025-05 unverdicted novelty 6.0

Fed-TaLoRA uses task-agnostic low-rank residual adaptation with post-aggregation calibration to enable efficient federated continual fine-tuning across sequential tasks under non-IID conditions.

Generalizable Deepfake Detection Based on Forgery-aware Layer Masking and Multi-artifact Subspace Decomposition
cs.CV 2026-01 unverdicted novelty 5.0

FMSD improves cross-dataset generalization in deepfake detection by using gradient-based layer masking to select forgery-sensitive weights and SVD to split them into preserved semantic and multiple learnable artifact ...