Residual Connections Harm Generative Representation Learning

Michael Maire; Rebecca Willett; Ruoxi Jiang; William Gao; Xiao Zhang

arxiv: 2404.10947 · v5 · pith:BME2K2KMnew · submitted 2024-04-16 · 💻 cs.CV

Xiao Zhang, Ruoxi Jiang, William Gao, Rebecca Willett, Michael Maire

classification 💻 cs.CV

keywords feature · learning · residual · accuracy · connections · diffusion · generative · identity


We show that introducing a weighting factor to reduce the influence of identity shortcuts in residual networks significantly enhances semantic feature learning in generative representation learning frameworks, such as masked autoencoders (MAEs) and diffusion models. Our modification notably improves feature quality, raising ImageNet-1K K-Nearest Neighbor accuracy from 27.4% to 63.9% and linear probing accuracy from 67.8% to 72.7% for MAEs with a ViT-B/16 backbone, while also enhancing generation quality in diffusion models. This significant gap suggests that, while residual connection structure serves an essential role in facilitating gradient propagation, it may have a harmful side effect of reducing capacity for abstract learning by virtue of injecting an echo of shallower representations into deeper layers. We ameliorate this downside via a fixed formula for monotonically decreasing the contribution of identity connections as layer depth increases. Our design promotes the gradual development of feature abstractions, without impacting network trainability. Analyzing the representations learned by our modified residual networks, we find correlation between low effective feature rank and downstream task performance.
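The abstract describes down-weighting the identity shortcut by a fixed factor that decreases monotonically with layer depth, but it does not state the formula. The sketch below is only an illustration under assumptions: it uses a linear decay from 1.0 to a hypothetical floor `alpha_min`, a toy `tanh` residual branch in place of a real transformer block, and invented names (`identity_weights`, `toy_branch`, `forward`). It should not be read as the authors' implementation.

```python
import math


def identity_weights(num_layers, alpha_min=0.6):
    """Shortcut weight per layer, decaying linearly from 1.0 to alpha_min.

    Assumption: the linear schedule and the alpha_min floor are illustrative;
    the abstract only says the weight decreases monotonically with depth.
    """
    if num_layers == 1:
        return [1.0]
    step = (1.0 - alpha_min) / (num_layers - 1)
    return [1.0 - step * layer for layer in range(num_layers)]


def toy_branch(x):
    """Stand-in for the residual branch f(x) (hypothetical, not a real block)."""
    return [math.tanh(v) for v in x]


def forward(x, num_layers, alpha_min=0.6):
    """Apply x <- alpha_l * x + f(x) layer by layer.

    With alpha_min=1.0 every weight is 1.0, which recovers the standard
    residual update x <- x + f(x).
    """
    for alpha in identity_weights(num_layers, alpha_min):
        branch_out = toy_branch(x)
        x = [alpha * xi + fi for xi, fi in zip(x, branch_out)]
    return x


if __name__ == "__main__":
    print(identity_weights(4, alpha_min=0.7))
    print(forward([0.5, -1.0], num_layers=4))
```

In this sketch the weights depend only on layer index and depth, so the schedule adds no learned parameters; how the actual method sets the decay rate is left to the paper.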


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Uni-Encoder Meets Multi-Encoders: Representation Before Fusion for Brain Tumor Segmentation with Missing Modalities
cs.CV · 2026-04 · unverdicted · novelty 5.0

UniME combines a pretrained unified ViT encoder with modality-specific CNN encoders to improve brain tumor segmentation performance when some MRI modalities are missing.