SVCCA: Singular Vector Canonical Correlation Analysis for Deep Learning Dynamics and Interpretability

Jascha Sohl-Dickstein; Jason Yosinski; Justin Gilmer; Maithra Raghu

arXiv: 1706.05806 · v2 · pith:M3PJQ35Knew · submitted 2017-06-19 · stat.ML · cs.LG

keywords: networks, SVCCA, allowing, analysis, canonical, correlation, dynamics, layers

Abstract

We propose a new technique, Singular Vector Canonical Correlation Analysis (SVCCA), a tool for quickly comparing two representations in a way that is both invariant to affine transformations (allowing comparison between different layers and networks) and fast to compute (allowing more comparisons to be calculated than with previous methods). We deploy this tool to measure the intrinsic dimensionality of layers, showing in some cases needless over-parameterization; to probe learning dynamics throughout training, finding that networks converge to final representations from the bottom up; to show where class-specific information in networks is formed; and to suggest new training regimes that simultaneously save computation and overfit less. Code: https://github.com/google/svcca/
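As a rough illustration of the comparison the abstract describes, the sketch below reduces each representation to its leading singular directions and then computes canonical correlations between the two resulting subspaces. It is a minimal NumPy sketch, not the code from the linked repository: the function name `svcca_similarity`, the `keep_variance` parameter, and the choice to summarize with the mean correlation are assumptions made for illustration.

```python
import numpy as np


def svcca_similarity(acts1, acts2, keep_variance=0.99):
    """Mean canonical correlation between two activation matrices.

    Each input has shape (num_datapoints, num_neurons); both must be
    computed on the same datapoints. Hypothetical helper, for illustration.
    """

    def top_subspace(acts):
        # Center each neuron, then take the SVD of the centered activations.
        centered = acts - acts.mean(axis=0, keepdims=True)
        u, s, _ = np.linalg.svd(centered, full_matrices=False)
        # Keep the fewest directions explaining `keep_variance` of the variance.
        explained = np.cumsum(s ** 2) / np.sum(s ** 2)
        k = int(np.searchsorted(explained, keep_variance)) + 1
        # Columns of u form an orthonormal basis of the retained subspace.
        return u[:, :k]

    q1 = top_subspace(np.asarray(acts1, dtype=float))
    q2 = top_subspace(np.asarray(acts2, dtype=float))
    # For orthonormal bases, the canonical correlations are the singular
    # values of q1^T q2.
    rho = np.linalg.svd(q1.T @ q2, compute_uv=False)
    return float(np.mean(np.clip(rho, 0.0, 1.0)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    layer_a = rng.normal(size=(500, 20))
    # An invertible affine map of layer_a, plus an unrelated representation.
    layer_b = layer_a @ rng.normal(size=(20, 20)) + rng.normal(size=20)
    layer_c = rng.normal(size=(500, 20))
    print("affine copy:", svcca_similarity(layer_a, layer_b, keep_variance=1.0))
    print("unrelated:  ", svcca_similarity(layer_a, layer_c, keep_variance=1.0))
```

With `keep_variance=1.0` no directions are discarded, so an invertible affine map of a representation should score close to 1. With a lower threshold the SVD truncation can discard different directions on each side, so the score may fall slightly below 1 even for affinely related inputs.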

Forward citations

Cited by 5 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Toy Combinatorial Interpretability Models Reveal Lottery Tickets in Early Feature Space
cs.LG 2026-05 unverdicted novelty 7.0

In a combinatorial toy setting, winning lottery tickets preserve families of compatible feature locations in early feature space that balance proximity to final codes with low interference, rather than specific weight...

From Layers to Networks: Comparing Neural Representations via Diffusion Geometry
cs.LG 2026-05 unverdicted novelty 7.0

Develops multi-scale and alternating-diffusion fused variants of CKA and distance correlation via Markov matrices for neural representation comparison, reporting state-of-the-art results on ReSi and OOD benchmarks.

When Are Two Networks the Same? Tensor Similarity for Mechanistic Interpretability
cs.LG 2026-05 unverdicted novelty 7.0

Tensor similarity is a symmetry-invariant metric that measures functional equivalence between tensor-based networks using a recursive algorithm for cross-layer mechanisms.

The Long Delay to Arithmetic Generalization: When Learned Representations Outrun Behavior
cs.LG 2026-03 unverdicted novelty 7.0

The grokking delay in encoder-decoder models on one-step Collatz prediction stems from decoder inability to use early-learned encoder representations of parity and residue structure, with numeral base acting as a stro...

Understanding intermediate layers using linear classifier probes
stat.ML 2016-10 accept novelty 7.0

Linear probes demonstrate that feature separability for classification increases monotonically with network depth in Inception v3 and ResNet-50.