Representation Learning on a Random Lattice
Decomposing a deep neural network's learned representations into interpretable features could greatly enhance its safety and reliability. To better understand features, we adopt a geometric perspective, viewing them as a learned coordinate system for mapping an embedded data distribution. We motivate a model of a generic data distribution as a random lattice and analyze its properties using percolation theory. Learned features are categorized into context, component, and surface features. The model is qualitatively consistent with recent findings in mechanistic interpretability and suggests directions for future research.
Forward citations
Cited by 1 Pith paper
- Critical Percolation as a Synthetic Data Model for Interpretability
Critical percolation clusters embedded in high dimensions, combined with taxonomic latent variables, form an analytically tractable synthetic data model whose ground-truth hierarchy can be linearly decoded from networ...