How Data Augmentation Shapes Neural Representations

Alex H. Williams; Sarah E. Harvey; Tianxiao He

arxiv: 2605.15306 · v1 · pith:RA2L3U7Snew · submitted 2026-05-14 · 💻 cs.LG · stat.ML

Pith reviewed 2026-05-19 16:20 UTC · model grok-4.3

keywords: data augmentation, neural representations, shape analysis, geometric trajectories, ensemble prediction, invariant metric space, representation geometry

The pith

Data augmentation steers neural representations along distinct, predictable trajectories in an invariant shape space.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that different data augmentation strategies reshape the internal geometry of neural network representations in measurable ways. By embedding hidden activations into a metric space that ignores scaling, translation, rotation, and reflection, the authors track how representations move as augmentation intensity increases. Stronger augmentations produce smoother, more consistent paths, while distinct augmentation types push representations in different directions. These geometric patterns turn out to be stable across architectures and random seeds. The same shape-space measurements also forecast which layers or models will gain the most when their outputs are combined in an ensemble.

Core claim

We characterize how different data augmentation strategies reshape internal representations in neural networks. Using tools from shape analysis, we embed network hidden representations into a metric space where distance is invariant to scaling, translation, rotation and reflection. We show that increasing augmentation strength leads to well-behaved trajectories in this space, and that different augmentation types steer representations in distinct directions. Moreover, we investigate how neural representation shapes are distorted along data augmentation trajectories, and show that insights from neural geometry can predict which representations provide the most improvement when ensembling.

What carries the argument

The invariant metric embedding of hidden representations obtained via shape analysis, which turns raw activations into points whose distances ignore scaling, translation, rotation, and reflection.
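The invariances listed above can be illustrated with a minimal NumPy sketch of an angular Procrustes distance. It assumes each representation is an (M, N) matrix of responses to the same M probe images with the same number N of units; the function name `shape_distance` and this layout are illustrative choices, not the authors' code, and the paper may handle layers of unequal width differently.

```python
import numpy as np

def shape_distance(X_i, X_j):
    """Angular Procrustes distance between two (M, N) activation matrices.

    Hypothetical helper: rows are probe images, columns are units.
    """
    def to_preshape(X):
        Z = X - X.mean(axis=0, keepdims=True)  # remove translation
        return Z / np.linalg.norm(Z)           # remove scale (unit Frobenius norm)

    Z_i, Z_j = to_preshape(X_i), to_preshape(X_j)
    # The supremum of Tr[Z_j^T Z_i O] over orthogonal O (rotations and
    # reflections) equals the sum of singular values of Z_j^T Z_i.
    singular_values = np.linalg.svd(Z_j.T @ Z_i, compute_uv=False)
    cos_angle = np.clip(singular_values.sum(), -1.0, 1.0)
    return float(np.arccos(cos_angle))

# Usage: a scaled, rotated, shifted copy should be at (numerically) zero distance.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))
Q, _ = np.linalg.qr(rng.normal(size=(8, 8)))  # random orthogonal matrix
Y = 3.0 * X @ Q + 5.0
print(shape_distance(X, Y))
```

The clip guards against floating-point values slightly above 1 before taking the arccosine.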

Load-bearing premise

The chosen invariant metric embedding of hidden representations captures the geometric properties most relevant to generalization and ensembling performance.

What would settle it

If the representations predicted to improve ensembling the most fail to produce measurable gains when those models are actually combined, or if the reported trajectories do not appear when the same augmentations are applied to new architectures or datasets.
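One way to make "measurable gains" concrete is sketched below. The summary does not say how the paper quantifies ensembling improvement, so the definition here (accuracy of the averaged class probabilities minus the mean accuracy of the two members) and the names `accuracy` and `ensemble_gain` are assumptions for illustration only.

```python
import numpy as np

def accuracy(probs, labels):
    """Fraction of examples whose argmax class matches the label."""
    return float((probs.argmax(axis=1) == labels).mean())

def ensemble_gain(probs_a, probs_b, labels):
    """Assumed metric: accuracy of the averaged prediction minus the
    mean accuracy of the two individual models."""
    ensemble_probs = 0.5 * (probs_a + probs_b)
    mean_individual = 0.5 * (accuracy(probs_a, labels) + accuracy(probs_b, labels))
    return accuracy(ensemble_probs, labels) - mean_individual

# Usage with toy predicted probabilities for four examples and two classes.
labels = np.array([0, 1, 0, 1])
probs_a = np.array([[0.9, 0.1], [0.1, 0.9], [0.4, 0.6], [0.6, 0.4]])
probs_b = np.array([[0.4, 0.6], [0.6, 0.4], [0.9, 0.1], [0.1, 0.9]])
print(ensemble_gain(probs_a, probs_b, labels))
```

In this toy case each model is confidently right where the other is weakly wrong, so averaging helps; two identical models would give a gain of zero under this definition.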

Figures

Figures reproduced from arXiv: 2605.15306 by Alex H. Williams, Sarah E. Harvey, Tianxiao He.

**Figure 1.** Embedding into shape space enables more meaningful comparison of large ensembles of representations. (a) After two models are trained end-to-end using different data augmentation methods, M unaltered images from the test dataset are randomly selected as probe images. The Riemannian shape distance (eq. 2) between these two representations is measured as the angular distance up to translation, scale, rotatio…

**Figure 2.** Orthogonal alignment reveals well-behaved DA trajectories in representation shape space that are not present without alignment. Using ResNet-18 layer 2 responses to 10,000 CIFAR-10 test set images from models trained with the same seed, pairwise representation distances show coherent organization across augmentation strengths only when the alignment step is included. (a, b) We compare additive Gaussian noi…

**Figure 3.** Augmentation-driven representation changes in shape space can be compared directly with seed-to-seed variability. Patch Gaussian DA is applied during training of ResNet-18 (top) and ViT (bottom) for a grid of (C, s) values. (a) Cross-seed distance matrices show repeated within-seed structure, (b) MDS-PCA suggests similar augmentation trajectories across three random seeds. (c) Daug and Dseed averaged over …

**Figure 4.** Different augmentation types produce distinct representational changes, whose diversity predicts ensemble gains. (a) MDS-PCA embedding of representation shapes for ResNet-18 layer 2.1 trained with 8 distinct augmentation types applied individually. Each trajectory connected with dotted lines corresponds to a single augmentation method applied at different magnitudes listed in …

**Figure 5.** DA produces layer-dependent landmark distortions. Landmark displacement analysis is performed comparing ResNet-18 representations trained without DA with those trained with grayscale DA for layer 2 (top) and avgpool (bottom). (a,e) Histograms of landmark displacement magnitudes across probe images; (b,f) 25 minimally displaced landmarks; (c,g) maximally displaced landmarks split into those contracting towa…
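Several captions above refer to low-dimensional embeddings built from pairwise representation distances. The exact "MDS-PCA" procedure is not described in the truncated captions, so the sketch below shows only plain classical multidimensional scaling applied to a distance matrix, as one plausible ingredient; `classical_mds` is a hypothetical helper, not the authors' pipeline.

```python
import numpy as np

def classical_mds(D, n_components=2):
    """Embed n points from an (n, n) symmetric distance matrix.

    Classical (Torgerson) MDS: double-center the squared distances and
    keep the leading eigenvectors. Shape distances need not be Euclidean,
    so negative eigenvalues are clipped to zero and the embedding is then
    only approximate.
    """
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n    # centering matrix
    B = -0.5 * J @ (D ** 2) @ J            # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)   # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale

# Usage: four points on a line at positions 0, 1, 2, 3.
positions = np.arange(4.0)
D = np.abs(positions[:, None] - positions[None, :])
coords = classical_mds(D, n_components=1)
print(coords.ravel())
```

Each row of the output is one model's coordinates; connecting rows in order of augmentation strength would trace a trajectory of the kind the captions describe.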

Original abstract

Data augmentation is widely recognized for improving generalization in deep networks, yet its impact on the geometry of learned representations remains poorly understood. In this work, we characterize how different data augmentation strategies reshape internal representations in neural networks. Using tools from shape analysis, we embed network hidden representations into a metric space where distance is invariant to scaling, translation, rotation and reflection. We show that increasing augmentation strength leads to well-behaved trajectories in this space, and that different augmentation types steer representations in distinct directions. Moreover, we investigate how neural representation shapes are distorted along data augmentation trajectories, and show that insights from neural geometry can predict which representations provide the most improvement when ensembling models. Our results reveal shared geometric patterns across architectures and seeds, and suggest that analyzing shape-space trajectories offers a principled tool for understanding and comparing data augmentation methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper maps data-augmentation effects onto invariant shape-space trajectories and claims those trajectories predict which models ensemble well, but the invariance step may be discarding magnitude signals that actually drive the gains.

The core observation is that stronger augmentation produces smoother paths in a representation space whose distance ignores scale, translation, rotation, and reflection, and that different augmentation families point in different directions within that space. They also report that the shape of these trajectories can be used to pick which checkpoints or architectures will give the biggest lift when ensembled, and that the patterns repeat across seeds and model families. That last part is the most practically useful piece if it holds up. The experiments appear to be run on standard vision benchmarks with multiple architectures, which is the right scope for a methods paper like this. The shared geometric patterns across runs are a clean result and worth having on record.

The main weakness is exactly the one the stress-test note flags: once you normalize away magnitude, you lose any signal that lives in activation scale or norm, and those quantities often correlate with calibration and ensemble weighting. Without a direct comparison to a non-invariant baseline on the same ensembling task, it is hard to tell whether the predictive power comes from the shape geometry or from whatever information survives the normalization. The paper would be stronger with that ablation and with clearer reporting on how they quantify the ensembling improvement.

This is the kind of work that belongs in a methods-oriented venue rather than a broad theory track. Readers who already care about representation geometry or augmentation design will find the trajectories and the ensembling prediction useful to test in their own pipelines. It is coherent enough and grounded enough in reproducible experiments to deserve referee time, though it will need the invariance check and tighter statistical reporting before it is ready for a top conference.

Referee Report

1 major / 2 minor

Summary. The paper claims that data augmentation reshapes neural network hidden representations in geometrically interpretable ways. By embedding representations into a metric space invariant to scaling, translation, rotation and reflection via shape-analysis tools, the authors report that stronger augmentations yield well-behaved trajectories, distinct augmentation families produce distinct directional shifts, and geometric properties along these trajectories predict which representations yield the largest gains when models are ensembled. Shared patterns are observed across architectures and random seeds.

Significance. If the central claims hold under rigorous controls, the work supplies a new geometric vocabulary for comparing augmentation strategies and for forecasting ensemble utility directly from representation shape. The cross-architecture consistency, if demonstrated with appropriate statistical tests, would be a notable strength for a field that often treats augmentation as a black-box hyper-parameter.

major comments (1)

[ensembling-prediction experiments (likely §4.3 or §5)] The central claim that geometric insights in the invariant shape space predict ensembling improvements (abstract and the ensembling-prediction experiments) rests on the assumption that the chosen invariant metric retains the properties that actually drive generalization and ensemble benefit. Because the embedding normalizes away activation magnitudes, it could mask differences in representation scale or norm that correlate with calibration and ensemble weighting. Without an ablation that compares the invariant distance against a non-invariant baseline (e.g., centered but unscaled Euclidean) on the same ensembling task, it remains unclear whether the reported predictive power stems from the geometric construction or from information that survives the invariance.

minor comments (2)

[Abstract] The abstract states that trajectories are 'well-behaved' without a brief operational definition or pointer to the quantitative measure used; a single sentence clarifying the criterion would aid readability.
[Figures] Figure captions and axis labels in the trajectory plots should explicitly state the invariance group (scaling, translation, rotation, reflection) to prevent readers from misinterpreting the embedding.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive and insightful report. We address the major comment below and describe the revisions we will make to strengthen the manuscript.

Referee: [ensembling-prediction experiments (likely §4.3 or §5)] The central claim that geometric insights in the invariant shape space predict ensembling improvements (abstract and the ensembling-prediction experiments) rests on the assumption that the chosen invariant metric retains the properties that actually drive generalization and ensemble benefit. Because the embedding normalizes away activation magnitudes, it could mask differences in representation scale or norm that correlate with calibration and ensemble weighting. Without an ablation that compares the invariant distance against a non-invariant baseline (e.g., centered but unscaled Euclidean) on the same ensembling task, it remains unclear whether the reported predictive power stems from the geometric construction or from information that survives the invariance.

Authors: We agree that an explicit ablation is needed to confirm that the reported predictive power derives from the shape-invariant geometry rather than residual information after normalization. The invariant embedding is motivated by our goal of isolating representation shape from magnitude effects, which we hypothesize is particularly relevant for comparing augmentation strategies. To address the concern directly, the revised manuscript will include a new ablation in the ensembling-prediction section that evaluates the same prediction task using both the invariant metric and a non-invariant baseline (centered but unscaled Euclidean distance). We will report the comparative results and discuss whether the invariance improves, maintains, or reduces predictive accuracy.

revision: yes
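The non-invariant baseline named in this exchange can be sketched as follows. "Centered but unscaled Euclidean" is read here as a Frobenius distance after removing only the per-unit mean; whether the baseline should also align rotations is not stated, so the `align_rotation` flag and the function name are assumptions that expose both readings.

```python
import numpy as np

def centered_euclidean_distance(X_i, X_j, align_rotation=False):
    """Frobenius distance between (M, N) activations after centering only.

    Scale is deliberately kept, so activation magnitude still affects
    the result. Hypothetical baseline, not the authors' implementation.
    """
    Z_i = X_i - X_i.mean(axis=0, keepdims=True)
    Z_j = X_j - X_j.mean(axis=0, keepdims=True)
    if not align_rotation:
        return float(np.linalg.norm(Z_i - Z_j))
    # Orthogonal Procrustes: min over orthogonal O of ||Z_i - Z_j O||_F^2
    # equals ||Z_i||^2 + ||Z_j||^2 - 2 * (sum of singular values of Z_j^T Z_i).
    nuclear = np.linalg.svd(Z_j.T @ Z_i, compute_uv=False).sum()
    squared = np.sum(Z_i ** 2) + np.sum(Z_j ** 2) - 2.0 * nuclear
    return float(np.sqrt(max(squared, 0.0)))

# Usage: doubling the activations changes this baseline, unlike a
# scale-invariant shape distance.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 6))
print(centered_euclidean_distance(X, X + 7.0))   # translation is removed
print(centered_euclidean_distance(X, 2.0 * X))   # scale is not
```

Running the same ensembling-prediction task with this distance and with the invariant one would show whether discarding magnitude helps or hurts, which is the comparison the referee requests.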

Circularity Check

0 steps flagged

No circularity: derivation relies on external shape-analysis tools and empirical trajectories

The paper embeds representations using established invariant metrics from shape analysis (scaling/translation/rotation/reflection invariance), then reports observed trajectories under augmentation strength and type. These trajectories and their use for predicting ensemble gains are presented as empirical findings, not as quantities fitted or defined to match the target outcomes. No equations reduce a claimed prediction to a fitted parameter by construction, and no load-bearing premise rests solely on self-citation chains. The central claims remain falsifiable against held-out data and alternative embeddings, satisfying the criteria for a self-contained, non-circular derivation.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only abstract available; no free parameters, axioms or invented entities are described.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AlexanderDuality.lean alexander_duality_circle_linking unclear

embed network hidden representations into a metric space where distance is invariant to scaling, translation, rotation and reflection... Riemannian shape distance $\rho(X_i, X_j) = \arccos\left(\sup_{O} \operatorname{Tr}\left[Z_j^\top Z_i O\right]\right)$

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

