Deep network as memory space: complexity, generalization, disentangled representation and interpretability
Pith reviewed 2026-05-24 22:22 UTC · model grok-4.3
The pith
Deep networks function as memory spaces whose capacity and efficiency follow from a Fisher-metric complexity that obeys the least-action principle.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Deep networks are memory spaces of information in which the capacity, robustness, and efficiency of the stored information are governed by the complexity, generalization, and disentanglement that arise once a Fisher metric is placed on the manifold of network configurations and the least-action principle is imposed so that complexity equals action.
What carries the argument
Fisher-metric formulation of network complexity together with the least-action principle (complexity equals action) applied to the geometry of deep-network configurations.
If this is right
- The geometric volume of admissible configurations sets the effective capacity of the memory.
- Generalization improves when the network configuration minimizes the action for a given task.
- Disentangled representations correspond to orthogonal directions in the Fisher geometry of the memory space.
- Interpretability reduces to reading off the geometric invariants of the learned configurations.
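For orientation, the bullets above can be read against the standard Fisher information metric on a parametric family p(x | θ). The block below writes that metric, one illustrative action functional (the path energy of a trajectory θ(t) through configuration space), and the orthogonality condition for two directions u and v. The reviewed text gives no explicit formulas, so the symbols θ, g_{ij}, S, u, v and the particular choice of action are assumptions made here for illustration, not the paper's definitions.

```latex
% Standard Fisher information metric on the parameter manifold
g_{ij}(\theta) = \mathbb{E}_{x \sim p(\cdot \mid \theta)}
  \left[ \partial_i \log p(x \mid \theta)\, \partial_j \log p(x \mid \theta) \right]

% Illustrative action (assumed): energy of a path \theta(t), t \in [0, 1]
S[\theta] = \int_0^1 g_{ij}(\theta(t))\, \dot{\theta}^i(t)\, \dot{\theta}^j(t)\, dt

% Two directions u, v are orthogonal in the Fisher geometry when
g_{ij}(\theta)\, u^i v^j = 0
```

Under this reading, "minimizing the action" would mean choosing paths that extremize S, which for this functional are geodesics of the Fisher metric.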
Where Pith is reading between the lines
- Regularization schemes could be derived by penalizing the action rather than conventional weight norms.
- The same geometric memory picture might be tested on recurrent or transformer architectures to check whether their generalization behavior follows the same action principle.
- Empirical measurement of the Fisher information matrix during training could serve as an early indicator of whether a network will generalize.
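The last suggestion can be sketched on a toy model. The example below computes the Fisher information matrix of a logistic model p(y=1 | x) = sigmoid(w · x), averaged over a batch of inputs, and a Fisher-weighted squared norm of the parameters as one possible scalar summary. The model, the function names `fisher_information` and `fisher_norm`, the synthetic inputs, and the choice of summary are all assumptions for illustration; the reviewed text does not specify how the Fisher matrix should be measured or summarized.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fisher_information(w, X):
    """Fisher matrix of a logistic model, averaged over the rows of X.

    The score for one example is (y - p) * x, and under the model
    E[(y - p)^2] = p * (1 - p), so F = mean_n p_n (1 - p_n) x_n x_n^T.
    """
    p = sigmoid(X @ w)             # model probabilities, shape (n,)
    var = p * (1.0 - p)            # variance of y under the model
    return (X * var[:, None]).T @ X / X.shape[0]

def fisher_norm(w, F):
    """Squared length of w under the Fisher metric, w^T F w (assumed summary)."""
    return float(w @ F @ w)

# Synthetic inputs and a hypothetical parameter vector, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
w = np.array([0.5, -1.0, 2.0])

F = fisher_information(w, X)
complexity = fisher_norm(w, F)
```

In a training loop, one could record `complexity` at each checkpoint and compare its trajectory with held-out error; whether that comparison is informative is exactly the open question raised above.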
Load-bearing premise
That the Fisher metric and least-action principle can be applied directly to deep-network parameter spaces and will produce valid statements about generalization and interpretability.
What would settle it
A controlled experiment in which networks with measurably lower Fisher-metric complexity fail to show improved generalization or more disentangled internal representations would falsify the claimed link.
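One way to score such an experiment is a rank correlation between a per-network complexity value and the measured generalization gap. The sketch below implements a Spearman-style rank correlation with NumPy. The function names and the idea of using rank correlation as the test statistic are assumptions for illustration, and no measurements are supplied here.

```python
import numpy as np

def ranks(values):
    """Rank positions (0 = smallest); assumes no ties, for simplicity."""
    order = np.argsort(values)
    r = np.empty(len(values), dtype=float)
    r[order] = np.arange(len(values))
    return r

def rank_correlation(complexity, gap):
    """Spearman rank correlation between two equal-length sequences."""
    rc = ranks(np.asarray(complexity, dtype=float))
    rg = ranks(np.asarray(gap, dtype=float))
    return float(np.corrcoef(rc, rg)[0, 1])
```

If the claimed link holds, networks with lower complexity should show smaller gaps, giving a clearly positive correlation; a value near zero or negative across a well-controlled set of networks would count against it.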
Original abstract
By bridging deep networks and physics, the programme of geometrization of deep networks was proposed as a framework for the interpretability of deep learning systems. Following this programme, we can apply two key ideas of physics, the geometrization of physics and the least action principle, to deep networks and deliver a new picture of deep networks: deep networks as a memory space of information, where the capacity, robustness and efficiency of the memory are closely related to the complexity, generalization and disentanglement of deep networks. The key components of this understanding include: (1) a Fisher-metric-based formulation of the network complexity; (2) the least action (complexity=action) principle on deep networks; and (3) the geometry built on deep network configurations. We will show how this picture brings us a new understanding of the interpretability of deep learning systems.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a conceptual framework that interprets deep neural networks as an information 'memory space' by importing two ideas from physics: geometrization via the Fisher information metric (to formulate network complexity) and the least-action principle (equating complexity to action). This geometry on network configurations is claimed to relate memory capacity, robustness and efficiency to complexity, generalization and disentanglement, thereby furnishing a new route to interpretability of deep learning systems. The three key components listed are (1) the Fisher-metric complexity, (2) the complexity=action principle, and (3) the induced geometry on configurations.
Significance. If the proposed mappings can be made rigorous and predictive, the framework would supply a physics-inspired language for discussing generalization and interpretability that is currently absent from the literature. The explicit invocation of Fisher geometry and least-action principles is a distinctive strength; however, the manuscript remains at the level of formal analogy without derivations, explicit parameter-space calculations, or falsifiable predictions that would allow the claimed relations to be tested.
major comments (1)
- [Abstract / key components (1) and (2)] Abstract and key components (1)–(2): the central claim that a Fisher-metric formulation together with the least-action principle (complexity=action) yields relations among capacity, generalization and disentanglement is asserted without any explicit derivation, mapping of the metric to network parameters, or worked example. Because these relations are load-bearing for the entire interpretability programme, their absence prevents evaluation of whether the framework is more than a re-description of existing quantities.
minor comments (1)
- Notation for the Fisher metric and the action functional is introduced only at the level of the abstract; explicit definitions and any regularity conditions on the parameter manifold should be supplied in the main text.
Simulated Author's Rebuttal
We thank the referee for the constructive review. The manuscript presents a conceptual framework that applies ideas from physics to deep networks, and we respond to the major comment below while clarifying the intended scope of the work.
Referee: [Abstract / key components (1) and (2)] Abstract and key components (1)–(2): the central claim that a Fisher-metric formulation together with the least-action principle (complexity=action) yields relations among capacity, generalization and disentanglement is asserted without any explicit derivation, mapping of the metric to network parameters, or worked example. Because these relations are load-bearing for the entire interpretability programme, their absence prevents evaluation of whether the framework is more than a re-description of existing quantities.
Authors: We agree that the manuscript does not contain explicit derivations, mappings of the Fisher metric to specific network parameters, or worked examples. The paper is framed as a high-level proposal that imports the geometrization of physics and the least-action principle to define network complexity and suggest links to generalization and disentanglement. These links are presented as implications of the proposed geometry rather than as rigorously derived results. We will revise the abstract and the description of the key components to state explicitly that the framework outlines a programme for future investigation rather than completed derivations. This change will make the scope of the claims clearer without altering the conceptual contribution.
revision: partial
Circularity Check
No significant circularity identified
The paper introduces a conceptual mapping from physics (Fisher metric on parameter space, least-action principle) onto deep networks to reinterpret complexity, generalization, and disentanglement. The relation 'complexity=action' is presented as an application of an external physical principle rather than an internal definition that forces later results. No equations, derivations, or self-citations in the provided text reduce any claimed prediction or insight to a fitted input or prior self-result by construction. The framework operates at the level of formal analogy and re-interpretation; its central claims remain independent of the inputs they are applied to.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: The Fisher metric can be used to formulate the complexity of a deep network.
- domain assumption: The least-action principle applies to deep networks with the identification complexity=action.
invented entities (1)
- Deep network as memory space: no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel
  Relation between the paper passage and the cited Recognition theorem: unclear
  Passage: Fisher metric based formulation of the network complexity; the least action (complexity=action) principle on deep networks and the geometry built on deep network configurations
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, reality_from_one_distinction
  Relation between the paper passage and the cited Recognition theorem: unclear
  Passage: geometry/information duality... spacetime emerges from the information of a physical system... memory space
- IndisputableMonolith/Foundation/AlexanderDuality.lean, alexander_duality_circle_linking
  Relation between the paper passage and the cited Recognition theorem: unclear
  Passage: dimension of the emergent geometry is determined by the dimension of the quantum operation... 4-dimensional spacetime
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.