arxiv: 1907.06572 · v1 · pith:N4MOBDTLnew · submitted 2019-07-12 · 💻 cs.LG · cs.AI

Deep network as memory space: complexity, generalization, disentangled representation and interpretability

Pith reviewed 2026-05-24 22:22 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords deep networks · Fisher metric · least action principle · interpretability · generalization · disentanglement · memory space · geometrization

The pith

Deep networks function as memory spaces whose capacity and efficiency follow from a Fisher-metric complexity that obeys the least-action principle.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes to view deep networks as memory spaces of information by importing the geometrization of physics and the least-action principle. It formulates network complexity via a Fisher metric on the space of network configurations and sets this complexity equal to the action. Under this picture the capacity, robustness, and efficiency of the memory become directly tied to the network's complexity, generalization behavior, and the degree of disentanglement in its representations, yielding a geometric account of interpretability.

Core claim

Deep networks are memory spaces of information in which the capacity, robustness, and efficiency of the stored information are governed by the complexity, generalization, and disentanglement that arise once a Fisher metric is placed on the manifold of network configurations and the least-action principle is imposed so that complexity equals action.

What carries the argument

Fisher-metric formulation of network complexity together with the least-action principle (complexity equals action) applied to the geometry of deep-network configurations.
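
This page does not reproduce the paper's own definitions, so the following is only a sketch of the textbook objects the argument appears to rely on. The symbols p(x | θ) (a model distribution with parameters θ), θ(t) (a path of configurations) and S (the action) are notation assumed here, not taken from the paper.

```latex
% Fisher information metric on the parameter manifold (standard definition)
g_{ij}(\theta) = \mathbb{E}_{x \sim p(x \mid \theta)}
  \left[ \partial_i \log p(x \mid \theta)\, \partial_j \log p(x \mid \theta) \right]

% Energy-type action of a path \theta(t), t \in [0, 1], under that metric (assumed form)
S[\theta] = \frac{1}{2} \int_0^1 g_{ij}(\theta(t))\, \dot{\theta}^i(t)\, \dot{\theta}^j(t)\, dt
```

Under this assumed form, paths that make S stationary are geodesics of the Fisher metric, which is one way to read "complexity equals action". Whether the paper uses an energy functional, a length functional, or something else is not stated on this page.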

If this is right

  • The geometric volume of admissible configurations sets the effective capacity of the memory.
  • Generalization improves when the network configuration minimizes the action for a given task.
  • Disentangled representations correspond to orthogonal directions in the Fisher geometry of the memory space.
  • Interpretability reduces to reading off the geometric invariants of the learned configurations.
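
A toy sketch may make "orthogonal directions in the Fisher geometry" and "minimizing the action" more concrete. It uses a one-dimensional Gaussian family as a stand-in for a network, because its Fisher matrix is known in closed form. The function names `gaussian_fisher`, `fisher_inner` and `discrete_action`, and the midpoint discretization, are illustrative assumptions and not the paper's construction.

```python
import numpy as np

def gaussian_fisher(mu: float, sigma: float) -> np.ndarray:
    """Fisher information matrix of N(mu, sigma^2) in coordinates (mu, sigma)."""
    return np.array([[1.0 / sigma**2, 0.0],
                     [0.0, 2.0 / sigma**2]])

def fisher_inner(F: np.ndarray, u: np.ndarray, v: np.ndarray) -> float:
    """Inner product of tangent vectors u and v under the metric F."""
    return float(u @ F @ v)

def discrete_action(path) -> float:
    """Midpoint-rule estimate of S = 1/2 * integral of g(v, v) dt for t in [0, 1].

    `path` is a sequence of (mu, sigma) points sampled uniformly in t.
    """
    path = np.asarray(path, dtype=float)
    n_steps = len(path) - 1
    dt = 1.0 / n_steps
    total = 0.0
    for a, b in zip(path[:-1], path[1:]):
        mid = 0.5 * (a + b)      # evaluate the metric at the segment midpoint
        vel = (b - a) / dt       # finite-difference velocity
        total += 0.5 * fisher_inner(gaussian_fisher(*mid), vel, vel) * dt
    return total
```

In this toy family the mean and standard-deviation directions are Fisher-orthogonal at every point, since the off-diagonal entries vanish. A straight path from (0, 1) to (1, 1) at fixed sigma = 1 has a discrete action of about 0.5; comparing that value across alternative paths between the same endpoints is the kind of calculation the action bullet would require.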

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Regularization schemes could be derived by penalizing the action rather than conventional weight norms.
  • The same geometric memory picture might be tested on recurrent or transformer architectures to check whether their generalization behavior follows the same action principle.
  • Empirical measurement of the Fisher information matrix during training could serve as an early indicator of whether a network will generalize.
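
The last suggestion can be sketched on a model small enough to have a closed-form Fisher matrix. The example below tracks the trace of the Fisher information of a logistic regression during gradient descent. The model choice, the helper names `fisher_logistic` and `fisher_trace_history`, and the use of the trace as a summary statistic are all assumptions made for illustration; nothing here shows that the trace predicts generalization.

```python
import numpy as np

def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))

def fisher_logistic(w: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Fisher information of p(y=1|x) = sigmoid(w . x), averaged over the rows of X.

    Closed form: F = mean_n p_n (1 - p_n) x_n x_n^T.
    """
    p = sigmoid(X @ w)
    weights = p * (1.0 - p)
    return (X * weights[:, None]).T @ X / X.shape[0]

def fisher_trace_history(X: np.ndarray, y: np.ndarray, lr: float = 0.1, steps: int = 50):
    """Run plain gradient descent on the logistic loss and record trace(F) after each step."""
    w = np.zeros(X.shape[1])
    traces = []
    for _ in range(steps):
        p = sigmoid(X @ w)
        grad = X.T @ (p - y) / X.shape[0]   # gradient of the mean negative log-likelihood
        w -= lr * grad
        traces.append(np.trace(fisher_logistic(w, X)))
    return w, np.array(traces)
```

For a deep network the same quantity would have to be estimated from per-example gradients of the log-likelihood rather than from a closed form, and the full matrix is usually too large to store, so a trace or diagonal estimate is the practical target.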

Load-bearing premise

That the Fisher metric and least-action principle can be applied directly to deep-network parameter spaces and will produce valid statements about generalization and interpretability.

What would settle it

A controlled experiment in which networks with measurably lower Fisher-metric complexity fail to show improved generalization or more disentangled internal representations would falsify the claimed link.

Figures

Figures reproduced from arXiv: 1907.06572 by L. Zhou, X. Dong.

Figure 1: Geometry of deep networks. GCom is a Riemannian manifold of the functions that can be represented by deep network, where the network structure G defines a submanifold and different network configurations with the same structure G correspond to different curves on GCom connecting the trivial identity function I and the target function T. Different deep networks also have their correspondent emergent geometr…
Figure 2: The fibre bundle structure of disentangled representations [30].
Original abstract

By bridging deep networks and physics, the programme of geometrization of deep networks was proposed as a framework for the interpretability of deep learning systems. Following this programme we can apply two key ideas of physics, the geometrization of physics and the least action principle, to deep networks and deliver a new picture of deep networks: deep networks as a memory space of information, where the capacity, robustness and efficiency of the memory are closely related to the complexity, generalization and disentanglement of deep networks. The key components of this understanding include: (1) a Fisher-metric-based formulation of the network complexity; (2) the least action (complexity = action) principle on deep networks; and (3) the geometry built on deep network configurations. We will show how this picture brings us a new understanding of the interpretability of deep learning systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The manuscript proposes a conceptual framework that interprets deep neural networks as an information 'memory space' by importing two ideas from physics: geometrization via the Fisher information metric (to formulate network complexity) and the least-action principle (equating complexity to action). This geometry on network configurations is claimed to relate memory capacity, robustness and efficiency to complexity, generalization and disentanglement, thereby furnishing a new route to interpretability of deep learning systems. The three key components listed are (1) the Fisher-metric complexity, (2) the complexity=action principle, and (3) the induced geometry on configurations.

Significance. If the proposed mappings can be made rigorous and predictive, the framework would supply a physics-inspired language for discussing generalization and interpretability that is currently absent from the literature. The explicit invocation of Fisher geometry and least-action principles is a distinctive strength; however, the manuscript remains at the level of formal analogy without derivations, explicit parameter-space calculations, or falsifiable predictions that would allow the claimed relations to be tested.

major comments (1)
  1. [Abstract / key components (1) and (2)] Abstract and key components (1)–(2): the central claim that a Fisher-metric formulation together with the least-action principle (complexity=action) yields relations among capacity, generalization and disentanglement is asserted without any explicit derivation, mapping of the metric to network parameters, or worked example. Because these relations are load-bearing for the entire interpretability programme, their absence prevents evaluation of whether the framework is more than a re-description of existing quantities.
minor comments (1)
  1. Notation for the Fisher metric and the action functional is introduced only at the level of the abstract; explicit definitions and any regularity conditions on the parameter manifold should be supplied in the main text.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive review. The manuscript presents a conceptual framework that applies ideas from physics to deep networks, and we respond to the major comment below while clarifying the intended scope of the work.

Point-by-point responses
  1. Referee: [Abstract / key components (1) and (2)] Abstract and key components (1)–(2): the central claim that a Fisher-metric formulation together with the least-action principle (complexity=action) yields relations among capacity, generalization and disentanglement is asserted without any explicit derivation, mapping of the metric to network parameters, or worked example. Because these relations are load-bearing for the entire interpretability programme, their absence prevents evaluation of whether the framework is more than a re-description of existing quantities.

    Authors: We agree that the manuscript does not contain explicit derivations, mappings of the Fisher metric to specific network parameters, or worked examples. The paper is framed as a high-level proposal that imports the geometrization of physics and the least-action principle to define network complexity and suggest links to generalization and disentanglement. These links are presented as implications of the proposed geometry rather than as rigorously derived results. We will revise the abstract and the description of the key components to state explicitly that the framework outlines a programme for future investigation rather than completed derivations. This change will make the scope of the claims clearer without altering the conceptual contribution. revision: partial

Circularity Check

0 steps flagged

No significant circularity identified

The paper introduces a conceptual mapping from physics (Fisher metric on parameter space, least-action principle) onto deep networks to reinterpret complexity, generalization, and disentanglement. The relation 'complexity=action' is presented as an application of an external physical principle rather than an internal definition that forces later results. No equations, derivations, or self-citations in the provided text reduce any claimed prediction or insight to a fitted input or prior self-result by construction. The framework operates at the level of formal analogy and re-interpretation; its central claims remain independent of the inputs they are applied to.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The framework rests on transferring two physics concepts to neural networks without independent empirical or formal support shown in the abstract.

axioms (2)
  • domain assumption: Fisher metric can be used to formulate the complexity of a deep network
    Explicitly listed as key component (1) in the abstract
  • domain assumption: The least action principle applies to deep networks with the identification complexity=action
    Explicitly listed as key component (2) in the abstract
invented entities (1)
  • Deep network as memory space (no independent evidence)
    purpose: To relate memory capacity, robustness and efficiency to network complexity, generalization and disentanglement
    Central new picture announced in the abstract

Reference graph

Works this paper leans on

37 extracted references · 37 canonical work pages · 13 internal anchors

  [1] X. Dong and L. Zhou. Geometrization of deep networks for the interpretability of deep learning systems. arXiv:1901.02354, 2019.
  [2] X. Dong and L. Zhou. Understanding over-parameterized deep networks by geometrization. arXiv:1902.03793, 2019.
  [3] S. Lloyd. A theory of quantum gravity based on quantum computation. Class. Quant. Grav., 2006.
  [4] Brian Swingle. Constructing holographic spacetimes using entanglement renormalization. Physics, 2012.
  [5] M. van Raamsdonk. Building up spacetime with quantum entanglement. General Relativity and Gravitation, 42(10):2323–2329, 2010.
  [6] Hiroaki Matsueda. Derivation of gravitational field equation from entanglement entropy. arXiv:1408.5589v2, 70, 2014.
  [7] Wen Xiao-Gang. Quantum order from string-net condensations and the origin of light and massless fermions. Physical Review D, 68(6):484–504, 2003.
  [8] Seth Lloyd. A theory of quantum gravity based on quantum computation. Class. Quant. Grav., 2012.
  [9] G. Evenbly and G. Vidal. Tensor network states and geometry. Journal of Statistical Physics, 145(4):891–918, 2011.
  [10] Glen Evenbly. Algorithms for tensor network renormalization. Phys. Rev. B, 95(4), 2017.
  [11] M. R. Dowling and M. A. Nielsen. The geometry of quantum computation. Quantum Information and Computation, 8(10):861–899, 2008.
  [12] Leonard Susskind. Dear Qubitzers, GR=QM. arXiv:1708.03040v1, 2017.
  [13] X. Dong, J.S. Wu, and L. Zhou. How deep learning works – the geometry of deep learning. arXiv:1710.10784, 2017.
  [14] Martins Bruveris and Darryl D. Holm. Geometry of image registration: The diffeomorphism group and momentum maps. Fields Institute Communications, 73:19–56, 2013.
  [15] E Weinan, Jiequn Han, and Qianxiao Li. A mean-field optimal control formulation of deep learning. arXiv:1807.01083v1, 2018.
  [16] Román Orús. A practical introduction to tensor networks: Matrix product states and projected entangled pair states. Annals of Physics, 349(10):117–158, 2014.
  [17] L. Susskind. Entanglement is not enough. arXiv:1411.0690v1, 2014.
  [18] Hiroaki Matsueda. Emergent general relativity from Fisher information metric. arXiv:1310.1831v2, 2013.
  [19] Martins Bruveris and Peter W. Michor. Geometry of the Fisher-Rao metric on the space of smooth densities on a compact manifold. 2016.
  [20] Pang Wei Koh and Percy Liang. Understanding black-box predictions via influence functions. In Doina Precup and Yee Whye Teh, editors, Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 1885–1894, International Convention Centre, Sydney, Australia, 06–11 Aug
  [21] Mengye Ren, Wenyuan Zeng, Bin Yang, and Raquel Urtasun. Learning to reweight examples for robust deep learning. In Jennifer Dy and Andreas Krause, editors, Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 4334–4343, Stockholmsmässan, Stockholm Sweden, 10–15 Jul 2018. PMLR.
  [22] Tengyuan Liang, Tomaso Poggio, Alexander Rakhlin, and James Stokes. Fisher-Rao metric, geometry, and complexity of neural networks. arXiv:1711.01530, 2017.
  [23] Wu Lei, Zhanxing Zhu, and E Weinan. Towards understanding generalization of deep learning: Perspective of loss landscapes. arXiv:1706.10239v2, 2017.
  [24] Ryo Karakida, Shotaro Akaho, and Shun Ichi Amari. Universal statistics of Fisher information in deep neural networks: Mean field approach. 2018.
  [25] Leonard Susskind. The typical state paradox: diagnosing horizons with complexity. Fortschritte der Physik, 64(1):84–91, 2016.
  [26] L. Susskind and Y. Zhao. Switchbacks and the bridge to nowhere. arXiv:1408.2823v1, 2014.
  [27] C. Fernández-González, N. Schuch, M. M. Wolf, J. I. Cirac, and D. Pérez-García. Frustration free gapless Hamiltonians for matrix product states. Communications in Mathematical Physics, 333(1):299–333, 2015.
  [28] H. Heydari. Geometric formulation of quantum mechanics. arXiv:1503.00238, 2015.
  [29] Francesco Locatello, Stefan Bauer, Mario Lucic, Gunnar Rätsch, Sylvain Gelly, Bernhard Schölkopf, and Olivier Bachem. Challenging common assumptions in the unsupervised learning of disentangled representations. 2018.
  [30] X. Dong and L. Zhou. Gauge theory and twins paradox of disentangled representations. arXiv:1906.10545, 2019.
  [31] Tongzhou Wang, Jun-Yan Zhu, Antonio Torralba, and Alexei A. Efros. Dataset distillation. 2018.
  [32] David Bau, Jun-Yan Zhu, Hendrik Strobelt, Bolei Zhou, Joshua B. Tenenbaum, William T. Freeman, and Antonio Torralba. GAN dissection: Visualizing and understanding generative adversarial networks. 2018.
  [33] A. Gaier and D. Ha. Weight agnostic neural networks. arXiv:1906.04358, 2019.
  [34] F. Pastawski, B. Yoshida, D. Harlow, and J. Preskill. Holographic quantum error-correcting codes: toy models for the bulk/boundary correspondence. Journal of High Energy Physics, 2015(6):1–55, 2015.
  [35] Jonathan Frankle and Michael Carbin. The lottery ticket hypothesis: Finding sparse, trainable neural networks. arXiv:1803.03635, 2018.
  [36] H. Zhou, J. Lan, R. Liu, and J. Yosinski. Deconstructing lottery tickets: Zeros, signs, and the supermask. arXiv:1905.01067, 2019.
  [37] Y. He, P. Liu, Z.W. Wang, Z.L. Hu, and Y. Yang. Filter pruning via geometric median for deep convolutional neural networks acceleration. arXiv:1811.00250, 2018.