Deep network as memory space: complexity, generalization, disentangled representation and interpretability

L. Zhou; X. Dong

arxiv: 1907.06572 · v1 · pith:N4MOBDTLnew · submitted 2019-07-12 · 💻 cs.LG · cs.AI

Pith reviewed 2026-05-24 22:22 UTC · model grok-4.3

keywords deep networks · Fisher metric · least action principle · interpretability · generalization · disentanglement · memory space · geometrization

The pith

Deep networks function as memory spaces whose capacity and efficiency follow from a Fisher-metric complexity that obeys the least-action principle.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes to view deep networks as memory spaces of information by importing the geometrization of physics and the least-action principle. It formulates network complexity via a Fisher metric on the space of network configurations and sets this complexity equal to the action. Under this picture the capacity, robustness, and efficiency of the memory become directly tied to the network's complexity, generalization behavior, and the degree of disentanglement in its representations, yielding a geometric account of interpretability.

Core claim

Deep networks are memory spaces of information in which the capacity, robustness, and efficiency of the stored information are governed by the complexity, generalization, and disentanglement that arise once a Fisher metric is placed on the manifold of network configurations and the least-action principle is imposed so that complexity equals action.

What carries the argument

Fisher-metric formulation of network complexity together with the least-action principle (complexity equals action) applied to the geometry of deep-network configurations.
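
For orientation, the two ingredients named here can be written in their standard textbook form. This is a sketch under assumptions: the first line is the usual Fisher information metric on a parametric family, the second is the ordinary Riemannian length of a curve, and the third is only one possible reading of "complexity = action". The symbols $\theta$, $\gamma$ and $\mathcal{C}$ are introduced for illustration; this page does not give the paper's own formulas.

```latex
\begin{align}
  % Fisher information metric on a parametric family p(x | \theta)
  g_{ij}(\theta) &= \mathbb{E}_{x \sim p(\cdot \mid \theta)}
    \left[ \partial_i \log p(x \mid \theta)\, \partial_j \log p(x \mid \theta) \right], \\
  % Length of a curve \gamma from the identity function I to the target function T
  L[\gamma] &= \int_0^1 \sqrt{ g_{ij}(\gamma(t))\, \dot{\gamma}^i(t)\, \dot{\gamma}^j(t) }\; dt,
    \qquad \gamma(0) = I,\ \gamma(1) = T, \\
  % Assumed reading of "complexity = action": the shortest admissible curve
  \mathcal{C}(T) &= \min_{\gamma} L[\gamma].
\end{align}
```

Whether the paper uses the length functional, the energy functional (the same integrand without the square root), or something else is not stated in the material reproduced here.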

If this is right

The geometric volume of admissible configurations sets the effective capacity of the memory.
Generalization improves when the network configuration minimizes the action for a given task.
Disentangled representations correspond to orthogonal directions in the Fisher geometry of the memory space.
Interpretability reduces to reading off the geometric invariants of the learned configurations.
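
To make "orthogonal directions in the Fisher geometry" and "geometric volume" concrete, the following minimal sketch uses a univariate Gaussian as a stand-in for a network; that substitution is an assumption made purely for illustration. The Fisher matrix is computed from the score functions with Gauss-Hermite quadrature, and the helper names (`gaussian_fisher`, `fisher_inner`, `volume_element`) are hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def gaussian_fisher(sigma, n_nodes=20):
    """Fisher matrix of N(mu, sigma^2) in coordinates (mu, sigma).

    E[score score^T] is evaluated with Gauss-Hermite quadrature.
    The result does not depend on mu, so mu is not an argument.
    """
    z, w = hermegauss(n_nodes)            # nodes/weights for weight exp(-z^2 / 2)
    w = w / np.sqrt(2.0 * np.pi)          # normalise to standard-normal expectations
    score_mu = z / sigma                  # d/dmu log p, with z = (x - mu) / sigma
    score_sigma = (z**2 - 1.0) / sigma    # d/dsigma log p
    scores = np.stack([score_mu, score_sigma])   # shape (2, n_nodes)
    return (scores * w) @ scores.T               # sum_k w_k s(z_k) s(z_k)^T


def fisher_inner(u, v, fisher):
    """Inner product of two parameter-space directions under the Fisher metric."""
    return float(np.asarray(u, dtype=float) @ fisher @ np.asarray(v, dtype=float))


def volume_element(fisher):
    """Riemannian volume density sqrt(det g) at one point."""
    return float(np.sqrt(np.linalg.det(fisher)))


F = gaussian_fisher(sigma=2.0)
mu_dir, sigma_dir = [1.0, 0.0], [0.0, 1.0]
print("Fisher matrix:\n", F)
print("inner product of mu and sigma directions:", fisher_inner(mu_dir, sigma_dir, F))
print("volume element:", volume_element(F))
```

In this toy family the mean and scale directions are Fisher-orthogonal, which is the kind of separation the bullet on disentanglement gestures at. Whether anything similar holds for trained deep networks is the open question.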

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Regularization schemes could be derived by penalizing the action rather than conventional weight norms.
The same geometric memory picture might be tested on recurrent or transformer architectures to check whether their generalization scales follow the same action principle.
Empirical measurement of the Fisher information matrix during training could serve as an early indicator of whether a network will generalize.
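
The last suggestion can be sketched on a model small enough to have a closed-form Fisher matrix. The example below is an assumption-laden illustration, not the paper's procedure: it uses logistic regression instead of a deep network, synthetic data, plain gradient descent, and the trace of the Fisher matrix as an arbitrary scalar summary. All names are hypothetical.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def logistic_fisher(weights, inputs):
    """Fisher matrix of p(y | x, w) = Bernoulli(sigmoid(w . x)), averaged over inputs.

    For this model the per-example Fisher matrix is p (1 - p) x x^T.
    """
    p = sigmoid(inputs @ weights)                 # shape (n_examples,)
    var = p * (1.0 - p)
    return (inputs * var[:, None]).T @ inputs / inputs.shape[0]


# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X @ np.array([1.0, -2.0, 0.5]) > 0).astype(float)

w = np.zeros(3)
trace_history = []
for _ in range(20):
    # Record a scalar summary of the Fisher matrix before each update.
    trace_history.append(float(np.trace(logistic_fisher(w, X))))
    grad = X.T @ (sigmoid(X @ w) - y) / len(y)    # gradient of the mean log-loss
    w -= 0.5 * grad

print(trace_history[:3], "...", trace_history[-1])
```

For a real network the Fisher matrix is too large to form explicitly, so a practical version would need a diagonal or low-rank approximation. That choice is outside what this page discusses.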

Load-bearing premise

That the Fisher metric and least-action principle can be applied directly to deep-network parameter spaces and will produce valid statements about generalization and interpretability.

What would settle it

A controlled experiment in which networks with measurably lower Fisher-metric complexity fail to show improved generalization or more disentangled internal representations would falsify the claimed link.
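
One way such an experiment might be scored is with a rank correlation between a per-network complexity measure and the measured generalization gap. The sketch below is only an analysis helper under assumptions: `spearman_rho` is a hypothetical name, ties are not handled, and the numbers are made-up placeholders that show the call signature, not measurements.

```python
import numpy as np


def spearman_rho(a, b):
    """Spearman rank correlation for 1-D sequences without ties."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    rank_a = np.argsort(np.argsort(a))   # rank of each entry, 0 = smallest
    rank_b = np.argsort(np.argsort(b))
    return float(np.corrcoef(rank_a, rank_b)[0, 1])


# Placeholder values only (NOT experimental results): one entry per trained network.
complexity = [0.8, 1.1, 1.9, 2.4]
generalization_gap = [0.02, 0.05, 0.04, 0.09]

rho = spearman_rho(complexity, generalization_gap)
print("rank correlation:", rho)
```

A correlation near zero or negative across a well-controlled family of networks would count against the claimed link. A positive one would be consistent with it but would not by itself establish the geometric account.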

Figures

Figures reproduced from arXiv: 1907.06572 by L. Zhou, X. Dong.

**Figure 1.** Geometry of deep networks. GCom is a Riemannian manifold of the functions that can be represented by deep network, where the network structure G defines a submanifold and different network configurations with the same structure G correspond to different curves on GCom connecting the trivial identity function I and the target function T. Different deep networks also have their correspondent emergent geometr…

**Figure 2.** The fibre bundle structure of disentangled representations [30].
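
The Figure 1 caption describes network configurations as curves joining the identity function I to the target function T. As a minimal sketch of how the length of such a curve could be measured once a metric is chosen, the code below sums midpoint-rule segment lengths along a polyline. The Gaussian Fisher metric is an assumed stand-in for the paper's geometry, chosen only because its path lengths are known in closed form, and the function names are hypothetical.

```python
import numpy as np


def gaussian_fisher_closed_form(theta):
    """Closed-form Fisher matrix of N(mu, sigma^2) at theta = (mu, sigma)."""
    _, sigma = theta
    return np.diag([1.0 / sigma**2, 2.0 / sigma**2])


def curve_length(points, metric):
    """Approximate Riemannian length of a polyline through parameter space.

    Each segment contributes sqrt(d^T G d), with G evaluated at the segment midpoint.
    """
    points = np.asarray(points, dtype=float)
    total = 0.0
    for start, end in zip(points[:-1], points[1:]):
        delta = end - start
        g = metric(0.5 * (start + end))
        total += np.sqrt(delta @ g @ delta)
    return float(total)


# Path in mu at fixed sigma = 2: the exact length is (1 - 0) / sigma.
mu_path = np.stack([np.linspace(0.0, 1.0, 11), np.full(11, 2.0)], axis=1)
# Path in sigma at fixed mu: the exact length is sqrt(2) * log(sigma_end / sigma_start).
sigma_path = np.stack([np.zeros(101), np.linspace(1.0, 2.0, 101)], axis=1)

print(curve_length(mu_path, gaussian_fisher_closed_form))
print(curve_length(sigma_path, gaussian_fisher_closed_form))
```

Comparing lengths of different curves between the same endpoints is the discrete analogue of asking which configuration has the smaller action, under the assumption that action is identified with curve length.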

Original abstract

By bridging deep networks and physics, the programme of geometrization of deep networks was proposed as a framework for the interpretability of deep learning systems. Following this programme we can apply two key ideas of physics, the geometrization of physics and the least action principle, on deep networks and deliver a new picture of deep networks: deep networks as memory space of information, where the capacity, robustness and efficiency of the memory are closely related with the complexity, generalization and disentanglement of deep networks. The key components of this understanding include: (1) a Fisher metric based formulation of the network complexity; (2) the least action (complexity=action) principle on deep networks and (3) the geometry built on deep network configurations. We will show how this picture will bring us a new understanding of the interpretability of deep learning systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper sketches a physics-flavored geometric framing of deep nets as memory spaces but stays at formal analogy without new derivations or tests.

The core idea is to recast deep networks through the Fisher metric on parameter space and the least-action principle, so that complexity equals action and this in turn explains generalization and disentanglement. The authors position the work as an extension of an existing geometrization programme rather than a fresh start. That framing is clear and the mapping they propose is internally consistent on its own terms. The paper does a reasonable job of spelling out the three components (Fisher complexity, least-action dynamics, and configuration geometry) and showing how they could relate capacity, robustness, and efficiency to the usual ML quantities. Credit is due for keeping the language accessible and for trying to import two standard physics tools without overclaiming immediate algorithmic payoffs.

The main limitation is that the argument remains at the level of analogy. The relation complexity=action is introduced inside the framework itself, with no independent derivation or external benchmark supplied in the text. No proofs appear that the Fisher metric actually produces the claimed generalization bounds, and the abstract and stress-test note indicate the derivations do not move beyond formal correspondence. Empirical checks are also absent, so it is hard to judge whether the picture yields falsifiable predictions or just re-describes known behavior. Minor issues include reliance on imported concepts without fresh quantitative results and the risk that the memory-space metaphor stays suggestive rather than operational.

The work is aimed at readers already interested in geometric or physics-inspired accounts of deep learning; it will not move the needle for people focused on concrete bounds or experiments. It is coherent enough and engages the literature honestly, so a serious editor could reasonably send it for review to see whether the authors can add the missing derivations or small-scale checks.

I would not cite it in its current form, but it is worth a look if the full manuscript contains more substance than the abstract alone.

Referee Report

1 major / 1 minor

Summary. The manuscript proposes a conceptual framework that interprets deep neural networks as an information 'memory space' by importing two ideas from physics: geometrization via the Fisher information metric (to formulate network complexity) and the least-action principle (equating complexity to action). This geometry on network configurations is claimed to relate memory capacity, robustness and efficiency to complexity, generalization and disentanglement, thereby furnishing a new route to interpretability of deep learning systems. The three key components listed are (1) the Fisher-metric complexity, (2) the complexity=action principle, and (3) the induced geometry on configurations.

Significance. If the proposed mappings can be made rigorous and predictive, the framework would supply a physics-inspired language for discussing generalization and interpretability that is currently absent from the literature. The explicit invocation of Fisher geometry and least-action principles is a distinctive strength; however, the manuscript remains at the level of formal analogy without derivations, explicit parameter-space calculations, or falsifiable predictions that would allow the claimed relations to be tested.

major comments (1)

[Abstract / key components (1) and (2)] The central claim that a Fisher-metric formulation together with the least-action principle (complexity=action) yields relations among capacity, generalization and disentanglement is asserted without any explicit derivation, mapping of the metric to network parameters, or worked example. Because these relations are load-bearing for the entire interpretability programme, their absence prevents evaluation of whether the framework is more than a re-description of existing quantities.

minor comments (1)

Notation for the Fisher metric and the action functional is introduced only at the level of the abstract; explicit definitions and any regularity conditions on the parameter manifold should be supplied in the main text.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive review. The manuscript presents a conceptual framework that applies ideas from physics to deep networks, and we respond to the major comment below while clarifying the intended scope of the work.

Referee: [Abstract / key components (1) and (2)] The central claim that a Fisher-metric formulation together with the least-action principle (complexity=action) yields relations among capacity, generalization and disentanglement is asserted without any explicit derivation, mapping of the metric to network parameters, or worked example. Because these relations are load-bearing for the entire interpretability programme, their absence prevents evaluation of whether the framework is more than a re-description of existing quantities.

Authors: We agree that the manuscript does not contain explicit derivations, mappings of the Fisher metric to specific network parameters, or worked examples. The paper is framed as a high-level proposal that imports the geometrization of physics and the least-action principle to define network complexity and suggest links to generalization and disentanglement. These links are presented as implications of the proposed geometry rather than as rigorously derived results. We will revise the abstract and the description of the key components to state explicitly that the framework outlines a programme for future investigation rather than completed derivations. This change will make the scope of the claims clearer without altering the conceptual contribution.

Revision: partial

Circularity Check

0 steps flagged

No significant circularity identified

The paper introduces a conceptual mapping from physics (Fisher metric on parameter space, least-action principle) onto deep networks to reinterpret complexity, generalization, and disentanglement. The relation 'complexity=action' is presented as an application of an external physical principle rather than an internal definition that forces later results. No equations, derivations, or self-citations in the provided text reduce any claimed prediction or insight to a fitted input or prior self-result by construction. The framework operates at the level of formal analogy and re-interpretation; its central claims remain independent of the inputs they are applied to.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The framework rests on transferring two physics concepts to neural networks without independent empirical or formal support shown in the abstract.

axioms (2)

Domain assumption: the Fisher metric can be used to formulate the complexity of a deep network. Explicitly listed as key component (1) in the abstract.
Domain assumption: the least action principle applies to deep networks with the identification complexity=action. Explicitly listed as key component (2) in the abstract.

invented entities (1)

Deep network as memory space (no independent evidence)
Purpose: to relate memory capacity, robustness and efficiency to network complexity, generalization and disentanglement.
Central new picture announced in the abstract.

pith-pipeline@v0.9.0 · 5669 in / 1425 out tokens · 25623 ms · 2026-05-24T22:22:50.790185+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Each tag gives the relation between the paper passage and the cited Recognition theorem.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: Fisher metric based formulation of the network complexity; the least action (complexity=action) principle on deep networks and the geometry built on deep network configurations

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Paper passage: geometry/information duality... spacetime emerges from the information of a physical system... memory space

IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear
Paper passage: dimension of the emergent geometry is determined by the dimension of the quantum operation... 4-dimensional spacetime

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

37 extracted references · 37 canonical work pages · 13 internal anchors

[1] X. Dong and L. Zhou. Geometrization of deep networks for the interpretability of deep learning systems. arXiv:1901.02354, 2019

[2] X. Dong and L. Zhou. Understanding over-parameterized deep networks by geometrization. arXiv:1902.03793, 2019

[3] S. Lloyd. A theory of quantum gravity based on quantum computation. Class. Quant. Grav., 2006

[4] Brian Swingle. Constructing holographic spacetimes using entanglement renormalization. Physics, 2012

[5] M. van Raamsdonk. Building up spacetime with quantum entanglement. General Relativity and Gravitation, 42(10):2323–2329, 2010

[6] Hiroaki Matsueda. Derivation of gravitational field equation from entanglement entropy. arXiv:1408.5589v2, 70, 2014

[7] Xiao-Gang Wen. Quantum order from string-net condensations and the origin of light and massless fermions. Physical Review D, 68(6):484–504, 2003

[8] Seth Lloyd. A theory of quantum gravity based on quantum computation. Class. Quant. Grav., 2012

[9] G. Evenbly and G. Vidal. Tensor network states and geometry. Journal of Statistical Physics, 145(4):891–918, 2011

[10] Glen Evenbly. Algorithms for tensor network renormalization. Phys. Rev. B, 95(4), 2017

[11] M. R. Dowling and M. A. Nielsen. The geometry of quantum computation. Quantum Information and Computation, 8(10):861–899, 2008

[12] Leonard Susskind. Dear qubitzers, GR=QM. arXiv:1708.03040v1, 2017

[13] X. Dong, J.S. Wu, and L. Zhou. How deep learning works – the geometry of deep learning. arXiv:1710.10784, 2017

[14] Martins Bruveris and Darryl D. Holm. Geometry of image registration: The diffeomorphism group and momentum maps. Fields Institute Communications, 73:19–56, 2013

[15] Weinan E, Jiequn Han, and Qianxiao Li. A mean-field optimal control formulation of deep learning. arXiv:1807.01083v1, 2018

[16] Román Orús. A practical introduction to tensor networks: Matrix product states and projected entangled pair states. Annals of Physics, 349(10):117–158, 2014

[17] L. Susskind. Entanglement is not enough. arXiv:1411.0690v1, 2014

[18] Hiroaki Matsueda. Emergent general relativity from Fisher information metric. arXiv:1310.1831v2, 2013

[19] Martins Bruveris and Peter W. Michor. Geometry of the Fisher-Rao metric on the space of smooth densities on a compact manifold. 2016

[20] Pang Wei Koh and Percy Liang. Understanding black-box predictions via influence functions. In Doina Precup and Yee Whye Teh, editors, Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 1885–1894, International Convention Centre, Sydney, Australia, 06–11 Aug

[21] Mengye Ren, Wenyuan Zeng, Bin Yang, and Raquel Urtasun. Learning to reweight examples for robust deep learning. In Jennifer Dy and Andreas Krause, editors, Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 4334–4343, Stockholmsmässan, Stockholm Sweden, 10–15 Jul 2018. PMLR

[22] Tengyuan Liang, Tomaso Poggio, Alexander Rakhlin, and James Stokes. Fisher-Rao metric, geometry, and complexity of neural networks. arXiv:1711.01530, 2017

[23] Lei Wu, Zhanxing Zhu, and Weinan E. Towards understanding generalization of deep learning: Perspective of loss landscapes. arXiv:1706.10239v2, 2017

[24] Ryo Karakida, Shotaro Akaho, and Shun-ichi Amari. Universal statistics of Fisher information in deep neural networks: Mean field approach. 2018

[25] Leonard Susskind. The typical state paradox: diagnosing horizons with complexity. Fortschritte der Physik, 64(1):84–91, 2016

[26] L. Susskind and Y. Zhao. Switchbacks and the bridge to nowhere. arXiv:1408.2823v1, 2014

[27] C. Fernández-González, N. Schuch, M. M. Wolf, J. I. Cirac, and D. Pérez-García. Frustration free gapless Hamiltonians for matrix product states. Communications in Mathematical Physics, 333(1):299–333, 2015

[28] H. Heydari. Geometric formulation of quantum mechanics. arXiv:1503.00238, 2015

[29] Francesco Locatello, Stefan Bauer, Mario Lucic, Gunnar Rätsch, Sylvain Gelly, Bernhard Schölkopf, and Olivier Bachem. Challenging common assumptions in the unsupervised learning of disentangled representations. 2018

[30] X. Dong and L. Zhou. Gauge theory and twins paradox of disentangled representations. arXiv:1906.10545, 2019

[31] Tongzhou Wang, Jun-Yan Zhu, Antonio Torralba, and Alexei A. Efros. Dataset distillation. 2018

[32] David Bau, Jun-Yan Zhu, Hendrik Strobelt, Bolei Zhou, Joshua B. Tenenbaum, William T. Freeman, and Antonio Torralba. GAN dissection: Visualizing and understanding generative adversarial networks. 2018

[33] A. Gaier and D. Ha. Weight agnostic neural networks. arXiv:1906.04358, 2019

[34] F. Pastawski, B. Yoshida, D. Harlow, and J. Preskill. Holographic quantum error-correcting codes: toy models for the bulk/boundary correspondence. Journal of High Energy Physics, 2015(6):1–55, 2015

[35] Jonathan Frankle and Michael Carbin. The lottery ticket hypothesis: Finding sparse, trainable neural networks. arXiv:1803.03635, 2018

[36] H. Zhou, J. Lan, R. Liu, and J. Yosinski. Deconstructing lottery tickets: Zeros, signs, and the supermask. arXiv:1905.01067, 2019

[37] Y. He, P. Liu, Z.W. Wang, Z.L. Hu, and Y. Yang. Filter pruning via geometric median for deep convolutional neural networks acceleration. arXiv:1811.00250, 2018
