Architecture Induces Structural Invariant Manifolds of Neural Network Training Dynamics

Jiajie Zhao; Tao Luo; Yaoyu Zhang

arxiv: 2510.09564 · v2 · submitted 2025-10-10 · 🧮 math.DS



Pith reviewed 2026-05-18 07:29 UTC · model grok-4.3

classification 🧮 math.DS

keywords: neural network training dynamics · structural invariant manifolds · symmetry-induced invariants · gradient flow · neuron condensation · geometric control theory · permutation symmetry · two-layer networks


The pith

Neural network architecture creates symmetry-induced manifolds that trap gradient flow trajectories independent of data or loss.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that symmetries built into a neural network's parameterization, such as the ability to swap identical neurons, force the training dynamics to stay on lower-dimensional submanifolds of parameter space. These structural invariant manifolds are defined purely by the model's analytic form and the family of gradient vector fields it generates for different inputs; they do not depend on the training data or the choice of loss. For fully connected networks the manifolds produce a hierarchy in which neurons condense and the effective dynamics match those of a narrower network. In the two-layer case the authors prove that every possible invariant manifold arises from these symmetries, closing the gap between known architectural features and all dynamical invariants.

Core claim

For an analytic model F(θ)(x) the structural invariant manifolds are exactly the unions of orbits of the vector-field family {∇_θ F(·)(x) | x ∈ ℝ^d}. Model symmetries, for example permutation symmetry of neurons, induce such manifolds. In fully connected networks this produces a hierarchy of symmetry-induced SIMs that explain neuron condensation and dynamical equivalence to reduced-width networks. For two-layer networks every SIM is symmetry-induced.
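The chain rule shows why the input-indexed family, and not the data or the loss, decides which sets trap the flow. As a sketch, assume (our notation, not necessarily the paper's) an empirical risk R over samples (x_i, y_i) with a loss ℓ that is differentiable in its first argument:

```latex
$$
\dot{\mathbf{\theta}}(t) = -\nabla_{\mathbf{\theta}} R(\mathbf{\theta}), \qquad
R(\mathbf{\theta}) = \frac{1}{n}\sum_{i=1}^{n} \ell\big(F(\mathbf{\theta})(\mathbf{x}_i),\, y_i\big)
$$
$$
\dot{\mathbf{\theta}} = \sum_{i=1}^{n} c_i(\mathbf{\theta})\, \nabla_{\mathbf{\theta}} F(\mathbf{\theta})(\mathbf{x}_i), \qquad
c_i(\mathbf{\theta}) = -\frac{1}{n}\, \partial_1 \ell\big(F(\mathbf{\theta})(\mathbf{x}_i),\, y_i\big)
$$
```

Data and loss enter only through the scalar weights c_i(θ); every direction comes from the family {∇_θ F(·)(x)}. So if each ∇_θ F(θ)(x) is tangent to a closed embedded submanifold M at every θ ∈ M, the velocity is tangent to M, and uniqueness of ODE solutions keeps any trajectory that starts on M on M. This is only the sufficiency direction; the stated characterization (SIMs are exactly unions of orbits) is stronger.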

What carries the argument

Structural Invariant Manifolds (SIMs): submanifolds of parameter space that confine gradient-flow trajectories for any data and any loss, constructed as unions of orbits of the input-parameterized gradient vector fields and induced by the model's built-in symmetries.
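The "any data, any loss" property can be checked numerically on the simplest SIM, the fixed set of a neuron swap. The sketch below assumes a hypothetical bias-free two-layer tanh network F(θ)(x) = Σ_k a_k tanh(w_k · x), which is not necessarily the paper's exact setup. It duplicates neuron 0 into neuron 1 and runs forward-Euler gradient flow under two unrelated losses on random data; all function names are illustrative.

```python
import math
import random

# Hypothetical bias-free two-layer network F(theta)(x) = sum_k a_k * tanh(w_k . x).
# theta is a list of neurons, each a tuple (a_k, w_k) with w_k a list of length d.

def forward(theta, x):
    return sum(a * math.tanh(sum(wi * xi for wi, xi in zip(w, x))) for a, w in theta)

def grad_F(theta, x):
    """Per-neuron gradient of F(theta)(x): (dF/da_k, dF/dw_k)."""
    grads = []
    for a, w in theta:
        t = math.tanh(sum(wi * xi for wi, xi in zip(w, x)))
        grads.append((t, [a * (1.0 - t * t) * xi for xi in x]))
    return grads

def squared_loss_grad(pred, y):
    # derivative in pred of 0.5 * (pred - y) ** 2
    return pred - y

def logistic_loss_grad(pred, y):
    # derivative in pred of log(1 + exp(-y * pred)), labels y in {-1, +1}
    return -y / (1.0 + math.exp(y * pred))

def gradient_flow(theta, data, loss_grad, lr=0.05, steps=500):
    """Forward-Euler discretisation of d(theta)/dt = -grad of the empirical risk."""
    theta = [(a, list(w)) for a, w in theta]
    n = len(data)
    for _ in range(steps):
        update = [(0.0, [0.0] * len(w)) for _, w in theta]
        for x, y in data:
            c = loss_grad(forward(theta, x), y) / n  # data and loss enter only here
            for k, (ga, gw) in enumerate(grad_F(theta, x)):
                ua, uw = update[k]
                update[k] = (ua + c * ga, [u + c * g for u, g in zip(uw, gw)])
        theta = [(a - lr * ua, [wi - lr * ui for wi, ui in zip(w, uw)])
                 for (a, w), (ua, uw) in zip(theta, update)]
    return theta

def gap(theta, i, j):
    """Largest coordinate difference between neurons i and j (0.0 on the swap-fixed set)."""
    (ai, wi), (aj, wj) = theta[i], theta[j]
    return max([abs(ai - aj)] + [abs(p - q) for p, q in zip(wi, wj)])

random.seed(0)
d = 2
theta0 = [(0.3, [0.5, -0.2]), (0.3, [0.5, -0.2]), (-0.7, [0.1, 0.9])]  # neuron 1 duplicates neuron 0
data_sq = [([random.gauss(0, 1) for _ in range(d)], random.gauss(0, 1)) for _ in range(20)]
data_cls = [([random.gauss(0, 1) for _ in range(d)], random.choice([-1, 1])) for _ in range(20)]

for data, lg in [(data_sq, squared_loss_grad), (data_cls, logistic_loss_grad)]:
    theta_T = gradient_flow(theta0, data, lg)
    print(gap(theta_T, 0, 1), gap(theta_T, 0, 2))
```

Because the duplicated neurons receive bitwise-identical updates, their gap stays exactly 0.0 under both losses while the third neuron moves independently. The Euler step is a stand-in for exact gradient flow; it preserves this particular manifold only because the manifold is a linear subspace.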

If this is right

Training paths that start on a symmetry-determined lower-dimensional set stay on it, so along such paths the effective degrees of freedom are strictly fewer than the nominal parameter count.
Neuron condensation occurs automatically, making wide networks behave like narrower ones along the invariant manifolds.
For two-layer networks the list of all possible invariants is exhausted by the known architectural symmetries.
Architecture alone, without reference to data, already determines families of invariant subspaces for the dynamics.
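Condensation can be checked at the level of the network function: neurons that share input weights collapse into one neuron whose output weight is the sum. The sketch assumes a hypothetical bias-free two-layer tanh network; merge_condensed and the cluster values are illustrative. Merging needs only the input weights to coincide, a weaker condition than the full neuron identity (equal a_k and w_k) that defines the permutation SIM.

```python
import math
import random

def forward(theta, x):
    """Hypothetical bias-free two-layer net F(theta)(x) = sum_k a_k * tanh(w_k . x)."""
    return sum(a * math.tanh(sum(wi * xi for wi, xi in zip(w, x))) for a, w in theta)

def merge_condensed(theta, tol=1e-12):
    """Collapse neurons whose input weights coincide (within tol) into a single
    neuron carrying the summed output weight; returns the narrower network."""
    merged = []
    for a, w in theta:
        for idx, (a_m, w_m) in enumerate(merged):
            if max(abs(p - q) for p, q in zip(w, w_m)) <= tol:
                merged[idx] = (a_m + a, w_m)
                break
        else:
            merged.append((a, list(w)))
    return merged

random.seed(1)
d = 3
w_A = [random.gauss(0, 1) for _ in range(d)]
w_B = [random.gauss(0, 1) for _ in range(d)]
# Six neurons condensed onto two clusters of three identical neurons each.
wide = [(0.2, list(w_A)) for _ in range(3)] + [(-0.5, list(w_B)) for _ in range(3)]
narrow = merge_condensed(wide)

xs = [[random.gauss(0, 1) for _ in range(d)] for _ in range(100)]
max_err = max(abs(forward(wide, x) - forward(narrow, x)) for x in xs)
print(len(wide) * (d + 1), len(narrow) * (d + 1), max_err)
```

Here 6 × (3 + 1) = 24 nominal parameters reduce to 2 × (3 + 1) = 8, the dimension of the subspace on which the two clusters stay identical, and the outputs agree up to rounding. The sketch checks outputs only; it does not model how the merged network's training dynamics relate to the wide one's.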

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Design choices that enlarge or shrink the symmetry group could be used to steer training trajectories toward or away from particular regions of parameter space.
The same orbit-union construction may apply to other parameterised families of vector fields, offering a route to classify invariants in non-neural dynamical systems.
If the manifolds persist under stochastic gradient descent, they could provide a geometric explanation for why certain initialisations or regularisers improve generalisation.

Load-bearing premise

The model must be analytic and training must follow exact gradient flow on parameter space so that geometric control theory applies directly to the generated vector fields.
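Analyticity matters because of the orbit theorem the paper cites (Hermann–Nagano). A standard statement of Nagano's version, in our own notation, uses e^{tX} for the flow of X for time t (taken only where the flow is defined) and Lie(𝓕) for the Lie algebra of vector fields generated by 𝓕:

```latex
$$
\mathcal{F} = \{\, \nabla_{\mathbf{\theta}} F(\cdot)(\mathbf{x}) \mid \mathbf{x}\in\mathbb{R}^d \,\}, \qquad
\mathcal{O}_{\mathbf{\theta}_0} = \{\, e^{t_k X_k}\circ\cdots\circ e^{t_1 X_1}(\mathbf{\theta}_0) \mid k\in\mathbb{N},\ X_i\in\mathcal{F},\ t_i\in\mathbb{R} \,\}
$$
$$
\text{If every } X\in\mathcal{F} \text{ is analytic:}\quad
T_{\mathbf{\theta}}\,\mathcal{O}_{\mathbf{\theta}_0} = \operatorname{Lie}(\mathcal{F})(\mathbf{\theta})
\quad \text{for all } \mathbf{\theta}\in\mathcal{O}_{\mathbf{\theta}_0}
$$
```

In particular, dim Lie(𝓕)(θ) is constant along each orbit. An analytic F(θ)(x) has gradients that are analytic in θ, so every field in 𝓕 qualifies; for merely smooth families the orbit's tangent space can be strictly larger than Lie(𝓕)(θ), which is one reason the premise is load-bearing.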

What would settle it

A concrete counter-example would be a two-layer network with explicit permutation symmetry whose training trajectories escape the predicted symmetry-induced manifolds or remain on an additional invariant manifold that cannot be generated by any symmetry of the architecture.
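One practical way to hunt for such a counter-example is to screen candidate subspaces: a linear subspace can only be a SIM if every field ∇_θ F(·)(x) is tangent to it at every point. The sketch assumes a hypothetical width-2, scalar-input tanh network with θ = (a1, w1, a2, w2). It encodes a candidate subspace as a list of coordinate equalities and reports the worst tangency violation over randomly sampled points and inputs. Sampling can refute a candidate but cannot prove invariance. It also does not search the space of candidates, which a genuine counter-example to the two-layer theorem would require.

```python
import math
import random

# Hypothetical width-2, scalar-input network F(theta)(x) = a1*tanh(w1*x) + a2*tanh(w2*x),
# flattened as theta = [a1, w1, a2, w2].

def grad_F(theta, x):
    a1, w1, a2, w2 = theta
    t1, t2 = math.tanh(w1 * x), math.tanh(w2 * x)
    return [t1, a1 * (1 - t1 * t1) * x, t2, a2 * (1 - t2 * t2) * x]

def random_point(equalities, rng):
    """Random theta with theta[i] == theta[j] for each (i, j); pairs assumed disjoint."""
    theta = [rng.gauss(0, 1) for _ in range(4)]
    for i, j in equalities:
        theta[j] = theta[i]
    return theta

def max_violation(equalities, trials=200, seed=0):
    """Largest |g_i - g_j| over sampled points of the subspace and sampled inputs x.
    Tangency requires every gradient to satisfy the same equalities as the point."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        theta = random_point(equalities, rng)
        x = rng.gauss(0, 2)
        g = grad_F(theta, x)
        worst = max([worst] + [abs(g[i] - g[j]) for i, j in equalities])
    return worst

swap_fixed_set = [(0, 2), (1, 3)]  # a1 = a2 and w1 = w2: fixed set of the neuron swap
output_only = [(0, 2)]             # a1 = a2 alone: not fixed by the swap
print(max_violation(swap_fixed_set), max_violation(output_only))
```

The swap-fixed set gives exactly 0.0, since g[0] and g[2] (and g[1] and g[3]) are computed by identical operations there, while the constraint a1 = a2 alone is violated as soon as w1 ≠ w2.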

Original abstract

While architecture is recognized as key to the performance of deep neural networks, its precise effect on training dynamics has been unclear due to the confounding influence of data and loss functions. This paper proposed an analytic framework based on the geometric control theory to characterize the dynamical properties intrinsic to a model's parameterization. We prove that the Structural Invariant Manifolds (SIMs) of an analytic model $F(\mathbf{\theta})(\mathbf{x})$--submanifolds that confine gradient flow trajectories independent of data and loss--are unions of orbits of the vector field family $\{\nabla_{\mathbf{\theta}} F(\cdot)(\mathbf{x})\mid\mathbf{x}\in\mathbb{R}^d\}$. We then prove that a model's symmetry, e.g., permutation symmetry for neural networks, induces SIMs. Applying this, we characterize the hierarchy of symmetry-induced SIMs in fully-connected networks, where dynamics exhibit neuron condensation and equivalence to reduced-width networks. For two-layer networks, we prove all SIMs are symmetry-induced, closing the gap between known symmetries and all possible invariants. Overall, by establishing the framework for analyzing SIMs induced by architecture, our work paves the way for a deeper analysis of neural network training dynamics and generalization in the near future.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper uses geometric control theory to tie architecture symmetries to invariant manifolds that trap gradient flow, with a clean result for two-layer nets, but the orbit characterization skips the Lie algebra rank check that the stress-test flags.


The main takeaway is that this work frames training dynamics as having architecture-intrinsic invariant manifolds that come from symmetries like permutations in the weights. They prove these manifolds are unions of orbits under the vector fields given by gradients with respect to parameters for each input, then show symmetries induce such manifolds, and finally that in two-layer networks every invariant manifold arises this way. That last part closes a gap between observed symmetries and all possible invariants. The hierarchy they build for fully connected networks also lines up with neuron condensation and effective width reduction, which is a concrete payoff. The separation of architecture effects from data and loss is the part that feels genuinely useful for thinking about generalization.

The soft spot sits in the control-theory step. The claim that the manifolds are exactly the orbits requires the Lie algebra generated by those vector fields to have constant rank and span the right distribution. The vector fields inherit the layered polynomial structure of the network, so brackets are not obviously independent, and the paper does not appear to compute or bound the rank once permutation symmetries are accounted for. Without that verification the actual orbits could be strictly smaller than the asserted manifolds. The assumptions of analytic models and exact gradient flow are standard but narrow the scope.

This is aimed at theorists who want a dynamical-systems view of why architecture matters. A reader already working on symmetry or invariance in loss landscapes would find the framework worth testing, even if the rank issue needs fixing. I would send it to peer review so the control-theory application gets a proper check rather than desk-rejecting it outright.

Referee Report

2 major / 2 minor

Summary. The paper develops an analytic framework using geometric control theory to study training dynamics of analytic models F(θ)(x) independent of data and loss. It defines Structural Invariant Manifolds (SIMs) as submanifolds confining gradient-flow trajectories, proves that these SIMs are unions of orbits of the vector-field family {∇_θ F(·)(x) | x ∈ ℝ^d}, shows that model symmetries (e.g., permutation symmetries) induce SIMs, characterizes the resulting hierarchy for fully-connected networks (including neuron condensation and equivalence to reduced-width networks), and proves that for two-layer networks every SIM is symmetry-induced.

Significance. If the central claims hold, the work supplies a symmetry-based geometric mechanism that explains architecture-dependent invariants in gradient flow, including explicit reduction of effective network width. The explicit closure result for two-layer networks and the hierarchy for deeper fully-connected nets are concrete, falsifiable predictions that could guide both theoretical analysis of generalization and practical architecture design.

major comments (2)

[§3] The claim in §3 (or the section containing the application of the orbit theorem) that SIMs are unions of orbits of the family {∇_θ F(·)(x)} invokes the orbit theorem of geometric control theory, which requires that the Lie algebra generated by the family produces a distribution of constant rank. The manuscript does not compute or bound this rank for the structured, polynomial vector fields arising from layered networks; algebraic dependencies inherited from the network architecture may cause the rank to drop once permutation symmetries are quotiented, making the actual orbits strictly smaller than the asserted SIMs.
[Theorem on two-layer networks] In the result asserting that all SIMs are symmetry-induced, the proof that every invariant manifold is generated by the known permutation symmetries must rule out the existence of additional, non-symmetry invariants. Without an explicit verification that the Lie-algebra rank equals the dimension of the symmetry-induced distribution (or a direct computation of the orbit dimension for the two-layer case), the claim that the symmetry-induced SIMs exhaust all possible invariants remains open.
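The rank check requested above can be approximated numerically. Assuming a hypothetical width-2, scalar-input tanh network with θ = (a1, w1, a2, w2), the sketch samples many inputs x, stacks the vectors ∇_θ F(θ)(x) as rows, and computes their rank by Gaussian elimination. This is the rank of the family itself, only a lower bound on the Lie-algebra rank, because brackets are not included.

```python
import math
import random

# Hypothetical width-2, scalar-input network F(theta)(x) = a1*tanh(w1*x) + a2*tanh(w2*x),
# flattened as theta = [a1, w1, a2, w2].

def grad_F(theta, x):
    a1, w1, a2, w2 = theta
    t1, t2 = math.tanh(w1 * x), math.tanh(w2 * x)
    return [t1, a1 * (1 - t1 * t1) * x, t2, a2 * (1 - t2 * t2) * x]

def numerical_rank(rows, tol=1e-9):
    """Rank of a list of row vectors via Gaussian elimination with partial pivoting."""
    m = [list(r) for r in rows]
    rank, n_cols = 0, len(m[0])
    for col in range(n_cols):
        pivot = max(range(rank, len(m)), key=lambda r: abs(m[r][col]), default=None)
        if pivot is None or abs(m[pivot][col]) <= tol:
            continue  # no usable pivot: this column depends on earlier ones
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            f = m[r][col] / m[rank][col]
            m[r] = [v - f * p for v, p in zip(m[r], m[rank])]
        rank += 1
    return rank

def family_rank(theta, n_inputs=50, seed=0):
    """Dimension of span{grad_theta F(theta)(x)} over sampled inputs x."""
    rng = random.Random(seed)
    rows = [grad_F(theta, rng.uniform(-3, 3)) for _ in range(n_inputs)]
    return numerical_rank(rows)

generic = [0.8, 0.5, -0.3, 1.5]     # two distinct neurons
on_swap_set = [0.8, 0.5, 0.8, 0.5]  # neuron 1 == neuron 2
print(family_rank(generic), family_rank(on_swap_set))
```

At the generic point the fields span all 4 directions. At the swap-fixed point they span 2, and since every field is tangent to the 2-dimensional swap-fixed subspace, so are all their brackets; the Lie-algebra rank there is therefore exactly 2, matching the manifold dimension. This is evidence at one point of a toy model, not the general computation the report asks for.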

minor comments (2)

[Abstract] The abstract contains a grammatical inconsistency ('This paper proposed' should read 'This paper proposes').
[Introduction / §2] Notation for the vector-field family is introduced without an explicit statement of the regularity class (analyticity) assumed on F, which is used in the subsequent Lie-bracket calculations.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments on our manuscript. The points raised regarding the application of the orbit theorem and the completeness of the symmetry-induced invariants for two-layer networks are important for rigor. We address each major comment below and describe the revisions we will make.

Point-by-point responses

Referee: [§3] The claim in §3 (or the section containing the application of the orbit theorem) that SIMs are unions of orbits of the family {∇_θ F(·)(x)} invokes the orbit theorem of geometric control theory, which requires that the Lie algebra generated by the family produces a distribution of constant rank. The manuscript does not compute or bound this rank for the structured, polynomial vector fields arising from layered networks; algebraic dependencies inherited from the network architecture may cause the rank to drop once permutation symmetries are quotiented, making the actual orbits strictly smaller than the asserted SIMs.

Authors: We agree that an explicit treatment of the Lie-algebra rank would strengthen the invocation of the orbit theorem. In the revised manuscript we will add a dedicated paragraph in §3 (and a short appendix) that computes the rank of the distribution generated by {∇_θ F(·)(x)} for fully-connected networks. Because the vector fields are polynomial (hence analytic), the rank is constant on a dense open subset of parameter space. We will further show that any rank drop induced by the permutation symmetries is exactly accounted for by the dimension of the symmetry-induced SIMs, so that the orbits coincide with the asserted manifolds. revision: yes
Referee: [Theorem on two-layer networks] In the result asserting that all SIMs are symmetry-induced, the proof that every invariant manifold is generated by the known permutation symmetries must rule out the existence of additional, non-symmetry invariants. Without an explicit verification that the Lie-algebra rank equals the dimension of the symmetry-induced distribution (or a direct computation of the orbit dimension for the two-layer case), the claim that the symmetry-induced SIMs exhaust all possible invariants remains open.

Authors: The existing proof already demonstrates that the symmetry-induced distribution for two-layer networks spans the full tangent space orthogonal to the symmetry group orbits. To make this verification fully explicit, we will insert a direct rank computation for the two-layer case in the revised version, confirming that the Lie-algebra rank equals the dimension of the symmetry-induced distribution. This step rules out additional non-symmetry invariants and closes the argument that all SIMs are symmetry-induced. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation is deductive from definitions and external theorems

Full rationale

The paper defines Structural Invariant Manifolds (SIMs) for an analytic model F(θ)(x) and proves they are unions of orbits of the family {∇_θ F(·)(x)} via geometric control theory orbit theorems, then shows symmetry (e.g., permutation) induces such manifolds, with a specific result for two-layer networks that all SIMs are symmetry-induced. These steps follow directly from the model parameterization, gradient flow dynamics, and standard control-theoretic results applied to the vector fields; no reduction by construction to fitted inputs, self-definitional loops, or load-bearing self-citations appears. The framework remains self-contained as a mathematical characterization independent of empirical data or author-specific prior theorems invoked circularly.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The framework rests on the analyticity of the model parameterization and the applicability of geometric control theory to the induced vector fields; no free parameters are introduced.

axioms (2)

domain assumption: The model F(θ)(x) is analytic
Explicitly stated as the setting for which SIMs are defined and proven.
domain assumption: Training proceeds by gradient flow on parameter space
Central to the definition of trajectories confined by the manifolds.

invented entities (1)

Structural Invariant Manifolds (SIMs): no independent evidence
purpose: Submanifolds that confine gradient flow trajectories independent of data and loss
Newly defined objects whose properties are proven from the vector field family.

pith-pipeline@v0.9.0 · 5746 in / 1379 out tokens · 42358 ms · 2026-05-18T07:29:07.011035+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

We prove that the Structural Invariant Manifolds (SIMs) of an analytic model F(θ)(x) are unions of orbits of the vector field family {∇_θ F(·)(x) | x ∈ R^d}. ... For two-layer networks, we prove all SIMs are symmetry-induced

IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear

By employing the geometric control theory, in particular the Hermann–Nagano Theorem ... the orbits of F and their unions give rise to all SIMs

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.