The Riemannian Geometry Associated to Gradient Flows of Linear Convolutional Networks

El Mehdi Achour; Holger Rauhut; Kathlén Kohn

arxiv: 2507.06367 · v2 · submitted 2025-07-08 · 💻 cs.LG · math.AG

Pith reviewed 2026-05-19 05:23 UTC · model grok-4.3

keywords: gradient flow, linear convolutional networks, Riemannian geometry, function space, initialization independence, deep learning optimization, convolutional algebra

The pith

Gradient flows for linear convolutional networks are Riemannian flows on function space for any initialization.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that the gradient flow on the weights of a linear convolutional network can be written as a Riemannian gradient flow directly on the space of functions the network represents. Unlike the fully connected case, which needs a balancedness condition, this requires no special condition on the initialization. The result applies to convolutions in two or more dimensions and to one-dimensional convolutions provided all strides exceed one. The Riemannian metric on function space depends on the initial weights.

Core claim

We establish that the gradient flow on parameter space for learning linear convolutional networks can be written as a Riemannian gradient flow on function space regardless of the initialization. This result holds for D-dimensional convolutions with D ≥ 2, and for D =1 it holds if all so-called strides of the convolutions are greater than one. The corresponding Riemannian metric depends on the initialization.
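The general mechanism behind a statement of this type can be sketched with the chain rule. The following is a generic derivation, not the paper's construction; the symbols $\theta$ (parameters), $\mu$ (map from parameters to the represented function), $\ell$ (loss on function space), $J$ and $K$ are notation assumed here for illustration.

```latex
\begin{aligned}
\dot\theta(t) &= -\nabla_\theta (\ell \circ \mu)(\theta(t))
               = -J(\theta)^{\top}\, \nabla \ell(\mu(\theta)),
               \qquad J(\theta) = D\mu(\theta), \\
\frac{d}{dt}\,\mu(\theta(t)) &= J(\theta)\,\dot\theta(t)
               = -K(\theta)\, \nabla \ell(\mu(\theta)),
               \qquad K(\theta) = J(\theta)\,J(\theta)^{\top}.
\end{aligned}
```

If, along a trajectory, $K(\theta(t))$ can be expressed through $v = \mu(\theta(t))$ and the initialization alone, and is positive definite on the tangent space at $v$, then the second line is a Riemannian gradient flow for the metric $g_v(u, w) = \langle u, K^{-1} w \rangle$. Whether and when this holds for convolutional networks is exactly what the dimension and stride conditions in the claim address.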

What carries the argument

The initialization-dependent Riemannian metric on function space induced by the algebraic structure of the convolution operator, which equates the parameter-space Euclidean gradient flow to a Riemannian flow on the represented functions.
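The link between the Euclidean flow on parameters and a flow on functions can be checked numerically in a toy case. The sketch below assumes a two-layer 1D network whose first layer has stride 2, a quadratic loss, and a kernel K = J Jᵀ built from the Jacobian J of the end-to-end filter; the names `upsample`, `mu`, `jacobian` and the target filter are hypothetical and not taken from the paper. It verifies only the chain-rule identity (velocity of the function equals -K times the loss gradient), not the paper's metric construction.

```python
import numpy as np

def upsample(w, factor):
    """Insert factor-1 zeros between consecutive entries of w."""
    out = np.zeros(factor * (len(w) - 1) + 1)
    out[::factor] = w
    return out

def mu(w1, w2, s1=2):
    """End-to-end filter of two 1D layers; the first layer has stride s1."""
    return np.convolve(w1, upsample(w2, s1))

def jacobian(w1, w2, s1=2):
    """Jacobian of mu in (w1, w2); exact because mu is linear in each factor."""
    cols1 = [mu(e, w2, s1) for e in np.eye(len(w1))]
    cols2 = [mu(w1, e, s1) for e in np.eye(len(w2))]
    return np.stack(cols1 + cols2, axis=1)

w1 = np.array([1.0, 2.0, -1.0])
w2 = np.array([0.5, 1.5])
target = np.array([1.0, 0.0, 0.0, 0.0, 0.0])   # hypothetical target filter

v = mu(w1, w2)
grad_L = v - target              # gradient of 0.5 * ||v - target||^2 in v
J = jacobian(w1, w2)
K = J @ J.T                      # symmetric positive semidefinite kernel

# One explicit Euler step of the parameter-space flow with a tiny step size.
h = 1e-8
theta_grad = J.T @ grad_L        # Euclidean gradient in (w1, w2)
w1_new = w1 - h * theta_grad[:3]
w2_new = w2 - h * theta_grad[3:]

velocity_fd = (mu(w1_new, w2_new) - v) / h   # finite-difference velocity of v
velocity_kernel = -K @ grad_L                # chain-rule prediction
print(np.allclose(velocity_fd, velocity_kernel, atol=1e-5))
```

The explicit Euler step stands in for the continuous-time flow, so agreement holds only up to terms of order h.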

If this is right

Optimization trajectories can be studied using tools from Riemannian geometry applied directly to the functions rather than the weights.
Analyses of convergence and stationary points based on this formulation no longer need a balancedness assumption at initialization.
The geometry of the function space is fully determined once the initial weights are fixed.
The same equivalence extends previous results from fully connected linear networks to the convolutional setting under milder conditions.
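The role of the initial weights in the points above can be illustrated with a small NumPy check. It is a sketch under assumptions: a two-layer 1D network with first-layer stride 2, the kernel K = J Jᵀ of the end-to-end filter, and hypothetical helper names (`upsample`, `mu`, `kernel`). Two weight settings that represent the same function give different kernels, so the function alone does not fix the geometry; this is a generic property of the parametrization, not the paper's metric.

```python
import numpy as np

def upsample(w, factor):
    """Insert factor-1 zeros between consecutive entries of w."""
    out = np.zeros(factor * (len(w) - 1) + 1)
    out[::factor] = w
    return out

def mu(w1, w2, s1=2):
    """End-to-end filter of two 1D layers; the first layer has stride s1."""
    return np.convolve(w1, upsample(w2, s1))

def kernel(w1, w2, s1=2):
    """K = J J^T, where J is the Jacobian of mu with respect to (w1, w2)."""
    cols = [mu(e, w2, s1) for e in np.eye(len(w1))]
    cols += [mu(w1, e, s1) for e in np.eye(len(w2))]
    J = np.stack(cols, axis=1)
    return J @ J.T

w1 = np.array([1.0, 2.0, -1.0])
w2 = np.array([0.5, 1.5])

# Rescale the layers in opposite directions: the represented function is unchanged.
c = 3.0
w1_scaled, w2_scaled = c * w1, w2 / c

same_function = np.allclose(mu(w1, w2), mu(w1_scaled, w2_scaled))
same_kernel = np.allclose(kernel(w1, w2), kernel(w1_scaled, w2_scaled))
print(same_function, same_kernel)
```

Here `same_function` is true while `same_kernel` is false, which is consistent with a metric that depends on the initialization rather than on the represented function alone.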

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Similar metric constructions might be attempted for networks with nonlinear activations to test whether the Riemannian view survives.
The dependence of the metric on initialization could be used to design weight initializations that simplify the induced geometry.
Connections may exist to symmetry groups preserved by convolution that are not visible in the fully connected case.
One could check whether the Riemannian formulation yields new bounds on generalization that depend on the induced metric rather than on parameter norms.

Load-bearing premise

That the convolution algebra permits a metric on the network's output functions making the parameter gradients identical to the Riemannian gradients, an equivalence that relies on linearity and specific stride or dimension conditions.

What would settle it

A concrete numerical trajectory for a one-dimensional convolution with stride one where the parameter-space gradient flow deviates from every possible Riemannian flow on the corresponding function space for some random initialization.

Original abstract

We study geometric properties of the gradient flow for learning deep linear convolutional networks. For linear fully connected networks, it has been shown recently that the corresponding gradient flow on parameter space can be written as a Riemannian gradient flow on function space (i.e., on the product of weight matrices) if the initialization satisfies a so-called balancedness condition. We establish that the gradient flow on parameter space for learning linear convolutional networks can be written as a Riemannian gradient flow on function space regardless of the initialization. This result holds for $D$-dimensional convolutions with $D \geq 2$, and for $D =1$ it holds if all so-called strides of the convolutions are greater than one. The corresponding Riemannian metric depends on the initialization.
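For linear convolutional networks, the composition of the layers is itself a single convolution, so a point in function space can be identified with an end-to-end filter. The NumPy sketch below checks this for two 1D layers, assuming no padding and the cross-correlation convention y[i] = Σ_j w[j] x[stride·i + j]; the helper names (`strided_conv`, `upsample`, `end_to_end_filter`) are hypothetical and the setup is a simplification, not the paper's general D-dimensional one.

```python
import numpy as np

def strided_conv(w, x, stride):
    """Valid 1D cross-correlation: y[i] = sum_j w[j] * x[stride * i + j]."""
    k = len(w)
    n_out = (len(x) - k) // stride + 1
    return np.array([np.dot(w, x[stride * i: stride * i + k]) for i in range(n_out)])

def upsample(w, factor):
    """Insert factor-1 zeros between consecutive entries of w."""
    out = np.zeros(factor * (len(w) - 1) + 1)
    out[::factor] = w
    return out

def end_to_end_filter(w1, w2, s1):
    """Filter of the composed map: layer 1 (stride s1) followed by layer 2."""
    return np.convolve(w1, upsample(w2, s1))

rng = np.random.default_rng(0)
w1, w2 = rng.normal(size=3), rng.normal(size=2)
s1, s2 = 2, 2
x = rng.normal(size=25)

# Apply the two layers in sequence, then the single equivalent layer.
two_layer = strided_conv(w2, strided_conv(w1, x, s1), s2)
one_layer = strided_conv(end_to_end_filter(w1, w2, s1), x, s1 * s2)
print(np.allclose(two_layer, one_layer))
```

The equivalent single layer has stride s1 · s2, and its filter is the first filter convolved with the second filter stretched by the first stride.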

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Conv nets get the Riemannian equivalence for gradient flows without needing balanced initialization, thanks to convolution algebra, unlike the FC case.

The main thing here is that linear convolutional networks let the parameter gradient flow be rewritten as a Riemannian gradient flow on function space regardless of initialization. This holds for D-dimensional convolutions with D at least 2, and for 1D only when all strides exceed one. The metric still depends on the starting point, which they state clearly up front. This is the key difference from the fully connected results that require balancedness. The paper does a solid job using the algebraic structure of convolutions to make the equivalence initialization-independent in that sense and spelling out the exact dimension and stride conditions. It extends the prior work without just reducing to it. The central claim looks like it holds up on the given description, with no obvious gaps or contradictions in how the metric is defined from the convolution operator. A minor soft spot is the restriction to linear networks, which keeps the math clean but leaves the nonlinear case for later work. The 1D stride condition is worth verifying in the proofs to confirm it is necessary. This paper is for people working on geometric views of optimization in deep learning, especially those comparing architectures. A reader who wants precise conditions on when these equivalences apply would get something useful from it. I would recommend sending it for peer review as a focused and honest extension of the existing literature.

Referee Report

0 major / 2 minor

Summary. The paper claims that gradient flow on the parameters of deep linear convolutional networks is equivalent to a Riemannian gradient flow on function space for arbitrary initialization. This equivalence relies on the algebraic structure of convolutions and holds for all D-dimensional convolutions with D ≥ 2 as well as for 1D convolutions when every stride is strictly greater than one; the induced Riemannian metric on function space is initialization-dependent. The result is positioned as an extension of prior work on fully connected linear networks that required a balancedness condition.

Significance. If the derivation is correct, the result supplies a geometric account of gradient descent that is structurally more robust for convolutional than for fully connected linear networks. By exploiting convolution algebra to eliminate the balancedness requirement, the work isolates a concrete difference between network families that affects the geometry of the optimization trajectory. The explicit dependence of the metric on initialization is a useful feature that could support future analyses of training dynamics and landscape geometry in convolutional architectures.

minor comments (2)

[Introduction] The introduction would benefit from a short paragraph contrasting the convolutional construction with the balancedness condition of the fully connected case, including a pointer to the relevant prior theorem.
Notation for the function space (product of weight matrices) and the precise definition of the Riemannian metric should be stated once in a dedicated subsection or displayed equation to improve readability for readers coming from the fully connected literature.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their positive assessment and recommendation for minor revision. The report accurately captures the main contribution of our work.

Referee: The paper claims that gradient flow on the parameters of deep linear convolutional networks is equivalent to a Riemannian gradient flow on function space for arbitrary initialization. This equivalence relies on the algebraic structure of convolutions and holds for all D-dimensional convolutions with D ≥ 2 as well as for 1D convolutions when every stride is strictly greater than one; the induced Riemannian metric on function space is initialization-dependent. The result is positioned as an extension of prior work on fully connected linear networks that required a balancedness condition.

Authors: We confirm that the referee's summary is correct and complete. The algebraic properties of convolutions indeed allow the equivalence to hold for arbitrary initializations in the stated cases, removing the need for the balancedness condition required in the fully connected setting.
Revision: no

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

The central result derives an equivalence between parameter-space gradient flow and a Riemannian gradient flow on function space for linear convolutional networks, holding independently of initialization for D-dimensional cases with D≥2 (or D=1 with strides>1). This follows from the algebraic structure of the convolution operator and does not reduce to a redefinition of inputs, a fitted parameter renamed as prediction, or a load-bearing self-citation chain. Prior work on fully connected networks is referenced for contrast but the convolutional derivation introduces its own conditions and metric construction without collapsing to those prior results by construction. The paper remains self-contained against external benchmarks with no quoted step exhibiting the required reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The paper introduces no new free parameters or entities but relies on domain assumptions about how to equip the function space with a metric depending on initialization.

axioms (2)

domain assumption: Gradient flows can be lifted to Riemannian manifolds on function space.
This is the core modeling assumption for reinterpreting parameter gradient flow.
ad hoc to paper: Convolutional linear networks admit a representation where the flow is independent of balanced initialization.
Specific to this work's contribution for the convolutional case.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

the gradient flow on parameter space ... can be written as a Riemannian gradient flow on function space regardless of the initialization ... neural tangent kernel only depends on the δl

IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear

For linear convolutional networks on D-dimensional signals where D ≥ 2 ... K(δ)(v) ... Riemannian metric on the smooth locus of M

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

10 extracted references · 10 canonical work pages

[1] Sanjeev Arora, Nadav Cohen, and Elad Hazan. On the optimization of deep networks: Implicit acceleration by overparameterization, 2018.

[2] Bubacarr Bah, Holger Rauhut, Ulrich Terstiege, and Michael Westdickenberg. Learning deep linear neural networks: Riemannian gradient flows and convergence to global minimizers. Information and Inference: A Journal of the IMA, 11(1):307–353, 2022.

[3] Jona Diederen, Ulrich Terstiege, and Holger Rauhut. Convergence of gradient flow for learning linear convolutional neural networks. Preprint, 2025.

[4] Israel M. Gelfand, Mikhail M. Kapranov, and Andrei V. Zelevinsky. Discriminants, resultants, and multidimensional determinants. Birkhäuser, 1994.

[5] Arthur Jacot, Franck Gabriel, and Clément Hongler. Neural tangent kernel: Convergence and generalization in neural networks. Advances in Neural Information Processing Systems, 31, 2018.

[6] Kathlén Kohn, Bernt Ivar Utstøl Nødland, and Paolo Tripoli. Secants, bitangents, and their congruences. In Combinatorial Algebraic Geometry, pages 87–112. Springer, 2017.

[7] Kathlén Kohn, Thomas Merkh, Guido Montúfar, and Matthew Trager. Geometry of linear convolutional networks. SIAM Journal on Applied Algebra and Geometry, 6(3):368–406, 2022.

[8] Kathlén Kohn, Guido Montúfar, Vahid Shahverdi, and Matthew Trager. Function space and critical points of linear convolutional networks. SIAM Journal on Applied Algebra and Geometry, 8(2):333–362, 2024.

[9] Sibylle Marcotte, Rémi Gribonval, and Gabriel Peyré. Abide by the law and follow the flow: Conservation laws for gradient flows. Advances in Neural Information Processing Systems, 36, 2024.

[10] Gabin Maxime Nguegnang, Holger Rauhut, and Ulrich Terstiege. Convergence of gradient descent for learning linear neural networks. Advances in Continuous and Discrete Models, 23, 2024. doi:10.1186/s13662-023-03797-x
