A Wasserstein Geometric Framework for Hebbian Plasticity
Pith reviewed 2026-05-10 07:56 UTC · model grok-4.3
The pith
Hebbian plasticity arises from Wasserstein minimizing movements of memory probability measures.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Memory states are modeled as probability measures evolving through Wasserstein minimizing movements, with Hebbian learning rules cast as Hebbian energies satisfying a sequential stability condition. This ensures fiberwise JKO updates, optimal transport realizations, and an energy descent inequality. The resulting variational structure separates internal memory evolution along Wasserstein geodesics in a latent space from observable dynamics via geometric projection maps, with simplicial projections yielding classical affine schemes and revealing competition and pruning geometrically. Under Lipschitz regularity, continuous-time limits exist as perturbed Wasserstein gradient flows.
What carries the argument
Fiberwise JKO updates of sequentially stable Hebbian energies in the Wasserstein space of probability measures, which drive the evolution and induce the projections to observable synaptic weights.
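For orientation, the generic JKO (minimizing-movement) step and the descent inequality it implies can be sketched as follows. This is the textbook scheme for a single energy E with step size τ > 0 on the space of probability measures with finite second moment; the symbols E, τ, and X are illustrative assumptions, and the paper's fiberwise variant and its stability condition are not reproduced in this summary.

```latex
\mu_{k+1} \in \operatorname*{arg\,min}_{\mu \in \mathcal{P}_2(X)}
  \left\{ E(\mu) + \frac{1}{2\tau} W_2^2(\mu, \mu_k) \right\},
\qquad
E(\mu_{k+1}) + \frac{1}{2\tau} W_2^2(\mu_{k+1}, \mu_k) \le E(\mu_k).
```

The inequality follows by comparing the minimizer with the competitor μ = μ_k, for which the transport term vanishes.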
If this is right
- Internal memory states evolve along Wasserstein geodesics while observable synaptic weights arise through projection maps.
- Simplicial projections recover exponential moving averages and mirror descent as special cases, with synaptic pruning following from mass redistribution.
- Classical neural networks correspond to flat projections of the curved internal dynamics.
- The framework extends naturally to richer representations such as structural weights and embedding memories.
- Memory consolidation is formulated variationally as a perturbed Wasserstein gradient flow in a quasi-stationary regime.
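The two classical schemes named in the list above can be illustrated on a small probability simplex. The sketch below is not the paper's projection construction (which the review notes is not given explicitly); it only shows, under assumed toy values, that an exponential moving average and an entropic mirror-descent step both keep weights on the simplex, and that repeated mirror-descent steps drive mass away from costly coordinates, a pruning-like effect. The names `ema_step`, `mirror_descent_step`, and the cost vector are hypothetical.

```python
import math


def ema_step(w, target, alpha):
    """Exponential moving average: a convex combination of two
    probability vectors, so the result stays on the simplex."""
    return [(1 - alpha) * wi + alpha * ti for wi, ti in zip(w, target)]


def mirror_descent_step(w, grad, eta):
    """Entropic mirror descent (exponentiated gradient) on the simplex:
    multiplicative update followed by renormalization."""
    unnorm = [wi * math.exp(-eta * gi) for wi, gi in zip(w, grad)]
    z = sum(unnorm)
    return [u / z for u in unnorm]


# Hypothetical toy setup: four synapses with fixed per-synapse costs.
w = [0.25, 0.25, 0.25, 0.25]
cost = [0.0, 1.0, 2.0, 3.0]
for _ in range(50):
    w = mirror_descent_step(w, cost, eta=0.5)

# Mass concentrates on the cheapest coordinate; the others shrink
# toward zero while total mass stays equal to one.
print([round(wi, 6) for wi in w])
```

Total mass is conserved at every step, so any gain in one coordinate is paid for by the others; this is the elementary sense in which competition appears in such schemes.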
Where Pith is reading between the lines
- This approach could inspire new experiments measuring whether synaptic weight changes align with predicted Wasserstein geodesic projections in biological networks.
- Connections to optimal transport in other domains, like population genetics, might reveal analogous geometric structures in non-neural systems.
- The separation of latent curved dynamics from observable flat ones suggests designing AI systems with explicit hidden distributional states for better context handling.
Load-bearing premise
Hebbian learning rules can be formalized as energies that satisfy a sequential stability condition ensuring well-posed fiberwise updates and energy descent.
What would settle it
If measurements of synaptic plasticity in experiments fail to match any sequence of projections from Wasserstein minimizing movements of underlying measure-valued memory states, the claimed geometric separation would not hold.
Original abstract
We introduce the Tan-HWG framework (Hebbian-Wasserstein-Geometry), a geometric theory of Hebbian plasticity in which memory states are modeled as probability measures evolving through Wasserstein minimizing movements. Hebbian learning rules are formalized as Hebbian energies satisfying a sequential stability condition, ensuring well-posed fiberwise JKO updates, optimal-transport realizations, and an energy descent inequality. This variational structure induces a fundamental separation between internal and observable dynamics. Internal memory states evolve along Wasserstein geodesics in a latent curved space, while observable quantities, such as effective synaptic weights, arise through geometric projection maps into external spaces. Simplicial projections recover classical affine schemes (including exponential moving averages and mirror descent), while revealing synaptic competition and pruning as geometric consequences of mass redistribution. Hilbertian projections provide a geometric account of phase alignment and multi-scale coherence. Classical neural networks appear as flat projections of this curved dynamics, while the framework naturally accommodates richer distributional representations, including structural weights and embedding memories, and their spectral extensions in complex internal spaces. Under mild Lipschitz regularity assumptions, including a quasi-stationary "sleep-mode" regime, we establish the existence of continuous-time limit curves. This yields a variational formulation of memory consolidation as a perturbed Wasserstein gradient flow. The framework thus provides a unified geometric foundation for synaptic plasticity, representation dynamics, and context-dependent computation.
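To make "evolving along Wasserstein geodesics" concrete, here is a minimal one-dimensional sketch. It assumes two empirical measures with the same number of equally weighted atoms on the real line, where the optimal transport plan for the quadratic cost matches sorted samples; the function names and sample values are hypothetical, and the paper's latent spaces are more general than this toy case.

```python
import math


def w2_1d(xs, ys):
    """W2 distance between two equal-size, equal-weight empirical
    measures on the line (sorted samples are optimally matched)."""
    assert len(xs) == len(ys)
    xs, ys = sorted(xs), sorted(ys)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


def geodesic_1d(xs, ys, t):
    """Displacement interpolation at time t in [0, 1]: each atom moves
    in a straight line toward its matched partner."""
    assert len(xs) == len(ys)
    xs, ys = sorted(xs), sorted(ys)
    return [(1 - t) * x + t * y for x, y in zip(xs, ys)]


mu0 = [0.0, 1.0, 2.0]
mu1 = [4.0, 6.0, 5.0]
mid = geodesic_1d(mu0, mu1, 0.5)

# Constant-speed property: the midpoint sits at half the total distance
# from each endpoint.
print(w2_1d(mu0, mu1), w2_1d(mu0, mid), w2_1d(mid, mu1))
```

Note the contrast with a mixture of the two measures, which would keep atoms at both old and new locations instead of moving them.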
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces the Tan-HWG (Hebbian-Wasserstein-Geometry) framework, modeling memory states as probability measures evolving via Wasserstein minimizing movements (JKO scheme). Hebbian learning rules are recast as energies obeying a sequential stability condition that ensures well-posed fiberwise JKO updates, optimal-transport realizations, and an energy descent inequality. Internal dynamics follow Wasserstein geodesics in a latent space, while observable synaptic weights arise via simplicial and Hilbertian projections that recover classical schemes (exponential moving averages, mirror descent) and interpret competition/pruning as mass redistribution. Under Lipschitz assumptions and a quasi-stationary sleep-mode regime, continuous-time limits exist, yielding a variational formulation of memory consolidation as a perturbed Wasserstein gradient flow.
Significance. If the sequential stability condition holds for standard Hebbian rules and the continuous-time limits are rigorously derived, the framework would supply a novel variational geometry linking optimal transport to synaptic plasticity, representation dynamics, and context-dependent computation. The separation of curved internal evolution from projected observables and the geometric account of consolidation are potentially high-impact contributions to theoretical neuroscience and optimization.
major comments (3)
- [§2] §2 (Hebbian Energies): The sequential stability condition is introduced to guarantee well-posed fiberwise JKO updates, optimal-transport realizations, and the energy descent inequality. No derivation or verification is supplied showing that canonical Hebbian updates (plain Hebbian, Oja, BCM) satisfy the condition without auxiliary normalization or projection steps; the unification claim therefore rests on an imposed modeling postulate rather than a necessary geometric property.
- [Existence theorem] Theorem on continuous-time limits (existence section): The statement asserts existence of limit curves under mild Lipschitz regularity and the quasi-stationary sleep-mode regime, yet supplies neither the explicit derivation of the limit, error bounds, nor verification that the perturbed Wasserstein gradient flow formulation follows. This leaves the variational account of memory consolidation without load-bearing technical support.
- [§5] §5 (Simplicial and Hilbertian Projections): The claim that simplicial projections recover classical affine schemes (exponential moving averages, mirror descent) and reveal synaptic competition as geometric mass redistribution is asserted, but the explicit construction of the projection maps and the step-by-step recovery of the update rules are not provided with equations, weakening the geometric interpretation.
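As background to the first major comment, the following sketch contrasts the plain Hebbian update with Oja's rule on a toy input stream. It is only meant to show why normalization matters when asking whether canonical rules fit a descent framework: the plain rule's weight norm grows without bound, while Oja's rule stays near unit norm. The input set, learning rate, and step count are hypothetical choices, and nothing here tests the paper's sequential stability condition.

```python
import math


def hebb_step(w, x, eta):
    """Plain Hebbian update: dw = eta * y * x, with y = w . x."""
    y = sum(wi * xi for wi, xi in zip(w, x))
    return [wi + eta * y * xi for wi, xi in zip(w, x)]


def oja_step(w, x, eta):
    """Oja's rule: dw = eta * y * (x - y * w), which adds a decay
    term that keeps the weight norm bounded."""
    y = sum(wi * xi for wi, xi in zip(w, x))
    return [wi + eta * y * (xi - y * wi) for wi, xi in zip(w, x)]


def run(step, w0, inputs, eta=0.05, n_steps=400):
    """Apply one update rule while cycling through a fixed input list."""
    w = list(w0)
    for k in range(n_steps):
        w = step(w, inputs[k % len(inputs)], eta)
    return w


def norm(w):
    return math.sqrt(sum(wi * wi for wi in w))


# Hypothetical zero-mean inputs with one dominant direction.
INPUTS = [(1.0, 0.5), (0.2, -0.4), (-1.0, -0.5), (-0.2, 0.4)]
W0 = (0.1, 0.1)

w_hebb = run(hebb_step, W0, INPUTS)
w_oja = run(oja_step, W0, INPUTS)
print(norm(w_hebb), norm(w_oja))
```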
minor comments (3)
- [Abstract and §1] The acronym 'Tan-HWG' is introduced without explanation of the 'Tan' component.
- [Notation] Notation for the fiberwise JKO updates and the stability condition could be clarified with a dedicated table of symbols.
- [References] References to foundational JKO literature (Jordan-Kinderlehrer-Otto 1998) and Wasserstein gradient flows are present but could be expanded for the continuous-time limit argument.
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive comments on our manuscript. The points raised identify opportunities to strengthen the technical foundations of the Tan-HWG framework. We address each major comment below and indicate the revisions we will undertake.
Point-by-point responses
- Referee: [§2] §2 (Hebbian Energies): The sequential stability condition is introduced to guarantee well-posed fiberwise JKO updates, optimal-transport realizations, and the energy descent inequality. No derivation or verification is supplied showing that canonical Hebbian updates (plain Hebbian, Oja, BCM) satisfy the condition without auxiliary normalization or projection steps; the unification claim therefore rests on an imposed modeling postulate rather than a necessary geometric property.
  Authors: We agree that the manuscript presents the sequential stability condition primarily as an enabling assumption. In the revised version we will add an explicit verification subsection (or appendix) that derives the condition for the canonical rules. Specifically, we will show that the plain Hebbian rule, Oja’s rule, and the BCM rule each satisfy the sequential stability inequality after the standard normalization that keeps synaptic weights in the probability simplex; the calculations will be carried out directly from the energy definitions without additional projection steps. This will demonstrate that the condition is satisfied by the classical updates rather than imposed externally.
  Revision: yes
- Referee: [Existence theorem] Theorem on continuous-time limits (existence section): The statement asserts existence of limit curves under mild Lipschitz regularity and the quasi-stationary sleep-mode regime, yet supplies neither the explicit derivation of the limit, error bounds, nor verification that the perturbed Wasserstein gradient flow formulation follows. This leaves the variational account of memory consolidation without load-bearing technical support.
  Authors: The existence result is stated under the Lipschitz and quasi-stationary assumptions, but we acknowledge that the passage from the discrete JKO scheme to the continuous-time perturbed gradient flow is only sketched. In the revision we will expand the proof to include: (i) the explicit construction of the limit curve via the Arzelà–Ascoli theorem applied to the discrete trajectories, (ii) quantitative error bounds between the discrete and continuous evolutions that rely on the Lipschitz constant and the sleep-mode time-scale separation, and (iii) the direct identification of the limiting equation as the perturbed Wasserstein gradient flow of the Hebbian energy. These additions will supply the missing technical support for the variational formulation of memory consolidation.
  Revision: yes
- Referee: [§5] §5 (Simplicial and Hilbertian Projections): The claim that simplicial projections recover classical affine schemes (exponential moving averages, mirror descent) and reveal synaptic competition as geometric mass redistribution is asserted, but the explicit construction of the projection maps and the step-by-step recovery of the update rules are not provided with equations, weakening the geometric interpretation.
  Authors: We agree that the geometric interpretation would be strengthened by explicit formulas. In the revised manuscript we will insert a dedicated subsection that defines the simplicial projection map via the optimal-transport push-forward onto the probability simplex, followed by the explicit algebraic steps that recover the exponential moving-average update and the mirror-descent step. We will likewise derive the mass-redistribution interpretation of synaptic competition and pruning directly from the non-negativity and total-mass preservation of the Wasserstein geodesic. The Hilbertian projection will be treated analogously with the corresponding phase-alignment equations.
  Revision: yes
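The compactness step behind a discrete-to-continuous limit of this kind typically rests on an estimate like the one below. It is the standard argument for the plain JKO scheme with a single energy E that is bounded below and a step size τ; whether it carries over unchanged to the paper's fiberwise, perturbed setting is an assumption here, not something this summary confirms.

```latex
% Sum the one-step descent inequality over k = 0, ..., N-1:
\sum_{k=0}^{N-1} W_2^2(\mu_{k+1}, \mu_k)
  \le 2\tau \bigl( E(\mu_0) - \inf E \bigr).
% Triangle inequality and Cauchy-Schwarz then give, for i < j,
W_2(\mu_j, \mu_i)
  \le \sum_{k=i}^{j-1} W_2(\mu_{k+1}, \mu_k)
  \le \sqrt{2 \bigl( E(\mu_0) - \inf E \bigr)} \, \sqrt{(j - i)\,\tau}.
```

Read at times t = kτ, this is a uniform 1/2-Hölder bound on the discrete trajectories, which is the equicontinuity an Arzelà–Ascoli-type argument needs.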
Circularity Check
Sequential stability condition on Hebbian energies defined to guarantee JKO updates and descent by construction
specific steps
- self-definitional [Abstract]
"Hebbian learning rules are formalized as Hebbian energies satisfying a sequential stability condition, ensuring well-posed fiberwise JKO updates, optimal-transport realizations, and an energy descent inequality."
The sequential stability condition is introduced as part of the formalization of Hebbian energies specifically to ensure the JKO updates, optimal-transport realizations, and descent inequality. This makes the framework's key variational consequences (internal Wasserstein geodesics, projections recovering classical schemes) follow directly from the definition rather than being derived from or verified against standard Hebbian rules like plain Hebbian or Oja without additional assumptions.
full rationale
The paper's core formalization step defines Hebbian learning rules via energies that satisfy a sequential stability condition introduced precisely to ensure well-posed fiberwise JKO updates, optimal-transport realizations, and energy descent. This reduces the claimed geometric unification (Wasserstein minimizing movements, simplicial projections recovering classical schemes) to a modeling choice that builds in the desired properties rather than deriving them from standard Hebbian rules. The abstract and framework description treat the condition as definitional, with no independent verification shown for canonical updates without auxiliary steps. No other circular patterns (self-citations, ansatzes, or renamings) are evident in the provided text. This yields partial circularity in the derivation chain while the overall variational structure retains some independent geometric content.
Axiom & Free-Parameter Ledger
axioms (2)
- ad hoc to paper: Hebbian learning rules are formalized as Hebbian energies satisfying a sequential stability condition
- domain assumption: Mild Lipschitz regularity assumptions hold, including a quasi-stationary sleep-mode regime
invented entities (2)
- fiberwise JKO updates: no independent evidence
- Tan-HWG framework: no independent evidence
Reference graph
Works this paper leans on
- [1] Luigi Ambrosio, Nicola Gigli, and Giuseppe Savaré. Gradient Flows in Metric Spaces and in the Space of Probability Measures. Birkhäuser, 2008.
- [2] Amos Arieli, Alexander Sterkin, Amiram Grinvald, and Ad Aertsen. Dynamics of ongoing activity: explanation of the large variability in evoked cortical responses. Science, 273(5283):1868–1871, 1996.
- [3] Martin R. Bridson and André Haefliger. Metric Spaces of Non-Positive Curvature, volume 319 of Grundlehren der mathematischen Wissenschaften. Springer, 1999.
- [4] Dean V Buonomano and Wolfgang Maass. State-dependent computations: spatiotemporal processing in cortical networks. Nature Reviews Neuroscience, 10(2):113–125, 2009.
- [5] György Buzsáki. Rhythms of the Brain. Oxford University Press, 2006.
- [6] Mark M Churchland, Byron M Yu, John P Cunningham, Leo P Sugrue, Marlene R Cohen, Greg S Corrado, William T Newsome, Andrew M Clark, Paymon Hosseini, Benjamin B Scott, et al. Stimulus onset quenches neural variability: a widespread cortical phenomenon. Nature Neuroscience, 13(3):369–378, 2010.
- [7] Marco Cuturi. Sinkhorn distances: Lightspeed computation of optimal transportation distances. NeurIPS, 2013.
- [8] Peter Dayan and L. F. Abbott. Theoretical Neuroscience: Computational and Mathematical Modeling of Neural Systems. MIT Press, 2001.
- [9] Morris H DeGroot. Reaching a consensus. Journal of the American Statistical Association, 69(345):118–121, 1974.
- [10] Itai Gat, Tal Remez, Neta Shaul, Felix Kreuk, Ricky T. Q. Chen, Gabriel Synnaeve, Yossi Adi, and Yaron Lipman. Discrete flow matching. NeurIPS, 2024.
- [11] Wulfram Gerstner, Werner M. Kistler, Richard Naud, and Liam Paninski. Neuronal Dynamics: From Single Neurons to Networks and Models of Cognition. Cambridge University Press, 2014.
- [12] Donald O. Hebb. The Organization of Behavior. Wiley, 1949.
- [13] Emiel Hoogeboom, Didrik Nielsen, Priyank Jaini, Patrick Forré, and Max Welling. Argmax flows and multinomial diffusion: Learning categorical distributions. In NeurIPS, 2021.
- [14] John J. Hopfield. Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences, 79(8):2554–2558, 1982.
- [15] A. Jadbabaie, J. Lin, and A.S. Morse. Coordination of groups of mobile autonomous agents using nearest neighbor rules. In Proceedings of the 41st IEEE Conference on Decision and Control, 2002, volume 3, pages 2953–2958, 2002. doi: 10.1109/CDC.2002.1184304
- [16] Richard Jordan, David Kinderlehrer, and Felix Otto. The variational formulation of the Fokker–Planck equation. SIAM Journal on Mathematical Analysis, 29(1):1–17, 1998.
- [17] Yoshiki Kuramoto. Self-entrainment of a population of coupled non-linear oscillators. In H. Araki, editor, International Symposium on Mathematical Problems in Theoretical Physics, volume 39 of Lecture Notes in Physics, pages 420–422. Springer, 1975.
- [18] I. Lampl, I. Reichova, and D. Ferster. Synchronous membrane potential fluctuations in neurons of the cat visual cortex. Neuron, 22(2):361–374, 1999.
- [19] Yaron Lipman, Ricky T. Q. Chen, Heli Ben-Hamu, Maximilian Nickel, and Matt Le. Flow matching for generative modeling. ICLR, 2023.
- [20] John E. Lisman and Ole Jensen. The theta-gamma neural code. Neuron, 77(6):1002–1016, 2013.
- [21] Xuehai Liu et al. Sfm: Stochastic flow matching for discrete and categorical data. NeurIPS, 2024.
- [22] John Lott and Cédric Villani. Ricci curvature for metric-measure spaces via optimal transport. Annals of Mathematics, 169(3):903–991, 2009.
- [23] Valerio Mante, David Sussillo, Krishna V Shenoy, and William T Newsome. Context-dependent computation by recurrent dynamics in prefrontal cortex. Nature, 503(7474):78–84, 2013.
- [24] Robert J. McCann. A convexity principle for interacting gases. Advances in Mathematics, 128(1):153–179, 1997. doi: 10.1006/aima.1997.1634
- [25] R. Olfati-Saber and R.M. Murray. Consensus problems in networks of agents with switching topology and time-delays. IEEE Transactions on Automatic Control, 49(9):1520–1533, 2004. doi: 10.1109/TAC.2004.834113
- [26] Reza Olfati-Saber, J. Alex Fax, and Richard M. Murray. Consensus and cooperation in networked multi-agent systems. Proceedings of the IEEE, 95(1):215–233, 2007. doi: 10.1109/JPROC.2006.887293
- [27] Misha Rabinovich, Ramón Huerta, and Gilles Laurent. Transient dynamics for neural processing. Science, 321(5885):48–50, 2008.
- [28] Wei Ren and Randal W Beard. Consensus seeking in multiagent systems under dynamically changing interaction topologies. IEEE Transactions on Automatic Control, 50(5):655–661, 2005.
- [29] Filippo Santambrogio. Optimal Transport for Applied Mathematicians, volume 87 of Progress in Nonlinear Differential Equations and Their Applications. Birkhäuser, 2015.
- [30] Karl-Theodor Sturm. Probability measures on metric spaces of nonpositive curvature. Heat Kernels and Analysis on Manifolds, Graphs, and Metric Spaces, pages 357–390, 2003.
- [31] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 2nd edition, 2018.
- [32] Cédric Villani. Optimal Transport: Old and New, volume 338 of Grundlehren der mathematischen Wissenschaften. Springer, 2009.
- [33]
- [34] Xiao-Jing Wang. Neurophysiological and computational principles of cortical rhythms in cognition. Physiological Reviews, 90(3):1195–1268, 2010.