
arxiv: 2604.16052 · v1 · submitted 2026-04-17 · 🧮 math.OC · cs.LG · math.PR

A Wasserstein Geometric Framework for Hebbian Plasticity

Pith reviewed 2026-05-10 07:56 UTC · model grok-4.3

classification 🧮 math.OC · cs.LG · math.PR
keywords Hebbian plasticity · Wasserstein geometry · probability measures · minimizing movements · optimal transport · synaptic competition · memory consolidation · geometric projections

The pith

Hebbian plasticity arises from Wasserstein minimizing movements of memory probability measures.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a geometric framework for Hebbian plasticity by representing memory states as probability measures that evolve via Wasserstein minimizing movements. Hebbian learning is formalized through energies that meet a sequential stability condition, leading to well-posed updates and energy decrease. This creates a separation between internal dynamics along geodesics in curved space and observable quantities obtained by projections, which recover classical learning schemes and interpret pruning as mass redistribution. A reader would care because it supplies a single variational language connecting synaptic changes, representation formation, and context-dependent neural computation.

Core claim

Memory states are modeled as probability measures evolving through Wasserstein minimizing movements, with Hebbian learning rules cast as Hebbian energies satisfying a sequential stability condition. This condition ensures well-posed fiberwise JKO updates, optimal-transport realizations, and an energy descent inequality. The resulting variational structure separates internal memory evolution along Wasserstein geodesics in a latent space from observable dynamics obtained via geometric projection maps, with simplicial projections yielding classical affine schemes and revealing competition and pruning geometrically. Under Lipschitz regularity, continuous-time limits exist as perturbed Wasserstein gradient flows.

What carries the argument

Fiberwise JKO updates of sequentially stable Hebbian energies in the Wasserstein space of probability measures, which drive the evolution and induce the projections to observable synaptic weights.
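
For orientation, the textbook (non-fiberwise) minimizing-movement step of Jordan, Kinderlehrer, and Otto can be written as follows. This is a sketch in assumed notation: $\mathcal{E}$ for the energy, $\tau > 0$ for the step size, and $\rho^{k}$ for the state at step $k$ are not taken from the paper, whose fiberwise version and stability condition may differ in detail. The inequality on the right follows by comparing the minimizer with the competitor $\rho = \rho^{k}$.

```latex
\rho^{k+1} \in \operatorname*{arg\,min}_{\rho \in \mathcal{P}_2(Y)}
  \left\{ \mathcal{E}(\rho) + \frac{1}{2\tau}\, W_2^2\big(\rho, \rho^{k}\big) \right\},
\qquad
\mathcal{E}(\rho^{k+1}) + \frac{1}{2\tau}\, W_2^2\big(\rho^{k+1}, \rho^{k}\big)
  \le \mathcal{E}(\rho^{k}).
```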

If this is right

  • Internal memory states evolve along Wasserstein geodesics while observable synaptic weights arise through projection maps.
  • Simplicial projections recover exponential moving averages and mirror descent as special cases, with synaptic pruning following from mass redistribution.
  • Classical neural networks correspond to flat projections of the curved internal dynamics.
  • The framework extends naturally to richer representations such as structural weights and embedding memories.
  • Memory consolidation is formulated variationally as a perturbed Wasserstein gradient flow in a quasi-stationary regime.
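
To make the exponential-moving-average case concrete, here is a minimal sketch of the flat affine scheme that simplicial projections are said to recover. It is not the paper's projection map: the profile `p`, the one-hot `target`, the rate `eta`, and the pruning threshold are all hypothetical values chosen for illustration.

```python
import numpy as np

def ema_step(p, target, eta):
    """Exponential moving average: a convex combination of two simplex points."""
    return (1.0 - eta) * p + eta * target

# Hypothetical synaptic profile over 4 presynaptic inputs (sums to 1).
p = np.array([0.40, 0.30, 0.20, 0.10])
# Hypothetical Hebbian target: all mass on input 0 (the co-active input).
target = np.array([1.0, 0.0, 0.0, 0.0])

final = p.copy()
for _ in range(20):
    final = ema_step(final, target, eta=0.2)

# Total mass is conserved, so any gain at input 0 is paid for by the others.
print(final.sum())
# Inputs whose share falls below an (assumed) threshold count as pruned.
pruned = final < 1e-2
print(final, pruned)
```

Because each step is a convex combination, the profile never leaves the simplex, and competition appears as a direct consequence of mass conservation rather than as an extra normalization rule.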

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This approach could inspire new experiments measuring whether synaptic weight changes align with predicted Wasserstein geodesic projections in biological networks.
  • Connections to optimal transport in other domains, like population genetics, might reveal analogous geometric structures in non-neural systems.
  • The separation of latent curved dynamics from observable flat ones suggests designing AI systems with explicit hidden distributional states for better context handling.

Load-bearing premise

Hebbian learning rules can be formalized as energies that satisfy a sequential stability condition ensuring well-posed fiberwise updates and energy descent.

What would settle it

If measurements of synaptic plasticity in experiments fail to match any sequence of projections from Wasserstein minimizing movements of underlying measure-valued memory states, the claimed geometric separation would not hold.

Figures

Figures reproduced from arXiv: 2604.16052 by Ulrich Tan.

Figure 1: Star-shaped metric tree $Y$ representing the synaptic targets $X'_M$. $X_N$ is naturally equipped with the normalized measure $\mu = \frac{1}{N}\sum_{j=1}^{N} \delta_{x_j}$, and for each postsynaptic neuron $x_j \in X_N$, we represent its synaptic profile by a probability measure $\rho_{x_j} = \sum_{i=1}^{M} p_{i,j}\,\delta_{y_i} \in \mathcal{P}_2(Y)$, supported on the leaves $Y_M$, where $\rho_{x_j}(\{y_i\}) = p_{i,j} \in [0, 1]$ denotes the relative synaptic strength from $x'_i$ (or $y_i$) to $x_j$, with t…
Figure 2: Illustration of the Wasserstein transport between two discrete synaptic profiles. Mass is …
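
The transport between two discrete synaptic profiles can be sketched numerically under a simplifying assumption that is not stated in the captions: every leaf of the star tree sits at the same distance `leg` from the center, so any two distinct leaves are `2 * leg` apart. In that case the optimal plan leaves the shared mass in place and moves only the excess, which gives a closed form. The function names and the example profiles are hypothetical.

```python
import numpy as np

def star_tree_w2(p, q, leg=1.0):
    """W2 between two profiles on the leaves of a star tree with equal legs.

    Distinct leaves are 2*leg apart (the path runs through the center), so the
    squared cost is (2*leg)**2 times the mass that has to change leaf.
    """
    p, q = np.asarray(p, float), np.asarray(q, float)
    moved = 0.5 * np.abs(p - q).sum()
    return np.sqrt((2.0 * leg) ** 2 * moved)

def transport_plan(p, q):
    """One optimal plan: the diagonal keeps min(p, q); excess fills deficits."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    plan = np.diag(np.minimum(p, q))
    excess, deficit = np.maximum(p - q, 0.0), np.maximum(q - p, 0.0)
    moved = excess.sum()
    if moved > 0:
        # A leaf never has both excess and deficit, so this is off-diagonal.
        plan += np.outer(excess, deficit) / moved
    return plan

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.3, 0.5])
leg = 1.0
w2 = star_tree_w2(p, q, leg)
plan = transport_plan(p, q)
# Cost of the plan: only off-diagonal mass travels, each unit over distance 2*leg.
cost = (2.0 * leg) ** 2 * (plan.sum() - np.trace(plan))
print(w2, cost)
```

With unequal leg lengths the closed form no longer applies and a general optimal-transport solver would be needed.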
Original abstract

We introduce the Tan-HWG framework (Hebbian-Wasserstein-Geometry), a geometric theory of Hebbian plasticity in which memory states are modeled as probability measures evolving through Wasserstein minimizing movements. Hebbian learning rules are formalized as Hebbian energies satisfying a sequential stability condition, ensuring well-posed fiberwise JKO updates, optimal-transport realizations, and an energy descent inequality. This variational structure induces a fundamental separation between internal and observable dynamics. Internal memory states evolve along Wasserstein geodesics in a latent curved space, while observable quantities, such as effective synaptic weights, arise through geometric projection maps into external spaces. Simplicial projections recover classical affine schemes (including exponential moving averages and mirror descent), while revealing synaptic competition and pruning as geometric consequences of mass redistribution. Hilbertian projections provide a geometric account of phase alignment and multi-scale coherence. Classical neural networks appear as flat projections of this curved dynamics, while the framework naturally accommodates richer distributional representations, including structural weights and embedding memories, and their spectral extensions in complex internal spaces. Under mild Lipschitz regularity assumptions, including a quasi-stationary "sleep-mode" regime, we establish the existence of continuous-time limit curves. This yields a variational formulation of memory consolidation as a perturbed Wasserstein gradient flow. The framework thus provides a unified geometric foundation for synaptic plasticity, representation dynamics, and context-dependent computation.
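
As a flat, one-dimensional analogue of the JKO updates and energy descent described in the abstract, the sketch below runs minimizing-movement steps on equal-weight particles for a pure potential energy. This is an illustrative assumption, not the paper's Hebbian energy: with $V(x) = x^2/2$ the step decouples per particle into a proximal map, and the names `jko_step_potential`, `energy`, and `w2_squared_1d` are hypothetical.

```python
import numpy as np

def jko_step_potential(x, tau):
    """JKO step for E(rho) = mean of V(x), V(x) = x**2 / 2, on equal-weight particles.

    For a pure potential energy the step decouples per particle into the
    proximal map argmin_y V(y) + (y - x)**2 / (2 * tau) = x / (1 + tau).
    """
    return x / (1.0 + tau)

def energy(x):
    """Potential energy of the empirical measure carried by the particles."""
    return 0.5 * np.mean(x ** 2)

def w2_squared_1d(x, y):
    """Squared W2 between equal-weight empirical measures on the line (sorted matching)."""
    return np.mean((np.sort(x) - np.sort(y)) ** 2)

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=500)
tau = 0.1
energies = [energy(x)]
descent_ok = True
for _ in range(50):
    x_new = jko_step_potential(x, tau)
    # Discrete descent inequality: E(new) + W2^2 / (2 tau) <= E(old).
    lhs = energy(x_new) + w2_squared_1d(x_new, x) / (2.0 * tau)
    descent_ok = descent_ok and bool(lhs <= energies[-1] + 1e-12)
    energies.append(energy(x_new))
    x = x_new

print(descent_ok, energies[0], energies[-1])
```

Each step here coincides with an implicit Euler step of $\dot{x} = -x$, which is the kind of discrete-to-continuous correspondence the continuous-time limit formalizes in the Wasserstein setting.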

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 3 minor

Summary. The manuscript introduces the Tan-HWG (Hebbian-Wasserstein-Geometry) framework, modeling memory states as probability measures evolving via Wasserstein minimizing movements (JKO scheme). Hebbian learning rules are recast as energies obeying a sequential stability condition that ensures well-posed fiberwise JKO updates, optimal-transport realizations, and an energy descent inequality. Internal dynamics follow Wasserstein geodesics in a latent space, while observable synaptic weights arise via simplicial and Hilbertian projections that recover classical schemes (exponential moving averages, mirror descent) and interpret competition/pruning as mass redistribution. Under Lipschitz assumptions and a quasi-stationary sleep-mode regime, continuous-time limits exist, yielding a variational formulation of memory consolidation as a perturbed Wasserstein gradient flow.

Significance. If the sequential stability condition holds for standard Hebbian rules and the continuous-time limits are rigorously derived, the framework would supply a novel variational geometry linking optimal transport to synaptic plasticity, representation dynamics, and context-dependent computation. The separation of curved internal evolution from projected observables and the geometric account of consolidation are potentially high-impact contributions to theoretical neuroscience and optimization.

major comments (3)
  1. [§2] §2 (Hebbian Energies): The sequential stability condition is introduced to guarantee well-posed fiberwise JKO updates, optimal-transport realizations, and the energy descent inequality. No derivation or verification is supplied showing that canonical Hebbian updates (plain Hebbian, Oja, BCM) satisfy the condition without auxiliary normalization or projection steps; the unification claim therefore rests on an imposed modeling postulate rather than a necessary geometric property.
  2. [Existence theorem] Theorem on continuous-time limits (existence section): The statement asserts existence of limit curves under mild Lipschitz regularity and the quasi-stationary sleep-mode regime, yet supplies neither the explicit derivation of the limit, error bounds, nor verification that the perturbed Wasserstein gradient flow formulation follows. This leaves the variational account of memory consolidation without load-bearing technical support.
  3. [§5] §5 (Simplicial and Hilbertian Projections): The claim that simplicial projections recover classical affine schemes (exponential moving averages, mirror descent) and reveal synaptic competition as geometric mass redistribution is asserted, but the explicit construction of the projection maps and the step-by-step recovery of the update rules are not provided with equations, weakening the geometric interpretation.
minor comments (3)
  1. [Abstract and §1] The acronym 'Tan-HWG' is introduced without explanation of the 'Tan' component.
  2. [Notation] Notation for the fiberwise JKO updates and the stability condition could be clarified with a dedicated table of symbols.
  3. [References] References to foundational JKO literature (Jordan-Kinderlehrer-Otto 1998) and Wasserstein gradient flows are present but could be expanded for the continuous-time limit argument.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments on our manuscript. The points raised identify opportunities to strengthen the technical foundations of the Tan-HWG framework. We address each major comment below and indicate the revisions we will undertake.

Point-by-point responses
  1. Referee: [§2] §2 (Hebbian Energies): The sequential stability condition is introduced to guarantee well-posed fiberwise JKO updates, optimal-transport realizations, and the energy descent inequality. No derivation or verification is supplied showing that canonical Hebbian updates (plain Hebbian, Oja, BCM) satisfy the condition without auxiliary normalization or projection steps; the unification claim therefore rests on an imposed modeling postulate rather than a necessary geometric property.

    Authors: We agree that the manuscript presents the sequential stability condition primarily as an enabling assumption. In the revised version we will add an explicit verification subsection (or appendix) that derives the condition for the canonical rules. Specifically, we will show that the plain Hebbian rule, Oja’s rule, and the BCM rule each satisfy the sequential stability inequality after the standard normalization that keeps synaptic weights in the probability simplex; the calculations will be carried out directly from the energy definitions without additional projection steps. This will demonstrate that the condition is satisfied by the classical updates rather than imposed externally. revision: yes

  2. Referee: [Existence theorem] Theorem on continuous-time limits (existence section): The statement asserts existence of limit curves under mild Lipschitz regularity and the quasi-stationary sleep-mode regime, yet supplies neither the explicit derivation of the limit, error bounds, nor verification that the perturbed Wasserstein gradient flow formulation follows. This leaves the variational account of memory consolidation without load-bearing technical support.

    Authors: The existence result is stated under the Lipschitz and quasi-stationary assumptions, but we acknowledge that the passage from the discrete JKO scheme to the continuous-time perturbed gradient flow is only sketched. In the revision we will expand the proof to include: (i) the explicit construction of the limit curve via the Arzelà–Ascoli theorem applied to the discrete trajectories, (ii) quantitative error bounds between the discrete and continuous evolutions that rely on the Lipschitz constant and the sleep-mode time-scale separation, and (iii) the direct identification of the limiting equation as the perturbed Wasserstein gradient flow of the Hebbian energy. These additions will supply the missing technical support for the variational formulation of memory consolidation. revision: yes

  3. Referee: [§5] §5 (Simplicial and Hilbertian Projections): The claim that simplicial projections recover classical affine schemes (exponential moving averages, mirror descent) and reveal synaptic competition as geometric mass redistribution is asserted, but the explicit construction of the projection maps and the step-by-step recovery of the update rules are not provided with equations, weakening the geometric interpretation.

    Authors: We agree that the geometric interpretation would be strengthened by explicit formulas. In the revised manuscript we will insert a dedicated subsection that defines the simplicial projection map via the optimal-transport push-forward onto the probability simplex, followed by the explicit algebraic steps that recover the exponential moving-average update and the mirror-descent step. We will likewise derive the mass-redistribution interpretation of synaptic competition and pruning directly from the non-negativity and total-mass preservation of the Wasserstein geodesic. The Hilbertian projection will be treated analogously with the corresponding phase-alignment equations. revision: yes
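
For readers unfamiliar with the mirror-descent step mentioned in the last response, the following is a minimal sketch of the standard entropic variant (exponentiated gradient) on the probability simplex. It is the textbook update, not the construction promised by the authors; the linear energy $E(p) = -\langle h, p \rangle$, the correlation vector `h`, and the step size `eta` are hypothetical.

```python
import numpy as np

def mirror_descent_step(p, grad, eta):
    """Entropic mirror descent (exponentiated gradient) on the probability simplex."""
    w = p * np.exp(-eta * grad)   # multiplicative reweighting
    return w / w.sum()            # renormalize so total mass stays 1

# Start from a uniform synaptic profile over 4 inputs.
p = np.full(4, 0.25)
# Hypothetical linear energy E(p) = -<h, p>: inputs with larger h are rewarded.
h = np.array([1.0, 0.5, 0.2, 0.0])
for _ in range(30):
    p = mirror_descent_step(p, grad=-h, eta=0.5)

print(p, p.sum())
```

The renormalization is where competition enters: strengthening one input necessarily shrinks the relative share of the others, although no entry reaches exactly zero in finitely many steps.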

Circularity Check

1 step flagged

Sequential stability condition on Hebbian energies defined to guarantee JKO updates and descent by construction

specific steps
  1. self-definitional [Abstract]
    "Hebbian learning rules are formalized as Hebbian energies satisfying a sequential stability condition, ensuring well-posed fiberwise JKO updates, optimal-transport realizations, and an energy descent inequality."

    The sequential stability condition is introduced as part of the formalization of Hebbian energies specifically to ensure the JKO updates, optimal-transport realizations, and descent inequality. This makes the framework's key variational consequences (internal Wasserstein geodesics, projections recovering classical schemes) follow directly from the definition rather than being derived from or verified against standard Hebbian rules like plain Hebbian or Oja without additional assumptions.

full rationale

The paper's core formalization step defines Hebbian learning rules via energies that satisfy a sequential stability condition introduced precisely to ensure well-posed fiberwise JKO updates, optimal-transport realizations, and energy descent. This reduces the claimed geometric unification (Wasserstein minimizing movements, simplicial projections recovering classical schemes) to a modeling choice that builds in the desired properties rather than deriving them from standard Hebbian rules. The abstract and framework description treat the condition as definitional, with no independent verification shown for canonical updates without auxiliary steps. No other circular patterns (self-citations, ansatzes, or renamings) are evident in the provided text. This yields partial circularity in the derivation chain while the overall variational structure retains some independent geometric content.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 2 invented entities

The central claim rests on newly introduced concepts including Hebbian energies and their stability condition, plus regularity assumptions for limit curves; these are postulated rather than derived from external benchmarks.

axioms (2)
  • ad hoc to paper · Hebbian learning rules are formalized as Hebbian energies satisfying a sequential stability condition
    This condition is required to ensure well-posed fiberwise JKO updates and energy descent inequality.
  • domain assumption · Mild Lipschitz regularity assumptions hold, including a quasi-stationary sleep-mode regime
    Invoked to establish existence of continuous-time limit curves as perturbed Wasserstein gradient flows.
invented entities (2)
  • fiberwise JKO updates · no independent evidence
    purpose: To realize the evolution of internal memory states in the latent space
    New construct introduced to link the stability condition to optimal-transport realizations.
  • Tan-HWG framework · no independent evidence
    purpose: To unify Hebbian plasticity with Wasserstein geometry
    The overall framework is the novel construction presented in the paper.

pith-pipeline@v0.9.0 · 5540 in / 1517 out tokens · 38274 ms · 2026-05-10T07:56:20.648535+00:00 · methodology


Reference graph

Works this paper leans on

34 extracted references · 34 canonical work pages

  1. Luigi Ambrosio, Nicola Gigli, and Giuseppe Savaré. Gradient Flows in Metric Spaces and in the Space of Probability Measures. Birkhäuser, 2008.

  2. Amos Arieli, Alexander Sterkin, Amiram Grinvald, and Ad Aertsen. Dynamics of ongoing activity: explanation of the large variability in evoked cortical responses. Science, 273(5283):1868–1871, 1996.

  3. Martin R. Bridson and André Haefliger. Metric Spaces of Non-Positive Curvature, volume 319 of Grundlehren der mathematischen Wissenschaften. Springer, 1999.

  4. Dean V Buonomano and Wolfgang Maass. State-dependent computations: spatiotemporal processing in cortical networks. Nature Reviews Neuroscience, 10(2):113–125, 2009.

  5. György Buzsáki. Rhythms of the Brain. Oxford University Press, 2006.

  6. Mark M Churchland, Byron M Yu, John P Cunningham, Leo P Sugrue, Marlene R Cohen, Greg S Corrado, William T Newsome, Andrew M Clark, Paymon Hosseini, Benjamin B Scott, et al. Stimulus onset quenches neural variability: a widespread cortical phenomenon. Nature Neuroscience, 13(3):369–378, 2010.

  7. Marco Cuturi. Sinkhorn distances: Lightspeed computation of optimal transportation distances. NeurIPS, 2013.

  8. Peter Dayan and L. F. Abbott. Theoretical Neuroscience: Computational and Mathematical Modeling of Neural Systems. MIT Press, 2001.

  9. Morris H DeGroot. Reaching a consensus. Journal of the American Statistical Association, 69(345):118–121, 1974.

  10. Itai Gat, Tal Remez, Neta Shaul, Felix Kreuk, Ricky T. Q. Chen, Gabriel Synnaeve, Yossi Adi, and Yaron Lipman. Discrete flow matching. NeurIPS, 2024.

  11. Wulfram Gerstner, Werner M. Kistler, Richard Naud, and Liam Paninski. Neuronal Dynamics: From Single Neurons to Networks and Models of Cognition. Cambridge University Press, 2014.

  12. Donald O. Hebb. The Organization of Behavior. Wiley, 1949.

  13. Emiel Hoogeboom, Didrik Nielsen, Priyank Jaini, Patrick Forré, and Max Welling. Argmax flows and multinomial diffusion: Learning categorical distributions. In NeurIPS, 2021.

  14. John J. Hopfield. Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences, 79(8):2554–2558, 1982.

  15. A. Jadbabaie, J. Lin, and A.S. Morse. Coordination of groups of mobile autonomous agents using nearest neighbor rules. In Proceedings of the 41st IEEE Conference on Decision and Control, volume 3, pages 2953–2958, 2002. doi: 10.1109/CDC.2002.1184304

  16. Richard Jordan, David Kinderlehrer, and Felix Otto. The variational formulation of the Fokker–Planck equation. SIAM Journal on Mathematical Analysis, 29(1):1–17, 1998.

  17. Yoshiki Kuramoto. Self-entrainment of a population of coupled non-linear oscillators. In H. Araki, editor, International Symposium on Mathematical Problems in Theoretical Physics, volume 39 of Lecture Notes in Physics, pages 420–422. Springer, 1975.

  18. Lampl I, Reichova I, Ferster D. Synchronous membrane potential fluctuations in neurons of the cat visual cortex. Neuron, 22(2):361–374, 1999.

  19. Yaron Lipman, Ricky T. Q. Chen, Heli Ben-Hamu, Maximilian Nickel, and Matt Le. Flow matching for generative modeling. ICLR, 2023.

  20. John E. Lisman and Ole Jensen. The theta-gamma neural code. Neuron, 77(6):1002–1016, 2013.

  21. Xuehai Liu et al. Sfm: Stochastic flow matching for discrete and categorical data. NeurIPS, 2024.

  22. John Lott and Cédric Villani. Ricci curvature for metric-measure spaces via optimal transport. Annals of Mathematics, 169(3):903–991, 2009.

  23. Valerio Mante, David Sussillo, Krishna V Shenoy, and William T Newsome. Context-dependent computation by recurrent dynamics in prefrontal cortex. Nature, 503(7474):78–84, 2013.

  24. Robert J. McCann. A convexity principle for interacting gases. Advances in Mathematics, 128(1):153–179, 1997. doi: 10.1006/aima.1997.1634

  25. R. Olfati-Saber and R.M. Murray. Consensus problems in networks of agents with switching topology and time-delays. IEEE Transactions on Automatic Control, 49(9):1520–1533, 2004. doi: 10.1109/TAC.2004.834113

  26. Reza Olfati-Saber, J. Alex Fax, and Richard M. Murray. Consensus and cooperation in networked multi-agent systems. Proceedings of the IEEE, 95(1):215–233, 2007. doi: 10.1109/JPROC.2006.887293

  27. Misha Rabinovich, Ramón Huerta, and Gilles Laurent. Transient dynamics for neural processing. Science, 321(5885):48–50, 2008.

  28. Wei Ren and Randal W Beard. Consensus seeking in multiagent systems under dynamically changing interaction topologies. IEEE Transactions on Automatic Control, 50(5):655–661, 2005.

  29. Filippo Santambrogio. Optimal Transport for Applied Mathematicians, volume 87 of Progress in Nonlinear Differential Equations and Their Applications. Birkhäuser, 2015.

  30. Karl-Theodor Sturm. Probability measures on metric spaces of nonpositive curvature. Heat Kernels and Analysis on Manifolds, Graphs, and Metric Spaces, pages 357–390, 2003.

  31. Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 2nd edition, 2018.

  32. Cédric Villani. Optimal Transport: Old and New, volume 338 of Grundlehren der mathematischen Wissenschaften. Springer, 2009.

  33. Cédric Villani. Topics in Optimal Transportation. AMS, 2003.

  34. Xiao-Jing Wang. Neurophysiological and computational principles of cortical rhythms in cognition. Physiological Reviews, 90(3):1195–1268, 2010.