Curl Descent: Non-Gradient Learning Dynamics with Sign-Diverse Plasticity

Hugo Ninou; Jonathan Kadmon; N. Alex Cayco-Gajic

arXiv: 2510.02765 · v4 · submitted 2025-10-03 · cs.LG

Pith reviewed 2026-05-18 10:25 UTC · model grok-4.3

classification: cs.LG

keywords: non-gradient learning · curl terms · plasticity rules · student-teacher framework · saddle escape · learning stability · neural network dynamics

The pith

Non-gradient curl terms from diverse plasticity rules can match gradient descent or accelerate learning by escaping saddles depending on their strength and network architecture.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper examines whether learning dynamics in neural networks can include fundamentally non-gradient curl-like terms that still optimize a loss function effectively. These terms arise naturally from inhibitory-excitatory connectivity or mixed Hebbian and anti-Hebbian plasticity rules. The authors introduce them systematically in a student-teacher framework by allowing some neurons to follow rule-flipped plasticity. For small curl strengths the solution manifold remains stable and the dynamics closely resemble ordinary gradient descent. Beyond a critical strength the manifold destabilizes, which in some architectures produces chaotic weight changes that ruin performance while in others it lets the system temporarily ascend the loss surface to leave saddle points more quickly than gradient descent alone.

Core claim

In the student-teacher framework with rule-flipped plasticity, the learning dynamics include curl terms that cannot be derived from any single loss function. Small magnitudes of these curl terms leave the stability of the solution manifold intact, yielding convergence behavior akin to gradient descent. Above a critical strength, the curl terms destabilize the manifold; in some architectures this produces chaos and poor performance, while in others it permits the weights to ascend the loss temporarily and thereby escape saddle points more rapidly than pure gradient descent would allow.

What carries the argument

Curl terms generated by rule-flipped plasticity neurons, which add non-conservative components to the weight-update vector field and prevent it from being the gradient of any scalar loss.
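A minimal NumPy sketch of this mechanism, built on the update rule quoted in the Lean-theorem section of this page (ΔW₁ = −W₂ᵀ e xᵀ D₁, ΔW₂ = −e hᵀ D₂, with diagonal sign matrices D). The function name, layer sizes, learning rate and random data are illustrative assumptions, not the authors' code. Setting every sign to +1 recovers an ordinary gradient step on the squared error.

```python
import numpy as np

def curl_descent_step(W1, W2, x, y_target, d1, d2, lr=0.01):
    """One update of a two-layer linear student with sign-diverse plasticity.

    d1 and d2 hold +1/-1 entries (the diagonals of D1 and D2). An entry of -1
    negates the update of the matching column of W1 or W2.
    All names and the learning rate are hypothetical, chosen for illustration.
    """
    h = W1 @ x                         # hidden activity
    e = W2 @ h - y_target              # output error
    grad_W1 = np.outer(W2.T @ e, x)    # gradient of 0.5*||e||^2 w.r.t. W1
    grad_W2 = np.outer(e, h)           # gradient of 0.5*||e||^2 w.r.t. W2
    dW1 = -lr * grad_W1 * d1           # same as right-multiplying by diag(d1)
    dW2 = -lr * grad_W2 * d2           # same as right-multiplying by diag(d2)
    return W1 + dW1, W2 + dW2

# Toy data (assumed sizes): 4 inputs, 3 hidden units, 2 outputs.
rng = np.random.default_rng(0)
n_in, n_hid, n_out = 4, 3, 2
W1 = rng.normal(size=(n_hid, n_in))
W2 = rng.normal(size=(n_out, n_hid))
x = rng.normal(size=n_in)
y_target = rng.normal(size=n_out)

ones_in, ones_hid = np.ones(n_in), np.ones(n_hid)
# All signs +1: plain gradient descent.
gd_W1, gd_W2 = curl_descent_step(W1, W2, x, y_target, ones_in, ones_hid)
# Flip the plasticity sign of the second hidden unit only.
flipped = np.array([1.0, -1.0, 1.0])
cd_W1, cd_W2 = curl_descent_step(W1, W2, x, y_target, ones_in, flipped)
```

In this toy call only the second column of the W2 update changes sign; every other weight receives the same update as under gradient descent.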

If this is right

Small curl terms produce weight dynamics that remain close to those of gradient descent on the original loss.
Strong curl terms destabilize the solution manifold once their magnitude exceeds a critical threshold determined by network parameters.
In certain feedforward architectures the resulting instability generates chaotic trajectories that prevent successful learning.
In other architectures the same instability lets the dynamics climb the loss surface briefly, enabling escape from saddle points and faster overall convergence.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Biological circuits containing both excitatory and inhibitory neurons may naturally generate curl components that confer robustness or speed advantages over pure gradient-like rules.
The critical curl threshold could be measured experimentally by gradually altering the balance of Hebbian and anti-Hebbian synapses and tracking changes in learning speed or stability.
Extending the analysis to recurrent or deeper networks might reveal whether curl terms interact with other known optimization phenomena such as implicit regularization.

Load-bearing premise

The student-teacher framework with neurons exhibiting rule-flipped plasticity sufficiently captures the non-gradient effects that would arise from inhibitory-excitatory connectivity or Hebbian/anti-Hebbian plasticity in more realistic networks.

What would settle it

A numerical simulation that increases curl strength past the analytically predicted critical value and checks whether one architecture shows chaotic divergence while another shows faster convergence via temporary loss increase would confirm or refute the stability transition.
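One way such a check could be organised is sketched below. It is a toy under stated assumptions, not a reproduction of the paper's experiments: the layer sizes, learning rate, initial scale and the choice to flip only hidden-to-output plasticity are all hypothetical, and the sketch does not use the analytically predicted critical value. It trains a two-layer linear student on a linear teacher for a given flip fraction and returns the loss before and after training.

```python
import numpy as np

def run_student(flip_fraction, steps=2000, lr=0.05, seed=0):
    """Train a two-layer linear student; return (initial_loss, final_loss)."""
    rng = np.random.default_rng(seed)
    n_in, n_hid, n_out, n_samples = 8, 4, 8, 256      # hypothetical sizes
    X = rng.normal(size=(n_in, n_samples))
    W_teacher = rng.normal(size=(n_out, n_in)) / np.sqrt(n_in)
    Y = W_teacher @ X
    W1 = 0.5 * rng.normal(size=(n_hid, n_in)) / np.sqrt(n_in)
    W2 = 0.5 * rng.normal(size=(n_out, n_hid)) / np.sqrt(n_hid)

    # Signs for hidden units: the first round(flip_fraction * n_hid) are flipped.
    d2 = np.ones(n_hid)
    d2[: int(round(flip_fraction * n_hid))] = -1.0

    def loss(A, B):
        E = B @ A @ X - Y
        return 0.5 * np.mean(np.sum(E**2, axis=0))

    initial = loss(W1, W2)
    for _ in range(steps):
        H = W1 @ X
        E = W2 @ H - Y
        g1 = W2.T @ E @ X.T / n_samples    # batch-averaged gradient w.r.t. W1
        g2 = E @ H.T / n_samples           # batch-averaged gradient w.r.t. W2
        W1 = W1 - lr * g1                  # input layer: plain gradient step
        W2 = W2 - lr * g2 * d2             # output layer: sign-flipped columns
        if not (np.isfinite(W1).all() and np.isfinite(W2).all()):
            break                          # stop if the weights blow up
    return initial, loss(W1, W2)

# Sweep a few flip fractions and print the losses for inspection.
for fraction in (0.0, 0.25, 0.5):
    print(fraction, run_student(fraction))
```

With `flip_fraction=0.0` the loop is ordinary gradient descent and serves as the baseline; comparing learning curves across fractions and architectures is what the proposed test would require.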

Original abstract

Gradient-based algorithms are a cornerstone of artificial neural network training, yet it remains unclear whether biological neural networks use similar gradient-based strategies during learning. Experiments often discover a diversity of synaptic plasticity rules, but whether these amount to an approximation to gradient descent is unclear. Here we investigate a previously overlooked possibility: that learning dynamics may include fundamentally non-gradient "curl"-like components while still being able to effectively optimize a loss function. Curl terms naturally emerge in networks with inhibitory-excitatory connectivity or Hebbian/anti-Hebbian plasticity, resulting in learning dynamics that cannot be framed as gradient descent on any objective. To investigate the impact of these curl terms, we analyze feedforward networks within an analytically tractable student-teacher framework, systematically introducing non-gradient dynamics through neurons exhibiting rule-flipped plasticity. Small curl terms preserve the stability of the original solution manifold, resulting in learning dynamics similar to gradient descent. Beyond a critical value, strong curl terms destabilize the solution manifold. Depending on the network architecture, this loss of stability can lead to chaotic learning dynamics that destroy performance. In other cases, the curl terms can counterintuitively speed learning compared to gradient descent by allowing the weight dynamics to escape saddles by temporarily ascending the loss. Our results identify specific architectures capable of supporting robust learning via diverse learning rules, providing an important counterpoint to normative theories of gradient-based learning in neural networks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper introduces curl terms via rule-flipped plasticity in a student-teacher model and shows small curls preserve manifold stability while larger ones can destabilize or accelerate saddle escape depending on architecture.

The main thing here is that you can inject non-gradient curl components into learning dynamics without immediately breaking optimization, at least inside their controlled setup. Small curl strengths keep the solution manifold stable so the weight flow stays close to gradient descent. Past a threshold the manifold loses stability, and the outcome splits by architecture: chaos that tanks performance in some cases, or faster escape from saddles through temporary loss increases in others.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces curl-like non-gradient terms into the learning dynamics of feedforward networks via a student-teacher framework in which a subset of neurons follow rule-flipped plasticity. It claims that small curl terms leave the solution manifold stable, yielding dynamics qualitatively similar to gradient descent, while beyond a critical curl strength the manifold loses stability; depending on architecture this produces either chaotic trajectories that degrade performance or, counterintuitively, accelerated escape from saddles that improves learning speed relative to pure gradient descent.

Significance. If the stability thresholds and architecture-dependent outcomes are robust, the work supplies a concrete, analytically tractable counter-example to the assumption that effective learning requires gradient flow on some objective. The explicit separation of curl magnitude from gradient strength, together with the identification of architectures that tolerate or benefit from sign-diverse plasticity, offers a useful bridge between normative optimization theory and the diversity of biological plasticity rules.

major comments (2)

[Abstract] Abstract and the paragraph introducing non-gradient dynamics: the claim that curl terms 'naturally emerge' from inhibitory-excitatory connectivity or opposing Hebbian/anti-Hebbian rules is not accompanied by a derivation that maps those connectivity patterns onto the curl operator realized by rule-flipping; without this mapping or a spectral comparison, the reported critical curl value and the chaos-versus-saddle-escape transition remain tied to the specific proxy and may not generalize.
[Stability analysis] Stability analysis (student-teacher section): the statement that small curl terms 'preserve the stability of the original solution manifold' is load-bearing for the central claim, yet the manuscript provides neither the explicit Jacobian at the manifold nor the perturbation analysis showing how the curl contribution shifts its eigenvalues; the absence of these steps prevents verification that the preservation holds for any non-zero curl rather than only in a trivial limit.

minor comments (2)

[Introduction] Notation for the curl vector field should be introduced with an explicit equation (e.g., the decomposition of the flow into gradient plus curl components) before the stability claims are stated.
Simulation figures comparing learning curves with and without curl would benefit from error bars or multiple random seeds to make the reported speed-up or performance collapse statistically visible.
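The decomposition requested in the first minor comment can be illustrated on a linear flow. The sketch below assumes a hypothetical two-parameter quadratic loss with symmetric Hessian H and a rule-flipped flow dw/dt = −diag(d) H w; neither the matrix nor the function name comes from the paper. A linear flow is the gradient of a scalar only if its Jacobian is symmetric, so the antisymmetric part measures the curl-like component.

```python
import numpy as np

# Hypothetical quadratic loss L(w) = 0.5 * w^T H w with a symmetric Hessian.
H = np.array([[2.0, 1.0],
              [1.0, 2.0]])

def split_jacobian(d):
    """Split the Jacobian of dw/dt = -diag(d) @ H @ w into two parts."""
    J = -np.diag(d) @ H
    S = 0.5 * (J + J.T)    # symmetric, gradient-like part
    A = 0.5 * (J - J.T)    # antisymmetric, curl-like part
    return S, A

S_gd, A_gd = split_jacobian(np.array([1.0, 1.0]))     # no flipped coordinate
S_cd, A_cd = split_jacobian(np.array([1.0, -1.0]))    # second coordinate flipped
```

Here `A_gd` is zero, while `A_cd` is nonzero because the off-diagonal Hessian entry couples a flipped coordinate to an unflipped one.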

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed comments. We respond to each major comment below and describe the revisions we will make.

Referee: [Abstract] Abstract and the paragraph introducing non-gradient dynamics: the claim that curl terms 'naturally emerge' from inhibitory-excitatory connectivity or opposing Hebbian/anti-Hebbian rules is not accompanied by a derivation that maps those connectivity patterns onto the curl operator realized by rule-flipping; without this mapping or a spectral comparison, the reported critical curl value and the chaos-versus-saddle-escape transition remain tied to the specific proxy and may not generalize.

Authors: We agree that an explicit mapping would improve the link to biological mechanisms. In the revised manuscript we will insert a short derivation in the introduction showing how a minimal two-population circuit with opposing Hebbian and anti-Hebbian rules produces an effective curl component in the weight-update vector field. The derivation will be accompanied by a spectral comparison between the resulting non-gradient flow and the rule-flipped proxy employed in the student-teacher analysis. We will also add a clarifying sentence stating that the quantitative critical curl strength is model-specific while the qualitative stability transition is expected to be robust for small non-gradient perturbations of this class.
revision: yes

Referee: [Stability analysis] Stability analysis (student-teacher section): the statement that small curl terms 'preserve the stability of the original solution manifold' is load-bearing for the central claim, yet the manuscript provides neither the explicit Jacobian at the manifold nor the perturbation analysis showing how the curl contribution shifts its eigenvalues; the absence of these steps prevents verification that the preservation holds for any non-zero curl rather than only in a trivial limit.

Authors: The referee is correct that the stability claim would be easier to verify with the explicit steps. In the revision we will add the closed-form Jacobian of the student-teacher dynamics evaluated on the solution manifold, followed by a first-order perturbation analysis of its eigenvalues under the addition of the curl term. The calculation shows that, to linear order in curl strength, the real parts of the eigenvalues remain unchanged while imaginary parts appear, confirming local stability for sufficiently small curl. We will also include a brief numerical check of the eigenvalue shift for a range of small curl values.
revision: yes
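The first-order statement in the second response has a standard linear-algebra analogue, sketched here under assumptions that are not taken from the paper: the Jacobian is written as a symmetric matrix $S$ plus $\epsilon$ times an antisymmetric matrix $A$, and $\lambda_i$ is a simple eigenvalue of $S$ with real unit eigenvector $v_i$.

```latex
\lambda_i(\epsilon) = \lambda_i + \epsilon\, v_i^{\top} A\, v_i + O(\epsilon^2),
\qquad
v_i^{\top} A\, v_i = \left(v_i^{\top} A\, v_i\right)^{\top}
                   = v_i^{\top} A^{\top} v_i
                   = -\, v_i^{\top} A\, v_i
\;\Longrightarrow\;
v_i^{\top} A\, v_i = 0 .
```

In this toy setting a simple eigenvalue does not move at linear order. For a repeated eigenvalue the first-order shifts are the eigenvalues of $A$ restricted to the eigenspace, which are purely imaginary. Whether the paper's Jacobian has this symmetric-plus-antisymmetric form is for the promised derivation to show.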

Circularity Check

0 steps flagged

No circularity: explicit model construction and dynamical analysis

The paper defines a student-teacher framework and introduces non-gradient curl components explicitly by assigning rule-flipped plasticity to a subset of neurons. It then performs a direct stability analysis of the resulting weight dynamics on the solution manifold, deriving thresholds for preservation versus destabilization from the model's own differential equations. No parameters are fitted to data and then relabeled as predictions, no self-citations supply load-bearing uniqueness theorems, and no ansatz is smuggled in; the reported behaviors (small-curl stability, large-curl chaos or saddle escape) are obtained by standard dynamical-systems methods applied to the constructed vector field.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The analysis rests on the standard student-teacher modeling assumption plus the new modeling choice of rule-flipped plasticity to generate curl terms; no additional free parameters are explicitly fitted in the abstract description.

axioms (1)

domain assumption: Feedforward networks within an analytically tractable student-teacher framework can be used to study the impact of non-gradient dynamics.
Invoked to enable systematic introduction of curl terms and stability analysis.

invented entities (1)

curl terms in learning dynamics (no independent evidence)
purpose: To represent fundamentally non-gradient components arising from inhibitory-excitatory connectivity or Hebbian/anti-Hebbian plasticity.
Introduced as the central object of study whose strength is varied to probe stability and performance.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: Curl descent rule: ΔW_curl_1 = −W₂ᵀ e xᵀ D₁, ΔW_curl_2 = −e hᵀ D₂ with d_{l,j} ∈ {+1,−1}

IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean · alpha_pin_under_high_calibration
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: Jacobian spectrum on solution manifold; stability boundary depends on compression ratio c = M/N and flip fraction α_h, α_r

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.