Complex Stochastic Gradient Descent and Directional Bias in Reproducing Kernel Hilbert Spaces

Emeric Battaglia; Natanael Alpay

arxiv: 2604.23017 · v2 · pith:UDGTCHTAnew · submitted 2026-04-24 · cs.LG · cs.NA · math.CV · math.NA

Pith reviewed 2026-05-08 12:10 UTC · model grok-4.3

classification: cs.LG · cs.NA · math.CV · math.NA

keywords: complex SGD · reproducing kernel Hilbert spaces · directional bias · kernel regression · convergence guarantees · complex parameters · Fock space · Hardy space

The pith

Complex SGD converges under the same assumptions as real SGD and extends directional bias to kernel regression in complex RKHS.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper defines a complex variant of stochastic gradient descent that supports updates on complex-valued parameters. It establishes convergence guarantees that mirror the standard real-valued assumptions of convexity, smoothness, and bounded variance, without imposing analyticity requirements. The same guarantees cover gradient descent. In reproducing kernel Hilbert spaces, the directional bias previously shown for real kernel regression carries over to the complex setting. Experiments confirm that this complex SGD recovers superoscillation functions in the Fock space and Blaschke products in the Hardy space for suitably chosen losses.

Core claim

We propose complex SGD that permits complex parameters and derive convergence guarantees under assumptions parallel to the real setting. These results also hold for GD. With the same assumptions we show that directional bias results extend from real to complex kernel regression problems. Empirical tests in complex RKHS recover superoscillation functions and Blaschke products as optimal solutions for particular loss functions.

What carries the argument

Complex SGD, defined via a complex gradient that enables parameter updates without analyticity constraints and allows direct transfer of real-case convergence arguments.
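
This summary does not reproduce the paper's exact gradient definition, so the following is only a minimal sketch of one common choice: the conjugate (Wirtinger) gradient applied to a complex least-squares loss. The function name `complex_sgd`, the step size, and the synthetic data are illustrative assumptions, not taken from the paper.

```python
import random

def complex_sgd(xs, ys, step=0.05, epochs=200, seed=0):
    """Sketch of SGD on f(w) = mean_i 0.5*|<x_i, w> - y_i|^2 with complex w.

    Assumption: the update direction is residual * conj(x_i), which is twice
    the Wirtinger derivative of 0.5*|residual|^2 with respect to conj(w),
    i.e. the real gradient over (Re w, Im w) packed into a complex vector.
    """
    rng = random.Random(seed)
    w = [0j] * len(xs[0])
    for _ in range(epochs):
        for _ in range(len(xs)):
            i = rng.randrange(len(xs))          # sample one data point
            x, y = xs[i], ys[i]
            residual = sum(xj * wj for xj, wj in zip(x, w)) - y
            # The loss is not analytic in w, yet the update is well defined.
            w = [wj - step * residual * xj.conjugate() for wj, xj in zip(w, x)]
    return w

# Hypothetical noiseless data generated from a known complex parameter vector.
data_rng = random.Random(1)
w_true = [1 + 2j, -0.5j]
xs = [[complex(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(2)]
      for _ in range(50)]
ys = [sum(xj * wj for xj, wj in zip(x, w_true)) for x in xs]
w_hat = complex_sgd(xs, ys)
print([abs(a - b) for a, b in zip(w_hat, w_true)])
```

In this noiseless, convex toy problem the iterates approach `w_true`; it says nothing about the non-convex settings mentioned elsewhere on this page.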

If this is right

Convergence proofs for real SGD and GD transfer directly to complex-valued optimization problems.
Directional bias in kernel regression holds in complex reproducing kernel Hilbert spaces.
Complex SGD recovers superoscillation functions from the Fock space and Blaschke products from the Hardy space for appropriate losses.
Complex-valued neural networks can be trained at large scale without analyticity constraints.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The parallel assumption structure suggests that existing real-valued SGD analyses can be reused for complex domains with minimal modification.
Applications in signal processing or quantum-inspired models may benefit from the bias-preserving property when fitting complex kernels.
Non-convex extensions could be tested by checking whether the same bias directions appear in complex neural network training.

Load-bearing premise

The standard assumptions of convexity, smoothness, and bounded variance continue to guarantee convergence when both parameters and gradients are complex-valued.

What would settle it

A concrete counter-example in which complex SGD fails to converge or loses the predicted directional bias in a low-dimensional complex kernel regression task under the listed assumptions.
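
A search for such a counter-example could start from a harness like the one below. It is a sketch under stated assumptions: the Szegő kernel of the Hardy space, four hypothetical sample points inside the unit disk, a squared-modulus loss, and a functional SGD step that updates only the coefficient of the sampled point. The paper's actual loss and experimental setup are not given in this summary.

```python
import cmath
import random

def szego(z, w):
    # Szego kernel of the Hardy space on the unit disk: K(z, w) = 1 / (1 - z * conj(w)).
    return 1 / (1 - z * w.conjugate())

def fit(points, targets, step=0.5, iters=8000, seed=0):
    """Functional SGD for f = sum_j coef[j] * K(., points[j]).

    For the loss 0.5*|f(z_i) - y_i|^2 the RKHS gradient is residual * K(., z_i),
    so one stochastic step changes only the i-th coefficient.
    """
    rng = random.Random(seed)
    coef = [0j] * len(points)
    for _ in range(iters):
        i = rng.randrange(len(points))
        pred = sum(c * szego(points[i], p) for c, p in zip(coef, points))
        coef[i] -= step * (pred - targets[i])
    return coef

def predict(coef, points, z):
    return sum(c * szego(z, p) for c, p in zip(coef, points))

# Hypothetical task: sample a single Blaschke factor with zero a at four points.
points = [0.5 * cmath.exp(2j * cmath.pi * k / 4) for k in range(4)]
a = 0.3 + 0j
targets = [(z - a) / (1 - a.conjugate() * z) for z in points]

coef = fit(points, targets)
worst = max(abs(predict(coef, points, z) - y) for z, y in zip(points, targets))
print("largest training residual:", worst)
```

Here the residual shrinks toward zero; a genuine counter-example would need a task that satisfies the listed assumptions and still shows the residual stalling or the bias direction changing.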

Original abstract

Stochastic Gradient Descent (SGD) is a known stochastic iterative method popular for large-scale convex optimization problems due to its simple implementation and scalability. Some objectives, such as those found in complex-valued neural networks, benefit from updates like in SGD and Gradient Descent (GD) with a newly defined "gradient" that allows for complex parameters. This complex variant of the SGD/GD methods has already been proposed, but convergence guarantees without analyticity constraints have not yet been provided. We propose a variant of SGD (complex SGD) that allows for complex parameters, and we provide convergence guarantees under assumptions that parallel those from the real setting. Notably, these results extend to GD as well, and with the same set of assumptions, we confirm that some directional bias results extend from the real to the complex setting for kernel regression problems. We provide empirical results demonstrating the efficacy of the complex SGD in kernel regression problems utilizing complex reproducing kernel Hilbert spaces. In particular, we demonstrate we may recover superoscillation functions and Blaschke products from the Fock Space and Hardy Space, respectively, as the optimal functions for a particular choice of a loss function.
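
For readers unfamiliar with the two spaces named in the abstract, the sketch below writes out their reproducing kernels and a finite Blaschke product in their textbook forms. The normalizations are assumptions: the Fock kernel is taken as exp(z·conj(w)) and the Blaschke factors omit the optional unimodular constant, and the paper may use different conventions. The function names are illustrative.

```python
import cmath

def fock_kernel(z, w):
    # Bargmann-Fock space kernel in one common normalization (assumed here).
    return cmath.exp(z * w.conjugate())

def szego_kernel(z, w):
    # Hardy space (unit disk) kernel; requires |z * conj(w)| < 1.
    return 1 / (1 - z * w.conjugate())

def blaschke(z, zeros):
    # Finite Blaschke product with the given zeros, each of modulus < 1.
    out = 1 + 0j
    for a in zeros:
        out *= (z - a) / (1 - a.conjugate() * z)
    return out

zeros = [0.3 + 0.2j, -0.5j]
on_circle = blaschke(cmath.exp(0.7j), zeros)
print(abs(on_circle))             # modulus 1 on the unit circle, up to rounding
print(blaschke(zeros[0], zeros))  # vanishes at each prescribed zero
```

Both kernels are Hermitian, K(z, w) = conj(K(w, z)), which is the structural property the referee asks the bias analysis to account for.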

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper extends SGD convergence guarantees to complex parameters without analyticity and checks directional bias in complex RKHS, but the variance bound and descent steps need explicit re-derivation for sesquilinear products.

The central thing to know is that this work supplies convergence results for a complex SGD variant and for GD, under assumptions that line up with the usual real-valued ones like convexity, smoothness, and bounded variance, and it shows some directional bias results carry over to kernel regression in complex RKHS. The experiments recover superoscillation functions in Fock space and Blaschke products in Hardy space as optimal solutions for a chosen loss.

Referee Report

2 major / 2 minor

Summary. The paper proposes a complex-valued variant of SGD (complex SGD) for optimization with complex parameters, extending to GD, and claims convergence guarantees under assumptions (convexity, smoothness, bounded variance) that parallel the real-valued case without requiring analyticity. It further asserts that directional bias results from real kernel regression carry over to the complex RKHS setting, and provides empirical demonstrations of recovering superoscillation functions from the Fock space and Blaschke products from the Hardy space via suitable loss functions in complex kernel regression.

Significance. If the convergence analysis holds under the stated parallel assumptions, the work would meaningfully extend SGD theory to complex domains, supporting applications in complex-valued neural networks and kernel methods where analyticity is undesirable. The empirical recovery of specific functions in Fock and Hardy spaces provides concrete evidence of utility in complex RKHS settings. The lack of analyticity constraints is a potential strength over prior complex optimization literature.

major comments (2)

[Convergence analysis / Theorem on complex SGD] Convergence theorem for complex SGD (likely §3 or the main theorem paralleling real SGD): The bounded-variance assumption is transplanted directly from the real case to bound E[||g_t - ∇f||^2] ≤ σ² inside the descent inequality, but the sesquilinear inner product and complex modulus require an explicit re-derivation of the inequality; without it the stochastic-error control step is not secured by the listed assumptions alone.
[Directional bias results / kernel regression experiments] Extension of directional bias to complex kernel regression (kernel regression section): The claim that 'some directional bias results extend' is stated under the same assumptions, but the bias analysis must account for phase and Hermitian structure; a direct comparison to the real-valued bias equations (e.g., the specific bias term) is needed to confirm the extension is not merely formal.
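
The re-derivation requested in the first major comment would start from an identity that holds in any complex Hilbert space. What follows is a sketch of that standard step, not the paper's own derivation. The symbols $w_t$ (iterate), $w^\ast$ (minimizer), $\eta$ (step size) and $g_t$ (stochastic gradient) are assumed notation, and unbiasedness, $\mathbb{E}[g_t \mid w_t] = \nabla f(w_t)$, is an assumption that this summary does not state explicitly.

```latex
\begin{align*}
\|w_{t+1} - w^\ast\|^2
  &= \|w_t - \eta g_t - w^\ast\|^2 \\
  &= \|w_t - w^\ast\|^2
     - 2\eta\,\operatorname{Re}\langle g_t,\, w_t - w^\ast\rangle
     + \eta^2 \|g_t\|^2, \\[4pt]
\mathbb{E}\big[\|g_t\|^2 \mid w_t\big]
  &= \|\nabla f(w_t)\|^2
     + 2\operatorname{Re}\,\mathbb{E}\big[\langle g_t - \nabla f(w_t),\, \nabla f(w_t)\rangle \mid w_t\big]
     + \mathbb{E}\big[\|g_t - \nabla f(w_t)\|^2 \mid w_t\big] \\
  &\le \|\nabla f(w_t)\|^2 + \sigma^2 .
\end{align*}
```

The middle term of the second expansion vanishes by unbiasedness. The visible change from the real case is that the cross terms carry a real part; whether convexity and smoothness control $\operatorname{Re}\langle \nabla f(w_t),\, w_t - w^\ast\rangle$ as they do in the real setting depends on how the paper defines its complex gradient.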

minor comments (2)

[Introduction] The definition of the 'newly defined gradient' for complex parameters should be stated explicitly with notation in the introduction or §2 rather than deferred to the methods.
[Experiments] Empirical figures for superoscillation and Blaschke product recovery would benefit from error bars or multiple random seeds to demonstrate stability of the complex SGD.
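
The multi-seed reporting suggested in the last minor comment might look like the sketch below. It uses a hypothetical one-dimensional noisy regression problem rather than the paper's Fock or Hardy space experiments, and the step size, noise level, and seed count are arbitrary assumptions.

```python
import random
import statistics

def final_error(seed, step=0.02, iters=2000, noise=0.1):
    """Run complex SGD on a streaming 1-D least-squares problem; return |w - w_true|."""
    rng = random.Random(seed)
    w_true = 0.7 - 0.3j
    w = 0j
    for _ in range(iters):
        x = complex(rng.gauss(0, 1), rng.gauss(0, 1))
        y = x * w_true + noise * complex(rng.gauss(0, 1), rng.gauss(0, 1))
        # Conjugate-gradient step for the loss 0.5*|x*w - y|^2 (assumed form).
        w -= step * (x * w - y) * x.conjugate()
    return abs(w - w_true)

errors = [final_error(seed) for seed in range(10)]
mean_err = statistics.mean(errors)
std_err = statistics.stdev(errors)
print(f"final error: {mean_err:.4f} +/- {std_err:.4f} over {len(errors)} seeds")
```

Reporting the mean with a standard deviation across seeds gives the error bars directly; the same loop could wrap any of the recovery experiments.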

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading of the manuscript and for highlighting these important points regarding the rigor of the convergence analysis and the directional bias extension. We address each major comment below and will incorporate the requested clarifications and derivations into a revised version.

Referee: Convergence theorem for complex SGD (likely §3 or the main theorem paralleling real SGD): The bounded-variance assumption is transplanted directly from the real case to bound E[||g_t - ∇f||^2] ≤ σ² inside the descent inequality, but the sesquilinear inner product and complex modulus require an explicit re-derivation of the inequality; without it the stochastic-error control step is not secured by the listed assumptions alone.

Authors: We agree that an explicit re-derivation is required to rigorously establish the descent inequality under the sesquilinear inner product and complex modulus. Although the assumptions are stated to parallel the real-valued case, the stochastic-error control step does need to be re-derived from first principles for the complex setting. In the revised manuscript we will insert a complete, self-contained derivation of the key inequality (including the handling of the complex modulus and the expectation of the squared norm), thereby securing the convergence result under the listed assumptions. revision: yes
Referee: Extension of directional bias to complex kernel regression (kernel regression section): The claim that 'some directional bias results extend' is stated under the same assumptions, but the bias analysis must account for phase and Hermitian structure; a direct comparison to the real-valued bias equations (e.g., the specific bias term) is needed to confirm the extension is not merely formal.

Authors: We acknowledge that a direct comparison is necessary to demonstrate that the extension accounts for phase and the Hermitian inner-product structure rather than being merely formal. In the revised version we will expand the directional-bias subsection to include side-by-side statements of the real-valued bias equation and its complex counterpart, explicitly showing how the bias term is modified by the sesquilinear form and the phase factor. This will make the substantive nature of the extension clear. revision: yes

Circularity Check

0 steps flagged

No circularity: convergence and bias extensions derived from parallel assumptions without self-referential reduction.

The paper introduces complex SGD as a variant allowing complex parameters, then states convergence guarantees under assumptions (convexity, smoothness, bounded variance) that explicitly parallel the real-valued case, without analyticity constraints. Directional bias results for kernel regression in complex RKHS are presented as extensions confirmed under the same assumptions. No equations or claims reduce a target quantity to its own definition or fitted inputs by construction. No load-bearing self-citation chain is invoked to justify uniqueness or the core proofs; the work treats the real-setting results as external benchmarks and transplants the assumption structure directly. This is a standard honest extension rather than a renaming or ansatz smuggling. The derivation chain remains self-contained against external real-valued SGD literature.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Review is based solely on the abstract; the paper relies on standard real-valued SGD assumptions adapted to the complex domain. No free parameters or new entities are mentioned.

axioms (1)

domain assumption: Assumptions parallel to those used for real-valued SGD (convexity, smoothness, bounded variance) hold and suffice for complex parameters and gradients.
Abstract states convergence guarantees are provided 'under assumptions that parallel those from the real setting'.

pith-pipeline@v0.9.0 · 5505 in / 1464 out tokens · 66296 ms · 2026-05-08T12:10:09.206864+00:00 · methodology
