Dictionary learning for Kernel EDMD

Boumediene Hamzi; Erik Lien Bolager; Felix Dietrich; Houman Owhadi; Ioannis G. Kevrekidis

arxiv: 2604.25572 · v1 · submitted 2026-04-28 · 🧮 math.DS · cs.LG

Pith reviewed 2026-05-07 14:16 UTC · model grok-4.3

keywords kernel extended dynamic mode decomposition · Koopman operator · dictionary learning · gradient optimization · nonlinear dynamical systems · Duffing oscillator · Kuramoto-Sivashinsky equation

The pith

Simplifying kEDMD allows gradient optimization over kernel parameters to learn useful kernels from data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper extends dictionary learning ideas to kernel extended dynamic mode decomposition by first simplifying the kEDMD procedure. The simplification makes it possible to use gradient descent to tune the parameters of a weighted combination of kernels that starts from random initial values. The resulting kernels are then plugged back into the standard kEDMD algorithm to approximate the Koopman operator from system snapshots. Experiments on the Duffing oscillator and the Kuramoto-Sivashinsky PDE show that the learned kernels produce good approximations and that the learned weights can be used to drop kernels that contribute little. A reader would care because the method reduces the need to hand-select and tune kernels when applying data-driven Koopman analysis to nonlinear dynamics.

Core claim

By simplifying kEDMD we show how to perform gradient-based optimization over the learnable kernel parameters, and demonstrate that this method leads to useful kernels for the original kEDMD. The focus of our work is a method that takes a weighted list of kernels with randomly initialized values as input and outputs a list of kernels and parameter values suitable for approximating the Koopman operator of the underlying system. We demonstrate that unimportant kernels can be removed from the list by analyzing the weights in the weighted sum.
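The weight-based pruning mentioned in the core claim can be illustrated with a minimal sketch. The helper name `prune_kernels`, the relative threshold `rel_tol`, and the use of squared weights as the importance score are assumptions made for illustration; the source does not state the paper's actual pruning criterion.

```python
def prune_kernels(weights, rel_tol=1e-2):
    """Return indices of kernels whose squared weight is at least
    rel_tol times the largest squared weight (hypothetical rule)."""
    scores = [w * w for w in weights]
    top = max(scores)
    if top == 0.0:
        return []  # every kernel has zero weight, nothing to keep
    return [i for i, s in enumerate(scores) if s >= rel_tol * top]


# Hypothetical learned weights for four kernels.
learned = [1.3, 0.02, -0.9, 0.001]
kept = prune_kernels(learned)
print(kept)
```

A relative threshold is used here so that the rule does not depend on the overall scale of the weights; an absolute cutoff would be an equally plausible choice.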

What carries the argument

Simplified kEDMD formulation that enables direct gradient optimization of parameters in a weighted sum of kernels.
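As a rough illustration of what a weighted sum of kernels with learnable parameters can look like, the sketch below combines Gaussian kernels with squared weights, loosely following the form that appears in the paper's Figure 6 caption fragment. The function names, the two-kernel setup, and the parameter values are assumptions, not the paper's configuration.

```python
import math


def gaussian(x, y, sigma):
    """Gaussian (RBF) kernel between two equal-length sequences."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))


def weighted_kernel(x, y, weights, sigmas):
    """Sum of Gaussian kernels with squared weights, so every
    coefficient is nonnegative and the sum stays positive semidefinite."""
    return sum(w * w * gaussian(x, y, s) for w, s in zip(weights, sigmas))


# Hypothetical parameter values; in the learning setting these would
# start from random values and be tuned by gradient descent.
theta_w = [0.8, -0.5]
theta_sigma = [0.3, 2.0]
print(weighted_kernel([0.0, 1.0], [0.5, 1.0], theta_w, theta_sigma))
```

Squaring the weights means the optimizer can move them freely over the real line while the combined kernel remains a valid kernel.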

If this is right

The learned kernels and parameters can be inserted directly into the original kEDMD to obtain finite-dimensional approximations of the Koopman operator and its spectrum.
Analyzing the optimized weights allows removal of low-contribution kernels without systematic loss of approximation quality.
The procedure applies to both ordinary differential equations such as the Duffing oscillator and partial differential equations such as the Kuramoto-Sivashinsky equation.
Kernel choice for data-driven Koopman analysis becomes an automated optimization step rather than a manual selection task.
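The first consequence above, inserting a kernel into standard kEDMD, can be sketched with Gram matrices. This is a generic Gram-matrix formulation in the spirit of the kernel EDMD literature, not the paper's simplified variant; `gram`, `kedmd_eigenvalues`, the pseudoinverse cutoff `rcond`, and the toy linear system are all assumptions for illustration.

```python
import numpy as np


def gram(A, B, kernel):
    """Matrix of kernel evaluations kernel(a_i, b_j)."""
    return np.array([[kernel(a, b) for b in B] for a in A])


def kedmd_eigenvalues(X, Y, kernel, rcond=1e-10):
    """Eigenvalues of a Gram-matrix Koopman approximation.

    X[i] and Y[i] are snapshot pairs, with Y[i] the successor of X[i].
    """
    G = gram(X, X, kernel)            # G[i, j] = k(x_i, x_j)
    A = gram(Y, X, kernel)            # A[i, j] = k(y_i, x_j)
    K = np.linalg.pinv(G, rcond) @ A  # pseudoinverse handles rank deficiency
    return np.linalg.eigvals(K)


# Toy system x -> 0.5 x with a linear kernel (illustrative only).
X = np.linspace(0.5, 2.0, 6).reshape(-1, 1)
Y = 0.5 * X
eigs = kedmd_eigenvalues(X, Y, lambda a, b: float(np.dot(a, b)))
print(np.max(np.abs(eigs)))
```

For this toy system the linear kernel spans only the observable x, so the single nonzero eigenvalue should sit near the map's multiplier 0.5, which gives a quick sanity check before trying a learned kernel.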

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Similar gradient-based tuning of kernel weights could be applied to other operator-learning methods that rely on kernel dictionaries.
The pruning step based on learned weights might transfer to dictionary-learning algorithms outside the Koopman setting.
Testing the method on systems with stronger chaos or higher state dimension would reveal whether the random-initialization-plus-pruning strategy remains reliable.

Load-bearing premise

That gradient descent applied to a randomly initialized weighted sum of kernels will converge to parameter values that meaningfully capture the dynamics of the underlying system.
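To make the premise concrete, here is a toy sketch of gradient descent on the parameters of a randomly initialized weighted kernel sum. It is not the paper's simplified kEDMD objective: the loss is a stand-in (validation error of a kernel ridge one-step predictor), gradients are taken by finite differences rather than automatic differentiation, and every name, the toy map, and the hyperparameters are assumptions.

```python
import numpy as np


def kernel_matrix(A, B, theta):
    """Weighted sum of two Gaussian kernels; theta = (w1, w2, log s1, log s2)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    w, s = theta[:2], np.exp(theta[2:])
    return sum(w[i] ** 2 * np.exp(-d2 / (2.0 * s[i] ** 2)) for i in range(2))


def proxy_loss(theta, Xtr, Ytr, Xva, Yva, ridge=1e-6):
    """Validation error of a kernel ridge one-step predictor (stand-in loss)."""
    G = kernel_matrix(Xtr, Xtr, theta)
    coef = np.linalg.solve(G + ridge * np.eye(len(Xtr)), Ytr)
    pred = kernel_matrix(Xva, Xtr, theta) @ coef
    return float(np.mean((pred - Yva) ** 2))


def fd_grad(f, theta, h=1e-5):
    """Central finite-difference gradient of f at theta."""
    g = np.zeros_like(theta)
    for i in range(len(theta)):
        e = np.zeros_like(theta)
        e[i] = h
        g[i] = (f(theta + e) - f(theta - e)) / (2.0 * h)
    return g


def descend(f, theta, steps=50, lr=0.1):
    """Gradient descent with step halving, so the loss never increases."""
    best = f(theta)
    for _ in range(steps):
        g = fd_grad(f, theta)
        step = lr
        while step > 1e-8:
            cand = theta - step * g
            val = f(cand)
            if val < best:
                theta, best = cand, val
                break
            step *= 0.5
        else:
            break  # no improving step found, stop early
    return theta, best


rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(60, 1))
Y = np.sin(2.0 * X)                      # toy one-step map, not from the paper
Xtr, Ytr, Xva, Yva = X[:40], Y[:40], X[40:], Y[40:]
loss = lambda th: proxy_loss(th, Xtr, Ytr, Xva, Yva)
theta0 = rng.normal(size=4)              # random initialization
theta, final = descend(loss, theta0)
print(loss(theta0), final)
```

The step-halving safeguard only guarantees that the stand-in loss does not get worse; whether the resulting parameters capture the dynamics is exactly what the premise leaves open.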

What would settle it

On the Duffing oscillator or Kuramoto-Sivashinsky PDE, if the kernels produced by the optimization yield higher prediction error or worse spectral approximation than manually chosen kernels when used in standard kEDMD, the claim is falsified.

Figures

Figures reproduced from arXiv: 2604.25572 by Boumediene Hamzi, Erik Lien Bolager, Felix Dietrich, Houman Owhadi, Ioannis G. Kevrekidis.

**Figure 1.** Different loss functions (vertical axes) measured over training epochs (horizontal axes), for the Duffing …

**Figure 2.** The trajectories of the test set, where on the left we see the trajectories computed from the initial conditions …

**Figure 3.** The eigenvalues of the Koopman approximation for the Duffing oscillator, using two different kernels. The …

**Figure 4.** Different loss functions (vertical axes) measured for each epoch (horizontal axes), using the current kernel …

**Figure 5.** The trajectories of the test set, where on the right we see the true trajectories. Using the original kEDMD, on …

**Figure 6.** The eigenvalues of Ktr computed using the kernel with initialization parameter on the left, and similarly using the parameter found through dictionary learning on the right.

… points in the training set, where we divide them into five batches. For the subsampling, we set $\tilde{N} = 40$, and create the following kernel

$$g_\theta(x, y) = \tilde{w}_1^2 \exp\left(-\frac{\|h(x) - h(y)\|_2^2}{2\sigma_1^2}\right) + \tilde{w}_2^2 \exp\left(-\frac{|x - y|^2}{2\sigma_2^2}\right) + \tilde{w}_3^2 \dots$$

**Figure 7.** Eigenvalues of Ksk, where on the left we have before training, in the middle after training, and on the right is after pruning, i.e., after setting $w_2 = w_3 = w_4 = 0$. On the right plot we also plot the first 20 true eigenvalues, i.e., we plot $\exp(i\omega_j)$ for $j = 1, 2, \dots, 20$. The true values are marked by red cross, while the approximated ones are the blue dots.

4.3 Kuramoto-Sivashinsky equation. To test …

**Figure 8.** Different loss functions measured for each epoch for the KS PDE. On the left we have …

**Figure 9.** Approximating the trajectories of the Kuramoto-Sivashinsky PDE for three chosen initial conditions in the …

**Figure 10.** The eigenvalues of Ksk when approximating the Kuramoto-Sivashinsky PDE, computed using kernel with initialization parameter on the left, and similarly for the parameter found through dictionary learning on the right. The corresponding color of each point is the eigenvalue’s residual, computed following Section 2.1.1. The minimum, median, and maximum value of the residuals for the initial kernel and after …

**Figure 11.** Approximating the trajectories of the Kuramoto-Sivashinsky PDE for three chosen initial conditions in …

**Figure 12.** The eigenvalues of Ksk when approximating the Kuramoto-Sivashinsky PDE after pruning the kernel based on the first iteration of training. Computed using kernel with initialization parameter on the left, and similarly for the parameter found through dictionary learning on the right. The corresponding color of each point is the eigenvalue’s residual, computed following Section 2.1.1. The minimum, median, an…

**Figure 13.** Approximating the trajectories of the Kuramoto-Sivashinsky PDE for all 10 trajectories in the test set. On the …

**Figure 14.** Approximating the trajectories of the Kuramoto-Sivashinsky PDE for all 10 trajectories in the test set after …

**Figure 15.** Approximating the trajectories of the Kuramoto-Sivashinsky PDE for 10 trajectories in the training set after …

**Figure 16.** Approximating the trajectories of the Kuramoto-Sivashinsky PDE for 10 trajectories in the training set …

Original abstract

Studying nonlinear dynamical systems through their state space behavior can be challenging, and one possible alternative is to analyze them via their associated Koopman operator. This turns the nonlinear problem into a linear, infinite-dimensional one. To approximate the operator in finite dimensions, extended dynamic mode decomposition (EDMD) is a commonly used algorithm. It requires a finite list of functionals and a set of snapshots from the system to compute an approximation of the operator and its corresponding spectrum. Instead of choosing the list of functionals directly, it can be implicitly defined via kernels, a method known as kernel extended dynamic mode decomposition (kEDMD). However, one still needs to define the kernel and choose its parameter values. In this paper, we aim to streamline this process by extending dictionary learning for EDMD to kernel learning in kEDMD. By simplifying kEDMD we show how to perform gradient-based optimization over the learnable kernel parameters, and demonstrate that this method leads to useful kernels for the original kEDMD. The focus of our work is a method that takes a weighted list of kernels with randomly initialized values as input and outputs a list of kernels and parameter values suitable for approximating the Koopman operator of the underlying system. We demonstrate that unimportant kernels can be removed from the list by analyzing the weights in the weighted sum. We evaluate the method across several experiments, including the Duffing oscillator and the Kuramoto-Sivashinsky PDE, showcasing the method's different strengths.
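For readers unfamiliar with the baseline, the EDMD step the abstract describes (a finite list of functionals plus snapshot pairs) can be sketched as a least-squares problem. The function name `edmd`, the monomial dictionary, and the toy linear map are assumptions chosen so the result is easy to check by hand; this is a generic sketch, not the paper's code.

```python
import numpy as np


def edmd(X, Y, dictionary):
    """Least-squares Koopman matrix K with Psi(Y) ≈ Psi(X) @ K."""
    PX = np.array([dictionary(x) for x in X])  # dictionary evaluated on states
    PY = np.array([dictionary(y) for y in Y])  # dictionary evaluated on successors
    K, *_ = np.linalg.lstsq(PX, PY, rcond=None)
    return K


# Toy linear map x -> 0.5 x with the monomial dictionary (x, x^2).
X = np.linspace(0.5, 2.0, 8)
Y = 0.5 * X
K = edmd(X, Y, lambda x: [x, x ** 2])
print(np.sort(np.linalg.eigvals(K).real))
```

On this toy map the two monomials are exact Koopman eigenfunctions, so the computed eigenvalues should be close to 0.25 and 0.5. Kernel EDMD replaces the explicit dictionary with kernel evaluations, which is where the choice of kernel and its parameters enters.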

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper gives a workable gradient method to tune kernel parameters in kEDMD by optimizing a weighted sum on a simplified proxy and pruning by weight, with decent results on Duffing and Kuramoto-Sivashinsky, but the transfer from proxy to original kEDMD still needs tighter checks.

The core contribution is extending dictionary learning to kernel parameters: start with a random weighted sum of kernels, run gradient descent on the parameters after a simplification that makes backprop feasible, then drop low-weight kernels. This directly addresses the manual kernel choice that has always been a pain point in kEDMD. The experiments on the Duffing oscillator and the Kuramoto-Sivashinsky PDE show that the resulting kernels can be dropped back into standard kEDMD and produce usable approximations, which is the practical test that matters most here. The pruning rule is simple and appears to preserve performance without much loss in the reported cases. That is useful incremental work for anyone already running kEDMD on data from nonlinear systems or PDEs.

The main soft spot is exactly the one the stress-test flags. The optimization happens on a simplified version of kEDMD so that gradients flow through the kernel parameters. The paper asserts that the learned kernels remain useful in the unmodified algorithm, but the abstract and the described method do not supply an explicit error bound or a side-by-side comparison of the two objectives. If the simplification alters the Gram matrix construction or the least-squares step even modestly, the stationary points found by gradient descent optimize a different quantity than the true Koopman approximation error. The experiments mitigate this concern by showing end-to-end performance, yet a short analysis of how much the proxy deviates would make the claim more secure.

The citation pattern looks standard for the EDMD/kEDMD literature and does not appear to over-claim novelty. This is the kind of paper that belongs in a methods-focused journal or conference proceedings on data-driven dynamical systems. Readers who already use kernel EDMD for control or reduced-order modeling will get immediate practical value from the procedure and the pruning heuristic.

It is coherent on its own terms and shows honest engagement with the computational bottleneck, so it deserves a serious referee rather than a desk reject. I would send it out for review with a request that the authors clarify the relationship between the proxy loss and the original kEDMD error.

Referee Report

2 major / 2 minor

Summary. The paper extends dictionary learning ideas from EDMD to kEDMD by introducing a simplified formulation of kernel EDMD that permits gradient-based optimization of parameters in a weighted sum of kernels. Starting from randomly initialized kernel weights and parameters, the method produces a pruned list of kernels and tuned values that are then inserted into the standard kEDMD pipeline; experiments on the Duffing oscillator and Kuramoto-Sivashinsky PDE are used to illustrate that the resulting kernels are useful for Koopman operator approximation.

Significance. If the transfer from the simplified proxy to the original kEDMD operator approximation holds, the approach would automate kernel selection and parameter tuning, a practical bottleneck in kEDMD applications. The weight-based pruning mechanism could further improve computational efficiency while preserving approximation quality.

major comments (2)

[§3 (simplification and gradient flow)] The central claim that kernels optimized under the simplified kEDMD yield useful results for the original kEDMD (abstract and §4) rests on an unproven transfer: the paper must supply either an explicit equivalence between the proxy loss and the true least-squares Koopman residual or a quantitative error bound showing that stationary points of the simplified objective improve the original finite-dimensional operator. Without this, the gradient flow may optimize a different quantity.
[Table 1 and §5.2] Table 1 and the Kuramoto-Sivashinsky experiments report improved spectrum accuracy after optimization, yet no ablation isolates the contribution of the simplification itself versus the choice of initial kernel pool or the pruning threshold; the reported gains could be driven by the richer initial dictionary rather than the learned parameters.

minor comments (2)

The precise algebraic form of the simplification (which matrix or solve is replaced by a differentiable surrogate) is introduced only after the abstract claim; moving a short derivation or pseudocode to the introduction would clarify the scope of the approximation.
Notation for the weighted kernel sum K_θ and the resulting Gram matrix should be unified between the optimization section and the final kEDMD reconstruction step.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the careful reading and constructive comments on our manuscript. We address each major comment point by point below, indicating where revisions will be made.

Referee: [§3 (simplification and gradient flow)] The central claim that kernels optimized under the simplified kEDMD yield useful results for the original kEDMD (abstract and §4) rests on an unproven transfer: the paper must supply either an explicit equivalence between the proxy loss and the true least-squares Koopman residual or a quantitative error bound showing that stationary points of the simplified objective improve the original finite-dimensional operator. Without this, the gradient flow may optimize a different quantity.

Authors: We thank the referee for identifying this gap. The simplification introduced in §3 replaces the full kernel matrix inversion with a weighted sum that is differentiable, enabling gradient descent on kernel parameters and weights. While the manuscript demonstrates empirically that the resulting kernels improve performance when inserted into the standard kEDMD pipeline (Duffing and Kuramoto-Sivashinsky examples), we do not claim or prove an exact equivalence to the original least-squares residual. We will revise §3 to state explicitly that the objective is a proxy chosen for tractability, include a short discussion of the potential mismatch, and note the absence of a transfer theorem as a limitation of the current analysis.
Revision: partial
Referee: [Table 1 and §5.2] Table 1 and the Kuramoto-Sivashinsky experiments report improved spectrum accuracy after optimization, yet no ablation isolates the contribution of the simplification itself versus the choice of initial kernel pool or the pruning threshold; the reported gains could be driven by the richer initial dictionary rather than the learned parameters.

Authors: We agree that the present experiments do not isolate these factors. The reported improvements could partly stem from the size of the initial kernel pool or the pruning rule rather than the gradient updates. In the revised manuscript we will add ablation studies to §5.2 and Table 1: (i) kEDMD using the same initial weighted kernels without optimization, (ii) optimization with pruning disabled, and (iii) results for varying pruning thresholds. These will quantify the separate contributions of the learned parameters.
Revision: yes

standing simulated objections not resolved

Request for an explicit equivalence between the proxy loss and the true least-squares Koopman residual or a quantitative error bound on the transfer; deriving such a result would require substantial additional theoretical development that is not feasible within the scope of the present work.

Circularity Check

0 steps flagged

No significant circularity; optimization is data-driven from snapshots

The paper's core procedure simplifies kEDMD to enable gradient flow over kernel parameters and weights, then empirically transfers the resulting kernels to standard kEDMD. This is a standard data-driven learning loop whose loss is computed from system snapshots, not from the final approximation error or from a self-referential definition. No quoted equation reduces the learned kernel or the claimed improvement to a fitted input by construction, and no load-bearing self-citation chain is invoked to justify uniqueness or the transfer step. The abstract and method description treat the simplification as a computational device whose validity is checked by downstream performance on the original operator, which is an independent empirical test rather than a tautology.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Full details unavailable from abstract alone; the method appears to rest on standard assumptions of kernel methods and gradient optimization in machine learning for dynamical systems, plus the existence of suitable snapshot data from the system.
