Dictionary learning for Kernel EDMD
Pith reviewed 2026-05-07 14:16 UTC · model grok-4.3
The pith
Simplifying kEDMD allows gradient-based optimization over kernel parameters, so that useful kernels can be learned from data.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By simplifying kEDMD we show how to perform gradient-based optimization over the learnable kernel parameters, and demonstrate that this method leads to useful kernels for the original kEDMD. The focus of our work is a method that takes a weighted list of kernels with randomly initialized values as input and outputs a list of kernels and parameter values suitable for approximating the Koopman operator of the underlying system. We demonstrate that unimportant kernels can be removed from the list by analyzing the weights in the weighted sum.
What carries the argument
Simplified kEDMD formulation that enables direct gradient optimization of parameters in a weighted sum of kernels.
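To make the phrase "weighted sum of kernels" concrete, here is a minimal sketch of such a kernel and its Gram matrix. It assumes Gaussian kernels that differ only in bandwidth; the function names, weights, and bandwidths are illustrative and are not taken from the paper.

```python
import numpy as np

def gaussian_gram(X, Y, sigma):
    """Gram matrix of the Gaussian kernel exp(-||x - y||^2 / (2 sigma^2))."""
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def weighted_kernel_gram(X, Y, weights, sigmas):
    """Gram matrix of the weighted sum k_theta = sum_i w_i * k_{sigma_i}."""
    return sum(w * gaussian_gram(X, Y, s) for w, s in zip(weights, sigmas))

# Illustrative values only (assumption): two kernels, five random 2-D states.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))
weights = np.array([0.7, 0.3])
sigmas = np.array([0.5, 2.0])
G = weighted_kernel_gram(X, X, weights, sigmas)
```

Because every entry of `G` is a smooth function of `weights` and `sigmas`, both can in principle be treated as learnable parameters.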
If this is right
- The learned kernels and parameters can be inserted directly into the original kEDMD to obtain finite-dimensional approximations of the Koopman operator and its spectrum.
- Analyzing the optimized weights allows removal of low-contribution kernels without systematic loss of approximation quality.
- The procedure applies to both ordinary differential equations such as the Duffing oscillator and partial differential equations such as the Kuramoto-Sivashinsky equation.
- Kernel choice for data-driven Koopman analysis becomes an automated optimization step rather than a manual selection task.
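The first consequence above, inserting a kernel into kEDMD to obtain a finite-dimensional operator and its spectrum, can be sketched as follows. This uses one common regularized form of kEDMD, with G_ij = k(x_i, x_j) and A_ij = k(y_i, x_j); the paper's exact formulation may differ, and `kedmd_matrix` and `reg` are hypothetical names. A linear kernel and a linear toy map are used only so that the result is easy to check.

```python
import numpy as np

def kedmd_matrix(G_xx, G_yx, reg=1e-8):
    """Regularized kEDMD matrix (G_xx + reg * I)^{-1} G_yx.

    G_xx[i, j] = k(x_i, x_j), G_yx[i, j] = k(y_i, x_j) for snapshot pairs (x_i, y_i).
    """
    m = G_xx.shape[0]
    return np.linalg.solve(G_xx + reg * np.eye(m), G_yx)

# Toy check (assumption): linear map y = 0.9 x with the linear kernel k(x, y) = x * y.
x = np.linspace(-1.0, 1.0, 20)[:, None]
y = 0.9 * x
G_xx = x @ x.T
G_yx = y @ x.T

K_hat = kedmd_matrix(G_xx, G_yx)
eigvals = np.linalg.eigvals(K_hat)
leading = eigvals[np.argmax(np.abs(eigvals))]  # close to 0.9 for this toy system
```

For a learned kernel, the two Gram matrices would simply be built from that kernel instead of the linear one.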
Where Pith is reading between the lines
- Similar gradient-based tuning of kernel weights could be applied to other operator-learning methods that rely on kernel dictionaries.
- The pruning step based on learned weights might transfer to dictionary-learning algorithms outside the Koopman setting.
- Testing the method on systems with stronger chaos or higher state dimension would reveal whether the random-initialization-plus-pruning strategy remains reliable.
Load-bearing premise
That gradient descent applied to a randomly initialized weighted sum of kernels will converge to parameter values that meaningfully capture the dynamics of the underlying system.
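A toy version of this premise can be probed numerically. The sketch below runs gradient descent from a random initialization over the log-weights and log-bandwidths of a sum of Gaussian kernels. Everything specific in it is an assumption: the hold-out regression loss is a stand-in and not the paper's simplified kEDMD objective, gradients are taken by finite differences rather than automatic differentiation, and the 1-D map is invented for illustration.

```python
import numpy as np

def weighted_gram(X, Y, log_w, log_s):
    """Gram matrix of sum_i exp(log_w[i]) * Gaussian kernel with bandwidth exp(log_s[i])."""
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    sq = np.maximum(sq, 0.0)
    return sum(np.exp(lw) * np.exp(-sq / (2.0 * np.exp(2.0 * ls)))
               for lw, ls in zip(log_w, log_s))

def holdout_loss(theta, Xtr, Ytr, Xva, Yva, reg=1e-6):
    """Stand-in proxy loss (assumption): validation error of a kernel regression fit."""
    n = theta.size // 2
    log_w, log_s = theta[:n], theta[n:]
    G = weighted_gram(Xtr, Xtr, log_w, log_s)
    coef = np.linalg.solve(G + reg * np.eye(len(Xtr)), Ytr)
    pred = weighted_gram(Xva, Xtr, log_w, log_s) @ coef
    return float(np.mean((pred - Yva) ** 2))

def fd_grad(f, theta, h=1e-5):
    """Central finite-difference gradient of f at theta."""
    g = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta)
        e[i] = h
        g[i] = (f(theta + e) - f(theta - e)) / (2.0 * h)
    return g

def descend(f, theta, steps=30, lr=0.5):
    """Normalized gradient descent; a step is accepted only if it lowers the loss."""
    loss = f(theta)
    for _ in range(steps):
        g = fd_grad(f, theta)
        direction = g / (np.linalg.norm(g) + 1e-12)
        step = lr
        while step > 1e-6:
            candidate = theta - step * direction
            candidate_loss = f(candidate)
            if candidate_loss < loss:
                theta, loss = candidate, candidate_loss
                break
            step *= 0.5  # backtrack
    return theta, loss

# Hypothetical 1-D map used only to produce snapshot pairs (x_k, x_{k+1}).
rng = np.random.default_rng(0)
X = rng.uniform(-1.5, 1.5, size=(60, 1))
Y = 0.9 * X - 0.2 * X**3
Xtr, Ytr, Xva, Yva = X[:30], Y[:30], X[30:], Y[30:]

theta0 = rng.normal(size=6)  # random init: 3 log-weights followed by 3 log-bandwidths
loss_fn = lambda th: holdout_loss(th, Xtr, Ytr, Xva, Yva)
initial_loss = loss_fn(theta0)
theta_opt, final_loss = descend(loss_fn, theta0)
```

By construction the loss never increases here, but that says nothing about whether the resulting parameters capture the dynamics, which is exactly the premise in question.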
What would settle it
On the Duffing oscillator or Kuramoto-Sivashinsky PDE, if the kernels produced by the optimization yield higher prediction error or worse spectral approximation than manually chosen kernels when used in standard kEDMD, the claim is falsified.
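A comparison of this kind could be set up roughly as below. The sketch generates Duffing snapshot pairs with a fixed-step RK4 integrator and scores two candidate Gaussian bandwidths by held-out one-step prediction error. The Duffing parameters (delta = 0.5, alpha = -1, beta = 1, unforced), the sampling domain, the two bandwidths, and the use of kernel regression as a stand-in for the kEDMD predictor are all assumptions, not values from the paper.

```python
import numpy as np

def duffing_rhs(s, delta=0.5, alpha=-1.0, beta=1.0):
    """Unforced Duffing oscillator: x'' = -delta x' - alpha x - beta x^3."""
    x, v = s[..., 0], s[..., 1]
    return np.stack([v, -delta * v - alpha * x - beta * x**3], axis=-1)

def rk4_flow(s, dt=0.01, n_steps=25):
    """Advance all states in s by n_steps classical RK4 steps of size dt."""
    for _ in range(n_steps):
        k1 = duffing_rhs(s)
        k2 = duffing_rhs(s + 0.5 * dt * k1)
        k3 = duffing_rhs(s + 0.5 * dt * k2)
        k4 = duffing_rhs(s + dt * k3)
        s = s + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return s

def gaussian_gram(X, Y, sigma):
    """Gram matrix of the Gaussian kernel with bandwidth sigma."""
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def one_step_error(sigma, Xtr, Ytr, Xte, Yte, reg=1e-6):
    """Held-out RMSE of a kernel-regression one-step predictor (stand-in metric)."""
    G = gaussian_gram(Xtr, Xtr, sigma)
    coef = np.linalg.solve(G + reg * np.eye(len(Xtr)), Ytr)
    pred = gaussian_gram(Xte, Xtr, sigma) @ coef
    return float(np.sqrt(np.mean((pred - Yte) ** 2)))

rng = np.random.default_rng(1)
X = rng.uniform(-2.0, 2.0, size=(400, 2))  # sampled initial states (x, x')
Y = rk4_flow(X)                            # states after a lag of 0.25 time units
Xtr, Ytr, Xte, Yte = X[:300], Y[:300], X[300:], Y[300:]

# Two candidate kernels, e.g. a "manual" and a "learned" bandwidth (values are made up).
errors = {sigma: one_step_error(sigma, Xtr, Ytr, Xte, Yte) for sigma in (0.05, 1.0)}
```

A full test of the claim would also compare spectral approximations, which this sketch does not attempt.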
Original abstract
Studying nonlinear dynamical systems through their state space behavior can be challenging, and one possible alternative is to analyze them via their associated Koopman operator. This turns the nonlinear problem into a linear, infinite-dimensional one. To approximate the operator in finite dimensions, extended dynamic mode decomposition (EDMD) is a commonly used algorithm. It requires a finite list of functionals and a set of snapshots from the system to compute an approximation of the operator and its corresponding spectrum. Instead of choosing the list of functionals directly, it can be implicitly defined via kernels, a method known as kernel extended dynamic mode decomposition (kEDMD). However, one still needs to define the kernel and choose its parameter values. In this paper, we aim to streamline this process by extending dictionary learning for EDMD to kernel learning in kEDMD. By simplifying kEDMD we show how to perform gradient-based optimization over the learnable kernel parameters, and demonstrate that this method leads to useful kernels for the original kEDMD. The focus of our work is a method that takes a weighted list of kernels with randomly initialized values as input and outputs a list of kernels and parameter values suitable for approximating the Koopman operator of the underlying system. We demonstrate that unimportant kernels can be removed from the list by analyzing the weights in the weighted sum. We evaluate the method across several experiments, including the Duffing oscillator and the Kuramoto-Sivashinsky PDE, showcasing the method's different strengths.
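For readers unfamiliar with EDMD, the following minimal sketch shows the explicit-dictionary version that kEDMD replaces with kernels. The monomial dictionary and the linear toy map are illustrative choices, not the paper's setup. For this map the dictionary spans an invariant subspace, so the EDMD eigenvalues are 1, 0.9, and 0.81.

```python
import numpy as np

def monomial_dictionary(x):
    """Evaluate the dictionary [1, x, x^2] on a 1-D array of states."""
    return np.stack([np.ones_like(x), x, x**2], axis=1)

# Snapshot pairs (x_i, y_i) from the toy map y = 0.9 x.
x = np.linspace(-1.0, 1.0, 50)
y = 0.9 * x

Psi_x = monomial_dictionary(x)  # shape (50, 3)
Psi_y = monomial_dictionary(y)  # shape (50, 3)

# Least-squares EDMD matrix K with Psi_x @ K ~= Psi_y.
K = np.linalg.pinv(Psi_x) @ Psi_y
eigvals = np.sort(np.linalg.eigvals(K).real)  # approximately [0.81, 0.9, 1.0]
```

In kEDMD the columns of `Psi_x` are never formed explicitly; only inner products between feature vectors, evaluated through the kernel, are needed.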
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper extends dictionary learning ideas from EDMD to kEDMD by introducing a simplified formulation of kernel EDMD that permits gradient-based optimization of parameters in a weighted sum of kernels. Starting from randomly initialized kernel weights and parameters, the method produces a pruned list of kernels and tuned values that are then inserted into the standard kEDMD pipeline; experiments on the Duffing oscillator and Kuramoto-Sivashinsky PDE are used to illustrate that the resulting kernels are useful for Koopman operator approximation.
Significance. If the transfer from the simplified proxy to the original kEDMD operator approximation holds, the approach would automate kernel selection and parameter tuning, a practical bottleneck in kEDMD applications. The weight-based pruning mechanism could further improve computational efficiency while preserving approximation quality.
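The weight-based pruning mentioned above could look roughly like the sketch below. The pruning rule (drop kernels whose share of the total absolute weight falls below a relative threshold), the function name, and the example weights and bandwidths are all assumptions made for illustration; the paper's rule and values may differ.

```python
import numpy as np

def prune_kernels(weights, params, rel_threshold):
    """Keep kernels whose share of the total absolute weight is >= rel_threshold."""
    weights = np.asarray(weights, dtype=float)
    share = np.abs(weights) / np.sum(np.abs(weights))
    keep = share >= rel_threshold
    kept_params = [p for p, k in zip(params, keep) if k]
    return weights[keep], kept_params

# Made-up optimized weights and bandwidths for four Gaussian kernels.
weights = [0.62, 0.01, 0.35, 0.02]
sigmas = [0.5, 0.1, 2.0, 10.0]

# Sweep the threshold to see how many kernels survive at each setting.
kept_counts = {thr: len(prune_kernels(weights, sigmas, thr)[1])
               for thr in (0.0, 0.015, 0.05)}
kept_w, kept_sigmas = prune_kernels(weights, sigmas, 0.05)
```

Sweeping the threshold in this way is one simple means of checking how sensitive the surviving kernel list is to the pruning rule.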
major comments (2)
- [§3 (simplification and gradient flow)] The central claim that kernels optimized under the simplified kEDMD yield useful results for the original kEDMD (abstract and §4) rests on an unproven transfer: the paper must supply either an explicit equivalence between the proxy loss and the true least-squares Koopman residual or a quantitative error bound showing that stationary points of the simplified objective improve the original finite-dimensional operator. Without this, the gradient flow may optimize a different quantity.
- [Table 1 and §5.2] Table 1 and the Kuramoto-Sivashinsky experiments report improved spectrum accuracy after optimization, yet no ablation isolates the contribution of the simplification itself versus the choice of initial kernel pool or the pruning threshold; the reported gains could be driven by the richer initial dictionary rather than the learned parameters.
minor comments (2)
- The precise algebraic form of the simplification (which matrix or solve is replaced by a differentiable surrogate) is introduced only after the abstract claim; moving a short derivation or pseudocode to the introduction would clarify the scope of the approximation.
- Notation for the weighted kernel sum K_θ and the resulting Gram matrix should be unified between the optimization section and the final kEDMD reconstruction step.
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive comments on our manuscript. We address each major comment point by point below, indicating where revisions will be made.
- Referee: [§3 (simplification and gradient flow)] The central claim that kernels optimized under the simplified kEDMD yield useful results for the original kEDMD (abstract and §4) rests on an unproven transfer: the paper must supply either an explicit equivalence between the proxy loss and the true least-squares Koopman residual or a quantitative error bound showing that stationary points of the simplified objective improve the original finite-dimensional operator. Without this, the gradient flow may optimize a different quantity.
Authors: We thank the referee for identifying this gap. The simplification introduced in §3 replaces the full kernel matrix inversion with a weighted sum that is differentiable, enabling gradient descent on kernel parameters and weights. While the manuscript demonstrates empirically that the resulting kernels improve performance when inserted into the standard kEDMD pipeline (Duffing and Kuramoto-Sivashinsky examples), we do not claim or prove an exact equivalence to the original least-squares residual. We will revise §3 to state explicitly that the objective is a proxy chosen for tractability, include a short discussion of the potential mismatch, and note the absence of a transfer theorem as a limitation of the current analysis. revision: partial
- Referee: [Table 1 and §5.2] Table 1 and the Kuramoto-Sivashinsky experiments report improved spectrum accuracy after optimization, yet no ablation isolates the contribution of the simplification itself versus the choice of initial kernel pool or the pruning threshold; the reported gains could be driven by the richer initial dictionary rather than the learned parameters.
Authors: We agree that the present experiments do not isolate these factors. The reported improvements could partly stem from the size of the initial kernel pool or the pruning rule rather than the gradient updates. In the revised manuscript we will add ablation studies to §5.2 and Table 1: (i) kEDMD using the same initial weighted kernels without optimization, (ii) optimization with pruning disabled, and (iii) results for varying pruning thresholds. These will quantify the separate contributions of the learned parameters. revision: yes
- Request for an explicit equivalence between the proxy loss and the true least-squares Koopman residual or a quantitative error bound on the transfer; deriving such a result would require substantial additional theoretical development that is not feasible within the scope of the present work.
Circularity Check
No significant circularity; optimization is data-driven from snapshots
The paper's core procedure simplifies kEDMD to enable gradient flow over kernel parameters and weights, then empirically transfers the resulting kernels to standard kEDMD. This is a standard data-driven learning loop whose loss is computed from system snapshots, not from the final approximation error or from a self-referential definition. No quoted equation reduces the learned kernel or the claimed improvement to a fitted input by construction, and no load-bearing self-citation chain is invoked to justify uniqueness or the transfer step. The abstract and method description treat the simplification as a computational device whose validity is checked by downstream performance on the original operator, which is an independent empirical test rather than a tautology.