
arXiv: 2605.15234 · v1 · pith:PDA6E56Qnew · submitted 2026-05-13 · 🧮 math.NA · cs.NA · math.SP · math.ST · stat.CO · stat.TH

Sampling pseudospectrum for data-driven matrices

Pith reviewed 2026-05-19 17:32 UTC · model grok-4.3

classification 🧮 math.NA · cs.NA · math.SP · math.ST · stat.CO · stat.TH
keywords: sampling pseudospectrum · data-driven matrices · finite sampling · eigenvalue testing · dynamic mode decomposition · signal versus noise · statistical assessment

The pith

A sampling pseudospectrum estimator lets users test statistically whether eigenvalues from finite data are genuine or sampling artifacts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper addresses the lack of objective criteria for deciding which isolated eigenvalues from data-driven decompositions reflect real system behavior versus finite-sample errors. It defines a sampling pseudospectrum that supplies probabilistic information on where finite-data eigenvalues are likely to appear in the complex plane. An estimator for this pseudospectrum is obtained simply by reprocessing the original finite data sample, making the procedure computationally efficient. This setup supplies a general statistical test for the location of true eigenvalues without extra assumptions on error distributions. The result would let researchers separate signal patterns from noise more reliably in applications that construct matrices from limited observations.

Core claim

The paper claims that the sampling pseudospectrum P(λ) provides probabilistic information on the behaviour of finite-data eigenvalues, and that the estimator P̂(λ), computed by reprocessing the finite data sample, enables statistical tests for the location of the true eigenvalues of the underlying infinite-data operator.

What carries the argument

The sampling pseudospectrum P(λ) and its estimator P̂(λ), which together describe the distribution of eigenvalues arising from finite sampling of the true operator.

If this is right

  • The estimator supplies an objective criterion for classifying peripheral eigenvalues as signal or noise in data-driven spectral methods.
  • The approach applies to any matrix constructed by least-squares fitting from finite observations, including dynamic mode decomposition and related algorithms.
  • Because the estimator reuses the existing sample, it adds little computational cost to existing analysis pipelines.
  • Persistent emergent patterns extracted from complex systems can be assessed for statistical significance with greater rigor.
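
The least-squares construction referred to in this list can be sketched as follows. This is a minimal illustration and not the paper's code: the 2×2 system `TRUE_A`, the noise level, and the helper names `simulate`, `fit_least_squares_2x2` and `eigenvalues_2x2` are all assumptions made for the example. It fits K in x[n+1] ≈ K x[n] from a single finite trajectory and prints the fitted eigenvalues next to the known true ones.

```python
import cmath
import random

# Hypothetical "true" operator: a contracting rotation with eigenvalues 0.9 ± 0.2i.
TRUE_A = [[0.9, -0.2], [0.2, 0.9]]


def simulate(a, n_steps, rng, noise=1.0):
    """Trajectory of x[n+1] = a x[n] + Gaussian noise, started at the origin."""
    x = [0.0, 0.0]
    traj = [x]
    for _ in range(n_steps):
        x = [a[0][0] * x[0] + a[0][1] * x[1] + noise * rng.gauss(0.0, 1.0),
             a[1][0] * x[0] + a[1][1] * x[1] + noise * rng.gauss(0.0, 1.0)]
        traj.append(x)
    return traj


def fit_least_squares_2x2(traj):
    """Least-squares K minimising sum ||x[n+1] - K x[n]||^2 (normal equations)."""
    g = [[0.0, 0.0], [0.0, 0.0]]  # G = sum of x[n] x[n]^T
    h = [[0.0, 0.0], [0.0, 0.0]]  # H = sum of x[n+1] x[n]^T
    for x, y in zip(traj[:-1], traj[1:]):
        for i in range(2):
            for j in range(2):
                g[i][j] += x[i] * x[j]
                h[i][j] += y[i] * x[j]
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    g_inv = [[g[1][1] / det, -g[0][1] / det],
             [-g[1][0] / det, g[0][0] / det]]
    # K = H G^{-1}
    return [[h[i][0] * g_inv[0][j] + h[i][1] * g_inv[1][j] for j in range(2)]
            for i in range(2)]


def eigenvalues_2x2(m):
    """Eigenvalues of a real 2x2 matrix from its trace and determinant."""
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    root = cmath.sqrt(tr * tr - 4.0 * det)
    return (tr + root) / 2.0, (tr - root) / 2.0


rng = random.Random(0)  # fixed seed so the run is reproducible
k_hat = fit_least_squares_2x2(simulate(TRUE_A, 20000, rng))
print("fitted eigenvalues:", eigenvalues_2x2(k_hat))
print("true eigenvalues:  ", eigenvalues_2x2(TRUE_A))
```

The fitted eigenvalues differ slightly from the true ones because the trajectory is finite; that discrepancy is the finite-sample effect the paper sets out to quantify.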

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Integration into standard dynamic mode decomposition codebases would allow automatic flagging of likely noisy eigenvalues.
  • The same reprocessing idea could be tested on synthetic datasets whose true spectra are known exactly.
  • Analogous estimators might be developed for other data-driven constructions such as those arising in machine learning of dynamical systems.
  • Connections to classical pseudospectral theory could yield explicit bounds on the estimator's accuracy.
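
For orientation on the classical theory mentioned in the last point: the classical ε-pseudospectrum of a matrix A is the set of λ at which the smallest singular value of A − λI is below ε (equivalently, the resolvent norm exceeds 1/ε). The sketch below evaluates that quantity for a hypothetical non-normal 2×2 matrix. It illustrates the classical object only; it is not the paper's sampling pseudospectrum P(λ) or its estimator, and the matrix `A` and function names are assumptions made for the example.

```python
import math

# Hypothetical non-normal example: both eigenvalues equal 0.5, large off-diagonal entry.
A = [[0.5, 10.0], [0.0, 0.5]]


def smallest_singular_value_2x2(a, lam):
    """Smallest singular value of (a - lam*I) for a 2x2 matrix a and complex lam.

    Uses s_max^2 + s_min^2 = ||M||_F^2 and s_max * s_min = |det M|,
    which hold for any 2x2 matrix M.
    """
    m = [[a[0][0] - lam, a[0][1]], [a[1][0], a[1][1] - lam]]
    frob2 = sum(abs(m[i][j]) ** 2 for i in range(2) for j in range(2))
    abs_det = abs(m[0][0] * m[1][1] - m[0][1] * m[1][0])
    # Clamp at zero to guard against round-off in the discriminant.
    disc = math.sqrt(max(frob2 * frob2 - 4.0 * abs_det * abs_det, 0.0))
    s_max = math.sqrt((frob2 + disc) / 2.0)
    # Dividing |det| by s_max avoids the cancellation in (frob2 - disc).
    return 0.0 if s_max == 0.0 else abs_det / s_max


def in_pseudospectrum(a, lam, eps):
    """True if lam lies in the classical eps-pseudospectrum of a."""
    return smallest_singular_value_2x2(a, lam) < eps


# A point at distance 0.3 from the only eigenvalue is still inside the
# 0.01-pseudospectrum, because A is far from normal.
print(in_pseudospectrum(A, 0.8 + 0.0j, 0.01))
# For the normal matrix 0.5*I the same point is outside.
print(in_pseudospectrum([[0.5, 0.0], [0.0, 0.5]], 0.8 + 0.0j, 0.01))
```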

Load-bearing premise

Reprocessing the finite data sample yields an unbiased estimator for the sampling pseudospectrum of the underlying infinite-data operator.

What would settle it

A systematic mismatch between the values of the estimator P̂(λ) and the actual distribution of eigenvalues obtained from many independent finite samples of the same true operator would falsify the claim.
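
Such a comparison needs a reference distribution of finite-data eigenvalues. One brute-force way to generate it is sketched below, assuming a hypothetical scalar model x[n+1] = a·x[n] + noise whose single true eigenvalue a is known. The names `fitted_eigenvalue` and `eigenvalue_spread` are illustrative, and the paper's estimator P̂(λ) itself is not implemented here.

```python
import random
import statistics

TRUE_EIGENVALUE = 0.8  # hypothetical true operator (1x1 case)


def fitted_eigenvalue(n_samples, rng):
    """Least-squares estimate of a in x[n+1] = a*x[n] + noise from one trajectory."""
    x = 0.0
    num = 0.0  # running sum of x[n+1] * x[n]
    den = 0.0  # running sum of x[n]^2
    for _ in range(n_samples):
        x_next = TRUE_EIGENVALUE * x + rng.gauss(0.0, 1.0)
        num += x_next * x
        den += x * x
        x = x_next
    return num / den


def eigenvalue_spread(n_samples, n_replicates, seed):
    """Mean and standard deviation of the fitted eigenvalue over independent samples."""
    rng = random.Random(seed)
    values = [fitted_eigenvalue(n_samples, rng) for _ in range(n_replicates)]
    return statistics.mean(values), statistics.stdev(values)


for m in (100, 400, 1600):
    mean, spread = eigenvalue_spread(m, 200, seed=1)
    print(f"M={m:5d}  mean={mean:.3f}  std={spread:.3f}")
```

In this toy model the spread of the fitted eigenvalue shrinks as the sample size M grows, roughly like M^(-1/2). Histograms built this way are the kind of ground truth against which an estimator of the eigenvalue distribution could be checked.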

Figures

Figures reproduced from arXiv: 2605.15234 by Caroline Wormell.

Figure 1: Plot of the optimal windowing function κ_p.
Figure 2: Left: sampling pseudospectrum for system in Section 7 with true eigenvalues (green crosses), …
Figure 3: Top left: graph of the map (15). Top right: its Ruelle–Pollicott resonances. Bottom: a …
Figure 4: For different values of N, a comparison of P(λ) and P_sym(λ) for the system in Section 8. Top row: filled contour plot of indicator P_sym(λ); middle row: filled contour plot of indicator P(λ); bottom row: heatmap of their ratio. In black, the eigenvalues of the corresponding infinite-data Koopman matrices. Against these, we simulated K_M for various values of M and N, and plotted their spectra against P_sym(λ)…
Figure 5: In white, eigenvalues of 100 Koopman matrices for the system in Section 8 sampled with …
Figure 6: For four different realisations, eigenvalues of …
Figure 7: Empirical estimate of …
Figure 8: Left: MP̂ landscape from Lorenz-63 system computed in Section 9; right: detail around eigenvalue #1 plot. Both eigenvalues of a finite-data approximation (M = 10,000, red dots) and of the infinite-data limit (estimated by M = 10^7, green crosses) are shown. In right, the boundary of a 95% confidence region for eigenvalue #1 described in the text is plotted in purple.
Figure 9: Leading EDMD eigenfunctions of the Lorenz-63 system computed in Section 9. Hue corresponds …
Figure 10: Least-squares residuals between large-data ( …
Figure 11: Eigenvalues and sampling pseudospectrum of the Rayleigh–Bénard system computed in Sec…
Figure 12: Leading DMD modes of the Rayleigh–Bénard system computed in Section 10, plotted as …
Original abstract

Many complex systems can be reduced to their key components through spectrally decomposing matrices that capture their dynamics. These matrices can in turn be constructed from data, often by least-squares fitting: examples of algorithms to do this include Dynamical Mode Decomposition and variants, subspace identification and eigenvalue realisation algorithms. Typical outputs of these algorithms include a range of isolated, peripheral eigenvalues capturing persistent emergent patterns in the system. However, there is no objective way to assess which of these discrete eigenvalues are artefacts of finite data error, and which are reflections of a fully sampled operator. [I]n this paper, we present a sampling pseudospectrum $P(\lambda)$, that provides probabilistic information on the behaviour of finite-data eigenvalues in the complex plane, and an estimator $\hat P(\lambda)$, which can be obtained by reprocessing our finite data sample. The estimator, which is computationally efficient to implement, allows us to test statistically for the location of the true eigenvalues. This gives us a rigorous and very general way to assess whether the patterns we extract from finite data are likely to be signal or noise.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The manuscript introduces a sampling pseudospectrum P(λ) to characterize the probabilistic distribution of eigenvalues extracted from finite-data matrices obtained via least-squares fitting in data-driven methods such as DMD. It further defines an estimator P̂(λ) computed by reprocessing the given finite sample, which is claimed to enable rigorous statistical tests for distinguishing true eigenvalues (signal) from finite-sample artefacts (noise).

Significance. If the unbiasedness of P̂(λ) for P(λ) can be established under the stated conditions, the contribution would be significant: it supplies a computationally efficient, general-purpose statistical tool for validating extracted spectral features in dynamical systems identification, addressing a practical limitation in existing DMD and subspace methods.

major comments (1)
  1. Abstract: the claim that P̂(λ) 'allows us to test statistically for the location of the true eigenvalues' and supplies a 'rigorous' assessment rests on the premise that reprocessing the finite sample yields an unbiased estimator of the sampling pseudospectrum P(λ). No derivation, expectation calculation, or error analysis establishing E[P̂(λ)] = P(λ) is supplied in the abstract, and the text does not state the required conditions on sampling-error distributions or resolvent bounds; this is load-bearing for the central statistical-test claim.
minor comments (1)
  1. Abstract: the sentence beginning 'n this paper' contains a typographical error and should read 'In this paper'.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their careful reading of the manuscript and for highlighting the need for greater clarity on the statistical foundations of the estimator. We address the major comment point by point below and propose targeted revisions to strengthen the presentation.

Point-by-point responses
  1. Referee: Abstract: the claim that P̂(λ) 'allows us to test statistically for the location of the true eigenvalues' and supplies a 'rigorous' assessment rests on the premise that reprocessing the finite sample yields an unbiased estimator of the sampling pseudospectrum P(λ). No derivation, expectation calculation, or error analysis establishing E[P̂(λ)] = P(λ) is supplied in the abstract, and the text does not state the required conditions on sampling-error distributions or resolvent bounds; this is load-bearing for the central statistical-test claim.

    Authors: We agree that the abstract, due to length constraints, does not contain the full derivation or an explicit list of conditions. The main text does derive the expectation E[P̂(λ)] = P(λ) by direct calculation under the data-generating model (Section 3), where the sampling errors are taken to be zero-mean and independent across snapshots with finite second moments. The resolvent is assumed bounded on the relevant compact set in the complex plane to control the perturbation. We acknowledge that these modeling assumptions and the explicit statement E[P̂(λ)] = P(λ) could be stated more prominently. In the revised manuscript we will (i) add a short sentence to the abstract referencing the unbiasedness result under the stated conditions and (ii) insert a dedicated paragraph immediately after the definition of P̂(λ) that lists the precise assumptions on the error distribution and the resolvent bound, together with a pointer to the expectation calculation. These changes will make the load-bearing statistical claim fully transparent without altering the technical content.

    Revision: yes

Circularity Check

0 steps flagged

No circularity: estimator obtained by independent data reprocessing

full rationale

The paper defines the sampling pseudospectrum P(λ) for the infinite-data operator and introduces the estimator P̂(λ) explicitly as obtained by reprocessing the finite data sample. This construction does not reduce to a self-definitional loop, a fitted parameter renamed as prediction, or any self-citation chain. The statistical test for eigenvalues follows from the reprocessing procedure rather than assuming the target result by construction, rendering the derivation self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The contribution rests on standard assumptions from numerical linear algebra and statistics for dynamical systems; no free parameters or invented physical entities are mentioned.

axioms (1)
  • Domain assumption: finite sampling of data introduces probabilistic perturbations to the eigenvalues of the fitted matrix relative to the true operator.
    This is the core modeling assumption that motivates the need for a sampling pseudospectrum.
invented entities (1)
  • Sampling pseudospectrum P(λ) (no independent evidence)
    Purpose: to encode probabilistic information on the location of finite-data eigenvalues in the complex plane.
    New diagnostic object introduced by the paper.

pith-pipeline@v0.9.0 · 5717 in / 1168 out tokens · 40083 ms · 2026-05-19T17:32:55.864793+00:00 · methodology


