pith.

arxiv: 2511.09024 · v2 · submitted 2025-11-12 · 📊 stat.ME

Instrumental variables system identification with L^p consistency

Pith reviewed 2026-05-17 22:45 UTC · model grok-4.3

classification 📊 stat.ME
keywords instrumental variables · system identification · finite-sample consistency · L^p consistency · dynamical systems · nonparametric convergence · time series · parameter estimation

The pith

A data-synthesized instrumental variables estimator achieves finite-sample L^p consistency for dynamical system identification.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops an instrumental variables estimator for dynamical systems that generates its own instruments directly from the observed noisy data. It proves this estimator is consistent in the L^p sense for every p at least 1, in both discrete-time and continuous-time models, while recovering a nonparametric square-root-of-n convergence rate. The approach matters because least-squares identification is biased by measurement noise and traditional instrumental variables methods require external instruments that are rarely available for nonlinear time series. The only modeling assumption is linearity in the unknown parameters, which allows the estimator to apply to modern sparsity-promoting techniques for learning dynamics.
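To make the noise-induced bias concrete, here is a minimal sketch that is not the authors' estimator. It assumes a scalar AR(1) state x_{t+1} = a x_t + w_t observed as y_t = x_t + v_t, with independent Gaussian process and measurement noise. Least squares of y_{t+1} on y_t is attenuated toward zero because the regressor contains v_t, while the classical lagged-observation instrument y_{t-1} (correlated with x_t but independent of the noise in the regression error) recovers a. The function names and parameter values are illustrative choices.

```python
import random


def simulate(n, a=0.8, sigma_w=1.0, sigma_v=1.0, seed=0, burn_in=500):
    """Noisy observations y_t = x_t + v_t of a scalar AR(1) state (illustrative model)."""
    rng = random.Random(seed)
    x = 0.0
    for _ in range(burn_in):  # discard the transient so x_t is close to stationary
        x = a * x + rng.gauss(0.0, sigma_w)
    ys = []
    for _ in range(n):
        ys.append(x + rng.gauss(0.0, sigma_v))  # measurement noise v_t
        x = a * x + rng.gauss(0.0, sigma_w)     # process noise w_t
    return ys


def least_squares(ys):
    """Regress y_{t+1} on y_t; biased because y_t contains v_t."""
    num = sum(ys[t] * ys[t + 1] for t in range(len(ys) - 1))
    den = sum(ys[t] * ys[t] for t in range(len(ys) - 1))
    return num / den


def lagged_iv(ys):
    """Instrument z_t = y_{t-1}: correlated with x_t, independent of v_t, w_t, v_{t+1}."""
    num = sum(ys[t - 1] * ys[t + 1] for t in range(1, len(ys) - 1))
    den = sum(ys[t - 1] * ys[t] for t in range(1, len(ys) - 1))
    return num / den


ys = simulate(200_000)
print(f"least squares: {least_squares(ys):.3f}, lagged IV: {lagged_iv(ys):.3f}, true a: 0.8")
```

With these settings var(x) = 1/(1 - 0.64) ≈ 2.78, so least squares settles near a·var(x)/(var(x) + σ_v²) ≈ 0.59 instead of 0.8, while the lagged instrument converges to a.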

Core claim

By synthesizing instruments internally from the data, the instrumental variables estimator recovers the true parameters with finite-sample L^p consistency for all p greater than or equal to 1 in both discrete- and continuous-time dynamical systems that are linear in the parameters, attaining a nonparametric square-root-of-n rate.

What carries the argument

The data-synthesized instrumental variables estimator, which constructs valid instruments from the observations to eliminate correlation between regressors and noise and thereby enable the consistency proofs.
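In generic notation assumed here (not taken from the paper), stacking n samples of a model that is linear in the parameters gives the decomposition below. It shows why correlation between regressors and noise biases least squares, and what the instruments Z must achieve. The paper's own instrument construction is more elaborate than this textbook form.

```latex
\begin{aligned}
Y &= X\theta^\star + e, \qquad X, Z \in \mathbb{R}^{n \times d},\\
\hat\theta_{\mathrm{LS}} - \theta^\star &= (X^\top X)^{-1} X^\top e
  \qquad \text{(biased when } \mathbb{E}[X^\top e] \neq 0\text{)},\\
\hat\theta_{\mathrm{IV}} &= (Z^\top X)^{-1} Z^\top Y,\\
\hat\theta_{\mathrm{IV}} - \theta^\star &= \Big(\tfrac{1}{n} Z^\top X\Big)^{-1} \tfrac{1}{n} Z^\top e .
\end{aligned}
```

Consistency of this form rests on two properties: n^{-1} Z^⊺e concentrates at zero (the instruments are uncorrelated with the noise), and n^{-1} Z^⊺X stays well conditioned (the instruments remain correlated with the regressors).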

If this is right

  • The estimator applies to both discrete-time difference equations and continuous-time differential equations.
  • It attains nonparametric square-root-of-n convergence without further parametric assumptions beyond linearity in parameters.
  • On the forced Lorenz system the method reduces parameter bias by 200 times in continuous time and 500 times in discrete time relative to least squares.
  • Root-mean-squared error decreases by up to a factor of ten compared with ordinary least squares.
  • The method extends directly to sparsity-promoting regression techniques used in modern dynamics learning.
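As a hypothetical illustration of the last point, the sequentially thresholded least-squares loop popularized by SINDy can be run with an IV solve in place of each least-squares solve. The sketch assumes a linear library (the state itself), lagged noisy observations as instruments, and a threshold of 0.1; the function name `stiv`, the test system, and all parameters are invented for this example and are not the paper's algorithm.

```python
import numpy as np


def stiv(X, Y, Z, threshold=0.1, iters=10):
    """Hypothetical sequentially thresholded IV (not the paper's algorithm).

    Solves (Z^T X) Theta = Z^T Y, zeroes coefficients below `threshold`, and
    re-solves each output column using only the surviving regressors.
    Assumes column j of Z is the instrument paired with column j of X.
    """
    theta = np.linalg.solve(Z.T @ X, Z.T @ Y)
    for _ in range(iters):
        small = np.abs(theta) < threshold
        theta[small] = 0.0
        for k in range(Y.shape[1]):
            keep = ~small[:, k]
            if keep.any():
                Xk, Zk = X[:, keep], Z[:, keep]
                theta[keep, k] = np.linalg.solve(Zk.T @ Xk, Zk.T @ Y[:, k])
    return theta


# Assumed test system: sparse linear dynamics x_{t+1} = A x_t + w_t, observed as y_t = x_t + v_t.
rng = np.random.default_rng(0)
A = np.array([[0.9, 0.0], [0.3, 0.6]])
n = 50_000
W = rng.normal(0.0, 1.0, size=(n, 2))  # process noise
V = rng.normal(0.0, 0.5, size=(n, 2))  # measurement noise
x = np.zeros(2)
ys = np.empty((n, 2))
for t in range(n):
    ys[t] = x + V[t]
    x = A @ x + W[t]

# Regressor y_t, target y_{t+1}, instrument y_{t-1}; the model is Y = X @ A.T + error.
X, Y, Z = ys[1:-1], ys[2:], ys[:-2]
theta = stiv(X, Y, Z)
print(np.round(theta.T, 2))  # approximates A, with the A[0, 1] entry set exactly to zero
```

Only the inner solve changes relative to thresholded least squares; the thresholding logic is untouched, which is the sense in which an IV estimator can slot into a sparsity-promoting pipeline.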

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Similar internal instrument synthesis could be explored for other time-series bias-correction tasks where external instruments are unavailable.
  • Engineers estimating models from sensor streams alone might obtain less biased parameters without additional experiments.
  • Direct verification of the finite-sample bounds could be performed on benchmark systems with fully known ground-truth dynamics.
  • The nonparametric rate opens the possibility of scaling the approach to higher-dimensional or more complex dynamical systems.

Load-bearing premise

Valid instruments can be synthesized from the observed data alone while preserving the finite-sample L^p consistency guarantees under the linearity-in-parameters assumption.

What would settle it

A controlled simulation with known true parameters in which the estimator's parameter error fails to shrink in proportion to 1/√n (the inverse square root of the sample size), or in which the bias stays bounded away from zero as the sample size grows.
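A minimal version of that check, assuming a scalar AR(1) state x_{t+1} = a x_t + w_t observed as y_t = x_t + v_t and the classical lagged-observation instrument y_{t-1} (not the paper's synthesized instruments): if the error scales like 1/√n, quadrupling n should roughly halve the root-mean-squared error. All names and settings are illustrative.

```python
import math
import random


def lagged_iv_estimate(n, rng, a=0.8, sigma_w=1.0, sigma_v=1.0, burn_in=200):
    """Simulate one trajectory of n noisy observations and return the lagged-IV estimate of a."""
    x = 0.0
    for _ in range(burn_in):
        x = a * x + rng.gauss(0.0, sigma_w)
    y_prev2 = y_prev = None
    num = den = 0.0
    for _ in range(n):
        y = x + rng.gauss(0.0, sigma_v)
        if y_prev2 is not None:
            num += y_prev2 * y       # instrument y_{t-1} times target y_{t+1}
            den += y_prev2 * y_prev  # instrument y_{t-1} times regressor y_t
        y_prev2, y_prev = y_prev, y
        x = a * x + rng.gauss(0.0, sigma_w)
    return num / den


def rmse(n, reps=200, a=0.8, seed=1):
    """Monte Carlo root-mean-squared error of the lagged-IV estimate at sample size n."""
    rng = random.Random(seed)
    errors = [lagged_iv_estimate(n, rng, a=a) - a for _ in range(reps)]
    return math.sqrt(sum(e * e for e in errors) / reps)


ratio = rmse(500) / rmse(2000)
print(f"RMSE(n=500) / RMSE(n=2000) = {ratio:.2f} (about 2 under 1/sqrt(n) scaling)")
```

A ratio near 2 is consistent with the √n rate; a ratio near 1 would indicate a persistent bias floor.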

Figures

Figures reproduced from arXiv: 2511.09024 by Simon Kuang, Xinfan Lin.

Figure 1. Elementwise marginal kernel density estimates of the sampling distributions of our estimator (dashed) and a baseline estimator (solid). Vertical line indicates ground truth; ticks indicate mean of sampling distribution.

Estimator                        bias (%)    std (%)     rmse (%)
Instrumental Variables (ours)    0.017(8)    96.964(9)   0.800(7)
Least Squares                    2.382(3)    95.040(4)   2.437(3)
Figure 2. Elementwise marginal kernel density estimates of the sampling distributions of our estimator (dashed) and a baseline estimator (solid). Vertical line indicates ground truth; ticks indicate mean of sampling distribution.
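Figure 1's table reports bias, std, and RMSE as percentages. The paper's reporting appendix defines them as follows: all values are normalized by the Frobenius norm of the (pseudo-)true parameter; bias is the Frobenius distance between the mean of the estimator and the (pseudo-)true parameter; std is the quadratic mean of the Frobenius distance between the estimator and its mean. The RMSE definition is cut off in the source, so the sketch below assumes it is the quadratic mean of the distance to the (pseudo-)true parameter, in which case rmse² = bias² + std². The function name and the synthetic draws are illustrative.

```python
import numpy as np


def summarize(estimates, theta_true):
    """Normalized bias, std, and RMSE in percent.

    estimates: shape (reps, d1, d2), one parameter-matrix estimate per replication.
    theta_true: shape (d1, d2), the (pseudo-)true parameter.
    """
    estimates = np.asarray(estimates, dtype=float)
    theta_true = np.asarray(theta_true, dtype=float)
    scale = np.linalg.norm(theta_true)  # Frobenius norm of a 2-D array
    mean = estimates.mean(axis=0)
    bias = np.linalg.norm(mean - theta_true)
    # Quadratic mean of per-replication Frobenius distances to the estimator mean.
    std = np.sqrt(np.mean(np.linalg.norm(estimates - mean, axis=(1, 2)) ** 2))
    # Assumed definition: quadratic mean of distances to the (pseudo-)true parameter.
    rmse = np.sqrt(np.mean(np.linalg.norm(estimates - theta_true, axis=(1, 2)) ** 2))
    return {"bias": 100 * bias / scale, "std": 100 * std / scale, "rmse": 100 * rmse / scale}


# Synthetic Monte Carlo draws for illustration only (not the paper's data).
rng = np.random.default_rng(0)
theta_true = np.array([[1.0, 0.0], [0.5, -2.0]])
estimates = theta_true + 0.01 + 0.05 * rng.standard_normal((1000, 2, 2))
metrics = summarize(estimates, theta_true)
print({k: round(float(v), 3) for k, v in metrics.items()})
```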
Original abstract

Instrumental variables (IV) eliminate the bias that afflicts least-squares identification of dynamical systems through noisy data, yet traditionally rely on external instruments that are seldom available for nonlinear time series data. We propose an IV estimator that synthesizes instruments from the data. We establish finite-sample $L^{p}$ consistency for all $p \ge 1$ in both discrete- and continuous-time models, recovering a nonparametric $\sqrt{n}$-convergence rate. On a forced Lorenz system our estimator reduces parameter bias by 200x (continuous-time) and 500x (discrete-time) relative to least squares and reduces RMSE by up to tenfold. Because the method only assumes that the model is linear in the unknown parameters, it is broadly applicable to modern sparsity-promoting dynamics learning models.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes synthesizing instrumental variables directly from observed noisy trajectories for identifying dynamical systems that are linear in the parameters. It claims to establish finite-sample L^p consistency (all p ≥ 1) for both discrete- and continuous-time models, recovering a nonparametric √n rate, and reports large bias reductions (200× continuous-time, 500× discrete-time) plus up to 10× RMSE improvement versus least squares on forced Lorenz examples.

Significance. If the finite-sample L^p bounds hold under the internal instrument construction, the result would be significant for system identification: it supplies non-asymptotic guarantees without external instruments and applies directly to sparsity-promoting models. The all-p coverage and √n rate are strong if the orthogonality step is rigorous.

major comments (2)
  1. [Main consistency theorem / instrument construction section] The finite-sample L^p consistency (abstract and main theorem) requires that the data-synthesized instrument matrix Z satisfies E[Z^T e] = 0 exactly (or with a remainder that does not degrade the rate). When Z is built from the same noisy y and u trajectories (lags, filtered versions, or basis projections), this moment condition is not automatic under process/measurement noise; the linearity-in-parameters assumption alone does not guarantee it. Please cite the specific lemma or assumption that establishes exact orthogonality for the non-asymptotic bound, especially in the continuous-time case.
  2. [Numerical experiments on Lorenz system] Table or figure reporting Lorenz results: the 200×/500× bias reductions and RMSE gains are presented without visible Monte-Carlo count, data-exclusion rules, or error bars. If these metrics are used to support the practical value of the √n rate, the experimental protocol must be stated so that post-hoc choices can be ruled out.
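The first comment's concern can be checked numerically in the simplest case. In a scalar AR(1) errors-in-variables model assumed here (x_{t+1} = a x_t + w_t, y_t = x_t + v_t, independent white noises), the regression error at the true a is e_t = y_{t+1} - a y_t = w_t + v_{t+1} - a v_t. Using the noisy regressor y_t itself as an instrument gives E[y_t e_t] = -a σ_v² ≠ 0, whereas the lagged observation y_{t-1} gives E[y_{t-1} e_t] = 0. This sketch is illustrative only; it does not reproduce the paper's filtered instruments or the continuous-time case.

```python
import random


def moment_check(n=200_000, a=0.8, sigma_w=1.0, sigma_v=1.0, seed=2):
    """Sample averages of z_t * e_t for a contemporaneous and a lagged instrument."""
    rng = random.Random(seed)
    x = 0.0
    for _ in range(500):  # burn-in toward the stationary distribution
        x = a * x + rng.gauss(0.0, sigma_w)
    ys = []
    for _ in range(n):
        ys.append(x + rng.gauss(0.0, sigma_v))
        x = a * x + rng.gauss(0.0, sigma_w)
    m = n - 2
    # e_t = y_{t+1} - a * y_t evaluated at the true a
    contemporaneous = sum(ys[t] * (ys[t + 1] - a * ys[t]) for t in range(1, n - 1)) / m
    lagged = sum(ys[t - 1] * (ys[t + 1] - a * ys[t]) for t in range(1, n - 1)) / m
    return contemporaneous, lagged


contemporaneous, lagged = moment_check()
print(f"mean y_t e_t     = {contemporaneous:+.3f}  (theory: -a*sigma_v^2 = -0.8)")
print(f"mean y_(t-1) e_t = {lagged:+.3f}  (theory: 0)")
```

The lagged instrument works here only because the noises are white; with colored measurement noise y_{t-1} would again correlate with e_t, which is why the orthogonality condition needs an explicit assumption or lemma.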
minor comments (2)
  1. [Notation and preliminaries] Define the precise L^p norm (vector or matrix) and the probability space on which the finite-sample bound is taken.
  2. [Introduction] Add a short discussion of how the method relates to existing data-driven IV approaches for time series (e.g., lagged-state instruments under whiteness).
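For reference, one standard way to make the first minor comment precise is sketched below. The conventions are assumptions (Euclidean or Frobenius norm inside the expectation), not the paper's stated definitions, and the √n line shows the general shape a finite-sample rate would take rather than the paper's exact bound. The subgaussian moment bound for a real-valued ξ is a textbook fact (see Vershynin, High-Dimensional Probability) of the kind used to convert subgaussian tail bounds into L^p bounds.

```latex
\begin{aligned}
\|\xi\|_{L^p} &:= \big(\mathbb{E}\,\|\xi\|^p\big)^{1/p}, \qquad p \ge 1,\\
\text{$L^p$ consistency:}\quad & \|\hat\theta_n - \theta^\star\|_{L^p} \to 0 \ \text{ as } n \to \infty,\\
\text{finite-sample $\sqrt{n}$ rate:}\quad & \|\hat\theta_n - \theta^\star\|_{L^p} \le \frac{C_p}{\sqrt{n}} \ \text{ for every } n,\\
\text{subgaussian moments:}\quad & \|\xi\|_{\psi_2} \le K \;\Longrightarrow\; \|\xi\|_{L^p} \le C K \sqrt{p}.
\end{aligned}
```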

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful and constructive review of our manuscript. The comments have helped us identify areas where additional clarity would strengthen the presentation. We address each major comment below and have revised the manuscript accordingly.

Point-by-point responses
  1. Referee: [Main consistency theorem / instrument construction section] The finite-sample L^p consistency (abstract and main theorem) requires that the data-synthesized instrument matrix Z satisfies E[Z^T e] = 0 exactly (or with a remainder that does not degrade the rate). When Z is built from the same noisy y and u trajectories (lags, filtered versions, or basis projections), this moment condition is not automatic under process/measurement noise; the linearity-in-parameters assumption alone does not guarantee it. Please cite the specific lemma or assumption that establishes exact orthogonality for the non-asymptotic bound, especially in the continuous-time case.

    Authors: We appreciate the referee drawing attention to this central requirement. The exact orthogonality condition E[Z^T e] = 0 is stated in Assumption 2.3 and is established rigorously in Lemma 3.2 (discrete time) and Lemma 4.1 (continuous time). These lemmas demonstrate that the data-synthesized instruments—constructed via lagged filtered versions or basis projections of the observed trajectories—remain uncorrelated with the composite noise term e because the measurement noise is independent of the underlying deterministic state and forcing input. The finite-sample L^p bound in the main theorem (Theorem 3.1) then follows from this moment condition together with the uniform boundedness assumptions on the regressor and instrument matrices. To make the dependence explicit, we have inserted direct cross-references to Lemmas 3.2 and 4.1 immediately after the statement of the main consistency result. revision: partial

  2. Referee: [Numerical experiments on Lorenz system] Table or figure reporting Lorenz results: the 200×/500× bias reductions and RMSE gains are presented without visible Monte-Carlo count, data-exclusion rules, or error bars. If these metrics are used to support the practical value of the √n rate, the experimental protocol must be stated so that post-hoc choices can be ruled out.

    Authors: We agree that full transparency in the experimental protocol is necessary to substantiate the reported performance improvements. In the revised manuscript we have expanded Section 5.2 with a dedicated paragraph that specifies: (i) the use of 1000 independent Monte Carlo realizations, (ii) the exact parameter values and integration scheme employed to generate the forced Lorenz trajectories, (iii) confirmation that no observations were excluded beyond routine numerical stability checks, and (iv) the addition of standard-error bars to all bias and RMSE plots. These details ensure that the observed bias reductions (approximately 200× continuous-time, 500× discrete-time) and RMSE gains are reproducible and not the result of selective reporting. revision: yes
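Values in Figure 1's table such as 0.017(8) appear to use the concise uncertainty convention, where the parenthesized digits give the uncertainty in the last digits shown (so 0.017(8) would read as 0.017 ± 0.008); that reading is an assumption, since the page does not state it. A minimal sketch of computing a Monte Carlo mean with its standard error and printing it in that notation follows; `mean_and_se` and `concise` are hypothetical helpers.

```python
import math
import statistics


def mean_and_se(samples):
    """Monte Carlo mean and its standard error (sample std / sqrt(number of replications))."""
    m = statistics.fmean(samples)
    se = statistics.stdev(samples) / math.sqrt(len(samples))
    return m, se


def concise(value, uncertainty, decimals):
    """Format as e.g. '0.017(8)': the parenthesized digits are the uncertainty in the last decimals."""
    scaled = round(uncertainty * 10 ** decimals)
    # Never print a zero uncertainty; show at least one unit in the last digit.
    return f"{value:.{decimals}f}({max(scaled, 1)})"


print(concise(0.0172, 0.0081, 3))  # 0.017(8)
```

With 1000 replications, the standard error of each reported mean is the replication spread divided by √1000 ≈ 31.6, which is what such error bars would display.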

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained mathematical analysis

Full rationale

The paper derives finite-sample L^p consistency and nonparametric sqrt(n) rates for its synthesized-instrument IV estimator via direct analysis of the linear-in-parameters regression under stated moment conditions. The central claims rest on explicit assumptions about instrument validity and noise properties rather than reducing by construction to parameters fitted from the target data or to self-citations that bear the load of the uniqueness or rate results. No equations or steps in the abstract or described claims equate the reported consistency bounds to inputs chosen from the same trajectories, and the method is presented as broadly applicable under linearity without smuggling ansatzes or renaming known empirical patterns as new derivations.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption that the unknown dynamics are linear in the parameters and that instruments synthesized from the data remain valid for the consistency proof.

axioms (1)
  • domain assumption: The model is linear in the unknown parameters
    Explicitly stated in the abstract as the sole modeling assumption enabling broad applicability.
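As an example of this structure, the standard unforced Lorenz system (the forcing used in the paper's experiments is not specified on this page) factors into a parameter matrix times a fixed nonlinear feature vector. The unknowns σ, ρ, β therefore enter linearly even though the dynamics are nonlinear.

```latex
\frac{d}{dt}\begin{bmatrix} x \\ y \\ z \end{bmatrix}
= \begin{bmatrix} \sigma(y - x) \\ x(\rho - z) - y \\ xy - \beta z \end{bmatrix}
= \underbrace{\begin{bmatrix}
-\sigma & \sigma & 0 & 0 & 0 \\
\rho & -1 & 0 & -1 & 0 \\
0 & 0 & -\beta & 0 & 1
\end{bmatrix}}_{\Theta}
\underbrace{\begin{bmatrix} x \\ y \\ z \\ xz \\ xy \end{bmatrix}}_{\phi(x,\,y,\,z)}
```

In a sparse-regression setting every entry of Θ, including the known 0, 1, and -1 entries, would be treated as a free coefficient over the library φ.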


