Gradient-based filtering under misspecification: Stability and error bounds

Bram van Os; Dick van Dijk; Rutger-Jan Lange; Simon Donker van Heel

arXiv: 2502.05021 · v8 · submitted 2025-02-07 · stat.ME · eess.SP · stat.ML


Pith reviewed 2026-05-23 03:25 UTC · model grok-4.3

Classification: stat.ME · eess.SP · stat.ML

Keywords: gradient-based filters · score-driven filters · model misspecification · exponential stability · error bounds · time-varying parameters · filtering


The pith

Gradient-based filters achieve exponential stability when tracking time-varying parameters even under model misspecification.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that both explicit and implicit gradient-based filters can track multidimensional time-varying parameters under noisy observations and model misspecification. It derives sufficient conditions for exponential stability of the filtered parameter path that hold independently of the data-generating process. Under mild additional moment conditions, finite-sample and asymptotic mean squared error bounds are obtained relative to the pseudo-true parameter path. Implicit filters satisfy these guarantees under weaker restrictions than explicit filters, which additionally require a Lipschitz continuous score and a sufficiently small learning rate.

Core claim

For both explicit and implicit gradient-based filters, novel sufficient conditions ensure exponential stability of the filtered parameter path independently of the data-generating process. Under mild moment conditions on the data-generating process, finite-sample and asymptotic mean squared error bounds hold relative to the pseudo-true parameter path, with implicit filters satisfying these under weaker parameter restrictions.

What carries the argument

The gradient of the logarithm of the postulated observation density (the score), evaluated at the predicted parameter for explicit filters or at the updated parameter for implicit filters.
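A minimal Python sketch of the two update rules, under assumptions not taken from the paper: a scalar Gaussian location model y_t ~ N(θ, σ²), whose score is (y − θ)/σ², and a random-walk prediction step in which the predicted parameter equals the previous update. Because this score is linear in θ, the implicit update has a closed form. All function names are hypothetical.

```python
def gaussian_score(y: float, theta: float, sigma2: float) -> float:
    """Score of the postulated N(theta, sigma2) density: d/dtheta log p(y | theta)."""
    return (y - theta) / sigma2


def explicit_update(theta_pred: float, y: float, eta: float, sigma2: float) -> float:
    # Explicit filter: score evaluated at the predicted parameter.
    return theta_pred + eta * gaussian_score(y, theta_pred, sigma2)


def implicit_update(theta_pred: float, y: float, eta: float, sigma2: float) -> float:
    # Implicit filter: score evaluated at the updated parameter, i.e. solve
    # theta_upd = theta_pred + eta * (y - theta_upd) / sigma2 for theta_upd.
    c = eta / sigma2
    return (theta_pred + c * y) / (1.0 + c)


def run_filter(update, ys, theta0, eta, sigma2):
    """Filter a sequence with a random-walk prediction (theta_{t+1|t} = theta_{t|t})."""
    theta, path = theta0, []
    for y in ys:
        theta = update(theta, y, eta, sigma2)
        path.append(theta)
    return path


# A learning rate that is too large for the explicit filter (eta / sigma2 = 3).
ys = [0.0] * 20
explicit_path = run_filter(explicit_update, ys, theta0=1.0, eta=3.0, sigma2=1.0)
implicit_path = run_filter(implicit_update, ys, theta0=1.0, eta=3.0, sigma2=1.0)
print(explicit_path[-1], implicit_path[-1])
```

With η/σ² = 3 the explicit recursion multiplies the distance to the data by 1 − 3 = −2 at every step, so after 20 steps it has grown to 2^20, while the implicit recursion divides it by 1 + 3 = 4 and shrinks it toward zero.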

If this is right

Stability holds independently of the data-generating process whenever the pseudo-true path exists.
Implicit filters meet the stability and error bounds under weak parameter restrictions.
Explicit filters additionally require a Lipschitz continuous score and a sufficiently small learning rate.
Finite-sample and asymptotic MSE bounds relative to the pseudo-true path follow from the mild moment conditions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

These stability results could extend to online updating in streaming data applications where models are known to be approximate.
The preference for implicit over explicit filters may influence choice of update rule in real-time econometric monitoring.
The framework suggests testing whether similar stability carries over when the postulated density is replaced by other loss functions.

Load-bearing premise

The existence of a pseudo-true parameter path under misspecification together with mild moment conditions on the data-generating process.

What would settle it

A dataset or simulation in which the filtered parameter path diverges exponentially even though a pseudo-true path exists, moments are satisfied, and for explicit filters the score is Lipschitz with small learning rate.
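A minimal numerical version of this test can be sketched under explicit assumptions: a scalar filter that postulates Gaussian observations (score (y − θ)/σ², with c = η/σ²), data from a hypothetical misspecified process (a drifting level plus Laplace noise), and two filtered paths fed the same data but started at different values. Because the score is linear here, the gap between the paths evolves by the fixed factor |1 − c| for the explicit filter and 1/(1 + c) for the implicit one, whatever the data; a growing gap is exactly the divergence the criterion describes. The setup and names are illustrative, not the paper's simulation design.

```python
import random


def explicit_update(theta_pred: float, y: float, c: float) -> float:
    # Postulated N(theta, sigma2) score (y - theta) / sigma2, with c = eta / sigma2.
    return theta_pred + c * (y - theta_pred)


def implicit_update(theta_pred: float, y: float, c: float) -> float:
    # Closed-form solution of theta = theta_pred + c * (y - theta).
    return (theta_pred + c * y) / (1.0 + c)


def gap_path(update, ys, theta_a, theta_b, c):
    """Distance between two filtered paths fed the same data but started apart."""
    gaps = []
    for y in ys:
        theta_a = update(theta_a, y, c)
        theta_b = update(theta_b, y, c)
        gaps.append(abs(theta_a - theta_b))
    return gaps


# Hypothetical misspecified DGP: drifting level plus Laplace(0, 1) noise,
# while the filter postulates Gaussian observations.
rng = random.Random(0)
level, ys = 0.0, []
for _ in range(50):
    level += rng.gauss(0.0, 0.1)
    ys.append(level + rng.choice([-1.0, 1.0]) * rng.expovariate(1.0))

gaps_implicit = gap_path(implicit_update, ys, -5.0, 5.0, c=3.0)
gaps_explicit = gap_path(explicit_update, ys, -5.0, 5.0, c=3.0)
print(gaps_implicit[9], gaps_explicit[9])
```

In this toy case the initial gap of 10 has shrunk to 10/4^10 after ten implicit steps and grown to 10 · 2^10 after ten explicit steps, independently of the simulated data; with a Lipschitz score and a small enough c, the explicit gap would contract as well.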

Figures

Figures reproduced from arXiv: 2502.05021 by Bram van Os, Dick van Dijk, Rutger-Jan Lange, Simon Donker van Heel.

**Figure 1.** Semilog plots of empirical MSEs and MSE bounds (dotted) for least-squares recovery with respect to the time step t, learning rate η, state volatility σ_ξ, and Lipschitz gradient constant β, with average errors computed at horizon T = 500 for the latter three plots. Empirical averages are computed over 1,000 replications. Unless stated otherwise, parameters are k = 50, n = 100, α = β = 1, σ = 10, and σ_ξ = 1,…

**Figure 2.** Semilog plots of guaranteed bounds and empirical tracking errors for least-squares recovery with respect to iteration t, Lipschitz gradient constant β, state dimension k, and observation dimension n, with average errors computed at horizon T = 500 for the latter three plots. Empirical averages are computed over 1,000 trials. Default parameter values: α = 1, β = 40, σ = 10, σ_ξ = 1, η = η⋆, k = 50, n = 100. …

**Figure 3.** Plots of the out-of-sample empirical errors and guaranteed bounds for tracking (the logarithm of the rate of) a dynamic Poisson distribution (i.e., the true state {ϑ_t}) with respect to its variation σ_ξ. The left-hand plot shows the Kullback-Leibler (KL) divergence between the postulated and true densities p(·|µ_{t|t}) and p⁰(·|µ_t), where µ_{t|t} and µ_t denote the filtered and true rates, respectively. The righ…
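Figures 1 and 2 concern least-squares recovery. The sketch below is a scaled-down, standard-library-only analogue under assumed dynamics that the captions do not fully specify: a random-walk state of dimension k, observations y_t = X_t ϑ_t + ε_t with Gaussian X_t and ε_t, and the objective ½‖y_t − X_t θ‖². The explicit step is θ + η X_tᵀ(y_t − X_t θ); the implicit step solves (I + η X_tᵀX_t) θ⁺ = θ̂ + η X_tᵀ y_t. Dimensions, noise levels, learning rates and all helper names are illustrative choices, not the paper's.

```python
import random


def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def explicit_ls(theta, X, y, eta):
    # Gradient of -0.5 * ||y - X theta||^2 evaluated at the predicted parameter.
    resid = [yi - pi for yi, pi in zip(y, matvec(X, theta))]
    return [t + eta * g for t, g in zip(theta, matvec(transpose(X), resid))]


def implicit_ls(theta, X, y, eta):
    # Gradient evaluated at the updated parameter: (I + eta X^T X) theta_new = theta + eta X^T y.
    Xt = transpose(X)
    k = len(theta)
    XtX = [[sum(a * b for a, b in zip(Xt[i], Xt[j])) for j in range(k)] for i in range(k)]
    A = [[(1.0 if i == j else 0.0) + eta * XtX[i][j] for j in range(k)] for i in range(k)]
    rhs = [t + eta * v for t, v in zip(theta, matvec(Xt, y))]
    return solve(A, rhs)


def simulate(update, eta, k=5, n=10, T=200, sigma=0.1, sigma_xi=0.01, seed=1):
    """Return the squared tracking error ||theta_T - state_T||^2 after T steps."""
    rng = random.Random(seed)
    state = [rng.gauss(0.0, 1.0) for _ in range(k)]
    theta = [10.0] * k  # deliberately poor initialisation
    for _ in range(T):
        state = [s + rng.gauss(0.0, sigma_xi) for s in state]  # random-walk state
        X = [[rng.gauss(0.0, n ** -0.5) for _ in range(k)] for _ in range(n)]
        y = [m + rng.gauss(0.0, sigma) for m in matvec(X, state)]  # noisy observations
        theta = update(theta, X, y, eta)  # predicted parameter = last update
    return sum((t - s) ** 2 for t, s in zip(theta, state))


print(simulate(implicit_ls, eta=1.0), simulate(implicit_ls, eta=5.0), simulate(explicit_ls, eta=0.5))
```

Starting far from the truth, the implicit recursion in this toy setup recovers the state for both a moderate and a large η; the explicit recursion can only be expected to behave when η is small relative to the largest eigenvalue of X_tᵀX_t.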

Original abstract

Can stochastic gradient methods track a moving target? We study the problem of tracking multidimensional time-varying parameters under noisy observations and possible model misspecification. Gradient-based filters update the time-varying parameters using the gradient of a postulated objective function. A natural filtering objective is the logarithm of the postulated observation density, which gives rise to the widely used class of score-driven filters. As in the optimization literature, these filters come in two forms: explicit filters evaluate the gradient at the predicted parameter, whereas implicit filters evaluate it at the updated parameter. For both filter types, we derive novel sufficient conditions for exponential stability of the filtered parameter path, showing that stability can be guaranteed independently of the data-generating process. Under mild additional moment conditions on the data-generating process, we also obtain finite-sample and asymptotic mean squared error bounds relative to the pseudo-true parameter path. For implicit filters, these guarantees hold under weak parameter restrictions. For explicit filters, they additionally require Lipschitz continuity of the score and a sufficiently small learning rate. Simulation studies support our theoretical findings and show that implicit gradient filters outperform explicit ones in both accuracy and stability.
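When the score is nonlinear in the parameter, the implicit update has no closed form and must be computed numerically. As an illustration only, the sketch below assumes a Poisson observation density with log-rate θ (the kind of count model behind Figure 3), whose score is y − exp(θ). The implicit update then solves θ = θ̂ + η(y − exp(θ)). The difference between the two sides is strictly increasing and convex in θ, so the root is unique, and Newton's method started to its right converges monotonically. The function names and the Newton solver are assumptions of this sketch, not the authors' implementation.

```python
import math


def poisson_score(y: int, theta: float) -> float:
    """Score of the postulated Poisson(exp(theta)) density w.r.t. the log-rate theta."""
    return y - math.exp(theta)


def explicit_poisson(theta_pred: float, y: int, eta: float) -> float:
    # Explicit filter: score evaluated at the predicted log-rate.
    return theta_pred + eta * poisson_score(y, theta_pred)


def implicit_poisson(theta_pred: float, y: int, eta: float,
                     tol: float = 1e-12, max_iter: int = 100) -> float:
    """Implicit filter: solve x = theta_pred + eta * (y - exp(x)) for x.

    f(x) = x - theta_pred - eta * y + eta * exp(x) is strictly increasing and
    convex. Any start with f(x) >= 0 (here max(theta_pred, log y)) lies right
    of the unique root, and Newton's iterates then decrease monotonically to it.
    """
    x = max(theta_pred, math.log(y)) if y > 0 else theta_pred
    for _ in range(max_iter):
        fx = x - theta_pred - eta * y + eta * math.exp(x)
        step = fx / (1.0 + eta * math.exp(x))
        x -= step
        if abs(step) < tol:
            break
    return x


# One unusually large count hitting a filter whose predicted log-rate is 0.
print(explicit_poisson(0.0, 50, eta=1.0))  # 0 + 1 * (50 - exp(0)) = 49.0
print(implicit_poisson(0.0, 50, eta=1.0))  # root of x + exp(x) = 50, about 3.83
```

The explicit step jumps the log-rate to 49, so exp(49) would dominate the next update; the implicit step moves only to about 3.83, where the updated rate exp(θ) ≈ 46 is consistent with the observed count.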

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper supplies new sufficient conditions for exponential stability of gradient filters that hold independently of the DGP, plus MSE bounds, but the uniformity of the Lipschitz requirement for explicit filters is left unclear in the abstract.


The main new pieces are the sufficient conditions for exponential stability of the filtered path that do not depend on the data-generating process, plus finite-sample and asymptotic MSE bounds relative to the pseudo-true path. They separate explicit and implicit versions cleanly, with weaker requirements for the implicit case (just parameter restrictions) and extra Lipschitz-plus-small-step-size conditions for explicit ones. The simulations are said to show implicit filters winning on both accuracy and stability, which lines up with the theory and gives a practical takeaway for people running score-driven filters in time series work.

Referee Report

2 major / 2 minor

Summary. The paper studies gradient-based filters (explicit and implicit) for tracking multidimensional time-varying parameters under noisy observations and model misspecification. It derives novel sufficient conditions for exponential stability of the filtered parameter path that hold independently of the data-generating process, along with finite-sample and asymptotic MSE bounds relative to the pseudo-true path under mild moment conditions on the DGP. For implicit filters the guarantees require only weak parameter restrictions; for explicit filters they additionally require Lipschitz continuity of the score and a sufficiently small learning rate. Simulation studies are presented to support the theory.

Significance. If the stability and bound derivations hold with the claimed independence from the DGP, the results would supply useful theoretical justification for score-driven filters in misspecified environments, extending contraction-mapping ideas from optimization to filtering. The explicit/implicit distinction and the provision of both stability and error bounds are constructive contributions.

major comments (2)

[Abstract / explicit-filter stability theorem] Abstract and statement of main stability result for explicit filters: the claim that exponential stability holds independently of the DGP requires a uniform contraction mapping. If the Lipschitz constant L(y) of the score is permitted to depend on the observation y, the one-step contraction factor of the map θ ↦ θ − η · score(θ, y) is not uniform over observation sequences; the paper must therefore clarify whether the Lipschitz condition is required to be uniform in y (with L independent of y) or only pointwise, and show how uniformity is obtained from the stated assumptions.
[Explicit-filter stability derivation] Section deriving the explicit-filter stability (likely §3 or §4): the proof sketch that stability is independent of the DGP appears to rest on a contraction whose rate is controlled by ηL; without an explicit uniform bound on L (or a demonstration that the moment conditions already imply such a bound), the independence claim is not yet load-bearing.
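The role of ηL in this point can be illustrated in the simplest case, as a sketch under an assumption the paper need not make: a negative log density that is quadratic in θ, with Hessian eigenvalues in [α, β] at every time step. In least squares, for instance, the Hessian X_tᵀX_t does not involve y_t, so the Lipschitz constant L = β is automatically uniform in y. A gradient step θ ↦ θ − ηHθ then contracts by max(|1 − ηα|, |1 − ηβ|), while the implicit step θ ↦ (I + ηH)⁻¹θ contracts by 1/(1 + ηα) for any η > 0. The function names are hypothetical.

```python
def explicit_factor(eta: float, alpha: float, beta: float) -> float:
    """Worst-case one-step contraction of theta -> theta - eta * H theta when the
    eigenvalues of H lie in [alpha, beta]; max |1 - eta * lam| sits at an endpoint."""
    return max(abs(1.0 - eta * alpha), abs(1.0 - eta * beta))


def implicit_factor(eta: float, alpha: float, beta: float) -> float:
    """Worst-case contraction of theta -> (I + eta * H)^{-1} theta; beta does not enter."""
    return 1.0 / (1.0 + eta * alpha)


# alpha = 1 and beta = 40 mirror the defaults quoted for Figure 2 (an illustrative choice).
for eta in (0.01, 2 / 41, 0.049, 0.051, 0.5):
    print(f"eta={eta:.4f}  explicit={explicit_factor(eta, 1, 40):.3f}  "
          f"implicit={implicit_factor(eta, 1, 40):.3f}")
```

In this toy setting the explicit factor drops below one only for η < 2/β and is smallest at η = 2/(α + β), whereas the implicit factor is below one for every η > 0 when α > 0, mirroring the abstract's contrast between a sufficiently small learning rate for explicit filters and weak restrictions for implicit ones.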

minor comments (2)

[Introduction / notation section] Notation for the pseudo-true path should be introduced earlier and used consistently when stating the MSE bounds.
[Simulation studies] The simulation section would benefit from reporting the exact learning-rate values used and whether they satisfy the small-η condition derived in the theory.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and the constructive comments on the stability claims. We address the two major comments below and will revise the manuscript to improve clarity on the uniformity of the Lipschitz condition.


Referee: [Abstract / explicit-filter stability theorem] Abstract and statement of main stability result for explicit filters: the claim that exponential stability holds independently of the DGP requires a uniform contraction mapping. If the Lipschitz constant L(y) of the score is permitted to depend on the observation y, the one-step contraction factor of the map θ ↦ θ − η · score(θ, y) is not uniform over observation sequences; the paper must therefore clarify whether the Lipschitz condition is required to be uniform in y (with L independent of y) or only pointwise, and show how uniformity is obtained from the stated assumptions.

Authors: We agree that the contraction must be uniform for the DGP-independence claim to hold. The manuscript assumes a uniform Lipschitz condition on the score (i.e., there exists L < ∞ independent of y such that ||score(θ, y) - score(θ', y)|| ≤ L ||θ - θ'|| for all y). This is stated in the assumptions for the explicit-filter theorem and ensures the one-step map is a uniform contraction with factor controlled by ηL. We will revise the abstract and theorem statement to explicitly note that the Lipschitz condition is uniform in y, and add a short remark in the proof section showing how this yields the claimed DGP-independent stability. (Revision: yes.)
Referee: [Explicit-filter stability derivation] Section deriving the explicit-filter stability (likely §3 or §4): the proof sketch that stability is independent of the DGP appears to rest on a contraction whose rate is controlled by ηL; without an explicit uniform bound on L (or a demonstration that the moment conditions already imply such a bound), the independence claim is not yet load-bearing.

Authors: The uniform bound on L is supplied directly by the Lipschitz assumption in the theorem statement for explicit filters; this assumption is part of the sufficient conditions and does not depend on the DGP. The mild moment conditions on the DGP are used only for the subsequent MSE bounds, not for establishing stability. We will revise the derivation section to explicitly isolate the contraction step, state that the rate ηL is uniform by assumption, and thereby confirm that stability holds independently of the data-generating process. (Revision: yes.)

Circularity Check

0 steps flagged

No significant circularity; derivations rest on external assumptions


The paper's central results derive sufficient conditions for exponential stability of gradient-based filters (explicit and implicit) and associated MSE bounds relative to a pseudo-true path. These rest on stated assumptions including Lipschitz continuity of the score (for explicit filters), small learning rate, and mild moment conditions on the DGP. No quoted step reduces the claimed stability or bounds by construction to quantities fitted from the same data, nor does any load-bearing premise collapse to a self-citation chain or ansatz smuggled via prior work. The pseudo-true path is defined by the postulated model under misspecification, which is a standard external benchmark rather than a fitted input renamed as prediction. The derivation chain is therefore self-contained against the listed assumptions.

Axiom & Free-Parameter Ledger

1 free parameters · 3 axioms · 0 invented entities

The central claims rest on standard mathematical assumptions from dynamical systems and optimization theory, plus domain assumptions about pseudo-true paths and moments. The abstract reports no explicitly fitted free parameters, although the learning rate must be sufficiently small for explicit filters.

free parameters (1)

learning rate
Must be sufficiently small for explicit filter stability guarantees.

axioms (3)

Domain assumption: existence of a pseudo-true parameter path under the postulated model.
Error bounds are defined relative to this path.
Domain assumption: mild moment conditions on the data-generating process.
Required to obtain finite-sample and asymptotic MSE bounds.
Domain assumption: Lipschitz continuity of the score function.
Additional requirement stated for explicit filters.

pith-pipeline@v0.9.0 · 5734 in / 1399 out tokens · 34820 ms · 2026-05-23T03:25:09.475928+00:00 · methodology


