Implicit score-driven filters for time-varying parameter models

Bram van Os; Dick van Dijk; Rutger-Jan Lange

arXiv: 2512.02744 · v3 · submitted 2025-12-02 · 📊 stat.ME · econ.EM · stat.AP

Pith reviewed 2026-05-17 02:20 UTC · model grok-4.3

classification 📊 stat.ME · econ.EM · stat.AP

keywords: score-driven filters · time-varying parameters · implicit updates · log-concave densities · observation-driven models · stability analysis · stochastic gradient · pseudo-true parameter

The pith

Implicit score-driven filters remain stable for all learning rates when observation densities are log-concave.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces an implicit score-driven update for observation-driven models whose parameters evolve over time. Instead of linearly approximating the log observation density around the predicted parameter as in explicit score-driven models, the new method solves an optimization problem that maximizes the full log-density while penalizing deviation from the one-step-ahead prediction. This implicit update preserves the entire shape of the density rather than truncating it to a first-order expansion. For log-concave densities, the resulting filter stays stable no matter how large the learning rate is chosen, and every update reduces mean squared error toward the pseudo-true parameter value. The guarantee holds whether the model is correctly specified or not.

Core claim

The central claim is that the implicit stochastic-gradient update, obtained by maximizing the logarithmic observation density subject to a quadratic penalty relative to the predicted parameter, produces a filter whose updates are contractive in mean squared error toward the (pseudo-)true parameter at every time step and remain stable for all learning rates, provided the observation densities are log-concave.

What carries the argument

The implicit score-driven (ISD) update, defined as the maximizer of the current log observation density minus a weighted L2 penalty on the distance to the one-step-ahead predicted parameter.
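
In symbols, the update described above can be sketched as follows. The notation is an assumption of this note rather than the paper's own: \theta_{t|t-1} is the one-step-ahead prediction, \theta_{t|t} the updated parameter, and P_t a positive-definite penalty matrix whose inverse plays the role of the learning rate.

```latex
% ISD update: penalized maximization of the log observation density
\theta_{t|t} = \arg\max_{\theta}
  \Big\{ \log p(y_t \mid \theta)
  - \tfrac{1}{2} (\theta - \theta_{t|t-1})^{\top} P_t \, (\theta - \theta_{t|t-1}) \Big\}

% First-order condition: the score is evaluated at the update itself (implicit)
\theta_{t|t} = \theta_{t|t-1} + P_t^{-1} \, \nabla_{\theta} \log p(y_t \mid \theta) \big|_{\theta = \theta_{t|t}}

% Linearizing \log p around \theta_{t|t-1} moves the score to the prediction (explicit)
\theta_{t|t} = \theta_{t|t-1} + P_t^{-1} \, \nabla_{\theta} \log p(y_t \mid \theta) \big|_{\theta = \theta_{t|t-1}}
```

The only difference between the second and third lines is the point at which the score is evaluated, which is why the update is called implicit.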

If this is right

Explicit score-driven models arise exactly when the log-density is replaced by its linear approximation around the prediction.
The ISD filter extends the local contraction properties of explicit updates to a global setting that holds for arbitrary step sizes.
The same stability and contraction results apply under misspecification as long as log-concavity is preserved.
Finance and macroeconomics applications can use the filter to track time-varying parameters with explicit global guarantees.
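
The link between the implicit and explicit updates in the list above can be illustrated with a small sketch. It assumes a scalar Poisson model with log link, y ~ Poisson(exp(theta)), and a scalar learning rate `eta` (the inverse of the penalty weight); the model, the function names `esd_update` and `isd_update`, and the bisection solver are choices made for this illustration, not taken from the paper.

```python
import math

def esd_update(theta_pred: float, y: int, eta: float) -> float:
    """Explicit step: the score y - exp(theta) is evaluated at the prediction."""
    return theta_pred + eta * (y - math.exp(theta_pred))

def isd_update(theta_pred: float, y: int, eta: float, iters: int = 200) -> float:
    """Implicit step: solve theta = theta_pred + eta * (y - exp(theta)).

    g(theta) = theta - theta_pred - eta * (y - exp(theta)) is strictly
    increasing, so bisection on a valid bracket finds its single root.
    """
    def g(theta: float) -> float:
        return theta - theta_pred - eta * (y - math.exp(theta))

    lo = theta_pred - eta * math.exp(theta_pred)  # g(lo) <= 0
    hi = theta_pred + eta * y                      # g(hi) > 0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# Large learning rate: the explicit step overshoots, the implicit step does not.
print("ESD:", esd_update(0.0, 20, 1.0))
print("ISD:", isd_update(0.0, 20, 1.0), "vs. log(y) =", math.log(20))

# Small learning rate: the two updates nearly coincide.
print("ESD:", esd_update(0.0, 20, 1e-4), "ISD:", isd_update(0.0, 20, 1e-4))
```

In this example the implicit update stays between the prediction and the single-observation maximum-likelihood value log(y), while the explicit update with the same learning rate jumps far beyond it.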

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Practitioners could safely adopt larger learning rates in real-time applications without risking filter instability.
The implicit formulation may generalize to other density classes if a suitable contraction mapping can be established.
The method aligns score-driven filtering more closely with implicit gradient techniques used in optimization.
Performance gains over explicit approximations are most likely to appear in series with strong non-linearities or volatility clusters.
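
The point about larger learning rates can be made concrete with a deliberately simple sketch. It assumes a Gaussian location model with unit variance, noise-free observations equal to a fixed true value, and an identity prediction step (the prediction equals the previous update); these simplifications, and the names `esd_step`, `isd_step`, and `run_filter`, are assumptions of this illustration rather than the paper's setup.

```python
def esd_step(theta: float, y: float, eta: float) -> float:
    """Explicit Gaussian update: the error is multiplied by (1 - eta)."""
    return theta + eta * (y - theta)

def isd_step(theta: float, y: float, eta: float) -> float:
    """Implicit Gaussian update: the error is multiplied by 1 / (1 + eta)."""
    return (theta + eta * y) / (1.0 + eta)

def run_filter(step, theta0: float, y: float, eta: float, steps: int) -> float:
    """Apply the same update repeatedly to a constant observation y."""
    theta = theta0
    for _ in range(steps):
        theta = step(theta, y, eta)
    return theta

theta_star = 1.0
for eta in (0.5, 3.0):
    esd = run_filter(esd_step, 0.0, theta_star, eta, 20)
    isd = run_filter(isd_step, 0.0, theta_star, eta, 20)
    print(f"eta={eta}: |ESD error|={abs(esd - theta_star):.3e}, "
          f"|ISD error|={abs(isd - theta_star):.3e}")
```

In this toy case the explicit recursion is stable only when |1 - eta| < 1, whereas the implicit factor 1 / (1 + eta) is below one for every positive learning rate.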

Load-bearing premise

The observation densities are log-concave so that the implicit update is well-defined and the contraction argument applies at every step.

What would settle it

A simulation or empirical series with a non-log-concave observation density in which raising the learning rate causes the parameter path to diverge or the mean squared error to stop contracting toward the pseudo-true value.

Figures

Figures reproduced from arXiv: 2512.02744 by Bram van Os, Dick van Dijk, Rutger-Jan Lange.

**Figure 2.** Kernel estimate (solid) of estimation errors of the ISD filter’s static parameters

**Figure 3.** Illustration of filtering performance with dynamic Gamma distribution.

**Figure 4.** MSE of out-of-sample predicted states for …

**Figure 5.** Dynamic CAPM using the ISD and ESD filters for MSFT from March 1986 until …

**Figure 6.** Growth-at-risk estimates for the ISD and ESD models for …

**Figure 7.** T-bill spread data with filtered paths and impact curves.

Original abstract

We propose an observation-driven modeling framework that allows model parameters to vary over time through an implicit score-driven (ISD) update. The ISD update maximizes the logarithmic observation density with respect to the parameter vector while penalizing the weighted L2 norm relative to a one-step-ahead predicted parameter. This yields an implicit stochastic-gradient update. We show that the popular class of explicit score-driven (ESD) models arises when the observation log density is linearly approximated around the prediction. By preserving the full density, the ISD update extends the favorable local properties of the ESD update to a global setting. For log-concave observation densities, whether correctly specified or not, the ISD filter is stable for all learning rates, and its updates are contractive in mean squared error toward the (pseudo-)true parameter at every time step. We demonstrate the usefulness of ISD filters in simulations and empirical applications in finance and macroeconomics.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper introduces an implicit score-driven update that uses the full density instead of a linear approximation, but the claimed unconditional MSE contraction at every step does not hold because of irreducible observation noise.

The main contribution is to define the time-varying parameter update as the solution to a penalized optimization problem that maximizes the log observation density while staying close to the one-step prediction. This implicit step reduces to the standard explicit score-driven update when the density is linearized around the prediction, and the authors show that it extends local stability results to a global setting for log-concave densities. That framing is a clean technical move and worth having on record for people already working with score-driven models in econometrics and finance. They correctly note that log-concavity guarantees the penalized objective is strictly concave, so the update exists and is unique for any learning rate, and the filter remains stable. Judged from the abstract and the stress-test math, that part looks solid.

The soft spot is the stronger claim that the updates are contractive in mean squared error toward the pseudo-true parameter at every time step. The Gaussian location example, which satisfies all the stated conditions, produces an expected squared error equal to a factor less than one times the squared prior distance, plus a positive term from the observation variance. Once the current error drops below a threshold set by that variance, the next expected error rises rather than falls. The same additive structure appears for any non-degenerate log-concave density, so the unconditional per-step contraction does not follow from log-concavity alone.

The paper mentions simulations and macro-finance applications, but without the numbers it is hard to judge whether the implicit version delivers measurable gains over explicit alternatives in practice.

This is aimed at researchers who already use or extend score-driven filters for time-varying parameters. A reader in that niche will find the implicit formulation useful even if the contraction statement needs to be restated to account for noise. It is worth sending to a serious referee to verify the proofs and see whether the stability result can be tightened without losing the global flavor.

Referee Report

1 major / 1 minor

Summary. The manuscript proposes implicit score-driven (ISD) filters for time-varying parameter models. The ISD update is obtained by maximizing the log observation density penalized by a weighted L2 distance to the one-step-ahead prediction, producing an implicit stochastic-gradient step. The paper shows that explicit score-driven (ESD) models arise exactly when the log-density is linearly approximated around the prediction. For log-concave observation densities (correctly specified or misspecified), it claims that the ISD filter is stable for all learning rates and that the updates are contractive in mean squared error toward the (pseudo-)true parameter at every time step. Simulations and empirical applications in finance and macroeconomics are presented to illustrate the approach.

Significance. If the stability and contraction results can be established in a form that accounts for irreducible observation noise, the ISD framework would usefully extend the local properties of score-driven models to a global setting while preserving the link to the well-studied ESD class. The reduction of ISD to ESD under linear approximation is a clear presentational strength.

major comments (1)

[Abstract] The claim that 'its updates are contractive in mean squared error toward the (pseudo-)true parameter at every time step' is contradicted by the elementary Gaussian location model, which is log-concave. With unit observation variance and penalty weight λ, the closed-form ISD update yields the conditional expectation E[||θ_new − θ*||² | θ_pred] = [λ/(λ+1)]² ||θ_pred − θ*||² + 1/(λ+1)². Because of the additive positive term, the expected squared error increases whenever ||θ_pred − θ*|| is smaller than 1/√(2λ+1), so the stated unconditional contraction does not hold for all initial distances. This is load-bearing for the global stability result.
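
The referee's formula can be checked numerically with a short sketch. It assumes the setup implied by the formula: observations y ~ N(θ*, 1), a scalar parameter, and the closed-form update θ_new = (y + λ θ_pred)/(1 + λ), where λ is the penalty weight. The function names and the Monte Carlo settings are illustrative choices.

```python
import math
import random

def expected_sq_error(d: float, lam: float) -> float:
    """Closed form a * d^2 + b from the referee's comment."""
    a = (lam / (lam + 1.0)) ** 2
    b = 1.0 / (lam + 1.0) ** 2
    return a * d ** 2 + b

def simulated_sq_error(d: float, lam: float, n: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo estimate of E[(theta_new - theta*)^2 | theta_pred]."""
    rng = random.Random(seed)
    theta_star = 0.0
    theta_pred = theta_star + d
    total = 0.0
    for _ in range(n):
        y = rng.gauss(theta_star, 1.0)  # unit observation variance (assumed)
        theta_new = (y + lam * theta_pred) / (1.0 + lam)
        total += (theta_new - theta_star) ** 2
    return total / n

def threshold(lam: float) -> float:
    """Distance below which the expected squared error increases."""
    return 1.0 / math.sqrt(2.0 * lam + 1.0)

lam = 1.0
for d in (0.2, 2.0):
    print(f"d={d}: prior d^2={d**2:.3f}, "
          f"formula={expected_sq_error(d, lam):.3f}, "
          f"simulated={simulated_sq_error(d, lam):.3f}")
print("threshold:", threshold(lam))
```

For a prior distance below the threshold the expected squared error after the update exceeds the prior squared distance, and for a distance above it the error shrinks, which is the behaviour the comment describes.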

minor comments (1)

The abstract refers to 'simulations and empirical applications' without indicating the relevant sections or tables, making it harder to evaluate the scope and design of the numerical evidence.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the careful reading and for identifying the need to refine the wording on the contraction property. We address the comment below and will make the corresponding revisions.

Referee: [Abstract] The claim that 'its updates are contractive in mean squared error toward the (pseudo-)true parameter at every time step' is contradicted by the elementary Gaussian location model, which is log-concave. The closed-form ISD update yields the conditional expectation E[||θ_new − θ*||² | θ_pred] = [λ/(λ+1)]² ||θ_pred − θ*||² + 1/(λ+1)². The additive positive term implies that the expected squared error can increase when ||θ_pred − θ*|| is smaller than 1/√(2λ+1), so the stated unconditional contraction does not hold for all initial distances. This is load-bearing for the global stability result.

Authors: We thank the referee for this precise observation. The calculation for the Gaussian location model is correct: the conditional expected squared error takes the form a·d² + b, where d = ||θ_pred − θ*||, a = [λ/(λ+1)]² < 1, and b = 1/(λ+1)² > 0. Consequently, the MSE relative to the pseudo-true parameter does not decrease at every step: it increases when the current distance d is sufficiently small. We agree that the original abstract wording, which claims unconditional contraction in MSE at every time step, is imprecise. The global stability result, however, rests on the fact that the linear contraction factor a is strictly less than one for any λ > 0; this ensures that the recursion remains bounded and converges in expectation to a neighborhood of the pseudo-true parameter whose size is controlled by b, even from arbitrary initial conditions. We will revise the abstract to state that the ISD filter is stable for all learning rates and that the updates are contractive in mean squared error toward the (pseudo-)true parameter up to an irreducible observation-noise term. We will also add a short remark (with the Gaussian example) in the theoretical section to clarify the distinction between strict contraction and contraction-plus-bounded-noise. These changes preserve the validity of the stability theorems while accurately describing the MSE dynamics.

revision: yes
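
The "contraction plus bounded noise" behaviour described in the rebuttal can be traced with a minimal sketch. It assumes a static pseudo-true parameter and an identity prediction step, so that the expected squared error follows the linear recursion m_next = a·m + b with the a and b given above; under these assumptions the fixed point is b/(1 − a) = 1/(2λ + 1). The function names are hypothetical.

```python
def mse_path(m0: float, lam: float, steps: int) -> list[float]:
    """Iterate m_next = a * m + b for the Gaussian location example."""
    a = (lam / (lam + 1.0)) ** 2
    b = 1.0 / (lam + 1.0) ** 2
    path = [m0]
    for _ in range(steps):
        path.append(a * path[-1] + b)
    return path

def mse_fixed_point(lam: float) -> float:
    """Limit b / (1 - a), which simplifies to 1 / (2 * lam + 1)."""
    return 1.0 / (2.0 * lam + 1.0)

lam = 4.0
from_above = mse_path(100.0, lam, 30)  # starts far from the pseudo-truth
from_below = mse_path(0.0, lam, 30)    # starts exactly at the pseudo-truth
print("limit:", mse_fixed_point(lam))
print("from above:", from_above[:4], "...", from_above[-1])
print("from below:", from_below[:4], "...", from_below[-1])
```

Starting far away, the expected squared error falls toward the limit; starting at the pseudo-true value, it rises to the same limit, so the recursion is bounded but not contractive at every step.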

Circularity Check

0 steps flagged

No circularity; derivation is self-contained from optimization and log-concavity

The ISD update is defined directly as the argmax of the penalized log-density objective. Stability and MSE contraction are derived as theorems from the strict concavity guaranteed by log-concave densities, without reducing the target claim to a fitted parameter, a self-citation chain, or an input quantity by construction. The reduction to ESD under linear approximation is an explicit limiting case shown from the same objective, not a renaming or smuggling of prior results. The paper's central claims therefore retain independent mathematical content under the stated assumptions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central stability claim rests on the log-concavity of the observation density and the existence of a unique maximizer of the penalized log-density at each step; no free parameters or invented entities are named in the abstract.

axioms (1)

domain assumption: Observation densities are log-concave.
Invoked to guarantee stability for all learning rates and mean-squared-error contraction at every time step.
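
To make the log-concavity assumption tangible, the sketch below checks concavity in the parameter numerically for three log-densities. The choice of models (Gaussian location, Poisson with log link, Student-t location with 3 degrees of freedom), the grid, and the finite-difference test are assumptions of this illustration and do not come from the paper; additive constants that do not depend on the parameter are omitted.

```python
import math

def gaussian_loglik(theta: float, y: float = 0.0) -> float:
    """Gaussian location model with unit variance."""
    return -0.5 * (y - theta) ** 2

def poisson_loglik(theta: float, y: int = 3) -> float:
    """Poisson model with intensity exp(theta)."""
    return y * theta - math.exp(theta)

def student_t_loglik(theta: float, y: float = 0.0, nu: float = 3.0) -> float:
    """Student-t location model with nu degrees of freedom."""
    return -0.5 * (nu + 1.0) * math.log1p((y - theta) ** 2 / nu)

def is_concave_on_grid(f, grid, h: float = 1e-2, tol: float = 1e-10) -> bool:
    """True if every central second difference of f on the grid is non-positive."""
    return all(f(x + h) - 2.0 * f(x) + f(x - h) <= tol for x in grid)

grid = [-5.0 + 0.5 * i for i in range(21)]
for name, f in [("Gaussian", gaussian_loglik),
                ("Poisson (log link)", poisson_loglik),
                ("Student-t (nu=3)", student_t_loglik)]:
    print(name, "concave on grid:", is_concave_on_grid(f, grid))
```

The Student-t log-density curves upward in the tails, where (y − θ)² exceeds ν, so it falls outside the plain log-concave case named in this ledger entry.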

pith-pipeline@v0.9.0 · 5451 in / 1242 out tokens · 31311 ms · 2026-05-17T02:20:24.080656+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: "The ISD update maximizes the logarithmic observation density with respect to the parameter vector while penalizing the weighted L2 norm relative to a one-step-ahead predicted parameter... for log-concave observation densities... updates are contractive in mean squared error toward the (pseudo-)true parameter at every time step (Theorem 2, Corollary 1)."

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · absolute_floor_iff_bare_distinguishability · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: "Assumption 5 (Log-concave observation density) log p(y_t | θ) + (α_t/2) ‖θ‖² is concave..."

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.


[1] [1]

Boyarchenko, and D

Adrian, T., N. Boyarchenko, and D. Giannone (2019). Vulnerable growth. American Economic Review\/ 109\/ (4), 1263--89

work page 2019

[2] [2]

Akyildiz, \"O . D., E. Chouzenoux, V. Elvira, and J. M \'i guez (2019). A probabilistic incremental proximal gradient method. IEEE Signal Processing Letters\/ 26\/ (8), 1257--1261

work page 2019

[3] [3]

Amari, S.-i. (1993). Backpropagation and stochastic gradient descent method. Neurocomputing\/ 5\/ (4-5), 185--196

work page 1993

[4] [4]

Anderson, B. D. and J. B. Moore (2012). Optimal filtering . Prentice-Hall

work page 2012

[5] [5]

Blasques, J

Artemova, M., F. Blasques, J. van Brummelen, and S. J. Koopman (2022a). Score-driven models: M ethodology and theory. In Oxford Research Encyclopedia of Economics and Finance . Oxford University Press

work page

[6] [6]

Blasques, J

Artemova, M., F. Blasques, J. van Brummelen, and S. J. Koopman (2022b). Score-driven models: Methods and applications . In Oxford Research Encyclopedia of Economics and Finance . Oxford University Press

work page

[7] [7]

Asi, H. and J. C. Duchi (2019). Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization\/ 29\/ (3), 2257--2290

work page 2019

[8] [8]

Bauschke, H. H., J. M. Borwein, and P. L. Combettes (2003). Bregman monotone optimization algorithms. SIAM Journal on Control and Optimization\/ 42\/ (2), 596--636

work page 2003

[9] [9]

M \'e tivier, and P

Benveniste, A., M. M \'e tivier, and P. Priouret (2012). Adaptive algorithms and stochastic approximations . Springer

work page 2012

[10] [10]

Benveniste, A. and G. Ruget (2003). A measure of the tracking capability of recursive stochastic algorithms with constant gains. IEEE Transactions on Automatic Control\/ 27\/ (3), 639--649

[11]

Bertsekas, D. P. (1996). Incremental least squares methods and the extended Kalman filter. SIAM Journal on Optimization 6(3), 807–822

[12]

Beutner, E. A., Y. Lin, and A. Lucas (2023). Consistency, distributional convergence, and optimality of score-driven filters. Preprint. https://papers.tinbergen.nl/23051.pdf

[13]

Bianchi, P. (2016). Ergodic convergence of a stochastic proximal point algorithm. SIAM Journal on Optimization 26(4), 2235–2260

[14]

Bierman, G. J. (1977). Factorization methods for discrete sequential estimation. Academic Press

[15]

Blasques, F., P. Gorgi, S. J. Koopman, and O. Wintenberger (2018). Feasible invertibility conditions and maximum likelihood estimation for observation-driven models. Electronic Journal of Statistics 12(1), 1019–1052

[16]

Blasques, F., J. van Brummelen, S. J. Koopman, and A. Lucas (2022). Maximum likelihood estimation for score-driven models. Journal of Econometrics 227(2), 325–346

[17]

Bof, N., R. Carli, and L. Schenato (2018). Lyapunov theory for discrete time systems. arXiv preprint arXiv:1809.05289

[18]

Bottou, L. (2012). Stochastic gradient descent tricks. In G. Montavon, G. Orr, and K.-R. Müller (Eds.), Neural networks: Tricks of the trade, pp. 421–436. Springer

[19]

Bougerol, P. (1993). Kalman filtering with random coefficients and contractions. SIAM Journal on Control and Optimization 31(4), 942–959

[20]

Boyd, S. and L. Vandenberghe (2004). Convex optimization. Cambridge University Press

[21]

Brandt, A. (1986). The stochastic equation $Y_{n+1} = A_n Y_n + B_n$ with stationary coefficients. Advances in Applied Probability 18(1), 211–220

[22]

Caivano, M., A. C. Harvey, and A. Luati (2016). Robust time series models with trend and seasonal components. SERIEs 7, 99–120

[23]

Cesa-Bianchi, N. and F. Orabona (2021). Online learning algorithms. Annual Review of Statistics and Its Application 8, 165–190

[24]

Chopin, N. and O. Papaspiliopoulos (2020). An introduction to sequential Monte Carlo. Springer

[25]

Creal, D., S. J. Koopman, and A. Lucas (2013). Generalized autoregressive score models with applications. Journal of Applied Econometrics 28(5), 777–795

[26]

Creal, D., B. Schwaab, S. J. Koopman, and A. Lucas (2014). Observation-driven mixed-measurement dynamic factor models with an application to credit risk. Review of Economics and Statistics 96(5), 898–915

[27]

Diniz, P. S. (1997). Adaptive filtering. Springer

[28]

Engle, R. F. (2002). Dynamic conditional correlation: A simple class of multivariate generalized autoregressive conditional heteroskedasticity models. Journal of Business & Economic Statistics 20(3), 339–350

[29]

Engle, R. F. and S. Manganelli (2004). CAViaR: Conditional autoregressive value at risk by regression quantiles. Journal of Business & Economic Statistics 22(4), 367–381

[30]

Fearnhead, P. and L. Meligkotsidou (2004). Exact filtering for partially observed continuous time models. Journal of the Royal Statistical Society Series B: Statistical Methodology 66(3), 771–789

[31]

Geraci, M. and M. Bottai (2007). Quantile regression for longitudinal data using the asymmetric Laplace distribution. Biostatistics 8(1), 140–154

[32]

Gorgi, P. (2020). Beta–negative binomial auto-regressions for modelling integer-valued time series with extreme observations. Journal of the Royal Statistical Society Series B: Statistical Methodology 82(5), 1325–1347

[33]

Grimmer, B., H. Lu, P. Worah, and V. Mirrokni (2023). The landscape of the proximal point method for nonconvex–nonconcave minimax optimization. Mathematical Programming 201(1), 373–407

[34]

Hare, W. and C. Sagastizábal (2009). Computing proximal points of nonconvex functions. Mathematical Programming 116(1), 221–258

[35]

Harvey, A. C. (2013). Dynamic models for volatility and heavy tails: With applications to financial and economic time series. Cambridge University Press

[36]

Harvey, A. C. (2022). Score-driven time series models. Annual Review of Statistics and Its Application 9, 321–342

[37]

Harvey, A. C. and R.-J. Lange (2017). Volatility modeling with a generalized t distribution. Journal of Time Series Analysis 38(2), 175–190

[38]

Harvey, A. C. and R.-J. Lange (2018). Modeling the interactions between volatility and returns using EGARCH-M. Journal of Time Series Analysis 39(6), 909–919

[39]

Harvey, A. C. and A. Luati (2014). Filtering with heavy tails. Journal of the American Statistical Association 109(507), 1112–1122

[40]

Jagannathan, R. and Z. Wang (1996). The conditional CAPM and the cross-section of expected returns. The Journal of Finance 51(1), 3–53

[41]

Kalman, R. E. (1960). A new approach to linear filtering and prediction problems. Journal of Basic Engineering 82(1), 35–45

[42]

Koenker, R. and G. Bassett (1978). Regression quantiles. Econometrica: Journal of the Econometric Society 46, 33–50

[43]

Koenker, R. and K. F. Hallock (2001). Quantile regression. Journal of Economic Perspectives 15(4), 143–156

[44]

Koenker, R. and J. A. Machado (1999). Goodness of fit and related inference processes for quantile regression. Journal of the American Statistical Association 94(448), 1296–1310

[45]

Koopman, S. J., A. Lucas, and M. Scharth (2016). Predicting time-varying parameters with parameter-driven and observation-driven models. The Review of Economics and Statistics 98(1), 97–110

[46]

Koyama, S., L. Castellanos Pérez-Bolde, C. R. Shalizi, and R. E. Kass (2010). Approximate methods for state-space models. Journal of the American Statistical Association 105(489), 170–180

[47]

Krengel, U. (1985). Ergodic theorems. Walter de Gruyter

[48]

Kulis, B. and P. L. Bartlett (2010). Implicit online learning. Proceedings of the 27th International Conference on Machine Learning, 575–582

[49]

Kushner, H. (2010). Stochastic approximation: A survey. Wiley Interdisciplinary Reviews: Computational Statistics 2(1), 87–96

[50]

Kushner, H. and J. Yang (2002). Analysis of adaptive step-size SA algorithms for parameter tracking. IEEE Transactions on Automatic Control 40(8), 1403–1410

[51]

Lange, R.-J. (2024a). Bellman filtering and smoothing for state–space models. Journal of Econometrics 238(2), 105632

[52]

Lange, R.-J. (2024b). Short and simple introduction to Bellman filtering and smoothing. Preprint arXiv:2405.12668

[53]

Ljung, L. (1977). Analysis of recursive stochastic algorithms. IEEE Transactions on Automatic Control 22(4), 551–575

[54]

Nagumo, J.-I. and A. Noda (1967). A learning method for system identification. IEEE Transactions on Automatic Control 12(3), 282–287

[55]

Nesterov, Y. (2018). Lectures on convex optimization. Springer

[56]

Opschoor, A., P. Janus, A. Lucas, and D. van Dijk (2018). New heavy models for fat-tailed realized covariances and returns. Journal of Business & Economic Statistics 36(4), 643–657

[57]

Orabona, F. (2019). A modern introduction to online learning. Preprint arXiv:1912.13213

[58]

Parikh, N. and S. Boyd (2014). Proximal algorithms. Foundations and Trends in Optimization 1(3), 127–239

[59]

Patrascu, A. and I. Necoara (2018). Nonasymptotic convergence of stochastic proximal point methods for constrained convex optimization. The Journal of Machine Learning Research 18(1), 7204–7245

[60]

Polson, N. G., J. G. Scott, and B. T. Willard (2015). Proximal algorithms in statistics and machine learning. Statistical Science 30(4), 559–581

[61]

Robbins, H. and S. Monro (1951). A stochastic approximation method. The Annals of Mathematical Statistics 22(3), 400–407

[62]

Rockafellar, R. T. (1976). Monotone operators and the proximal point algorithm. SIAM Journal on Control and Optimization 14(5), 877–898

[63]

Rue, H., S. Martino, and N. Chopin (2009). Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. Journal of the Royal Statistical Society Series B: Statistical Methodology 71(2), 319–392

[64]

Ryu, E. K. and S. Boyd (2016). Stochastic proximal iteration: A non-asymptotic improvement upon stochastic gradient descent. Author website. https://web.stanford.edu/~boyd/papers/pdf/spi.pdf

[65]

Simonetto, A. and P. Massioni (2024). Nonlinear optimization filters for stochastic time-varying convex optimization. International Journal of Robust and Nonlinear Control 34(12), 8065–8089

[66]

Stock, J. H. and M. W. Watson (1996). Evidence on structural instability in macroeconomic time series relations. Journal of Business & Economic Statistics 14(1), 11–30

[67]

Straumann, D. and T. Mikosch (2006). Quasi-maximum-likelihood estimation in conditionally heteroscedastic time series: A stochastic recurrence equations approach. The Annals of Statistics 34(5), 2449–2495

[68]

Teräsvirta, T. (2009). An introduction to univariate GARCH models. In T. G. Andersen, R. A. Davis, J.-P. Kreiß, and T. V. Mikosch (Eds.), Handbook of financial time series, pp. 17–42. Springer

[69]

Toulis, P. and E. M. Airoldi (2015). Scalable estimation strategies based on stochastic approximations: Classical results and new insights. Statistics and Computing 25(4), 781–795

[70]

Toulis, P. and E. M. Airoldi (2017). Asymptotic and finite-sample properties of estimators based on stochastic gradients. The Annals of Statistics 45(4), 1694–1727

[71]

Toulis, P., T. Horel, and E. M. Airoldi (2021). The proximal Robbins-Monro method. Journal of the Royal Statistical Society Series B: Statistical Methodology 83(1), 188–212

[72]

Toulis, P., D. Tran, and E. M. Airoldi (2016). Towards stability and optimality in stochastic gradient descent. Proceedings of the 19th International Conference on Artificial Intelligence and Statistics 51, 1290–1298

[73]

Wu, L. and W. J. Su (2023). The implicit regularization of dynamical stability in stochastic gradient descent. In International Conference on Machine Learning, pp. 37656–37684. PMLR

[74]

Zou, H. and M. Yuan (2008). Composite quantile regression and the oracle model selection theory. The Annals of Statistics 36(3), 1108–1126

[75]

Artemova, M., F. Blasques, J. van Brummelen, and S. J. Koopman (2022). Score-driven models: Methodology and theory. In Oxford Research Encyclopedia of Economics and Finance. Oxford University Press

[76]

Blasques, F., S. J. Koopman, and A. Lucas (2015). Information-theoretic optimality of observation-driven time series models for continuous responses. Biometrika 102(2), 325–343

[77]

Poznyak, A. (2008). Advanced mathematical tools for automatic control engineers: Deterministic techniques. Elsevier
