Implicit score-driven filters for time-varying parameter models
Pith reviewed 2026-05-17 02:20 UTC · model grok-4.3
The pith
Implicit score-driven filters remain stable for all learning rates when observation densities are log-concave.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that the implicit stochastic-gradient update, obtained by maximizing the logarithmic observation density subject to a quadratic penalty relative to the predicted parameter, produces a filter whose updates are contractive in mean squared error toward the (pseudo-)true parameter at every time step and remain stable for all learning rates, provided the observation densities are log-concave.
What carries the argument
The implicit score-driven (ISD) update, defined as the maximizer of the current log observation density minus a weighted L2 penalty on the distance to the one-step-ahead predicted parameter.
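In symbols, the update described above can be sketched as follows. The notation is an assumption made here for illustration and may differ from the paper's: \theta_{t|t-1} is the one-step-ahead prediction, \theta_{t|t} the update, and P a positive-definite penalty matrix. The log-density is assumed differentiable.

```latex
\begin{align*}
\theta_{t|t} &= \arg\max_{\theta}\Big\{ \log p(y_t \mid \theta)
   - \tfrac{1}{2}\,(\theta-\theta_{t|t-1})^{\top} P\,(\theta-\theta_{t|t-1}) \Big\},\\[4pt]
\theta_{t|t} &= \theta_{t|t-1} + P^{-1}\,\nabla_{\theta}\log p(y_t \mid \theta)\Big|_{\theta=\theta_{t|t}}
   \qquad\text{(first-order condition).}
\end{align*}
```

The second line is implicit because the score is evaluated at the updated parameter, which appears on both sides. Under this notation, P^{-1} plays the role of the learning rate.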
If this is right
- Explicit score-driven models arise exactly when the log-density is replaced by its linear approximation around the prediction.
- The ISD filter extends the local contraction properties of explicit updates to a global setting that holds for arbitrary step sizes.
- The same stability and contraction results apply under misspecification as long as log-concavity is preserved.
- Finance and macroeconomics applications can use the filter to track time-varying parameters with explicit global guarantees.
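The contrast between the explicit and implicit steps can be illustrated with a minimal sketch. The example below is an assumption chosen for illustration, not taken from the paper: a scalar Poisson model with log-intensity theta (a log-concave density), a scalar penalty weight `lam` (so the learning rate is 1/lam), and hypothetical function names. The explicit step evaluates the score at the prediction; the implicit step solves the first-order condition by Newton's method.

```python
import math

def score(y, theta):
    """Score of the Poisson log-density y*theta - exp(theta) with respect to theta."""
    return y - math.exp(theta)

def esd_update(y, theta_pred, lam):
    """Explicit step: maximizes the log-density linearized at the prediction."""
    return theta_pred + score(y, theta_pred) / lam

def isd_update(y, theta_pred, lam, tol=1e-12, max_iter=200):
    """Implicit step: solve score(y, theta) - lam * (theta - theta_pred) = 0 by Newton."""
    theta = theta_pred
    for _ in range(max_iter):
        g = score(y, theta) - lam * (theta - theta_pred)
        dg = -math.exp(theta) - lam  # strictly negative, so the objective is strictly concave
        step = g / dg
        theta -= step
        if abs(step) < tol:
            break
    return theta

if __name__ == "__main__":
    y, theta_pred = 3, 0.0
    for lam in (5.0, 0.5, 0.05):
        print(lam, esd_update(y, theta_pred, lam), isd_update(y, theta_pred, lam))
```

In this example the implicit update always lies between the prediction and the unpenalized maximizer log(y), however small `lam` is, whereas the explicit update grows without bound as `lam` shrinks.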
Where Pith is reading between the lines
- Practitioners could safely adopt larger learning rates in real-time applications without risking filter instability.
- The implicit formulation may generalize to other density classes if a suitable contraction mapping can be established.
- The method aligns score-driven filtering more closely with implicit gradient techniques used in optimization.
- Performance gains over explicit approximations are most likely to appear in series with strong non-linearities or volatility clusters.
Load-bearing premise
The observation densities are log-concave so that the implicit update is well-defined and the contraction argument applies at every step.
What would settle it
A simulation or empirical series with a non-log-concave observation density in which raising the learning rate causes the parameter path to diverge or the mean squared error to stop contracting toward the pseudo-true value.
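As a starting point for such a test, the sketch below checks numerically that a Student-t location density is not log-concave. The Student-t choice, the unit scale, and the function name are assumptions made here for illustration. The code evaluates the second derivative of the log-density in the location parameter and compares its maximum with the closed-form value (nu + 1) / (8 nu), attained at squared residual 3 nu.

```python
import numpy as np

def t_logdensity_curvature(e, nu):
    """Second derivative of the unit-scale Student-t log-density with respect to
    the location parameter, as a function of the residual e = y - theta."""
    return -(nu + 1.0) * (nu - e**2) / (nu + e**2) ** 2

nu = 3.0
e = np.linspace(-20.0, 20.0, 4001)        # residual grid with step 0.01
curv = t_logdensity_curvature(e, nu)

max_curv = curv.max()                      # positive: the log-density is convex in the tails
lam_min = (nu + 1.0) / (8.0 * nu)          # closed-form maximum of the curvature

print("max curvature on grid:", max_curv)
print("closed-form maximum:  ", lam_min)
```

A quadratic penalty with weight at least `lam_min` restores concavity of the penalized objective in this example; for smaller weights (larger learning rates) the objective can be non-concave, which is the regime where a counterexample would have to be sought. The sketch does not itself demonstrate divergence.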
Original abstract
We propose an observation-driven modeling framework that allows model parameters to vary over time through an implicit score-driven (ISD) update. The ISD update maximizes the logarithmic observation density with respect to the parameter vector while penalizing the weighted L2 norm relative to a one-step-ahead predicted parameter. This yields an implicit stochastic-gradient update. We show that the popular class of explicit score-driven (ESD) models arises when the observation log density is linearly approximated around the prediction. By preserving the full density, the ISD update extends the favorable local properties of the ESD update to a global setting. For log-concave observation densities, whether correctly specified or not, the ISD filter is stable for all learning rates, and its updates are contractive in mean squared error toward the (pseudo-)true parameter at every time step. We demonstrate the usefulness of ISD filters in simulations and empirical applications in finance and macroeconomics.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes implicit score-driven (ISD) filters for time-varying parameter models. The ISD update is obtained by maximizing the log observation density penalized by a weighted L2 distance to the one-step-ahead prediction, producing an implicit stochastic-gradient step. The paper shows that explicit score-driven (ESD) models arise exactly when the log-density is linearly approximated around the prediction. For log-concave observation densities (correctly specified or misspecified), it claims that the ISD filter is stable for all learning rates and that the updates are contractive in mean squared error toward the (pseudo-)true parameter at every time step. Simulations and empirical applications in finance and macroeconomics are presented to illustrate the approach.
Significance. If the stability and contraction results can be established in a form that accounts for irreducible observation noise, the ISD framework would usefully extend the local properties of score-driven models to a global setting while preserving the link to the well-studied ESD class. The reduction of ISD to ESD under linear approximation is a clear presentational strength.
major comments (1)
- [Abstract] The claim that 'its updates are contractive in mean squared error toward the (pseudo-)true parameter at every time step' is contradicted by the elementary Gaussian location model, which is log-concave. The closed-form ISD update yields the conditional expectation E[||θ_new − θ*||² | θ_pred] = [λ/(λ+1)]² ||θ_pred − θ*||² + 1/(λ+1)². The additive positive term implies that the expected squared error increases whenever ||θ_pred − θ*|| is smaller than 1/√(2λ+1), so the stated unconditional contraction does not hold for all initial distances. This is load-bearing for the global stability result.
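The closed form quoted in this comment can be reproduced in a few lines. The sketch assumes a scalar observation y = θ* + ε with ε ~ N(0, 1) and a scalar penalty weight λ > 0; these specifics are inferred from the quoted formula rather than stated in the report.

```latex
\begin{align*}
\theta_{\text{new}}
  &= \arg\max_{\theta}\Big\{ -\tfrac{1}{2}(y-\theta)^2
     - \tfrac{\lambda}{2}(\theta-\theta_{\text{pred}})^2 \Big\}
   = \frac{\lambda\,\theta_{\text{pred}} + y}{\lambda+1},\\[4pt]
\theta_{\text{new}}-\theta^{*}
  &= \frac{\lambda}{\lambda+1}\,(\theta_{\text{pred}}-\theta^{*})
     + \frac{\varepsilon}{\lambda+1},\\[4pt]
\mathbb{E}\big[(\theta_{\text{new}}-\theta^{*})^2 \mid \theta_{\text{pred}}\big]
  &= \Big(\frac{\lambda}{\lambda+1}\Big)^{2}(\theta_{\text{pred}}-\theta^{*})^2
     + \frac{1}{(\lambda+1)^2}.
\end{align*}
```

The right-hand side exceeds (θ_pred − θ*)² exactly when (2λ+1)(θ_pred − θ*)² < 1, which gives the threshold 1/√(2λ+1).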
minor comments (1)
- The abstract refers to 'simulations and empirical applications' without indicating the relevant sections or tables, making it harder to evaluate the scope and design of the numerical evidence.
Simulated Author's Rebuttal
We thank the referee for the careful reading and for identifying the need to refine the wording on the contraction property. We address the comment below and will make the corresponding revisions.
- Referee: [Abstract] The claim that 'its updates are contractive in mean squared error toward the (pseudo-)true parameter at every time step' is contradicted by the elementary Gaussian location model, which is log-concave. The closed-form ISD update yields the conditional expectation E[||θ_new − θ*||² | θ_pred] = [λ/(λ+1)]² ||θ_pred − θ*||² + 1/(λ+1)². The additive positive term implies that the expected squared error increases whenever ||θ_pred − θ*|| is smaller than 1/√(2λ+1), so the stated unconditional contraction does not hold for all initial distances. This is load-bearing for the global stability result.
Authors: We thank the referee for this precise observation. The calculation for the Gaussian location model is correct: the conditional expected squared error takes the form a·d² + b, where d = ||θ_pred − θ*|| is the current distance, a = [λ/(λ+1)]² < 1, and b = 1/(λ+1)² > 0. Consequently, the MSE to the pseudo-true parameter does not decrease at every step: it increases when d is sufficiently small. We agree that the original abstract wording, which claims unconditional contraction in MSE at every time step, is imprecise. The global stability result, however, rests on the fact that the linear contraction factor a is strictly less than one for any λ > 0. This ensures that the recursion remains bounded and converges in expectation to a neighborhood of the pseudo-true parameter whose size is controlled by b, even from arbitrary initial conditions. We will revise the abstract to state that the ISD filter is stable for all learning rates and that the updates are contractive in mean squared error toward the (pseudo-)true parameter up to an irreducible observation-noise term. We will also add a short remark (with the Gaussian example) in the theoretical section to clarify the distinction between strict contraction and contraction-plus-bounded-noise. These changes preserve the validity of the stability theorems while accurately describing the MSE dynamics.
Revision: yes
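The contraction-plus-noise behaviour described in this response can be checked numerically. The sketch below makes several assumptions that are not in the source: a scalar static parameter, unit-variance Gaussian observations, an identity prediction step (the next prediction equals the current update), and hypothetical function names. It iterates the MSE recursion m ← a·m + b and compares its limit, b/(1 − a) = 1/(2λ+1), with a Monte Carlo run of the filter.

```python
import numpy as np

def mse_recursion(m0, lam, steps):
    """Iterate m <- a*m + b with a = (lam/(lam+1))**2 and b = 1/(lam+1)**2."""
    a = (lam / (lam + 1.0)) ** 2
    b = 1.0 / (lam + 1.0) ** 2
    path = [m0]
    for _ in range(steps):
        path.append(a * path[-1] + b)
    return np.array(path)

def simulate_filter(theta0, theta_star, lam, steps, n_paths, seed=0):
    """Monte Carlo MSE of the closed-form Gaussian update after `steps` observations."""
    rng = np.random.default_rng(seed)
    theta = np.full(n_paths, theta0, dtype=float)
    for _ in range(steps):
        y = theta_star + rng.standard_normal(n_paths)   # unit-variance observations
        theta = (lam * theta + y) / (lam + 1.0)         # closed-form implicit update
    return float(np.mean((theta - theta_star) ** 2))

if __name__ == "__main__":
    lam = 1.0
    print("fixed point 1/(2*lam+1):", 1.0 / (2.0 * lam + 1.0))
    print("recursion from m0=25:   ", mse_recursion(25.0, lam, 200)[-1])
    print("recursion from m0=0:    ", mse_recursion(0.0, lam, 200)[-1])
    print("Monte Carlo estimate:   ", simulate_filter(5.0, 0.0, lam, 200, 20000))
```

Starting above the fixed point the MSE falls toward it, and starting below it the MSE rises toward it, which matches the distinction between strict contraction and contraction up to a noise floor.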
Circularity Check
No circularity; derivation is self-contained from optimization and log-concavity
The ISD update is defined directly as the argmax of the penalized log-density objective. Stability and MSE contraction are derived as theorems from the strict concavity guaranteed by log-concave densities, without reducing the target claim to a fitted parameter, a self-citation chain, or an input quantity by construction. The reduction to ESD under linear approximation is an explicit limiting case shown from the same objective, not a renaming or smuggling of prior results. The paper's central claims therefore retain independent mathematical content under the stated assumptions.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Observation densities are log-concave
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem.
  Paper passage: The ISD update maximizes the logarithmic observation density with respect to the parameter vector while penalizing the weighted L2 norm relative to a one-step-ahead predicted parameter... for log-concave observation densities... updates are contractive in mean squared error toward the (pseudo-)true parameter at every time step (Theorem 2, Corollary 1).
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean, absolute_floor_iff_bare_distinguishability (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem.
  Paper passage: Assumption 5 (Log-concave observation density) log p(y_t | θ) + (α_t/2) ‖θ‖² is concave...
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.