Exponentially weighted estimands and the exponential family: Filtering, prediction and smoothing
Pith reviewed 2026-05-16 21:10 UTC · model grok-4.3
The pith
Maximizing a discounted convex combination of the log-likelihood and its expected value produces exact linear filters, predictors and smoothers for the canonical exponential family.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By replacing ordinary maximum-likelihood estimation with the maximization of a discounted convex combination of the log-likelihood and the corresponding expected log-likelihood, one obtains filters, predictors and smoothers for exponential-family time series. In the canonical case these objects satisfy exact linear recursions whose coefficients are simple functions of the natural parameter and the discount factor.
What carries the argument
The exponentially weighted estimand: the argmax of a discounted convex combination of the log-likelihood with its expectation under the current parameter.
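The step from this estimand to a linear recursion can be written out as follows. This is a sketch under assumptions that this page does not state:

- the expected log-likelihood is taken under a fixed reference parameter θ̄ with mean μ̄ = A'(θ̄);
- ω ∈ [0, 1] is the convex-combination weight;
- λ ∈ (0, 1] is a constant discount factor.

The symbols ω, θ̄, μ̄, S_t, W_t and k_t are our notation, not the paper's. T, A and h are the usual canonical-form sufficient statistic, log-partition function and base measure.

```latex
\begin{aligned}
\log p(y;\theta) &= \theta\,T(y) - A(\theta) + \log h(y), \qquad \mu = A'(\theta),\\
Q_t(\theta) &= \sum_{s=1}^{t}\lambda^{t-s}\Big[\omega\,\log p(y_s;\theta) + (1-\omega)\,\mathbb{E}_{\bar\theta}\log p(Y;\theta)\Big]
= \theta\,S_t - W_t\,A(\theta) + \text{const},\\
S_t &= \sum_{s=1}^{t}\lambda^{t-s}\big[\omega\,T(y_s) + (1-\omega)\,\bar\mu\big] = \lambda S_{t-1} + \omega\,T(y_t) + (1-\omega)\,\bar\mu,\\
W_t &= \sum_{s=1}^{t}\lambda^{t-s} = \lambda W_{t-1} + 1, \qquad S_0 = W_0 = 0,\\
Q_t'(\hat\theta_t) = 0 &\iff \hat\mu_t \equiv A'(\hat\theta_t) = S_t / W_t\\
&\Longrightarrow\; \hat\mu_t = (1-k_t)\,\hat\mu_{t-1} + k_t\big[\omega\,T(y_t) + (1-\omega)\,\bar\mu\big], \qquad k_t = 1/W_t .
\end{aligned}
```

Because A is convex, Q_t is concave. The first-order condition therefore identifies the maximizer whenever S_t/W_t lies in the interior of the mean-parameter space. In this sketch the update is linear in the mean parameter μ. The natural parameter θ̂_t = (A')^{-1}(μ̂_t) is affine in μ̂_t only when A is quadratic, as in the normal-mean case. With λ < 1 the gain k_t tends to 1 − λ, which is the familiar exponential-smoothing weight.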
If this is right
- Exact linear recursions exist for filtering, one-step prediction and smoothing inside the canonical exponential family.
- The recursions are driven only by the current observation, the previous estimate and the fixed discount factor.
- The same construction supports an asymptotic theory for these estimators under standard regularity conditions.
- The procedures apply immediately to common models such as Poisson, Bernoulli and normal time series.
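The recursions can be sketched in Python for the three models listed. The sketch rests on our reading of the objective, which this page does not confirm: the expected log-likelihood is taken under a fixed reference mean `mu_bar`, and `omega` is the convex-combination weight. Under that reading, each filtered mean is a discounted average of `omega * y_s + (1 - omega) * mu_bar`, and this average updates linearly. The names `ew_filter` and `NATURAL_FROM_MEAN` are hypothetical, and the normal case assumes unit variance.

```python
import math
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical map from the mean parameter mu = A'(theta) back to the natural
# parameter theta = (A')^{-1}(mu). All three families here have T(y) = y.
NATURAL_FROM_MEAN: Dict[str, Callable[[float], float]] = {
    "poisson": math.log,                                  # theta = log(rate)
    "bernoulli": lambda mu: math.log(mu / (1.0 - mu)),    # theta = logit(p)
    "normal": lambda mu: mu,                              # unit variance: theta = mean
}


def ew_filter(ys: Iterable[float], lam: float, omega: float, mu_bar: float,
              family: str = "poisson") -> List[Tuple[float, float]]:
    """Return (mu_hat_t, theta_hat_t) for t = 1..n.

    mu_hat_t is the discounted weighted average of omega*y_s + (1-omega)*mu_bar
    with weights lam**(t-s); it is updated linearly, and theta_hat_t follows
    through the inverse link. Keep omega < 1 and mu_bar inside the mean space
    so that mu_hat_t never reaches a boundary (e.g. log(0) for Poisson).
    """
    to_theta = NATURAL_FROM_MEAN[family]
    w = 0.0           # W_t = lam * W_{t-1} + 1, starting from W_0 = 0
    mu_hat = mu_bar   # overwritten at t = 1, where the gain k_1 = 1
    out: List[Tuple[float, float]] = []
    for y in ys:
        w = lam * w + 1.0
        k = 1.0 / w                                    # gain k_t = 1 / W_t
        target = omega * y + (1.0 - omega) * mu_bar    # this period's contribution
        mu_hat = (1.0 - k) * mu_hat + k * target       # linear in the mean
        out.append((mu_hat, to_theta(mu_hat)))
    return out


print(ew_filter([3, 0, 2, 5], lam=0.9, omega=0.8, mu_bar=2.0)[-1])
```

A plug-in one-step prediction would reuse the last `mu_hat` as the forecast mean. The page does not say whether the paper's predictor is exactly this.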
Where Pith is reading between the lines
- The linear structure may allow closed-form expressions for forecast intervals without simulation.
- The discount factor acts as a tuning parameter that trades responsiveness against smoothness, suggesting systematic selection rules could be derived.
- The framework could be tested on multivariate exponential-family series to see whether the linear recursions survive the vector case.
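One possible selection rule for the discount factor is sketched below; it is purely hypothetical. Choose λ on a grid by the sum of one-step-ahead Poisson log predictive densities, using the filtered mean as the forecast. The sketch assumes the filtered mean updates as μ̂_t = (1 − k_t)μ̂_{t−1} + k_t[ω y_t + (1 − ω)μ̄], with k_t = 1/Σ_{s≤t} λ^{t−s}. This is our reading of the discounted objective for Poisson data, not a formula quoted from the paper. The names `one_step_log_score` and `select_lambda`, and the grid, are illustrative.

```python
import math
from typing import Sequence, Tuple


def one_step_log_score(ys: Sequence[int], lam: float, omega: float,
                       mu_bar: float) -> float:
    """Sum over t >= 2 of log Poisson(y_t; mu_hat_{t-1}).

    Requires omega < 1 and mu_bar > 0 (or data that keep mu_hat > 0), since
    math.log(0.0) raises ValueError.
    """
    w, mu_hat, score = 0.0, mu_bar, 0.0
    for t, y in enumerate(ys):
        if t > 0:  # forecast y_t with the filtered mean from the previous period
            score += y * math.log(mu_hat) - mu_hat - math.lgamma(y + 1)
        w = lam * w + 1.0
        k = 1.0 / w
        mu_hat = (1.0 - k) * mu_hat + k * (omega * y + (1.0 - omega) * mu_bar)
    return score


def select_lambda(ys: Sequence[int], grid: Sequence[float], omega: float,
                  mu_bar: float) -> Tuple[float, float]:
    """Hypothetical rule: the grid value with the best one-step log score."""
    best = max(grid, key=lambda lam: one_step_log_score(ys, lam, omega, mu_bar))
    return best, one_step_log_score(ys, best, omega, mu_bar)


# Illustrative counts with a level shift; a smaller lam reacts faster to it.
counts = [1, 0, 2, 1, 1, 0, 9, 11, 10, 12, 9, 11]
print(select_lambda(counts, [0.3, 0.6, 0.9, 0.99], omega=0.9, mu_bar=3.0))
```

The rule scores out-of-sample forecasts rather than in-sample fit. In-sample fit would favour λ near zero, because the filter would then simply track each new observation.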
Load-bearing premise
Maximizing the proposed discounted convex combination of the log-likelihood and expected log-likelihood directly produces the desired filter, predictor and smoother.
What would settle it
Derive the closed-form linear recursion for a Poisson or Bernoulli series, run it on simulated data with known true parameters, and check whether the filtered estimates match the exact conditional means obtained by direct integration.
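Below is one way to run this check, narrowed to what can be tested without a model for how the parameter evolves:

1. Simulate a Poisson series with a known constant rate.
2. Run the linear recursion.
3. Compare each filtered θ̂_t with a direct Newton maximization of the discounted objective that never uses the recursion.

The objective's form (expected log-likelihood under a fixed reference mean `mu_bar`, convex weight `omega`) is an assumption, and all function names are illustrative. Matching exact conditional means would further require a data-generating process for a time-varying rate, which this page does not specify.

```python
import math
import random
from typing import List, Sequence


def poisson_draw(rate: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for small rates such as 3."""
    limit, k, p = math.exp(-rate), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def recursive_theta(ys: Sequence[int], lam: float, omega: float,
                    mu_bar: float) -> List[float]:
    """theta_hat_t from the linear update of the mean mu_hat_t = exp(theta_hat_t)."""
    w, mu, out = 0.0, mu_bar, []
    for y in ys:
        w = lam * w + 1.0          # W_t = lam * W_{t-1} + 1
        k = 1.0 / w                # gain k_t
        mu = (1.0 - k) * mu + k * (omega * y + (1.0 - omega) * mu_bar)
        out.append(math.log(mu))
    return out


def direct_theta(ys: Sequence[int], lam: float, omega: float, mu_bar: float,
                 iters: int = 100) -> float:
    """Maximise Q_t(theta) = theta*S_t - W_t*exp(theta) by Newton's method,
    without the recursion (Poisson: T(y) = y, A(theta) = exp(theta))."""
    t = len(ys)
    s = sum(lam ** (t - i) * (omega * y + (1.0 - omega) * mu_bar)
            for i, y in enumerate(ys, start=1))
    w = sum(lam ** (t - i) for i in range(1, t + 1))
    theta = math.log(mu_bar)
    for _ in range(iters):
        grad = s - w * math.exp(theta)     # Q_t'(theta)
        hess = -w * math.exp(theta)        # Q_t''(theta) < 0, so Q_t is concave
        theta -= grad / hess
    return theta


if __name__ == "__main__":
    rng = random.Random(0)
    ys = [poisson_draw(3.0, rng) for _ in range(200)]   # known true rate 3.0
    lam, omega, mu_bar = 0.95, 0.9, 2.5                  # illustrative values
    rec = recursive_theta(ys, lam, omega, mu_bar)
    gap = max(abs(rec[t - 1] - direct_theta(ys[:t], lam, omega, mu_bar))
              for t in range(1, len(ys) + 1))
    print(f"max |recursive - direct| = {gap:.2e}")
    print(f"final filtered rate {math.exp(rec[-1]):.3f} vs true rate 3.0")
```

Agreement to rounding error would support the claim that the recursion is the exact argmax. It does not show that this argmax tracks a changing rate well.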
Original abstract
We propose using a discounted version of a convex combination of the log-likelihood with the corresponding expected log-likelihood such that when they are maximized they yield a filter, predictor and smoother for time series. This paper then focuses on working out the implications of this in the case of the canonical exponential family. The results are simple exact filters, predictors and smoothers with linear recursions. A theory for these models is developed and the models are illustrated on simulated and real data.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a discounted convex combination of the log-likelihood and the corresponding expected log-likelihood, maximized to produce filters, predictors, and smoothers for time series. For members of the canonical exponential family, the resulting estimators admit exact linear recursions in the natural parameters. A supporting theory is developed and the methods are demonstrated on simulated and real data examples.
Significance. If the derivations hold, the work supplies a clean, exact recursive framework for exponential-family time series that incorporates forgetting via discounting while preserving linearity. This is a useful addition to the toolkit for sequential estimation, as it extends conjugacy-like behavior to non-stationary settings without requiring particle methods or approximations. The linear-recursion property is particularly valuable for implementation and theoretical analysis.
major comments (2)
- [§3.2] §3.2, Eq. (8)–(11): the central claim that the argmax of the discounted objective satisfies a linear recursion in the natural parameter is asserted but the derivation does not explicitly verify that the gradient of the expected-log-likelihood term remains affine in the sufficient statistic after the discount factor is introduced; a concrete expansion for at least one non-Gaussian member (e.g., Poisson) is needed to confirm the cancellation.
- [Theorem 2] Theorem 2 (smoother recursion): the backward recursion is presented as exact, yet the proof sketch relies on the same discounted conjugacy that is under scrutiny in the filter step; if the forward filter already contains an approximation, the smoother cannot be guaranteed exact without additional error bounds.
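The first major comment asks for an explicit Poisson expansion. A sketch is given below. It assumes (our notation, not the paper's) that:

- the expected log-likelihood is taken under a fixed reference parameter θ̄ with mean μ̄;
- ω ∈ [0, 1] is the convex-combination weight;
- W_t = Σ_{s=1}^{t} λ^{t−s} = λW_{t−1} + 1, with W_0 = 0.

```latex
\begin{aligned}
\log p(y;\theta) &= \theta\,y - e^{\theta} - \log y!, \qquad
\mathbb{E}_{\bar\theta}\log p(Y;\theta) = \theta\,\bar\mu - e^{\theta} - \mathbb{E}_{\bar\theta}\log Y!,\\
\frac{\partial}{\partial\theta}\Big[\omega\log p(y_s;\theta) + (1-\omega)\,\mathbb{E}_{\bar\theta}\log p(Y;\theta)\Big]
&= \omega\,(y_s - e^{\theta}) + (1-\omega)(\bar\mu - e^{\theta}) = \omega\,y_s + (1-\omega)\,\bar\mu - e^{\theta},\\
\sum_{s=1}^{t}\lambda^{t-s}\big[\omega\,y_s + (1-\omega)\,\bar\mu\big] - W_t\,e^{\hat\theta_t} = 0
&\;\Longrightarrow\;
e^{\hat\theta_t} = (1-k_t)\,e^{\hat\theta_{t-1}} + k_t\big[\omega\,y_t + (1-\omega)\,\bar\mu\big], \qquad k_t = 1/W_t .
\end{aligned}
```

The cancellation comes from the weights ω and 1 − ω summing to one. Each discounted period therefore contributes exactly one −e^θ to the score, and the score is affine in the sufficient statistic y_s. Under these assumptions the recursion is linear in the rate e^θ. In θ itself it reads θ̂_t = log{(1 − k_t)e^{θ̂_{t−1}} + k_t[ω y_t + (1 − ω)μ̄]}.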
minor comments (2)
- [§2] Notation for the discount factor λ_t is introduced without a clear statement of whether it is time-varying or constant; consistency across filter/predictor/smoother sections would improve readability.
- [Figure 2] Figure 2 caption does not specify the sample size or the exact exponential-family member used in the simulation; adding these details would aid reproducibility.
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive comments on our manuscript. The suggestions will help strengthen the presentation of the derivations. We address each major comment below.
Point-by-point responses
- Referee: [§3.2] §3.2, Eq. (8)–(11): the central claim that the argmax of the discounted objective satisfies a linear recursion in the natural parameter is asserted but the derivation does not explicitly verify that the gradient of the expected-log-likelihood term remains affine in the sufficient statistic after the discount factor is introduced; a concrete expansion for at least one non-Gaussian member (e.g., Poisson) is needed to confirm the cancellation.
  Authors: We agree that an explicit verification for a non-Gaussian member would improve clarity. The general argument in Section 3.2 relies on the fact that the gradient of the log-likelihood term is the difference between the observed and expected sufficient statistics. This gradient remains affine in the sufficient statistic after discounting, because the discount factor multiplies the entire term uniformly. In the revision we will add a concrete expansion for the Poisson case, showing the explicit cancellation in the score equation that yields the linear recursion. Revision: yes.
- Referee: [Theorem 2] Theorem 2 (smoother recursion): the backward recursion is presented as exact, yet the proof sketch relies on the same discounted conjugacy that is under scrutiny in the filter step; if the forward filter already contains an approximation, the smoother cannot be guaranteed exact without additional error bounds.
  Authors: The forward filter is obtained exactly by maximizing the discounted objective; no approximation is introduced because the canonical exponential-family structure preserves the required conjugacy under uniform discounting. The smoother recursion is then derived exactly from the forward quantities via the same conjugacy. We will expand the proof of Theorem 2 to include an explicit inductive argument confirming that exactness propagates backward without error accumulation. Revision: yes.
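The sketch below shows why exactness need not degrade in the backward direction; it is hypothetical. Suppose the smoother at time t maximizes the same kind of objective over the full sample with symmetric weights λ^{|t−s|}. This is an assumption, since the page does not give the form of Theorem 2. The smoothed mean is then a ratio of two-sided weighted sums. Each side is an exact linear recursion, one run forward and one run backward, so no approximation enters. The term c_s = ω y_s + (1 − ω)μ̄ assumes T(y) = y and an expected log-likelihood taken under a fixed reference mean μ̄. The name `ew_smoother` is illustrative.

```python
from typing import List, Sequence


def ew_smoother(ys: Sequence[float], lam: float, omega: float,
                mu_bar: float) -> List[float]:
    """Two-sided sketch: mu_hat_{t|n} = sum_s lam^|t-s| c_s / sum_s lam^|t-s|,
    with c_s = omega * y_s + (1 - omega) * mu_bar (assumed objective)."""
    n = len(ys)
    c = [omega * y + (1.0 - omega) * mu_bar for y in ys]
    fwd_s, fwd_w = [0.0] * n, [0.0] * n   # forward pass: terms with s <= t
    bwd_s, bwd_w = [0.0] * n, [0.0] * n   # backward pass: terms with s >= t
    acc_s = acc_w = 0.0
    for t in range(n):
        acc_s, acc_w = lam * acc_s + c[t], lam * acc_w + 1.0
        fwd_s[t], fwd_w[t] = acc_s, acc_w
    acc_s = acc_w = 0.0
    for t in reversed(range(n)):
        acc_s, acc_w = lam * acc_s + c[t], lam * acc_w + 1.0
        bwd_s[t], bwd_w[t] = acc_s, acc_w
    # The s = t term appears in both passes with weight lam**0 = 1; count it once.
    return [(fwd_s[t] + bwd_s[t] - c[t]) / (fwd_w[t] + bwd_w[t] - 1.0)
            for t in range(n)]


print(ew_smoother([3, 0, 2, 5, 1], lam=0.8, omega=0.7, mu_bar=2.0))
```

At t = n the backward sums reduce to the single term c_n, so the smoothed value coincides with the filtered value, as one would expect of a smoother.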
Circularity Check
No significant circularity; the linear recursions are derived directly from the proposed discounted objective.
Full rationale
The paper introduces a discounted convex combination of the log-likelihood and expected log-likelihood as a new objective, then shows that its maximizer for canonical exponential family members yields exact linear recursions for filtering, prediction and smoothing. This is a constructive derivation from the stated objective rather than a re-expression of pre-fitted quantities or a self-citation chain. No load-bearing step reduces by construction to its own inputs; the linearity follows from the exponential-family structure under the proposed discounting. The abstract and theory section present this as an implication to be worked out, not as an assumption smuggled in via prior work by the same authors. External benchmarks (simulated and real data) are used for illustration rather than for fitting the core recursions.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Maximizing the discounted convex combination of the log-likelihood and expected log-likelihood yields a filter, predictor, and smoother.
Reference graph
Works this paper leans on
[1] Abramowitz, M. and I. A. Stegun (1970). Handbook of Mathematical Functions. New York: Dover Publications Inc.
[2] Benjamin, M. A., R. A. Rigby, and D. M. Stasinopoulos (2003). Generalized autoregressive moving average model. Journal of the American Statistical Association 98, 214–223.
[3] Blasques, F., S. J. Koopman, M. Mallee, and Z. Zhang (2016). Weighted maximum likelihood for dynamic factor analysis and forecasting with mixed frequency data. Journal of Econometrics 193(2), 405–417.
[4] Bollerslev, T. (1986). Generalised autoregressive conditional heteroskedasticity. Journal of Econometrics 31, 307–327.
[5] Boyd, S. and L. Vandenberghe (2004). Convex Optimization. Cambridge University Press.
[6] Brown, B. M. (1971). Martingale central limit theorems. Annals of Mathematical Statistics 42, 59–66.
[7] Brown, R. G. (1956). Exponential smoothing for predicting demand. Cambridge, Massachusetts: Arthur D. Little, Inc.
[8] Cox, D. R. (1961). Tests of separate families of hypotheses. Proceedings of the Berkeley Symposium 4, 105–123.
[9] Creal, D., S. J. Koopman, and A. Lucas (2013). Generalized autoregressive score models with applications. Journal of Applied Econometrics 28, 777–795.
[10] Davis, R. A., K. Fokianos, S. H. Holan, H. Joe, J. Livsey, R. Lund, V. Pipiras, and N. Ravishanker (2021). Count time series: A methodological review. Journal of the American Statistical Association 116(535), 1533–1547.
[11] Dixon, M. J. and S. G. Coles (1997). Modelling association football scores and inefficiencies in the football betting market. Journal of the Royal Statistical Society: Series C (Applied Statistics) 46(2), 265–280.
[12] Durbin, J. and S. J. Koopman (2012). Time Series Analysis by State Space Methods (2nd ed.). Oxford: Oxford University Press.
[13] Efron, B. (2012). Large-Scale Inference: Empirical Bayes Methods for Estimation, Testing, and Prediction. Cambridge University Press.
[14] Efron, B. (2022). Exponential Families in Theory and Practice. Cambridge University Press.
[15] Efron, B. and C. Morris (1977). Stein's paradox in statistics. Scientific American 236, 119–127.
[16] Engle, R. F. (1982). Autoregressive conditional heteroskedasticity with estimates of the variance of the United Kingdom inflation. Econometrica 50, 987–1007.
[17] Engle, R. F. and J. Mezrich (1996). GARCH for groups. Risk, 36–40.
[18] Fan, J., N. E. Heckman, and M. P. Wand (1995). Local polynomial kernel regression for generalized linear models and quasi-likelihood functions. Journal of the American Statistical Association 90, 141–150.
[19] Fan, J. and Q. Yao (2005). Nonlinear Time Series. New York: Springer.
[20] Francq, C., L. Horvath, and J.-M. Zakoïan (2013). Merits and drawbacks of variance targeting in GARCH models. Journal of Financial Econometrics 9, 619–656.
[21] Gallant, A. R. (1987). Nonlinear Statistical Models. New York: John Wiley.
[22] Harvey, A. C. (1989). Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge: Cambridge University Press.
[23] Harvey, A. C. (2013). Dynamic models for volatility and heavy tails: With applications to financial and economic time series. Cambridge University Press.
[24] Hoerl, A. E. and R. W. Kennard (1970). Ridge regression: Biased estimation for nonorthogonal problems. Technometrics 12, 55–67.
[25] Holmes, C. C. and S. G. Walker (2017). Assigning a value to a power likelihood in a general Bayesian model. Biometrika 104, 497–503.
[26] Hu, F. and J. V. Zidek (2002). The weighted likelihood. Canadian Journal of Statistics 30(3), 347–371.
[27] Huber, P. J. (1967). The behavior of maximum likelihood estimates under nonstandard conditions. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, pp. 221–233. University of California Press.
[28] Janson, S. (2021). A central limit theorem for m-dependent variables. Unpublished paper: Department of Mathematics, Uppsala University.
[29] Li, W. K. (1994). Time series models based on generalized linear models: Some further results. Biometrics 50, 506–511.
[30] Luxenberg, E. and S. Boyd (2024). Exponentially weighted moving models. Unpublished paper: Stanford University.
[31] McCullagh, P. and J. A. Nelder (1989). Generalized Linear Models (2nd ed.). London: Chapman & Hall.
[32] Newey, W. K. and D. McFadden (1994). Large sample estimation and hypothesis testing. In R. F. Engle and D. McFadden (Eds.), The Handbook of Econometrics, Volume 4, pp. 2111–2245. North-Holland.
[33] Tibshirani, R. (1996). Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society, Series B 58, 267–288.
[34] Wedderburn, R. W. M. (1974). Quasi-likelihood functions, generalized linear models and the Gauss-Newton methods. Biometrika 61, 439–447.
[35] White, H. (1982). Maximum likelihood estimation of misspecified models. Econometrica 50, 1–25.
[36] Zeger, S. L. and B. Qaqish (1988). Markov regression models for time series: A quasi-likelihood approach. Biometrics 44, 1019–1032.