
arXiv: 2512.16745 · v3 · submitted 2025-12-18 · stat.ME · econ.EM

Exponentially weighted estimands and the exponential family: Filtering, prediction and smoothing

Pith reviewed 2026-05-16 21:10 UTC · model grok-4.3

classification: stat.ME, econ.EM
keywords: exponential family, filtering, smoothing, prediction, time series, exponentially weighted, maximum likelihood, recursive estimation

The pith

Maximizing a discounted convex combination of the log-likelihood and its expected value produces exact linear filters, predictors and smoothers for the canonical exponential family.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces exponentially weighted estimands formed by maximizing a discounted mix of the observed log-likelihood and the expected log-likelihood. For time series belonging to the canonical exponential family this maximization yields exact filters, predictors and smoothers whose updates obey simple linear recursions. The approach supplies a complete theory for these recursions and demonstrates them on both simulated series and real data. A reader would care because the resulting procedures are computationally cheap and exact, avoiding the approximations that sequential estimation usually requires for standard statistical models.

Core claim

By replacing ordinary maximum-likelihood estimation with the maximization of a discounted convex combination of the log-likelihood and the corresponding expected log-likelihood, one obtains filters, predictors and smoothers for exponential-family time series. In the canonical case these objects satisfy exact linear recursions whose coefficients are simple functions of the natural parameter and the discount factor.

What carries the argument

The exponentially weighted estimand: the argmax of a discounted convex combination of the log-likelihood with its expectation under the current parameter.
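
To see why a linear recursion might emerge, here is a minimal sketch under stated assumptions. The paper's exact objective is not reproduced on this page, so the form of $Q_t$ below, the fixed anchor value $\bar\theta$, and the use of $\alpha$ as the convex weight and $\lambda$ as the discount (names borrowed from the figure captions) are illustrative assumptions rather than the authors' definitions. The scalar density $f(y;\theta) = h(y)\exp\{\theta y - \psi(\theta)\}$ is the standard canonical form.

```latex
% Illustrative objective (an assumption, not the paper's stated definition).
\begin{align*}
Q_t(\theta)
  &= \sum_{j=0}^{t-1} \lambda^{j}
     \Big[ \alpha \log f(y_{t-j};\theta)
         + (1-\alpha)\, E_{\bar\theta}\{\log f(Y;\theta)\} \Big] \\
  &= \sum_{j=0}^{t-1} \lambda^{j}
     \Big[ \theta \big\{ \alpha y_{t-j} + (1-\alpha)\,\psi'(\bar\theta) \big\}
         - \psi(\theta) \Big] + \text{terms free of } \theta, \\
\frac{\partial Q_t}{\partial \theta} = 0
  \;&\Longrightarrow\;
  \psi'(\hat\theta_t)
  = \alpha\, \frac{\sum_{j=0}^{t-1} \lambda^{j} y_{t-j}}
                  {\sum_{j=0}^{t-1} \lambda^{j}}
  + (1-\alpha)\, \psi'(\bar\theta).
\end{align*}
```

Under these assumptions the mean value $\psi'(\hat\theta_t)$ is an exponentially weighted average of the observations shrunk toward the anchor mean, so it is linear in the data and can be updated one observation at a time.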

If this is right

  • Exact linear recursions exist for filtering, one-step prediction and smoothing inside the canonical exponential family.
  • The recursions are driven only by the current observation, the previous estimate and the fixed discount factor.
  • The same construction supplies a consistent theory for the asymptotic behavior of these estimators under standard regularity conditions.
  • The procedures apply immediately to common models such as Poisson, Bernoulli and normal time series.
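
The second bullet above can be illustrated with a small sketch. It assumes the simplest case, in which the expected log-likelihood term is dropped and the estimate is tracked through its mean value, so the update is an exponentially weighted moving average. The function names `ewma_filter` and `ewma_batch` are hypothetical, and the running weight sum `n` is carried only to normalize correctly in finite samples; this is not the paper's stated recursion.

```python
import numpy as np

def ewma_filter(y, lam):
    """Recursive exponentially weighted mean with discount lam in (0, 1)."""
    y = np.asarray(y, dtype=float)
    m = np.empty_like(y)
    n = 0.0      # running sum of discount weights: 1 + lam + lam**2 + ...
    prev = 0.0   # previous estimate (unused at the first step, since n = 1)
    for t, obs in enumerate(y):
        n = lam * n + 1.0
        prev = prev + (obs - prev) / n   # linear update in the new observation
        m[t] = prev
    return m

def ewma_batch(y, lam):
    """Direct weighted average using weights lam**j on the j-th most recent value."""
    y = np.asarray(y, dtype=float)
    w = lam ** np.arange(len(y) - 1, -1, -1)
    return float(np.sum(w * y) / np.sum(w))

y = [1.0, 4.0, 2.0, 8.0]
filtered = ewma_filter(y, 0.9)
print(filtered[-1], ewma_batch(y, 0.9))
```

As the weight sum `n` approaches `1 / (1 - lam)`, the update tends to the familiar form `lam * prev + (1 - lam) * obs`.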

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The linear structure may allow closed-form expressions for forecast intervals without simulation.
  • The discount factor acts as a tuning parameter that trades responsiveness against smoothness, suggesting systematic selection rules could be derived.
  • The framework could be tested on multivariate exponential-family series to see whether the linear recursions survive the vector case.

Load-bearing premise

Maximizing the proposed discounted convex combination of the log-likelihood and expected log-likelihood directly produces the desired filter, predictor and smoother.

What would settle it

Derive the closed-form linear recursion for a Poisson or Bernoulli series, run it on simulated data with known true parameters, and check whether the filtered estimates match the exact conditional means obtained by direct integration.
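
A weaker check than the one proposed above can be sketched in a few lines. It does not compare against conditional means obtained by integration; it only confirms, for a Poisson series, that a numerical maximizer of the discounted log-likelihood agrees with the closed form implied by the score equation, namely the log of the exponentially weighted mean. The expected log-likelihood term is dropped here, which is a simplifying assumption, and the function names, seed, rate and discount value are all hypothetical choices.

```python
import numpy as np

def discount_weights(n_obs, lam):
    """Weights lam**j, with j = 0 for the most recent observation."""
    return lam ** np.arange(n_obs - 1, -1, -1)

def discounted_poisson_argmax(y, lam, iters=50):
    """Newton iterations on theta for sum_j lam**j * (theta * y_j - exp(theta))."""
    y = np.asarray(y, dtype=float)
    w = discount_weights(len(y), lam)
    s, n = np.sum(w * y), np.sum(w)
    theta = 0.0
    for _ in range(iters):
        grad = s - n * np.exp(theta)   # discounted score
        hess = -n * np.exp(theta)      # strictly negative, so the objective is concave
        theta -= grad / hess
    return theta

def closed_form_theta(y, lam):
    """Natural parameter whose mean exp(theta) equals the weighted average."""
    y = np.asarray(y, dtype=float)
    w = discount_weights(len(y), lam)
    return float(np.log(np.sum(w * y) / np.sum(w)))

rng = np.random.default_rng(0)
y = rng.poisson(3.0, size=200)
theta_newton = discounted_poisson_argmax(y, 0.93)
theta_closed = closed_form_theta(y, 0.93)
print(theta_newton, theta_closed)
```

Agreement between the two values supports the argmax step for this simplified objective only; it says nothing about whether the resulting filter matches a model-based conditional mean.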

Figures

Figures reproduced from arXiv: 2512.16745 by Neil Shephard, Simon Donker van Heel.

Figure 1: Simulation from $Y_t \mid Y_{1:t-1} \sim \mathrm{CEF}(\tilde\theta_{t|t-1}, h, \psi)$ for time $t = 5, \ldots, T = 2000$ with discount parameter $\lambda = 0.93$. Top: anchoring parameter $\alpha = 0.7$; bottom: $\alpha = 0.95$. Columns: evolution of $E[Y_t \mid Y_{1:t-1}] = \psi'(\tilde\theta_{t|t-1})$ in blue with observations $Y_t$ as gray circles. Panel titles: Exponential, Gaussian (zero mean), Pareto.
Figure 2: Simulation from $Y_t \mid Y_{1:t-1} \sim \mathrm{CEF}(\tilde\theta_{t|t-1}, h, \psi)$ for time $t = 5, \ldots, T = 2000$ with discount parameter $\lambda = 0.93$. Top: anchoring parameter $\alpha = 0.7$; bottom: $\alpha = 0.95$. Gray circles show observations $Y_t$. Blue lines show the conditional expectation: $E[Y_t \mid Y_{1:t-1}] = \psi'(\tilde\theta_{t|t-1})$ for the exponential, and $E[Y_t \mid Y_{1:t-1}] = \tilde\theta_{t|t-1} / (\tilde\theta_{t|t-1} + 1)$ for the Pareto (y-axis on log scale) when $\tilde\theta_{t|t-1} < -1$ and $\infty$ otherwi…
Figure 3: Simulation from $Y_t \mid Y_{1:t-1} \sim \mathrm{CEF}(\tilde\theta_{t|t-1}, h, \psi)$ for time $t = 5, \ldots, T = 2000$ with discount parameter $\lambda = 0.93$. Top: anchoring parameter $\alpha = 0.7$; bottom: $\alpha = 0.95$. Columns: evolution of the conditional expectation $E[Y_t \mid Y_{1:t-1}]$ (blue line) with observations $Y_t$ (gray circles) for the Beta, Gaussian (time-varying mean & variance), & von Mises distributions. For the Gaussian distribution, the red line show…
Figure 4: The product $K_t n_{\lambda,t}$ against $t$ for $q \in \{0.001, 0.1, 0.3, 1, 2, 10\}$. This is the ratio of the Kalman filter's weight on $Y_t$ (under diffuse initialization) to the EWMA weight on $Y_t$, showing the impact of initial conditions. Values below 1 mean the Kalman filter places less weight on the current observation than the EWMA. All curves converge to 1 in steady state. Deviations from 1 are largest, and convergence t…
Figure 5: Household financial situation: Uni. Michigan Survey of Consumers (Jan 1978 to…
Figure 6: Monte Carlo: quasi-likelihood estimator precision for household financial expec…
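
The comparison described in the Figure 4 caption can be sketched numerically. The sketch assumes a Gaussian local level model with signal-to-noise ratio `q`, a diffuse prior so that the first Kalman gain is 1, an EWMA weight on the current observation equal to one over the sum of discount weights, and a discount matched to the steady-state Kalman gain. These modelling choices and the function names are assumptions made for illustration; the paper's own definitions of $K_t$ and $n_{\lambda,t}$ are not shown on this page.

```python
import numpy as np

def kalman_gains(q, T):
    """Gains K_1..K_T for a local level model, diffuse prior (so K_1 = 1)."""
    K = np.empty(T)
    K[0] = 1.0
    for t in range(1, T):
        p_pred = K[t - 1] + q            # predicted state variance / observation variance
        K[t] = p_pred / (p_pred + 1.0)   # gain, which also equals the filtered variance ratio
    return K

def weight_ratio(q, T):
    """Kalman weight on Y_t divided by the EWMA weight on Y_t."""
    K = kalman_gains(q, T)
    k_inf = 0.5 * (-q + np.sqrt(q * q + 4.0 * q))  # steady-state gain
    lam = 1.0 - k_inf                              # assumed matching discount factor
    t = np.arange(1, T + 1)
    n = (1.0 - lam ** t) / (1.0 - lam)             # sum of lam**j for j = 0..t-1
    return K * n                                   # EWMA weight on Y_t is 1 / n

for q in (0.001, 0.1, 0.3, 1, 2, 10):
    r = weight_ratio(q, 500)
    print(q, r[1], r[-1])
```

Under these assumptions the ratio starts at 1, because both methods put full weight on the first observation, and returns to 1 once the Kalman gain has settled.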
Original abstract

We propose using a discounted version of a convex combination of the log-likelihood with the corresponding expected log-likelihood such that when they are maximized they yield a filter, predictor and smoother for time series. This paper then focuses on working out the implications of this in the case of the canonical exponential family. The results are simple exact filters, predictors and smoothers with linear recursions. A theory for these models is developed and the models are illustrated on simulated and real data.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a discounted convex combination of the log-likelihood and the corresponding expected log-likelihood, maximized to produce filters, predictors, and smoothers for time series. For members of the canonical exponential family, the resulting estimators admit exact linear recursions in the natural parameters. A supporting theory is developed and the methods are demonstrated on simulated and real data examples.

Significance. If the derivations hold, the work supplies a clean, exact recursive framework for exponential-family time series that incorporates forgetting via discounting while preserving linearity. This is a useful addition to the toolkit for sequential estimation, as it extends conjugacy-like behavior to non-stationary settings without requiring particle methods or approximations. The linear-recursion property is particularly valuable for implementation and theoretical analysis.

major comments (2)
  1. [§3.2] §3.2, Eq. (8)–(11): the central claim that the argmax of the discounted objective satisfies a linear recursion in the natural parameter is asserted but the derivation does not explicitly verify that the gradient of the expected-log-likelihood term remains affine in the sufficient statistic after the discount factor is introduced; a concrete expansion for at least one non-Gaussian member (e.g., Poisson) is needed to confirm the cancellation.
  2. [Theorem 2] Theorem 2 (smoother recursion): the backward recursion is presented as exact, yet the proof sketch relies on the same discounted conjugacy that is under scrutiny in the filter step; if the forward filter already contains an approximation, the smoother cannot be guaranteed exact without additional error bounds.
minor comments (2)
  1. [§2] Notation for the discount factor λ_t is introduced without a clear statement of whether it is time-varying or constant; consistency across filter/predictor/smoother sections would improve readability.
  2. [Figure 2] Figure 2 caption does not specify the sample size or the exact exponential-family member used in the simulation; adding these details would aid reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments on our manuscript. The suggestions will help strengthen the presentation of the derivations. We address each major comment below.

Point-by-point responses
  1. Referee: [§3.2] §3.2, Eq. (8)–(11): the central claim that the argmax of the discounted objective satisfies a linear recursion in the natural parameter is asserted but the derivation does not explicitly verify that the gradient of the expected-log-likelihood term remains affine in the sufficient statistic after the discount factor is introduced; a concrete expansion for at least one non-Gaussian member (e.g., Poisson) is needed to confirm the cancellation.

    Authors: We agree that an explicit verification for a non-Gaussian member would improve clarity. The general argument in Section 3.2 relies on the fact that the gradient of the expected log-likelihood term is the difference between the observed and expected sufficient statistics. This difference remains affine in the sufficient statistic after discounting because the discount factor multiplies the entire term uniformly. In the revision we will add a concrete expansion for the Poisson case, showing the explicit cancellation in the score equation that yields the linear recursion for the natural parameter.
    Revision: yes

  2. Referee: [Theorem 2] Theorem 2 (smoother recursion): the backward recursion is presented as exact, yet the proof sketch relies on the same discounted conjugacy that is under scrutiny in the filter step; if the forward filter already contains an approximation, the smoother cannot be guaranteed exact without additional error bounds.

    Authors: The forward filter is obtained exactly by maximizing the discounted objective; no approximation is introduced because the canonical exponential-family structure preserves the required conjugacy under uniform discounting. The smoother recursion is then derived exactly from the forward quantities via the same conjugacy. We will expand the proof of Theorem 2 to include an explicit inductive argument confirming that exactness propagates backward without error accumulation.
    Revision: yes

Circularity Check

0 steps flagged

No significant circularity; the linear recursions are derived directly from the proposed discounted objective.

Full rationale

The paper introduces a discounted convex combination of the log-likelihood and expected log-likelihood as a new objective, then shows that its maximizer for canonical exponential family members yields exact linear recursions for filtering, prediction and smoothing. This is a constructive derivation from the stated objective rather than a re-expression of pre-fitted quantities or a self-citation chain. No load-bearing step reduces by construction to its own inputs; the linearity follows from the exponential-family structure under the proposed discounting. The abstract and theory section present this as an implication to be worked out, not as an assumption smuggled in via prior work by the same authors. External benchmarks (simulated and real data) are used for illustration rather than for fitting the core recursions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the unproven assertion that maximization of the discounted convex combination produces the stated filter/predictor/smoother; no free parameters or invented entities are mentioned in the abstract.

axioms (1)
  • Domain assumption: maximizing the discounted convex combination of log-likelihood and expected log-likelihood yields a filter, predictor, and smoother.
    This is the core proposal stated in the abstract; its validity is required for all subsequent claims.

pith-pipeline@v0.9.0 · 5368 in / 1198 out tokens · 21917 ms · 2026-05-16T21:10:45.945951+00:00 · methodology

