pith. sign in

arxiv: 2309.08517 · v3 · submitted 2023-09-15 · 🧮 math.PR · stat.CO

On the Forgetting of Particle Filters

Pith reviewed 2026-05-24 06:43 UTC · model grok-4.3

classification 🧮 math.PR stat.CO
keywords particle filtersforgettingexponential mixingFeynman-Kac modelsconditional particle filterpropagation of chaos
0
0 comments X

The pith

Under a mixing condition the particle filter forgets its initial state after O(log N) steps.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper treats the collection of N particles in a particle filter as the state of a Markov chain and shows that, when the underlying Feynman-Kac model satisfies a strong mixing assumption, this chain is exponentially ergodic. Consequently the distribution of the particle system converges to its unique stationary distribution at a rate that makes the dependence on the arbitrary initial condition decay to a negligible level after only O(log N) algorithm steps. An explicit example demonstrates that the logarithmic rate cannot be improved in general. The same exponential mixing result is proved for the conditional particle filter, supported by new time-uniform L^p error bounds.

Core claim

When the Feynman-Kac model satisfies a strong mixing assumption, the particle filter viewed as a Markov chain on the space of N-particle configurations is exponentially mixing and therefore forgets its initial configuration after O(log N) resampling-mutation steps; the same conclusion holds for the conditional particle filter.

What carries the argument

the particle filter regarded as a Markov chain whose transition kernel inherits exponential contraction from the strong mixing assumption on the underlying Feynman-Kac model

If this is right

  • The number of steps needed to erase initialization dependence scales only logarithmically rather than exponentially in the number of particles.
  • The conditional particle filter likewise forgets its starting state in O(log N) steps once equipped with the new uniform error estimates.
  • Propagation-of-chaos bounds obtained via the same contraction arguments become time-uniform.
  • The improved mixing rate supports more efficient coupling constructions between independent particle filters.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The O(log N) forgetting time may allow practitioners to restart or reinitialize particle filters more frequently without paying an exponential penalty in computation.
  • The contraction technique could be tested on other resampling schemes such as systematic or stratified resampling to see whether the same logarithmic rate holds.
  • Because the result is sharp, it supplies a concrete benchmark against which future analyses of particle-filter stability can be compared.

Load-bearing premise

The underlying Feynman-Kac model must satisfy a strong mixing condition that produces uniform contraction in total variation or a suitable Wasserstein distance.

What would settle it

Construct a Feynman-Kac model that satisfies the strong mixing assumption yet requires more than c log N steps for the particle system to reach a fixed total-variation distance to stationarity for some constant c independent of N.

Figures

Figures reproduced from arXiv: 2309.08517 by Anthony Lee, Joona Karjalainen, Matti Vihola, Sumeetpal S. Singh.

Figure 1
Figure 1. Figure 1: Propagation of chaos total variation distances (13) as a function of q/N in a discrete state-space model example for times k = 4 and k = 20. The lines illustrate the mentioned orders of decay. behaviour for a range of q, and the total variations can be upper and lower bounded between Θ(q/N) curves, but these may be specific to the example. The propagation of chaos can be combined with forgetting, leading t… view at source ↗
read the original abstract

We study the forgetting properties of the particle filter when its state - the collection of particles - is regarded as a Markov chain. Under a strong mixing assumption on the particle filter's underlying Feynman-Kac model, we find that the particle filter is exponentially mixing, and forgets its initial state in $O(\log N )$ 'time', where $N$ is the number of particles and time refers to the number of particle filter algorithm steps, each comprising a selection (or resampling) and mutation (or prediction) operation. We present an example which shows that this rate is optimal. In contrast to our result, available results to-date are extremely conservative, suggesting $O(\alpha^N)$ time steps are needed, for some $\alpha>1$, for the particle filter to forget its initialisation. We also study the conditional particle filter (CPF) and extend our forgetting result to this context. We establish a similar conclusion, namely, CPF is exponentially mixing and forgets its initial state in $O(\log N )$ time. To support this analysis, we establish new time-uniform $L^p$ error estimates for CPF, which can be of independent interest. We also establish new propagation of chaos type results using our proof techniques, discuss implications to couplings of particle filters and an application to processing out-of-sequence measurements.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript studies the forgetting properties of the particle filter by regarding the N-particle configuration as a Markov chain on the product space. Under a strong mixing assumption on the underlying Feynman-Kac model, it claims that the N-particle kernel is geometrically ergodic with rate independent of N, so that the filter forgets its initial state in O(log N) steps; an example establishes optimality of this rate. The analysis is extended to the conditional particle filter, yielding new time-uniform L^p error bounds, propagation-of-chaos results, and applications to couplings and out-of-sequence measurements.

Significance. If the central uniformity claim holds, the work replaces the extremely conservative O(α^N) mixing-time bounds in the literature with the tight and optimal O(log N) rate. The optimality example, the new CPF error estimates, and the propagation-of-chaos arguments are concrete strengths that would be of independent interest to the sequential Monte Carlo community.

major comments (2)
  1. [§3, Theorem 3.1] §3, Theorem 3.1 and the subsequent contraction argument: the O(log N) forgetting time requires that the one-step Dobrushin coefficient r_N of the full N-particle kernel (resampling composed with mutation) satisfies r_N ≤ ρ < 1 with ρ independent of N. The strong-mixing assumption supplies a uniform minorization on the underlying FK kernels, but the manuscript must explicitly verify that the multinomial resampling step does not introduce a 1/N deterioration in the joint contraction coefficient; without this uniform bound the mixing time could become Θ(N).
  2. [§5] §5, the optimality example: while the lower bound construction is welcome, it only shows that some initial configurations require Ω(log N) steps; the manuscript should confirm that the same example does not force a worse (e.g., polynomial) upper bound under the stated mixing assumption.
minor comments (2)
  1. [§2] The definition of the N-particle state space and the precise form of the resampling kernel could be stated more explicitly in §2 to make the subsequent Dobrushin-coefficient calculations easier to follow.
  2. A short remark comparing the obtained O(log N) rate with the existing O(α^N) literature bounds would help readers locate the improvement.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and the constructive comments on the forgetting properties of particle filters. We address each major comment below.

read point-by-point responses
  1. Referee: [§3, Theorem 3.1] §3, Theorem 3.1 and the subsequent contraction argument: the O(log N) forgetting time requires that the one-step Dobrushin coefficient r_N of the full N-particle kernel (resampling composed with mutation) satisfies r_N ≤ ρ < 1 with ρ independent of N. The strong-mixing assumption supplies a uniform minorization on the underlying FK kernels, but the manuscript must explicitly verify that the multinomial resampling step does not introduce a 1/N deterioration in the joint contraction coefficient; without this uniform bound the mixing time could become Θ(N).

    Authors: We agree that an explicit verification of the uniform bound on r_N is valuable for clarity. In the proof of Theorem 3.1 the Dobrushin coefficient of the composed kernel is controlled by first applying the contraction of the mutation step (which inherits the N-independent minorization constant from the strong-mixing assumption on the Feynman-Kac kernels) and then noting that multinomial resampling is a convex combination of the particles; as a stochastic matrix it cannot increase the Dobrushin coefficient. Consequently r_N ≤ ρ < 1 holds uniformly in N. We will insert a short remark immediately after the statement of Theorem 3.1 making this preservation property explicit. revision: yes

  2. Referee: [§5] §5, the optimality example: while the lower bound construction is welcome, it only shows that some initial configurations require Ω(log N) steps; the manuscript should confirm that the same example does not force a worse (e.g., polynomial) upper bound under the stated mixing assumption.

    Authors: The example in §5 is constructed under exactly the same strong-mixing assumption used for the upper bound. Theorem 3.1 therefore applies directly to the example and guarantees an O(log N) upper bound for every initial configuration, including those used in the lower-bound construction. The example saturates the rate but does not violate the general upper bound; no polynomial (or worse) deterioration is forced. revision: no

Circularity Check

0 steps flagged

No circularity; derivation rests on external mixing assumption

full rationale

The paper's central result (exponential mixing of the N-particle filter with O(log N) forgetting time) is derived from an explicitly stated strong mixing assumption on the underlying Feynman-Kac model. This assumption is external and not shown to reduce to any self-referential definition, fitted parameter renamed as prediction, or self-citation chain within the paper. The optimality example is presented separately and does not close a definitional loop. No load-bearing step equates the claimed contraction rate or mixing time to its own inputs by construction. The analysis therefore remains self-contained against the stated external hypothesis.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the strong mixing assumption for the Feynman-Kac model; no free parameters or invented entities are introduced in the abstract.

axioms (1)
  • domain assumption strong mixing assumption on the particle filter's underlying Feynman-Kac model
    Invoked to establish that the particle system Markov chain is exponentially mixing.

pith-pipeline@v0.9.0 · 5770 in / 1233 out tokens · 20172 ms · 2026-05-24T06:43:21.538218+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

47 extracted references · 47 canonical work pages · 1 internal anchor

  1. [1]

    Andrieu, A

    C. Andrieu, A. Doucet, and R. Holenstein. Particle Markov chain Monte Carlo methods. J. R. Stat. Soc. Ser. B Stat. Methodol. , 72(3):269–342, 2010

  2. [2]

    Capp´ e, E

    O. Capp´ e, E. Moulines, and T. Ryd´ en.Inference in Hidden Markov Models . Springer Series in Statistics. Springer, 2005

  3. [3]

    Cardoso, Y

    G. Cardoso, Y. Janati El Idrissi, S. Le Corff, E. Moulines, and J. Olsson. State and parameter learning with PARIS particle Gibbs. In Proceedings of the 40th International Conference on Machine Learning , pages 3625–3675, 2023

  4. [4]

    C´ erou, P

    F. C´ erou, P. Del Moral, and A. Guyader. A nonasymptotic theorem for un- normalized Feynman–Kac particle models. Ann. Inst. Henri Poincar´ e Probab. Stat., 47(3):629–649, 2011

  5. [5]

    H. P. Chan and T. L. Lai. A general theory of particle filters in hidden Markov models and some applications. Ann. Statist., 41(6):2877–2904, 2013

  6. [6]

    N. Chopin. Central limit theorem for sequential Monte Carlo methods and its application to Bayesian inference. Ann. Statist., 32(6):2385 – 2411, 2004

  7. [7]

    Chopin and O

    N. Chopin and O. Papaspiliopoulos. An introduction to sequential Monte Carlo, volume 4 of Springer Series in Statistics . Springer, 2020

  8. [8]

    Chopin and S

    N. Chopin and S. S. Singh. On particle Gibbs sampling. Bernoulli, 21(3): 1855–1883, 2015. 26 JOONA KARJALAINEN, ANTHONY LEE, SUMEETPAL S. SINGH, AND MATTI VIHOLA

  9. [9]

    Dau and N

    H.-D. Dau and N. Chopin. On backward smoothing algorithms. The Annals of Statistics, 51(5):2145–2169, 2023

  10. [10]

    Del Moral

    P. Del Moral. Feynman-Kac formulae: Genealogical and Interacting Particle Systems with Applications , volume 88. Springer, 2004

  11. [11]

    Del Moral and A

    P. Del Moral and A. Guionnet. Large deviations for interacting particle sys- tems: Applications to non-linear filtering. Stochastic Process. Appl. , 78(1): 69–95, 1998

  12. [12]

    Del Moral and A

    P. Del Moral and A. Guionnet. Central limit theorem for nonlinear filtering and interacting particle systems. Ann. Appl. Probab., 9(2):275 – 297, 1999

  13. [13]

    Del Moral, A

    P. Del Moral, A. Doucet, and G. W. Peters. Sharp propagation of chaos estimates for Feynman–Kac particle models. Theory Probab. Appl. , 51(3): 459–485, 2007

  14. [14]

    Del Moral, A

    P. Del Moral, A. Doucet, and S. S. Singh. A backward particle interpretation of Feynman–Kac formulae. ESAIM Math. Model. Numer. Anal. , 44(5):947–975, 2010

  15. [15]

    Del Moral, F

    P. Del Moral, F. Patras, and S. Rubenthaler. Convergence of U-statistics for interacting particle systems. J. Theoret. Probab., 24:1002–1027, 2011

  16. [16]

    Del Moral, A

    P. Del Moral, A. Doucet, and S. S. Singh. Uniform stability of a particle approximation of the optimal filter derivative. SIAM J. Control Optim. , 53 (3):1278–1304, 2015

  17. [17]

    Douc and E

    R. Douc and E. Moulines. Limit theorems for weighted samples with appli- cations to sequential Monte Carlo methods. Ann. Statist., 36(5):2344 – 2376, 2008

  18. [18]

    R. Douc, A. Garivier, E. Moulines, and J. Olsson. Sequential Monte Carlo smoothing for general state space hidden Markov models. Ann. Appl. Probab., 21(6):2109–2145, 2011

  19. [19]

    R. Douc, A. Garivier, E. Moulines, and J. Olsson. Sequential Monte Carlo smoothing for general state space hidden Markov models. The Annals of Ap- plied Probability, 21(6):2109–2145, 2011

  20. [20]

    R. Douc, E. Moulines, P. Priouret, and P. Soulier. Markov chains. Springer, 2018

  21. [21]

    S. J. Godsill, A. Doucet, and M. West. Monte Carlo smoothing for nonlinear time series. Journal of the American Statistical Association , 99(465):156–168, 2004

  22. [22]

    N. J. Gordon, D. J. Salmond, and A. F. M. Smith. Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proceedings-F, 140 (2):107–113, 1993

  23. [23]

    Gregory, C

    A. Gregory, C. J. Cotter, and S. Reich. Multilevel ensemble transform particle filtering. SIAM J. Sci. Comput. , 38(3):A1317–A1338, 2016

  24. [24]

    Hoeffding and J

    W. Hoeffding and J. Wolfowitz. Distinguishability of sets of distributions. The Annals of Mathematical Statistics , 29(3):700–718, 1958

  25. [25]

    P. E. Jacob, F. Lindsten, and T. B. Sch¨ on. Smoothing with couplings of conditional particle filters. J. Amer. Statist. Assoc. , 115(530):721–729, 2020

  26. [26]

    Jasra and F

    A. Jasra and F. Yu. Central limit theorems for coupled particle filters. Adv. in Appl. Probab., 52:942–1001, 2020

  27. [27]

    Jasra, K

    A. Jasra, K. Kamatani, K. J. Law, and Y. Zhou. Multilevel particle filters. SIAM J. Numer. Anal. , 55(6):3068–3096, 2017

  28. [28]

    Kantas, A

    N. Kantas, A. Doucet, S. S. Singh, J. Maciejowski, and N. Chopin. On particle methods for parameter estimation in state-space models. Statist. Sci. , 30(3): 328–351, 2015

  29. [29]

    Karjalainen, A

    J. Karjalainen, A. Lee, S. S. Singh, and M. Vihola. Mixing time of the condi- tional backward sampling particle filter. Preprint arXiv:2312.17572, 2023. ON THE FORGETTING OF PARTICLE FILTERS 27

  30. [30]

    Stability of Conditional Sequential Monte Carlo

    B. Kuhlenschmidt and S. S. Singh. Stability of conditional sequential Monte Carlo. Preprint arXiv:1806.06520, 2018

  31. [31]

    Lee and N

    A. Lee and N. Whiteley. Variance estimation in the particle filter. Biometrika, 105(3):609–625, 2018

  32. [32]

    A. Lee, S. S. Singh, and M. Vihola. Coupled conditional backward sampling particle filter. Ann. Statist., 48(5):3066–3089, 2020

  33. [33]

    Lindvall

    T. Lindvall. Lectures on the coupling method . Wiley, 1992. Reprint: Dover paperback edition, 2002

  34. [34]

    Mastrototaro and J

    A. Mastrototaro and J. Olsson. Adaptive online variance estimation in particle filters: the ALVar estimator. Statist. Comput., 33(4):1–26, 2023

  35. [35]

    Olsson and R

    J. Olsson and R. Douc. Numerically stable online estimation of variance in particle filters. Bernoulli, 25(2):1504–1535, 2019

  36. [36]

    Orguner and F

    U. Orguner and F. Gustafsson. Storage efficient particle filters for the out of sequence measurement problem. In 2008 11th International Conference on Information Fusion, pages 1–8. IEEE, 2008

  37. [37]

    Orton and A

    M. Orton and A. Marrs. Particle filters for tracking with out-of-sequence measurements. IEEE Transactions on Aerospace and Electronic Systems , 41 (2):693–702, 2005

  38. [38]

    M. K. Pitt and N. Shephard. Filtering via simulation: Auxiliary particle filters. J. Amer. Statist. Assoc. , 94(446):590–599, 1999

  39. [39]

    Poyiadjis, A

    G. Poyiadjis, A. Doucety, and S. S. Singh. Particle approximations of the score and observed information matrix in state space models with application to parameter estimation. Biometrika, 98(1):65–80, 2011

  40. [40]

    T. B. Sch¨ on, A. Wills, and B. Ninness. System identification of nonlinear state-space models. Automatica, 47(1):39–49, 2011

  41. [41]

    D. Sen, A. H. Thiery, and A. Jasra. On coupling particle filter trajectories. Statist. Comput., 28(2):461–475, 2018

  42. [42]

    A. N. Shiryaev. Probability. Springer-Verlag, New York, second edition, 1996. ISBN 0-387-94549-0

  43. [43]

    V. Z. Tadi´ c and A. Doucet. Asymptotic properties of recursive particle max- imum likelihood estimation. IEEE Trans. Inform. Theory , 67(3):1825–1848, 2021

  44. [44]

    Thorisson

    H. Thorisson. Coupling, Stationarity, and Regeneration . Springer, 2000

  45. [45]

    M. J. Wainwright. High-dimensional statistics: A non-asymptotic viewpoint , volume 48. Cambridge university press, 2019

  46. [46]

    Whiteley

    N. Whiteley. Stability properties of some particle filters. Ann. Appl. Probab., 23(6):2500–2537, 2013. 28 JOONA KARJALAINEN, ANTHONY LEE, SUMEETPAL S. SINGH, AND MATTI VIHOLA Appendix A. Technical lemmas Lemma 30. For any probability measures µ and ν on E it holds that ∥µ − ν∥2 TV ≤ 1 − (1 − H2(µ, ν))2. Proof. Denote by H2(µ ∥ ν) := 2H2(µ, ν) the (unnorma...

  47. [47]

    Under (A2), we obtain a lower bound for the densities of the ideal filters for large enough time indices, not depending on the initial distribution η0

    give analogues of Lemmas 1 and 2 under (A2): sup µ,ν ∥Φn,n+k(µ) − Φn,n+k(ν)∥TV ≤ β⌊k/m⌋ m ,(19) sup n≥0 sup osc(ϕ)≤1 ∥ηN n (ϕ) − ηn(ϕ)∥p ≤ cp,m√ N ,(20) for some βm < 1, cp,m < ∞, which only depend on p, m, and the bounds on G and Mt:t+m. Under (A2), we obtain a lower bound for the densities of the ideal filters for large enough time indices, not dependin...