On the Forgetting of Particle Filters
Pith reviewed 2026-05-24 06:43 UTC · model grok-4.3
The pith
Under a mixing condition the particle filter forgets its initial state after O(log N) steps.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
When the Feynman-Kac model satisfies a strong mixing assumption, the particle filter viewed as a Markov chain on the space of N-particle configurations is exponentially mixing and therefore forgets its initial configuration after O(log N) resampling-mutation steps; the same conclusion holds for the conditional particle filter.
What carries the argument
the particle filter regarded as a Markov chain whose transition kernel inherits exponential contraction from the strong mixing assumption on the underlying Feynman-Kac model
If this is right
- The number of steps needed to erase initialization dependence scales only logarithmically rather than exponentially in the number of particles.
- The conditional particle filter likewise forgets its starting state in O(log N) steps once equipped with the new uniform error estimates.
- Propagation-of-chaos bounds obtained via the same contraction arguments become time-uniform.
- The improved mixing rate supports more efficient coupling constructions between independent particle filters.
Where Pith is reading between the lines
- The O(log N) forgetting time may allow practitioners to restart or reinitialize particle filters more frequently without paying an exponential penalty in computation.
- The contraction technique could be tested on other resampling schemes such as systematic or stratified resampling to see whether the same logarithmic rate holds.
- Because the result is sharp, it supplies a concrete benchmark against which future analyses of particle-filter stability can be compared.
Load-bearing premise
The underlying Feynman-Kac model must satisfy a strong mixing condition that produces uniform contraction in total variation or a suitable Wasserstein distance.
What would settle it
Construct a Feynman-Kac model that satisfies the strong mixing assumption yet requires more than c log N steps for the particle system to reach a fixed total-variation distance to stationarity for some constant c independent of N.
Figures
read the original abstract
We study the forgetting properties of the particle filter when its state - the collection of particles - is regarded as a Markov chain. Under a strong mixing assumption on the particle filter's underlying Feynman-Kac model, we find that the particle filter is exponentially mixing, and forgets its initial state in $O(\log N )$ 'time', where $N$ is the number of particles and time refers to the number of particle filter algorithm steps, each comprising a selection (or resampling) and mutation (or prediction) operation. We present an example which shows that this rate is optimal. In contrast to our result, available results to-date are extremely conservative, suggesting $O(\alpha^N)$ time steps are needed, for some $\alpha>1$, for the particle filter to forget its initialisation. We also study the conditional particle filter (CPF) and extend our forgetting result to this context. We establish a similar conclusion, namely, CPF is exponentially mixing and forgets its initial state in $O(\log N )$ time. To support this analysis, we establish new time-uniform $L^p$ error estimates for CPF, which can be of independent interest. We also establish new propagation of chaos type results using our proof techniques, discuss implications to couplings of particle filters and an application to processing out-of-sequence measurements.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript studies the forgetting properties of the particle filter by regarding the N-particle configuration as a Markov chain on the product space. Under a strong mixing assumption on the underlying Feynman-Kac model, it claims that the N-particle kernel is geometrically ergodic with rate independent of N, so that the filter forgets its initial state in O(log N) steps; an example establishes optimality of this rate. The analysis is extended to the conditional particle filter, yielding new time-uniform L^p error bounds, propagation-of-chaos results, and applications to couplings and out-of-sequence measurements.
Significance. If the central uniformity claim holds, the work replaces the extremely conservative O(α^N) mixing-time bounds in the literature with the tight and optimal O(log N) rate. The optimality example, the new CPF error estimates, and the propagation-of-chaos arguments are concrete strengths that would be of independent interest to the sequential Monte Carlo community.
major comments (2)
- [§3, Theorem 3.1] §3, Theorem 3.1 and the subsequent contraction argument: the O(log N) forgetting time requires that the one-step Dobrushin coefficient r_N of the full N-particle kernel (resampling composed with mutation) satisfies r_N ≤ ρ < 1 with ρ independent of N. The strong-mixing assumption supplies a uniform minorization on the underlying FK kernels, but the manuscript must explicitly verify that the multinomial resampling step does not introduce a 1/N deterioration in the joint contraction coefficient; without this uniform bound the mixing time could become Θ(N).
- [§5] §5, the optimality example: while the lower bound construction is welcome, it only shows that some initial configurations require Ω(log N) steps; the manuscript should confirm that the same example does not force a worse (e.g., polynomial) upper bound under the stated mixing assumption.
minor comments (2)
- [§2] The definition of the N-particle state space and the precise form of the resampling kernel could be stated more explicitly in §2 to make the subsequent Dobrushin-coefficient calculations easier to follow.
- A short remark comparing the obtained O(log N) rate with the existing O(α^N) literature bounds would help readers locate the improvement.
Simulated Author's Rebuttal
We thank the referee for the careful reading and the constructive comments on the forgetting properties of particle filters. We address each major comment below.
read point-by-point responses
-
Referee: [§3, Theorem 3.1] §3, Theorem 3.1 and the subsequent contraction argument: the O(log N) forgetting time requires that the one-step Dobrushin coefficient r_N of the full N-particle kernel (resampling composed with mutation) satisfies r_N ≤ ρ < 1 with ρ independent of N. The strong-mixing assumption supplies a uniform minorization on the underlying FK kernels, but the manuscript must explicitly verify that the multinomial resampling step does not introduce a 1/N deterioration in the joint contraction coefficient; without this uniform bound the mixing time could become Θ(N).
Authors: We agree that an explicit verification of the uniform bound on r_N is valuable for clarity. In the proof of Theorem 3.1 the Dobrushin coefficient of the composed kernel is controlled by first applying the contraction of the mutation step (which inherits the N-independent minorization constant from the strong-mixing assumption on the Feynman-Kac kernels) and then noting that multinomial resampling is a convex combination of the particles; as a stochastic matrix it cannot increase the Dobrushin coefficient. Consequently r_N ≤ ρ < 1 holds uniformly in N. We will insert a short remark immediately after the statement of Theorem 3.1 making this preservation property explicit. revision: yes
-
Referee: [§5] §5, the optimality example: while the lower bound construction is welcome, it only shows that some initial configurations require Ω(log N) steps; the manuscript should confirm that the same example does not force a worse (e.g., polynomial) upper bound under the stated mixing assumption.
Authors: The example in §5 is constructed under exactly the same strong-mixing assumption used for the upper bound. Theorem 3.1 therefore applies directly to the example and guarantees an O(log N) upper bound for every initial configuration, including those used in the lower-bound construction. The example saturates the rate but does not violate the general upper bound; no polynomial (or worse) deterioration is forced. revision: no
Circularity Check
No circularity; derivation rests on external mixing assumption
full rationale
The paper's central result (exponential mixing of the N-particle filter with O(log N) forgetting time) is derived from an explicitly stated strong mixing assumption on the underlying Feynman-Kac model. This assumption is external and not shown to reduce to any self-referential definition, fitted parameter renamed as prediction, or self-citation chain within the paper. The optimality example is presented separately and does not close a definitional loop. No load-bearing step equates the claimed contraction rate or mixing time to its own inputs by construction. The analysis therefore remains self-contained against the stated external hypothesis.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption strong mixing assumption on the particle filter's underlying Feynman-Kac model
Reference graph
Works this paper leans on
-
[1]
C. Andrieu, A. Doucet, and R. Holenstein. Particle Markov chain Monte Carlo methods. J. R. Stat. Soc. Ser. B Stat. Methodol. , 72(3):269–342, 2010
work page 2010
-
[2]
O. Capp´ e, E. Moulines, and T. Ryd´ en.Inference in Hidden Markov Models . Springer Series in Statistics. Springer, 2005
work page 2005
-
[3]
G. Cardoso, Y. Janati El Idrissi, S. Le Corff, E. Moulines, and J. Olsson. State and parameter learning with PARIS particle Gibbs. In Proceedings of the 40th International Conference on Machine Learning , pages 3625–3675, 2023
work page 2023
-
[4]
F. C´ erou, P. Del Moral, and A. Guyader. A nonasymptotic theorem for un- normalized Feynman–Kac particle models. Ann. Inst. Henri Poincar´ e Probab. Stat., 47(3):629–649, 2011
work page 2011
-
[5]
H. P. Chan and T. L. Lai. A general theory of particle filters in hidden Markov models and some applications. Ann. Statist., 41(6):2877–2904, 2013
work page 2013
-
[6]
N. Chopin. Central limit theorem for sequential Monte Carlo methods and its application to Bayesian inference. Ann. Statist., 32(6):2385 – 2411, 2004
work page 2004
-
[7]
N. Chopin and O. Papaspiliopoulos. An introduction to sequential Monte Carlo, volume 4 of Springer Series in Statistics . Springer, 2020
work page 2020
-
[8]
N. Chopin and S. S. Singh. On particle Gibbs sampling. Bernoulli, 21(3): 1855–1883, 2015. 26 JOONA KARJALAINEN, ANTHONY LEE, SUMEETPAL S. SINGH, AND MATTI VIHOLA
work page 2015
- [9]
- [10]
-
[11]
P. Del Moral and A. Guionnet. Large deviations for interacting particle sys- tems: Applications to non-linear filtering. Stochastic Process. Appl. , 78(1): 69–95, 1998
work page 1998
-
[12]
P. Del Moral and A. Guionnet. Central limit theorem for nonlinear filtering and interacting particle systems. Ann. Appl. Probab., 9(2):275 – 297, 1999
work page 1999
-
[13]
P. Del Moral, A. Doucet, and G. W. Peters. Sharp propagation of chaos estimates for Feynman–Kac particle models. Theory Probab. Appl. , 51(3): 459–485, 2007
work page 2007
-
[14]
P. Del Moral, A. Doucet, and S. S. Singh. A backward particle interpretation of Feynman–Kac formulae. ESAIM Math. Model. Numer. Anal. , 44(5):947–975, 2010
work page 2010
-
[15]
P. Del Moral, F. Patras, and S. Rubenthaler. Convergence of U-statistics for interacting particle systems. J. Theoret. Probab., 24:1002–1027, 2011
work page 2011
-
[16]
P. Del Moral, A. Doucet, and S. S. Singh. Uniform stability of a particle approximation of the optimal filter derivative. SIAM J. Control Optim. , 53 (3):1278–1304, 2015
work page 2015
-
[17]
R. Douc and E. Moulines. Limit theorems for weighted samples with appli- cations to sequential Monte Carlo methods. Ann. Statist., 36(5):2344 – 2376, 2008
work page 2008
-
[18]
R. Douc, A. Garivier, E. Moulines, and J. Olsson. Sequential Monte Carlo smoothing for general state space hidden Markov models. Ann. Appl. Probab., 21(6):2109–2145, 2011
work page 2011
-
[19]
R. Douc, A. Garivier, E. Moulines, and J. Olsson. Sequential Monte Carlo smoothing for general state space hidden Markov models. The Annals of Ap- plied Probability, 21(6):2109–2145, 2011
work page 2011
-
[20]
R. Douc, E. Moulines, P. Priouret, and P. Soulier. Markov chains. Springer, 2018
work page 2018
-
[21]
S. J. Godsill, A. Doucet, and M. West. Monte Carlo smoothing for nonlinear time series. Journal of the American Statistical Association , 99(465):156–168, 2004
work page 2004
-
[22]
N. J. Gordon, D. J. Salmond, and A. F. M. Smith. Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proceedings-F, 140 (2):107–113, 1993
work page 1993
-
[23]
A. Gregory, C. J. Cotter, and S. Reich. Multilevel ensemble transform particle filtering. SIAM J. Sci. Comput. , 38(3):A1317–A1338, 2016
work page 2016
-
[24]
W. Hoeffding and J. Wolfowitz. Distinguishability of sets of distributions. The Annals of Mathematical Statistics , 29(3):700–718, 1958
work page 1958
-
[25]
P. E. Jacob, F. Lindsten, and T. B. Sch¨ on. Smoothing with couplings of conditional particle filters. J. Amer. Statist. Assoc. , 115(530):721–729, 2020
work page 2020
-
[26]
A. Jasra and F. Yu. Central limit theorems for coupled particle filters. Adv. in Appl. Probab., 52:942–1001, 2020
work page 2020
- [27]
- [28]
-
[29]
J. Karjalainen, A. Lee, S. S. Singh, and M. Vihola. Mixing time of the condi- tional backward sampling particle filter. Preprint arXiv:2312.17572, 2023. ON THE FORGETTING OF PARTICLE FILTERS 27
-
[30]
Stability of Conditional Sequential Monte Carlo
B. Kuhlenschmidt and S. S. Singh. Stability of conditional sequential Monte Carlo. Preprint arXiv:1806.06520, 2018
work page internal anchor Pith review Pith/arXiv arXiv 2018
- [31]
-
[32]
A. Lee, S. S. Singh, and M. Vihola. Coupled conditional backward sampling particle filter. Ann. Statist., 48(5):3066–3089, 2020
work page 2020
- [33]
-
[34]
A. Mastrototaro and J. Olsson. Adaptive online variance estimation in particle filters: the ALVar estimator. Statist. Comput., 33(4):1–26, 2023
work page 2023
-
[35]
J. Olsson and R. Douc. Numerically stable online estimation of variance in particle filters. Bernoulli, 25(2):1504–1535, 2019
work page 2019
-
[36]
U. Orguner and F. Gustafsson. Storage efficient particle filters for the out of sequence measurement problem. In 2008 11th International Conference on Information Fusion, pages 1–8. IEEE, 2008
work page 2008
-
[37]
M. Orton and A. Marrs. Particle filters for tracking with out-of-sequence measurements. IEEE Transactions on Aerospace and Electronic Systems , 41 (2):693–702, 2005
work page 2005
-
[38]
M. K. Pitt and N. Shephard. Filtering via simulation: Auxiliary particle filters. J. Amer. Statist. Assoc. , 94(446):590–599, 1999
work page 1999
-
[39]
G. Poyiadjis, A. Doucety, and S. S. Singh. Particle approximations of the score and observed information matrix in state space models with application to parameter estimation. Biometrika, 98(1):65–80, 2011
work page 2011
-
[40]
T. B. Sch¨ on, A. Wills, and B. Ninness. System identification of nonlinear state-space models. Automatica, 47(1):39–49, 2011
work page 2011
-
[41]
D. Sen, A. H. Thiery, and A. Jasra. On coupling particle filter trajectories. Statist. Comput., 28(2):461–475, 2018
work page 2018
-
[42]
A. N. Shiryaev. Probability. Springer-Verlag, New York, second edition, 1996. ISBN 0-387-94549-0
work page 1996
-
[43]
V. Z. Tadi´ c and A. Doucet. Asymptotic properties of recursive particle max- imum likelihood estimation. IEEE Trans. Inform. Theory , 67(3):1825–1848, 2021
work page 2021
- [44]
-
[45]
M. J. Wainwright. High-dimensional statistics: A non-asymptotic viewpoint , volume 48. Cambridge university press, 2019
work page 2019
-
[46]
N. Whiteley. Stability properties of some particle filters. Ann. Appl. Probab., 23(6):2500–2537, 2013. 28 JOONA KARJALAINEN, ANTHONY LEE, SUMEETPAL S. SINGH, AND MATTI VIHOLA Appendix A. Technical lemmas Lemma 30. For any probability measures µ and ν on E it holds that ∥µ − ν∥2 TV ≤ 1 − (1 − H2(µ, ν))2. Proof. Denote by H2(µ ∥ ν) := 2H2(µ, ν) the (unnorma...
work page 2013
-
[47]
give analogues of Lemmas 1 and 2 under (A2): sup µ,ν ∥Φn,n+k(µ) − Φn,n+k(ν)∥TV ≤ β⌊k/m⌋ m ,(19) sup n≥0 sup osc(ϕ)≤1 ∥ηN n (ϕ) − ηn(ϕ)∥p ≤ cp,m√ N ,(20) for some βm < 1, cp,m < ∞, which only depend on p, m, and the bounds on G and Mt:t+m. Under (A2), we obtain a lower bound for the densities of the ideal filters for large enough time indices, not dependin...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.