Truncated Neural Likelihood Estimation for Simulation-Based Inference in State-Space Models
Pith reviewed 2026-05-22 07:17 UTC · model grok-4.3
The pith
Truncated sequential neural likelihood estimation overcomes key limitations of standard SNL for parameter inference in state-space models.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors propose truncated sequential neural likelihood (T-SNL) for simulation-based parameter inference in state-space models. By truncating the full latent state sequence during neural likelihood training, T-SNL directly addresses the high sample requirements, poor scaling with sequence length, and lack of amortization that limit standard SNL. The resulting procedure produces more accurate and stable posterior estimates while remaining flexible for sequential data arrival.
What carries the argument
The truncation mechanism applied to the latent state sequence inside sequential neural likelihood estimation.
If this is right
- T-SNL requires substantially fewer simulated trajectories than standard SNL to reach a target accuracy level.
- Performance remains stable as the length of the observed time series increases.
- The trained estimator can be reused without retraining when new observations become available.
- Training exhibits greater robustness to hyperparameter choices and random seeds than untruncated SNL.
Where Pith is reading between the lines
- Truncation length could be made data-dependent to balance bias and variance in a wider range of SSMs.
- Similar truncation ideas might improve neural likelihood methods for other partially observed sequential processes.
- Combining T-SNL with existing particle-filter approximations could further reduce variance in high-dimensional settings.
Load-bearing premise
Truncating the state sequence preserves enough information about the model parameters to support accurate inference without adding bias or training instability.
What would settle it
A controlled experiment on a linear Gaussian state-space model with known true parameters in which T-SNL yields systematically biased or unstable posterior estimates for any truncation length would disprove the central performance claims.
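One way such a check could be set up, sketched under assumptions that are not taken from the paper: for a scalar linear Gaussian SSM the exact log-likelihood is available from the Kalman filter, so it can serve as the reference against which any truncated estimate is compared. The model form (x_t = a x_{t-1} + process noise, y_t = x_t + observation noise), the parameter names a, q, r, and the function names below are all illustrative choices.

```python
import numpy as np

def simulate_lgssm(a, q, r, T, rng):
    """Simulate y_1..y_T from x_t = a*x_{t-1} + N(0, q), y_t = x_t + N(0, r).

    x_0 is drawn from the stationary distribution (requires |a| < 1).
    """
    x = rng.normal(0.0, np.sqrt(q / (1.0 - a**2)))
    ys = np.empty(T)
    for t in range(T):
        x = a * x + rng.normal(0.0, np.sqrt(q))
        ys[t] = x + rng.normal(0.0, np.sqrt(r))
    return ys

def kalman_loglik(ys, a, q, r):
    """Exact log p(y_{1:T} | a, q, r) from the scalar Kalman filter."""
    m, P = 0.0, q / (1.0 - a**2)  # stationary prior on x_0
    ll = 0.0
    for y in ys:
        m_pred, P_pred = a * m, a * a * P + q  # predict step
        S = P_pred + r                         # innovation variance
        ll += -0.5 * (np.log(2.0 * np.pi * S) + (y - m_pred) ** 2 / S)
        K = P_pred / S                         # Kalman gain
        m, P = m_pred + K * (y - m_pred), (1.0 - K) * P_pred
    return ll

def grid_posterior(ys, a_grid, q, r):
    """Normalised posterior over a on a grid, assuming a flat prior on that grid."""
    ll = np.array([kalman_loglik(ys, a, q, r) for a in a_grid])
    w = np.exp(ll - ll.max())  # subtract the max for numerical stability
    return w / w.sum()
```

A posterior produced by T-SNL (or any other approximate method) for the same data could then be compared against `grid_posterior` across several truncation lengths; systematic disagreement would point to truncation bias.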
Original abstract
State-space models (SSMs) are powerful probabilistic tools for modeling time-varying systems with latent dynamics. Inference in SSMs involves the estimation of latent states and parameters. In this work, we focus on parameter inference, which for SSMs is in general a very challenging problem due to the intractability of the likelihood. Recently, neural estimation methods, such as sequential neural likelihood (SNL), have shown promising results in Bayesian inference problems. In this paper, we show that SNL, when applied to the SSM setting, suffers from important limitations: it requires a large number of simulated samples to achieve moderate performance, scales poorly with sequence length, and is not amortized. We then introduce a novel inference algorithm called truncated-SNL (T-SNL), which addresses the limitations of SNL. Our algorithm is more accurate, more stable and robust during training, more scalable to longer temporal sequences, and can be amortized when new observations become available. Our experiments show that T-SNL is a sample-efficient, robust, and flexible algorithm that outperforms other approaches.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces Truncated Sequential Neural Likelihood (T-SNL) for simulation-based Bayesian parameter inference in state-space models. It identifies limitations of standard sequential neural likelihood (SNL) such as requiring large numbers of simulations, poor scaling with sequence length, and lack of amortization. T-SNL applies a truncation operator to the state trajectory to address these issues and claims to deliver greater accuracy, training stability, robustness, scalability to longer sequences, and amortization for new observations, with experiments demonstrating outperformance over baselines.
Significance. If the truncation preserves sufficient information from the latent dynamics without introducing material bias or instability, T-SNL would represent a practical advance in SBI for SSMs, enabling more efficient inference on long time series. The reported sample efficiency and robustness during training would be valuable strengths for applied work in dynamical systems and time-series modeling.
major comments (2)
- [§3] §3 (T-SNL definition and truncation operator): the central claim that T-SNL yields more accurate and unbiased parameter posteriors rests on the truncation window preserving enough information from the full state sequence. No explicit approximation error bound, consistency result, or analysis of bias from discarded long-range dependencies is provided, leaving open the possibility that reported gains are artifacts of the chosen truncation length matching the simulation regime rather than a general improvement.
- [§5] §5 (Experiments): the comparison tables and figures do not report results across a range of truncation lengths K or SSMs with varying memory lengths; without this, it is difficult to assess whether the claimed superiority in accuracy and scalability holds generally or only for the specific dynamics tested.
minor comments (2)
- [§2] Notation for the truncated likelihood p(y_{t-K:t}|θ, s_{t-K}) should be introduced earlier and used consistently when discussing amortization.
- [§5] Figure captions and axis labels in the experimental results could more explicitly indicate the truncation length used for each method.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive comments on our manuscript. We address each major point below, indicating planned revisions where appropriate.
- Referee: [§3] §3 (T-SNL definition and truncation operator): the central claim that T-SNL yields more accurate and unbiased parameter posteriors rests on the truncation window preserving enough information from the full state sequence. No explicit approximation error bound, consistency result, or analysis of bias from discarded long-range dependencies is provided, leaving open the possibility that reported gains are artifacts of the chosen truncation length matching the simulation regime rather than a general improvement.
  Authors: We agree that a formal approximation error bound or consistency result would strengthen the theoretical foundation. Deriving such general results for arbitrary SSMs is challenging because bias depends on model-specific mixing rates and long-range dependencies. The manuscript motivates truncation via the practical decay of state influence in many SSMs and demonstrates empirically that posteriors remain accurate for the tested regimes. We will revise §3 to add a discussion of truncation bias, practical guidelines for choosing K based on state autocorrelation, and an explicit acknowledgment that a general consistency theorem is not provided.
  Revision: partial
- Referee: [§5] §5 (Experiments): the comparison tables and figures do not report results across a range of truncation lengths K or SSMs with varying memory lengths; without this, it is difficult to assess whether the claimed superiority in accuracy and scalability holds generally or only for the specific dynamics tested.
  Authors: We concur that broader sensitivity analysis would better support the generality claims. Current experiments select K per SSM to balance efficiency and information retention. We will expand §5 with additional results that vary K over a range for the existing models and introduce at least one SSM with longer memory dependencies, including corresponding accuracy, stability, and scalability metrics.
  Revision: yes
- A rigorous general approximation error bound or consistency result for the truncation operator across arbitrary state-space models, which would require substantial new theoretical analysis.
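The rebuttal mentions guidelines for choosing K from state autocorrelation without spelling them out. A minimal sketch of one possible heuristic, which is an assumption here and not the authors' procedure: simulate a state trajectory, compute its sample autocorrelation, and take the first lag at which the absolute autocorrelation drops below a threshold. The function names, the threshold of 0.05, and the cap `max_lag` are hypothetical.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelation of a 1-D series at lags 1..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:-k], x[k:]) / denom for k in range(1, max_lag + 1)])

def choose_truncation_length(x, threshold=0.05, max_lag=100):
    """Smallest lag whose |ACF| falls below `threshold` (an assumed heuristic).

    Falls back to `max_lag` if the autocorrelation never decays that far,
    which signals long memory relative to the window being considered.
    """
    acf = np.abs(sample_acf(x, max_lag))
    below = np.nonzero(acf < threshold)[0]
    return int(below[0]) + 1 if below.size else max_lag
```

For a slowly mixing state process this rule returns a large K, which is the regime where the referee's concern about discarded long-range dependence matters most; sweeping `threshold` gives a simple way to generate the range of K values the authors promise to report.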
Circularity Check
No significant circularity; claims rest on experimental validation
The paper introduces T-SNL by defining a truncation operator on the state sequence to enable neural likelihood estimation in SSMs, then reports empirical gains in accuracy, stability, scalability, and amortization over SNL and other baselines. These performance claims are tied directly to simulation experiments rather than reducing to fitted parameters renamed as predictions or to self-citations whose content is presupposed. No equations equate the target posterior or likelihood to the truncation choice by construction, and the truncation itself is motivated as a practical approximation whose adequacy is assessed externally via held-out performance metrics. The derivation chain therefore remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Neural networks can be trained to approximate intractable likelihoods in state-space models from simulated data.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean, reality_from_one_distinction (tag: unclear)
  unclear: Relation between the paper passage and the cited Recognition theorem.
  Paper passage: We approximate the observation process by a Markov chain of order L... p_L(y_{1:T}|θ) ≈ ∏_t p(y_t | y_{t-L:t-1}, θ)
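To make the order-L factorization in the quoted passage concrete, here is a small sketch of how the truncated log-likelihood could be assembled from sliding windows. It is illustrative only: `log_cond` stands in for a trained neural conditional density, the toy Gaussian used for it is a hypothetical placeholder, and starting the sum at t = L+1 (skipping the first L observations) is an assumption, since the passage does not say how the initial terms are handled.

```python
import numpy as np

def make_windows(y, L):
    """Pair each target y_t with its context y_{t-L:t-1}.

    Returns contexts of shape (T-L, L) and targets of shape (T-L,).
    """
    y = np.asarray(y, dtype=float)
    contexts = np.lib.stride_tricks.sliding_window_view(y, L)[:-1]
    return contexts, y[L:]

def truncated_loglik(y, L, theta, log_cond):
    """Sum of log p(y_t | y_{t-L:t-1}, theta) over t = L+1..T."""
    contexts, targets = make_windows(y, L)
    return float(np.sum(log_cond(targets, contexts, theta)))

def toy_log_cond(targets, contexts, theta):
    """Placeholder for a learned density: y_t | y_{t-1} ~ N(theta * y_{t-1}, 1)."""
    mean = theta * contexts[:, -1]
    return -0.5 * (np.log(2.0 * np.pi) + (targets - mean) ** 2)
```

Because every term has the same fixed-size input, one conditional estimator serves every time step, and a newly arrived observation only adds one more term to the sum. That is consistent with the reuse-without-retraining claim made for T-SNL, though the paper's actual architecture may differ.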
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] Särkkä, S. & Svensson, L. Bayesian filtering and smoothing, Vol. 17 (Cambridge University Press, 2023).
- [2] Johansson, R., Robertsson, A., Nilsson, K. & Verhaegen, M. State-space system identification of robot manipulator dynamics. Mechatronics 10, 403–418 (2000).
- [3] Thrun, S. Probabilistic robotics. Communications of the ACM 45, 52–57 (2002).
- [4] Paninski, L. et al. A new look at state-space models for neural data. Journal of Computational Neuroscience 29, 107–126 (2010).
- [5] Linderman, S., Nichols, A., Blei, D., Zimmer, M. & Paninski, L. Hierarchical recurrent state space models reveal discrete and continuous dynamics of neural activity in C. elegans. bioRxiv 621540 (2019).
- [6] Aghagolzadeh, M. & Truccolo, W. Latent state-space models for neural decoding. 2014 36th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 3033–3036 (IEEE, 2014).
- [7] Buckland, S., Newman, K., Thomas, L. & Koesters, N. State-space models for the dynamics of wild animal populations. Ecological Modelling 171, 157–175 (2004).
- [8] Newman, K. et al. State-space models for ecological time-series data: Practical model-fitting. Methods in Ecology and Evolution 14, 26–42 (2023).
- [9] Sharma, S., Elvira, V., Chouzenoux, E. & Majumdar, A. Recurrent dictionary learning for state-space models with an application in stock forecasting. Neurocomputing 450, 1–13 (2021).
- [10] Hamilton, J. D. State-space models. Handbook of Econometrics 4, 3039–3080 (1994).
- [11] Lopes, H. F. & Tsay, R. S. Particle filters and Bayesian inference in financial econometrics. Journal of Forecasting 30, 168–209 (2011).
- [12] Martino, L., Read, J., Elvira, V. & Louzada, F. Cooperative parallel particle filters for online model selection and applications to urban mobility. Digital Signal Processing 60, 172–185 (2017).
- [13] Shumway, R. H. & Stoffer, D. S. An approach to time series smoothing and forecasting using the EM algorithm. Journal of Time Series Analysis 3, 253–264 (1982).
- [14] Chouzenoux, E. & Elvira, V. Sparse graphical linear dynamical systems. Journal of Machine Learning Research 25, 1–53 (2024).
- [15] Andrieu, C., Doucet, A. & Holenstein, R. Particle Markov chain Monte Carlo methods. Journal of the Royal Statistical Society Series B: Statistical Methodology 72, 269–342 (2010).
- [16] Kantas, N., Doucet, A., Singh, S. S., Maciejowski, J. & Chopin, N. On particle methods for parameter estimation in state-space models. Statistical Science 30, 328–351 (2015).
- [17] Luengo, D., Martino, L., Bugallo, M., Elvira, V. & Särkkä, S. A survey of Monte Carlo methods for parameter estimation. EURASIP Journal on Advances in Signal Processing 2020, 25 (2020).
- [18] ATLAS Collaboration. An implementation of neural simulation-based inference for parameter estimation in ATLAS. Reports on Progress in Physics 88, 067801 (2025).
- [19] Tavaré, S., Balding, D. J., Griffiths, R. C. & Donnelly, P. Inferring coalescence times from DNA sequence data. Genetics 145, 505–518 (1997).
- [20] Sisson, S. A., Fan, Y. & Beaumont, M. Handbook of approximate Bayesian computation (CRC Press, 2018).
- [21] Pesonen, H. et al. ABC of the future. International Statistical Review 91, 243–268 (2023).
- [22] Marjoram, P., Molitor, J., Plagnol, V. & Tavaré, S. Markov chain Monte Carlo without likelihoods. Proceedings of the National Academy of Sciences 100, 15324–15328 (2003).
- [23] Beaumont, M. A., Cornuet, J.-M., Marin, J.-M. & Robert, C. P. Adaptive approximate Bayesian computation. Biometrika 96, 983–990 (2009).
- [24] Simola, U., Cisewski-Kehe, J., Gutmann, M. U. & Corander, J. Adaptive approximate Bayesian computation tolerance selection. Bayesian Analysis 16, 397–423 (2021).
- [25] Germain, M., Gregor, K., Murray, I. & Larochelle, H. MADE: Masked autoencoder for distribution estimation. In Bach, F. & Blei, D. (eds) Proceedings of the 32nd International Conference on Machine Learning, Vol. 37 of Proceedings of Machine Learning Research, 881–889 (PMLR, Lille, France, 2015).
- [26] Papamakarios, G. & Murray, I. Fast ϵ-free inference of simulation models with Bayesian conditional density estimation. In Lee, D., Sugiyama, M., Luxburg, U., Guyon, I. & Garnett, R. (eds) Advances in Neural Information Processing Systems, Vol. 29 (Curran Associates, Inc., 2016).
- [27] Lueckmann, J.-M. et al. Flexible statistical inference for mechanistic models of neural dynamics. Advances in Neural Information Processing Systems 30 (2017).
- [28] Greenberg, D., Nonnenmacher, M. & Macke, J. Automatic posterior transformation for likelihood-free inference. International Conference on Machine Learning, 2404–2414 (PMLR, 2019).
- [29] Papamakarios, G., Sterratt, D. & Murray, I. Sequential neural likelihood: Fast likelihood-free inference with autoregressive flows. In Chaudhuri, K. & Sugiyama, M. (eds) Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, Vol. 89 of Proceedings of Machine Learning Research, ...
- [30] Thomas, O., Dutta, R., Corander, J., Kaski, S. & Gutmann, M. U. Likelihood-free inference by ratio estimation. Bayesian Analysis 17, 1–31 (2022).
- [31] Hermans, J., Begy, V. & Louppe, G. Likelihood-free MCMC with amortized approximate ratio estimators. International Conference on Machine Learning, 4239–4248 (PMLR, 2020).
- [32] Cranmer, K., Brehmer, J. & Louppe, G. The frontier of simulation-based inference. Proceedings of the National Academy of Sciences 117, 30055–30062 (2020).
- [33] Lueckmann, J.-M., Boelts, J., Greenberg, D., Goncalves, P. & Macke, J. Benchmarking simulation-based inference. International Conference on Artificial Intelligence and Statistics, 343–351 (PMLR, 2021).
- [34] Stockman, S., Lawson, D. J. & Werner, M. J. SB-ETAS: using simulation based inference for scalable, likelihood-free inference for the ETAS model of earthquake occurrences. Statistics and Computing 34, 174 (2024).
- [35] Grazzini, J. & Richiardi, M. Estimation of ergodic agent-based models by simulated minimum distance. Journal of Economic Dynamics and Control 51, 148–165 (2015). URL https://www.sciencedirect.com/science/article/pii/S0165188914002814
- [36] Grazzini, J., Richiardi, M. G. & Tsionas, M. Bayesian estimation of agent-based models. Journal of Economic Dynamics and Control 77, 26–47 (2017). URL https://www.sciencedirect.com/science/article/pii/S0165188917300222
- [37] Hahn, C. et al. Approximate Bayesian computation in large-scale structure: constraining the galaxy–halo connection. Monthly Notices of the Royal Astronomical Society 469, 2791–2805 (2017). URL https://doi.org/10.1093/mnras/stx894
- [38] Verdier, H. et al. Simulation-based inference for non-parametric statistical comparison of biomolecule dynamics. PLOS Computational Biology 19, 1–24 (2023). URL https://doi.org/10.1371/journal.pcbi.1010088
- [39] Toni, T., Welch, D., Strelkowa, N., Ipsen, A. & Stumpf, M. P. Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems. Journal of the Royal Society Interface 6, 187–202 (2009).
- [40] Martin, G. M., McCabe, B. P., Frazier, D. T., Maneesoonthorn, W. & Robert, C. P. Auxiliary likelihood-based approximate Bayesian computation in state space models. Journal of Computational and Graphical Statistics 28, 508–522 (2019).
- [41] Dean, T. A., Singh, S. S., Jasra, A. & Peters, G. W. Parameter estimation for hidden Markov models with intractable likelihoods. Scandinavian Journal of Statistics 41, 970–987 (2014).
- [42] Aushev, A., Tran, T., Pesonen, H., Howes, A. & Kaski, S. Likelihood-free inference in state-space models with unknown dynamics. Statistics and Computing 34, 27 (2024).
- [43] Ward, W., Ryder, T., Prangle, D. & Alvarez, M. Black-box inference for non-linear latent force models. International Conference on Artificial Intelligence and Statistics, 3088–
- [44] Ryder, T., Prangle, D., Golightly, A. & Matthews, I. The neural moving average model for scalable variational inference of state space models. Uncertainty in Artificial Intelligence, 12–22 (PMLR, 2021).
- [45] Doerr, A. et al. Probabilistic recurrent state-space models. International Conference on Machine Learning, 1280–1289 (PMLR, 2018).
- [46] Khabibullin, R. & Seleznev, S. Fast estimation of Bayesian state space models using amortized simulation-based inference. arXiv preprint arXiv:2210.07154 (2022).
- [47] Aushev, A., Tran, T., Pesonen, H., Howes, A. & Kaski, S. Likelihood-free inference in state-space models with unknown dynamics. Statistics and Computing 34, 27 (2024).
- [48] Doucet, A. & Johansen, A. M. A tutorial on particle filtering and smoothing: Fifteen years later. Handbook of Nonlinear Filtering 12 (2009).
- [49] Del Moral, P. Feynman-Kac formulae: Genealogical and Interacting Particle Systems with Applications (Springer, 2004).
- [50] Tsampourakis, K. & Elvira, V. Approximating the likelihood ratio in linear-Gaussian state-space models for change detection. ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 5912–5916 (IEEE, 2022).
- [51] Frazier, D. T., Martin, G. M., Robert, C. P. & Rousseau, J. Asymptotic properties of approximate Bayesian computation. Biometrika 105, 593–607 (2018).
- [52] Bishop, C. M. Mixture density networks. Technical Report NCRG/94/004, Aston University (1994).
- [53] Papamakarios, G., Pavlakou, T. & Murray, I. Masked autoregressive flow for density estimation. In Guyon, I. et al. (eds) Advances in Neural Information Processing Systems, Vol. 30 (Curran Associates, Inc., 2017).
- [54] Gordon, N. J., Salmond, D. J. & Smith, A. F. Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proceedings F (Radar and Signal Processing), Vol. 140, 107–113 (IET, 1993).
- [55] Kingma, D. P. & Ba, J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).
- [56] Murray, I., Adams, R. & MacKay, D. Elliptical slice sampling. Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 541–548 (JMLR Workshop and Conference Proceedings, 2010).
- [57] Lotka, A. J. Analytical note on certain rhythmic relations in organic systems. Proceedings of the National Academy of Sciences 6, 410–415 (1920). URL https://www.pnas.org/doi/abs/10.1073/pnas.6.7.410
- [58] Gillespie, D. T. Exact stochastic simulation of coupled chemical reactions. The Journal of Physical Chemistry 81, 2340–2361 (1977).
- [59] Wilkinson, D. J. Stochastic modelling for systems biology (Chapman and Hall/CRC, 2018).
- [60] Pérez-Vieites, S., Mariño, I. P. & Míguez, J. Probabilistic scheme for joint parameter estimation and state prediction in complex dynamical systems. Physical Review E 98, 063305 (2018).
Supplementary Material
6 SMC-ABC adaptive tolerance selection
Here we derive our implementation of the adaptive tolerance selection method of [24]. The implementa...