pith. sign in

arxiv: 2606.01468 · v1 · pith:5LBX5ZLYnew · submitted 2026-05-31 · 📊 stat.ML · cs.AI· cs.LG

Computation-Aware Kalman Filtering with Model Selection for Neural Dynamics

Pith reviewed 2026-06-28 15:55 UTC · model grok-4.3

classification 📊 stat.ML cs.AIcs.LG
keywords computation-aware Kalman filteringmodel selectionstate-space modelsneural dynamicsBayesian inferencesingle-cell recordingslatent variable modelsuncertainty calibration
0
0 comments X

The pith

A new computation-aware state-space model performs model selection during Kalman filtering for neural dynamics in large state spaces.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a framework for Bayesian dynamical latent variable modeling that handles datasets where recorded neurons greatly outnumber experimental trials. It extends prior computation-aware filtering by adding a training loss and optimization procedure that selects models on the fly. The result keeps inference tractable without quadratic scaling or preset hyperparameters. On both synthetic and real single-cell recordings the method matches deep-network accuracy while delivering better-calibrated uncertainty. Experiments map out when this Bayesian route is preferable given the ratio of trials to neurons and other dataset constraints.

Core claim

The Computation-Aware State-Space Model (CASSM) incorporates model selection into computation-aware Kalman filtering through a novel training loss and optimization scheme, yielding tractable inference in large state-spaces for scale-imbalanced neural data; in this regime the approach is competitive with deep networks yet shows significantly improved uncertainty calibration over earlier scaled Bayesian methods.

What carries the argument

The Computation-Aware State-Space Model (CASSM), which augments computation-aware Kalman filtering with an integrated model-selection loss and optimizer to remain tractable at scale.

If this is right

  • Bayesian latent-variable models become practical choices for neural dynamics even when the number of trials is far smaller than the number of neurons.
  • Model selection occurs automatically during filtering rather than through separate hyperparameter search.
  • Uncertainty estimates remain better calibrated than those from earlier attempts to scale Bayesian methods to the same regime.
  • Researchers obtain explicit guidance on selecting among dynamical models according to the trial-to-neuron ratio and related dataset properties.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same computation-aware loss construction may transfer to other high-dimensional time-series domains where posterior approximations must stay cheap.
  • The reported roadmap for model choice could be tested as a general decision procedure when deciding between Bayesian and deep-network approaches on scientific datasets of varying scale.
  • Further experiments could check whether the calibration advantage persists when the method is applied to multi-region or multi-subject recordings.

Load-bearing premise

The novel training loss and optimization scheme can fold model selection into computation-aware filtering without restoring quadratic complexity or demanding fixed hyperparameters.

What would settle it

A head-to-head evaluation on held-out single-cell neural recordings that measures whether predictive accuracy matches deep networks while uncertainty calibration metrics (such as expected calibration error) improve over prior computation-aware Bayesian baselines.

Figures

Figures reproduced from arXiv: 2606.01468 by John P. Cunningham, Jonathan Wenger, JR Huml.

Figure 2
Figure 2. Figure 2: As the amount of computation decreases and pos [PITH_FULL_IMAGE:figures/full_fig_p002_2.png] view at source ↗
Figure 1
Figure 1. Figure 1: Hand velocities in the x − y plane for four different directions (0 ◦ , 90◦ , 180◦ , 270◦ ) for the Area 2 Bump dataset, where a primate is bumped from different angles during a reaching task. The mean predictions and standard deviations from CASSM are shown in the dashed line and filled areas, respectively. Kinematics are obtained by sampling from the CASSM filtering distribution after training and averag… view at source ↗
Figure 3
Figure 3. Figure 3: Grassmann distance between the learned Sk (dense and sparse variations) and the top eigenvectors of Gk . The distance decreases during training, indicating that the learned policy converges to the optimal theoretical policy at small neuron scales where PCA is possible. The random policy is a random projection that is kept fixed throughout training. regime, the assumption of fixed model hyperparameters Θk i… view at source ↗
Figure 4
Figure 4. Figure 4: CASSM does not begin to enter its linear scaling [PITH_FULL_IMAGE:figures/full_fig_p006_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: We vary the ratio of trials to neurons, holding trials [PITH_FULL_IMAGE:figures/full_fig_p007_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Test MSE across model classes under different neuron [PITH_FULL_IMAGE:figures/full_fig_p007_6.png] view at source ↗
read the original abstract

Due to their explicit priors and ability to model uncertainty, Bayesian methods have played a major role in dynamical latent variable modeling of single-cell neural recordings. However, modern-sized datasets have made overparameterized deep networks the preferred methods of choice due to their predictive power and favorable computational scaling. While many posterior approximations exist, all incur approximation errors. Recent work accounts for this error in the form of computational uncertainty but comes at the cost of quadratic complexity and assumes fixed model hyperparameters. Here we extend this development to model selection, including a novel training loss and optimization scheme, which yields tractable inference in large state-spaces. We introduce a framework, the Computation-Aware State-Space Model (CASSM), specifically designed for the scale-imbalanced regime, where the number of trials is significantly lower than the number of recorded neurons. In this regime, for both synthetic and real data, we show that our method is competitive with data-hungry deep networks, with significantly improved uncertainty calibration over previous attempts to scale Bayesian methods. Our experiments provide a roadmap to neuroscience researchers in choosing from a host of potential dynamical latent variable models given key dataset properties and constraints.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The manuscript introduces the Computation-Aware State-Space Model (CASSM), extending prior computation-aware Kalman filtering to incorporate model selection via a novel training loss and optimization scheme. This is claimed to yield tractable inference in large state-spaces for the scale-imbalanced regime (trials << neurons) in neural dynamics modeling. On synthetic and real data, the method is asserted to be competitive with deep networks while providing significantly improved uncertainty calibration over previous Bayesian scaling attempts.

Significance. If the tractability and empirical results hold, this would provide a valuable Bayesian alternative for neural latent variable modeling that scales computationally while retaining explicit uncertainty quantification, bridging gaps between traditional Bayesian methods and deep networks in neuroscience applications.

major comments (1)
  1. [Abstract] Abstract and Introduction: the central claim that the novel training loss and optimization scheme incorporates model selection while remaining tractable (linear in state dimension) is load-bearing but unsupported by any complexity analysis, equation, or pseudocode showing that quadratic terms from prior methods are avoided; without this, the extension to the scale-imbalanced regime cannot be evaluated.
minor comments (1)
  1. [Abstract] Abstract: the assertions of competitiveness and improved calibration would be strengthened by inclusion of key quantitative metrics, baselines, and error bars.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their constructive review and for highlighting this important point about supporting the tractability claim. We address the major comment below.

read point-by-point responses
  1. Referee: [Abstract] Abstract and Introduction: the central claim that the novel training loss and optimization scheme incorporates model selection while remaining tractable (linear in state dimension) is load-bearing but unsupported by any complexity analysis, equation, or pseudocode showing that quadratic terms from prior methods are avoided; without this, the extension to the scale-imbalanced regime cannot be evaluated.

    Authors: We agree that the abstract and introduction would be strengthened by explicit support for the tractability claim. The full manuscript contains the relevant derivations in Section 3.3 and Algorithm 1, but these are not summarized in the abstract or introduction. In the revised version we will add a short paragraph (with one equation and a reference to the algorithm) in the introduction that states the per-iteration complexity is O(N D) rather than O(N D^2), shows the key cancellation that removes the quadratic term from the earlier computation-aware filter, and notes that model selection is performed via a single forward pass over the same linear-time quantities. This addition will make the extension to the scale-imbalanced regime immediately evaluable from the front matter. revision: yes

Circularity Check

0 steps flagged

No circularity: derivation chain not visible and no self-referential reductions in provided text

full rationale

The abstract and excerpt describe an extension of prior computation-aware filtering to model selection via a novel loss and optimizer, claiming tractability in large state spaces. No equations, fitting procedures, or derivation steps are shown that reduce a claimed prediction or result to its own inputs by construction. No self-citations are quoted as load-bearing for uniqueness or ansatz. The central claim of improved calibration and competitiveness remains independent of any visible circular step; external benchmarks (synthetic/real data experiments) are referenced but not shown to be tautological. This is the expected non-finding when no load-bearing reduction can be exhibited.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract alone supplies no concrete free parameters, axioms, or invented entities beyond naming the CASSM framework; full text would be needed to audit these.

pith-pipeline@v0.9.1-grok · 5735 in / 1157 out tokens · 31650 ms · 2026-06-28T15:55:20.696873+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

32 extracted references · 17 canonical work pages · 6 internal anchors

  1. [1]

    Neural manifolds for the control of movement.Neuron, 94(5):978–984, 2017

    Juan A Gallego, Matthew G Perich, Lee E Miller, and Sara A Solla. Neural manifolds for the control of movement.Neuron, 94(5):978–984, 2017

  2. [2]

    Siegle, Xiaoxuan Jia, and Severine Durand

    J.H. Siegle, Xiaoxuan Jia, and Severine Durand. Survey of spiking in the mouse visual system reveals functional hierarchy.Nature, 2021

  3. [3]

    Distributed coding of choice, action, and engagement across the mouse brain.Nature, 2019

    Nicholas Steinmetz, Peter Zatka-Haas, Matteo Carandini, and Kenneth Harris. Distributed coding of choice, action, and engagement across the mouse brain.Nature, 2019

  4. [4]

    Raj Magesh Gauthaman, Brice Ménard, and Michael F. Bonner. Universal scale-free repre- sentations in human visual cortex.PLOS Com- putational Biology, 21(11):e1013714, 2025. doi: 10.1371/journal.pcbi.1013714. URL https://doi. org/10.1371/journal.pcbi.1013714

  5. [5]

    Gaussian-process factor analysis for low-dimensional single- trial analysis of neural population activ- ity

    Byron M Yu, John P Cunningham, Gopal San- thanam, Stephen Ryu, Krishna V Shenoy, and Maneesh Sahani. Gaussian-process factor analysis for low-dimensional single- trial analysis of neural population activ- ity. 21, 2008. URL https://proceedings. neurips.cc/paper_files/paper/2008/file/ ad972f10e0800b49d76fed33a21f6698-Paper. pdf

  6. [6]

    O’Shea, Jasmine Collins, Rafal Jozefowicz, Sergey D

    Chethan Pandarinath, Daniel J. O’Shea, Jasmine Collins, Rafal Jozefowicz, Sergey D. Stavisky, Jonathan C. Kao, Eric M. Trautmann, Matthew T. Kaufman, Stephen I. Ryu, Leigh R. Hochberg, Jaimie M. Henderson, Krishna V. Shenoy, L. F. Abbott, and David Sussillo. Inferring single-trial neural population dynamics using sequential auto- encoders.Nature Methods, ...

  7. [7]

    Sedler, Harrison A

    Feng Zhu, Andrew R. Sedler, Harrison A. Grier, Nauman Ahad, Mark A. Davenport, Matthew T. Kaufman, Andrea Giovannucci, and Chethan Pan- darinath. Deep inference of latent dynamics with spatio-temporal super-resolution using selective backpropagation through time. 2021. URLhttps: //arxiv.org/abs/2111.00070

  8. [8]

    Karpowicz, Yahia H

    Brianna M. Karpowicz, Yahia H. Ali, Lahiru N. Wimalasena, Andrew R. Sedler, Mohammad Reza Keshtkaran, Kevin Bodkin, Xuan Ma, Lee E. Miller, and Chethan Pandarinath. Stabilizing brain- computer interfaces through alignment of latent dynamics.bioRxiv, 2022. doi: 10.1101/2022.04.06. 487388

  9. [9]

    Training Compute-Optimal Large Language Models

    Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de Las Casas, Lisa Anne Hen- dricks, Johannes Welbl, Aidan Clark, Tom Henni- gan, Eric Noland, Katie Millican, George van den Driessche, Bogdan Damoc, Aurelia Guy, Simon Osindero, Karen Simonyan, Erich Elsen, Jack W. Rae, Oriol Vinyals, and Laurent S...

  10. [10]

    Temporal align- ment and latent gaussian process factor inference in population spike trains.Advances in Neural In- formation Processing Systems (NeurIPS), 31, 2018

    Lea Duncker and Maneesh Sahani. Temporal align- ment and latent gaussian process factor inference in population spike trains.Advances in Neural In- formation Processing Systems (NeurIPS), 31, 2018

  11. [11]

    Jensen, Ta-Chu Kao, Jasmine T

    Kristopher T. Jensen, Ta-Chu Kao, Jasmine T. Stone, and Guillaume Hennequin. Scalable Bayesian GPFA with automatic relevance de- termination and discrete noise models.bioRxiv,

  12. [12]

    URL https://www.biorxiv.org/content/early/ 2021/06/03/2021.06.03.446788

    doi: 10.1101/2021.06.03.446788. URL https://www.biorxiv.org/content/early/ 2021/06/03/2021.06.03.446788

  13. [13]

    Batched Large-scale Bayesian Optimization in High-dimensional Spaces

    Zi Wang, Clement Gehring, Pushmeet Kohli, and Stefanie Jegelka. Batched large-scale bayesian op- timization in high-dimensional spaces. 2018. URL https://arxiv.org/abs/1706.01445

  14. [14]

    Cunningham

    Jonathan Wenger, Geoff Pleiss, Marvin Pförtner, Philipp Hennig, and John P. Cunningham. Pos- terior and computational uncertainty in Gaussian processes. InAdvances in Neural Information Pro- cessing Systems (NeurIPS), 2022

  15. [16]

    Gardner, Geoff Pleiss, and John P

    Jonathan Wenger, Kaiwen Wu, Philipp Hennig, Jacob R. Gardner, Geoff Pleiss, and John P. Cun- ningham. Computation-aware gaussian processes: Model selection and linear-time inference. 2024

  16. [17]

    Mamba: Linear-Time Sequence Modeling with Selective State Spaces

    Albert Gu and Tri Dao. Mamba: Linear-Time Se- quence Modeling with Selective State Spaces, 2023. URLhttp://arxiv.org/abs/2312.00752

  17. [18]

    Cambridge Uni- versity Press, 2nd edition, 2023

    Simo Särkkä and Lennart Svensson.Bayesian Fil- tering and Smoothing, volume 17. Cambridge Uni- versity Press, 2nd edition, 2023. ISBN 978-1-108- 91230-3

  18. [19]

    Pnevmatikakis, Kamiar Rahnama Rad, Justin Huggins, and Liam Paninski

    Eftychios A. Pnevmatikakis, Kamiar Rahnama Rad, Justin Huggins, and Liam Paninski. Fast kalman filtering and forward–backward smoothing via a low-rank perturbative approach.Journal of Com- putational and Graphical Statistics, 22(3):671–690,

  19. [20]

    doi: 10.1080/10618600.2012.760461

  20. [21]

    Fast kalman filtering on quasi- linear dendritic trees.Journal of Computational Neuroscience, 28(2):211–228, 2010

    Liam Paninski. Fast kalman filtering on quasi- linear dendritic trees.Journal of Computational Neuroscience, 28(2):211–228, 2010. doi: 10.1007/ s10827-009-0200-4

  21. [22]

    The geometry of low-rank kalman filters.arXiv preprint,

    Silvère Bonnabel and Rodolphe Sepulchre. The geometry of low-rank kalman filters.arXiv preprint,

  22. [23]

    The geometry of low-rank Kalman filters

    URL https://arxiv.org/abs/1203.4049. Revised version: 2013. 9

  23. [24]

    Metamorphictestingoflarge languagemodelsfornaturallanguageprocessing.doi:10.48550/arXiv

    Daniel Y. Fu, Tri Dao, Khaled K. Saab, Armin W. Thomas, Atri Rudra, and Christopher Ré. Hun- gry hungry hippos: Towards language model- ing with state space models.arXiv preprint arXiv:2212.14052, 2022. doi: 10.48550/arXiv. 2212.14052. URL https://arxiv.org/abs/2212. 14052. ICLR 2023

  24. [25]

    Kalman filtering and smoothing so- lutions to temporal gaussian process regression models

    Simo Sarkka. Kalman filtering and smoothing so- lutions to temporal gaussian process regression models. 2010. URL https://users.aalto.fi/ ~ssarkka/pub/gp-ts-kfrts.pdf

  25. [26]

    Robust differentiable svd

    Wei Wang, Zheng Dang, Yinlin Hu, Pascal Fua, and Mathieu Salzmann. Robust differentiable svd. arXiv preprint arXiv:2104.03821, 2021. doi: 10. 1109/TPAMI.2021.3072422

  26. [27]

    Differentiable svd based on moore-penrose pseudoinverse for inverse imaging problems

    Yinghao Zhang and Yue Hu. Differentiable svd based on moore-penrose pseudoinverse for inverse imaging problems. 2024

  27. [28]

    Neural Comput

    YuanZhaoandIlMemmingPark. Variationallatent gaussian process for recovering single-trial dynamics from population spike trains.Neural Computation, 29(5):1293–1316, may 2017. doi: 10.1162/neco_a_ 00953

  28. [29]

    Neural latents benchmark’21: evaluating latent variable models of neural population activity.arXiv preprint arXiv:2109.04463, 2021

    Felix Pei, Joel Ye, David M. Zoltowski, Anqi Wu, Raeed H. Chowdhury, Hansem Sohn, Joseph E. O’Doherty, Krishna V. Shenoy, Matthew T. Kauf- man, Mark Churchland, Mehrdad Jazayeri, Lee E. Miller, Jonathan Pillow, Il Memming Park, Eva L. Dyer, and Chethan Pandarinath. Neural latents benchmark ’21: Evaluating latent variable mod- els of neural population acti...

  29. [30]

    Macke, Lars Büsing, John P

    Jakob H. Macke, Lars Büsing, John P. Cun- ningham, Byron M. Yu, Krishna V. Shenoy, and Maneesh Sahani. Empirical models of spiking in neural populations. InAdvances in Neural Information Processing Systems 24 (NIPS 2011), pages 1350–1358, 2011. URL https://papers.nips.cc/paper/2011/file/ 7143d7fbadfa4693b9eec507d9d37443-Paper. pdf

  30. [31]

    X. Chen, Y. Mu, Y. Hu, A. T. Kuan, M. Nikitchenko, O. Randlett, A. B. Chen, J. P. Gavornik, H. Sompolinsky, F. Engert, and M. B. Ahrens. Brain-wide organization of neuronal activ- ity and convergent sensorimotor transformations in larval zebrafish.Neuron, 100(4):876–890.e5, 2018. doi: 10.1016/j.neuron.2018.09.064

  31. [32]

    Sedler, Raeed H

    Mohammad Reza Keshtkaran, Andrew R. Sedler, Raeed H. Chowdhury, Raghav Tandon, Diya Basrai, Sarah L. Nguyen, Hansem Sohn, Mehrdad Jazay- eri, Lee E. Miller, and Chethan Pandarinath. A large-scale neural network training framework for generalized estimation of single-trial population dy- namics.Nature Methods, 19(12):1572–1577, 2022. doi: 10.1038/s41592-02...

  32. [33]

    David Sussillo, Rafal Jozefowicz, L. F. Abbott, and Chethan Pandarinath. Lfads - latent factor analysis via dynamical systems. 2016. URLhttps://arxiv. org/abs/1608.06315. 10 Supplementary Material The supplementary materials contain derivations for our theoretical framework and proofs for the mathematical statements in the main text. We also provide imple...