A probabilistic framework for online test-time adaptation

Daniel Corrales; David Ríos Insua

arxiv: 2606.26457 · v1 · pith:P3VZYAZSnew · submitted 2026-06-24 · 📊 stat.ML · cs.LG

Pith reviewed 2026-06-26 00:23 UTC · model grok-4.3

keywords: online test-time adaptation · probabilistic framework · state-space model · distributional shift · parameter evolution

The pith

A state-space model unifies parameter learning and prediction for online test-time adaptation under distributional shifts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a probabilistic framework for adapting a model trained on labeled data to unlabeled test data when the training and test distributions may differ. It uses a state-space architecture to model how parameters evolve over time, which supports learning, prior tuning, and prediction in an online setting. This matters because many real-world applications receive data sequentially while the underlying distribution changes, so models must adapt without labels. Within this single model, the framework characterizes parameter learning, parameter time evolution, prior tuning, and prediction.

Core claim

The framework is based on a state-space modelling architecture from which parameter learning, parameter time evolution, prior tuning, and prediction can be characterized for online test-time adaptation under potential distributional shifts.
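The paper's own equations are not reproduced on this page, so the following is only a generic sketch of what a state-space formulation of this kind usually looks like. The symbols are assumptions introduced here for illustration: $\theta_t$ for the model parameters at step $t$, $x_t$ for the unlabeled test input, $y_t$ for the unobserved label, and $\mathcal{D}_{\text{train}}$ for the labeled source data. The choice of $p(x_t \mid \theta_t)$ as the observation model is likewise an assumption, not the authors' stated construction.

```latex
\begin{align}
\theta_0 &\sim p(\theta_0 \mid \mathcal{D}_{\text{train}})
  && \text{(prior from source training)} \\
\theta_t \mid \theta_{t-1} &\sim p(\theta_t \mid \theta_{t-1})
  && \text{(parameter time evolution)} \\
x_t \mid \theta_t &\sim p(x_t \mid \theta_t)
  && \text{(unlabeled test observation)} \\
p(\theta_t \mid x_{1:t}) &\propto p(x_t \mid \theta_t)
  \int p(\theta_t \mid \theta_{t-1})\, p(\theta_{t-1} \mid x_{1:t-1})\, d\theta_{t-1}
  && \text{(parameter learning by filtering)} \\
p(y_t \mid x_{1:t}) &= \int p(y_t \mid x_t, \theta_t)\, p(\theta_t \mid x_{1:t})\, d\theta_t
  && \text{(prediction)}
\end{align}
```

Read this way, the four tasks named in the claim map onto the four ingredients of a standard Bayesian filter: the initial prior, the transition density, the filtering recursion, and the posterior predictive.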

What carries the argument

A state-space modelling architecture that tracks parameter dynamics over time.

If this is right

Parameters can be learned and updated sequentially as new unlabeled data arrives.
Prior distributions can be tuned based on the state evolution.
Predictions account for the uncertainty in parameter changes due to shifts.
Adaptation becomes a filtering problem in the state-space model.
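To make the filtering reading of these points concrete, here is a minimal sketch of a scalar linear-Gaussian filter over a single parameter. It is not the authors' algorithm: the transition coefficient `a`, the process noise `q`, the observation gain `h`, the observation noise `r`, and the idea of a scalar "signal" `y` derived from unlabeled data are all assumptions chosen for illustration.

```python
# Scalar Kalman-style filter over one model parameter (illustrative only).
# Assumed model (not taken from the paper):
#   theta_t = a * theta_{t-1} + process noise with variance q
#   y_t     = h * theta_t     + observation noise with variance r


def predict(mean, var, a=1.0, q=0.1):
    """Time evolution: push the parameter belief one step forward."""
    return a * mean, a * a * var + q


def update(mean, var, y, h=1.0, r=0.5):
    """Learning: condition the predicted belief on the new signal y."""
    innovation_var = h * h * var + r
    gain = var * h / innovation_var
    new_mean = mean + gain * (y - h * mean)
    new_var = (1.0 - gain * h) * var
    return new_mean, new_var


def run_filter(signals, mean, var, a=1.0, q=0.1, h=1.0, r=0.5):
    """Process a stream of signals sequentially, returning (mean, var) per step."""
    history = []
    for y in signals:
        mean, var = predict(mean, var, a, q)
        mean, var = update(mean, var, y, h, r)
        history.append((mean, var))
    return history


if __name__ == "__main__":
    # Start from a hypothetical source-trained belief and feed three signals.
    for step, (m, v) in enumerate(run_filter([1.0, 1.0, 1.0], 0.0, 1.0), start=1):
        print(f"step {step}: mean={m:.3f} var={v:.3f}")
```

The predict step is where uncertainty about a possible shift enters: a larger `q` inflates the variance before each update, so the filter trusts new data more and the source-trained belief less.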

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Such a model could integrate with existing Bayesian online learning techniques for more robust adaptation.
Extensions might include handling multiple possible shift types within the state transitions.

Load-bearing premise

That the dynamics of model parameters during adaptation can be adequately represented by a state-space model.

What would settle it

A comparison where the state-space predictions fail to match observed adaptation performance on datasets with known distributional shifts.

Figures

Figures reproduced from arXiv: 2606.26457 by Daniel Corrales, David Ríos Insua.

**Figure 2.** Structural relationships between source parameters

**Figure 3.** Prior parameter dynamics induced by four choices of transition matrix

**Figure 4.** Classification boundary evolution

**Figure 5.** Classification boundary evolution

**Figure 6.** Classification boundary evolution. Excerpt from the surrounding text: "… loss curvature information of the current step and discards the curvatures of previous steps. We define the parameters, following the previously used notation, as $\mu_t = \mu_{t|t-1} - \beta_t \Sigma_t g_t$ and $\Sigma_t^{-1} = W_r^{-1} + \beta_t H_t$. The covariance matrix does not use previous step information but uses noise-inflated curvature information of the current step. This approach may be seen as a curva…"
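The update quoted in the Figure 6 excerpt can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' implementation: it reads `g_t` as a loss gradient, `H_t` as a curvature (Hessian-like) matrix, `W_r` as a noise covariance, and `beta_t` as a step weight, and it restricts every matrix to its diagonal so that the sketch needs no linear-algebra library. The function name `curvature_step` is hypothetical.

```python
def curvature_step(mu_pred, grad, hess_diag, w_r_diag, beta):
    """One step of the quoted update, restricted to diagonal matrices.

    Sigma_t^{-1} = W_r^{-1} + beta_t * H_t
    mu_t         = mu_{t|t-1} - beta_t * Sigma_t * g_t

    All arguments except beta are equal-length lists of floats holding the
    diagonal entries (an assumption made for this sketch).
    """
    if not (len(mu_pred) == len(grad) == len(hess_diag) == len(w_r_diag)):
        raise ValueError("all inputs must have the same length")

    # Precision uses only current-step curvature plus the noise term.
    precision = [1.0 / w + beta * h for w, h in zip(w_r_diag, hess_diag)]
    if any(p <= 0.0 for p in precision):
        raise ValueError("precision must stay positive")

    sigma = [1.0 / p for p in precision]
    # Preconditioned gradient step on the predicted mean.
    mu = [m - beta * s * g for m, s, g in zip(mu_pred, sigma, grad)]
    return mu, sigma


if __name__ == "__main__":
    mu, sigma = curvature_step(
        mu_pred=[1.0, -2.0],
        grad=[0.5, -1.0],
        hess_diag=[1.0, 3.0],
        w_r_diag=[1.0, 0.5],
        beta=1.0,
    )
    print(mu, sigma)
```

Because the precision is rebuilt from `W_r` and the current curvature at every step, nothing from earlier covariances carries over, which matches the excerpt's remark that the covariance "does not use previous step information".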

Original abstract

This paper presents a probabilistic framework for online test-time adaptation problems. In them, a model is trained on labeled data but must adapt to unlabeled data at test time under the assumption that training and test distributions potentially differ, that is, there might have been a distributional shift. The framework is based on a state-space modelling architecture from which parameter learning, parameter time evolution, prior tuning, and prediction can be characterized.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Abstract sketches a state-space probabilistic framing for online TTA but supplies no equations, derivations, or results to evaluate.

The paper's central idea is to treat online test-time adaptation as a state-space problem so that parameter learning, time evolution, prior tuning, and prediction can all be handled inside one probabilistic architecture when training and test distributions may differ.

That framing is the only concrete element on offer. It tries to connect TTA to standard filtering and Bayesian updating ideas, which could in principle give a cleaner way to track how model parameters should change at test time.

Everything else is missing. The abstract contains no transition or observation equations, no inference procedure, no algorithm, and no experiments. We therefore cannot tell whether the state-space model is linear, whether it reduces to existing Kalman or particle filter methods, or whether it actually improves adaptation on any benchmark. The claim that the architecture "characterizes" the four listed tasks is asserted but not shown.

The main weakness is simply the lack of technical content. Without derivations or data, there is no way to check if the distributional-shift assumption is handled better than current TTA baselines or if the modeling choices introduce new problems such as identifiability or computational cost.

This would mainly interest researchers already working on probabilistic or sequential models for adaptation who want to see whether state-space ideas add leverage. Most readers looking for usable methods or comparative results will find nothing here.

I would not send it to referees in its current form. The authors need to supply the actual model, inference details, and at least preliminary validation before any serious review makes sense.

Referee Report

1 major / 0 minor

Summary. The paper presents a probabilistic framework for online test-time adaptation problems. A model is trained on labeled data but must adapt to unlabeled data at test time under potential distributional shift. The framework is based on a state-space modelling architecture from which parameter learning, parameter time evolution, prior tuning, and prediction can be characterized.

Significance. If rigorously developed with explicit derivations and validated empirically, such a framework could provide a unified probabilistic treatment of online TTA, enabling principled handling of distributional shift via state-space dynamics. The abstract alone supplies no such development, so significance cannot be assessed.

major comments (1)

[Abstract] No equations, state-space model definition, learning rules, or experimental results are supplied, so the central claim that the architecture 'characterizes' parameter learning, time evolution, prior tuning, and prediction cannot be evaluated for soundness or novelty.

Simulated Author's Rebuttal

1 response · 1 unresolved

We thank the referee for their review. We address the single major comment below.

Referee: [Abstract] No equations, state-space model definition, learning rules, or experimental results are supplied, so the central claim that the architecture 'characterizes' parameter learning, time evolution, prior tuning, and prediction cannot be evaluated for soundness or novelty.

Authors: We agree that the provided manuscript consists solely of the abstract, which contains no equations, state-space model definition, learning rules, or experimental results. Consequently, the central claim cannot be evaluated for soundness or novelty from the given text.

Revision: no

standing simulated objections not resolved

Only the abstract is available, so we cannot supply the state-space model, derivations, or results needed to allow evaluation of the framework.

Circularity Check

0 steps flagged

No circularity detectable; abstract-only text provides no derivation chain

Only the abstract is available, which states the existence of a state-space modelling architecture for characterizing parameter learning, time evolution, prior tuning, and prediction but supplies no equations, self-citations, fitted inputs, or ansatzes. No load-bearing steps exist to inspect for reduction to inputs by construction, self-definition, or self-citation chains. This matches the default case of honest non-finding when the paper is self-contained against external benchmarks and no evidence of circularity is present.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no free parameters, axioms, or invented entities are specified or can be extracted.

