
arxiv: 2509.22813 · v2 · submitted 2025-09-26 · 💻 cs.CV

TRUST: Test-Time Refinement using Uncertainty-Guided SSM Traverses

Pith reviewed 2026-05-18 12:51 UTC · model grok-4.3

classification 💻 cs.CV
keywords test-time adaptation · state space models · VMamba · distribution shifts · robustness · computer vision · Mamba architecture · pseudo-labeling

The pith

TRUST adapts state space models at test time by generating multiple traversal permutations of each input and averaging the resulting parameter updates.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a test-time adaptation method designed specifically for state space models used in vision. It creates several different scan orders, or traversals, of the same image to produce varied causal views. Model predictions on these views serve as pseudo-labels to update only the Mamba-specific parameters, after which the updated weights from each traversal are averaged together. A sympathetic reader would care because this approach exploits the sequential processing structure unique to SSMs rather than treating the model as a black box. If successful, it offers a way to improve robustness when input distributions shift at deployment without any retraining or access to source data.
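The idea of several scan orders over one image can be made concrete with a small sketch. VMamba's SS2D block flattens the patch grid along four routes (row-major, column-major, and the reverse of each). The letters a to d and their assignment to routes below are assumptions chosen for illustration, and this page does not describe how the paper applies a permutation inside the block, so the code only enumerates the orderings.

```python
from itertools import permutations


def scan_routes(height, width):
    """Return four flattening orders for a height x width grid of patch indices."""
    row_major = [r * width + c for r in range(height) for c in range(width)]
    col_major = [r * width + c for c in range(width) for r in range(height)]
    # Assumed labelling: a and b are the forward scans, c and d their reverses.
    return {
        "a": row_major,
        "b": col_major,
        "c": row_major[::-1],
        "d": col_major[::-1],
    }


def traversal_permutations(labels="abcd"):
    """Enumerate every ordering of the scan-route labels."""
    return ["".join(p) for p in permutations(labels)]


routes = scan_routes(2, 3)
orders = traversal_permutations()
print(routes["a"])  # row-major order of the six patch indices
print(routes["b"])  # column-major order of the same indices
print(len(orders))  # number of possible orderings of four routes
```

Four routes admit 24 orderings, so a method of this kind has to choose how many of them to adapt on; the sketch makes no claim about which subset the paper uses.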

Core claim

The authors claim that by leveraging diverse traversal permutations to generate multiple causal perspectives of an input image, using the model's own predictions as pseudo-labels to update Mamba-specific parameters, and then averaging the adapted weights across scans, one can achieve consistent gains in robustness under distribution shifts. They position TRUST as the first method that explicitly uses the architectural properties of SSMs for test-time adaptation rather than generic techniques.
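A toy version of the pseudo-label update helps show what "update only the Mamba-specific parameters" means mechanically. The sketch below is not the paper's implementation: the linear classifier, the parameter names `ssm_weight` and `head_bias`, the learning rate, and the choice to take the pseudo-label from the unpermuted input are all assumptions made for illustration.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward(params, x):
    """Toy order-sensitive classifier: one weight per (class, position)."""
    return [
        sum(w * xj for w, xj in zip(row, x)) + b
        for row, b in zip(params["ssm_weight"], params["head_bias"])
    ]


def pseudo_label_step(params, view, pseudo_label, lr=0.1):
    """One cross-entropy gradient step toward pseudo_label; only ssm_weight moves."""
    probs = softmax(forward(params, view))
    # d(cross-entropy)/d(logit_c) = p_c - 1[c == pseudo_label]
    grad_logits = [p - (1.0 if c == pseudo_label else 0.0) for c, p in enumerate(probs)]
    updated = {
        "ssm_weight": [row[:] for row in params["ssm_weight"]],
        "head_bias": params["head_bias"][:],  # copied, never updated
    }
    for c, g in enumerate(grad_logits):
        for j, xj in enumerate(view):
            updated["ssm_weight"][c][j] -= lr * g * xj
    return updated


params = {"ssm_weight": [[0.5, -0.2, 0.1], [0.1, 0.3, -0.4]], "head_bias": [0.0, 0.0]}
x = [1.0, 2.0, 3.0]
reference_probs = softmax(forward(params, x))
pseudo_label = max(range(len(reference_probs)), key=reference_probs.__getitem__)

view = [x[i] for i in (2, 0, 1)]  # hypothetical alternative traversal of the same input
before = softmax(forward(params, view))[pseudo_label]
adapted = pseudo_label_step(params, view, pseudo_label)
after = softmax(forward(adapted, view))[pseudo_label]
print(pseudo_label, before, after)
```

The step raises the model's confidence in its own earlier prediction on the reordered view, which is exactly why the quality of that prediction matters: a wrong pseudo-label is reinforced just as readily as a correct one.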

What carries the argument

Uncertainty-guided SSM traverses: the generation of multiple input scan-order permutations that produce distinct causal views, whose predictions then guide selective updates to Mamba blocks followed by weight averaging.

If this is right

  • The method yields measurable robustness gains across seven standard benchmarks involving distribution shifts.
  • It outperforms prior test-time adaptation approaches that do not exploit SSM-specific structure.
  • Averaging weights from multiple traversals integrates adaptation signals without requiring additional labeled data.
  • Only Mamba-specific parameters need updating, keeping the adaptation lightweight.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same traversal-and-average pattern could be tested on other sequential vision backbones that admit multiple scan orders.
  • Focusing adaptation on architecture-specific blocks may reduce compute compared with updating the entire network at test time.
  • Combining the uncertainty signal with explicit confidence calibration could further reduce the risk of noisy pseudo-labels.
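One common way to turn an uncertainty signal into a pseudo-label filter is to threshold the entropy of the softmax output. The sketch below illustrates that generic technique only; this page does not say how TRUST computes or uses uncertainty, and the function names and the threshold (half the maximum entropy) are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_confident(batch_probs, max_entropy):
    """Return (sample index, pseudo-label) pairs whose entropy is within the threshold."""
    kept = []
    for i, probs in enumerate(batch_probs):
        if entropy(probs) <= max_entropy:
            label = max(range(len(probs)), key=probs.__getitem__)
            kept.append((i, label))
    return kept


batch = [
    [0.90, 0.05, 0.05],  # confident prediction
    [0.34, 0.33, 0.33],  # nearly uniform prediction
]
threshold = 0.5 * math.log(3)  # assumed: half the maximum entropy for 3 classes
print(select_confident(batch, threshold))
```

A filter like this removes uncertain samples but cannot detect predictions that are confident and wrong, so it complements rather than replaces a check of pseudo-label accuracy.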

Load-bearing premise

That the model's predictions on the generated traversal views are accurate enough to serve as pseudo-labels without introducing confirmation bias or harmful errors into the parameter updates.

What would settle it

A controlled experiment on one of the seven benchmarks in which the averaged adapted model shows no improvement or a clear drop in accuracy compared with the original unadapted VMamba under the same distribution shift.

Figures

Figures reproduced from arXiv: 2509.22813 by Ali Bahri, Christian Desrosiers, David Osowiechi, Gustavo Adolfo Vargas Hakim, Herve Lombaert, Ismail Ben Ayed, Mehrdad Noori, Moslem Yazdanpanah, Sahar Dastani, Samuel Barbeau.

Figure 1: An overview of the proposed method. Our network consists of three stages: offline, ...
Figure 2: Loss surface of model parameters. To further illustrate this point, ...
Figure 4: Performance comparison between standard augmentations and TRUST on CIFAR10-C dataset. [Adjacent plot residue: axis label "Number of Traversal Permutations".]
Figure 5: Effect of traversal permutation count on accuracy across three datasets. [Adjacent plot residue: x-axis "Iteration" (1, 2, 4, 6, 8); y-axis "Accuracy (%)"; series "TRUST".]
Figure 7: Accuracy comparison of different aggregation strategies on CIFAR10-C dataset. [Adjacent plot residue: x-axis "Traversal Permutation in Evaluation" (abcd, abdc, adcb, bacd, badc, dbca); y-axis "Accuracy (%)"; bar values 77.5, 75.3, 74.2, 74.9, 71.6, 75.7.]
Figure 9: GPU memory usage across traversals. Computational Overhead. We evaluate GPU memory usage as a function of the number of traversal permutations used during parallel adaptation. Since only the SS2D blocks are updated, we instantiate one SS2D block per traversal while sharing the rest of the network. Traversals are batched and routed to their corresponding SS2D blocks, and their outputs are then concatenate…
Figure 10: Detailed diagram of TRUST in Parallel mode.
Figure 11: Mean entropy of different traversal permutation across seven benchmarks.
Figure 12: Mean and standard deviation of the L2 norm for the bias parameters
Figure 13: Mean and standard deviation of the L2 norm for the weight parameters
Original abstract

State Space Models (SSMs) have emerged as efficient alternatives to Vision Transformers (ViTs), with VMamba standing out as a pioneering architecture designed for vision tasks. However, their generalization performance degrades significantly under distribution shifts. To address this limitation, we propose TRUST (Test-Time Refinement using Uncertainty-Guided SSM Traverses), a novel test-time adaptation (TTA) method that leverages diverse traversal permutations to generate multiple causal perspectives of the input image. Model predictions serve as pseudo-labels to guide updates of the Mamba-specific parameters, and the adapted weights are averaged to integrate the learned information across traversal scans. Altogether, TRUST is the first approach that explicitly leverages the unique architectural properties of SSMs for adaptation. Experiments on seven benchmarks show that TRUST consistently improves robustness and outperforms existing TTA methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes TRUST, a test-time adaptation (TTA) method for Vision State Space Models (SSMs) such as VMamba. It generates multiple causal perspectives of an input image via uncertainty-guided traversal permutations, uses the model's own predictions on these views as pseudo-labels to update Mamba-specific parameters, and averages the adapted weights across scans. The central claim is that this is the first approach to explicitly leverage SSM architectural properties for adaptation, with experiments on seven benchmarks demonstrating consistent robustness improvements and outperformance over existing TTA methods.

Significance. If the empirical results hold under rigorous validation, the work could be significant for improving generalization of efficient SSM-based vision models under distribution shift by exploiting their unique traversal and causal ordering properties rather than generic TTA techniques. This offers a potential efficiency advantage over ViT-centric methods. The paper's strength lies in its empirical focus on a timely architecture, but significance is tempered by the need for detailed experimental controls to confirm the gains are not due to confounding factors.

major comments (2)
  1. [Method and Experiments] The load-bearing assumption that predictions on uncertainty-guided traversal views can serve as reliable pseudo-labels for updating Mamba-specific parameters without net confirmation bias or harm is not adequately tested. Under distribution shift the base model is already degraded, and traversals only reorder the same features; if uncertainty guidance merely down-weights outliers rather than correcting systematic errors, weight averaging may propagate bias. This should be addressed with an ablation or analysis of pseudo-label accuracy (e.g., in the method or experiments section).
  2. [Abstract and §4 (Experiments)] The abstract and experimental claims report consistent improvements across seven benchmarks and outperformance of existing TTA methods, yet provide no details on experimental setup, baselines, statistical significance, error bars, or hyperparameter sensitivity. Without these, it is impossible to evaluate whether the reported gains are robust or reproducible.
minor comments (2)
  1. [§3] Clarify the exact definition of 'uncertainty-guided' traversal selection and how uncertainty is computed from the SSM outputs.
  2. [Discussion or Conclusion] Add a limitations or failure-case discussion, particularly regarding when the pseudo-label assumption breaks.
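The pseudo-label analysis requested in the first major comment could be as simple as the following sketch. It assumes ground-truth labels are available for evaluation only; the function names are hypothetical and the labels are made-up toy values, not results from the paper. The second function targets the specific worry that all traversals share the same systematic error.

```python
def pseudo_label_accuracy(pseudo_labels, true_labels):
    """Fraction of pseudo-labels that match the ground truth."""
    if len(pseudo_labels) != len(true_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("label lists must be non-empty")
    correct = sum(p == t for p, t in zip(pseudo_labels, true_labels))
    return correct / len(true_labels)


def shared_error_rate(labels_per_view, true_labels):
    """Fraction of samples on which every view predicts the same wrong class."""
    shared = 0
    for i, truth in enumerate(true_labels):
        votes = {labels[i] for labels in labels_per_view}
        if len(votes) == 1 and truth not in votes:
            shared += 1
    return shared / len(true_labels)


# Toy labels for illustration only.
truth = [0, 1, 2, 2]
view_1 = [0, 1, 1, 2]
view_2 = [0, 2, 1, 2]
print(pseudo_label_accuracy(view_1, truth))
print(pseudo_label_accuracy(view_2, truth))
print(shared_error_rate([view_1, view_2], truth))
```

A high shared error rate would indicate that reordering the scan does not diversify the mistakes, which is the case in which averaging adapted weights cannot be expected to cancel them.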

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which have helped us improve the clarity and rigor of our work. We address each major comment below.

  1. Referee: [Method and Experiments] The load-bearing assumption that predictions on uncertainty-guided traversal views can serve as reliable pseudo-labels for updating Mamba-specific parameters without net confirmation bias or harm is not adequately tested. Under distribution shift the base model is already degraded, and traversals only reorder the same features; if uncertainty guidance merely down-weights outliers rather than correcting systematic errors, weight averaging may propagate bias. This should be addressed with an ablation or analysis of pseudo-label accuracy (e.g., in the method or experiments section).

    Authors: We acknowledge the importance of verifying that the pseudo-labels generated from uncertainty-guided traversals are reliable and do not introduce confirmation bias. While the original manuscript included experiments showing overall performance gains, we agree that a direct analysis of pseudo-label accuracy would strengthen the claims. In the revised manuscript, we have added a new subsection in the experiments (Section 4.3) that reports the accuracy of pseudo-labels against ground truth on datasets where labels are available for evaluation purposes. Additionally, we include an ablation comparing performance with and without uncertainty guidance, demonstrating that it reduces the propagation of errors. We believe this addresses the concern that traversals merely reorder features without correcting systematic errors. revision: yes

  2. Referee: [Abstract and §4 (Experiments)] The abstract and experimental claims report consistent improvements across seven benchmarks and outperformance of existing TTA methods, yet provide no details on experimental setup, baselines, statistical significance, error bars, or hyperparameter sensitivity. Without these, it is impossible to evaluate whether the reported gains are robust or reproducible.

    Authors: We appreciate the referee's point regarding the need for more detailed reporting to ensure reproducibility. The original manuscript's Section 4 describes the experimental setup, including the seven benchmarks, and compares against existing TTA methods such as TENT, AdaBN, and others. However, to enhance transparency, we have revised the abstract to include a brief mention of the evaluation protocol and added error bars (standard deviation over 3 random seeds) to all reported results in Tables 1-3. We have also included a hyperparameter sensitivity analysis in the supplementary material and performed statistical significance testing using paired t-tests to confirm that improvements are significant (p < 0.05). These changes make the claims more robust and reproducible. revision: yes
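For readers unfamiliar with the test mentioned in the second response, here is a minimal sketch of a paired t statistic over per-seed accuracies. The accuracy values are hypothetical placeholders, not numbers from the paper, and the function name is an assumption. With three seeds there are two degrees of freedom, for which the two-sided 5% critical value of Student's t is about 4.303.

```python
import math
import statistics


def paired_t_statistic(treated, baseline):
    """t statistic for the mean of per-seed differences (treated - baseline)."""
    if len(treated) != len(baseline) or len(treated) < 2:
        raise ValueError("need two equally sized samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(treated, baseline)]
    mean_diff = statistics.mean(diffs)
    std_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    return mean_diff / (std_diff / math.sqrt(len(diffs)))


# Hypothetical per-seed accuracies (%), for illustration only.
adapted_acc = [71.0, 70.2, 71.5]
baseline_acc = [69.9, 69.4, 70.2]

t_stat = paired_t_statistic(adapted_acc, baseline_acc)
critical = 4.303  # two-sided 5% critical value for 2 degrees of freedom
print(t_stat, t_stat > critical)
```

With only three pairs the critical value is large, so a significant result requires differences that are both sizeable and consistent across seeds; reporting the per-seed values alongside the p-value makes that easier to judge.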

Circularity Check

0 steps flagged

No circularity: empirical TTA method rests on experiments


The paper presents TRUST as a test-time adaptation algorithm that generates traversal views of an input, uses model predictions as pseudo-labels to update Mamba-specific parameters, and averages the resulting weights. No equations or derivation steps are shown that reduce a claimed output to the method's own fitted inputs or self-referential definitions. The novelty statement that TRUST is the first to explicitly leverage SSM architectural properties is presented as a descriptive claim supported by the algorithm and benchmark results rather than by any self-citation chain or uniqueness theorem imported from prior author work. The central performance claims are tied directly to experimental outcomes on seven benchmarks, making the paper self-contained against external validation without load-bearing circular reductions.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The method rests on standard machine learning assumptions about pseudo-label quality in test-time adaptation and the utility of architectural-specific traversals in SSMs; no new physical entities or heavily fitted parameters are introduced in the abstract description.

axioms (2)
  • [domain assumption] Model predictions on augmented views can serve as sufficiently accurate pseudo-labels for parameter updates during test-time adaptation.
    Invoked when using predictions to guide updates of Mamba-specific parameters.
  • [domain assumption] Averaging weights from multiple traversal-based adaptations integrates complementary information without destructive interference.
    Central to the final step of combining adapted models.
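The averaging step named in the second axiom amounts to an elementwise mean over the adapted copies of the selected parameters. The sketch below assumes parameters are stored as flat lists in a dictionary and that every copy started from the same weights; the names `ssm` and `head` are hypothetical stand-ins for the adapted and shared parts of the network.

```python
def average_weights(param_sets, keys):
    """Average the listed parameters elementwise; copy the rest from the first set."""
    if not param_sets:
        raise ValueError("need at least one parameter set")
    averaged = {name: values[:] for name, values in param_sets[0].items()}
    n = len(param_sets)
    for key in keys:
        columns = zip(*(params[key] for params in param_sets))
        averaged[key] = [sum(column) / n for column in columns]
    return averaged


# Two hypothetical per-traversal copies that differ only in the adapted block.
adapted = [
    {"ssm": [1.0, 2.0], "head": [0.5]},
    {"ssm": [3.0, 4.0], "head": [0.5]},
]
merged = average_weights(adapted, keys=["ssm"])
print(merged)
```

Whether the mean of several adapted solutions is itself a good solution depends on how far the copies drift from their common starting point, which is what the "without destructive interference" assumption is asserting.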

pith-pipeline@v0.9.0 · 5707 in / 1462 out tokens · 37639 ms · 2026-05-18T12:51:50.648152+00:00 · methodology


