
arxiv: 2603.00574 · v2 · pith:YMZEST75new · submitted 2026-02-28 · 💻 cs.CV · cs.AI

Decoupling Stability and Plasticity for Multi-Modal Test-Time Adaptation

Pith reviewed 2026-05-15 17:54 UTC · model grok-4.3

classification 💻 cs.CV cs.AI
keywords multi-modal test-time adaptation · stability-plasticity · asymmetric adaptation · interdimensional redundancy · negative transfer · catastrophic forgetting · domain shift · adapter decoupling

The pith

Decoupling each modality adapter into stable and plastic parts, activated asymmetrically by feature redundancy, lets models adapt to new domains without negative transfer or forgetting.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that multi-modal test-time adaptation fails when unbiased modalities suffer negative transfer and biased ones undergo catastrophic forgetting. It diagnoses the biased modality by its higher interdimensional redundancy in the latent space, then applies an asymmetric strategy: plastic components update for the biased modality while stable components update with KL regularization for the unbiased one. A sympathetic reader cares because pretrained multi-modal models must handle evolving real-world distributions without full retraining. If correct, the method preserves general knowledge while gaining domain-specific flexibility.

Core claim

The central claim is that, in the unified latent space, the biased modality exhibits substantially higher interdimensional redundancy than the unbiased one, which allows it to be identified reliably. Identification is followed by an asymmetric adaptation strategy in which each modality-specific adapter is split into stable and plastic components: the plastic part is activated and updated for the biased modality, while the stable part is updated under KL regularization for the unbiased modality.
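The diagnosis step can be sketched in a few lines. This page does not give the paper's formula for the redundancy score, so the sketch below assumes one plausible instantiation: the mean absolute off-diagonal Pearson correlation across feature dimensions. The names `redundancy_score` and `diagnose_biased`, the pick-the-maximum rule, and the synthetic features are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def redundancy_score(feats: np.ndarray, eps: float = 1e-8) -> float:
    """Mean absolute off-diagonal Pearson correlation of a (batch, dim) array."""
    z = feats - feats.mean(axis=0, keepdims=True)      # center each dimension
    z = z / (z.std(axis=0, keepdims=True) + eps)       # scale to unit variance
    corr = (z.T @ z) / feats.shape[0]                  # (dim, dim) correlation matrix
    off_diag = corr[~np.eye(corr.shape[0], dtype=bool)]
    return float(np.abs(off_diag).mean())

def diagnose_biased(feats_by_modality: dict) -> str:
    """Return the modality with the highest score (hypothetical decision rule)."""
    scores = {name: redundancy_score(f) for name, f in feats_by_modality.items()}
    return max(scores, key=scores.get)

# Synthetic stand-ins for latent features; not data from the paper.
rng = np.random.default_rng(0)
video = rng.normal(size=(256, 16))                      # uncorrelated dimensions
shared = rng.normal(size=(256, 1)) * np.full(16, 0.75)  # one factor leaking into all dims
audio = rng.normal(size=(256, 16)) + shared             # correlated dimensions
print(diagnose_biased({"video": video, "audio": audio}))
```

The Figure 5 caption mentions a redundancy threshold δ, so the paper's actual rule likely compares a score against a threshold rather than taking a maximum; that detail is not recoverable from this page.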

What carries the argument

Decoupled stable and plastic components within each modality-specific adapter, selected asymmetrically according to the interdimensional redundancy metric.
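A minimal sketch of such a decoupled adapter, assuming a residual bottleneck design with a zero-initialized up-projection: the bottleneck shape, the ReLU, and the class and method names are hypothetical. Only the routing follows the abstract, where the biased modality uses and updates the plastic branch with the stable branch fixed, and the unbiased modality bypasses the plastic branch and updates the stable one.

```python
import numpy as np

class DecoupledAdapter:
    """Residual adapter with separate stable and plastic branches (illustrative)."""

    def __init__(self, dim: int, rank: int = 4, seed: int = 0):
        rng = np.random.default_rng(seed)

        def branch():
            # Zero-initialized up-projection: the adapter starts as an identity map.
            return {"down": rng.normal(scale=0.02, size=(dim, rank)),
                    "up": np.zeros((rank, dim))}

        self.params = {"stable": branch(), "plastic": branch()}

    def _apply(self, x: np.ndarray, name: str) -> np.ndarray:
        p = self.params[name]
        return np.maximum(x @ p["down"], 0.0) @ p["up"]

    def forward(self, x: np.ndarray, biased: bool) -> np.ndarray:
        out = x + self._apply(x, "stable")         # stable branch is always in the path
        if biased:
            out = out + self._apply(x, "plastic")  # plastic branch only for the biased modality
        return out

    def trainable(self, biased: bool) -> list:
        # Biased: update plastic, keep stable fixed. Unbiased: update stable only.
        return ["plastic"] if biased else ["stable"]

adapter = DecoupledAdapter(dim=8)
x = np.ones((2, 8))
print(adapter.trainable(biased=True), adapter.trainable(biased=False))
```

Keeping the routing decision in `forward` and the update decision in `trainable` makes the asymmetry explicit: the same diagnosis flag controls both which branch is active and which branch receives gradients.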

If this is right

  • The model adapts flexibly to new domains while preserving generalizable knowledge.
  • Negative transfer is avoided in the unbiased modality through KL regularization on stable components.
  • Catastrophic forgetting is avoided in the biased modality by updating only its plastic components.
  • Overall accuracy exceeds prior state-of-the-art methods across diverse multi-modal benchmarks.
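The objectives behind the stability and plasticity consequences listed above can be sketched as follows. The Figure 5 caption names loss coefficients λent and λkl, which suggests an entropy term and a KL term; the exact form, the KL direction, the choice of frozen source predictions as the reference, and the function names here are assumptions made for illustration.

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    z = logits - logits.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def entropy(p: np.ndarray, eps: float = 1e-12) -> float:
    """Mean Shannon entropy of a batch of probability vectors."""
    return float(-(p * np.log(p + eps)).sum(axis=-1).mean())

def kl_div(p_ref: np.ndarray, p_new: np.ndarray, eps: float = 1e-12) -> float:
    """Mean KL(p_ref || p_new) over the batch."""
    return float((p_ref * (np.log(p_ref + eps) - np.log(p_new + eps))).sum(axis=-1).mean())

def adaptation_loss(logits_frozen, logits_adapted, biased, lam_ent=1.0, lam_kl=1.0):
    """Entropy term for every modality; KL anchor only on the unbiased (stable) path."""
    p_new = softmax(np.asarray(logits_adapted, dtype=float))
    loss = lam_ent * entropy(p_new)
    if not biased:
        p_ref = softmax(np.asarray(logits_frozen, dtype=float))
        loss += lam_kl * kl_div(p_ref, p_new)  # penalize drift from the frozen model
    return loss

frozen = np.array([[2.0, 0.0, 0.0]])
drifted = np.array([[0.0, 2.0, 0.0]])
print(adaptation_loss(frozen, drifted, biased=False) > adaptation_loss(frozen, drifted, biased=True))
```

In this sketch the KL term is what discourages the unbiased modality from drifting away from its pretrained predictions, while the biased modality is left free to move under the entropy term alone.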

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The redundancy diagnostic might transfer to single-modal test-time adaptation if a comparable feature correlation measure can be defined.
  • The stable-plastic split could be tested on additional modality pairs such as audio-visual data.
  • KL regularization on stable components might combine with other regularization techniques to further strengthen preservation of pretraining knowledge.

Load-bearing premise

Higher interdimensional redundancy reliably identifies the biased modality, and the stable-plastic split combined with KL regularization prevents negative transfer without creating new failure modes.

What would settle it

A test set in which the redundancy measure mislabels the biased modality and the full DASP (Decoupling Adaptation for Stability and Plasticity) procedure still produces either forgetting in one modality or negative transfer in the other.

Figures

Figures reproduced from arXiv: 2603.00574 by Tao Jin, Yongbo He, Zirun Guo.

Figure 1: Limitations in Multi-Modal TTA. We evaluate changes in source domain performance during continual adaptation, measured as Δ = Acc_original − Acc_adapted, for state-of-the-art methods (READ and TSA). Results indicate ongoing degradation in both multi-modal and uni-modal contexts. Performance drops in the biased modality are referred to as catastrophic forgetting, while drops in the unbiased modality are consi…
Figure 2: Entropy and confidence statistics on VGGSound-C with corrupted audio modality. Since audio serves as the dominant modality in this dataset, it continues to display lower entropy and greater confidence, even in the presence of distribution shifts.
  • We perform comprehensive experiments on Kinetics50-C and VGGSound-C, and DASP exhibits enhanced adaptivity and stability in comparison to existing methods.…
Figure 3: Redundancy statistics on Kinetics50-C and VGGSound-C. The corrupted modality demonstrates increased redundancy in feature embeddings. Furthermore, the results underscore a significant correlation between redundancy and accuracy.
… vulnerable to distribution shifts across different modalities. Existing TTA methods, primarily developed for uni-modal tasks, inadequately address these complex shifts. In this conte…
Figure 4: The overview of our proposed DASP features a diagnose-then-mitigate framework. It begins by diagnosing the biased modality
Figure 5: Sensitivity Analysis of Hyper-parameters: Batch Size (B), Redundancy Threshold (δ) and Loss Coefficients (λ_ent, λ_kl).
… without asymmetric adaptation, and (iv) with asymmetric adaptation configured in the opposite manner. Removing either adapter resulted in decreased adaptive performance, indicating that the stable adapter is essential for extracting domain-invariant features and improving discrimination, w…
Figure 6: Accuracy vs. Throughput and Memory Usage. Compared to baselines, our method demonstrates superior performance with higher efficiency (observing comparable or lower computational cost and higher inference speed) on Kinetics50-C.
… modality observed in other methods. Meanwhile, the plastic adapter provides necessary plasticity and domain-specific knowledge for effective target domain adaptation. Lastly, we a…
Figure 7: The illustration of uni-modal continual corruption and
Figure 8: The results demonstrate a clear, positive correlation
Figure 9: Redundancy vs. Batch Size. We investigate the correlation between redundancy score and batch size on VGGSound-C with audio corruptions.
… between increasing corruption severity and the redundancy score R(Z). This confirms our theoretical hypothesis: as inputs deviate further from the source manifold (higher σ_α^2), the representation degradation exacerbates, which is precisely captured by the escalating r…
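The link between perturbation strength and redundancy described in the caption text above can be illustrated with a short worked derivation. This is a sketch under assumptions not confirmed by this page: clean features are isotropic with variance \sigma_z^2, v is a fixed unit direction, and \alpha is a zero-mean scalar with variance \sigma_\alpha^2 that is independent of z. The paper's own assumptions are truncated in this rendering and may differ.

```latex
\begin{align}
\tilde{z} &= z + \alpha v, \qquad
  \mathbb{E}[z] = 0,\quad \operatorname{Cov}(z) = \sigma_z^2 I,\quad
  \mathbb{E}[\alpha] = 0,\quad \operatorname{Var}(\alpha) = \sigma_\alpha^2 \\
\operatorname{Cov}(\tilde{z}) &= \sigma_z^2 I + \sigma_\alpha^2\, v v^\top \\
\rho_{ij} &= \frac{\sigma_\alpha^2\, v_i v_j}
  {\sqrt{(\sigma_z^2 + \sigma_\alpha^2 v_i^2)(\sigma_z^2 + \sigma_\alpha^2 v_j^2)}},
  \qquad i \neq j
\end{align}
```

Under these assumptions every off-diagonal correlation is zero when \sigma_\alpha^2 = 0, and |\rho_{ij}| increases monotonically with \sigma_\alpha^2 whenever v_i v_j \neq 0, which is the qualitative behavior the caption attributes to the redundancy score R(Z).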
Original abstract

Adapting pretrained multi-modal models to evolving test-time distributions, known as multi-modal test-time adaptation, presents a significant challenge. Existing methods frequently encounter negative transfer in the unbiased modality and catastrophic forgetting in the biased modality. To address these challenges, we propose Decoupling Adaptation for Stability and Plasticity (DASP), a novel diagnose-then-mitigate framework. Our analysis reveals a critical discrepancy within the unified latent space: the biased modality exhibits substantially higher interdimensional redundancy (i.e., strong correlations across feature dimensions) compared to the unbiased modality. Leveraging this insight, DASP identifies the biased modality and implements an asymmetric adaptation strategy. This strategy employs a decoupled architecture where each modality-specific adapter is divided into stable and plastic components. The asymmetric mechanism works as follows: for the biased modality, which requires plasticity, the plastic component is activated and updated to capture domain-specific information, while the stable component remains fixed. Conversely, for the unbiased modality, which requires stability, the plastic component is bypassed, and the stable component is updated using KL regularization to prevent negative transfer. This asymmetric design enables the model to adapt flexibly to new domains while preserving generalizable knowledge. Comprehensive evaluations on diverse multi-modal benchmarks demonstrate that DASP significantly outperforms state-of-the-art methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes Decoupling Adaptation for Stability and Plasticity (DASP), a diagnose-then-mitigate framework for multi-modal test-time adaptation. It observes that the biased modality exhibits higher interdimensional redundancy (feature-dimension correlations) in the unified latent space, uses this to identify the biased modality, and applies an asymmetric strategy: modality-specific adapters are split into stable and plastic components, with plasticity activated only for the biased modality while the unbiased modality uses KL-regularized updates on the stable component to avoid negative transfer and forgetting.

Significance. If the empirical claims hold, the work could meaningfully advance multi-modal TTA by providing a practical way to decouple stability and plasticity based on latent-space diagnostics, addressing negative transfer and catastrophic forgetting without requiring parameter-free derivations or machine-checked proofs.

major comments (2)
  1. [§3] §3 (method description): The central claim that higher interdimensional redundancy reliably diagnoses the biased modality is load-bearing for the entire asymmetric split, yet no theoretical bound, invariance proof, or cross-shift validation (e.g., across correlation-inducing vs. other shift types) is supplied; if the metric is an artifact of specific shift statistics, the diagnosis misroutes adapters and reintroduces the very negative transfer the method aims to prevent.
  2. [Abstract] Abstract and §4 (experiments): While the abstract asserts 'comprehensive evaluations on diverse multi-modal benchmarks' that 'significantly outperform state-of-the-art methods,' the manuscript supplies no quantitative tables, ablation results on the redundancy metric, or failure-mode analysis for the KL-regularized stable path, leaving the effectiveness of the asymmetric design unverified.
minor comments (1)
  1. [§3.1] Notation for 'interdimensional redundancy' is introduced without an explicit equation or pseudocode for its computation (e.g., correlation matrix norm or similar), which would aid reproducibility.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the detailed and constructive feedback. We address each major comment point by point below, indicating where revisions will be made to strengthen the manuscript.

point-by-point responses
  1. Referee: [§3] §3 (method description): The central claim that higher interdimensional redundancy reliably diagnoses the biased modality is load-bearing for the entire asymmetric split, yet no theoretical bound, invariance proof, or cross-shift validation (e.g., across correlation-inducing vs. other shift types) is supplied; if the metric is an artifact of specific shift statistics, the diagnosis misroutes adapters and reintroduces the very negative transfer the method aims to prevent.

    Authors: We agree that the interdimensional redundancy metric is central to the diagnosis step and that stronger validation is warranted. The manuscript presents consistent empirical observations of elevated redundancy in biased modalities across the tested benchmarks. In the revised version we will add dedicated cross-shift experiments that compare correlation-inducing shifts against other shift types (e.g., additive noise, style transfer) to test the metric’s robustness. We will also include a sensitivity analysis and explicit discussion of potential failure cases. A formal theoretical bound or invariance proof is not currently available, as the diagnostic is derived from observed latent-space statistics rather than from a closed-form derivation. revision: partial

  2. Referee: [Abstract] Abstract and §4 (experiments): While the abstract asserts 'comprehensive evaluations on diverse multi-modal benchmarks' that 'significantly outperform state-of-the-art methods,' the manuscript supplies no quantitative tables, ablation results on the redundancy metric, or failure-mode analysis for the KL-regularized stable path, leaving the effectiveness of the asymmetric design unverified.

    Authors: We acknowledge that the current abstract is high-level and that the experimental section would benefit from additional quantitative detail. The full manuscript contains comparative results in §4, but we will revise the abstract to incorporate concrete performance deltas where space allows. We will also add (i) an ablation study isolating the redundancy metric and (ii) a failure-mode analysis of the KL-regularized stable path, either in the main text or as an expanded supplementary section. These additions will make the empirical support for the asymmetric design explicit. revision: yes
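The cross-shift check raised in the first exchange above can be illustrated on synthetic data. The sketch below is not one of the paper's experiments: it assumes Gaussian stand-in features, a mean-absolute-correlation redundancy score, and two toy shifts (isotropic additive noise and a rank-1 perturbation along a hypothetical direction `v`). It only shows why the referee's question matters for a correlation-based diagnostic.

```python
import numpy as np

def redundancy(feats: np.ndarray) -> float:
    """Mean absolute off-diagonal correlation across feature dimensions."""
    corr = np.corrcoef(feats, rowvar=False)        # (dim, dim), columns are variables
    mask = ~np.eye(corr.shape[0], dtype=bool)
    return float(np.abs(corr[mask]).mean())

rng = np.random.default_rng(0)
n, d = 512, 16
clean = rng.normal(size=(n, d))

# Shift A: independent noise per dimension (does not induce correlations).
isotropic = clean + rng.normal(scale=1.0, size=(n, d))

# Shift B: one shared random factor along a fixed unit direction (induces correlations).
v = np.ones(d) / np.sqrt(d)
rank_one = clean + rng.normal(scale=3.0, size=(n, 1)) * v

scores = {"clean": redundancy(clean),
          "isotropic": redundancy(isotropic),
          "rank_one": redundancy(rank_one)}
for name, value in scores.items():
    print(f"{name:>9}: {value:.3f}")
```

In this toy setting only the rank-1 shift raises the score noticeably, so a shift that leaves dimensions uncorrelated could go undetected by this particular score. Whether real corruptions on Kinetics50-C or VGGSound-C behave like either toy shift is exactly what the promised cross-shift experiments would need to establish.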

standing simulated objections not resolved
  • A theoretical bound or invariance proof establishing that interdimensional redundancy is a reliable, shift-type-invariant diagnostic for the biased modality.

Circularity Check

0 steps flagged

No significant circularity; architectural choice grounded in observed discrepancy

full rationale

The paper describes DASP as a diagnose-then-mitigate framework whose core step is an empirical observation of higher interdimensional redundancy in the biased modality, followed by an asymmetric stable/plastic adapter split. No equations, fitted parameters, or predictions are presented that reduce the claimed performance to a definition or input by construction. No self-citations are invoked as load-bearing uniqueness theorems, and the asymmetric mechanism is introduced as a novel design choice rather than derived from prior self-work. The derivation chain remains self-contained against external benchmarks and does not exhibit any of the enumerated circular patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The framework rests on the unproven domain assumption that interdimensional redundancy differences between modalities are stable and diagnostic enough to drive the asymmetric update rule without side effects.

axioms (1)
  • domain assumption: The biased modality exhibits substantially higher interdimensional redundancy compared to the unbiased modality.
    This discrepancy is invoked to identify which modality needs plasticity versus stability.

pith-pipeline@v0.9.0 · 5521 in / 1234 out tokens · 55534 ms · 2026-05-15T17:54:39.631169+00:00 · methodology


Reference graph

Works this paper leans on

40 extracted references · 40 canonical work pages

  1. [1] Tadas Baltrušaitis, Chaitanya Ahuja, and Louis-Philippe Morency. Multimodal machine learning: A survey and taxonomy. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019.

  2. [2] Hangbo Bao, Wenhui Wang, Li Dong, Qiang Liu, Owais Khan Mohammed, Kriti Aggarwal, Subhojit Som, Songhao Piao, and Furu Wei. Vlmo: Unified vision-language pre-training with mixture-of-modality-experts. In Advances in Neural Information Processing Systems, 2022.

  3. [3] Honglie Chen, Weidi Xie, Andrea Vedaldi, and Andrew Zisserman. Vggsound: A large-scale audio-visual dataset. In International Conference on Acoustics, Speech and Signal Processing, 2020.

  4. [4] Mingcai Chen, Baoming Zhang, Zongbo Han, Yuntao Du, Wenyu Jiang, Yanmeng Wang, Shuai Feng, and Bingkun Bao. Test-time selective adaptation for uni-modal distribution shift in multi-modal data. In International Conference on Machine Learning, 2025.

  5. [5] Qi Dou, Daniel Coelho de Castro, Konstantinos Kamnitsas, and Ben Glocker. Domain generalization via model-agnostic learning of semantic features. In Advances in Neural Information Processing Systems, 2019.

  6. [6] Yuan Gong, Andrew Rouditchenko, Alexander H. Liu, David Harwath, Leonid Karlinsky, Hilde Kuehne, and James R. Glass. Contrastive audio-visual masked autoencoder. In International Conference on Learning Representations, 2023.

  7. [7] Zirun Guo and Tao Jin. Smoothing the shift: Towards stable test-time adaptation under complex multimodal noises. In International Conference on Learning Representations, 2025.

  8. [8] Zirun Guo, Tao Jin, Jingyuan Chen, and Zhou Zhao. Classifier-guided gradient modulation for enhanced multi-modal learning. In Advances in Neural Information Processing Systems, 2024.

  9. [9] Dan Hendrycks and Thomas Dietterich. Benchmarking neural network robustness to common corruptions and perturbations. International Conference on Learning Representations, 2019.

  10. [10] Dan Hendrycks, Norman Mu, Ekin D Cubuk, Barret Zoph, Justin Gilmer, and Balaji Lakshminarayanan. Augmix: A simple data processing method to improve robustness and uncertainty. In International Conference on Learning Representations, 2020.

  11. [11] Will Kay, Joao Carreira, Karen Simonyan, Brian Zhang, Chloe Hillier, Sudheendra Vijayanarasimhan, Fabio Viola, Tim Green, Trevor Back, Paul Natsev, Mustafa Suleyman, and Andrew Zisserman. The kinetics human action video dataset, 2017.

  12. [12] Alexander Kolesnikov, Alexey Dosovitskiy, Dirk Weissenborn, Georg Heigold, Jakob Uszkoreit, Lucas Beyer, Matthias Minderer, Mostafa Dehghani, Neil Houlsby, Sylvain Gelly, Thomas Unterthiner, and Xiaohua Zhai. An image is worth 16x16 words: Transformers for image recognition at scale. In International Conference on Learning Representations, 2021.

  13. [13] Daeun Lee, Jaehong Yoon, and Sung Ju Hwang. Becotta: Input-dependent online blending of experts for continual test-time adaptation. In International Conference on Machine Learning, 2024.

  14. [14] Jiacheng Li and Songhe Feng. Bridging modalities via progressive re-alignment for multimodal test-time adaptation. In Annual AAAI Conference on Artificial Intelligence, 2026.

  15. [15] Junnan Li, Dongxu Li, Caiming Xiong, and Steven Hoi. BLIP: Bootstrapping language-image pre-training for unified vision-language understanding and generation. In International Conference on Machine Learning, 2022.

  16. [16] Jian Liang, Dapeng Hu, and Jiashi Feng. Do we really need to access the source data? source hypothesis transfer for unsupervised domain adaptation. In International Conference on Machine Learning, 2020.

  17. [17] Hyesu Lim, Byeonggeun Kim, Jaegul Choo, and Sungha Choi. Ttn: A domain-shift aware batch normalization in test-time adaptation. In International Conference on Learning Representations, 2023.

  18. [18] Zhiqiu Lin, Samuel Yu, Zhiyi Kuang, Deepak Pathak, and Deva Ramanan. Multimodality helps unimodality: Cross-modal few-shot learning with multimodal models. In Computer Vision and Pattern Recognition, 2023.

  19. [19] Jiaming Liu, Senqiao Yang, Peidong Jia, Ming Lu, Yandong Guo, Wei Xue, and Shanghang Zhang. Vida: Homeostatic visual domain adapter for continual test time adaptation. In International Conference on Learning Representations, 2024.

  20. [20] Shuaicheng Niu, Jiaxiang Wu, Yifan Zhang, Yaofo Chen, Shijian Zheng, Peilin Zhao, and Mingkui Tan. Efficient test-time model adaptation without forgetting. In International Conference on Machine Learning, 2022.

  21. [21] Shuaicheng Niu, Jiaxiang Wu, Yifan Zhang, Zhiquan Wen, Yaofo Chen, Peilin Zhao, and Mingkui Tan. Towards stable test-time adaptation in dynamic wild world. In International Conference on Learning Representations, 2023.

  22. [22] Shuaicheng Niu, Chunyan Miao, Guohao Chen, Pengcheng Wu, and Peilin Zhao. Test-time model adaptation with only forward passes. In International Conference on Machine Learning, 2024.

  23. [23] Shuaicheng Niu, Guohao Chen, Deyu Chen, Yifan Zhang, Jiaxiang Wu, Zhiquan Wen, Yaofo Chen, Peilin Zhao, Chunyan Miao, and Mingkui Tan. Adapt in the wild: Test-time entropy minimization with sharpness and feature regularization. arXiv preprint arXiv:2509.04977, 2025.

  24. [24] A Emin Orhan. Robustness properties of facebook’s resnext wsl models. arXiv preprint arXiv:1907.07640, 2019.

  25. [25] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning, 2021.

  26. [26] Shiv Shankar, Vihari Piratla, Soumen Chakrabarti, Siddhartha Chaudhuri, Preethi Jyothi, and Sunita Sarawagi. Generalizing across domains via cross-gradient training. In International Conference on Learning Representations,

  27. [27] Junha Song, Jungsoo Lee, In So Kweon, and Sungha Choi. Ecotta: Memory-efficient continual test-time adaptation via self-distilled regularization. In Computer Vision and Pattern Recognition, 2023.

  28. [28] Dequan Wang, Evan Shelhamer, Shaoteng Liu, Bruno Olshausen, and Trevor Darrell. Tent: Fully test-time adaptation by entropy minimization. In International Conference on Learning Representations, 2021.

  29. [29] Guowei Wang, Fan Lyu, and Changxing Ding. Partition-then-adapt: Combating prediction bias for reliable multi-modal test-time adaptation. In Advances in Neural Information Processing Systems, 2025.

  30. [30] Qin Wang, Olga Fink, Luc Van Gool, and Dengxin Dai. Continual test-time domain adaptation. In Computer Vision and Pattern Recognition, 2022.

  31. [31] Wenhui Wang, Hangbo Bao, Li Dong, Johan Bjorck, Zhiliang Peng, Qiang Liu, Kriti Aggarwal, Owais Khan Mohammed, Saksham Singhal, Subhojit Som, and Furu Wei. Image as a foreign language: Beit pretraining for vision and vision-language tasks. In Computer Vision and Pattern Recognition, 2023.

  32. [32] Mouxing Yang, Yunfan Li, Changqing Zhang, Peng Hu, and Xi Peng. Test-time adaption against multi-modal reliability bias. In International Conference on Learning Representations, 2024.

  33. [33] Huaxiu Yao, Yu Wang, Sai Li, Linjun Zhang, Weixin Liang, James Zou, and Chelsea Finn. Improving out-of-distribution robustness via selective augmentation. In International Conference on Machine Learning, 2022.

  34. [34] Longhui Yuan, Binhui Xie, and Shuang Li. Robust test-time adaptation in dynamic scenarios. In Computer Vision and Pattern Recognition, 2023.

  35. [35] Marvin Zhang, Sergey Levine, and Chelsea Finn. Memo: Test time robustness via adaptation and augmentation. In Advances in Neural Information Processing Systems, 2022.

  36. [36] Yufei Zhang, Yicheng Xu, Hongxin Wei, Zhiping Lin, Xiaofeng Zou, Cen Chen, and Huiping Zhuang. Analytic continual test-time adaptation for multi-modality corruption. In ACM International Conference on Multimedia, pages 1929–1937, 2025.

  37. [37] Yusheng Zhao, Junyu Luo, Xiao Luo, Jinsheng Huang, Jingyang Yuan, Zhiping Xiao, and Ming Zhang. Attention bootstrapping for multi-modal test-time adaptation. In Annual AAAI Conference on Artificial Intelligence, 2025.

Supplementary Material. This appendix contains supplementary ...

  38. [38] More Experimental Details. 6.1. Benchmarks. We construct two benchmarks based on Kinetics [11] and VGGSound [3], to evaluate the performance of state-of-the-art methods under multi-modal domain shifts during test-time adaptation. We introduce three experimental setups: uni-modal episodic corruption, uni-modal continual corruption, and interleaved modali...

  39. [39] Further Analysis of the Redundancy Score. Theoretical Analysis. The distribution shift is modeled as a low-rank perturbation in the latent space. For a perturbed sample z̃ ∈ ℝ^D, we consider the dominant rank-1 component: z̃ = z + αv. To formalize our analysis, we establish the following assumptions: 1. The dimensions of z are centered and uncorrelated, i.e., E[z] =...

  40. [40] Extended Comparative Experiments. Main experiments. We report additional results for the main experiments that were not included in the main text. Table 8. Episodic Adaptation. Comparison with SOTA methods on VGGSound-C with video corruptions (severity level 5) regarding Accuracy (%, ↑). Column groups: Noise, Blur, Weather, Digital. Columns: Method, Gauss., Shot, Impul., Defoc., Glass, Mot., ...