Towards Stable Test-Time Adaptation in Dynamic Wild World
Test-time adaptation (TTA) has been shown to be effective at tackling distribution shifts between training and testing data by adapting a given model on test samples. However, the online model updating of TTA may be unstable, and this is often a key obstacle preventing existing TTA methods from being deployed in the real world. Specifically, TTA may fail to improve or even harm model performance when test data have: 1) mixed distribution shifts, 2) small batch sizes, and 3) online imbalanced label distribution shifts, all of which are quite common in practice. In this paper, we investigate the causes of this instability and find that the batch norm layer is a crucial factor hindering TTA stability. Conversely, TTA can perform more stably with batch-agnostic norm layers, i.e., group or layer norm. However, we observe that TTA with group and layer norms does not always succeed and still suffers from many failure cases. By digging into the failure cases, we find that certain noisy test samples with large gradients may disturb the model adaptation and result in collapsed trivial solutions, i.e., assigning the same class label to all samples. To address this collapse issue, we propose a sharpness-aware and reliable entropy minimization method, called SAR, for further stabilizing TTA from two aspects: 1) remove some of the noisy samples with large gradients, 2) encourage model weights to go to a flat minimum so that the model is robust to the remaining noisy samples. Promising results demonstrate that SAR performs more stably than prior methods and is computationally efficient under the above wild test scenarios.
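The two aspects named in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation. It assumes a toy linear softmax classifier, uses prediction entropy as a stand-in for "large gradient" when deciding which samples to drop, and uses the standard two-step sharpness-aware update (perturb the weights along the normalized gradient, then descend with the gradient taken at the perturbed point). The function names, the threshold `e0`, the radius `rho`, and the learning rate `lr` are all hypothetical.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def entropy(p):
    return -sum(q * math.log(q) for q in p if q > 0.0)

def logits(W, x):
    # Toy linear model (assumption): one weight row per class.
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def mean_entropy(W, batch):
    return sum(entropy(softmax(logits(W, x))) for x in batch) / len(batch)

def entropy_grad(W, batch):
    """Gradient of the mean prediction entropy with respect to W."""
    K, D = len(W), len(W[0])
    g = [[0.0] * D for _ in range(K)]
    for x in batch:
        p = softmax(logits(W, x))
        h = entropy(p)
        for j in range(K):
            # dH/dz_j = -p_j * (log p_j + H) for softmax outputs p.
            dz = -p[j] * (math.log(p[j]) + h) if p[j] > 0.0 else 0.0
            for k in range(D):
                g[j][k] += dz * x[k] / len(batch)
    return g

def sar_like_step(W, batch, e0, rho=0.05, lr=0.1):
    """One filtered, sharpness-aware entropy-minimization step (sketch)."""
    # Aspect 1 (assumed proxy): keep only low-entropy, "reliable" samples.
    reliable = [x for x in batch if entropy(softmax(logits(W, x))) < e0]
    if not reliable:
        return W, reliable
    g = entropy_grad(W, reliable)
    norm = math.sqrt(sum(v * v for row in g for v in row))
    if norm == 0.0:
        return W, reliable
    # Aspect 2: move to the worst-case point within radius rho ...
    W_adv = [[w + rho * v / norm for w, v in zip(rw, rg)]
             for rw, rg in zip(W, g)]
    # ... and update the original weights with the gradient found there.
    g_adv = entropy_grad(W_adv, reliable)
    W_new = [[w - lr * v for w, v in zip(rw, rg)]
             for rw, rg in zip(W, g_adv)]
    return W_new, reliable

W0 = [[1.0, 0.0], [0.0, 1.0]]
batch = [[3.0, 0.0], [0.1, 0.1], [0.0, 2.0]]
W1, kept = sar_like_step(W0, batch, e0=0.6 * math.log(2))
```

With these toy inputs, the second sample yields a uniform prediction, so its entropy exceeds the threshold and it is excluded from the update; the remaining two samples drive the weight change. A real TTA setup would typically restrict updates to the affine parameters of the normalization layers, which this sketch does not model.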
Forward citations
Cited by 20 Pith papers
-
Discriminator-Guided Adaptive Diffusion for Source-Free Test-Time Adaptation under Image Corruptions
Discriminator-guided adaptive diffusion enables source-free test-time adaptation to 15 image corruption types by dynamically denoising inputs to align with the source domain while preserving discriminative features.
-
IMSE: Intrinsic Mixture of Spectral Experts Fine-tuning for Test-Time Adaptation
IMSE adapts Vision Transformers for test-time and continual test-time adaptation by tuning only singular values from SVD decompositions and using expert diversity plus domain retrieval, reaching SOTA with far fewer tr...
-
Efficient Test-Time Optimization for Depth Completion via Low-Rank Decoder Adaptation
Low-rank decoder adaptation enables efficient test-time optimization for zero-shot depth completion by updating only the subspace containing depth-relevant information.
-
Neural Collapse in Test-Time Adaptation
Sample-wise neural collapse reveals that feature-classifier misalignment drives TTA degradation under shifts, which NCTTA corrects via hybrid geometric-predictive targets.
-
Test-Time Distillation for Continual Model Adaptation
CoDiRe blends VLM and target model predictions via MSP-based weighting and Optimal Transport rectification to enable stable continual test-time adaptation, outperforming CoTTA by 10.55% on ImageNet-C at 48% of the com...
-
Contrastive Residual Energy Test-time Adaptation
CreTTA reformulates test-time adaptation of marginal distributions as residual energy learning, producing a contrastive objective that cancels the partition function and uses relative energy differences for adaptive g...
-
EmbodiTTA: Resource-Efficient Test-Time Adaptation for Embodied Visual Systems
OD-TTA enables resource-efficient test-time adaptation on edge devices by triggering updates only on detected domain shifts, achieving comparable accuracy with lower energy and computation costs for embodied visual systems.
-
Dual Distribution Estimation for Zero-shot Noisy Test-Time Adaptation with VLMs
DDE models class-wise positive feature Gaussians and negative label distributions to boost ID accuracy and OOD detection in zero-shot noisy TTA, reporting 3.70% harmonic mean gain and 6.20% FPR95 drop on ImageNet.
-
MAMVI: 3D Test-Time Adaptation via Masked Multi-View Point Clouds
MAMVI performs unified single-step TTA on masked multi-view point clouds with hybrid masking and confidence-adaptive learning rates, reporting SOTA on ShapeNet-C and ScanObjectNN-C plus 4.9-8.9x speedup.
-
DOME: Learning Transferable Domain Variables from Sparse Supervision for Test-Time Adaptation
DOME learns sample-specific domain variables from sparse supervision via vision-language models and a sparse domain bank to improve test-time adaptation performance.
-
Sample-wise Targeted Adversarial Attacks on Test-time Adaptation
Proposes meta-learning attack with priority-aware gradient alignment for sample-wise targeted attacks on TTA that maintain label distribution consistency with no-attack baseline.
-
GoTTA be Diverse: Rethinking Memory Policies for Test-Time Adaptation
Diversity-aware memory policies improve test-time adaptation performance most under constrained memory budgets and challenging non-i.i.d. streams.
-
Majorization-Guided Test-Time Adaptation for Vision-Language Models under Modality-Specific Shift
MG-MTTA improves VLM accuracy under modality-specific shifts by replacing pure entropy minimization with majorization-guided adaptation that incorporates a reliability-aware gate prior.
-
Pretrain-then-Adapt: Uncertainty-Aware Test-Time Adaptation for Text-based Person Search
UATTA adapts pre-trained text-image models at test time without labels by using disagreement in bidirectional retrieval rankings to estimate and mitigate uncertainty for improved person search.
-
AdaJEPA: An Adaptive Latent World Model
AdaJEPA performs closed-loop test-time adaptation of latent world models during MPC by executing an action chunk, observing the transition, and taking one gradient step on the model before replanning, yielding higher ...
-
Distill Once, Adapt Life-Long: Exploring Dataset Distillation for Continual Test-Time Adaptation
DO-ALL applies dataset distillation to generate synthetic source anchors that stabilize continual test-time adaptation under evolving domains without storing original source data.
-
Distill Once, Adapt Life-Long: Exploring Dataset Distillation for Continual Test-Time Adaptation
DO-ALL uses dataset distillation to create synthetic source anchors that enable stable long-term continual test-time adaptation without storing original source data.
-
SkySeg: Collaborative Onboard Semantic Segmentation with Heterogeneous UAVs in the Wild
SkySeg is a heterogeneous multi-UAV framework that fuses low- and high-definition images and uses cross-device test-time adaptation to enable real-time onboard semantic segmentation, reporting 3.6x faster inference an...
-
FlowDec: Temporal Conditional Flow Decorruptor for Robust Continuous Vision-Language Navigation
FlowDec is a novel image restoration framework using hybrid temporal conditioning and action-centroid filtering that claims to outperform prior decorruption methods on navigation accuracy and latency in VLN-CE.
-
Learning Inference Concurrency in DynamicGate MLP Structural and Mathematical Justification
DynamicGate MLP enables concurrent learning and inference by separating gating from representation parameters, so that even asynchronous updates produce outputs equivalent to a valid fixed model snapshot.