EmbodiTTA: Resource-Efficient Test-Time Adaptation for Embodied Visual Systems

Dong Ma; Xiao Ma; Young D. Kwon

arxiv: 2505.00986 · v2 · submitted 2025-05-02 · 💻 cs.LG · cs.CV

EmbodiTTA: Resource-Efficient Test-Time Adaptation for Embodied Visual Systems

Xiao Ma , Young D. Kwon , Dong Ma This is my paper

Pith reviewed 2026-05-22 16:30 UTC · model grok-4.3

classification 💻 cs.LG cs.CV

keywords test-time adaptationcontinual adaptationdomain shift detectionedge computingembodied AIresource efficiencybatch normalization

0 comments

The pith

OD-TTA adapts models on edge devices only when domain shifts are detected, cutting energy use while matching full adaptation accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces on-demand test-time adaptation (OD-TTA) as a way to make continual adaptation practical for resource-constrained embodied systems like robots. Rather than updating the model on every data batch, it uses a lightweight detector to trigger adaptation solely for significant domain changes. Additional components select the right source model and handle batch normalization updates efficiently even with small batches. Experiments demonstrate that this approach delivers comparable or superior performance to standard methods but with far lower computation and energy costs. This shift could enable real-time model improvement on devices where constant adaptation was previously too expensive.

Core claim

OD-TTA is an on-demand TTA framework that activates adaptation only upon detecting a significant domain shift, using a lightweight detection mechanism, a source domain selection module, and a decoupled Batch Normalization update scheme to achieve accurate adaptation with reduced memory and energy overhead on edge devices.

What carries the argument

The lightweight domain shift detection mechanism that decides when to trigger adaptation, combined with source selection and decoupled BN updates.

If this is right

Adaptation becomes feasible on devices with limited memory and battery by avoiding unnecessary updates.
Accuracy remains high or improves because adaptation is targeted rather than constant.
Small batch sizes work for adaptation thanks to the decoupled BN scheme.
Overall computation overhead drops substantially compared to continual TTA.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar on-demand strategies might benefit other continual learning tasks beyond visual adaptation.
Integrating this with hardware-specific optimizations could further reduce costs in embodied AI.
The detection mechanism implies that many domain changes are minor and do not warrant full adaptation.

Load-bearing premise

The domain shift detector correctly identifies when adaptation is needed without missing critical shifts or activating too frequently on insignificant variations.

What would settle it

A test scenario where the system fails to detect a genuine domain shift, resulting in degraded model performance compared to continuous adaptation.

Figures

Figures reproduced from arXiv: 2505.00986 by Dong Ma, Xiao Ma, Young D. Kwon.

**Figure 1.** Figure 1: Impact of domain shift: (a) indoor → outdoor (b) outdoor scenes with a light grass → outdoor scenes with a dark tree. on unlabeled target data, encouraging the model to make more confident predictions in the new domain. Such methods typically operate batch by batch (a chunk of streaming inputs) and alternate between two steps: (1) an adaptation step that updates model parameters via backpropagation, using … view at source ↗

**Figure 2.** Figure 2: Illustration of (a) continual and (b) on-demand test-time adaptation. Our work only triggers adaptation when significant domain shifts occur, thereby reducing adaptation frequency by up to 90% over C-TTA methods while outperforming all the baselines. • We presented EmbodiTTA, a novel on-demand TTA framework for embodied devices. It comprises three core innovations: a lightweight domain shift detector, a … view at source ↗

**Figure 3.** Figure 3: (a) Correlation of accuracy and entropy, (b) adaptation to a target domain from different source domains. process must be lightweight to conserve on-device resources. • Where to adapt from. Unlike continual TTA, where the distribution captured by the current model and the distribution of the incoming data are often similar, ondemand TTA encounters a large distribution disparity when an adaptation is trig… view at source ↗

**Figure 6.** Figure 6: Sample-wise domain shift detection using (a) entropy and (b) the proposed EMA entropy. A. Overview [PITH_FULL_IMAGE:figures/full_fig_p005_6.png] view at source ↗

**Figure 7.** Figure 7: Mechanism of domain shift detection. wise entropy. Specifically, [PITH_FULL_IMAGE:figures/full_fig_p006_7.png] view at source ↗

**Figure 9.** Figure 9: Visualization of clustered subsets from the ImageNet trainset. C. Source Domain Selection As noted in Section II-D2, adapting from a closer domain can enhance both the accuracy and the convergence speed of adaptation. This observation motivates us to select the domain most similar to the new domain from a candidate pool before adaptation, rather than directly adapting from the last domain, referred to as … view at source ↗

**Figure 10.** Figure 10: Scheme of similar candidate selection [PITH_FULL_IMAGE:figures/full_fig_p007_10.png] view at source ↗

**Figure 11.** Figure 11: Workflow of decoupled BN update. The workflow of our proposed solution is illustrated in Figure 10. Specifically, we cache N samples from the new domain (e.g., N = 128) and process them in small batches (e.g., batch size 16) through the source model to extract domain features. Before extraction, we update the BN statistics via a forward pass to align the model with the batch distribution6 . Then, for eac… view at source ↗

**Figure 12.** Figure 12: Energy consumption for processing domain data sequences of varying lengths under batch size = (a) 1 and (b) 16. by minimizing the frequency of adaptations. To evaluate the energy benefit of EmbodiTTA, we implemented the above methods on the Jetson Orin Nano with batch sizes of 1 and 168 . The evaluation measured total energy consumption across domains of varying lengths, ranging from 1,000 samples (transi… view at source ↗

**Figure 13.** Figure 13: EMA entropy change along the data stream on CIFAR10-C. Domain transitions occur every 10,000 samples and are highlighted in alternating background (white-gray) color. Red dotted lines indicate detected shifts. The table above shows the corresponding accuracy on each domain [PITH_FULL_IMAGE:figures/full_fig_p012_13.png] view at source ↗

**Figure 14.** Figure 14: Evaluation of using BN statistics for similar domain selection [PITH_FULL_IMAGE:figures/full_fig_p012_14.png] view at source ↗

read the original abstract

Continual Test-time adaptation (CTTA) continuously adapts the deployed model on every incoming batch of data. While achieving optimal accuracy, existing CTTA approaches present poor real-world applicability on resource-constrained edge devices, due to the substantial memory overhead and energy consumption. In this work, we first introduce a novel paradigm -- on-demand TTA -- which triggers adaptation only when a significant domain shift is detected. Then, we present OD-TTA, an on-demand TTA framework for accurate and efficient adaptation on edge devices. OD-TTA comprises three innovative techniques: 1) a lightweight domain shift detection mechanism to activate TTA only when it is needed, drastically reducing the overall computation overhead, 2) a source domain selection module that chooses an appropriate source model for adaptation, ensuring high and robust accuracy, 3) a decoupled Batch Normalization (BN) update scheme to enable memory-efficient adaptation with small batch sizes. Extensive experiments show that OD-TTA achieves comparable and even better performance while reducing the energy and computation overhead remarkably, making TTA a practical reality.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

OD-TTA reframes continual TTA as on-demand via shift detection plus source selection and decoupled BN, which could help edge embodied systems but hinges on the detector not missing real shifts.

read the letter

The main thing here is a shift from always-on continual test-time adaptation to an on-demand version that only runs when a domain shift is flagged. The authors package this with a lightweight detector, a source model selector, and a decoupled batch-norm update meant to work with small batches on memory-tight hardware. That combination directly targets the energy and compute costs that have kept CTTA off resource-constrained robots and edge cameras so far.

Referee Report

2 major / 2 minor

Summary. The paper introduces a new 'on-demand TTA' paradigm for continual test-time adaptation in embodied visual systems on resource-constrained devices. OD-TTA triggers adaptation only upon detecting significant domain shifts via a lightweight detector, selects an appropriate source model, and employs a decoupled batch normalization update to support memory-efficient adaptation with small batches. The authors claim that extensive experiments demonstrate comparable or superior accuracy to standard CTTA methods while achieving substantial reductions in energy and computational overhead.

Significance. If the efficiency gains hold without accuracy degradation, this work could meaningfully advance practical deployment of adaptive models on edge hardware for robotics and embodied AI, addressing key barriers of memory and power consumption that currently limit continual TTA.

major comments (2)

[Section 3.1] Section 3.1 (Domain Shift Detection): The central efficiency claim rests on the lightweight domain shift detector triggering adaptation only for significant shifts. No precision, recall, threshold sensitivity analysis, or ablation on missed shifts (e.g., gradual lighting or viewpoint changes in navigation) is reported, leaving the 'on-demand' premise unverified and risking silent accuracy loss in deployment.
[Section 4] Section 4 (Experiments): The abstract asserts 'comparable and even better performance' with 'remarkably' reduced overhead, yet the manuscript provides no concrete accuracy metrics, energy/computation numbers, baseline comparisons, or statistical significance tests. This absence prevents evaluation of whether the claimed gains are load-bearing or merely incremental.

minor comments (2)

[Section 3.1] The notation and exact formulation of the domain shift detection threshold and scoring function should be stated explicitly with an equation for reproducibility.
[Section 4] Figure captions and axis labels in the experimental results could more clearly distinguish energy vs. accuracy trade-offs across methods.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address the major comments point by point below, outlining how we will strengthen the presentation and analysis in the revised version.

read point-by-point responses

Referee: [Section 3.1] Section 3.1 (Domain Shift Detection): The central efficiency claim rests on the lightweight domain shift detector triggering adaptation only for significant shifts. No precision, recall, threshold sensitivity analysis, or ablation on missed shifts (e.g., gradual lighting or viewpoint changes in navigation) is reported, leaving the 'on-demand' premise unverified and risking silent accuracy loss in deployment.

Authors: We agree that a dedicated evaluation of the domain shift detector would provide stronger support for the on-demand TTA premise. In the revised manuscript, we will add precision and recall metrics for the detector across different shift magnitudes, include threshold sensitivity analysis, and provide ablations examining performance under gradual domain shifts such as lighting variations or viewpoint changes typical in navigation scenarios. These additions will help confirm that the detector reliably triggers adaptation without introducing silent accuracy degradation. revision: yes
Referee: [Section 4] Section 4 (Experiments): The abstract asserts 'comparable and even better performance' with 'remarkably' reduced overhead, yet the manuscript provides no concrete accuracy metrics, energy/computation numbers, baseline comparisons, or statistical significance tests. This absence prevents evaluation of whether the claimed gains are load-bearing or merely incremental.

Authors: The experimental results, including accuracy metrics, energy and computation overhead numbers, and comparisons against standard CTTA baselines, are presented in Section 4 along with the associated tables and figures. To improve clarity and address the concern directly, we will revise Section 4 to more explicitly tabulate and highlight these concrete values, ensure all baseline comparisons are clearly labeled, and add statistical significance tests (such as mean and standard deviation over multiple runs or p-values) to demonstrate that the observed efficiency gains and accuracy levels are robust rather than incremental. revision: partial

Circularity Check

0 steps flagged

No significant circularity; empirical techniques and experiments stand independently

full rationale

The paper introduces an on-demand TTA paradigm and three concrete modules (lightweight shift detection, source selection, decoupled BN) whose value is demonstrated through empirical results on accuracy and resource use. No equations, fitted parameters renamed as predictions, or self-citation chains are load-bearing for the central claims. The derivation chain consists of proposed algorithmic choices validated externally by experiments rather than reducing to inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Review based on abstract only; no free parameters, axioms, or invented entities are explicitly detailed in the provided text.

pith-pipeline@v0.9.0 · 5718 in / 1011 out tokens · 47698 ms · 2026-05-22T16:30:41.202009+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

lightweight domain shift detection mechanism using exponential moving average (EMA) entropy to detect the domain shift
IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean alpha_pin_under_high_calibration unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

decoupled BN update strategy... BN statistics... updated... with larger batch sizes... BN parameters with smaller batch sizes

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

42 extracted references · 42 canonical work pages · 4 internal anchors

[1]

Embodied intelligence: A synergy of morphology, action, perception and learning,

H. Liu, D. Guo, and A. Cangelosi, “Embodied intelligence: A synergy of morphology, action, perception and learning,”ACM Computing Surveys, vol. 57, no. 7, pp. 1–36, 2025

work page 2025
[2]

Generalisation in humans and deep neural networks,

R. Geirhos, C. R. Temme, J. Rauber, H. H. Sch ¨utt, M. Bethge, and F. A. Wichmann, “Generalisation in humans and deep neural networks,” Advances in neural information processing systems, vol. 31, 2018

work page 2018
[3]

Do imagenet clas- sifiers generalize to imagenet?

B. Recht, R. Roelofs, L. Schmidt, and V . Shankar, “Do imagenet clas- sifiers generalize to imagenet?” inInternational conference on machine learning. PMLR, 2019, pp. 5389–5400

work page 2019
[4]

Benchmarking Neural Network Robustness to Common Corruptions and Perturbations

D. Hendrycks and T. Dietterich, “Benchmarking neural network ro- bustness to common corruptions and perturbations,”arXiv preprint arXiv:1903.12261, 2019

work page internal anchor Pith review Pith/arXiv arXiv 1903
[5]

Core50: a new dataset and benchmark for continuous object recognition,

V . Lomonaco and D. Maltoni, “Core50: a new dataset and benchmark for continuous object recognition,” inProceedings of the 1st Annual Conference on Robot Learning, ser. Proceedings of Machine Learning Research, S. Levine, V . Vanhoucke, and K. Goldberg, Eds., vol. 78. PMLR, 13–15 Nov 2017, pp. 17–26. [Online]. Available: https://proceedings.mlr.press/v78/...

work page 2017
[6]

Generalizing to unseen domains: A survey on domain generalization,

J. Wang, C. Lan, C. Liu, Y . Ouyang, T. Qin, W. Lu, Y . Chen, W. Zeng, and P. S. Yu, “Generalizing to unseen domains: A survey on domain generalization,”IEEE transactions on knowledge and data engineering, vol. 35, no. 8, pp. 8052–8072, 2022

work page 2022
[7]

A comprehensive survey on test-time adaptation under distribution shifts,

J. Liang, R. He, and T. Tan, “A comprehensive survey on test-time adaptation under distribution shifts,”International Journal of Computer Vision, vol. 133, no. 1, pp. 31–64, 2025

work page 2025
[8]

Tent: Fully Test-time Adaptation by Entropy Minimization

D. Wang, E. Shelhamer, S. Liu, B. Olshausen, and T. Darrell, “Tent: Fully test-time adaptation by entropy minimization,”arXiv preprint arXiv:2006.10726, 2020

work page internal anchor Pith review Pith/arXiv arXiv 2006
[9]

Ef- ficient test-time model adaptation without forgetting,

S. Niu, J. Wu, Y . Zhang, Y . Chen, S. Zheng, P. Zhao, and M. Tan, “Ef- ficient test-time model adaptation without forgetting,” inInternational conference on machine learning. PMLR, 2022, pp. 16 888–16 905

work page 2022
[10]

Towards stable test-time adaptation in dynamic wild world.arXiv preprint arXiv:2302.12400, 2023

S. Niu, J. Wu, Y . Zhang, Z. Wen, Y . Chen, P. Zhao, and M. Tan, “Towards stable test-time adaptation in dynamic wild world,”arXiv preprint arXiv:2302.12400, 2023

work page arXiv 2023
[11]

Mecta: Memory-economic continual test-time model adaptation,

J. Hong, L. Lyu, J. Zhou, and M. Spranger, “Mecta: Memory-economic continual test-time model adaptation,” in2023 International Conference on Learning Representations, 2023

work page 2023
[12]

Shift: a synthetic driving dataset for continuous multi-task domain adaptation,

T. Sun, M. Segu, J. Postels, Y . Wang, L. Van Gool, B. Schiele, F. Tombari, and F. Yu, “Shift: a synthetic driving dataset for continuous multi-task domain adaptation,” inProceedings of the IEEE/CVF Con- ference on Computer Vision and Pattern Recognition, 2022, pp. 21 371– 21 382

work page 2022
[13]

Where are we in the search for an artificial visual cortex for embodied intelligence?

A. Majumdar, K. Yadav, S. Arnaud, J. Ma, C. Chen, S. Silwal, A. Jain, V .-P. Berges, T. Wu, J. Vakilet al., “Where are we in the search for an artificial visual cortex for embodied intelligence?”Advances in Neural Information Processing Systems, vol. 36, pp. 655–677, 2023

work page 2023
[14]

arXiv preprint arXiv:2303.15361 , year=

J. Liang, R. He, and T. Tan, “A comprehensive survey on test-time adaptation under distribution shifts,”arXiv preprint arXiv:2303.15361, 2023

work page arXiv 2023
[15]

Lote-animal: A long time-span dataset for endangered animal behavior understanding,

D. Liu, J. Hou, S. Huang, J. Liu, Y . He, B. Zheng, J. Ning, and J. Zhang, “Lote-animal: A long time-span dataset for endangered animal behavior understanding,” inProceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 20 064–20 075

work page 2023
[16]

Online distribution shift detection via recency prediction,

R. Luo, R. Sinha, Y . Sun, A. Hindy, S. Zhao, S. Savarese, E. Schmer- ling, and M. Pavone, “Online distribution shift detection via recency prediction,”arXiv preprint arXiv:2211.09916, 2022

work page arXiv 2022
[17]

Window-based distri- bution shift detection for deep neural networks,

G. Bar Shalom, Y . Geifman, and R. El-Yaniv, “Window-based distri- bution shift detection for deep neural networks,”Advances in Neural Information Processing Systems, vol. 36, 2024

work page 2024
[18]

Effective restoration of source knowledge in continual test time adaptation,

F. F. Niloy, S. M. Ahmed, D. S. Raychaudhuri, S. Oymak, and A. K. Roy-Chowdhury, “Effective restoration of source knowledge in continual test time adaptation,” inProceedings of the IEEE/CVF Winter Confer- ence on Applications of Computer Vision, 2024, pp. 2091–2100. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 15

work page 2024
[19]

A simple signal for domain shift,

G. Chakrabarty, M. Sreenivas, and S. Biswas, “A simple signal for domain shift,” inProceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 3577–3584

work page 2023
[20]

Some methods for classification and analysis of multivariate observations,

J. MacQueenet al., “Some methods for classification and analysis of multivariate observations,” inProceedings of the fifth Berkeley sympo- sium on mathematical statistics and probability, vol. 1, no. 14. Oakland, CA, USA, 1967, pp. 281–297

work page 1967
[21]

Uncertainty-calibrated test-time model adaptation without forgetting,

M. Tan, G. Chen, J. Wu, Y . Zhang, Y . Chen, P. Zhao, and S. Niu, “Uncertainty-calibrated test-time model adaptation without forgetting,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025

work page 2025
[22]

Learning multiple layers of features from tiny images,

A. Krizhevsky, G. Hintonet al., “Learning multiple layers of features from tiny images,” 2009

work page 2009
[23]

Imagenet: A large-scale hierarchical image database,

J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “Imagenet: A large-scale hierarchical image database,” in2009 IEEE conference on computer vision and pattern recognition. Ieee, 2009, pp. 248–255

work page 2009
[24]

Pytorch image models,

R. Wightman, “Pytorch image models,” https://github.com/rwightman/ pytorch-image-models, 2019

work page 2019
[25]

Universal test-time adaptation through weight ensembling, diversity weighting, and prior correction,

R. A. Marsden, M. D ¨obler, and B. Yang, “Universal test-time adaptation through weight ensembling, diversity weighting, and prior correction,” inProceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024, pp. 2555–2565

work page 2024
[26]

L-tta: Lightweight test-time adaptation using a versatile stem layer,

J. Shin and H. Kim, “L-tta: Lightweight test-time adaptation using a versatile stem layer,”Advances in Neural Information Processing Systems, vol. 37, pp. 39 325–39 349, 2024

work page 2024
[27]

Surgeon: Memory-adaptive fully test-time adaptation via dynamic activation sparsity,

K. Ma, J. Tang, B. Guo, F. Dang, S. Liu, Z. Zhu, L. Wu, C. Fang, Y .-C. Chen, Z. Yuet al., “Surgeon: Memory-adaptive fully test-time adaptation via dynamic activation sparsity,” inProceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 30 514–30 523

work page 2025
[28]

Deep residual learning for image recognition,

K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” inProceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778

work page 2016
[29]

Rethinking Atrous Convolution for Semantic Image Segmentation

L.-C. Chen, G. Papandreou, F. Schroff, and H. Adam, “Rethinking atrous convolution for semantic image segmentation,”arXiv preprint arXiv:1706.05587, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[30]

Mobilenetv2: Inverted residuals and linear bottlenecks,

M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, “Mobilenetv2: Inverted residuals and linear bottlenecks,” inProceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 4510–4520

work page 2018
[31]

MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer

S. Mehta and M. Rastegari, “Mobilevit: light-weight, general- purpose, and mobile-friendly vision transformer,”arXiv preprint arXiv:2110.02178, 2021

work page internal anchor Pith review Pith/arXiv arXiv 2021
[32]

Artificial intelligence for long-term robot autonomy: A survey,

L. Kunze, N. Hawes, T. Duckett, M. Hanheide, and T. Krajn´ık, “Artificial intelligence for long-term robot autonomy: A survey,”IEEE Robotics and Automation Letters, vol. 3, no. 4, pp. 4023–4030, 2018

work page 2018
[33]

Addressing appearance change in outdoor robotics with adversarial domain adaptation,

M. Wulfmeier, A. Bewley, and I. Posner, “Addressing appearance change in outdoor robotics with adversarial domain adaptation,” in2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2017, pp. 1551–1558

work page 2017
[34]

Robot perceptual adaptation to environment changes for long-term human teammate following,

S. Siva and H. Zhang, “Robot perceptual adaptation to environment changes for long-term human teammate following,”The International Journal of Robotics Research, vol. 41, no. 7, pp. 706–720, 2022

work page 2022
[35]

The epic-kitchens dataset: Collection, challenges and baselines,

D. Damen, H. Doughty, G. M. Farinella, S. Fidler, A. Furnari, E. Kaza- kos, D. Moltisanti, J. Munro, T. Perrett, W. Price, and M. Wray, “The epic-kitchens dataset: Collection, challenges and baselines,”IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol. 43, no. 11, pp. 4125–4141, 2021

work page 2021
[36]

In search of lost online test-time adaptation: A survey,

Z. Wang, Y . Luo, L. Zheng, Z. Chen, S. Wang, and Z. Huang, “In search of lost online test-time adaptation: A survey,”International Journal of Computer Vision, pp. 1–34, 2024

work page 2024
[37]

Beyond model adaptation at test time: A survey,

Z. Xiao and C. G. Snoek, “Beyond model adaptation at test time: A survey,”arXiv preprint arXiv:2411.03687, 2024

work page arXiv 2024
[38]

Revisiting batch normalization for improving corruption robustness,

P. Benz, C. Zhang, A. Karjauv, and I. S. Kweon, “Revisiting batch normalization for improving corruption robustness,” inProceedings of the IEEE/CVF winter conference on applications of computer vision, 2021, pp. 494–503

work page 2021
[39]

Sotta: Robust test-time adaptation on noisy data streams,

T. Gong, Y . Kim, T. Lee, S. Chottananurak, and S.-J. Lee, “Sotta: Robust test-time adaptation on noisy data streams,”Advances in Neural Information Processing Systems, vol. 36, pp. 14 070–14 093, 2023

work page 2023
[40]

Continual test-time domain adaptation,

Q. Wang, O. Fink, L. Van Gool, and D. Dai, “Continual test-time domain adaptation,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 7201–7211

work page 2022
[41]

Testing exchangeability on-line,

V . V ovk, I. Nouretdinov, and A. Gammerman, “Testing exchangeability on-line,” inProceedings of the 20th International Conference on Ma- chine Learning (ICML-03), 2003, pp. 768–775

work page 2003
[42]

Entropy-based concept shift detection,

P. V orburger and A. Bernstein, “Entropy-based concept shift detection,” inSixth International Conference on Data Mining (ICDM’06). IEEE, 2006, pp. 1113–1118

work page 2006

[1] [1]

Embodied intelligence: A synergy of morphology, action, perception and learning,

H. Liu, D. Guo, and A. Cangelosi, “Embodied intelligence: A synergy of morphology, action, perception and learning,”ACM Computing Surveys, vol. 57, no. 7, pp. 1–36, 2025

work page 2025

[2] [2]

Generalisation in humans and deep neural networks,

R. Geirhos, C. R. Temme, J. Rauber, H. H. Sch ¨utt, M. Bethge, and F. A. Wichmann, “Generalisation in humans and deep neural networks,” Advances in neural information processing systems, vol. 31, 2018

work page 2018

[3] [3]

Do imagenet clas- sifiers generalize to imagenet?

B. Recht, R. Roelofs, L. Schmidt, and V . Shankar, “Do imagenet clas- sifiers generalize to imagenet?” inInternational conference on machine learning. PMLR, 2019, pp. 5389–5400

work page 2019

[4] [4]

Benchmarking Neural Network Robustness to Common Corruptions and Perturbations

D. Hendrycks and T. Dietterich, “Benchmarking neural network ro- bustness to common corruptions and perturbations,”arXiv preprint arXiv:1903.12261, 2019

work page internal anchor Pith review Pith/arXiv arXiv 1903

[5] [5]

Core50: a new dataset and benchmark for continuous object recognition,

V . Lomonaco and D. Maltoni, “Core50: a new dataset and benchmark for continuous object recognition,” inProceedings of the 1st Annual Conference on Robot Learning, ser. Proceedings of Machine Learning Research, S. Levine, V . Vanhoucke, and K. Goldberg, Eds., vol. 78. PMLR, 13–15 Nov 2017, pp. 17–26. [Online]. Available: https://proceedings.mlr.press/v78/...

work page 2017

[6] [6]

Generalizing to unseen domains: A survey on domain generalization,

J. Wang, C. Lan, C. Liu, Y . Ouyang, T. Qin, W. Lu, Y . Chen, W. Zeng, and P. S. Yu, “Generalizing to unseen domains: A survey on domain generalization,”IEEE transactions on knowledge and data engineering, vol. 35, no. 8, pp. 8052–8072, 2022

work page 2022

[7] [7]

A comprehensive survey on test-time adaptation under distribution shifts,

J. Liang, R. He, and T. Tan, “A comprehensive survey on test-time adaptation under distribution shifts,”International Journal of Computer Vision, vol. 133, no. 1, pp. 31–64, 2025

work page 2025

[8] [8]

Tent: Fully Test-time Adaptation by Entropy Minimization

D. Wang, E. Shelhamer, S. Liu, B. Olshausen, and T. Darrell, “Tent: Fully test-time adaptation by entropy minimization,”arXiv preprint arXiv:2006.10726, 2020

work page internal anchor Pith review Pith/arXiv arXiv 2006

[9] [9]

Ef- ficient test-time model adaptation without forgetting,

S. Niu, J. Wu, Y . Zhang, Y . Chen, S. Zheng, P. Zhao, and M. Tan, “Ef- ficient test-time model adaptation without forgetting,” inInternational conference on machine learning. PMLR, 2022, pp. 16 888–16 905

work page 2022

[10] [10]

Towards stable test-time adaptation in dynamic wild world.arXiv preprint arXiv:2302.12400, 2023

S. Niu, J. Wu, Y . Zhang, Z. Wen, Y . Chen, P. Zhao, and M. Tan, “Towards stable test-time adaptation in dynamic wild world,”arXiv preprint arXiv:2302.12400, 2023

work page arXiv 2023

[11] [11]

Mecta: Memory-economic continual test-time model adaptation,

J. Hong, L. Lyu, J. Zhou, and M. Spranger, “Mecta: Memory-economic continual test-time model adaptation,” in2023 International Conference on Learning Representations, 2023

work page 2023

[12] [12]

Shift: a synthetic driving dataset for continuous multi-task domain adaptation,

T. Sun, M. Segu, J. Postels, Y . Wang, L. Van Gool, B. Schiele, F. Tombari, and F. Yu, “Shift: a synthetic driving dataset for continuous multi-task domain adaptation,” inProceedings of the IEEE/CVF Con- ference on Computer Vision and Pattern Recognition, 2022, pp. 21 371– 21 382

work page 2022

[13] [13]

Where are we in the search for an artificial visual cortex for embodied intelligence?

A. Majumdar, K. Yadav, S. Arnaud, J. Ma, C. Chen, S. Silwal, A. Jain, V .-P. Berges, T. Wu, J. Vakilet al., “Where are we in the search for an artificial visual cortex for embodied intelligence?”Advances in Neural Information Processing Systems, vol. 36, pp. 655–677, 2023

work page 2023

[14] [14]

arXiv preprint arXiv:2303.15361 , year=

J. Liang, R. He, and T. Tan, “A comprehensive survey on test-time adaptation under distribution shifts,”arXiv preprint arXiv:2303.15361, 2023

work page arXiv 2023

[15] [15]

Lote-animal: A long time-span dataset for endangered animal behavior understanding,

D. Liu, J. Hou, S. Huang, J. Liu, Y . He, B. Zheng, J. Ning, and J. Zhang, “Lote-animal: A long time-span dataset for endangered animal behavior understanding,” inProceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 20 064–20 075

work page 2023

[16] [16]

Online distribution shift detection via recency prediction,

R. Luo, R. Sinha, Y . Sun, A. Hindy, S. Zhao, S. Savarese, E. Schmer- ling, and M. Pavone, “Online distribution shift detection via recency prediction,”arXiv preprint arXiv:2211.09916, 2022

work page arXiv 2022

[17] [17]

Window-based distri- bution shift detection for deep neural networks,

G. Bar Shalom, Y . Geifman, and R. El-Yaniv, “Window-based distri- bution shift detection for deep neural networks,”Advances in Neural Information Processing Systems, vol. 36, 2024

work page 2024

[18] [18]

Effective restoration of source knowledge in continual test time adaptation,

F. F. Niloy, S. M. Ahmed, D. S. Raychaudhuri, S. Oymak, and A. K. Roy-Chowdhury, “Effective restoration of source knowledge in continual test time adaptation,” inProceedings of the IEEE/CVF Winter Confer- ence on Applications of Computer Vision, 2024, pp. 2091–2100. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 15

work page 2024

[19] [19]

A simple signal for domain shift,

G. Chakrabarty, M. Sreenivas, and S. Biswas, “A simple signal for domain shift,” inProceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 3577–3584

work page 2023

[20] [20]

Some methods for classification and analysis of multivariate observations,

J. MacQueenet al., “Some methods for classification and analysis of multivariate observations,” inProceedings of the fifth Berkeley sympo- sium on mathematical statistics and probability, vol. 1, no. 14. Oakland, CA, USA, 1967, pp. 281–297

work page 1967

[21] [21]

Uncertainty-calibrated test-time model adaptation without forgetting,

M. Tan, G. Chen, J. Wu, Y . Zhang, Y . Chen, P. Zhao, and S. Niu, “Uncertainty-calibrated test-time model adaptation without forgetting,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025

work page 2025

[22] [22]

Learning multiple layers of features from tiny images,

A. Krizhevsky, G. Hintonet al., “Learning multiple layers of features from tiny images,” 2009

work page 2009

[23] [23]

Imagenet: A large-scale hierarchical image database,

J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “Imagenet: A large-scale hierarchical image database,” in2009 IEEE conference on computer vision and pattern recognition. Ieee, 2009, pp. 248–255

work page 2009

[24] [24]

Pytorch image models,

R. Wightman, “Pytorch image models,” https://github.com/rwightman/ pytorch-image-models, 2019

work page 2019

[25] [25]

Universal test-time adaptation through weight ensembling, diversity weighting, and prior correction,

R. A. Marsden, M. D ¨obler, and B. Yang, “Universal test-time adaptation through weight ensembling, diversity weighting, and prior correction,” inProceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024, pp. 2555–2565

work page 2024

[26] [26]

L-tta: Lightweight test-time adaptation using a versatile stem layer,

J. Shin and H. Kim, “L-tta: Lightweight test-time adaptation using a versatile stem layer,”Advances in Neural Information Processing Systems, vol. 37, pp. 39 325–39 349, 2024

work page 2024

[27] [27]

Surgeon: Memory-adaptive fully test-time adaptation via dynamic activation sparsity,

K. Ma, J. Tang, B. Guo, F. Dang, S. Liu, Z. Zhu, L. Wu, C. Fang, Y .-C. Chen, Z. Yuet al., “Surgeon: Memory-adaptive fully test-time adaptation via dynamic activation sparsity,” inProceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 30 514–30 523

work page 2025

[28] [28]

Deep residual learning for image recognition,

K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” inProceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778

work page 2016

[29] [29]

Rethinking Atrous Convolution for Semantic Image Segmentation

L.-C. Chen, G. Papandreou, F. Schroff, and H. Adam, “Rethinking atrous convolution for semantic image segmentation,”arXiv preprint arXiv:1706.05587, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017

[30] [30]

Mobilenetv2: Inverted residuals and linear bottlenecks,

M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, “Mobilenetv2: Inverted residuals and linear bottlenecks,” inProceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 4510–4520

work page 2018

[31] [31]

MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer

S. Mehta and M. Rastegari, “Mobilevit: light-weight, general- purpose, and mobile-friendly vision transformer,”arXiv preprint arXiv:2110.02178, 2021

work page internal anchor Pith review Pith/arXiv arXiv 2021

[32] [32]

Artificial intelligence for long-term robot autonomy: A survey,

L. Kunze, N. Hawes, T. Duckett, M. Hanheide, and T. Krajn´ık, “Artificial intelligence for long-term robot autonomy: A survey,”IEEE Robotics and Automation Letters, vol. 3, no. 4, pp. 4023–4030, 2018

work page 2018

[33] [33]

Addressing appearance change in outdoor robotics with adversarial domain adaptation,

M. Wulfmeier, A. Bewley, and I. Posner, “Addressing appearance change in outdoor robotics with adversarial domain adaptation,” in2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2017, pp. 1551–1558

work page 2017

[34] [34]

Robot perceptual adaptation to environment changes for long-term human teammate following,

S. Siva and H. Zhang, “Robot perceptual adaptation to environment changes for long-term human teammate following,”The International Journal of Robotics Research, vol. 41, no. 7, pp. 706–720, 2022

work page 2022

[35] [35]

The epic-kitchens dataset: Collection, challenges and baselines,

D. Damen, H. Doughty, G. M. Farinella, S. Fidler, A. Furnari, E. Kaza- kos, D. Moltisanti, J. Munro, T. Perrett, W. Price, and M. Wray, “The epic-kitchens dataset: Collection, challenges and baselines,”IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol. 43, no. 11, pp. 4125–4141, 2021

work page 2021

[36] [36]

In search of lost online test-time adaptation: A survey,

Z. Wang, Y . Luo, L. Zheng, Z. Chen, S. Wang, and Z. Huang, “In search of lost online test-time adaptation: A survey,”International Journal of Computer Vision, pp. 1–34, 2024

work page 2024

[37] [37]

Beyond model adaptation at test time: A survey,

Z. Xiao and C. G. Snoek, “Beyond model adaptation at test time: A survey,”arXiv preprint arXiv:2411.03687, 2024

work page arXiv 2024

[38] [38]

Revisiting batch normalization for improving corruption robustness,

P. Benz, C. Zhang, A. Karjauv, and I. S. Kweon, “Revisiting batch normalization for improving corruption robustness,” inProceedings of the IEEE/CVF winter conference on applications of computer vision, 2021, pp. 494–503

work page 2021

[39] [39]

Sotta: Robust test-time adaptation on noisy data streams,

T. Gong, Y . Kim, T. Lee, S. Chottananurak, and S.-J. Lee, “Sotta: Robust test-time adaptation on noisy data streams,”Advances in Neural Information Processing Systems, vol. 36, pp. 14 070–14 093, 2023

work page 2023

[40] [40]

Continual test-time domain adaptation,

Q. Wang, O. Fink, L. Van Gool, and D. Dai, “Continual test-time domain adaptation,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 7201–7211

work page 2022

[41] [41]

Testing exchangeability on-line,

V . V ovk, I. Nouretdinov, and A. Gammerman, “Testing exchangeability on-line,” inProceedings of the 20th International Conference on Ma- chine Learning (ICML-03), 2003, pp. 768–775

work page 2003

[42] [42]

Entropy-based concept shift detection,

P. V orburger and A. Bernstein, “Entropy-based concept shift detection,” inSixth International Conference on Data Mining (ICDM’06). IEEE, 2006, pp. 1113–1118

work page 2006