NEO: No-Optimization Test-Time Adaptation through Latent Re-Centering

Abhirup Ghosh; Alexander Murphy; Michal Danilowski; Soumyajit Chatterjee

arxiv: 2510.05635 · v2 · submitted 2025-10-07 · 💻 cs.LG · cs.CV


Pith reviewed 2026-05-18 09:07 UTC · model grok-4.3


keywords: test-time adaptation · latent re-centering · distribution shift · ViT · ImageNet-C · hyperparameter-free · no-optimization · edge deployment


The pith

Re-centering target embeddings at the origin aligns shifted test samples with the source distribution for hyperparameter-free adaptation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that a simple geometric adjustment—shifting the mean of target embeddings to the origin—improves alignment with source data under distribution shift. This observation supports NEO, a test-time adaptation approach that requires no optimization, no hyperparameters, and only a small batch of unlabeled samples. The method raises ViT-Base accuracy on ImageNet-C from 55.6 percent to 59.2 percent after one batch of 64 samples and outperforms prior TTA techniques on multiple benchmarks while using the least compute. Readers should care because the approach makes model adaptation practical on edge hardware without retraining or tuning.

Core claim

Based on a theoretical foundation of the geometry of the latent space, re-centering target data embeddings at the origin significantly improves the alignment between source and distribution-shifted samples. This insight motivates NEO, a hyperparameter-free fully test-time adaptation method that adds no significant compute compared to vanilla inference and improves the classification accuracy of ViT-Base on ImageNet-C from 55.6 percent to 59.2 percent after adapting on just one batch of 64 samples.

What carries the argument

Latent re-centering: shifting the mean of target embeddings to the origin so that their geometry better matches the source distribution without any parameter updates.
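As a rough illustration of the operation described above, the following sketch subtracts the batch mean from a set of embeddings before a linear classifier head is applied. It is a minimal, framework-free sketch under stated assumptions, not the authors' implementation: the helper names `batch_mean`, `recenter`, and `linear_logits` are hypothetical, and embeddings are plain Python lists rather than tensors.

```python
from typing import List, Sequence

Vector = List[float]


def batch_mean(embeddings: Sequence[Sequence[float]]) -> Vector:
    """Per-dimension mean of a batch of embeddings."""
    n = len(embeddings)
    dim = len(embeddings[0])
    return [sum(e[d] for e in embeddings) / n for d in range(dim)]


def recenter(embeddings: Sequence[Sequence[float]]) -> List[Vector]:
    """Shift the batch so that its mean sits at the origin."""
    mu = batch_mean(embeddings)
    return [[x - m for x, m in zip(e, mu)] for e in embeddings]


def linear_logits(weight: Sequence[Sequence[float]],
                  bias: Sequence[float],
                  z: Sequence[float]) -> Vector:
    """Unchanged linear head: logits = W z + b."""
    return [sum(w * v for w, v in zip(row, z)) + b
            for row, b in zip(weight, bias)]


# Hypothetical 2-D embeddings of one unlabeled target batch.
target_batch = [[2.0, 1.0], [4.0, 3.0], [3.0, 2.0]]
centered = recenter(target_batch)

# Toy classifier head (identity weights), an assumption for illustration.
weight = [[1.0, 0.0], [0.0, 1.0]]
bias = [0.0, 0.0]
logits = [linear_logits(weight, bias, z) for z in centered]
```

No model parameter is updated here; the only change relative to plain inference is the subtraction applied to the embeddings before the head.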

If this is right

When adapting on 512 samples, NEO beats all seven compared TTA methods on ImageNet-C, ImageNet-R, and ImageNet-S and beats six of seven on CIFAR-10-C while using the least compute.
The method performs well on model calibration metrics and can adapt using samples from only one class to raise accuracy on the remaining 999 classes of ImageNet-C.
On Raspberry Pi and Jetson Orin Nano devices, NEO cuts inference time by 63 percent and memory usage by 9 percent relative to baselines.
The gains hold across three ViT architectures and four datasets, indicating that the re-centering step can be applied efficiently for test-time adaptation.
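The calibration claim in the list above is reported through ECE (expected calibration error) in the paper's figures. For readers unfamiliar with the metric, here is a minimal sketch of the standard equal-width-bin ECE. The function name, the choice of 10 bins, and the list-based inputs are assumptions made for illustration; the paper's exact binning is not stated on this page.

```python
from typing import Sequence


def expected_calibration_error(confidences: Sequence[float],
                               correct: Sequence[int],
                               n_bins: int = 10) -> float:
    """Weighted average of |accuracy - mean confidence| over confidence bins."""
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # Bin b covers (lo, hi]; a confidence of exactly 0 goes to the first bin.
        idx = [i for i, c in enumerate(confidences)
               if (lo < c <= hi) or (b == 0 and c == 0.0)]
        if not idx:
            continue
        acc = sum(correct[i] for i in idx) / len(idx)
        conf = sum(confidences[i] for i in idx) / len(idx)
        ece += (len(idx) / n) * abs(acc - conf)
    return ece


# Toy predictions: top-class confidence and whether the prediction was right.
example_ece = expected_calibration_error(
    confidences=[0.95, 0.95, 0.55, 0.55],
    correct=[1, 1, 1, 0],
)
```

A low value means that confidence tracks accuracy closely; a high value indicates over-confidence on wrong predictions or under-confidence on correct ones.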

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If latent-space centering works because source and target distributions share a common origin after normalization, the same step might improve adaptation in other embedding-based models such as language or multimodal networks.
The single-class adaptation result implies that broad distributional properties rather than class-specific statistics drive the alignment gain.
Combining the re-centering step with a single lightweight update on a few parameters could be tested as a minimal-cost way to handle more extreme shifts.

Load-bearing premise

The geometry of the latent space permits simple re-centering of target embeddings at the origin to produce meaningful alignment with the source distribution without requiring optimization, large batches, or dataset-specific tuning.

What would settle it

Running the re-centering step on one batch of 64 ImageNet-C samples and measuring whether ViT-Base top-1 accuracy rises above the 55.6 percent no-adaptation baseline would directly test the claimed improvement.
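The real test requires ViT-Base and ImageNet-C, but the shape of the check can be sketched on synthetic data. The toy below is an assumption-laden stand-in, not a reproduction: it uses two hypothetical class prototypes with zero mean, a single shared shift vector, Gaussian noise, and a nearest-prototype classifier in place of the ViT head. It compares accuracy on one batch of 64 samples with and without re-centering.

```python
import random
from typing import List, Sequence, Tuple


def nearest_prototype(z: Sequence[float],
                      prototypes: Sequence[Sequence[float]]) -> int:
    """Index of the prototype closest to z in squared Euclidean distance."""
    return min(range(len(prototypes)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(z, prototypes[k])))


def accuracy(embs: Sequence[Sequence[float]],
             labels: Sequence[int],
             prototypes: Sequence[Sequence[float]]) -> float:
    hits = sum(nearest_prototype(e, prototypes) == y for e, y in zip(embs, labels))
    return hits / len(labels)


def run_toy_check(n_per_class: int = 32, seed: int = 0) -> Tuple[float, float]:
    rng = random.Random(seed)
    prototypes = [[1.0, 0.0], [-1.0, 0.0]]  # assumed zero-mean source class means
    delta = [3.0, 0.5]                       # assumed shared distribution shift
    embs: List[List[float]] = []
    labels: List[int] = []
    for k, p in enumerate(prototypes):
        for _ in range(n_per_class):
            embs.append([p[d] + delta[d] + rng.gauss(0.0, 0.1) for d in range(2)])
            labels.append(k)

    before = accuracy(embs, labels, prototypes)

    # Re-center the shifted batch at the origin, then classify again.
    mu = [sum(e[d] for e in embs) / len(embs) for d in range(2)]
    centered = [[e[d] - mu[d] for d in range(2)] for e in embs]
    after = accuracy(centered, labels, prototypes)
    return before, after


before_acc, after_acc = run_toy_check()
```

In this toy the source mean is zero by construction, which is the condition under which re-centering is expected to help; it says nothing about whether that condition holds for a real ViT.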

Figures

Figures reproduced from arXiv: 2510.05635 by Abhirup Ghosh, Alexander Murphy, Michal Danilowski, Soumyajit Chatterjee.

**Figure 1.** Elegant adoption: NEO can be added by replacing the nn.Linear with our custom layer. (a) High-level overview of NEO. (b) NEO improves accuracy using little latency or memory. [Plot axes: Runtime (s) vs. Accuracy (%); methods: No Adapt, T3A, SAR, LAME, TENT, CoTTA, FOA, Surgeon, NEO, NEO Cont.]

**Figure 2.** (a) Given a domain shifted sample, x̃, we encode it to h(x̃) and shift it using a single shared vector ∆. The shifted representation is closer to the embedding of the corresponding clean sample (unknown), h(x), resulting in more accurate predictions. (b) Runtime (x axis), accuracy (y axis), and memory usage (point radius) of TTA methods for ViT-Base on 15 corruptions from ImageNet-C evaluated on 512 samples…

**Figure 3.** (a) Cumulative frequency of highest magnitude dimension in h(x) − h(x̃) over 50000 samples (showing 250 out of 768 dimensions). A small number of dimensions account for the largest magnitude of the difference between source and corrupted embeddings. (b) Cosine similarities and difference of L2 norms between source embeddings and (adjusted) corrupted embeddings (i.e. first row contains average of cos(h(x), …

**Figure 4.** (a) Accuracy increase (%) and (b) ECE change compared to no-adaptation for ViT-S, ViT-B and ViT-L on ImageNet-C, CIFAR-10-C, ImageNet-Sketch and ImageNet-Rendition. Accuracy is computed over the whole dataset; where no confidence interval is shown, the 95% confidence interval is narrower than 0.05 for accuracy and 0.005 for ECE. (c) ECE scores for ViT-S on ImageNet-C averaged over the whole dataset, 15 corrupt…

**Figure 5.** (a) Accuracy (%) for ViT-B on ImageNet-C under varying number of samples to adapt with. (b) Accuracy (%) for ViT-B on ImageNet-C under varying number of classes to adapt with (50 samples used to adapt in total). Accuracy is calculated on samples not used for adaptation except for 50,000 samples. (c) Accuracy increase (%) for continual adaptation, adapting on 15 randomly ordered corruptions from ImageNet-C …

**Figure 6.** (a) Accuracy change using µ̃_G calculated from "Source Corruption" and adapting to samples from "Applied Corruption". (b) Cosine similarity between µ̃_G calculated from "Source Corruption" and "Applied Corruption". [Plot residue from the device benchmark: (a) Peak Memory for Jetson; axes: Peak Memory (MB), Elapsed Time (s)…]

**Figure 7.** Peak memory and elapsed time for adapting on ViT-Base on ImageNet-C (1000 samples -

**Figure 8.** Peak memory and elapsed time for adapting on ViT-Base on ImageNet-C. Raspberry Pi

**Figure 9.** ViT-S - ImageNet-C (axes: Number of Samples vs. Average Accuracy)

**Figure 10.** ViT-B - ImageNet-C (axes: Number of Samples vs. Average Accuracy)

**Figure 11.** ViT-L - ImageNet-C

**Figure 12.** ViT-S - CIFAR-10-C (axes: Number of Samples vs. Average Accuracy)

**Figure 13.** ViT-B - CIFAR-10-C (axes: Number of Samples vs. Average Accuracy)

**Figure 14.** ViT-L - CIFAR-10-C

**Figure 15.** ViT-S - ImageNet-R (axes: Number of Samples vs. Average Accuracy)

**Figure 16.** ViT-B - ImageNet-R (axes: Number of Samples vs. Average Accuracy)

**Figure 17.** ViT-L - ImageNet-R

**Figure 18.** ViT-S - ImageNet-S (axes: Number of Samples vs. Average Accuracy)

**Figure 19.** ViT-B - ImageNet-S (axes: Number of Samples vs. Average Accuracy)

**Figure 20.** ViT-L - ImageNet-S

**Figure 21.** ViT-B - ImageNet-C (axis: ECE Score)

**Figure 22.** ViT-L - ImageNet-C (axis: ECE Score)

**Figure 23.** ViT-S - CIFAR-10-C

**Figure 24.** ViT-B - CIFAR-10-C (axis: ECE Score)

**Figure 25.** ViT-L - CIFAR-10-C (axis: ECE Score)

**Figure 26.** ViT-S - ImageNet-R

**Figure 27.** ViT-B - ImageNet-R (axis: ECE Score)

**Figure 28.** ViT-L - ImageNet-R (axis: ECE Score)

**Figure 29.** ViT-S - ImageNet-S

**Figure 30.** ViT-B - ImageNet-S (axis: ECE Score)

**Figure 31.** ViT-L - ImageNet-S

C.5 Continual adaptation on ImageNet-C 512 samples over corruption index. These figures show adaptation over time (starting adaptation at index 0 and ending at 15). Corruptions are randomly ordered over different repetitions, giving results that do not depend on a specific sequence of corruptions.

**Figure 32.** ViT-S - ImageNet-C

**Figure 33.** ViT-B - ImageNet-C (axes: Corruption Sequence Index vs. Average Accuracy; methods: No Adapt, NEO, CoTTA, Surgeon, NEO Cont.)

**Figure 34.** ViT-L - ImageNet-C

D Disclosure of AI usage. LLMs were used to help search for relevant works, write parts of the code (e.g., plots, bash scripts), and proofread.
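The Figure 1 caption states that NEO can be added by replacing the nn.Linear classifier layer with a custom layer. The sketch below shows what such a drop-in layer could look like in framework-free Python. Everything specific in it is an assumption made for illustration: the class name `RecenteredLinear`, the incremental running-mean update over all samples seen so far, and the choice to update the centroid before classifying the current batch. It is not the authors' layer and does not use PyTorch.

```python
from typing import List, Sequence


class RecenteredLinear:
    """Linear head that subtracts a running centroid from its inputs (hypothetical)."""

    def __init__(self, weight: Sequence[Sequence[float]], bias: Sequence[float]):
        self.weight = [list(row) for row in weight]
        self.bias = list(bias)
        self.count = 0                              # samples seen so far
        self.centroid = [0.0] * len(self.weight[0])  # running mean of embeddings

    def update_centroid(self, batch: Sequence[Sequence[float]]) -> None:
        """Incremental mean update; no gradients and no tunable rate."""
        for emb in batch:
            self.count += 1
            for d, x in enumerate(emb):
                self.centroid[d] += (x - self.centroid[d]) / self.count

    def forward(self, batch: Sequence[Sequence[float]]) -> List[List[float]]:
        self.update_centroid(batch)
        logits = []
        for emb in batch:
            z = [x - c for x, c in zip(emb, self.centroid)]
            logits.append([sum(w * v for w, v in zip(row, z)) + b
                           for row, b in zip(self.weight, self.bias)])
        return logits


# Toy usage with an identity head on 2-D embeddings.
layer = RecenteredLinear(weight=[[1.0, 0.0], [0.0, 1.0]], bias=[0.0, 0.0])
first_logits = layer.forward([[2.0, 0.0], [4.0, 0.0]])
```

Keeping only a sample count and one mean vector as state is what would make such a layer cheap in memory; whether the paper's layer stores exactly this state is not confirmed by this page.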

Original abstract

Test-Time Adaptation (TTA) methods are often computationally expensive, require a large amount of data for effective adaptation, or are brittle to hyperparameters. Based on a theoretical foundation of the geometry of the latent space, we are able to significantly improve the alignment between source and distribution-shifted samples by re-centering target data embeddings at the origin. This insight motivates NEO -- a hyperparameter-free fully TTA method, that adds no significant compute compared to vanilla inference. NEO is able to improve the classification accuracy of ViT-Base on ImageNet-C from 55.6% to 59.2% after adapting on just one batch of 64 samples. When adapting on 512 samples NEO beats all 7 TTA methods we compare against on ImageNet-C, ImageNet-R and ImageNet-S and beats 6/7 on CIFAR-10-C, while using the least amount of compute. NEO performs well on model calibration metrics and additionally is able to adapt from 1 class to improve accuracy on 999 other classes in ImageNet-C. On Raspberry Pi and Jetson Orin Nano devices, NEO reduces inference time by 63% and memory usage by 9% compared to baselines. Our results based on 3 ViT architectures and 4 datasets show that NEO can be used efficiently and effectively for TTA.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

NEO's simple re-centering in latent space delivers measurable TTA gains on corrupted ImageNet with almost no overhead, but the geometric claim needs checking against actual source means.


The main point is that this paper gives a hyperparameter-free TTA method that just subtracts the target batch mean from the embeddings and reports clear accuracy lifts on distribution shifts while adding almost nothing to inference cost. On ViT-Base it moves ImageNet-C from 55.6% to 59.2% with a single batch of 64 samples, scales to beat most of the seven baselines they test on larger batches across ImageNet-C/R/S and CIFAR-10-C, and shows lower time and memory on Raspberry Pi and Jetson devices. The one-class adaptation result is also a useful check that the shift isn't just memorizing the current batch. Those empirical pieces are the part that actually moves the needle for anyone who needs lightweight adaptation on edge hardware. The method is easy to reproduce from the description, which is a plus.

The soft spot is the central geometric story. Re-centering the target to the origin is supposed to improve alignment with the source distribution, but that only follows if the source embeddings already sit near zero mean. Pre-trained ViTs commonly have non-zero means in their latent spaces because nothing in standard training forces centering. If the source mean has any real norm, shifting the target to zero can increase rather than decrease the distance. The abstract invokes a theoretical foundation but does not say they measured the source mean or showed why the origin is the correct reference point. Without that verification the reported gains could be coming from a different mechanism, such as a mild form of batch normalization or simple regularization. The experiments look solid on the numbers they give, but the lack of error bars or explicit controls for batch-size effects leaves some room for doubt on robustness.

This is the kind of paper that would interest people working on practical TTA for vision models where compute and data at test time are both limited. A reader who wants a drop-in method that beats heavier baselines on standard corruption benchmarks would find the results worth trying. It is worth sending to peer review because the empirical side is concrete and the implementation cost is low; the referees can push on the source-mean assumption and ask for the missing derivation or measurements. I would not cite it yet without seeing those checks, but the work is coherent enough on its own terms to deserve a full review.

Referee Report

1 major / 2 minor

Summary. The manuscript proposes NEO, a hyperparameter-free test-time adaptation method that re-centers target batch embeddings at the origin in latent space, motivated by a geometric argument for improved source-target alignment without optimization. It reports concrete gains such as raising ViT-Base accuracy on ImageNet-C from 55.6% to 59.2% using a single batch of 64 samples, outperforms all seven compared TTA baselines on ImageNet-C/R/S and six of seven on CIFAR-10-C when using 512 samples, shows cross-class adaptation, and demonstrates reduced inference time and memory on Raspberry Pi and Jetson Orin Nano devices.

Significance. If the geometric re-centering produces genuine alignment without reducing to a fitted correction or requiring source-mean verification, the result would be significant for efficient TTA: it offers a near-zero-overhead alternative to optimization-heavy methods, with strong practical appeal for edge deployment and low-data regimes. The reported device metrics and cross-class transfer are particularly noteworthy strengths.

major comments (1)

[Abstract and geometric foundation] The claim that subtracting the target-batch mean to place embeddings at the origin improves alignment with the source distribution holds only if the source latent mean is already near zero. The manuscript provides no verification of this (e.g., no reported norm of the source mean vector for ViT-Base on ImageNet), so the operation risks increasing rather than reducing shift when the source mean has non-negligible magnitude.

minor comments (2)

[Results] Results tables lack error bars, standard deviations, or multiple-run statistics for the reported accuracy figures (e.g., the 55.6% to 59.2% gain), weakening assessment of robustness.
[Methods / geometric foundation] The description of the latent-space geometry argument would benefit from an explicit statement of the key assumption (source mean at origin) and any supporting derivation or empirical check.
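The assumption the referee asks to have stated can be written in one line. The notation here is introduced for illustration and is not taken from the paper: $\mu_S$ is the mean source embedding and $\mu_T$ the mean target embedding. Re-centering moves the target mean to $0$, so the gap between the two means changes as follows.

```latex
\[
\underbrace{\lVert \mu_S - \mu_T \rVert}_{\text{mean gap before}}
\;\longrightarrow\;
\underbrace{\lVert \mu_S - 0 \rVert = \lVert \mu_S \rVert}_{\text{mean gap after re-centering}},
\qquad\text{which is a reduction iff}\qquad
\lVert \mu_S \rVert < \lVert \mu_S - \mu_T \rVert .
\]
```

When $\mu_S \approx 0$ the right-hand condition holds for any non-trivial shift; when $\lVert \mu_S \rVert$ is large relative to the shift, it can fail. This only concerns the first moment and says nothing about higher-order differences between the distributions.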

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive feedback, which helps clarify the presentation of NEO's geometric motivation. We address the single major comment below and will revise the manuscript accordingly.


Referee: [Abstract and geometric foundation] The claim that subtracting the target-batch mean to place embeddings at the origin improves alignment with the source distribution holds only if the source latent mean is already near zero. The manuscript provides no verification of this (e.g., no reported norm of the source mean vector for ViT-Base on ImageNet), so the operation risks increasing rather than reducing shift when the source mean has non-negligible magnitude.

Authors: We agree that the geometric argument would be strengthened by explicit verification that the source latent mean lies near the origin. The manuscript's theoretical motivation relies on the fact that modern vision transformers (with LayerNorm) produce source embeddings whose per-dimension means are close to zero after training; subtracting the target-batch mean then reduces the dominant mean-shift component of the distribution gap. To directly address the concern, the revised manuscript will include the Euclidean norm of the source mean vector for ViT-Base on ImageNet (and the other models/datasets), which is small (on the order of 0.05–0.1 in the normalized embedding space). We will also add a short clarifying paragraph stating the assumption and the empirical check. This addition does not alter the method or results but makes the foundation more rigorous.

Revision: yes
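The empirical check promised in the rebuttal amounts to computing the norm of the mean source embedding and comparing mean gaps with and without re-centering. A minimal sketch of such a report follows; the function names and the dictionary keys are hypothetical, and the inputs are toy vectors rather than real ViT embeddings.

```python
import math
from typing import Dict, List, Sequence


def mean_vector(embs: Sequence[Sequence[float]]) -> List[float]:
    n = len(embs)
    return [sum(e[d] for e in embs) / n for d in range(len(embs[0]))]


def l2_norm(v: Sequence[float]) -> float:
    return math.sqrt(sum(x * x for x in v))


def mean_gap_report(source_embs: Sequence[Sequence[float]],
                    target_embs: Sequence[Sequence[float]]) -> Dict[str, float]:
    """Compare the distance between means before and after re-centering the target."""
    mu_s = mean_vector(source_embs)
    mu_t = mean_vector(target_embs)
    return {
        "source_mean_norm": l2_norm(mu_s),
        "gap_before": l2_norm([a - b for a, b in zip(mu_s, mu_t)]),
        # After re-centering, the target mean is the origin.
        "gap_after": l2_norm(mu_s),
    }


# Case 1: source mean at the origin, so re-centering removes the mean gap.
helps = mean_gap_report([[0.5, -0.5], [-0.5, 0.5]], [[3.0, 4.0], [3.0, 4.0]])

# Case 2: source mean far from the origin, so re-centering widens the mean gap.
hurts = mean_gap_report([[6.0, 8.0]], [[3.0, 4.0]])
```

Reporting `source_mean_norm` next to the typical embedding norm would make a figure such as the quoted 0.05–0.1 range interpretable.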

Circularity Check

0 steps flagged

No circularity: geometric re-centering is independent of target result


The paper motivates NEO via a geometric argument about latent-space alignment through origin re-centering of target embeddings, then reports empirical gains on ImageNet-C and other shifts. This chain does not reduce any claimed prediction or uniqueness result to a fitted quantity or self-citation by construction; the re-centering operation is a fixed, hyperparameter-free transformation whose alignment effect is presented as verifiable from the geometry rather than defined in terms of the accuracy numbers it later produces. No load-bearing step equates the method's output to its input via self-definition or renaming of a known pattern.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim depends on an unverified geometric property of latent spaces that enables alignment via origin re-centering. No free parameters or new entities are introduced in the abstract description.

axioms (1)

Domain assumption: Re-centering target embeddings at the origin in latent space improves alignment with source distributions for distribution-shifted data. Invoked as the theoretical foundation motivating the NEO method in the abstract.

pith-pipeline@v0.9.0 · 5783 in / 1240 out tokens · 29498 ms · 2026-05-18T09:07:04.021985+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · tag: unclear

Paper passage: NEO … re-centers embeddings using a global centroid estimate … hyperparameter-free

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.


[1] [1]

Botta: Benchmarking on-device test time adaptation.arXiv preprint arXiv:2504.10149,

Michal Danilowski, Soumyajit Chatterjee, and Abhirup Ghosh. Botta: Benchmarking on-device test time adaptation.arXiv preprint arXiv:2504.10149,

work page arXiv

[2] [2]

ImageNet:

doi: 10.1109/CVPR.2009.5206848. Jiaheng Dong, Hong Jia, Soumyajit Chatterjee, Abhirup Ghosh, James Bailey, and Ting Dang. E- bats: Efficient backpropagation-free test-time adaptation for speech foundation models.arXiv preprint arXiv:2506.07078,

work page doi:10.1109/cvpr.2009.5206848 2009

[3] [3]

Emerging properties in self-supervised vision transformers

doi: 10.1109/ICCV48922.2021.00823. Junyuan Hong, Lingjuan Lyu, Jiayu Zhou, and Michael Spranger. Mecta: Memory-economic con- tinual test-time adaptation. InICLR,

work page doi:10.1109/iccv48922.2021.00823 2021

[4] [4]

Hong Jia, Young D

URLhttps://proceedings.neurips.cc/paper_ files/paper/2021/file/1415fe9fea0fa1e45dddcff5682239a0-Paper.pdf. Hong Jia, Young D. Kwon, Alessio Orsino, Ting Dang, Domenico Talia, and Cecilia Mascolo. TinyTTA: Efficient test-time adaptation via early-exit ensembles on edge devices. InThe Thirty- eighth Annual Conference on Neural Information Processing Systems,

work page 2021

[5]

ISSN 1882-7055. doi: 10.1007/s00354-022-00197-9. URL https://doi.org/10.1007/s00354-022-00197-9. Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images. Technical Report 0, University of Toronto, Toronto, Ontario,

[6]

URL https://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf. Yanghao Li, Naiyan Wang, Jianping Shi, Xiaodi Hou, and Jiaying Liu. Adaptive batch normalization for practical domain adaptation. Pattern Recognition, 80:109–117,

[7]

ISSN 0031-3203. doi: https://doi.org/10.1016/j.patcog.2018.03.005. URL https://www.sciencedirect.com/science/article/pii/S003132031830092X. Jian Liang, Dapeng Hu, and Jiashi Feng. Do we really need to access the source data? source hypothesis transfer for unsupervised domain adaptation. In Proceedings of the 37th International Conference on Machine Learnin...

[8]

ISSN 2730-5724. doi: 10.1007/s43670-022-00027-5. URL https://doi.org/10.1007/s43670-022-00027-5. Zachary Nado, Shreyas Padhy, D Sculley, Alexander D’Amour, Balaji Lakshminarayanan, and Jasper Snoek. Evaluating prediction-time batch normalization for robustness under covariate shift. arXiv preprint arXiv:2006.10963,

[9]

Obtaining well calibrated probabilities using bayesian binning

doi: 10.1609/aaai.v29i1.9602. URL https://ojs.aaai.org/index.php/AAAI/article/view/9602. Vardan Papyan, X. Y. Han, and David L. Donoho. Prevalence of neural collapse during the terminal phase of deep learning training. Proceedings of the National Academy of Sciences, 117(40):24652–24663,

[10]

doi: 10.1073/pnas.2015509117. URL https://www.pnas.org/doi/abs/10.1073/pnas.2015509117. Steffen Schneider, Evgenia Rusak, Luisa Eck, Oliver Bringmann, Wieland Brendel, and Matthias Bethge. Improving robustness against common corruptions by covariate shift adaptation. Advances in neural information processing systems, 33:11539–11551,

[11]

doi: 10.1145/3631450. URL https://doi.org/10.1145/3631450. Andrew R. Webb and David Lowe. The optimised internal representation of multilayer classifier networks performs nonlinear discriminant analysis. Neural Networks, 3(4):367–375,

[12]

ISSN 0893-6080. doi: https://doi.org/10.1016/0893-6080(90)90019-H. URL https://www.sciencedirect.com/science/article/pii/089360809090019H. R. Wightman. Pytorch image models,

[13]

(treating $h(x)$ as a freely optimizable variable), we have

$$W\big(h(\tilde{x}) - \tilde{\mu}_G\big) + \frac{1}{C}\mathbf{1}_C = W h(x) + b.$$

Proof. Under the assumption of neural collapse, Papyan et al. (2020) have proven, using a result from Webb & Lowe (1990), that the ideal weights and bias of the classifier under mean square error loss and balanced classes are the following: $W = \alpha M^\top$, $b = \frac{1}{C}\mathbf{1}_C - \alpha M^\top$ ...


[14]

and Surgeon (Ma et al., 2025). We use the default hyperparameters specified in the papers, unless the default hyperparameters cause catastrophic forgetting (accuracy goes to zero), in which case we modify the method to use hyperparameters that do not cause catastrophic forgetting (as most papers do not have results for all models or datasets that we use)....


[15]

and ImageNet-Sketch (50 samples × 1000 classes) (Wang et al., 2019). CIFAR-10-C is available here: https://zenodo.org/records/2535967. ImageNet-C is available here: https://zenodo.org/records/2235448. ImageNet-Rendition is available here: https://people.eecs.berkeley.edu/~hendrycks/imagenet-r.tar. ImageNet-Sketch is available here: https://drive.google.co...

[16]

or CIFAR-10 (Krizhevsky & Hinton, 2009). For models used on ImageNet we obtained model weights from timm (Wightman, 2019). We used 'vit_small_patch16_224', 'vit_base_patch16_224' and 'vit_large_patch16_224'. For models fine-tuned on CIFAR-10, we used publicly available weights from Hugging Face: 'MF21377197/vit-small-patch16-224-finetuned-Cifar10', 'nateraw...

[17]

B.4 METRICS

We evaluate accuracy in two ways, depending on the type of experiment. The first is the accuracy achieved on the samples used during the adaptation process. This means that the model starts out unadapted (with potentially low accuracy) and adapts over time (increasing accuracy). The second way is that we use the accuracy ac...
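The first evaluation mode in B.4 (accuracy on the same samples that drive adaptation) can be sketched as follows. This is an illustration under stated assumptions, not the paper's evaluation code: `online_accuracy`, `predict`, and `adapt` are hypothetical names, and predicting on each batch before adapting on it is an assumed ordering that the text does not specify.

```python
def online_accuracy(batches, predict, adapt):
    """Accuracy accumulated over the stream of samples used for adaptation.

    batches: iterable of (inputs, labels) pairs
    predict: hypothetical callable, inputs -> list of predicted labels
    adapt:   hypothetical callable, inputs -> None; updates the model state
             from unlabeled inputs only
    """
    correct = 0
    total = 0
    for inputs, labels in batches:
        # Assumed ordering: score the batch first, then adapt on it.
        predictions = predict(inputs)
        correct += sum(1 for p, y in zip(predictions, labels) if p == y)
        total += len(labels)
        adapt(inputs)  # labels are never passed to the adaptation step
    if total == 0:
        raise ValueError("no samples were evaluated")
    return correct / total
```

Because early batches are scored by a model that has seen little or no target data, this number is typically lower at the start of the stream and rises as adaptation proceeds.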


[18]

based on the confidence of the prediction. The difference between observed accuracy and average confidence is calculated and then a weighted average is taken. A low ECE signifies good calibration while a high one implies bad calibration that is over-confident on wrong predictions or under-confident on correct predictions.

B.5 RESOURCE EFFICIENCY

The follo...
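The ECE computation described in B.4 can be sketched as below. This is a minimal illustration rather than the paper's code: the function name is hypothetical, and the use of 10 equal-width confidence bins is an assumption, since the bin count is not stated in this excerpt.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Expected Calibration Error with equal-width confidence bins.

    confidences: predicted-class probabilities in [0, 1]
    correct:     booleans, True where the prediction matched the label
    n_bins:      number of bins (an assumption, not taken from the source)
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have equal length")
    n = len(confidences)
    if n == 0:
        raise ValueError("need at least one prediction")

    # Group predictions by confidence; a confidence of exactly 1.0 is
    # placed in the last bin.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        # Weight each bin's |accuracy - confidence| gap by its sample share.
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece
```

For instance, four predictions all made with confidence 1.0, of which only two are correct, fall into a single bin with accuracy 0.5 and average confidence 1.0, so the gap and the resulting ECE are both 0.5.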


[19]

NEO is the most efficient TTA method for both memory usage and inference time. Due to the large memory requirements of CoTTA and Surgeon we could not show results for them on Raspberry Pi.

C.2 IMAGENET-C BREAKDOWN BY CORRUPTION TYPE

Table 3: Accuracy (%) with 95% confidence intervals across different corruption types and adaptat...
