Test-Time Training with Self-Supervision for Generalization under Distribution Shifts

Alexei A. Efros; John Miller; Moritz Hardt; Xiaolong Wang; Yu Sun; Zhuang Liu

arxiv: 1909.13231 · v3 · pith:XF2D2IKAnew · submitted 2019-09-29 · 💻 cs.LG · cs.CV · stat.ML

Yu Sun, Xiaolong Wang, Zhuang Liu, John Miller, Alexei A. Efros, Moritz Hardt

classification 💻 cs.LG cs.CV stat.ML

keywords training approach data distribution shifts test test-time aimed

In this paper, we propose Test-Time Training, a general approach for improving the performance of predictive models when training and test data come from different distributions. We turn a single unlabeled test sample into a self-supervised learning problem, on which we update the model parameters before making a prediction. This also extends naturally to data in an online stream. Our simple approach leads to improvements on diverse image classification benchmarks aimed at evaluating robustness to distribution shifts.

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

The Power of Test-Time Training for Approximate Sampling
cs.DS 2026-06 unverdicted novelty 7.0

Establishes a quadratic lower bound on query complexity for sampling from large classes of distributions given approximate density oracles, answers an open question on optimality of random walks, and shows circumventi...

GITCO: Gated Inference-Time Context Optimization in TSFMs
cs.AI 2026-06 unverdicted novelty 6.0

GITCO delivers +1.95% average MASE reduction on TimesFM 2.5 across 53 datasets by gated inference-time suppression of anomalous patches, capturing 89.9% of the improvement upper bound.

Learning Inference Concurrency in DynamicGate MLP Structural and Mathematical Justification
cs.LG 2026-04 unverdicted novelty 4.0

DynamicGate MLP enables concurrent learning and inference by separating gating from representation parameters, so that even asynchronous updates produce outputs equivalent to a valid fixed model snapshot.