Test-Time Training with Self-Supervision for Generalization under Distribution Shifts
read the original abstract
In this paper, we propose Test-Time Training, a general approach for improving the performance of predictive models when training and test data come from different distributions. We turn a single unlabeled test sample into a self-supervised learning problem, on which we update the model parameters before making a prediction. This also extends naturally to data in an online stream. Our simple approach leads to improvements on diverse image classification benchmarks aimed at evaluating robustness to distribution shifts.
This paper has not been read by Pith yet.
Forward citations
Cited by 3 Pith papers
-
The Power of Test-Time Training for Approximate Sampling
Establishes a quadratic lower bound on query complexity for sampling from large classes of distributions given approximate density oracles, answers an open question on optimality of random walks, and shows circumventi...
-
GITCO: Gated Inference-Time Context Optimization in TSFMs
GITCO delivers +1.95% average MASE reduction on TimesFM 2.5 across 53 datasets by gated inference-time suppression of anomalous patches, capturing 89.9% of the improvement upper bound.
-
Learning Inference Concurrency in DynamicGate MLP Structural and Mathematical Justification
DynamicGate MLP enables concurrent learning and inference by separating gating from representation parameters, so that even asynchronous updates produce outputs equivalent to a valid fixed model snapshot.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.