On the Learnability of Test-Time Adaptation: A Recovery Complexity Perspective

Kun-Yang Yu; Lan-Zhe Guo; Ming Yang; Shi-Yu Tian; Yu-Feng Li; Zhi Zhou

arxiv: 2605.28057 · v2 · pith:WYTLD6Z6new · submitted 2026-05-27 · 💻 cs.LG · cs.AI

On the Learnability of Test-Time Adaptation: A Recovery Complexity Perspective

Zhi Zhou , Ming Yang , Shi-Yu Tian , Kun-Yang Yu , Lan-Zhe Guo , Yu-Feng Li This is my paper

Pith reviewed 2026-06-29 13:40 UTC · model grok-4.3

classification 💻 cs.LG cs.AI

keywords test-time adaptationnon-stationary streamsrecovery complexitylearnabilitydistribution shiftsadaptivity-information trade-offtheoretical framework

0 comments

The pith

Test-time adaptation faces fundamental limits measured by recovery complexity under non-stationary streams.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a theoretical framework to analyze whether test-time adaptation can reliably maintain performance on streams where the data distribution keeps changing without labels. It defines recovery complexity as the time after a shift until excess risk stays below a target level with high probability, and extends this to long-term TTA learnability. A discrete surrogate model for the streams unifies the study of gradual and abrupt changes. Order-wise matching lower and upper bounds on recovery complexity are derived, exposing fundamental limits and an adaptivity-information trade-off. This provides unified learnability guarantees that complement prior regret-based analyses.

Core claim

The authors introduce (ε,δ)-Recovery Complexity to quantify the post-shift time required to keep excess risk below a target with high probability, extended to (ε,ρ)-TTA Learnability for long-term reliability. Using a novel discrete surrogate for non-stationary streams, they obtain order-wise matching lower and upper bounds on recovery complexity for gradual and abrupt shifts, which demonstrate fundamental limits of TTA and an intrinsic adaptivity-information trade-off, yielding unified learnability guarantees.

What carries the argument

(ε,δ)-Recovery Complexity, which quantifies the time needed after a distribution shift to maintain low excess risk with high probability, and its extension to TTA learnability.

Load-bearing premise

The discrete surrogate model for non-stationary test streams preserves the essential dynamics of real distribution shifts.

What would settle it

Empirical recovery times in TTA experiments on streams with known shift types that fall outside the order of the derived upper and lower bounds.

Figures

Figures reproduced from arXiv: 2605.28057 by Kun-Yang Yu, Lan-Zhe Guo, Ming Yang, Shi-Yu Tian, Yu-Feng Li, Zhi Zhou.

**Figure 1.** Figure 1: Overview of the theoretical modeling and analysis in this paper. A unified model of non-stationary test streams underpins a local analysis via (ϵ, δ)-Recovery Complexity, which is lifted to a global (ϵ, ρ)-TTA Learnability. stated on the same neighborhood Nr(θ1), consistent with the local proxy-optimal competitor in Definition 2.11. However, even with the local Assumptions 2.1 and 2.2 on Nr(θ1), establishi… view at source ↗

read the original abstract

Test-time adaptation (TTA) aims to adapt models to maintain reliable performance on non-stationary test streams without requiring labeled data. Despite its empirical success, the learnability of TTA under non-stationary streams remains unexplored. A key challenge is the lack of a principled theoretical framework that simultaneously aligns with the TTA objective and captures both continuously evolving distribution shifts and intrinsic information constraints. To address this gap, we propose the first theoretical framework for studying the learnability of TTA and introduce $(\epsilon,\delta)$-Recovery Complexity and $(\epsilon,\rho)$-TTA Learnability. Recovery complexity measures the post-shift time needed to maintain excess risk below a target level with high probability, and is further extended to TTA learnability, which measures the long-term reliability of TTA. Within this framework, we introduce a novel discrete surrogate for non-stationary test streams, enabling a unified and tractable analysis of both gradual and abrupt shifts. We derive order-wise matching lower and upper bounds on recovery complexity, revealing fundamental limits of TTA and an intrinsic adaptivity-information trade-off. These results provide unified learnability guarantees for TTA that complement regret-based analyses.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper sets up recovery complexity and a discrete surrogate to bound TTA learnability under shifts, but the surrogate's realism is the part that needs checking.

read the letter

The paper introduces (ε,δ)-Recovery Complexity to measure how long TTA needs after a shift to keep excess risk low, then extends it to long-term learnability, and derives order-wise matching bounds via a new discrete surrogate for non-stationary streams.

What is new is the framework itself: recovery complexity and TTA learnability as quantities that directly target the no-label reliability goal, plus the surrogate that handles gradual and abrupt changes in one analysis. Prior work was mostly regret-based; this tries to give explicit limits on recovery time and an adaptivity-information trade-off. That is a clear step in a direction the field has lacked.

The work does a solid job of aligning the theory with TTA's actual objective and of producing clean, order-wise tight bounds inside the model. The circularity risk looks low because the quantities are defined from the surrogate rather than reverse-engineered from results.

The soft spot is the surrogate. The matching bounds and the claimed trade-off live entirely inside this discretization. If the surrogate changes the effective rate of distribution change or the information per step relative to real streams, the limits do not transfer. The abstract states the surrogate enables unified analysis but does not detail how it is built or what assumptions it imposes, so it is hard to judge fidelity from the given text. The proofs are also not visible here, which leaves the exact tightness open.

This is for readers who work on theoretical foundations of adaptation and online learning. Someone looking for new algorithms or immediate practical guidance will not find much. A serious referee should see it because the gap is real and the structure is substantive enough to evaluate, even if the modeling choice requires close scrutiny in review.

Referee Report

2 major / 2 minor

Summary. The paper introduces the first theoretical framework for the learnability of test-time adaptation (TTA) under non-stationary streams. It defines (ε,δ)-Recovery Complexity as the post-shift time to keep excess risk below a target with high probability, extends it to (ε,ρ)-TTA Learnability for long-term reliability, proposes a novel discrete surrogate for non-stationary test streams to unify analysis of gradual and abrupt shifts, and derives order-wise matching lower and upper bounds on recovery complexity that reveal an adaptivity-information trade-off and provide unified learnability guarantees complementing regret-based analyses.

Significance. If the matching bounds hold under a surrogate that faithfully captures information constraints and shift rates, the work would be significant as the first principled learnability theory for TTA, identifying fundamental limits and trade-offs not addressed by regret analyses. The order-wise matching bounds, if rigorously derived with complete proofs, represent a clear strength in providing tight characterizations.

major comments (2)

[Abstract and surrogate definition section] Abstract and the section defining the discrete surrogate: the order-wise matching bounds and adaptivity-information trade-off are derived inside this novel surrogate. The abstract claims it enables unified tractable analysis of gradual and abrupt shifts, but the central claim requires that discretization preserves the essential information constraints and shift rates of real non-stationary processes; without a theorem or explicit argument showing that discretization artifacts do not alter recovery-time scaling, the bounds do not necessarily transfer to practical TTA.
[Bounds derivation section] The section deriving the lower and upper bounds: the claim of order-wise matching bounds is load-bearing for the fundamental limits and unified guarantees. The abstract states these are derived, but without the full proofs, explicit assumptions on the surrogate, or the definition of the (ε,δ)-Recovery Complexity in the main text, it is not possible to verify whether the math supports the claims or whether modeling choices affect the result.

minor comments (2)

[Introduction or definitions section] Clarify the relationship between (ε,δ)-Recovery Complexity and (ε,ρ)-TTA Learnability with an explicit statement or diagram early in the paper.
[Preliminaries] Ensure all assumptions on the non-stationary streams (e.g., shift rates, information per step) are listed explicitly before the surrogate is introduced.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below.

read point-by-point responses

Referee: Abstract and the section defining the discrete surrogate: the order-wise matching bounds and adaptivity-information trade-off are derived inside this novel surrogate. The abstract claims it enables unified tractable analysis of gradual and abrupt shifts, but without a theorem or explicit argument showing that discretization artifacts do not alter recovery-time scaling, the bounds do not necessarily transfer to practical TTA.

Authors: The surrogate in Section 3 is constructed to preserve the governing parameters of shift rate and per-step information availability that control recovery scaling. The order-wise bounds are derived directly for this model, which unifies gradual and abrupt cases by design. We will add a proposition in revision that bounds the discretization error on recovery time under standard Lipschitz assumptions on the underlying shift process, confirming that the scaling is unaffected. revision: yes
Referee: The section deriving the lower and upper bounds: the claim of order-wise matching bounds is load-bearing for the fundamental limits and unified guarantees. The abstract states these are derived, but without the full proofs, explicit assumptions on the surrogate, or the definition of the (ε,δ)-Recovery Complexity in the main text, it is not possible to verify whether the math supports the claims or whether modeling choices affect the result.

Authors: Definition 2.1 in the main text introduces (ε,δ)-Recovery Complexity. Section 3.1 lists the surrogate assumptions. Theorems 4.1 and 4.2 state the matching bounds, with complete proofs in the appendix. We will expand the main-text presentation of the definition and key assumptions for clarity, while retaining the appendix proofs. revision: partial

Circularity Check

0 steps flagged

No circularity: bounds derived from newly introduced definitions and surrogate

full rationale

The paper defines new quantities ((ε,δ)-Recovery Complexity and (ε,ρ)-TTA Learnability) and a novel discrete surrogate for non-stationary streams, then derives matching lower/upper bounds inside that framework. No step reduces a claimed prediction to a fitted parameter by construction, no self-citation is load-bearing for the central result, and the surrogate is presented as an enabling modeling choice rather than a hidden tautology. The derivation chain is self-contained and does not collapse to its inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claim rests on the validity of the discrete surrogate as a faithful model of non-stationary streams and on standard probabilistic assumptions implicit in (ε,δ) definitions; no free parameters or invented physical entities are mentioned.

axioms (1)

domain assumption A novel discrete surrogate accurately captures both gradual and abrupt distribution shifts for the purpose of bounding recovery complexity.
Introduced in the abstract to enable unified analysis.

invented entities (1)

(ε,δ)-Recovery Complexity no independent evidence
purpose: Quantify post-shift time to keep excess risk below target with high probability
New definition introduced to measure TTA reliability.

pith-pipeline@v0.9.1-grok · 5748 in / 1265 out tokens · 36809 ms · 2026-06-29T13:40:33.909697+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

4 extracted references · 1 canonical work pages

[1]

and Snoek, C

Xiao, Z. and Snoek, C. G. M. Beyond model adaptation at test time: A survey.CoRR, abs/2411.03687,

work page arXiv
[2]

By the greedy rule, right before the shift at time ti, the current anchor equals ti−1, and the shift is triggered because W1(Pti ,P t(i−1))>∆ W /2.(23) On the other hand, by the triangle inequality along the original path, W1(Pt(i−1) ,P ti)≤ ti−1X t=t(i−1) W1(Pt,P t+1).(24) Combining (23) and (24) yields, for eachi∈[m], ti−1X t=ti−1 W1(Pt,P t+1)>∆ W /2.(2...

1986
[3]

on X ×Y with U∼ν and mi ∈ {0,∆} (the label is constant so the loss reduces to a function ofx alone). The translation coupling ((U, y0),(∆ +U, y 0)) in product metric yields W1(P(0),P (1))≤E U−(∆ +U) = ∆ = 2ρ sat ≤∆ W (33) by the theorem’s shift-budget hypothesis, so the hard instance lies in the assumed stream class. With per-sample supervised loss ℓ(θ;x)...

2012
[4]

Our analysis reveals the proxy-task alignment α of Assump- tion 2.1 as the key quantity governing recovery

on the corruption benchmarks CIFAR-10-C, CIFAR-100- C, and ImageNet-C (Hendrycks & Dietterich, 2019). Our analysis reveals the proxy-task alignment α of Assump- tion 2.1 as the key quantity governing recovery. In the following experiments, we measure its empirical counterpart ˜α=⟨∇ℓ ent,∇ℓ ce⟩/∥∇ℓce∥2 in practice, which is the observed alignment between t...

2019

[1] [1]

and Snoek, C

Xiao, Z. and Snoek, C. G. M. Beyond model adaptation at test time: A survey.CoRR, abs/2411.03687,

work page arXiv

[2] [2]

By the greedy rule, right before the shift at time ti, the current anchor equals ti−1, and the shift is triggered because W1(Pti ,P t(i−1))>∆ W /2.(23) On the other hand, by the triangle inequality along the original path, W1(Pt(i−1) ,P ti)≤ ti−1X t=t(i−1) W1(Pt,P t+1).(24) Combining (23) and (24) yields, for eachi∈[m], ti−1X t=ti−1 W1(Pt,P t+1)>∆ W /2.(2...

1986

[3] [3]

on X ×Y with U∼ν and mi ∈ {0,∆} (the label is constant so the loss reduces to a function ofx alone). The translation coupling ((U, y0),(∆ +U, y 0)) in product metric yields W1(P(0),P (1))≤E U−(∆ +U) = ∆ = 2ρ sat ≤∆ W (33) by the theorem’s shift-budget hypothesis, so the hard instance lies in the assumed stream class. With per-sample supervised loss ℓ(θ;x)...

2012

[4] [4]

Our analysis reveals the proxy-task alignment α of Assump- tion 2.1 as the key quantity governing recovery

on the corruption benchmarks CIFAR-10-C, CIFAR-100- C, and ImageNet-C (Hendrycks & Dietterich, 2019). Our analysis reveals the proxy-task alignment α of Assump- tion 2.1 as the key quantity governing recovery. In the following experiments, we measure its empirical counterpart ˜α=⟨∇ℓ ent,∇ℓ ce⟩/∥∇ℓce∥2 in practice, which is the observed alignment between t...

2019