pith. machine review for the scientific record.

arxiv: 2605.01628 · v1 · submitted 2026-05-02 · 📊 stat.ML · cs.LG · math.ST · stat.TH

Recognition: unknown

Self-Normalized Martingales and Uniform Regret Bounds for Linear Regression

Authors on Pith: no claims yet

Pith reviewed 2026-05-09 13:47 UTC · model grok-4.3

classification 📊 stat.ML · cs.LG · math.ST · stat.TH
keywords bounds · regret · scale-invariant · self-normalized · bounded covariates · doubly-uniform

The pith

Nontrivial scale-invariant self-normalized martingale bounds exist only in d=1 (with O(log T) doubly-uniform regret), are impossible in d>1 without assumptions, and are recovered under smoothness.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Self-normalized martingales appear in confidence ellipsoids for online least squares, which power many bandit and RL algorithms. Existing bounds usually require bounded covariates or explicit regularization and lose scale-invariance even though the underlying quantity is scale-invariant by definition. The paper shows that, without further assumptions, nontrivial scale-invariant upper bounds are possible only when the dimension is one; in that case they obtain O(log T) bounds that hold for arbitrary covariates. In dimension greater than one, no such nontrivial bound can hold in full generality. This dichotomy directly controls whether doubly-uniform regret (simultaneously independent of covariate scale and comparator norm) is achievable in online linear regression. The authors give an explicit algorithm achieving O(log T) doubly-uniform regret in one dimension and prove sublinear doubly-uniform regret is impossible in higher dimensions. Under a smoothness assumption (bounded Radon-Nikodym derivatives of conditional covariate laws), they restore sublinear regret for d>1 without bounded covariates and obtain a self-normalized concentration inequality free of the usual regularization penalties.
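
To fix ideas, here is the standard self-normalized quantity in notation we introduce for this summary (it may differ from the paper's): with martingale increments $\epsilon_t$ and covariates $x_t$, set $S_T = \sum_{t=1}^T \epsilon_t x_t$ and $V_T = \sum_{t=1}^T x_t x_t^\top$; the self-normalized process is $S_T^\top V_T^{-1} S_T$ (on the event that $V_T$ is invertible). Replacing every $x_t$ by $c\,x_t$ multiplies $S_T$ by $c$ and $V_T$ by $c^2$, so this quantity is unchanged, whereas the usual regularized bounds of the form $S_T^\top (V_T + \lambda I)^{-1} S_T \le O(\log\det(I + V_T/\lambda))$ are not invariant because of the fixed regularizer $\lambda$.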

Core claim

Without further assumptions, nontrivial scale-invariant bounds on self-normalized martingales exist only in dimension d=1; in d=1 the paper obtains O(log T) scale-invariant self-normalized bounds without any assumptions on the covariates. For d>1 no nontrivial scale-invariant bound can hold in full generality. This implies O(log T) doubly-uniform regret in d=1 and impossibility of sublinear doubly-uniform regret in d>1, resolving the open question of Gaillard, Gerchinovitz, Huard, and Stoltz (ALT 2019).
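
As a hedged gloss on "doubly-uniform," in notation we introduce here (not necessarily the paper's): for square-loss online linear regression with predictions $\hat y_t$, the regret against a comparator $\theta \in \mathbb{R}^d$ is $R_T(\theta) = \sum_{t=1}^T (y_t - \hat y_t)^2 - \sum_{t=1}^T (y_t - \langle \theta, x_t\rangle)^2$, and a doubly-uniform bound controls $\sup_{\theta \in \mathbb{R}^d} R_T(\theta)$ by a quantity that depends on neither $\sup_t \|x_t\|$ nor $\|\theta\|$. Classical forecasters in this setting either pay a regularization term that grows with $\|\theta\|$ or assume bounded covariates, which is exactly the dependence the d=1 result is claimed to remove.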

Load-bearing premise

The natural smoothness condition (bounded Radon-Nikodym derivatives of the conditional covariate laws with respect to a fixed base measure), used to recover sublinear regret and a regularization-free self-normalized inequality for d>1.
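
One plausible formalization of this condition, stated in our notation and hedged accordingly (the paper's exact statement may differ): there exist a fixed base measure $\mu$ on $\mathbb{R}^d$ and a constant $C < \infty$ such that, for every round $t$, the conditional law of $x_t$ given the past is absolutely continuous with respect to $\mu$ with $d\mathcal{L}(x_t \mid \mathcal{F}_{t-1})/d\mu \le C$ $\mu$-almost everywhere. This permits unbounded covariates but rules out conditional laws that concentrate adversarially on lower-dimensional sets.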

Original abstract

Self-normalized martingale inequalities lie at the heart of confidence ellipsoids for online least squares and, more broadly, many bandit and reinforcement-learning results. Yet existing vector and scalar results typically rely on bounded covariates and an explicit regularization matrix, producing bounds that are \emph{not scale-invariant}: although the self-normalized quantity is scale-invariant by definition, its standard upper bounds are not. We characterize when scale-invariant upper bounds on self-normalized martingales are possible. Without further assumptions, we prove that nontrivial scale-invariant bounds exist only in dimension $d=1$; moreover, in $d=1$ we obtain $O(\log T)$ scale-invariant self-normalized bounds without any assumptions on the covariates. In contrast, for $d>1$ we show that no nontrivial scale-invariant bound can hold in full generality. We then connect this dichotomy to \emph{doubly-uniform} regret in online linear regression (i.e., regret bounds that are simultaneously independent of the covariate scale and the comparator norm) and use it to resolve the open question of Gaillard, Gerchinovitz, Huard, and Stoltz, \emph{``Uniform regret bounds over $\mathbb{R}^d$ for the sequential linear regression problem with the square loss''} (ALT 2019): in $d=1$ we give an explicit algorithm with $O(\log T)$ doubly-uniform regret, whereas for $d>1$ sublinear doubly-uniform regret is impossible. Finally, under a natural \emph{smoothness} condition (bounded Radon--Nikodym derivatives of the conditional covariate laws with respect to a fixed base measure), we recover sublinear regret for $d>1$ without bounded covariates and derive a self-normalized concentration inequality free of the usual regularization penalties, yielding arguably a first natural scale-invariant bound for adaptive, non-i.i.d. vector martingales.
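
A minimal numerical sketch (ours, not from the paper) of the scale-invariance point in the abstract: in d=1 the unregularized self-normalized quantity is unchanged when the covariates are rescaled, while a regularized variant is not. The function name and the choice $\lambda = 1$ are illustrative, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

def self_normalized(x, eps, lam=0.0):
    """S_T^2 / (V_T + lam) in d=1, with S_T = sum(eps*x), V_T = sum(x*x).

    Illustrative notation only; not the paper's construction."""
    S = np.sum(eps * x)
    V = np.sum(x * x)
    return S**2 / (V + lam)

T = 1000
x = rng.normal(size=T)      # arbitrary covariates
eps = rng.normal(size=T)    # conditionally centered noise

for c in [1.0, 1e-3, 1e3]:  # rescale the covariates by c
    unreg = self_normalized(c * x, eps)           # scale-invariant quantity
    reg = self_normalized(c * x, eps, lam=1.0)    # regularized variant
    print(f"c={c:g}  unregularized={unreg:.4f}  regularized={reg:.4f}")

# The unregularized value is identical for every c (S scales by c, V by c^2),
# while the lam=1 version drifts with the covariate scale: that drift is the
# loss of scale-invariance the abstract refers to.
```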

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on standard martingale theory and online-learning definitions. No free parameters are fitted. The smoothness condition is an additional domain assumption required only for the d>1 positive result.

axioms (2)
  • standard math: Standard properties of vector and scalar martingales and self-normalized processes.
    Invoked throughout the development of the concentration inequalities.
  • domain assumption: Bounded Radon-Nikodym derivatives of conditional covariate laws under the smoothness condition.
    This assumption is introduced to obtain the regularization-free bound and sublinear regret for d>1.

pith-pipeline@v0.9.0 · 5667 in / 1601 out tokens · 108910 ms · 2026-05-09T13:47:13.490304+00:00 · methodology
