
arXiv: 2604.27442 · v2 · pith:Q4AMPNU5 · submitted 2026-04-30 · 🧮 math.ST · stat.ML · stat.TH

Bayesian online learning in the one-pass regime: Frequentist validity and uncertainty quantification

Pith reviewed 2026-07-01 08:21 UTC · model grok-4.3

classification 🧮 math.ST · stat.ML · stat.TH
keywords: bayesian online learning · one-pass regime · Bernstein-von Mises theorem · uncertainty quantification · sequential inference · frequentist validity · generalized linear models

The pith

A warm-start Bayesian online algorithm achieves optimal posterior convergence and valid uncertainty quantification in the strict one-pass regime.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a Bayesian online learning algorithm built for the one-pass setting, where each data point is encountered only once and mini-batch sizes stay fixed. It adds a warm-start phase to keep sequential posterior updates stable. The authors prove that this posterior converges at the optimal rate and satisfies an online version of the Bernstein-von Mises theorem, so that credible intervals retain correct frequentist coverage. The argument rests on a new theoretical framework that departs from prior online-learning analyses. Experiments on generalized linear models show the method matches the accuracy of full-batch Bayesian inference while beating other online procedures.

Core claim

For the proposed algorithm that incorporates a warm-start phase, the sequentially updated posterior attains the optimal convergence rate. Building on this, an online analogue of the Bernstein-von Mises theorem holds and guarantees valid uncertainty quantification without any requirement that mini-batch sample sizes diverge.

What carries the argument

The warm-start phase within the sequential posterior update, which stabilizes one-pass Bayesian inference and enables both the optimal convergence rate and the online Bernstein-von Mises result.
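
As a rough illustration of how a warm-start phase and one-pass updates could fit together, the sketch below runs a Laplace-type sequential update on Bayesian logistic regression. The update $\theta_t = \theta_{t-1} - \boldsymbol\Omega_t^{-1} g_t$ with $g_t = \nabla\ell_t(\theta_{t-1})$ and $\boldsymbol\Omega_t = \boldsymbol\Omega_{t-1} + \mathbf{H}_t(\theta_{t-1})$ is inferred from the extracted appendix fragments at the end of this page. Several choices are assumptions made for illustration and are not the authors' implementation: the Gaussian prior $N(\theta_0, \boldsymbol\Omega_0^{-1})$, the Newton solver for the warm-start MAP, the logistic model, and the hypothetical function names `warm_start`, `one_pass_update` and `boo_sketch`.

```python
import numpy as np


def grad_hess(theta, x, y):
    """Gradient and Hessian of the logistic loss l(theta) = log(1 + exp(x'theta)) - y * x'theta."""
    mu = 1.0 / (1.0 + np.exp(-x @ theta))
    return (mu - y) * x, mu * (1.0 - mu) * np.outer(x, x)


def warm_start(X, Y, theta0, Omega0, n_iter=50, tol=1e-10):
    """Hypothetical warm-start: MAP of the first t0 points under a N(theta0, Omega0^{-1}) prior."""
    theta = theta0.copy()
    for _ in range(n_iter):
        # Newton step on the negative log-posterior (prior term + likelihood terms)
        g = Omega0 @ (theta - theta0)
        H = Omega0.copy()
        for x, y in zip(X, Y):
            gi, Hi = grad_hess(theta, x, y)
            g += gi
            H += Hi
        step = np.linalg.solve(H, g)
        theta -= step
        if np.linalg.norm(step) < tol:
            break
    # Precision after warm-start: Omega0 plus Hessians evaluated at theta_{t0}
    Omega = Omega0 + sum(grad_hess(theta, x, y)[1] for x, y in zip(X, Y))
    return theta, Omega


def one_pass_update(theta, Omega, x, y):
    """Omega_t = Omega_{t-1} + H_t(theta_{t-1});  theta_t = theta_{t-1} - Omega_t^{-1} g_t."""
    g, H = grad_hess(theta, x, y)
    Omega = Omega + H
    return theta - np.linalg.solve(Omega, g), Omega


def boo_sketch(X, Y, t0, theta0, Omega0):
    theta, Omega = warm_start(X[:t0], Y[:t0], theta0, Omega0)
    for x, y in zip(X[t0:], Y[t0:]):  # every later observation is visited exactly once
        theta, Omega = one_pass_update(theta, Omega, x, y)
    return theta, Omega  # Gaussian approximation N(theta, Omega^{-1})


rng = np.random.default_rng(0)
n, p = 5000, 3
theta_star = np.array([1.0, -0.5, 0.25])
X = rng.normal(size=(n, p))
Y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ theta_star)))
theta_n, Omega_n = boo_sketch(X, Y, t0=200, theta0=np.zeros(p), Omega0=np.eye(p))
print(theta_n, np.sqrt(np.diag(np.linalg.inv(Omega_n))))
```

The warm-start size $t_0$ is fixed and the mini-batch size is one afterwards, so each observation after index $t_0$ is touched exactly once. The returned pair $(\theta_n, \boldsymbol\Omega_n)$ defines the Gaussian approximation $N(\theta_n, \boldsymbol\Omega_n^{-1})$ from which credible intervals would be read.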

If this is right

  • The sequentially updated posterior converges at the optimal rate in the one-pass regime.
  • Credible intervals from the online posterior provide valid frequentist coverage without requiring growing mini-batch sizes.
  • On generalized linear models the method matches the performance of the full batch Bayesian estimator.
  • The procedure outperforms existing online Bayesian methods in numerical experiments.
  • The analysis relies on a novel theoretical framework distinct from existing online learning literature.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The warm-start construction may extend to streaming settings where data cannot be stored or revisited.
  • The same stabilization technique could be tested on models outside the generalized linear family.
  • If the novel framework generalizes, it may supply frequentist guarantees for other sequential Bayesian procedures.
  • Links between the online Bernstein-von Mises result and classical online learning rates could support hybrid frequentist-Bayesian streaming algorithms.

Load-bearing premise

The algorithm must include a warm-start phase to produce stable sequential updates that deliver optimal convergence and the online Bernstein-von Mises theorem in the strict one-pass regime.

What would settle it

An experiment in which the online posterior credible intervals fail to attain nominal frequentist coverage when mini-batch sizes remain fixed throughout the one-pass process.
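
The falsification experiment described above can be sketched as a small Monte Carlo study. Everything concrete here is an assumption for illustration: the logistic model, the Laplace-type one-pass update read off the appendix fragments, the $N(0, \mathbf{I})$ prior, the sizes $n = 1000$ and $t_0 = 100$, and the hypothetical helper names `fit_one_pass` and `coverage_study`. The warm-start size stays fixed and every later mini-batch has size one. Empirical coverage well below the nominal 0.95 as $n$ grows would be the kind of evidence that counts against the online Bernstein-von Mises claim.

```python
import numpy as np
from statistics import NormalDist


def fit_one_pass(X, Y, t0, n_newton=25):
    """Hypothetical fit: warm-start MAP on X[:t0] under a N(0, I) prior, then one-pass updates."""
    p = X.shape[1]
    theta, Omega0 = np.zeros(p), np.eye(p)
    Xw, Yw = X[:t0], Y[:t0]
    for _ in range(n_newton):  # Newton iterations for the warm-start MAP
        mu = 1.0 / (1.0 + np.exp(-Xw @ theta))
        g = Omega0 @ theta + Xw.T @ (mu - Yw)
        H = Omega0 + Xw.T @ (Xw * (mu * (1.0 - mu))[:, None])
        theta = theta - np.linalg.solve(H, g)
    mu = 1.0 / (1.0 + np.exp(-Xw @ theta))
    Omega = Omega0 + Xw.T @ (Xw * (mu * (1.0 - mu))[:, None])
    for x, y in zip(X[t0:], Y[t0:]):  # mini-batch size fixed at one
        m = 1.0 / (1.0 + np.exp(-x @ theta))
        Omega = Omega + m * (1.0 - m) * np.outer(x, x)
        theta = theta - np.linalg.solve(Omega, (m - y) * x)
    return theta, Omega


def coverage_study(theta_star, n=1000, t0=100, reps=50, alpha=0.05, seed=1):
    """Fraction of replicates whose credible interval covers each coordinate of theta_star."""
    rng = np.random.default_rng(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)  # upper alpha/2 quantile of N(0, 1)
    hits = np.zeros(len(theta_star))
    for _ in range(reps):
        X = rng.normal(size=(n, len(theta_star)))
        Y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ theta_star)))
        theta, Omega = fit_one_pass(X, Y, t0)
        half = z * np.sqrt(np.diag(np.linalg.inv(Omega)))
        hits += np.abs(theta - theta_star) <= half
    return hits / reps


cp = coverage_study(np.array([0.5, -0.5]))
print(cp)
```

With only 50 replicates, the Monte Carlo error of each coverage estimate is roughly three percentage points. A serious check would use many more replicates and several values of $n$.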

Figures

Figures reproduced from arXiv: 2604.27442 by Dongguen Kim, Jeyong Lee, Junhyeok Choi, Minwoo Chae.

Figure 1. $\ell_2$-error across different parameter dimensions $p$.
… where $z_{\alpha/2}$ denotes the upper $\alpha/2$ quantile of $N(0, 1)$, and $\hat\sigma_{j,\mathrm{SGD}}$ and $\hat\sigma_{j,\mathrm{BOO}}$ denote the $j$-th diagonal component of $\mathbf{F}_{n,\mathrm{SGD}}^{-1}\mathbf{V}_{n,\mathrm{SGD}}\mathbf{F}_{n,\mathrm{SGD}}^{-1}$ and $\boldsymbol\Omega_n^{-1}$, respectively. Setting $\alpha = 0.05$, we report the coverage probability (CP) and the length (Len) of intervals in Tables 2 and 3. Finally, we also investigate the sensitivity of the algorithm to the phase t…

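
The excerpt above defines the two metrics reported in Tables 2 and 3. The interval formula itself is cut off in the extraction, so the sketch below assumes the usual form $\hat\theta_j \pm z_{\alpha/2}\,\hat\sigma_j$, taking $\hat\sigma_j$ as the square root of the stated diagonal entry. `coverage_and_length` is a hypothetical helper, not code from the paper.

```python
from statistics import NormalDist, fmean


def coverage_and_length(estimates, variances, truth, alpha=0.05):
    """Empirical coverage probability (CP) and mean length (Len) of z-intervals for one coordinate.

    estimates[r], variances[r]: point estimate and diagonal variance entry in replicate r.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)  # upper alpha/2 quantile of N(0, 1)
    half_widths = [z * v ** 0.5 for v in variances]
    covered = [abs(est - truth) <= h for est, h in zip(estimates, half_widths)]
    return fmean(covered), fmean(2 * h for h in half_widths)


cp, length = coverage_and_length([0.9, 1.1, 1.5, 1.0], [0.04] * 4, truth=1.0)
print(cp, length)
```

Here three of the four toy intervals contain the true value 1.0, so CP is 0.75. Each interval has length $2 \times 1.96 \times 0.2 \approx 0.784$.
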
Figure 2. $\ell_2$-error across independent and correlated design.
… we take $\boldsymbol\Sigma = \mathbf{I}_p$, whereas for the correlated design we set $\boldsymbol\Sigma = \mathbf{A}\,\mathrm{diag}\big((j^2/p^2)_{j=1}^{p}\big)\,\mathbf{A}^\top$, where $\mathbf{A}$ is a randomly generated orthogonal matrix. This correlated design is inspired by Boyer and Godichon-Baggioni (2023) and allows us to assess the robustness of each method with respect to the covariance structure.

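
The correlated design above can be generated as follows. Drawing $\mathbf{A}$ from the QR decomposition of a Gaussian matrix, with column signs fixed so that $\mathbf{A}$ is Haar-distributed, is one standard choice. It is assumed here because the excerpt does not say how $\mathbf{A}$ is drawn. Gaussian rows $x_t \sim N(0, \boldsymbol\Sigma)$ are likewise an assumption, and `correlated_design_cov` is a hypothetical name.

```python
import numpy as np


def correlated_design_cov(p, rng):
    """Sigma = A diag((j^2 / p^2)_{j=1}^p) A^T with A a random orthogonal matrix."""
    # QR of a Gaussian matrix gives an orthogonal Q; fixing column signs makes it Haar-distributed
    Q, R = np.linalg.qr(rng.normal(size=(p, p)))
    A = Q * np.sign(np.diag(R))
    eigvals = (np.arange(1, p + 1) / p) ** 2
    Sigma = A @ np.diag(eigvals) @ A.T
    return 0.5 * (Sigma + Sigma.T)  # symmetrize away rounding error


rng = np.random.default_rng(0)
Sigma = correlated_design_cov(5, rng)
X = rng.multivariate_normal(np.zeros(5), Sigma, size=1000)  # rows x_t ~ N(0, Sigma)
```

Because $\mathbf{A}$ is orthogonal, the eigenvalues of $\boldsymbol\Sigma$ are exactly $j^2/p^2$. The design therefore interpolates between a nearly degenerate direction ($1/p^2$) and unit variance.
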
Figure 3. $\ell_2$-error across varying $M$.

Figure 4. $\ell_2$-error across varying $\|\theta_0 - \theta^\star\|_2$.
… regression model, in contrast, the role of $t_0$ becomes more pronounced: BOO attains MLE-level performance in terms of $\ell_2$-error once $M$ exceeds approximately 0.25.
4.3.2 Sensitivity to initialization
We next investigate the sensitivity to the initialization $\theta_0$, measured by the distance $\|\Delta_0\|_2 = \|\theta_0 - \theta^\star\|_2$. Here, $\theta_0$ denotes the prior location parameter in $\Pi_0$ for BOO methods …
Original abstract

Bayesian online learning provides a coherent framework for sequential inference. However, its theoretical understanding remains limited, particularly in the one-pass setting. Existing theoretical guarantees typically require the mini-batch sample size to diverge, a condition that fails in the one-pass regime. In this paper, we propose a new Bayesian online learning algorithm tailored to the one-pass setting, which incorporates a warm-start phase to ensure stable sequential updates. For this algorithm, we show that the sequentially updated posterior attains the optimal convergence rate. Building on this, we establish an online analogue of the Bernstein-von Mises theorem, which guarantees valid uncertainty quantification without diverging mini-batch sample sizes. Our analysis is based on a novel theoretical framework that differs fundamentally from existing approaches in the online learning literature. Numerical experiments on generalized linear models show that the proposed method matches the performance of the batch estimator while outperforming existing online procedures.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes a new Bayesian online learning algorithm for the strict one-pass regime that incorporates a warm-start phase to stabilize sequential posterior updates. It claims that the sequentially updated posterior attains the optimal convergence rate and establishes an online analogue of the Bernstein-von Mises theorem guaranteeing valid uncertainty quantification without requiring diverging mini-batch sizes. The analysis relies on a novel theoretical framework distinct from existing online learning approaches. Numerical experiments on generalized linear models indicate that the method matches batch estimator performance and outperforms existing online procedures.

Significance. If the central claims hold with the stated conditions, the work would be significant for closing a gap in theoretical guarantees for Bayesian sequential inference under strict one-pass constraints, where prior results typically demand diverging batch sizes. The online BvM result and novel framework could provide a foundation for uncertainty quantification in streaming settings. The experiments offer supporting evidence of practical competitiveness with batch methods.

major comments (3)
  1. [§2] §2 (warm-start phase definition): The manuscript does not specify whether the initial block size in the warm-start phase is a fixed constant independent of total sample size n or is permitted to grow with n. If the latter is required to control remainder terms in the convergence and LAN-type arguments underlying the online BvM result, the construction would fail to satisfy the advertised 'no diverging mini-batch sizes' condition for the one-pass regime.
  2. [§4] §4 (proof of optimal convergence rate): The abstract states that the sequentially updated posterior attains the optimal rate, but the provided text contains no explicit statement of the regularity conditions (e.g., on the prior, likelihood, or LAN expansion) under which this rate is derived, nor any comparison to the minimax rate for the batch case. Without these, it is impossible to verify that the warm-start plus one-pass updates indeed achieve the claimed rate without hidden batch-size growth.
  3. [§5] §5 (online BvM theorem): The online analogue of Bernstein-von Mises is asserted to hold without diverging mini-batches, yet no indication is given of how the posterior after the warm-start satisfies the necessary local asymptotic normality conditions for the subsequent one-pass updates. If the initial posterior must itself be obtained from a growing block to seed the LAN property, the result reduces to a standard BvM on a diverging initial sample.
minor comments (2)
  1. [Abstract] The abstract and introduction would benefit from an explicit statement of the precise one-pass regime (fixed batch size = 1 after warm-start) versus the warm-start block size.
  2. [§1] Notation for the sequential posterior (e.g., π_n vs. π_{n,k}) is introduced without a consolidated table or section summarizing all symbols.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the thorough review and insightful comments, which help clarify key aspects of the presentation. We address each major comment below. The concerns primarily involve missing explicit statements and clarifications rather than fundamental flaws in the results, and we will revise the manuscript to incorporate the requested details while preserving the core claims.

Point-by-point responses
  1. Referee: [§2] §2 (warm-start phase definition): The manuscript does not specify whether the initial block size in the warm-start phase is a fixed constant independent of total sample size n or is permitted to grow with n. If the latter is required to control remainder terms in the convergence and LAN-type arguments underlying the online BvM result, the construction would fail to satisfy the advertised 'no diverging mini-batch sizes' condition for the one-pass regime.

    Authors: The warm-start phase in the algorithm uses a fixed block size that is independent of the total sample size n. The novel theoretical framework in the paper establishes that this fixed initialization is sufficient to control the relevant remainder terms and initiate the LAN property, which is then propagated through the one-pass updates. This maintains the strict one-pass regime after the warm-start without any diverging batch sizes. We will revise §2 to explicitly define the fixed block size and add a reference to the supporting lemma. revision: yes

  2. Referee: [§4] §4 (proof of optimal convergence rate): The abstract states that the sequentially updated posterior attains the optimal rate, but the provided text contains no explicit statement of the regularity conditions (e.g., on the prior, likelihood, or LAN expansion) under which this rate is derived, nor any comparison to the minimax rate for the batch case. Without these, it is impossible to verify that the warm-start plus one-pass updates indeed achieve the claimed rate without hidden batch-size growth.

    Authors: We agree that the regularity conditions and rate comparison should be stated explicitly for clarity. Under standard assumptions ensuring a LAN expansion (smooth likelihood, prior with positive density at the true parameter), the sequentially updated posterior achieves the same n^{-1/2} rate as the batch minimax rate. The proof in §4 shows this holds without hidden growth in batch sizes due to the warm-start stabilization. We will add these conditions and the explicit comparison in a revised §4. revision: yes

  3. Referee: [§5] §5 (online BvM theorem): The online analogue of Bernstein-von Mises is asserted to hold without diverging mini-batches, yet no indication is given of how the posterior after the warm-start satisfies the necessary local asymptotic normality conditions for the subsequent one-pass updates. If the initial posterior must itself be obtained from a growing block to seed the LAN property, the result reduces to a standard BvM on a diverging initial sample.

    Authors: The online BvM result relies on the novel framework, which demonstrates that the fixed-size warm-start posterior, when combined with the information from one-pass updates, satisfies the LAN conditions asymptotically without requiring the initial block to grow. The proof uses recursive propagation of asymptotic normality rather than relying on a large initial sample alone. We will expand §5 with a high-level sketch of this argument to address the concern. revision: partial

Circularity Check

0 steps flagged

No circularity; derivation self-contained in novel framework

Full rationale

The abstract and claims present a new algorithm incorporating a warm-start phase, from which optimal posterior convergence and an online Bernstein-von Mises result are derived via a novel theoretical framework explicitly stated to differ from existing online learning approaches. No equations, definitions, or self-citations are exhibited that reduce the central results (convergence rate or BvM) to fitted inputs, self-definitions, or load-bearing prior author work. The warm-start is described as an algorithmic component enabling the one-pass regime rather than a parameter whose value is fitted and then renamed as a prediction. The paper is therefore self-contained against external benchmarks with no detectable circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no specific free parameters, axioms, or invented entities can be identified or audited.

pith-pipeline@v0.9.1-grok · 5691 in / 1067 out tokens · 46619 ms · 2026-07-01T08:21:48.791143+00:00 · methodology


Reference graph

Works this paper leans on

7 extracted references

  1. [1]

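    $$\dots\ \|\boldsymbol\Omega_0\|^2 + \sum_{s=1}^{t_0} \|\mathbf{H}_s(\theta_{t_0}) - \mathbf{H}_s(\theta^\star)\|^2 + \sum_{s=t_0+1}^{t} \|\mathbf{H}_s(\theta_{s-1}) - \mathbf{H}_s(\theta^\star)\|^2 \Big] \overset{(A3)}{\le} (K_2 t)^{-1} \dots$$

    By the update formula $\theta_t = \theta_{t-1} - \boldsymbol\Omega_t^{-1} g_t$, we have
    $$
    \begin{aligned}
    V_t &= \|\boldsymbol\Omega_t^{1/2}(\theta_t - \theta^\star)\|_2^2 = \|\boldsymbol\Omega_t^{1/2}(\theta_{t-1} - \boldsymbol\Omega_t^{-1} g_t - \theta^\star)\|_2^2 \\
    &= \|\boldsymbol\Omega_t^{1/2}(\theta_{t-1} - \theta^\star)\|_2^2 + \|\boldsymbol\Omega_t^{1/2}\boldsymbol\Omega_t^{-1} g_t\|_2^2 - 2\langle \boldsymbol\Omega_t^{1/2}(\theta_{t-1} - \theta^\star), \boldsymbol\Omega_t^{1/2}\boldsymbol\Omega_t^{-1} g_t \rangle \\
    &= \|\boldsymbol\Omega_t^{1/2}(\theta_{t-1} - \theta^\star)\|_2^2 + \|\boldsymbol\Omega_t^{-1/2} g_t\|_2^2 - 2\langle \theta_{t-1} - \theta^\star, g_t \rangle \\
    &= V_{t-1} + \langle \mathbf{H}_t, \Delta_{t-1}^{\otimes 2} \rangle + \langle g_t, \boldsymbol\Omega_t^{-1} g_t \rangle - 2\langle g_t, \Delta_{t-1} \rangle,
    \end{aligned}
    $$
    where the last equality h…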

  2. [2]

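    There exist some constants $C_1, C_2, C_3, C_4 > 0$ such that for any $t \in \mathbb{N}$ with $t \ge C_1 p$,
    $$\lambda_{\min}\Big(\sum_{s \in I_t(C_2)} x_s x_s^\top\Big) \ge C_3 t, \qquad \lambda_{\max}\Big(\sum_{s=1}^{t} x_s x_s^\top\Big) \le C_4 t.$$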

  3. [3]

    There exist some constants $C_5, C_6 > 0$ such that $\max_{t \in \mathbb{N}} \|x_t\|_2 \cdot r \le C_5$ and $\max_{t \in \mathbb{N}} \|x_t\|_2^2 \le C_6$.

  4. [4]

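    For the phase transition time $t_0$ satisfying $t_0 \ge C_1(p + \mathsf{x})$, there exists an event $\mathcal{E}_0(\mathsf{x})$ with $P_{\theta^\star}(\mathcal{E}_0(\mathsf{x})) > 1 - e^{-\mathsf{x}}$ such that, on $\mathcal{E}_0(\mathsf{x})$, $\hat\theta^{\mathrm{MAP}}_{t_0} \in \Theta_r$.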

  5. [5]

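    Let $\epsilon_t = Y_t - b'(x_t^\top \theta^\star)$ and $\sigma_t^2 = b''(x_t^\top \theta^\star)$. There exist some constants $\nu, \alpha > 0$ such that for any $t \in \mathbb{N}$,
    $$E\big[e^{\lambda(\epsilon_t^2 - \sigma_t^2)} \mid \mathcal{F}_{t-1}\big] \le \exp\Big(\frac{\nu^2 \lambda^2}{2}\Big), \qquad \forall\, |\lambda| \le 1/\alpha.$$
    Then, there exists an event $\mathcal{E}(\mathsf{x})$ with $P_{\theta^\star}(\mathcal{E}(\mathsf{x})) \ge 1 - e^{-\mathsf{x}}$ such that on $\mathcal{E}(\mathsf{x}) \cap \mathcal{E}_0$, for any $t \in \mathbb{N}$ with $t_0 < t$ and $\{\theta_s\}_{s=t_0+1}^{t-1} \subset \Theta_r$,
    $$\sum_{s=t_0+1}^{t} \langle \nabla\ell_s(\theta_{s-1}), \boldsymbol\Omega_s^{-1} \nabla\ell_s(\theta_{s-1}) \rangle \le K p \log(t/t_0) + \dots$$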

  6. [6]

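    For $t \in \mathbb{N}$, $Y_t$ admits a jointly measurable family of conditional densities $(\omega, \theta, y) \mapsto p_{t,\theta}(y \mid \mathcal{F}_{t-1})(\omega)$, $(\omega, \theta, y) \in \Omega \times \Theta \times \mathbb{R}$, such that for every $\theta \in \Theta$ and all measurable $A \subset \mathbb{R}$,
    $$P_\theta(Y_t \in A \mid \mathcal{F}_{t-1}) = \int_A p_{t,\theta}(y \mid \mathcal{F}_{t-1})\, dy \quad \text{a.s.}$$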

  7. [7]

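    For $t \in \mathbb{N}$, the support $S_t = \{y \in \mathbb{R} : p_{t,\theta}(y \mid \mathcal{F}_{t-1}) > 0\}$ is independent of $\theta \in \Theta$ almost surely. Let $\ell_t(\theta) = -\log p_{t,\theta}(Y_t \mid \mathcal{F}_{t-1})$. Then, for any fixed true parameter $\theta^\star$ and $\mathsf{x} > 0$, we have
    $$P_{\theta^\star}\Big(\sup_{t \ge 1} \sum_{s=1}^{t} \big[\ell_s(\theta^\star) - \ell_s(\tilde\theta_{s-1})\big] \le \mathsf{x}\Big) \ge 1 - e^{-\mathsf{x}}.$$
    Proof. Let $M_0 = 1$. For $t \in \mathbb{N}$, define the likelihood ratio process $(M_t)_{t \in \mathbb{N}}$ as follows:
    $$M_t = \prod_{s=1}^{t} \frac{p_{s,\tilde\theta_{s-1}}(Y_s \mid \mathcal{F}_{s-1})}{p_{s,\theta^\star}(Y_s \ldots}$$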
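
The algebra in entry [1] above can be checked numerically. The sketch assumes what the last equality there requires: $\boldsymbol\Omega_t = \boldsymbol\Omega_{t-1} + \mathbf{H}_t$, $V_{t-1} = \|\boldsymbol\Omega_{t-1}^{1/2}\Delta_{t-1}\|_2^2$ and $\Delta_{t-1} = \theta_{t-1} - \theta^\star$. The matrix $\mathbf{H}_t$ and the vector $g_t$ are random placeholders, not an actual Hessian and gradient.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 4
theta_star = rng.normal(size=p)
theta_prev = rng.normal(size=p)
B = rng.normal(size=(p, p))
Omega_prev = B @ B.T + np.eye(p)  # positive definite Omega_{t-1}
C = rng.normal(size=(p, p))
H = C @ C.T                       # positive semidefinite placeholder for H_t
g = rng.normal(size=p)            # placeholder for g_t

Omega = Omega_prev + H                          # Omega_t = Omega_{t-1} + H_t
theta = theta_prev - np.linalg.solve(Omega, g)  # theta_t = theta_{t-1} - Omega_t^{-1} g_t


def V(Om, th):
    """||Om^{1/2} (th - theta_star)||_2^2 written as a quadratic form."""
    d = th - theta_star
    return d @ Om @ d


delta_prev = theta_prev - theta_star
lhs = V(Omega, theta)
rhs = (V(Omega_prev, theta_prev)
       + delta_prev @ H @ delta_prev    # <H_t, Delta_{t-1}^{(x)2}>
       + g @ np.linalg.solve(Omega, g)  # <g_t, Omega_t^{-1} g_t>
       - 2 * g @ delta_prev)            # -2 <g_t, Delta_{t-1}>
print(np.isclose(lhs, rhs))
```

The identity holds for any positive definite $\boldsymbol\Omega_{t-1}$, any positive semidefinite $\mathbf{H}_t$ and any $g_t$. The probabilistic work in the proof is therefore in bounding the last three terms, not in this expansion.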