Batch learning equals online learning in Bayesian supervised learning

Hông Vân Lê

arxiv: 2510.16892 · v5 · pith:MIC25EWJnew · submitted 2025-10-19 · 🧮 math.ST · stat.TH


Pith reviewed 2026-05-22 11:45 UTC · model grok-4.3


keywords Bayesian supervised learning · Bayesian inversion · probabilistic morphisms · posterior predictive distribution · dependent Dirichlet process · Kalman filter · Souslin space · projective system


The pith

In Bayesian supervised learning with conditionally independent data, sequential and batch Bayesian inversions produce the same result.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that for universal models of the form (P(Y)^X, mu, Id, P(Y)^X) with Y a Souslin space, the Bayesian posterior obtained by updating one data point at a time equals the posterior obtained from the entire batch at once. The proof relies on the functoriality of probabilistic morphisms to equate the two inversion procedures even when the observations are not identically distributed. This yields a recursive update rule for posterior predictive distributions that recovers the Kalman filter as a special case in Gaussian process regression and extends to priors such as dependent Dirichlet processes.
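For orientation, the coincidence is easiest to see in the classical dominated case, which the paper explicitly does not require. The sketch below assumes that each sampling distribution has a density p(y | x, f), writes the prior informally as μ(df), and abbreviates the first n observations as D_n = ((x_1, y_1), ..., (x_n, y_n)). The notation and the density assumption are illustrative additions, not taken from the paper.

```latex
% Illustrative only: assumes densities p(y | x, f) exist (dominated case).
\begin{align*}
\mu(\mathrm{d}f \mid D_n)
  &\propto \Big[\prod_{i=1}^{n} p(y_i \mid x_i, f)\Big]\,\mu(\mathrm{d}f)
  && \text{(batch)} \\
  &= p(y_n \mid x_n, f)\,\Big[\prod_{i=1}^{n-1} p(y_i \mid x_i, f)\Big]\,\mu(\mathrm{d}f) \\
  &\propto p(y_n \mid x_n, f)\,\mu(\mathrm{d}f \mid D_{n-1})
  && \text{(one sequential step)}
\end{align*}
```

The product in the first line is where conditional independence enters. According to the abstract, the paper obtains the same coincidence without assuming that such densities exist.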

Core claim

Using functoriality of probabilistic morphisms, sequential and batch Bayesian inversions coincide in supervised learning models with conditionally independent data on universal models (P(Y)^X, mu, Id, P(Y)^X) for arbitrary input X and Souslin label space Y, without domination or discreteness assumptions on the sampling operators.

What carries the argument

Functoriality of probabilistic morphisms, which equates the composition of sequential Bayesian inversions with the single batch inversion on the same data.

If this is right

Posterior predictive distributions admit a recursive update that avoids reprocessing all previous data.
The same equivalence applies to models equipped with dependent Dirichlet process priors constructed via copulas.
Probability measures on the space of functions P(Y)^X are characterized by projective systems, generalizing earlier results for Souslin spaces.
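The recursive update in the Gaussian case can be illustrated numerically. The following is a minimal sketch, assuming a zero-mean Gaussian process prior with an RBF kernel, Gaussian observation noise, and a finite grid of inputs that contains both training and test points. The rank-one step in `sequential_posterior` is the standard Kalman-style measurement update, not a transcription of the paper's formula; all function names and parameter values are hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    # Squared-exponential kernel on 1-D inputs (an illustrative choice).
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def batch_posterior(K, train_idx, y, noise_var):
    # Condition on all observations at once.
    K_tt = K[np.ix_(train_idx, train_idx)] + noise_var * np.eye(len(train_idx))
    K_at = K[:, train_idx]
    mean = K_at @ np.linalg.solve(K_tt, y)
    cov = K - K_at @ np.linalg.solve(K_tt, K_at.T)
    return mean, cov

def sequential_posterior(K, train_idx, y, noise_var):
    # Condition on one observation at a time (Kalman-style rank-one update).
    mean = np.zeros(K.shape[0])
    cov = K.copy()
    for i, y_i in zip(train_idx, y):
        gain = cov[:, i] / (cov[i, i] + noise_var)
        mean = mean + gain * (y_i - mean[i])
        cov = cov - np.outer(gain, cov[i, :])
    return mean, cov

rng = np.random.default_rng(0)
x_all = np.linspace(-3.0, 3.0, 12)      # grid holding train and test inputs
K = rbf_kernel(x_all, x_all)            # prior covariance on the grid
train_idx = [1, 4, 7, 10]               # grid positions that are observed
noise_var = 0.01
y = np.sin(x_all[train_idx]) + 0.1 * rng.standard_normal(len(train_idx))

mean_b, cov_b = batch_posterior(K, train_idx, y, noise_var)
mean_s, cov_s = sequential_posterior(K, train_idx, y, noise_var)

# Largest disagreement between the two routes; should be at rounding level.
max_mean_gap = float(np.max(np.abs(mean_b - mean_s)))
max_cov_gap = float(np.max(np.abs(cov_b - cov_s)))
```

The sequential route never refactors the full training covariance matrix. Each new observation costs one rank-one update of the current mean and covariance, which is the practical content of "avoids reprocessing all previous data" in this special case.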

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Practical streaming algorithms for Bayesian supervised learning can match the accuracy of full-batch recomputation under conditional independence.
The recursive formula may simplify online inference in non-Gaussian settings beyond the Kalman filter case.
Similar functorial arguments could be tested in other Bayesian models that lack the universal form.

Load-bearing premise

The observations remain conditionally independent given the random function drawn from the prior on the space of functions.

What would settle it

A concrete model with Souslin Y, arbitrary X, and conditionally independent data in which the sequential posterior differs numerically from the batch posterior.
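A check of this kind can be sketched on a toy model. The example below is a hypothetical finite setting: three candidate functions from X = {0, 1, 2} to Bernoulli distributions on Y = {0, 1}, a prior on those three candidates, and data whose inputs vary, so the labels are conditionally independent but not identically distributed. Exact rational arithmetic is used so that any disagreement would be genuine rather than rounding error. This is a discrete, dominated case, so agreement here does not test the paper's general claim; it only shows the shape of the comparison.

```python
from fractions import Fraction

# Hypothetical hypotheses: each maps an input x to P(y = 1 | x).
hypotheses = [
    {0: Fraction(1, 10), 1: Fraction(1, 2), 2: Fraction(9, 10)},
    {0: Fraction(1, 2), 1: Fraction(1, 2), 2: Fraction(1, 2)},
    {0: Fraction(4, 5), 1: Fraction(3, 10), 2: Fraction(1, 5)},
]
prior = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
data = [(0, 0), (2, 1), (1, 1), (2, 1), (0, 1)]  # (x, y) pairs

def likelihood(h, x, y):
    # Probability of label y at input x under hypothesis h.
    p1 = h[x]
    return p1 if y == 1 else 1 - p1

def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]

def batch_posterior(prior, data):
    # Multiply the prior by the full product likelihood, then normalize once.
    weights = list(prior)
    for k, h in enumerate(hypotheses):
        for x, y in data:
            weights[k] *= likelihood(h, x, y)
    return normalize(weights)

def sequential_posterior(prior, data):
    # Update and renormalize after every single observation.
    post = list(prior)
    for x, y in data:
        post = normalize(
            [w * likelihood(h, x, y) for w, h in zip(post, hypotheses)]
        )
    return post

def predictive(post, x):
    # Posterior predictive probability of y = 1 at a new input x.
    return sum(w * h[x] for w, h in zip(post, hypotheses))

post_batch = batch_posterior(prior, data)
post_seq = sequential_posterior(prior, data)
agree = post_batch == post_seq
```

A counterexample to the paper's claim would be a model satisfying its hypotheses for which the analogue of `agree` is false.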

Original abstract

In this paper we study Bayesian supervised learning models proposed by Lê in \cite{Le2025}. We show the existence of Bayesian inversions on universal Bayesian supervised learning models $(\mathcal{P}(\mathcal{Y})^{\mathcal{X}}, \mu, \mathrm{Id}_{\mathcal{P}(\mathcal{Y})^{\mathcal{X}}}, \mathcal{P}(\mathcal{Y})^{\mathcal{X}})$ for arbitrary input space $\mathcal{X}$, Souslin label space $\mathcal{Y}$, and prior probability measure $\mu \in \mathcal{P}( \mathcal{P}(\mathcal{Y})^{\mathcal{X}})$. Using functoriality of probabilistic morphisms, we prove that sequential and batch Bayesian inversions coincide in supervised learning models with conditionally independent (possibly non-i.i.d.) data \cite{Le2025}. This equivalence holds without domination or discreteness assumptions on sampling operators. We derive a recursive formula for posterior predictive distributions, which reduces to the Kalman filter in Gaussian process regression. For Souslin label spaces $\mathcal{Y}$ and arbitrary input sets $\mathcal{X}$, we characterize probability measures on $\mathcal{P}(\mathcal{Y})^{\mathcal{X}}$ via projective systems, generalizing Orbanz \cite{Orbanz2011}. We revisit MacEachern's Dependent Dirichlet Processes (DDP) \cite{MacEachern2000} using copula-based constructions \cite{BJQ2012} and show how to compute posterior predictive distributions in universal Bayesian supervised models with DDP priors.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Batch and online Bayesian updates coincide for conditionally independent data in general supervised models, via functoriality and projective systems.

The main point is that batch Bayesian inversion equals the sequential version in supervised learning models with conditionally independent data, even for arbitrary input spaces X and Souslin output spaces Y, without domination or discreteness assumptions on the sampling operators. The paper proves this using functoriality of probabilistic morphisms on the universal model (P(Y)^X, mu, Id, P(Y)^X) and derives a recursive formula for the posterior predictive that recovers the Kalman filter in the Gaussian case. It also characterizes measures on P(Y)^X via projective systems, extending Orbanz, and revisits DDP priors through copula constructions to compute predictives.

These pieces fit together cleanly and handle non-i.i.d. conditional independence without reducing to special cases. The equivalence unifies two perspectives that often require separate analysis, and the special-case check is a useful sanity test. The argument draws on prior results for functoriality and projective limits rather than building everything from scratch, so the advance is mainly in the application to this equivalence and the general setting. The DDP part functions more as an illustration than a standalone contribution. No load-bearing gaps appear in the outline, but the full derivations would need verification for any subtle handling of the projective limits or morphism composition.

This paper targets researchers in Bayesian nonparametrics and measure-theoretic learning theory who care about sequential versus batch inference on general spaces. A reader working on theoretical foundations or computational shortcuts for online updating would find direct value. It is structured and grounded enough to deserve a serious referee.

Referee Report

2 major / 4 minor

Summary. The manuscript studies Bayesian supervised learning models and proves that sequential and batch Bayesian inversions coincide for models with conditionally independent data. It establishes existence of Bayesian inversions in universal models for arbitrary input spaces and Souslin label spaces using projective systems. Functoriality of probabilistic morphisms is used to show the equivalence without domination or discreteness assumptions. A recursive formula for posterior predictive distributions is derived, which specializes to the Kalman filter in Gaussian process regression. The paper also generalizes the characterization of probability measures on function spaces via projective systems and revisits Dependent Dirichlet Processes using copula constructions.

Significance. This work is significant in that it provides a general, assumption-minimal proof of the equivalence between batch and online Bayesian learning in supervised settings. By leveraging categorical and measure-theoretic tools, it offers a unified framework that recovers classical results like the Kalman filter as special cases. The extensions to projective systems and DDP priors contribute to the foundations of Bayesian nonparametrics, potentially enabling more efficient recursive computations in complex models.

major comments (2)

[§3] (equivalence via functoriality): the argument that sequential and batch inversions coincide relies on the batch measure factoring under the probabilistic morphism functor for conditionally independent data; an explicit verification that the projective limit commutes with the inversion in this case is needed to confirm the reduction, as this step is central to the main claim.
[§4] (projective systems characterization): the generalization of Orbanz's result to arbitrary X and universal models (P(Y)^X, mu, Id, P(Y)^X) should confirm that the consistency conditions of the projective system hold for general priors mu without additional regularity on the sampling operators.

minor comments (4)

[Introduction] Clarify the components of the universal model tuple (P(Y)^X, mu, Id, P(Y)^X) at first use, especially the role of the identity morphism.
[References] Expand citations for Le2025, Orbanz2011, MacEachern2000, and BJQ2012 with full bibliographic information.
[Recursive formula section] Number the recursive posterior predictive formula as an equation and explicitly derive its reduction to the Kalman filter in the Gaussian process case.
[DDP section] In the DDP application, contrast the copula-based construction more clearly with prior approaches to highlight computational advantages for posterior predictives.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading of the manuscript and for the positive recommendation of minor revision. We address each major comment below and indicate the changes to be made in the revised version.

Referee: [§3] (equivalence via functoriality): the argument that sequential and batch inversions coincide relies on the batch measure factoring under the probabilistic morphism functor for conditionally independent data; an explicit verification that the projective limit commutes with the inversion in this case is needed to confirm the reduction, as this step is central to the main claim.

Authors: We thank the referee for this observation. The equivalence in Theorem 3.1 is obtained by applying the probabilistic morphism functor to the product measure induced by conditional independence, which factors the batch posterior. The commutation with the projective limit is a consequence of the functor preserving limits in the category of probability measures on Souslin spaces. To make this step fully explicit, we will add a short lemma in Section 3 that directly verifies the commutation for conditionally independent data, thereby confirming the reduction under the stated assumptions. revision: yes

Referee: [§4] (projective systems characterization): the generalization of Orbanz's result to arbitrary X and universal models (P(Y)^X, mu, Id, P(Y)^X) should confirm that the consistency conditions of the projective system hold for general priors mu without additional regularity on the sampling operators.

Authors: We appreciate the request for explicit confirmation. In the proof of the generalized characterization (Proposition 4.2), the consistency conditions of the projective system follow directly from the identity morphisms in the universal model and the Souslin property of Y, which guarantees the existence of the projective limit for any prior μ in P(P(Y)^X). No further regularity on the sampling operators is imposed beyond measurability, which is part of the model definition. We will insert a clarifying remark after Proposition 4.2 to state this explicitly for general priors. revision: yes

Circularity Check

0 steps flagged

Minor self-citation to prior model definition; derivation otherwise independent

The paper defines universal Bayesian supervised learning models as (P(Y)^X, mu, Id, P(Y)^X) and invokes functoriality of probabilistic morphisms together with projective-limit characterizations from the cited prior work Le2025 to establish existence of inversions and to equate sequential versus batch procedures under conditional independence. No equation or construction inside the present manuscript reduces the claimed equivalence to a fitted parameter, a self-referential definition, or a renaming of an input quantity. The recursive posterior-predictive formula is obtained directly from the same functorial application and is shown to recover the Kalman filter as a special case, supplying an external consistency check. The single self-citation is therefore minor and non-load-bearing for the central equivalence result.

Axiom & Free-Parameter Ledger

0 free parameters · 3 axioms · 0 invented entities

The paper relies on standard measure-theoretic background rather than introducing new fitted parameters or postulated entities.

axioms (3)

domain assumption: Y is a Souslin space. Invoked to guarantee the projective-system characterization of measures on P(Y)^X.
domain assumption: Data are conditionally independent given the random function. Required for the functoriality argument that equates sequential and batch inversions.
standard math: Functoriality of probabilistic morphisms. Used to transfer the inversion operation across the batch and sequential settings.

pith-pipeline@v0.9.0 · 5788 in / 1377 out tokens · 59382 ms · 2026-05-22T11:45:49.444029+00:00 · methodology
