Batch learning equals online learning in Bayesian supervised learning
Pith reviewed 2026-05-22 11:45 UTC · model grok-4.3
The pith
In Bayesian supervised learning with conditionally independent data, sequential and batch Bayesian inversions produce the same result.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Using functoriality of probabilistic morphisms, sequential and batch Bayesian inversions coincide in supervised learning models with conditionally independent data on universal models (P(Y)^X, mu, Id, P(Y)^X) for arbitrary input X and Souslin label space Y, without domination or discreteness assumptions on the sampling operators.
What carries the argument
Functoriality of probabilistic morphisms, which equates the composition of sequential Bayesian inversions with the single batch inversion on the same data.
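The coincidence of the two procedures can be checked by hand in a very small setting. The sketch below is an illustration only, not the paper's construction: it assumes a hypothetical finite input set X = {0, 1}, binary labels, and a nine-element grid of hypotheses, so that Bayes' rule reduces to reweighting and normalizing. All names (`GRID`, `HYPOTHESES`, `batch_posterior`, `sequential_posterior`) are made up for this example.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy setup: inputs X = {0, 1}, labels Y = {0, 1}.
# A hypothesis h = (P(Y=1 | x=0), P(Y=1 | x=1)); exact rationals avoid
# floating-point noise when the two posteriors are compared.
GRID = [Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)]
HYPOTHESES = list(product(GRID, repeat=2))
PRIOR = {h: Fraction(1, len(HYPOTHESES)) for h in HYPOTHESES}


def likelihood(h, x, y):
    """Probability of label y at input x under hypothesis h."""
    p_one = h[x]
    return p_one if y == 1 else 1 - p_one


def normalize(weights):
    """Rescale non-negative weights so that they sum to one."""
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}


def batch_posterior(prior, data):
    """Condition on all (x, y) pairs at once via the product likelihood."""
    weights = {}
    for h, p in prior.items():
        for x, y in data:
            p *= likelihood(h, x, y)
        weights[h] = p
    return normalize(weights)


def sequential_posterior(prior, data):
    """Condition on one pair at a time, reusing each posterior as the next prior."""
    posterior = dict(prior)
    for x, y in data:
        posterior = normalize(
            {h: w * likelihood(h, x, y) for h, w in posterior.items()}
        )
    return posterior


# Non-i.i.d. data: the inputs differ, so the labels follow different laws,
# but they are conditionally independent given the hypothesis.
DATA = [(0, 1), (1, 0), (0, 1), (1, 1)]
assert batch_posterior(PRIOR, DATA) == sequential_posterior(PRIOR, DATA)
```

In this finite case the agreement is just associativity of multiplication followed by normalization; the paper's claim is that the same conclusion holds for arbitrary X and Souslin Y, where no such density-style reweighting is assumed.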
If this is right
- Posterior predictive distributions admit a recursive update that avoids reprocessing all previous data.
- The same equivalence applies to models equipped with dependent Dirichlet process priors constructed via copulas.
- Probability measures on the space of functions P(Y)^X are characterized by projective systems, generalizing earlier results for Souslin spaces.
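As a rough illustration of the first consequence, a recursive update in a Gaussian setting can be compared against full recomputation. The sketch below assumes a hypothetical scalar model y = w * x + noise with a Gaussian prior on w and known noise variance; it is a stand-in for the Gaussian process case, not the paper's formula. `NOISE_VAR`, `batch_update`, `recursive_update`, and `predictive` are names invented for this example.

```python
import math

NOISE_VAR = 0.25  # assumed known observation noise variance


def batch_update(m0, v0, data):
    """Posterior mean and variance of w from all (x, y) pairs at once."""
    precision = 1.0 / v0 + sum(x * x for x, _ in data) / NOISE_VAR
    var = 1.0 / precision
    mean = var * (m0 / v0 + sum(x * y for x, y in data) / NOISE_VAR)
    return mean, var


def recursive_update(mean, var, x, y):
    """One Kalman-style step: fold a single (x, y) pair into the current posterior."""
    gain = var * x / (x * x * var + NOISE_VAR)
    new_mean = mean + gain * (y - x * mean)
    new_var = (1.0 - gain * x) * var
    return new_mean, new_var


def predictive(mean, var, x_new):
    """Mean and variance of the posterior predictive label at input x_new."""
    return x_new * mean, x_new * x_new * var + NOISE_VAR


DATA = [(0.5, 0.9), (1.0, 2.1), (-1.5, -2.8), (2.0, 4.2)]

# Streaming pass: each step touches only the newest observation.
mean, var = 0.0, 1.0
for x, y in DATA:
    mean, var = recursive_update(mean, var, x, y)

# Full recomputation from the same prior.
batch_mean, batch_var = batch_update(0.0, 1.0, DATA)

assert math.isclose(mean, batch_mean, rel_tol=1e-9)
assert math.isclose(var, batch_var, rel_tol=1e-9)
```

The recursive step stores only the current mean and variance, which is the practical point of a recursive posterior predictive: earlier data never needs to be revisited.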
Where Pith is reading between the lines
- Practical streaming algorithms for Bayesian supervised learning can match the accuracy of full-batch recomputation under conditional independence.
- The recursive formula may simplify online inference in non-Gaussian settings beyond the Kalman filter case.
- Similar functorial arguments could be tested in other Bayesian models that lack the universal form.
Load-bearing premise
The observations remain conditionally independent given the random function drawn from the prior on the space of functions.
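To see why this premise matters, it may help to write the update in the familiar dominated case. The derivation below is a sketch under an extra assumption that the paper explicitly does not need: each h(x) is taken to have a density p_h(· | x) with respect to a fixed reference measure on Y. The notation S_n = ((x_1, y_1), ..., (x_n, y_n)) for the first n observations is introduced here for illustration.

```latex
% Assumption (not required by the paper): every h(x) has a density
% p_h(. | x) with respect to a fixed reference measure on Y.
\begin{align*}
\mu(dh \mid S_n)
  &\propto \Bigl[\prod_{i=1}^{n} p_h(y_i \mid x_i)\Bigr]\,\mu(dh) \\
  &= p_h(y_n \mid x_n)\,\Bigl[\prod_{i=1}^{n-1} p_h(y_i \mid x_i)\Bigr]\,\mu(dh) \\
  &\propto p_h(y_n \mid x_n)\,\mu(dh \mid S_{n-1}).
\end{align*}
```

The first line uses conditional independence to write the joint likelihood as a product; the last line reads the remaining product as the unnormalized posterior after n - 1 observations. Without the product form in the first line, the final step is not available in this shape.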
What would settle it
A concrete model with Souslin Y, arbitrary X, and conditionally independent data in which the sequential posterior differs numerically from the batch posterior.
Original abstract
In this paper we study Bayesian supervised learning models proposed by L\^e in \cite{Le2025}. We show the existence of Bayesian inversions on universal Bayesian supervised learning models $(\mathcal{P}(\mathcal{Y})^{\mathcal{X}}, \mu, \mathrm{Id}_{\mathcal{P}(\mathcal{Y})^{\mathcal{X}}}, \mathcal{P}(\mathcal{Y})^{\mathcal{X}})$ for arbitrary input space $\mathcal{X}$, Souslin label space $\mathcal{Y}$, and prior probability measure $\mu \in \mathcal{P}( \mathcal{P}(\mathcal{Y})^{\mathcal{X}})$. Using functoriality of probabilistic morphisms, we prove that sequential and batch Bayesian inversions coincide in supervised learning models with conditionally independent (possibly non-i.i.d.) data \cite{Le2025}. This equivalence holds without domination or discreteness assumptions on sampling operators. We derive a recursive formula for posterior predictive distributions, which reduces to the Kalman filter in Gaussian process regression. For Souslin label spaces $\mathcal{Y}$ and arbitrary input sets $\mathcal{X}$, we characterize probability measures on $\mathcal{P}(\mathcal{Y})^{\mathcal{X}}$ via projective systems, generalizing Orbanz \cite{Orbanz2011}. We revisit MacEachern's Dependent Dirichlet Processes (DDP) \cite{MacEachern2000} using copula-based constructions \cite{BJQ2012} and show how to compute posterior predictive distributions in universal Bayesian supervised models with DDP priors.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript studies Bayesian supervised learning models and proves that sequential and batch Bayesian inversions coincide for models with conditionally independent data. It establishes existence of Bayesian inversions in universal models for arbitrary input spaces and Souslin label spaces using projective systems. Functoriality of probabilistic morphisms is used to show the equivalence without domination or discreteness assumptions. A recursive formula for posterior predictive distributions is derived, which specializes to the Kalman filter in Gaussian process regression. The paper also generalizes the characterization of probability measures on function spaces via projective systems and revisits Dependent Dirichlet Processes using copula constructions.
Significance. This work is significant in that it provides a general, assumption-minimal proof of the equivalence between batch and online Bayesian learning in supervised settings. By leveraging categorical and measure-theoretic tools, it offers a unified framework that recovers classical results like the Kalman filter as special cases. The extensions to projective systems and DDP priors contribute to the foundations of Bayesian nonparametrics, potentially enabling more efficient recursive computations in complex models.
major comments (2)
- [§3] §3 (equivalence via functoriality): the argument that sequential and batch inversions coincide relies on the batch measure factoring under the probabilistic morphism functor for conditionally independent data; an explicit verification that the projective limit commutes with the inversion in this case is needed to confirm the reduction, as this step is central to the main claim.
- [§4] §4 (projective systems characterization): the generalization of Orbanz's result to arbitrary X and universal models (P(Y)^X, mu, Id, P(Y)^X) should confirm that the consistency conditions of the projective system hold for general priors mu without additional regularity on the sampling operators.
minor comments (4)
- [Introduction] Clarify the components of the universal model tuple (P(Y)^X, mu, Id, P(Y)^X) at first use, especially the role of the identity morphism.
- [References] Expand citations for Le2025, Orbanz2011, MacEachern2000, and BJQ2012 with full bibliographic information.
- [Recursive formula section] Number the recursive posterior predictive formula as an equation and explicitly derive its reduction to the Kalman filter in the Gaussian process case.
- [DDP section] In the DDP application, contrast the copula-based construction more clearly with prior approaches to highlight computational advantages for posterior predictives.
Simulated Author's Rebuttal
We thank the referee for their careful reading of the manuscript and for the positive recommendation of minor revision. We address each major comment below and indicate the changes to be made in the revised version.
-
Referee: [§3] §3 (equivalence via functoriality): the argument that sequential and batch inversions coincide relies on the batch measure factoring under the probabilistic morphism functor for conditionally independent data; an explicit verification that the projective limit commutes with the inversion in this case is needed to confirm the reduction, as this step is central to the main claim.
Authors: We thank the referee for this observation. The equivalence in Theorem 3.1 is obtained by applying the probabilistic morphism functor to the product measure induced by conditional independence, which factors the batch posterior. The commutation with the projective limit is a consequence of the functor preserving limits in the category of probability measures on Souslin spaces. To make this step fully explicit, we will add a short lemma in Section 3 that directly verifies the commutation for conditionally independent data, thereby confirming the reduction under the stated assumptions.
Revision: yes
-
Referee: [§4] §4 (projective systems characterization): the generalization of Orbanz's result to arbitrary X and universal models (P(Y)^X, mu, Id, P(Y)^X) should confirm that the consistency conditions of the projective system hold for general priors mu without additional regularity on the sampling operators.
Authors: We appreciate the request for explicit confirmation. In the proof of the generalized characterization (Proposition 4.2), the consistency conditions of the projective system follow directly from the identity morphisms in the universal model and the Souslin property of Y, which guarantees the existence of the projective limit for any prior μ in P(P(Y)^X). No further regularity on the sampling operators is imposed beyond measurability, which is part of the model definition. We will insert a clarifying remark after Proposition 4.2 to state this explicitly for general priors.
Revision: yes
Circularity Check
Minor self-citation to prior model definition; derivation otherwise independent
The paper defines universal Bayesian supervised learning models as (P(Y)^X, mu, Id, P(Y)^X) and invokes functoriality of probabilistic morphisms together with projective-limit characterizations from the cited prior work Le2025 to establish existence of inversions and to equate sequential versus batch procedures under conditional independence. No equation or construction inside the present manuscript reduces the claimed equivalence to a fitted parameter, a self-referential definition, or a renaming of an input quantity. The recursive posterior-predictive formula is obtained directly from the same functorial application and is shown to recover the Kalman filter as a special case, supplying an external consistency check. The single self-citation is therefore minor and non-load-bearing for the central equivalence result.
Axiom & Free-Parameter Ledger
axioms (3)
- domain assumption: Y is a Souslin space
- domain assumption: Data are conditionally independent given the random function
- standard math: Functoriality of probabilistic morphisms
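The second axiom can be probed with a toy model in which it deliberately fails. The sketch below is a hypothetical two-hypothesis example, not taken from the paper: two labels are dependent given the hypothesis, and a naive per-observation update that uses only each label's marginal law is compared with the batch posterior built from the joint law. The names `STAY_PROB`, `joint_likelihood`, and `naive_sequential_posterior` are invented for this illustration.

```python
from fractions import Fraction

HALF = Fraction(1, 2)
PRIOR = {"sticky": HALF, "flippy": HALF}

# Probability that the second label repeats the first, per hypothesis.
# The labels are therefore NOT conditionally independent given the hypothesis.
STAY_PROB = {"sticky": Fraction(3, 4), "flippy": Fraction(1, 4)}


def joint_likelihood(h, y1, y2):
    """P(y1, y2 | h): a fair first label, then repeat with probability STAY_PROB[h]."""
    stay = STAY_PROB[h]
    return HALF * (stay if y2 == y1 else 1 - stay)


def marginal_likelihood(h, y):
    """P(y | h) for a single label: a fair coin under both hypotheses."""
    return HALF


def normalize(weights):
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}


def batch_posterior(y1, y2):
    """Condition on both labels at once using their joint law."""
    return normalize({h: p * joint_likelihood(h, y1, y2) for h, p in PRIOR.items()})


def naive_sequential_posterior(y1, y2):
    """Update one label at a time as if the labels were conditionally independent."""
    posterior = dict(PRIOR)
    for y in (y1, y2):
        posterior = normalize(
            {h: p * marginal_likelihood(h, y) for h, p in posterior.items()}
        )
    return posterior


batch = batch_posterior(1, 1)
naive = naive_sequential_posterior(1, 1)
assert batch["sticky"] == Fraction(3, 4)
assert naive["sticky"] == Fraction(1, 2)
```

The mismatch does not contradict the equivalence result; it only shows that a product-of-marginals update is justified by the conditional independence assumption and not by the bookkeeping alone.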