Beyond Euclidean Summaries: Online Change Point Detection for Distribution-Valued Data

Xiaoyu Chen; Yingyan Zeng; Yujing Huang

arxiv: 2602.07252 · v2 · pith:5NN55CNC · submitted 2026-02-06 · 📊 stat.ME

Pith reviewed 2026-05-22 11:17 UTC · model grok-4.3

classification 📊 stat.ME

keywords: change point detection · Wasserstein space · distribution-valued data · online monitoring · Fréchet barycenter · tangent space · sequential detection

The pith

Treating streaming batches as points in 2-Wasserstein space and mapping them to a tangent plane at the initial barycenter lets standard detectors catch shape changes that moments miss.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops an online change-point procedure that works directly on batches of data whose empirical distributions vary over time. Rather than first collapsing each batch to a vector of moments or features, it keeps the full distributional information by embedding the batches in the space of probability measures equipped with the 2-Wasserstein distance. A single reference barycenter is estimated from the first observations; every later empirical distribution is then projected onto the tangent space at that barycenter, turning the nonlinear geometry into ordinary Euclidean vectors. Classical multivariate monitoring statistics can therefore be run on these vectors. The resulting procedure is shown to register shifts in shape or geometry with shorter detection delay than moment-based or model-free alternatives when all methods are calibrated to the same in-control average run length (ARL0).
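For scalar data the tangent-space step has a closed form, which makes the idea concrete. The following is a minimal sketch under stated assumptions, not the authors' code. It assumes one-dimensional batches, where the 2-Wasserstein barycenter's quantile function is the average of the batch quantile functions. It also assumes the log map at the barycenter can be represented on a grid of probability levels u as Q_μ(u) − Q_μ̄(u). The grid size, sample sizes, and function names (`quantile_grid`, `fit_reference`, `tangent_vector`) are hypothetical. The multivariate case needs an optimal-transport solver and is not shown.

```python
import numpy as np

def quantile_grid(n_levels=99):
    # Interior probability levels u_k = k / (n_levels + 1), avoiding 0 and 1.
    return np.arange(1, n_levels + 1) / (n_levels + 1)

def fit_reference(batches, u):
    # 1D W2 (Frechet) barycenter: its quantile function is the average
    # of the batches' empirical quantile functions.
    return np.mean([np.quantile(b, u) for b in batches], axis=0)

def tangent_vector(batch, q_ref, u):
    # Log map at the reference in quantile coordinates: Q_mu(u) - Q_ref(u).
    # The mean of its squares approximates W2^2(reference, batch).
    return np.quantile(batch, u) - q_ref

rng = np.random.default_rng(0)
u = quantile_grid()
phase1 = [rng.normal(0.0, 1.0, size=200) for _ in range(50)]  # pre-change window
q_ref = fit_reference(phase1, u)

v_in = tangent_vector(rng.normal(0.0, 1.0, size=200), q_ref, u)   # no change
v_out = tangent_vector(rng.normal(0.0, 2.0, size=200), q_ref, u)  # same mean, doubled spread
# Squared L2(reference) norms; the scale-shifted batch should sit much farther out.
print(np.mean(v_in**2), np.mean(v_out**2))
```

The tangent vectors are ordinary NumPy arrays, so any multivariate monitoring statistic can consume them. The 1D closed form is what keeps this sketch short.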

Core claim

Changes in the law of a stochastic process taking values in the space of probability measures are detected by estimating a pre-change Fréchet barycenter from initial data, mapping each new empirical distribution to the tangent space at that barycenter, and applying adapted sequential monitoring statistics to the resulting tangent vectors.

What carries the argument

Reference-centered tangent-space representation of 2-Wasserstein space, which converts the nonlinear geometry of distributions into a Euclidean vector field on which classical change-point detectors operate.

If this is right

Detection delay decreases for shifts that alter variance, multimodality, or support while low-order moments stay fixed.
The in-control average run length (ARL0) can be held at the same target as for the baselines, because monitoring statistics computed on the tangent vectors inherit the asymptotic behavior of their classical Euclidean counterparts.
Theoretical control of false-alarm probability follows from the local Euclidean structure once the barycenter is fixed.
The method applies to any data type whose batches admit empirical distribution estimates, including images, sensor histograms, and compositional observations.
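The first consequence can be illustrated with a shift in shape that leaves the first two moments unchanged. The sketch below makes several assumptions for illustration. The data are scalar and represented through 1D quantile functions. The post-change law is a hypothetical two-component normal mixture built to have mean 0 and variance 1, the same as the N(0, 1) pre-change law. The batch sizes and the mixture parameter `a` are arbitrary choices. Sample mean and variance barely move, while the squared tangent norm grows.

```python
import numpy as np

rng = np.random.default_rng(1)
u = np.arange(1, 100) / 100.0  # probability levels used as tangent coordinates

def bimodal(n, a=0.9):
    # Equal mixture of N(-a, s^2) and N(a, s^2) with s^2 = 1 - a^2:
    # mean 0 and variance 1, the same first two moments as N(0, 1).
    s = np.sqrt(1.0 - a**2)
    return a * rng.choice([-1.0, 1.0], size=n) + s * rng.standard_normal(n)

# Reference quantile function: average over 100 pre-change N(0, 1) batches.
q_ref = np.mean([np.quantile(rng.standard_normal(500), u) for _ in range(100)], axis=0)

summary = {}
for name, x in [("pre-change", rng.standard_normal(500)), ("post-change", bimodal(500))]:
    v = np.quantile(x, u) - q_ref  # tangent coordinates at the reference
    summary[name] = (x.mean(), x.var(), np.mean(v**2))
    print(name, "mean=%.2f var=%.2f tangent_norm_sq=%.3f" % summary[name])
```

A mean-and-variance chart sees two nearly identical batches here. Any statistic built on the tangent coordinates sees the change in shape directly.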

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Periodic recomputation of the reference barycenter could extend reliable operation over long streams whose distributions drift slowly.
The same linearization step may be useful for other optimal-transport distances or for data whose geometry is given by a different metric.
In applications where only a small initial window is available, the quality of the barycenter estimate becomes the dominant factor limiting early detection.

Load-bearing premise

Distributional changes remain close enough to the initial reference barycenter that the tangent-space approximation stays accurate and the barycenter itself can be estimated reliably from the first observations.

What would settle it

A simulation in which the post-change distributions lie far from the initial barycenter, the tangent detector fails to flag the shift within the claimed delay, yet a direct nonlinear Wasserstein-distance monitor detects it promptly.

Original abstract

Existing online change-point detection (CPD) methods rely on fixed-dimensional Euclidean summaries, implicitly assuming that distributional changes are well captured by moment-based or feature-based representations. Such summaries can obscure important changes in distributional shape or geometry. We propose an intrinsic distribution-valued CPD framework that treats streaming batch data as a stochastic process on the 2-Wasserstein space. Our method detects changes in the law of this process by mapping each empirical distribution to a tangent space relative to a pre-change Fréchet barycenter, yielding a reference-centered local linearization of 2-Wasserstein space. This representation enables sequential detectors by adapting classical multivariate monitoring statistics to tangent fields. We provide theoretical guarantees and demonstrate, via synthetic and real-world experiments, that our approach detects complex distributional shifts with reduced detection delay at matched ARL0 compared with moment-based and model-free baselines. The code is available at https://github.com/yyzeng43/IDD-icml.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper adapts Wasserstein tangent spaces for online CPD on distributions and demonstrates better detection performance, but finite-sample barycenter stability needs scrutiny.

This paper's main contribution is a method for online change-point detection on distribution-valued data that linearizes around a Fréchet barycenter in Wasserstein space and then applies adapted multivariate detectors. Its novelty lies in combining the intrinsic geometry of the 2-Wasserstein space with classical sequential-analysis tools for streaming settings. The work provides both theoretical guarantees and empirical evidence, from synthetic and real data, of shorter detection delays than baselines that rely on moments or model-free approaches. Releasing the code is also a plus for reproducibility. The potential weak spot is the estimation of the pre-change Fréchet barycenter from a finite set of initial observations. If this estimate is off, especially for complex or high-dimensional distributions, the tangent-space linearization may be inaccurate, which could undermine both the claimed performance advantages and the control of ARL0. The stress test raises a fair point here that deserves checking in the full derivations and experiments. Overall, the paper is aimed at researchers working on statistical monitoring of non-Euclidean or distributional data streams, and readers interested in extending change detection beyond Euclidean assumptions will find practical value and ideas to build on. The paper shows clear thinking about the problem and engages with the relevant literature, so it deserves a serious referee. I recommend sending it for peer review.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes an online change-point detection method for distribution-valued streaming data by embedding each batch into the tangent space of the 2-Wasserstein space at an estimated pre-change Fréchet barycenter and then applying adapted multivariate CUSUM and EWMA statistics. It claims theoretical guarantees for the resulting detectors together with empirical evidence of reduced detection delay at matched ARL0 relative to moments-based and model-free baselines on both synthetic and real-world examples. Reproducible code is provided.
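The adapted statistics are not spelled out on this page. As a hedged illustration only, the classical multivariate EWMA (MEWMA) chart can be run on 1D quantile tangent coordinates as follows. This is a generic MEWMA, not necessarily the paper's adapted statistic. The smoothing weight `lam`, threshold `h`, grid, batch sizes, and scale-shift scenario are illustrative assumptions. In practice `h` would be calibrated by simulation to a target ARL0.

```python
import numpy as np

rng = np.random.default_rng(2)
u = np.linspace(0.1, 0.9, 9)  # a coarse grid keeps the covariance well conditioned

def quantiles(batch):
    return np.quantile(batch, u)

def mewma_first_alarm(tangents, cov0, lam=0.2, h=40.0):
    # Multivariate EWMA on tangent vectors; returns the first alarm index or None.
    # lam / (2 - lam) * cov0 is the asymptotic in-control covariance of the EWMA vector.
    prec_z = np.linalg.pinv(lam / (2.0 - lam) * cov0)
    z = np.zeros(cov0.shape[0])
    for t, v in enumerate(tangents):
        z = lam * v + (1.0 - lam) * z
        if z @ prec_z @ z > h:
            return t
    return None

# Phase I: reference quantile function and tangent covariance from pre-change batches.
phase1 = np.stack([quantiles(rng.normal(0.0, 1.0, 100)) for _ in range(200)])
q_ref = phase1.mean(axis=0)
cov0 = np.cov(phase1 - q_ref, rowvar=False)

# Phase II: 50 in-control batches, then the scale changes from 1.0 to 1.5.
stream = [rng.normal(0.0, 1.0 if t < 50 else 1.5, 100) for t in range(100)]
alarm = mewma_first_alarm([quantiles(b) - q_ref for b in stream], cov0)
print("first alarm at batch", alarm)
```

Centering the tangent vectors at the Phase I reference makes their in-control mean approximately zero. That is why the EWMA recursion needs no separate mean term.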

Significance. If the central claims hold, the work offers a principled extension of change-point detection beyond Euclidean summaries, preserving geometric information that moment-based methods discard. The explicit provision of open code is a clear strength that supports reproducibility and further scrutiny. The approach could find use in applications where data arrive as empirical distributions rather than fixed-dimensional vectors.

major comments (2)

[§3.1–3.2] The tangent-space linearization is defined at the Fréchet barycenter estimated from the initial window. No finite-sample error bounds or sensitivity analysis are given for how barycenter estimation error propagates into the tangent coordinates or into the nominal ARL0 of the adapted CUSUM/EWMA statistics; this assumption is load-bearing for the claimed reduction in detection delay.
[§5] The experimental results report favorable detection delays, yet the section supplies neither the number of Monte Carlo replications, standard-error bars on the delay and ARL0 figures, nor explicit rules for data exclusion or hyper-parameter selection; without these details the robustness of the performance comparison cannot be verified.

minor comments (1)

[§2] Notation for the logarithmic map and tangent vectors is introduced without a short reminder of the precise definition used; a one-sentence clarification would aid readers unfamiliar with Wasserstein geometry.
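A reminder of the kind the referee requests could rest on the standard definition below. It is the usual textbook form, not the paper's exact notation. It assumes μ̄ is absolutely continuous so that an optimal (Brenier) transport map exists. It borrows the symbol v(μ) from the Proposition 3.4 excerpt cited on this page.

```latex
% Assumption: \bar\mu is absolutely continuous, so an optimal transport map exists.
\mathrm{Log}_{\bar\mu}(\mu) \;=\; v(\mu) \;=\; T_{\bar\mu}^{\mu} - \mathrm{id} \;\in\; L^2(\bar\mu),
\qquad (T_{\bar\mu}^{\mu})_{\#}\,\bar\mu = \mu,

\|v(\mu)\|_{L^2(\bar\mu)}^2 \;=\; \int \bigl|T_{\bar\mu}^{\mu}(x) - x\bigr|^2 \, d\bar\mu(x) \;=\; W_2^2(\bar\mu, \mu).

% One-dimensional case: T_{\bar\mu}^{\mu} = F_\mu^{-1} \circ F_{\bar\mu}, hence
\|v(\mu)\|_{L^2(\bar\mu)}^2 \;=\; \int_0^1 \bigl(F_\mu^{-1}(u) - F_{\bar\mu}^{-1}(u)\bigr)^2 \, du.
```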

Simulated Author's Rebuttal

2 responses · 0 unresolved

We are grateful to the referee for their constructive feedback and for recognizing the potential significance of our work on intrinsic change-point detection for distribution-valued data. We address each of the major comments below.

Referee: [§3.1–3.2] The tangent-space linearization is defined at the Fréchet barycenter estimated from the initial window. No finite-sample error bounds or sensitivity analysis are given for how barycenter estimation error propagates into the tangent coordinates or into the nominal ARL0 of the adapted CUSUM/EWMA statistics; this assumption is load-bearing for the claimed reduction in detection delay.

Authors: The referee correctly identifies that our theoretical guarantees assume the pre-change Fréchet barycenter is fixed or estimated consistently from the initial window. While we do not provide explicit finite-sample bounds in the current manuscript, the asymptotic theory ensures that the estimation error becomes negligible as the initial window size increases, preserving the control of ARL0 and the detection properties. To strengthen the presentation, we will include a sensitivity analysis in the revised manuscript, consisting of additional simulations that vary the initial window size and quantify the impact on detection delay and ARL0. This will demonstrate the robustness of the approach in finite samples. revision: yes
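A sensitivity analysis of this kind might start from a simple Monte Carlo loop. The sketch assumes scalar N(0, 1) batches, for which the 1D barycenter's quantile function is the average of the batch quantile functions. It measures how far the estimated barycenter lands from the true one, in approximate W2 distance, as the window size m grows. The function name `barycenter_error` and all settings are hypothetical. Because empirical quantiles are biased at a fixed batch size n, the error should shrink with m only down to a floor set by n.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(3)
u = np.arange(1, 100) / 100.0
q_true = np.array([NormalDist().inv_cdf(p) for p in u])  # true N(0, 1) quantile function

def barycenter_error(m, n=100, reps=200):
    # Monte Carlo mean (and its standard error) of the approximate W2 distance
    # between the barycenter estimated from m batches of size n and the truth.
    errs = np.empty(reps)
    for r in range(reps):
        q_hat = np.mean([np.quantile(rng.standard_normal(n), u) for _ in range(m)], axis=0)
        errs[r] = np.sqrt(np.mean((q_hat - q_true) ** 2))
    return errs.mean(), errs.std(ddof=1) / np.sqrt(reps)

results = {m: barycenter_error(m) for m in (5, 20, 80)}
for m, (err, se) in results.items():
    print(f"m={m:3d}  W2 error = {err:.4f} (s.e. {se:.4f})")
```

The same loop could feed each estimated reference into a detector to trace how barycenter error translates into realized ARL0 and delay.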
Referee: [§5] The experimental results report favorable detection delays, yet the section supplies neither the number of Monte Carlo replications, standard-error bars on the delay and ARL0 figures, nor explicit rules for data exclusion or hyper-parameter selection; without these details the robustness of the performance comparison cannot be verified.

Authors: We agree that these experimental details are essential for verifying the robustness of our comparisons. In the revised version, we will add the number of Monte Carlo replications performed, include standard error bars on all reported figures, and provide explicit descriptions of the hyper-parameter tuning procedure and any data preprocessing or exclusion rules applied in the experiments. revision: yes
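Reporting replications and standard errors can be as simple as the harness below. It is a generic sketch. `run_length` is a hypothetical stand-in: a one-sided Shewhart chart on N(0, 1) data with threshold 3. It was chosen because its in-control ARL is known exactly, 1/(1 − Φ(3)) ≈ 740.8, so the harness itself can be checked. In the paper's setting it would be replaced by one run of the tangent-space detector. Here the "delay" is the zero-state ARL1 for a change present from the start.

```python
import numpy as np

def run_length(rng, shift=0.0, h=3.0, cap=100_000):
    # Steps until a one-sided Shewhart chart (alarm when x > h) signals;
    # `shift` is added to every observation, so 0.0 gives the in-control case.
    for t in range(1, cap + 1):
        if rng.standard_normal() + shift > h:
            return t
    return cap

def mc_summary(run_lengths):
    # Mean run length with its Monte Carlo standard error.
    rl = np.asarray(run_lengths, dtype=float)
    return rl.mean(), rl.std(ddof=1) / np.sqrt(rl.size)

rng = np.random.default_rng(4)
R = 500  # number of Monte Carlo replications; report it alongside the estimates
arl0, se0 = mc_summary([run_length(rng) for _ in range(R)])
arl1, se1 = mc_summary([run_length(rng, shift=2.0) for _ in range(R)])
print(f"ARL0 = {arl0:.1f} (s.e. {se0:.1f}; exact 740.8), ARL1 = {arl1:.2f} (s.e. {se1:.2f})")
```

Run lengths are roughly geometric, so their standard deviation is close to their mean. Several hundred replications are therefore needed before the standard error of ARL0 falls to a few percent.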

Circularity Check

0 steps flagged

No circularity: derivation adapts established Wasserstein geometry and classical statistics

The paper's core construction maps empirical distributions to a tangent space at a pre-change Fréchet barycenter in 2-Wasserstein space and then applies adapted multivariate CUSUM/EWMA detectors. This step relies on standard properties of the Wasserstein metric and logarithmic map, which are external to the present work and not defined in terms of the claimed detection-delay gains. No equation reduces a performance metric to a parameter fitted from the target result itself, and no load-bearing premise is justified solely by self-citation. The framework is therefore self-contained against external benchmarks in optimal transport and sequential analysis.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The approach rests on standard domain assumptions from optimal transport and statistical process control; no free parameters, new entities, or ad-hoc axioms are visible in the abstract.

axioms (1)

Domain assumption: streaming batch data can be treated as a stochastic process on the 2-Wasserstein space.
Explicitly stated in the abstract as the modeling choice that enables the tangent-space representation.

pith-pipeline@v0.9.0 · 5700 in / 1143 out tokens · 39274 ms · 2026-05-22T11:17:56.886915+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

mapping each empirical distribution to a tangent space relative to a pre-change Fréchet barycenter, yielding a reference-centered local linearization of 2-Wasserstein space

IndisputableMonolith/Foundation/BranchSelection.lean · branch_selection · unclear

Proposition 3.4 (Radial Isometry) … W₂²(μ̄, μ) = ‖v(μ)‖²_{L²(μ̄)}
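The radial isometry in the excerpt above can be checked numerically in the simplest setting. The check assumes two one-dimensional empirical measures with the same number of equally weighted atoms. In that case the monotone (sorted-to-sorted) map is an optimal transport map. By the Birkhoff-von Neumann theorem, W₂² can also be computed independently by brute force over permutations. This illustrates the identity on a toy example; it is not the paper's proof. The sample sizes and distributions are arbitrary.

```python
import itertools
import numpy as np

rng = np.random.default_rng(5)
n = 6
x = rng.normal(0.0, 1.0, n)  # atoms of the reference measure mu_bar (weights 1/n)
y = rng.gamma(2.0, 1.0, n)   # atoms of mu (weights 1/n)

# W2^2 by brute force: for uniform measures with n atoms each, an optimal
# coupling can be taken to be a permutation (Birkhoff-von Neumann).
w2_sq = min(np.mean((x - y[list(p)]) ** 2) for p in itertools.permutations(range(n)))

# Log map at mu_bar: monotone transport map minus the identity, v(x_i) = T(x_i) - x_i.
order = np.argsort(x)
v = np.empty(n)
v[order] = np.sort(y) - x[order]
norm_sq = np.mean(v ** 2)  # squared L2(mu_bar) norm with weights 1/n

print(w2_sq, norm_sq)  # should agree up to floating-point rounding
```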

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.