Blurring Mean Shift for Clustering Functional Data: A Scalable Algorithm and Convergence Analysis

arxiv: 2507.14457 · v3 · submitted 2025-07-19 · 📊 stat.ME

Toshinari Morimoto, Ting-Li Chen, Su-Yun Huang, Ruey S. Tsay

Pith reviewed 2026-05-19 04:44 UTC · model grok-4.3

keywords: functional data clustering · blurring mean shift · convergence analysis · stochastic algorithm · Hilbert space · scalable clustering · kernel methods

The pith

Blurring mean shift converges for functional data in a Hilbert space, and a stochastic variant approximates the full one-step update when the subsets are large enough.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper adapts blurring mean shift clustering to functional data by operating in a Hilbert space setting. It proves that the full iterative procedure converges and shows that a stochastic version using random partitions produces one-step updates close to the full algorithm when subsets are large enough. A sympathetic reader would care because this supplies a theoretically supported way to group curves or profiles without first declaring how many groups exist and without computing on every observation at each step.

Core claim

The full blurring functional mean shift procedure converges, and when the subset size is sufficiently large the one-step update of the stochastic variant is well approximated by the corresponding update of the full algorithm.

What carries the argument

The blurring kernel applied iteratively to shift functional observations toward density modes inside the Hilbert space.
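
To make the iteration concrete, here is a minimal sketch of a blurring update for curves sampled on a common grid. It is an illustration under stated assumptions, not the paper's implementation: the Gaussian kernel, the Riemann-sum approximation of the L² norm, the bandwidth `h`, and all function names (`l2_sq_dists`, `blurring_step`, `blurring_mean_shift`) are choices made here for the example.

```python
import numpy as np

def l2_sq_dists(F, dt):
    """Pairwise squared L2 distances between the rows of F (Riemann-sum approximation)."""
    sq = (F ** 2).sum(axis=1) * dt
    D = sq[:, None] + sq[None, :] - 2.0 * (F @ F.T) * dt
    return np.maximum(D, 0.0)  # clip tiny negatives caused by rounding

def blurring_step(F, h, dt):
    """One blurring update: every curve moves to a kernel-weighted mean of all curves."""
    W = np.exp(-l2_sq_dists(F, dt) / (2.0 * h ** 2))  # Gaussian weights (assumed kernel)
    return (W @ F) / W.sum(axis=1, keepdims=True)

def blurring_mean_shift(F, h, dt, n_iter=100, tol=1e-8):
    """Iterate the blurring update until no sampled value moves by more than tol."""
    F = np.asarray(F, dtype=float).copy()
    for _ in range(n_iter):
        F_new = blurring_step(F, h, dt)
        moved = np.max(np.abs(F_new - F))
        F = F_new
        if moved < tol:
            break
    return F

# Toy data (hypothetical): two groups of vertically shifted sine curves.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50, endpoint=False)
dt = 1.0 / t.size
base = np.sin(2.0 * np.pi * t)
low = base + rng.normal(0.0, 0.1, size=(10, 1))
high = base + 4.0 + rng.normal(0.0, 0.1, size=(10, 1))
F = np.vstack([low, high])

G = blurring_mean_shift(F, h=0.5, dt=dt)
```

In this toy run the curves within each group collapse onto a common curve while the two groups stay apart, so cluster labels can be read off by grouping rows of `G` that coincide. Unlike ordinary mean shift, the data set itself is replaced at every iteration, which is what "blurring" refers to.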

If this is right

Clustering proceeds without any preset number of groups.
The method applies directly to infinite-dimensional observations such as time series or spatial profiles.
Random partitioning reduces per-iteration cost, and for sufficiently large subsets each one-step update stays close to the corresponding full update.
Convergence of the full procedure supplies a stopping criterion and reliability guarantee for the iterates.
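
The cost saving from random partitioning can be illustrated with simple counting. The sketch below assumes, as a reading of "random partitioning" that the summary does not spell out, that each subset is updated using only its own members; the function names and the block-size parameter `m` are hypothetical.

```python
def full_step_kernel_evals(n):
    """Kernel evaluations in one full blurring step: every pair (i, j) of n curves."""
    return n * n

def partitioned_step_kernel_evals(n, m):
    """Kernel evaluations when n curves are split into blocks of size m
    (plus one smaller remainder block) and each block is updated on its own."""
    q, r = divmod(n, m)
    return q * m * m + r * r

n, m = 10_000, 100
full_cost = full_step_kernel_evals(n)            # 100_000_000
part_cost = partitioned_step_kernel_evals(n, m)  # 1_000_000
```

Under this assumption the per-iteration count drops from n² to roughly n·m, so the saving factor is about n/m. The price is that each curve only sees its own block, which is why the subset size matters for the quality of the update.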

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same stochastic-partition idea could be tested on other kernel-based functional clustering routines to check whether one-step approximation still holds.
Simulation studies that track the distance between full and stochastic trajectories as subset size grows would quantify the approximation rate left implicit in the analysis.
The Hilbert-space contraction condition might be relaxed to other Banach spaces if the kernel is adjusted accordingly.
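
The simulation suggested above could be set up along the following lines. This is a sketch under assumptions, not the paper's experiment: the Gaussian kernel, the within-block update rule in `partition_step`, the toy curves, and every function name are hypothetical choices made for illustration.

```python
import numpy as np

def blurring_step(F, h, dt):
    """Full one-step blurring update with Gaussian weights (assumed kernel)."""
    sq = (F ** 2).sum(axis=1) * dt
    D = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (F @ F.T) * dt, 0.0)
    W = np.exp(-D / (2.0 * h ** 2))
    return (W @ F) / W.sum(axis=1, keepdims=True)

def partition_step(F, h, dt, m, rng):
    """Stochastic one-step update: random blocks of size m, each updated on its own."""
    n = F.shape[0]
    perm = rng.permutation(n)
    out = np.empty_like(F)
    for start in range(0, n, m):
        idx = perm[start:start + m]
        out[idx] = blurring_step(F[idx], h, dt)
    return out

def one_step_gap(F, h, dt, m, rng, n_rep=20):
    """Mean L2 distance between stochastic and full one-step updates, averaged over n_rep partitions."""
    full = blurring_step(F, h, dt)
    gaps = []
    for _ in range(n_rep):
        stoch = partition_step(F, h, dt, m, rng)
        gaps.append(np.sqrt(((stoch - full) ** 2).sum(axis=1) * dt).mean())
    return float(np.mean(gaps))

# Toy data (hypothetical): sine curves with random amplitudes.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50, endpoint=False)
dt = 1.0 / t.size
n = 200
F = rng.normal(1.0, 0.3, size=(n, 1)) * np.sin(2.0 * np.pi * t)

gaps = {m: one_step_gap(F, 0.3, dt, m, rng) for m in (10, 50, 200)}
```

When the block size equals the sample size the partition is trivial and the gap vanishes up to rounding. How quickly the gap shrinks for intermediate block sizes is exactly the rate such a study would estimate; the sketch makes no claim about it.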

Load-bearing premise

The functional observations are elements of a Hilbert space and the blurring kernel is chosen so that the iterative map remains well-defined and contractive in that space.

What would settle it

Run the full and stochastic procedures on the hourly Taiwan PM2.5 data or Argo profiles and measure whether the one-step cluster assignments diverge materially once subset size exceeds a moderate fraction of the total sample.
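
One way to quantify whether two sets of cluster assignments "diverge materially" is the Adjusted Rand Index of Hubert and Arabie, which the paper's Figure 3 also reports. The following is a small standard-library sketch of that index; the function name and the handling of degenerate inputs are choices made here.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two labelings of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must have the same length")
    n = len(labels_a)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: treated as perfect agreement (a convention chosen here)
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / total_pairs
    max_index = 0.5 * (sum_a + sum_b)
    if max_index == expected:
        return 1.0  # both partitions trivial (all items together or all apart)
    return (sum_cells - expected) / (max_index - expected)

same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])     # relabeled but identical partition
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])  # partitions that cut across each other
```

Here `same` equals 1.0 because the index ignores label names, and `crossed` equals -0.5, below the chance level of 0. Applied to the labels produced by the full and stochastic procedures, values near 1 would indicate that the two agree.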

Figures

Figures reproduced from arXiv: 2507.14457 by Ruey S. Tsay, Su-Yun Huang, Ting-Li Chen, Toshinari Morimoto.

**Figure 2.** Histogram of pairwise distances between 1,000 randomly sampled functions.

**Figure 3.** Boxplots of computation time (top row), Adjusted Rand Index (middle row), and …

**Figure 4.** Clustering result based on hourly PM2.5 trajectories from the AirBox dataset. Each curve in Panel (a) represents a monitoring site and shows its temporal PM2.5 variation, colored by cluster. Panel (b) displays the geographic locations of the sites, also colored by cluster, illustrating the spatial distribution of each group.

**Figure 5.** Cluster maps of temperature profiles: (a) the four largest clusters overlaid on a single …

**Figure 6.** Cluster maps of salinity profiles: (a) the four largest clusters overlaid on a single map, …

Original abstract

This paper extends the blurring mean shift algorithm from vector-valued data to functional data, enabling effective clustering in infinite-dimensional settings without requiring specification of the number of clusters. To address the computational challenges posed by large-scale datasets, we introduce a fast stochastic variant that significantly reduces computational complexity. We provide a rigorous convergence analysis for the full blurring functional mean shift procedure, establishing theoretical guarantees for its iterative behavior. For the stochastic variant, we provide partial theoretical justification by showing that, when the subset size is sufficiently large, its one-step update is well approximated by the corresponding update of the full algorithm. The proposed method is demonstrated through real-data applications, including hourly Taiwan PM$_{2.5}$ measurements and Argo oceanographic profiles. Our key contributions include: (1) extending the blurring mean shift algorithm to functional data in a Hilbert-space setting; (2) developing a scalable stochastic variant based on random partitioning for large-scale data; (3) establishing convergence results for the full blurring functional mean shift algorithm; and (4) demonstrating the scalability and practical usefulness of the proposed method through simulation and real-data applications.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Blurring mean shift gets extended to functional data with convergence theory, yet the stochastic version lacks control on error accumulation over iterations.

The main thing here is an extension of blurring mean shift clustering to functional data in a Hilbert space, complete with a stochastic random-partition variant for speed and a convergence analysis for the full method. The stochastic part only gets a partial result: a one-step approximation that holds when the subset size is large.

What the paper does is move the vector-valued blurring mean shift into infinite dimensions and add the scalable version based on random partitioning. The convergence results for the full iterative procedure are new in this setting, and the applications to hourly PM2.5 measurements and Argo profiles illustrate how it clusters without pre-choosing the number of groups. Judging from the abstract, the extension is carefully thought through, and the real-data examples add practical value.

The soft spot is exactly the one in the stress test: the stochastic variant's justification is limited to a single update. With multiple iterations, approximation errors could add up, and nothing in the provided summary indicates a bound that prevents this accumulation or invokes a contraction to control it over steps. Any convergence claim for the fast algorithm would therefore rest on an implicit assumption that is not verified. The handling of kernels and operators in the infinite-dimensional case also merits checking.

This is aimed at statisticians working on clustering for functional observations, such as environmental or oceanographic data. A reader seeking a scalable method that does not require the number of clusters in advance would get something usable from it, assuming the theory checks out. It has enough new elements and evidence to deserve a serious referee. I would send it to peer review, asking referees to focus on whether the one-step result extends to the full stochastic procedure.

Referee Report

1 major / 0 minor

Summary. The manuscript extends the blurring mean shift algorithm to functional data in a Hilbert-space setting for clustering without pre-specifying the number of clusters. It introduces a stochastic variant based on random partitioning to reduce computational cost for large datasets and supplies a convergence analysis for the full deterministic procedure together with a partial result showing that the one-step update of the stochastic variant approximates the full update when the subset size is sufficiently large. The approach is illustrated on hourly Taiwan PM2.5 data and Argo oceanographic profiles.

Significance. If the stated convergence results hold under the Hilbert-space assumptions, the work supplies a theoretically supported, cluster-number-free method for functional data clustering that scales to large samples. The combination of an infinite-dimensional formulation with a stochastic approximation addresses a practical bottleneck in functional data analysis, and the real-data examples indicate applicability to environmental and oceanographic monitoring.

major comments (1)

Abstract and contributions (4): the partial justification for the stochastic variant is limited to a one-step approximation result. Because the algorithm is iterative, establishing that the stochastic procedure converges requires controlling the accumulation of approximation errors across successive iterations; the manuscript does not provide a uniform-in-iteration error bound or invoke a contraction argument that would prevent error propagation, leaving the overall reliability of the stochastic variant for clustering unverified.
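
To see what a contraction argument would buy, consider the following worked bound. It is a sketch under two hypothetical assumptions, neither of which the summarized manuscript is reported to establish: the full update map T is a contraction with constant L < 1, and the stochastic update T̃ is within δ of T uniformly (in practice such a bound would hold only with high probability). Here x_t and x̃_t denote the full and stochastic iterates started from the same configuration, and e_t is the distance between them.

```latex
% Hypothetical assumptions (not established in the source):
%   (A1) \|T(x) - T(y)\| \le L \|x - y\| with 0 \le L < 1   (contraction of the full update T)
%   (A2) \|\tilde T(x) - T(x)\| \le \delta for all x          (uniform one-step error)
\begin{align*}
e_{t+1} &= \|\tilde x_{t+1} - x_{t+1}\|
         = \|\tilde T(\tilde x_t) - T(x_t)\| \\
        &\le \|\tilde T(\tilde x_t) - T(\tilde x_t)\| + \|T(\tilde x_t) - T(x_t)\|
         \le \delta + L\, e_t, \\
e_t &\le \delta \sum_{s=0}^{t-1} L^{s}
      = \delta\,\frac{1 - L^{t}}{1 - L}
      \le \frac{\delta}{1 - L}
      \qquad (e_0 = 0).
\end{align*}
```

Under (A1) and (A2) the error stays bounded uniformly in t. If only L = 1 is available, the same recursion gives e_t ≤ tδ, which grows linearly with the number of iterations; this is the accumulation the comment refers to.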

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the careful reading and constructive feedback on our manuscript. We address the major comment below.

Referee: Abstract and contributions (4): the partial justification for the stochastic variant is limited to a one-step approximation result. Because the algorithm is iterative, establishing that the stochastic procedure converges requires controlling the accumulation of approximation errors across successive iterations; the manuscript does not provide a uniform-in-iteration error bound or invoke a contraction argument that would prevent error propagation, leaving the overall reliability of the stochastic variant for clustering unverified.

Authors: We appreciate the referee's observation on this point. The manuscript explicitly describes the result for the stochastic variant as a one-step approximation (see abstract and Section 4), rather than a full convergence guarantee for the iterative procedure. We agree that controlling accumulated approximation errors over multiple iterations would require additional technical arguments, such as a uniform bound or contraction mapping, which are not developed here. In the revised manuscript we will update the abstract and the list of contributions to state more precisely that the stochastic analysis is limited to the one-step case, and we will add a short remark in the discussion section noting that error propagation across iterations remains open for future work. This change will align the stated claims with the actual theorems while preserving the practical motivation and empirical evidence for the stochastic variant.

Revision: yes

Circularity Check

0 steps flagged

No circularity: convergence analysis and one-step approximation are independently derived

The paper establishes convergence results for the full blurring functional mean shift algorithm in a Hilbert-space setting and separately shows that the stochastic variant's one-step update approximates the full update for large subset sizes. No equations reduce a claimed prediction or convergence guarantee to a fitted parameter or self-referential definition by construction. The provided abstract and contributions list no load-bearing self-citations that justify the central claims, nor any ansatz smuggled via prior work by the same authors. The derivation chain remains self-contained against external benchmarks such as standard mean-shift convergence arguments in functional spaces, yielding no detectable circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claims rest on standard Hilbert-space assumptions for functional data and on the existence of a suitable blurring kernel that preserves the iterative structure; no explicit free parameters or new invented entities are named in the abstract.

axioms (1)

Domain assumption: functional observations belong to a Hilbert space in which the mean-shift operator is well-defined.
Invoked throughout the abstract when moving from vector-valued to functional data.

pith-pipeline@v0.9.0 · 5735 in / 1301 out tokens · 45827 ms · 2026-05-19T04:44:22.951482+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

Paper passage: We introduce the notion of a 'surrogate density' ρ(f | {f_i}) = n⁻¹ Σ K_h(‖f−f_i‖_H) … the functional mean shift operator M(f | {f_i}) … Theorem 1 (Convergence properties) … monotonic increase of the average surrogate density … Gâteaux derivative …

IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

Paper passage: Our analysis is carried out in L²([0,1]) … no pointwise smoothness assumptions …
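
The first passage quoted above mentions a surrogate density ρ(f | {f_i}) = n⁻¹ Σ K_h(‖f−f_i‖_H) and a monotonic increase of its average along the iterations. A numerical illustration of that behaviour might look as follows. It is a sketch only: the unnormalized Gaussian form of K_h, the discretization of L²([0,1]) by a Riemann sum, the toy curves, and all function names are assumptions made here, and the exact conditions of the paper's Theorem 1 are not reproduced.

```python
import numpy as np

def kernel_matrix(F, h, dt):
    """Gaussian kernel values K_h(||f_i - f_j||) for curves sampled on a grid (assumed kernel)."""
    sq = (F ** 2).sum(axis=1) * dt
    D = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (F @ F.T) * dt, 0.0)
    return np.exp(-D / (2.0 * h ** 2))

def average_surrogate_density(F, h, dt):
    """Average over j of rho(f_j | {f_i}), i.e. the mean of all pairwise kernel values."""
    return float(kernel_matrix(F, h, dt).mean())

def blurring_step(F, h, dt):
    """One blurring update: each curve is replaced by a kernel-weighted mean of all curves."""
    W = kernel_matrix(F, h, dt)
    return (W @ F) / W.sum(axis=1, keepdims=True)

# Toy data (hypothetical): random combinations of a sine and a cosine.
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 50, endpoint=False)
dt = 1.0 / t.size
coef = rng.normal(0.0, 1.0, size=(60, 2))
F = coef[:, :1] * np.sin(2.0 * np.pi * t) + coef[:, 1:] * np.cos(2.0 * np.pi * t)

densities = []
for _ in range(15):
    densities.append(average_surrogate_density(F, 0.5, dt))
    F = blurring_step(F, 0.5, dt)
```

In this setup the recorded values in `densities` do not decrease from one iteration to the next, which is consistent with the quoted monotonicity statement for the Gaussian case.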

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

14 extracted references · 14 canonical work pages · 1 internal anchor

[1] Denis Bosq. Linear Processes in Function Spaces: Theory and Applications, volume 149. Springer Science & Business Media, 2000.

[2] Ling-Jyh Chen, Yao-Hua Ho, Hu-Cheng Lee, Hsuan-Cho Wu, Hao-Min Liu, Hsin-Hung Hsieh, Yu-Te Huang, and Shih-Chun Candice Lung. An open framework for participatory PM2.5 monitoring in smart cities. IEEE Access, 5:14441–14454, 2017. doi: 10.1109/ACCESS.2017.2723919

[3] Ting-Li Chen. On the convergence and consistency of the blurring mean-shift process. Annals of the Institute of Statistical Mathematics, 67(1):157–176, 2015.

[4] Yizong Cheng. Mean shift, mode seeking, and clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 17(8):790–799, 1995.

[5] Mattia Ciollaro, Christopher Genovese, Jing Lei, and Larry Wasserman. The functional mean-shift algorithm for mode hunting and clustering in infinite dimensions. arXiv preprint arXiv:1408.1187, 2014.

[6] Dorin Comaniciu and Peter Meer. Mean shift: A robust approach toward feature space analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(5):603–619, 2002.

[7] Frédéric Ferraty, Nadia Kudraszow, and Philippe Vieu. Nonparametric estimation of a surrogate density function in infinite-dimensional spaces. Journal of Nonparametric Statistics, 24(2):447–464, 2012. doi: 10.1080/10485252.2012.671943. URL https://doi.org/10.1080/10485252.2012.671943

[8] Keinosuke Fukunaga and Larry Hostetler. The estimation of the gradient of a density function, with applications in pattern recognition. IEEE Transactions on Information Theory, 21(1):32–40, 1975.

[9] Lawrence Hubert and Phipps Arabie. Comparing partitions. Journal of Classification, 2(1):193–218, 1985. doi: 10.1007/BF01908075

[10] Daniel Peña and Ruey S. Tsay. Statistical Learning for Big Dependent Data. John Wiley and Sons, Inc., Hoboken, NJ, 2021.

[11] Shang-Ying Shiu, Yen-Shiu Chin, Szu-Han Lin, and Ting-Li Chen. Randomized self-updating process for clustering large-scale data. Statistics and Computing, 34(1):47, 2024.

[12] Annie PS Wong, Susan E Wijffels, Stephen C Riser, Sylvie Pouliquen, Shigeki Hosoda, Dean Roemmich, John Gilson, Gregory C Johnson, Kim Martini, David J Murphy, et al. Argo data 1999–2019: Two million temperature-salinity profiles and subsurface velocity observations from a global array of profiling floats. Frontiers in Marine Science, 7:700, 2020.

[13] Ryoya Yamasaki and Toshiyuki Tanaka. Convergence analysis of mean shift. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(10):6688–6698, 2024.

[14] Drew Yarger, Stilian Stoev, and Tailen Hsing. A functional-data approach to the Argo data. The Annals of Applied Statistics, 16(1):216–246, 2022.
