Pith

arxiv: 2605.05996 · v1 · submitted 2026-05-07 · 📊 stat.ML · cs.LG

Gaussian mixture models in Hilbert spaces via kernel methods

Pith reviewed 2026-05-08 05:12 UTC · model grok-4.3

classification 📊 stat.ML cs.LG
keywords: Gaussian mixture models, kernel mean embeddings, Hilbert spaces, functional data, clustering, infinite-dimensional data, optimization, approximation theory

The pith

Gaussian mixture models can be defined for data in Hilbert spaces using kernel mean embeddings to achieve dense approximations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper introduces a Gaussian mixture model tailored for random objects taking values in Hilbert spaces, such as time series of functions or graph data. By relying on kernel mean embeddings, the components of the mixture can be represented without needing explicit densities, which are often problematic in infinite dimensions. The authors provide optimization procedures for estimating the model parameters from data and prove that these procedures are well-defined. They further show that such mixtures can approximate any probability measure on the Hilbert space arbitrarily well. This is relevant for clustering applications in fields where data naturally live in high- or infinite-dimensional spaces, like medical functional data or network structures.

Core claim

The central contribution is a kernel-based Gaussian mixture model for Hilbert-space valued data. The model uses kernel mean embeddings to define mixture components, allowing for efficient optimization of the parameters. Theoretical analysis confirms the algorithm's validity and demonstrates that the class of such models is dense in the space of all probability measures on the Hilbert space.

What carries the argument

Kernel mean embeddings of Gaussian mixture components, which map the mixtures into a reproducing kernel Hilbert space to enable computation and approximation in the original infinite-dimensional space.
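What "kernel mean embedding of a Gaussian component" means computationally can be illustrated in a finite-dimensional projection ℝ^M. The sketch below assumes the Gaussian radial kernel κ(x, y) = exp(−½ (x − y)ᵀ A (x − y)) with A symmetric positive definite. That kernel form is our assumption, chosen because it is the one under which the closed form for J^{(M)}_{i,k} quoted from Proposition B.1 (entry [73] of the extracted references) holds. The helper names are hypothetical, and the Monte Carlo estimate is there only to check the closed form.

```python
import numpy as np

def sqrtm_psd(A):
    """Symmetric square root of a symmetric positive semi-definite matrix."""
    w, V = np.linalg.eigh(A)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

def gaussian_embedding(x, m, K, A):
    """Closed form of E_{y ~ N(m, K)}[exp(-0.5 (x - y)^T A (x - y))].

    This is the kernel mean embedding of N(m, K) evaluated at x, assuming
    the Gaussian radial kernel with weight matrix A:
    det(I + A^{1/2} K A^{1/2})^{-1/2}
      * exp(-0.5 (x - m)^T A^{1/2} (I + A^{1/2} K A^{1/2})^{-1} A^{1/2} (x - m)).
    """
    As = sqrtm_psd(A)
    B = np.eye(len(m)) + As @ K @ As
    d = As @ (x - m)
    quad = d @ np.linalg.solve(B, d)
    return np.linalg.det(B) ** -0.5 * np.exp(-0.5 * quad)

def mc_embedding(x, m, K, A, n=200_000, seed=0):
    """Monte Carlo estimate of the same expectation, used only as a check."""
    rng = np.random.default_rng(seed)
    diff = rng.multivariate_normal(m, K, size=n) - x
    return np.mean(np.exp(-0.5 * np.einsum("ni,ij,nj->n", diff, A, diff)))

# Hypothetical three-dimensional example.
x = np.array([0.5, -1.0, 0.2])
m = np.zeros(3)
K = np.diag([1.0, 0.5, 0.25])
A = np.diag([1.0, 2.0, 0.5])
print(gaussian_embedding(x, m, K, A), mc_embedding(x, m, K, A))
```

The two printed numbers should agree to about two or three decimal places; the closed form is what makes an MMD-type objective cheap to evaluate, since no sampling from the components is needed.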

If this is right

  • Mixtures defined this way can densely approximate arbitrary probability distributions on Hilbert spaces.
  • The optimization algorithms provide practical estimation for clustering infinite-dimensional observations.
  • The approach applies directly to L² functional data and to random graphs represented in Laplacian spaces.
  • Theoretical guarantees ensure the model remains well-defined even when dimensions are infinite.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • One could test whether this embedding approach outperforms standard dimensionality reduction techniques in clustering accuracy for functional datasets.
  • The density result suggests potential use in nonparametric density estimation tasks within Hilbert spaces.
  • Extensions to time-series dependencies or other kernel choices might broaden applicability to dynamic data without additional assumptions.

Load-bearing premise

That the kernel mean embeddings preserve enough structure from the Gaussian components in the Hilbert space to allow both practical optimization and dense approximation of measures.

What would settle it

Observing a specific probability measure on a Hilbert space, such as a non-Gaussian distribution with certain smoothness properties, that cannot be approximated closer than some positive distance by any finite Gaussian mixture under the kernel embedding.

Figures

Figures reproduced from arXiv: 2605.05996 by Antonio Álvarez-López, Daniel López-Montero, Marcos Matabuena.

Figure 1: Representative structured data samples. Left: Hourly continuous glucose monitoring paths. Center: Signals on a fixed graph with varying node intensities. Right: Correlation matrices.
Motivation. The motivation for this work is to perform clustering of dynamic, time-varying functional objects [9, 10] that take values in a possibly infinite-dimensional space X …
Figure 2: Numerical sensitivity of the MMD objective to various hyperparameters and sample sizes.
Figure 3: Temporal glucose mixture in H¹. Left: Empirical (top) and model-predicted (bottom) mean glucose surfaces. Top right: Global cluster weights π_k(t) (solid lines) with group-level posteriors for control (dotted) γ̄_ctrl(t) and treatment (dashed) γ̄_treat(t). Bottom right: Learned means.
We parameterize π(t) = softmax(z(t)), where the logit path z: [0, T] → ℝ^K is the solution to a neural ODE [67], initialize…
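The Figure 3 text parameterizes the time-varying weights as π(t) = softmax(z(t)), with z solving a neural ODE [67]. A minimal sketch of that construction follows. It assumes a hypothetical vector field f(z) = W tanh(z) + b with fixed parameters standing in for a trained network, and a fixed-step explicit Euler solver standing in for whatever ODE solver the authors use; the names `logit_path`, `W` and `b` are illustrative only.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def logit_path(z0, W, b, T=1.0, n_steps=100):
    """Explicit Euler solution of dz/dt = W tanh(z) + b on [0, T].

    Returns an array of shape (n_steps + 1, K): the logits on a uniform grid.
    """
    dt = T / n_steps
    z = np.empty((n_steps + 1, len(z0)))
    z[0] = z0
    for s in range(n_steps):
        z[s + 1] = z[s] + dt * (W @ np.tanh(z[s]) + b)
    return z

rng = np.random.default_rng(0)
K = 3
W = 0.5 * rng.standard_normal((K, K))   # hypothetical "network" parameters
b = np.array([0.5, 0.0, -0.5])
z = logit_path(np.zeros(K), W, b, T=2.0, n_steps=200)
pi = softmax(z)   # pi[s, k] is the weight of cluster k at time t_s
print(pi[0], pi[-1])
```

Routing the weights through a softmax keeps every π(t) on the probability simplex by construction, so the ODE can evolve the logits freely without any explicit constraint.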
Figure 4: Correlation-based temporal mixture in Sym(24). Left: MMD² training loss (top) and cluster weights π_k(t) (bottom). Right: Learned mean correlation matrices.
Figure 5: Individual similarity graph temporal mixture.
Figure 6: ℝ^d Gaussian mixture recovery. Left: MMD² training loss. Right: Empirical histogram of the samples overlaid with the true (blue) and learned (dashed coral) densities.
Applicability. The measure-theoretic components of our framework (including Gaussian mixtures, Radon–Nikodym responsibilities, and MMD weak density arguments) extend naturally to separable Banach spaces. While orthogonal projections are unavail…
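The figures report an "MMD² training loss". For a finite mixture P = Σ_k π_k N(m_k, K_k) and the empirical measure of a sample x_1, …, x_n, the squared MMD expands into three kernel expectations, each available in closed form under the Gaussian radial kernel κ(x, y) = exp(−½ (x − y)ᵀ A (x − y)) (our assumption, as for the J^{(M)}_{i,k} closed form in entry [73]). The mixture–mixture term I_{k,s} is cut off in the extracted text, so the version below is our own derivation: for independent y ~ N(m_k, K_k) and y′ ~ N(m_s, K_s), the difference y − y′ is N(m_k − m_s, K_k + K_s). This is a sketch of one plausible objective, not the authors' estimator; the function names and toy data are hypothetical.

```python
import numpy as np

def sqrtm_psd(A):
    """Symmetric square root of a symmetric positive semi-definite matrix."""
    w, V = np.linalg.eigh(A)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

def expected_kernel(mu, C, A):
    """E[exp(-0.5 z^T A z)] for z ~ N(mu, C), in closed form."""
    As = sqrtm_psd(A)
    B = np.eye(len(mu)) + As @ C @ As
    d = As @ mu
    return np.linalg.det(B) ** -0.5 * np.exp(-0.5 * d @ np.linalg.solve(B, d))

def mmd2_mixture_vs_sample(X, weights, means, covs, A):
    """Squared MMD between sum_k pi_k N(m_k, K_k) and the empirical measure of X."""
    n_comp = len(weights)
    # Mixture-mixture term: y - y' ~ N(m_k - m_s, K_k + K_s).
    I = np.array([[expected_kernel(means[k] - means[s], covs[k] + covs[s], A)
                   for s in range(n_comp)] for k in range(n_comp)])
    # Cross term: y - x_i ~ N(m_k - x_i, K_k), i.e. the embedding J_{i,k}.
    J = np.array([[expected_kernel(means[k] - x, covs[k], A)
                   for k in range(n_comp)] for x in X])
    # Data-data term: plain Gram matrix of the sample.
    diff = X[:, None, :] - X[None, :, :]
    gram = np.exp(-0.5 * np.einsum("ijd,de,ije->ij", diff, A, diff))
    return weights @ I @ weights - 2.0 * np.mean(J @ weights) + gram.mean()

# Hypothetical two-component mixture in R^2.
rng = np.random.default_rng(0)
weights = np.array([0.6, 0.4])
means = np.array([[-2.0, 0.0], [2.0, 0.0]])
covs = np.array([0.5 * np.eye(2), 0.5 * np.eye(2)])
labels = rng.choice(2, size=500, p=weights)
X = means[labels] + rng.standard_normal((500, 2)) * np.sqrt(0.5)
A = np.eye(2)
mmd_true = mmd2_mixture_vs_sample(X, weights, means, covs, A)
mmd_wrong = mmd2_mixture_vs_sample(X, weights, np.zeros((2, 2)), covs, A)
print(mmd_true, mmd_wrong)
```

With the data-generating parameters the loss is of order 1/n, while collapsing both means to the origin makes it much larger; gradient-based training would push the parameters in the direction that shrinks this quantity.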
Figure 7: L²(0, 1; ℝ²) mixture with K = 5. (a) Raw trajectories (dimension 1) overlaid with true means. (b) True (dashed) versus predicted (solid) mean functions. (c) True versus predicted mixture weights. (d) MMD² training loss.
… mixture of Gaussian processes [12, 38, 71]. Experiment. We generate multivariate functional data in X = L²(0, 1; ℝ²) from K = 5 Gaussian components with weights π = (0.30, 0.25, 0.2…
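Sampling from a Gaussian mixture in L²(0, 1; ℝ²) and computing L² distances both have to go through a grid in practice. The sketch below is one way to do it, under assumptions of our own: each component has covariance equal to that of standard Brownian motion, drawn through its truncated Karhunen–Loève expansion (eigenfunctions √2 sin((r − ½)πt), eigenvalues 1/((r − ½)π)²), independently in each of the two output channels, and inner products use the trapezoid rule. The three-component weights and mean functions are hypothetical; the paper's own weights are truncated in the extracted text and are not reproduced here.

```python
import numpy as np

def trapezoid_inner(f, g, t):
    """Trapezoid-rule approximation of the L2(0, 1) inner product along the last axis."""
    prod = f * g
    return np.sum(0.5 * np.diff(t) * (prod[..., 1:] + prod[..., :-1]), axis=-1)

def kl_basis(t, n_terms):
    """Karhunen-Loeve eigenfunctions and eigenvalues of Brownian motion on [0, 1]."""
    r = np.arange(1, n_terms + 1)
    phi = np.sqrt(2.0) * np.sin((r[:, None] - 0.5) * np.pi * t[None, :])
    lam = 1.0 / ((r - 0.5) * np.pi) ** 2
    return phi, lam

def sample_mixture_paths(n, weights, means, t, n_terms=20, seed=0):
    """Draw n curves from sum_k pi_k N(m_k, C) on the grid t.

    means has shape (K, channels, len(t)); C is the truncated Brownian
    covariance, applied independently to each output channel.
    """
    rng = np.random.default_rng(seed)
    phi, lam = kl_basis(t, n_terms)
    labels = rng.choice(len(weights), size=n, p=weights)
    xi = rng.standard_normal((n, means.shape[1], n_terms)) * np.sqrt(lam)
    return means[labels] + np.einsum("ncr,rt->nct", xi, phi), labels

t = np.linspace(0.0, 1.0, 501)
weights = np.array([0.4, 0.35, 0.25])   # hypothetical, not the paper's values
means = np.stack([np.stack([np.sin(2 * np.pi * (k + 1) * t), (k - 1) * np.ones_like(t)])
                  for k in range(3)])   # shape (3, 2, 501)
paths, labels = sample_mixture_paths(2000, weights, means, t)
# Squared L2(0, 1; R^2) distance between two curves: integrate, then sum channels.
d2 = trapezoid_inner(paths[0] - paths[1], paths[0] - paths[1], t).sum()
```

The same `trapezoid_inner` is what would feed a kernel defined through the L² norm, so the grid resolution directly controls how faithfully the infinite-dimensional geometry is approximated.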
Figure 8: L²([0, 1]²) mixture with K = 3. Left columns: True mean surfaces m_k(s, t) (top row) and predicted surfaces (bottom row), aligned by color. Right column: MMD² training loss (top) and true versus predicted mixture weights (bottom).
Figure 9: L²(SO(3)) mixture with K = 3. Left: Data projected on S² with true and learned component directions. Middle: MMD² training loss. Right: True versus predicted mixture weights.
F.5 Graph signals. We next test the method on graph-structured data, where the Hilbert-space geometry is induced by the graph Laplacian. Let G = (V, E) be a finite weighted graph with Laplacian L = D − W. For α > 0, we equip ℝ^{|V|} …
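The graph-signal setting starts from the combinatorial Laplacian L = D − W, with D the diagonal matrix of weighted degrees. The sketch below builds L from a symmetric weight matrix and checks the identity xᵀLx = ½ Σ_{i,j} w_ij (x_i − x_j)², which is why L is positive semi-definite and why it measures how much a signal varies across edges. How α enters the inner product on ℝ^{|V|} is cut off in the extracted text, so the sketch deliberately stops at L. The random graph, its weights, and the helper names are hypothetical.

```python
import numpy as np

def graph_laplacian(W):
    """Combinatorial Laplacian L = D - W of a symmetric, non-negative weight matrix."""
    W = np.asarray(W, dtype=float)
    if not np.allclose(W, W.T):
        raise ValueError("W must be symmetric")
    return np.diag(W.sum(axis=1)) - W

def dirichlet_energy(x, W):
    """(1/2) * sum_{i,j} w_ij (x_i - x_j)^2, which equals x^T L x."""
    diff = x[:, None] - x[None, :]
    return 0.5 * np.sum(W * diff ** 2)

rng = np.random.default_rng(1)
n = 8
# Hypothetical Erdos-Renyi graph (edge probability 0.4) with random positive weights.
mask = np.triu(rng.random((n, n)) < 0.4, k=1)
W = np.where(mask, rng.uniform(0.5, 2.0, (n, n)), 0.0)
W = W + W.T
L = graph_laplacian(W)
x = rng.standard_normal(n)   # a random graph signal
print(x @ L @ x, dirichlet_energy(x, W))
```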
Figure 10: Graph-signal mixture with K = 3. Left block: True (top) versus predicted (bottom) mean signals per component, plotted on the shared Erdős–Rényi graph using a common colormap. Right column: MMD² training loss (top) and true versus predicted weights (bottom).
Figure 11: Linear SDE system identification via MMD.
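Figure 11 concerns identifying a linear SDE via MMD. The data-generating side of such an experiment can be sketched with the Euler–Maruyama scheme for dX_t = A X_t dt + Σ dW_t. The drift A, diffusion Σ, horizon, and step count below are hypothetical; the source does not state which SDE or discretization the authors use.

```python
import numpy as np

def euler_maruyama_linear(A, sigma, x0, T=1.0, n_steps=200, n_paths=1000, seed=0):
    """Simulate dX = A X dt + sigma dW with the Euler-Maruyama scheme.

    Returns an array of shape (n_paths, n_steps + 1, d).
    """
    rng = np.random.default_rng(seed)
    d = len(x0)
    dt = T / n_steps
    X = np.empty((n_paths, n_steps + 1, d))
    X[:, 0] = x0
    for s in range(n_steps):
        # Brownian increments have variance dt in each coordinate.
        dW = rng.standard_normal((n_paths, d)) * np.sqrt(dt)
        X[:, s + 1] = X[:, s] + dt * X[:, s] @ A.T + dW @ sigma.T
    return X

A = np.diag([-1.0, -0.5])   # hypothetical stable drift
sigma = 0.3 * np.eye(2)     # hypothetical diffusion
x0 = np.array([1.0, 1.0])
X = euler_maruyama_linear(A, sigma, x0, T=1.0, n_steps=200, n_paths=10000, seed=0)
```

For a diagonal drift the mean of X_T is exp(a_j T) x0_j coordinate-wise, which gives a quick sanity check on the simulator before it is used inside an MMD fit.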
Figure 12: Representative QM9 molecules per component. Columns correspond to learned components …
Figure 13: Most representative NTU skeleton sequences per component. Columns correspond to learned …
Original abstract

Modern datasets across many disciplines increasingly consist of time-evolving, potentially infinite-dimensional random objects, such as dynamic functional data, which are naturally modeled in Hilbert spaces. In these settings, characterizing probability measures, for example, through densities, can be ill-defined or technically challenging. Motivated by clustering applications, we propose a Gaussian mixture framework for Hilbert-space-valued data based on kernel mean embeddings and develop efficient optimization algorithms for estimation. We establish theoretical guarantees showing that the proposed algorithm is well defined and that the model yields a dense class of approximations in infinite-dimensional spaces. We evaluate the framework through extensive experiments on diverse structures and data geometries, including $L^2$-functional data and random graphs in Laplacian spaces arising in modern medical applications.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes a Gaussian mixture model framework for Hilbert-space-valued data (e.g., functional data or graph Laplacians) that replaces direct density modeling with kernel mean embeddings of Gaussian components. It develops associated optimization algorithms for parameter estimation and asserts two main theoretical results: (i) the algorithm is well-defined, and (ii) the resulting model class is dense in the space of probability measures on the Hilbert space. The claims are supported by experiments on L²-functional data and random graphs arising in medical applications.

Significance. If the density and well-definedness guarantees can be established rigorously, the work would supply a practical, density-free route to clustering and approximation of measures on infinite-dimensional spaces where classical densities are unavailable. The kernel-embedding approach naturally accommodates the cited data geometries and could extend existing GMM methodology beyond Euclidean settings. The experiments on medical graph data hint at downstream utility, but the absence of quantitative metrics, baselines, or error bars in the abstract leaves the practical significance difficult to gauge.

major comments (2)
  1. [Abstract] Abstract: The central claim that 'the model yields a dense class of approximations in infinite-dimensional spaces' is load-bearing for the theoretical contribution. This requires that finite mixtures of kernel mean embeddings of Gaussians are dense in the image of the embedding map over all probability measures. The manuscript appears to rely on kernel universality without an explicit argument that the restricted Gaussian-component family is dense in the weak topology induced by the RKHS; the skeptic correctly flags that injectivity of the embedding (characteristic kernel) does not automatically imply the Gaussian restriction is dense. A concrete counter-example exclusion or additional regularity condition on the kernel and the Gaussian family is needed.
  2. [Abstract] Abstract and theoretical sections: No derivation, proof sketch, or error analysis is supplied for either the well-definedness of the optimization algorithm or the density guarantee, despite the abstract asserting 'theoretical guarantees.' Without these, it is impossible to verify that the algorithm avoids degeneracy or that the approximation error can be controlled uniformly over the Hilbert space.
minor comments (1)
  1. [Abstract] Experiments are described only qualitatively ('extensive experiments on diverse structures'); quantitative results, error bars, baseline comparisons, and specific performance metrics should be added to allow assessment of practical performance.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful and constructive review. We address each major comment below and will revise the manuscript to incorporate the requested clarifications and additions to the theoretical arguments.

Point-by-point responses
  1. Referee: [Abstract] Abstract: The central claim that 'the model yields a dense class of approximations in infinite-dimensional spaces' is load-bearing for the theoretical contribution. This requires that finite mixtures of kernel mean embeddings of Gaussians are dense in the image of the embedding map over all probability measures. The manuscript appears to rely on kernel universality without an explicit argument that the restricted Gaussian-component family is dense in the weak topology induced by the RKHS; the skeptic correctly flags that injectivity of the embedding (characteristic kernel) does not automatically imply the Gaussian restriction is dense. A concrete counter-example exclusion or additional regularity condition on the kernel and the Gaussian family is needed.

    Authors: We agree that an explicit argument is required to establish density of the Gaussian-component mixtures in the weak topology induced by the RKHS, beyond mere universality of the kernel. The manuscript invokes the characteristic property to guarantee injectivity of the embedding, but does not fully detail why the Gaussian restriction preserves density. In the revision we will add a proof sketch in the theoretical section (and appendix) showing that, under the stated assumptions on the kernel (universal/characteristic) and with the Gaussian family parameterized by means and covariances that are dense in the Hilbert space, finite mixtures of their embeddings are dense in the image of the embedding map. We will also include a brief discussion excluding counterexamples by appealing to the approximation power of Gaussians under the kernel metric. revision: yes

  2. Referee: [Abstract] Abstract and theoretical sections: No derivation, proof sketch, or error analysis is supplied for either the well-definedness of the optimization algorithm or the density guarantee, despite the abstract asserting 'theoretical guarantees.' Without these, it is impossible to verify that the algorithm avoids degeneracy or that the approximation error can be controlled uniformly over the Hilbert space.

    Authors: We acknowledge that the abstract asserts theoretical guarantees while the main text provides only high-level statements without full derivations or error bounds. In the revised manuscript we will expand the theoretical sections to include (i) a derivation establishing well-definedness of the optimization algorithm together with regularization conditions that prevent degeneracy, and (ii) a proof sketch plus error analysis for the density result that yields uniform approximation bounds over bounded sets in the Hilbert space. These additions will be placed in the main theoretical development and supported by an appendix containing the complete arguments. revision: yes

Circularity Check

0 steps flagged

No circularity: density and well-definedness claims are external theorems

Full rationale

The paper's central claims rest on establishing that the GMM kernel embedding algorithm is well-defined and that finite mixtures yield dense approximations in the RKHS. These are presented as theoretical results derived from properties of kernel mean embeddings and Gaussian mixtures, not as quantities defined in terms of themselves or as fitted parameters relabeled as predictions. No equations or self-citations are shown reducing the density guarantee to a tautology (e.g., no self-definitional embedding or ansatz smuggled via prior work by the same authors). The derivation chain therefore remains self-contained against external kernel universality results and does not collapse by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no specific free parameters, axioms, or invented entities can be extracted. The approach relies on standard kernel mean embeddings and Hilbert-space structure, but details of any kernel choices, regularization, or convergence assumptions are absent.

pith-pipeline@v0.9.0 · 5422 in / 1106 out tokens · 69367 ms · 2026-05-08T05:12:49.646034+00:00 · methodology


Reference graph

Works this paper leans on

74 extracted references · 74 canonical work pages

  [1] J. O. Ramsay et al. Functional Data Analysis. Springer Series in Statistics. Springer New York, 2005.

  [2] Frédéric Ferraty et al. Nonparametric Functional Data Analysis. Springer Series in Statistics. Springer New York, 2006.

  [3] Chris Fraley et al. “Model-Based Clustering, Discriminant Analysis, and Density Estimation”. In: Journal of the American Statistical Association 97.458 (2002), pp. 611–631.

  [4] A. P. Dempster et al. “Maximum Likelihood from Incomplete Data Via the EM Algorithm”. In: Journal of the Royal Statistical Society Series B: Statistical Methodology 39.1 (1977), pp. 1–22.

  [5] Giuseppe Da Prato et al. Stochastic Equations in Infinite Dimensions. Encyclopedia of Mathematics and Its Applications. Cambridge University Press, 2014.

  [6] Haoyu Lu et al. Sequential Monte Carlo with Gaussian Mixture Approximation for Infinite-Dimensional Statistical Inverse Problems. 2026.

  [7] Arthur Gretton et al. “A Kernel Two-Sample Test”. In: Journal of Machine Learning Research 13.25 (2012), pp. 723–773.

  [8] Bharath K. Sriperumbudur et al. “Universality, Characteristic Kernels and RKHS Embedding of Measures”. In: Journal of Machine Learning Research 12.70 (2011), pp. 2389–2410.

  [9] Paromita Dubey et al. “Functional Models for Time-Varying Random Objects”. In: Journal of the Royal Statistical Society Series B: Statistical Methodology 82.2 (2020), pp. 275–327.

  [10] Paromita Dubey et al. “Modeling Time-Varying Random Objects and Dynamic Networks”. In: Journal of the American Statistical Association 117.540 (2022), pp. 2252–2267.

  [11] R Paul Wadwa et al. “Trial of hybrid closed-loop control in young children with type 1 diabetes”. In: New England Journal of Medicine 388.11 (2023), pp. 991–1001.

  [12] Antonio Álvarez-López et al. Continuous-Time Learning of Probability Distributions: A Case Study in a Digital Trial of Young Children with Type 1 Diabetes. 2026. arXiv:2603.24427.

  [13] François-Xavier Briol et al. A Dictionary of Closed-Form Kernel Mean Embeddings. 2025.

  [14] Xavier Bay et al. “Karhunen–Loève Decomposition of Gaussian Measures on Banach Spaces”. In: Probability and Mathematical Statistics 39.2 (2019), pp. 279–297.

  [15] Charles Bouveyron et al. “Model-Based Clustering of Time Series in Group-Specific Functional Subspaces”. In: Advances in Data Analysis and Classification 5.4 (2011), pp. 281–300.

  [16] Jeng-Min Chiou et al. “Functional Clustering and Identifying Substructures of Longitudinal Data”. In: Journal of the Royal Statistical Society Series B: Statistical Methodology 69.4 (2007), pp. 679–699.

  [17] Gareth M James et al. “Clustering for Sparsely Sampled Functional Data”. In: Journal of the American Statistical Association 98.462 (2003), pp. 397–408.

  [18] M. Giacofci et al. “Wavelet-Based Clustering for Mixed-Effects Functional Models in High Dimension”. In: Biometrics 69.1 (2013), pp. 31–40.

  [19] Julien Jacques et al. “Funclust: A Curves Clustering Method Using Functional Random Variables Density Approximation”. In: Neurocomputing 112 (2013), pp. 164–171.

  [20] Vladimir Bogachev. Gaussian Measures. Vol. 62. Mathematical Surveys and Monographs. American Mathematical Society, 1998.

  [21] Aurore Delaigle et al. “Defining Probability Density for a Distribution of Random Functions”. In: The Annals of Statistics 38.2 (2010).

  [22] María Luz López García et al. “K-Means Algorithms for Functional Data”. In: Neurocomputing 151 (2015), pp. 231–245.

  [23] Laura Ferreira et al. “A Comparison of Hierarchical Methods for Clustering Functional Data”. In: Communications in Statistics - Simulation and Computation 38.9 (2009), pp. 1925–1949.

  [24] Alex Smola et al. “A Hilbert Space Embedding for Distributions”. In: Algorithmic Learning Theory. Ed. by Marcus Hutter et al. Springer, 2007, pp. 13–31.

  [25] Krikamol Muandet et al. “Kernel Mean Embedding of Distributions: A Review and Beyond”. In: Foundations and Trends® in Machine Learning 10.1–2 (2017), pp. 1–141.

  [26] Alain Berlinet et al. Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer US, 2004.

  [27] Dino Sejdinovic et al. “Equivalence of Distance-Based and RKHS-based Statistics in Hypothesis Testing”. In: The Annals of Statistics 41.5 (2013).

  [28] Yujia Li et al. “Generative Moment Matching Networks”. In: Proceedings of the 32nd International Conference on Machine Learning. PMLR, 2015, pp. 1718–1727.

  [29] François-Xavier Briol et al. Statistical Inference for Generative Models with Maximum Mean Discrepancy. 2019.

  [30] Badr-Eddine Chérief-Abdellatif et al. “Finite Sample Properties of Parametric MMD Estimation: Robustness to Misspecification and Dependence”. In: Bernoulli 28.1 (2022), pp. 181–213.

  [31] Ilya Tolstikhin et al. “Minimax Estimation of Kernel Mean Embeddings”. In: Journal of Machine Learning Research 18.86 (2017), pp. 1–47.

  [32] Guilherme França et al. Kernel K-Groups via Hartigan’s Method. arXiv.org, 2017.

  [33] Marcos Matabuena et al. “Kernel Biclustering Algorithm in Hilbert Spaces”. In: Advances in Data Analysis and Classification (2025).

  [34] Mathis Antonetti et al. “An Analysis of Distributional Reinforcement Learning with Gaussian Mixtures”. In: Transactions on Machine Learning Research (2025).

  [35] George Wynne et al. “A Kernel Two-Sample Test for Functional Data”. In: Journal of Machine Learning Research 23.73 (2022), pp. 1–51.

  [36] D. M. Titterington et al. Statistical Analysis of Finite Mixture Distributions. Wiley, 1985.

  [37] Chengliang Tang et al. “Wasserstein Distributional Learning via Majorization-Minimization”. In: Proceedings of The 26th International Conference on Artificial Intelligence and Statistics. PMLR, 2023, pp. 10703–10731.

  [38] Volker Tresp. “Mixtures of Gaussian Processes”. In: Advances in Neural Information Processing Systems. Vol. 13. MIT Press, 2000.

  [39] Mian Huang et al. “Estimating Mixture of Gaussian Processes by Kernel Smoothing”. In: Journal of Business & Economic Statistics 32.2 (2014), pp. 259–270.

  [40] Ian C. McDowell et al. “Clustering Gene Expression Time Series Data Using an Infinite Gaussian Process Mixture Model”. In: PLOS Computational Biology 14.1 (2018), e1005896.

  [41] Victor M. Panaretos et al. “Statistical Aspects of Wasserstein Distances”. In: Annual Review of Statistics and Its Application 6 (2019), pp. 405–431.

  [42] Erich Schubert et al. “Fast and Eager k-Medoids Clustering: O(k) Runtime Improvement of the PAM, CLARA, and CLARANS Algorithms”. In: Information Systems 101 (2021), p. 101804.

  [43] D. Sculley. “Web-Scale k-Means Clustering”. In: Proceedings of the 19th International Conference on World Wide Web. ACM, 2010, pp. 1177–1178.

  [44] Fionn Murtagh et al. “Algorithms for Hierarchical Clustering: An Overview”. In: WIREs Data Mining and Knowledge Discovery 2.1 (2012), pp. 86–97.

  [45] Joe H. Ward. “Hierarchical Grouping to Optimize an Objective Function”. In: Journal of the American Statistical Association 58.301 (1963), pp. 236–244.

  [46] Tian Zhang et al. “BIRCH: An Efficient Data Clustering Method for Very Large Databases”. In: ACM SIGMOD Record 25.2 (1996), pp. 103–114.

  [47] Martin Ester et al. “A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise”. In: Proceedings of the Second International Conference on Knowledge Discovery and Data Mining. KDD’96. AAAI Press, 1996, pp. 226–231.

  [48] Mihael Ankerst et al. “OPTICS: Ordering Points to Identify the Clustering Structure”. In: Proceedings of the 1999 ACM SIGMOD International Conference on Management of Data. ACM, 1999, pp. 49–60.

  [49] Andrew Y. Ng et al. “On Spectral Clustering: Analysis and an Algorithm”. In: Proceedings of the 15th International Conference on Neural Information Processing Systems: Natural and Synthetic. NIPS’01. MIT Press, 2001, pp. 849–856.

  [50] F. Pedregosa et al. “Scikit-Learn: Machine Learning in Python”. In: Journal of Machine Learning Research 12 (2011), pp. 2825–2830.

  [51] Leo Breiman et al. Classification And Regression Trees. Routledge, 2017.

  [52] Cristian Preda et al. “Categorical Functional Data Analysis. The cfda R Package”. In: Mathematics 9.23 (2021), p. 3074.

  [53] Robert Thomas Olszewski. Generalized Feature Extraction for Structural Pattern Recognition in Time-Series Data. Carnegie Mellon University, 2001.

  [54] Asim Kumar Debnath et al. “Structure-Activity Relationship of Mutagenic Aromatic and Heteroaromatic Nitro Compounds. Correlation with Molecular Orbital Energies and Hydrophobicity”. In: Journal of Medicinal Chemistry 34.2 (1991), pp. 786–797.

  [55] Leonard Kaufman et al. Finding Groups in Data: An Introduction to Cluster Analysis. Wiley Series in Probability and Statistics. Wiley, 1990.

  [56] G. N. Lance et al. “A General Theory of Classificatory Sorting Strategies: 1. Hierarchical Systems”. In: The Computer Journal 9.4 (1967), pp. 373–380.

  [57] Ricardo J. G. B. Campello et al. “Hierarchical Density Estimates for Data Clustering, Visualization, and Outlier Detection”. In: ACM Transactions on Knowledge Discovery from Data 10.1 (2015), pp. 1–51.

  [58] Teofilo F. Gonzalez. “Clustering to Minimize the Maximum Intercluster Distance”. In: Theoretical Computer Science 38 (1985), pp. 293–306.

  [59] James C. Bezdek. Pattern Recognition with Fuzzy Objective Function Algorithms. Springer US, 1981.

  [60] Brendan J. Frey et al. “Clustering by Passing Messages Between Data Points”. In: Science 315.5814 (2007), pp. 972–976.

  [61] Yizong Cheng. “Mean Shift, Mode Seeking, and Clustering”. In: IEEE Transactions on Pattern Analysis and Machine Intelligence 17.8 (1995), pp. 790–799.

  [62] Jan S Krouwer et al. “A Review of Standards and Statistics Used to Describe Blood Glucose Monitor Performance”. In: Journal of Diabetes Science and Technology 4.1 (2010), pp. 75–83.

  [63] William Clarke et al. “Statistical Tools to Analyze Continuous Glucose Monitor Data”. In: Diabetes Technology & Therapeutics 11 (2009), S–45.

  [64] Kaustubh Supekar et al. “Network Analysis of Intrinsic Functional Brain Connectivity in Alzheimer’s Disease”. In: PLOS Computational Biology 4.6 (2008), e1000100.

  [65] Tao Wu et al. “Parkinson’s Disease-Related Spatial Covariance Pattern Identified with Resting-State Functional MRI”. In: Journal of Cerebral Blood Flow & Metabolism 35.11 (2015), pp. 1764–1770.

  [66] Martin Tauschmann et al. “Closed-Loop Insulin Delivery in Suboptimally Controlled Type 1 Diabetes: A Multicentre, 12-Week Randomised Trial”. In: The Lancet 392.10155 (2018), pp. 1321–1329.

  [67] Ricky T. Q. Chen et al. Neural Ordinary Differential Equations. 2019.

  [68] Jonathan R. Bradley. “An Approach to Incorporate Subsampling Into a Generic Bayesian Hierarchical Model”. In: Journal of Computational and Graphical Statistics 30.4 (2021), pp. 889–905.

  [69] Carlos Ramos-Carreño et al. “scikit-fda: A Python Package for Functional Data Analysis”. In: Journal of Statistical Software 109.2 (2024).

  [70] Christopher M. Bishop. Pattern Recognition and Machine Learning. Information Science and Statistics. Springer, 2006.

  [71] Carl Edward Rasmussen et al. Gaussian Processes for Machine Learning. Adaptive Computation and Machine Learning. MIT Press, 2008.

  [72] Amir Shahroudy et al. NTU RGB+D: A Large Scale Dataset for 3D Human Activity Analysis. 2016.

  A Background. This section establishes the notation and foundational properties of Gaussian measures and kernel methods used throughout this work. Appendix E discusses the extension of this framework to Banach spaces. A.1 Gaussian measures on separable Hilbert sp...

  [73] Gaussian radial kernel (Proposition B.1). Let $A \in \mathbb{R}^{M \times M}$ denote the matrix representation of the restricted operator $A|_{X_M}$ in the basis $(e_r)_{r=1}^{M}$, with entries $A_{r\ell} = \langle A e_\ell, e_r \rangle_X$. The closed forms of Proposition B.1 become the finite-dimensional approximations

  $$J^{(M)}_{i,k} := \det\big(I_M + A^{1/2} K_k A^{1/2}\big)^{-1/2} \exp\Big\{ -\tfrac{1}{2} (x_i - m_k)^\top A^{1/2} \big(I_M + A^{1/2} K_k A^{1/2}\big)^{-1} A^{1/2} (x_i - m_k) \Big\},$$

  $I^{(M)}_{k,s} :=$ ...

  [74] Polynomial kernel (Proposition B.2). Fix an integer $p \ge 1$ and $c \ge 0$ and consider $\kappa(x, y) = (\langle x, y \rangle_X + c)^p$. In the projected space, this becomes $\kappa(x, y) = (x^\top y + c)^p$ on $\mathbb{R}^M$. Computing $J^{(M)}_{i,k}$: for $y \sim \nu_{k,M}$, define the scalars $\mu^{(M)}_{i,k} := x_i^\top m_k + c$ and $v^{(M)}_{i,k} := x_i^\top K_k x_i$, so that $x_i^\top y + c$ is a one-dimensional Gaussian with mean $\mu^{(M)}_{i,k}$ and variance $v^{(M)}_{i,k}$. Then Proposit...
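The extracted text breaks off before the closed form of J^{(M)}_{i,k} for the polynomial kernel. Since xᵢᵀy + c is N(μ, v), one standard way to finish the computation is the binomial expansion E[(μ + √v Z)^p] = Σ_j C(p, j) μ^{p−j} v^{j/2} E[Z^j], with E[Z^j] = (j − 1)!! for even j and 0 for odd j. The sketch below implements that identity; it may differ in form from the expression the authors actually state, and the function names and test values are hypothetical.

```python
import math
import numpy as np

def gaussian_raw_moment(mu, v, p):
    """E[(mu + sqrt(v) Z)^p] for Z ~ N(0, 1), via the binomial expansion."""
    total = 0.0
    for j in range(0, p + 1, 2):                      # odd moments of Z vanish
        double_fact = math.prod(range(j - 1, 0, -2))  # (j-1)!!, equal to 1 for j = 0
        total += math.comb(p, j) * mu ** (p - j) * v ** (j // 2) * double_fact
    return total

def poly_kernel_embedding(x, m, K, p, c):
    """E_{y ~ N(m, K)}[(x^T y + c)^p] in a finite-dimensional projection."""
    mu = x @ m + c   # mean of the scalar Gaussian x^T y + c
    v = x @ K @ x    # its variance
    return gaussian_raw_moment(mu, v, p)

# Hypothetical two-dimensional example with a cubic kernel.
x = np.array([1.0, 0.5])
m = np.array([0.2, -0.1])
K = np.diag([0.3, 0.2])
p, c = 3, 1.0
J = poly_kernel_embedding(x, m, K, p, c)
print(J)
```

For p = 2 the expansion reduces to μ² + v, and for p = 4 to μ⁴ + 6μ²v + 3v², which are quick hand checks on the implementation.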