Gaussian mixture models in Hilbert spaces via kernel methods
Pith reviewed 2026-05-08 05:12 UTC · model grok-4.3
The pith
Gaussian mixture models can be defined for data in Hilbert spaces using kernel mean embeddings to achieve dense approximations.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central contribution is a kernel-based Gaussian mixture model for Hilbert-space valued data. The model uses kernel mean embeddings to define mixture components, allowing for efficient optimization of the parameters. Theoretical analysis confirms the algorithm's validity and demonstrates that the class of such models is dense in the space of all probability measures on the Hilbert space.
What carries the argument
Kernel mean embeddings of Gaussian mixture components, which map the mixtures into a reproducing kernel Hilbert space to enable computation and approximation in the original infinite-dimensional space.
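Kernel mean embeddings turn comparisons between probability measures into kernel evaluations. The sketch below makes two simplifying assumptions that are not the paper's setup. First, the data are taken to be already projected onto a finite basis, so each observation is a vector in $\mathbb{R}^M$. Second, a Gaussian RBF kernel is used. Under those assumptions, the code forms the empirical embedding $\mu_P(\cdot) = \frac1n\sum_i k(x_i,\cdot)$ and the biased (V-statistic) squared maximum mean discrepancy, which is the RKHS distance between two embeddings. The function names are hypothetical.

```python
import numpy as np

def rbf_gram(X, Y, lengthscale=1.0):
    """Gram matrix k(x_i, y_j) = exp(-||x_i - y_j||^2 / (2 * lengthscale^2))."""
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * lengthscale**2))  # clip round-off negatives

def mean_embedding(X, lengthscale=1.0):
    """Empirical kernel mean embedding mu_P(t) = (1/n) sum_i k(x_i, t), returned as a callable."""
    return lambda T: rbf_gram(np.atleast_2d(T), X, lengthscale).mean(axis=1)

def mmd2(X, Y, lengthscale=1.0):
    """Biased estimate of ||mu_P - mu_Q||^2 in the RKHS from samples X ~ P and Y ~ Q."""
    return (rbf_gram(X, X, lengthscale).mean()
            - 2.0 * rbf_gram(X, Y, lengthscale).mean()
            + rbf_gram(Y, Y, lengthscale).mean())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 3))   # hypothetical projected observations
    print(mmd2(X, X), mmd2(X, X + 3.0))
```

With a characteristic kernel, a population MMD of zero identifies the two measures. This is the property the density and clustering arguments lean on.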
If this is right
- Mixtures defined this way can densely approximate arbitrary probability distributions on Hilbert spaces.
- The optimization algorithms provide practical estimation for clustering infinite-dimensional observations.
- The approach applies directly to L² functional data and to random graphs represented in Laplacian spaces.
- Theoretical guarantees ensure the model remains well-defined even when the underlying space is infinite-dimensional.
Where Pith is reading between the lines
- One could test whether this embedding approach outperforms standard dimensionality reduction techniques in clustering accuracy for functional datasets.
- The density result suggests potential use in nonparametric density estimation tasks within Hilbert spaces.
- Extensions to time-series dependencies or other kernel choices might broaden applicability to dynamic data without additional assumptions.
Load-bearing premise
That the kernel mean embeddings preserve enough structure from the Gaussian components in the Hilbert space to allow both practical optimization and dense approximation of measures.
What would settle it
Observing a specific probability measure on a Hilbert space, such as a non-Gaussian distribution with certain smoothness properties, that no finite Gaussian mixture can approximate to within some fixed positive distance in the metric induced by the kernel embedding (the maximum mean discrepancy).
Original abstract
Modern datasets across many disciplines increasingly consist of time-evolving, potentially infinite-dimensional random objects, such as dynamic functional data, which are naturally modeled in Hilbert spaces. In these settings, characterizing probability measures, for example, through densities, can be ill-defined or technically challenging. Motivated by clustering applications, we propose a Gaussian mixture framework for Hilbert-space-valued data based on kernel mean embeddings and develop efficient optimization algorithms for estimation. We establish theoretical guarantees showing that the proposed algorithm is well defined and that the model yields a dense class of approximations in infinite-dimensional spaces. We evaluate the framework through extensive experiments on diverse structures and data geometries, including $L^2$-functional data and random graphs in Laplacian spaces arising in modern medical applications.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a Gaussian mixture model framework for Hilbert-space-valued data (e.g., functional data or graph Laplacians) that replaces direct density modeling with kernel mean embeddings of Gaussian components. It develops associated optimization algorithms for parameter estimation and asserts two main theoretical results: (i) the algorithm is well-defined, and (ii) the resulting model class is dense in the space of probability measures on the Hilbert space. The claims are supported by experiments on L²-functional data and random graphs arising in medical applications.
Significance. If the density and well-definedness guarantees can be established rigorously, the work would supply a practical, density-free route to clustering and approximation of measures on infinite-dimensional spaces where classical densities are unavailable. The kernel-embedding approach naturally accommodates the cited data geometries and could extend existing GMM methodology beyond Euclidean settings. The experiments on medical graph data hint at downstream utility, but the absence of quantitative metrics, baselines, or error bars in the abstract leaves the practical significance difficult to gauge.
major comments (2)
- [Abstract] The central claim that 'the model yields a dense class of approximations in infinite-dimensional spaces' is load-bearing for the theoretical contribution. It requires that finite mixtures of kernel mean embeddings of Gaussians be dense in the image of the embedding map over all probability measures. The manuscript appears to rely on kernel universality without an explicit argument that the restricted Gaussian-component family is dense in the weak topology induced by the RKHS. Injectivity of the embedding (a characteristic kernel) does not by itself imply that the Gaussian restriction is dense. An explicit argument excluding counterexamples, or an additional regularity condition on the kernel and the Gaussian family, is needed.
- [Abstract] Abstract and theoretical sections: No derivation, proof sketch, or error analysis is supplied for either the well-definedness of the optimization algorithm or the density guarantee, despite the abstract asserting 'theoretical guarantees.' Without these, it is impossible to verify that the algorithm avoids degeneracy or that the approximation error can be controlled uniformly over the Hilbert space.
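The first major comment asks whether finite Gaussian-component mixtures can approximate arbitrary measures in the kernel metric. A common ingredient in such arguments is that an empirical measure $P_n = \frac1n\sum_i \delta_{x_i}$ can be approximated by Gaussians centered at the sample points with shrinking covariance. This is offered as an illustration only, not as the paper's proof.

The sketch makes several assumptions:
- It works on a finite-dimensional projection $\mathbb{R}^M$.
- It uses the kernel $\exp(-\tfrac12\|x-y\|^2)$.
- It uses covariances $\varepsilon C$, where $C$ is a fixed diagonal with entries decaying like $1/r^2$. This choice is needed because $\varepsilon I$ is not trace-class and so does not define a Gaussian measure in infinite dimensions.

The names `gaussian_kernel_expectation`, `mmd2_blurred` and `c_diag` are hypothetical. Since $Q_\varepsilon = \frac1n\sum_i N(x_i,\varepsilon C)$ is $P_n$ convolved with $N(0,\varepsilon C)$, the exact squared MMD between the two should decrease monotonically to zero as $\varepsilon$ shrinks.

```python
import numpy as np

def gaussian_kernel_expectation(d, s):
    """E exp(-0.5 * ||d + z||^2) for z ~ N(0, diag(s)), taken over the last axis.

    Each coordinate contributes (1 + s_r)^(-1/2) * exp(-0.5 * d_r^2 / (1 + s_r));
    with s = 0 this is the kernel value exp(-0.5 * ||d||^2) itself."""
    return np.prod((1.0 + s) ** -0.5 * np.exp(-0.5 * d**2 / (1.0 + s)), axis=-1)

def mmd2_blurred(X, eps, c_diag):
    """Exact squared MMD between P_n = (1/n) sum_i delta_{x_i} and
    Q_eps = (1/n) sum_i N(x_i, eps * diag(c_diag)) for the kernel exp(-0.5 ||x - y||^2)."""
    D = X[:, None, :] - X[None, :, :]                        # pairwise differences, shape (n, n, M)
    s = eps * np.asarray(c_diag, dtype=float)
    k_pp = gaussian_kernel_expectation(D, np.zeros_like(s))  # point vs point
    k_pq = gaussian_kernel_expectation(D, s)                 # point vs Gaussian component
    k_qq = gaussian_kernel_expectation(D, 2.0 * s)           # component vs component: covariance adds
    return k_pp.mean() - 2.0 * k_pq.mean() + k_qq.mean()

if __name__ == "__main__":
    X = np.random.default_rng(0).normal(size=(20, 3))
    c_diag = 1.0 / np.arange(1, 4) ** 2   # hypothetical decaying, "trace-class-like" diagonal
    for eps in [1.0, 0.1, 0.01]:
        print(eps, mmd2_blurred(X, eps, c_diag))
```

This shows only one step of a density argument in a projected space. It does not settle the infinite-dimensional subtleties the referee raises.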
minor comments (1)
- [Abstract] Experiments are described only qualitatively ('extensive experiments on diverse structures'); quantitative results, error bars, baseline comparisons, and specific performance metrics should be added to allow assessment of practical performance.
Simulated Author's Rebuttal
We thank the referee for the careful and constructive review. We address each major comment below and will revise the manuscript to incorporate the requested clarifications and additions to the theoretical arguments.
Referee: [Abstract] Abstract: The central claim that 'the model yields a dense class of approximations in infinite-dimensional spaces' is load-bearing for the theoretical contribution. This requires that finite mixtures of kernel mean embeddings of Gaussians are dense in the image of the embedding map over all probability measures. The manuscript appears to rely on kernel universality without an explicit argument that the restricted Gaussian-component family is dense in the weak topology induced by the RKHS; the skeptic correctly flags that injectivity of the embedding (characteristic kernel) does not automatically imply the Gaussian restriction is dense. A concrete counter-example exclusion or additional regularity condition on the kernel and the Gaussian family is needed.
Authors: We agree that an explicit argument is required to establish density of the Gaussian-component mixtures in the weak topology induced by the RKHS, beyond mere universality of the kernel. The manuscript invokes the characteristic property to guarantee injectivity of the embedding but does not detail why the Gaussian restriction preserves density. In the revision we will add a proof sketch to the theoretical section (with details in the appendix). It will show that finite mixtures of the component embeddings are dense in the image of the embedding map, under two conditions. The first is the stated assumptions on the kernel (universal/characteristic). The second is that the Gaussian components have means ranging over a dense subset of the Hilbert space and admissible (trace-class) covariance operators. We will also briefly discuss why counterexamples are excluded, appealing to the approximation power of Gaussians under the kernel metric. revision: yes
Referee: [Abstract] Abstract and theoretical sections: No derivation, proof sketch, or error analysis is supplied for either the well-definedness of the optimization algorithm or the density guarantee, despite the abstract asserting 'theoretical guarantees.' Without these, it is impossible to verify that the algorithm avoids degeneracy or that the approximation error can be controlled uniformly over the Hilbert space.
Authors: We acknowledge that the abstract asserts theoretical guarantees while the main text provides only high-level statements without full derivations or error bounds. In the revised manuscript we will expand the theoretical sections to include (i) a derivation establishing well-definedness of the optimization algorithm together with regularization conditions that prevent degeneracy, and (ii) a proof sketch plus error analysis for the density result that yields uniform approximation bounds over bounded sets in the Hilbert space. These additions will be placed in the main theoretical development and supported by an appendix containing the complete arguments. revision: yes
Circularity Check
No circularity: density and well-definedness claims are external theorems
The paper's central claims rest on establishing that the GMM kernel embedding algorithm is well-defined and that finite mixtures yield dense approximations in the RKHS. These are presented as theoretical results derived from properties of kernel mean embeddings and Gaussian mixtures, not as quantities defined in terms of themselves or as fitted parameters relabeled as predictions. No equations or self-citations are shown reducing the density guarantee to a tautology (e.g., no self-definitional embedding or ansatz smuggled via prior work by the same authors). The derivation chain therefore remains self-contained against external kernel universality results and does not collapse by construction.
Appendix excerpts
A Background. This section establishes the notation and foundational properties of Gaussian measures and kernel methods used throughout this work. Appendix E discusses the extension of this framework to Banach spaces.
A.1 Gaussian measures on separable Hilbert sp...
Gaussian radial kernel (Proposition B.1). Let $A \in \mathbb{R}^{M\times M}$ denote the matrix representation of the restricted operator $A|_{X_M}$ in the basis $(e_r)_{r=1}^M$, with entries $A_{r\ell} = \langle A e_\ell, e_r\rangle_X$. The closed forms of Proposition B.1 become the finite-dimensional approximations
$$J^{(M)}_{i,k} := \det\left(I_M + A^{1/2} K_k A^{1/2}\right)^{-1/2} \exp\left\{-\tfrac12 (x_i - m_k)^\top A^{1/2}\left(I_M + A^{1/2} K_k A^{1/2}\right)^{-1} A^{1/2}(x_i - m_k)\right\},$$
$I^{(M)}_{k,s} := \ldots$
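A quick numerical check of the $J^{(M)}_{i,k}$ formula is sketched below. It rests on two readings of ours that the excerpt does not state. The first is that the Gaussian radial kernel is $\kappa(x,y) = \exp\{-\tfrac12\langle A(x-y), x-y\rangle_X\}$. The second is that $\nu_{k,M} = N(m_k, K_k)$ on $\mathbb{R}^M$. Under those assumptions, $J^{(M)}_{i,k} = \mathbb{E}_{y\sim\nu_{k,M}}\,\kappa(x_i, y)$, the kernel mean embedding of component $k$ evaluated at $x_i$. The code compares the closed form against a Monte Carlo average. The helper names and test values are hypothetical.

```python
import numpy as np

def psd_sqrt(S):
    """Symmetric square root of a symmetric positive semidefinite matrix."""
    w, V = np.linalg.eigh(S)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T  # clip tiny negative eigenvalues from round-off

def J_closed_form(x, m, K, A):
    """det(I + A^{1/2} K A^{1/2})^{-1/2} * exp(-1/2 (x-m)^T A^{1/2} (I + A^{1/2} K A^{1/2})^{-1} A^{1/2} (x-m))."""
    A_half = psd_sqrt(A)
    B = np.eye(len(x)) + A_half @ K @ A_half
    u = A_half @ (x - m)
    return np.linalg.det(B) ** -0.5 * np.exp(-0.5 * u @ np.linalg.solve(B, u))

def J_monte_carlo(x, m, K, A, n_samples=200_000, seed=0):
    """Monte Carlo estimate of E_{y ~ N(m, K)} exp(-1/2 (x-y)^T A (x-y))."""
    rng = np.random.default_rng(seed)
    y = rng.multivariate_normal(m, K, size=n_samples)
    diff = x - y  # shape (n_samples, M)
    return np.exp(-0.5 * np.einsum("ni,ij,nj->n", diff, A, diff)).mean()

if __name__ == "__main__":
    x = np.array([0.3, -0.2, 0.5])
    m = np.array([0.0, 0.1, 0.2])
    K = np.array([[0.5, 0.1, 0.0], [0.1, 0.4, 0.05], [0.0, 0.05, 0.3]])
    A = np.diag([1.0, 2.0, 0.5])
    print(J_closed_form(x, m, K, A), J_monte_carlo(x, m, K, A))
```

Setting $K_k = 0$ recovers the kernel value $\kappa(x_i, m_k)$, which is a convenient sanity check on any implementation of the formula.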
Polynomial kernel (Proposition B.2). Fix an integer $p \ge 1$ and $c \ge 0$ and consider $\kappa(x,y) = (\langle x, y\rangle_X + c)^p$. In the projected space, this becomes $\kappa(x,y) = (x^\top y + c)^p$ on $\mathbb{R}^M$.
Computing $J^{(M)}_{i,k}$. For $y \sim \nu_{k,M}$, define the scalars $\mu^{(M)}_{i,k} := x_i^\top m_k + c$ and $v^{(M)}_{i,k} := x_i^\top K_k x_i$, so that $x_i^\top y + c$ is a one-dimensional Gaussian with mean $\mu^{(M)}_{i,k}$ and variance $v^{(M)}_{i,k}$. Then Proposit...
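The excerpt breaks off before the closed form. Reading $J^{(M)}_{i,k}$ as $\mathbb{E}[(x_i^\top y + c)^p]$ with $y \sim N(m_k, K_k)$, an assumption on our part, one standard way to evaluate it is the raw-moment identity for a Gaussian:
$$\mathbb{E}[Z^p] = \sum_{j \text{ even}} \binom{p}{j} \mu^{p-j} v^{j/2} (j-1)!!, \qquad Z \sim N(\mu, v).$$
This is our illustration and may differ in form from the paper's Proposition B.2. The helper names are hypothetical.

```python
import math
import numpy as np

def double_factorial(n):
    """n!! with the convention (-1)!! = 0!! = 1."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result

def gaussian_raw_moment(mu, v, p):
    """E[Z^p] for Z ~ N(mu, v): expand (mu + sqrt(v) G)^p binomially; odd moments of the
    standard normal G vanish and E[G^j] = (j - 1)!! for even j."""
    return sum(math.comb(p, j) * mu ** (p - j) * v ** (j // 2) * double_factorial(j - 1)
               for j in range(0, p + 1, 2))

def J_polynomial(x, m, K, c, p):
    """E_{y ~ N(m, K)} (x^T y + c)^p, using x^T y + c ~ N(x^T m + c, x^T K x)."""
    mu = x @ m + c   # mean of the scalar Gaussian
    v = x @ K @ x    # variance of the scalar Gaussian
    return gaussian_raw_moment(mu, v, p)

if __name__ == "__main__":
    x = np.array([0.3, -0.2, 0.5])
    m = np.array([0.0, 0.1, 0.2])
    K = np.array([[0.5, 0.1, 0.0], [0.1, 0.4, 0.05], [0.0, 0.05, 0.3]])
    print(J_polynomial(x, m, K, c=1.0, p=3))
```

Because the kernel depends on $y$ only through the scalar $x_i^\top y$, the $M$-dimensional expectation collapses to a one-dimensional Gaussian moment. This makes the polynomial case cheap to evaluate exactly.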