Learning Mixture Models via Efficient High-dimensional Sparse Fourier Transforms

Alkis Kalavasis; Manolis Zampetakis; Pravesh K. Kothari; Shuchen Li

arxiv: 2601.05157 · v2 · pith:UBL2LPB4new · submitted 2026-01-08 · 💻 cs.DS · cs.LG· stat.ML

Learning Mixture Models via Efficient High-dimensional Sparse Fourier Transforms

Alkis Kalavasis , Pravesh K. Kothari , Shuchen Li , Manolis Zampetakis This is my paper

Pith reviewed 2026-05-22 12:15 UTC · model grok-4.3

classification 💻 cs.DS cs.LGstat.ML

keywords mixture modelssparse Fourier transformsheavy-tailed distributionsparameter learninghigh-dimensional statisticsrobust estimation

0 comments

The pith

A poly(d,k) algorithm learns parameters of heavy-tailed spherical mixture models without separation or finite moments.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a polynomial-time algorithm to recover the parameters of a mixture of k spherical distributions in d dimensions. It succeeds precisely when the component distributions have characteristic functions with sufficiently heavy tails, covering cases like the Laplace distribution that lack finite covariances. Prior moment-based methods require super-polynomial samples even for Laplace mixtures and implicitly rely on low-degree moments. The new approach uses high-dimensional sparse Fourier transforms and imposes no minimum separation between means, unlike spherical Gaussian mixtures.

Core claim

We give a poly(d,k) time and sample algorithm for learning the parameters of a mixture of k spherical distributions that succeeds whenever the cluster distributions have a characteristic function with sufficiently heavy tails. Such distributions include the Laplace distribution but exclude Gaussians. The algorithm requires no minimum separation between the cluster means and composes with existing techniques to give best-of-both-worlds guarantees for mixtures containing both heavy-tailed and sub-Gaussian components.

What carries the argument

Efficient high-dimensional sparse Fourier transforms applied to the characteristic function of the mixture to recover component parameters.

If this is right

The method handles heavy-tailed distributions without finite covariances that moment-based algorithms cannot learn efficiently.
No minimum separation between means is required, in contrast to Gaussian mixtures.
The algorithm composes with sub-Gaussian techniques to obtain combined guarantees.
It yields a consistent robust mean estimator against noise-oblivious adversaries.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The sparse Fourier transform technique may extend to other statistical estimation tasks that involve characteristic functions.
It could motivate new algorithms for robust estimation in adversarial models arising from multiple hypothesis testing.
Similar Fourier-based methods might relax separation assumptions in other mixture learning settings.

Load-bearing premise

The component distributions possess characteristic functions with sufficiently heavy tails.

What would settle it

Applying the algorithm to a mixture of Gaussians (which have light-tailed characteristic functions) and checking whether it recovers the parameters correctly in the absence of separation.

Figures

Figures reproduced from arXiv: 2601.05157 by Alkis Kalavasis, Manolis Zampetakis, Pravesh K. Kothari, Shuchen Li.

read the original abstract

In this work, we give a ${\rm poly}(d,k)$ time and sample algorithm for efficiently learning the parameters of a mixture of $k$ spherical distributions in $d$ dimensions. Unlike all previous methods, our techniques apply to heavy-tailed distributions and include examples that do not even have finite covariances. Our method succeeds whenever the cluster distributions have a characteristic function with sufficiently heavy tails. Such distributions include the Laplace distribution but crucially exclude Gaussians. All previous methods for learning mixture models relied implicitly or explicitly on the low-degree moments. Even for the case of Laplace distributions, we prove that any such algorithm must use super-polynomially many samples. Our method thus adds to the short list of techniques that bypass the limitations of the method of moments. Somewhat surprisingly, our algorithm does not require any minimum separation between the cluster means. This is in stark contrast to spherical Gaussian mixtures where a minimum $\ell_2$-separation is provably necessary even information-theoretically [Regev and Vijayaraghavan '17]. Our methods compose well with existing techniques and allow obtaining ''best of both worlds" guarantees for mixtures where every component either has a heavy-tailed characteristic function or has a sub-Gaussian tail with a light-tailed characteristic function. Our algorithm is based on a new approach to learning mixture models via efficient high-dimensional sparse Fourier transforms. We believe that this method will find more applications to statistical estimation. As an example, we give an algorithm for consistent robust mean estimation against noise-oblivious adversaries, a model practically motivated by the literature on multiple hypothesis testing. It was formally proposed in a recent Master's thesis by one of the authors, and has already inspired follow-up works.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives a poly(d,k) algorithm for heavy-tailed spherical mixtures via sparse Fourier transforms, bypassing moments and separation.

read the letter

The main thing to know is that this work claims a polynomial in d and k time and sample algorithm for recovering the parameters of a mixture of k spherical distributions, and it works for heavy-tailed cases like Laplace that have no finite covariance, without needing any separation between the means. It also proves a super-polynomial sample lower bound for any moment-based method on Laplace mixtures, which sets up the motivation for the new technique. The approach uses efficient high-dimensional sparse Fourier transforms on the characteristic function and extends to a consistent robust mean estimator under noise-oblivious adversaries. They note that the method composes with existing sub-Gaussian techniques to handle mixed cases. These are concrete algorithmic contributions that open a route around the usual moment or separation assumptions. The soft spot is the quantitative tail condition on the characteristic functions. The abstract requires them to be 'sufficiently heavy,' and the stress-test note correctly flags that if the needed decay rate or the frequency support for the sparse recovery depends on d or k, or if variance control on the empirical characteristic function estimator adds hidden inverse-polynomial factors, the claimed bounds could slip. The paper would need to make the exact dependence explicit and show the discretization and sampling costs remain polynomial. Overall this is aimed at people working on high-dimensional mixture learning and robust estimation who want alternatives to moments. A reader interested in new algorithmic tools for heavy-tailed settings would find it useful. It deserves a serious referee because the claims are specific, the lower bound is a clean negative result, and the Fourier angle is distinct from prior work. I would send it out for peer review.

Referee Report

1 major / 2 minor

Summary. The manuscript claims a poly(d,k) time and sample algorithm for learning parameters of a mixture of k spherical distributions in d dimensions via efficient high-dimensional sparse Fourier transforms on characteristic functions. The algorithm applies to heavy-tailed distributions (e.g., Laplace) whose characteristic functions have sufficiently heavy tails, even when covariances are infinite, and requires no minimum separation between means. It proves a super-polynomial sample lower bound for any method-of-moments approach on Laplace mixtures and composes with sub-Gaussian techniques for mixed-tail cases; an application to noise-oblivious robust mean estimation is also given.

Significance. If the central claims hold, the work is significant for adding a Fourier-analytic technique to the short list of methods that bypass moment-based limitations in mixture learning. The ability to handle distributions without finite second moments and without separation assumptions is a clear advance over prior Gaussian-focused results, and the best-of-both-worlds composition plus the robust-mean application illustrate broader utility of the sparse-FT primitive.

major comments (1)

[Abstract and §1] Abstract and §1 (algorithm statement): the claim of poly(d,k) sample and time complexity is conditioned on the characteristic functions having 'sufficiently heavy tails,' but no explicit quantitative threshold (decay rate or minimal frequency scale) is given in terms of d and k. If the discretization grid for the high-dimensional sparse FT or the variance bound on the empirical characteristic-function estimator must be refined inversely with this rate, the polynomial dependence may fail; this is load-bearing for the main algorithmic result.

minor comments (2)

[Abstract] The lower-bound argument for moment methods on Laplace mixtures is cited but lacks a section or theorem pointer in the abstract; adding an explicit reference would improve readability.
Clarify the precise notion of 'spherical' used and how the high-dimensional sparse FT is implemented (e.g., grid size, recovery algorithm) to make the technical approach self-contained.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their careful reading of the manuscript and for highlighting its potential significance. We address the single major comment below.

read point-by-point responses

Referee: [Abstract and §1] Abstract and §1 (algorithm statement): the claim of poly(d,k) sample and time complexity is conditioned on the characteristic functions having 'sufficiently heavy tails,' but no explicit quantitative threshold (decay rate or minimal frequency scale) is given in terms of d and k. If the discretization grid for the high-dimensional sparse FT or the variance bound on the empirical characteristic-function estimator must be refined inversely with this rate, the polynomial dependence may fail; this is load-bearing for the main algorithmic result.

Authors: We agree that the abstract and introduction would benefit from an explicit quantitative statement of the heavy-tail condition to make the polynomial dependence fully transparent. In the body of the paper (Definition 2.3 and the analysis preceding Theorem 4.1), the condition is already stated quantitatively: the characteristic function φ must satisfy |φ(t)| ≥ (1 + ||t||_2^2)^{-C} for a constant C independent of d and k, over a frequency range whose radius is bounded by a polynomial in d and k. This lower bound directly controls both the radius of the sparse support (hence the discretization grid size) and the variance of the empirical characteristic-function estimator, ensuring that all parameters remain polynomial. We will revise the abstract and §1 to include a concise version of this threshold (e.g., “assuming the characteristic function satisfies a polynomial lower bound with dimension-independent exponent”). revision: yes

Circularity Check

1 steps flagged

Minor self-citation on example application; main derivation via sparse Fourier transform is self-contained

specific steps

self citation load bearing [Abstract]
"As an example, we give an algorithm for consistent robust mean estimation against noise-oblivious adversaries, a model practically motivated by the literature on multiple hypothesis testing. It was formally proposed in a recent Master's thesis by one of the authors, and has already inspired follow-up works."

The example application of robust mean estimation is attributed to a Master's thesis by one of the authors, constituting a self-citation. However, this is not load-bearing for the primary result on mixture models, which relies on a new Fourier transform approach.

full rationale

The paper presents a novel algorithmic technique based on high-dimensional sparse Fourier transforms for learning mixture models under heavy-tailed characteristic function conditions. The derivation does not reduce to fitted parameters or self-cited uniqueness theorems for the core claim. The only self-reference is to an author's thesis for an illustrative application, which does not underpin the main poly(d,k) algorithm or its analysis. The method is positioned as bypassing method-of-moments limitations with an independent construction, making the overall derivation self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption that component characteristic functions have sufficiently heavy tails; no free parameters or invented entities are mentioned in the abstract.

axioms (1)

domain assumption Component distributions have characteristic functions with sufficiently heavy tails.
The abstract states that the algorithm succeeds whenever this tail condition holds.

pith-pipeline@v0.9.0 · 5861 in / 1163 out tokens · 62339 ms · 2026-05-22T12:15:47.301573+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Our method succeeds whenever the component distributions have a characteristic function with sufficiently heavy tails... Laplace... exclude Gaussians... via efficient high-dimensional sparse Fourier transforms
IndisputableMonolith/Foundation/AlexanderDuality.lean alexander_duality_circle_linking unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

No minimum separation... poly(d,k) time and sample... SFD property

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

12 extracted references · 12 canonical work pages

[1]

good” if there is a permutation𝜋such that max 𝑗∈[𝑘]∥𝜇𝑗−b𝜇(ℓ) 𝜋(𝑗)∥2≤𝜀′ and max 𝑗∈[𝑘]|𝑤𝑗−b𝑤 𝜋(𝑗)|≤𝜀𝑤 , and we will call it “bad

Proceedings.IEEE. 2002, pp. 113–122 (cit. on pp. 1, 14). A Boosting for Mixture Models: Proof of Lemma 2.9 Proof of Lemma 2.9.We run𝑅=𝑂(log (1/𝛿))independent copies of𝐴with input accuracy𝜀 ′= min{𝜀/3,𝛾/16},and greedily perform a clustering algorithm on the set of all𝑘𝑅pairs of weights and points, denoted by𝑀. Let𝑀 𝑤 and𝑀 𝜇 denote the sets of the first (we...

work page 2002
[2]

the b𝜇𝑗 chosen in line 8 will satisfy∥b𝜇𝑗−𝜇𝜋(𝑗)∥2≤3𝜀′≤𝜀

work page
[3]

the b𝑤 𝑗 chosen in line 9 will satisfy| b𝑤 𝑗−𝑤𝜋(𝑗)|≤𝜀𝑤

work page
[4]

after line 10,𝑀 𝜇∩𝐵𝑑 3𝜀′(𝜇𝜋(𝑗))=∅

work page
[5]

When𝑗=1, the above four statements are proved as follows

after line 10, for𝑗 ′∈[𝑘]\{𝜋(𝑗′′)}𝑗′′∈[𝑗],𝑀 𝜇∩𝐵𝑑 𝜀′(𝜇𝑗′)will not be removed. When𝑗=1, the above four statements are proved as follows

work page
[6]

Hence, we can indeed pick in line 8 a b𝜇𝑗∈𝑀𝜇 that |𝑀𝜇∩𝐵𝑑 2𝜀′(b𝜇𝑗)|≥3 5 𝑅> 2 5 𝑅, and by Lemma A.2, there is a𝑗 ′∈[𝑘]that∥b𝜇𝑗−𝜇𝑗′∥2≤3𝜀′≤𝜀

From the proof of Lemma A.1, we know for each true𝜇 𝑗′,𝑗 ′∈[𝑘],|𝑀𝜇∩𝐵𝑑 𝜀′(𝜇𝑗′)|≥3 5 𝑅, and thus for some b𝜇∈𝑀𝜇,|𝑀𝜇∩𝐵𝑑 2𝜀′(b𝜇)|≥3 5 𝑅. Hence, we can indeed pick in line 8 a b𝜇𝑗∈𝑀𝜇 that |𝑀𝜇∩𝐵𝑑 2𝜀′(b𝜇𝑗)|≥3 5 𝑅> 2 5 𝑅, and by Lemma A.2, there is a𝑗 ′∈[𝑘]that∥b𝜇𝑗−𝜇𝑗′∥2≤3𝜀′≤𝜀. Such𝑗 ′will be unique, as otherwise min 𝑖≠𝑗∥𝜇𝑖−𝜇𝑗∥2≤6𝜀′< 𝛾. Let𝜋(𝑗)=𝑗′. 51

work page
[7]

Meanwhile,|𝑊|≤𝑅

Since𝐵 𝑑 𝜀′(𝜇𝜋(𝑗))⊆𝐵𝑑 4𝜀′(b𝜇𝑗), and at least3 5 𝑅points in𝐵 𝑑 𝜀′(𝜇𝜋(𝑗))are from some good rounds, the set𝑊:={ b𝑤:(b𝑤, b𝜇)∈𝑀,∥b𝜇𝑗−b𝜇∥2≤4𝜀′}contains at least 3 5 𝑅weights b𝑤that| b𝑤−𝑤𝜋(𝑗)|≤𝜀𝑤 . Meanwhile,|𝑊|≤𝑅. Otherwise, since there are at most𝑅rounds that have added the result into𝑀, there will be distinct( b𝑤′,b𝜇′),(b𝑤′′,b𝜇′′)that come from the same roun...

work page
[8]

In line 10, we remove{(b𝑤, b𝜇):∥b𝜇𝑗−b𝜇∥2≤6𝜀′}from𝑀, while every𝑥∈𝐵𝑑 3𝜀′(𝜇𝜋(𝑗))will satisfy ∥b𝜇𝑗−𝑥∥2≤∥b𝜇𝑗−𝜇𝜋(𝑗)∥2+∥𝜇𝜋(𝑗)−𝑥∥2≤6𝜀′

work page
[9]

When𝑗≥2, assume the four statements hold for all the previous iterations, the statements are proved as follows

Meanwhile, we will not remove any points in𝐵 𝑑 𝜀′(𝜇𝑗′′)for𝑗′′≠𝜋(𝑗), because otherwise if 𝑥∈𝐵𝑑 𝜀′(𝜇𝑗′′)is removed, then 𝜇𝜋(𝑗)−𝜇𝑗′′ 2≤ 𝜇𝜋(𝑗)−b𝜇𝑗 2+ b𝜇𝑗−𝑥 2+ 𝑥−𝜇𝑗′′ 2≤3𝜀′+6𝜀′+𝜀′< 𝛾, which is a contradiction. When𝑗≥2, assume the four statements hold for all the previous iterations, the statements are proved as follows

work page
[10]

Hence, we can indeed pick in line 8 a b𝜇𝑗∈𝑀𝜇 that|𝑀𝜇∩𝐵𝑑 2𝜀′(b𝜇𝑗)|≥3 5 𝑅> 2 5 𝑅, and by Lemma A.2, there is a𝑗 ′∈[𝑘]that∥b𝜇𝑗−𝜇𝑗′∥2≤3𝜀′≤𝜀

From part 4 of the induction hypothesis and the proof of Lemma A.1, for𝑗′∈[𝑘]\{𝜋(𝑗′′)}𝑗′′∈[𝑗−1], |𝑀𝜇∩𝐵𝑑 𝜀′(𝜇𝑗′)|≥3 5 𝑅, and thus there is some b𝜇∈𝑀𝜇 that|𝑀𝜇∩𝐵𝑑 2𝜀′(b𝜇)|≥3 5 𝑅. Hence, we can indeed pick in line 8 a b𝜇𝑗∈𝑀𝜇 that|𝑀𝜇∩𝐵𝑑 2𝜀′(b𝜇𝑗)|≥3 5 𝑅> 2 5 𝑅, and by Lemma A.2, there is a𝑗 ′∈[𝑘]that∥b𝜇𝑗−𝜇𝑗′∥2≤3𝜀′≤𝜀. Similarly, such𝑗′will be unique. And from pa...

work page
[11]

Since at least 3 5 𝑅points in𝐵 𝑑 𝜀′(𝜇𝜋(𝑗))are from good rounds, similarly as in the case for𝑗=1, |b𝑤 𝑗−𝑤𝜋(𝑗)|≤𝜀𝑤 . 3, 4. Also similarly, in line 10, we will remove any points b𝜇∈𝑀𝜇 that∥b𝜇−𝜇𝜋(𝑗)∥2≤3𝜀′, without removing any points in𝐵 𝑑 𝜀′(𝜇𝑗′′)for𝑗′′≠𝜋(𝑗). Therefore, we have showed that with probability 1−𝛿, Algorithm 2 outputs{(b𝑤 𝑗 ,b𝜇𝑗)}𝑗∈[𝑘]such that ...

work page
[12]

Lap(𝜇𝑗′, 𝐼)has only exp(−poly(𝑘′))mass inside𝑆 𝑗, since Laplace distributions have exponential tail and we assume the separation between the Gaussian and Laplace components𝛾 SF = poly(𝑘′). 56

work page

[1] [1]

good” if there is a permutation𝜋such that max 𝑗∈[𝑘]∥𝜇𝑗−b𝜇(ℓ) 𝜋(𝑗)∥2≤𝜀′ and max 𝑗∈[𝑘]|𝑤𝑗−b𝑤 𝜋(𝑗)|≤𝜀𝑤 , and we will call it “bad

Proceedings.IEEE. 2002, pp. 113–122 (cit. on pp. 1, 14). A Boosting for Mixture Models: Proof of Lemma 2.9 Proof of Lemma 2.9.We run𝑅=𝑂(log (1/𝛿))independent copies of𝐴with input accuracy𝜀 ′= min{𝜀/3,𝛾/16},and greedily perform a clustering algorithm on the set of all𝑘𝑅pairs of weights and points, denoted by𝑀. Let𝑀 𝑤 and𝑀 𝜇 denote the sets of the first (we...

work page 2002

[2] [2]

the b𝜇𝑗 chosen in line 8 will satisfy∥b𝜇𝑗−𝜇𝜋(𝑗)∥2≤3𝜀′≤𝜀

work page

[3] [3]

the b𝑤 𝑗 chosen in line 9 will satisfy| b𝑤 𝑗−𝑤𝜋(𝑗)|≤𝜀𝑤

work page

[4] [4]

after line 10,𝑀 𝜇∩𝐵𝑑 3𝜀′(𝜇𝜋(𝑗))=∅

work page

[5] [5]

When𝑗=1, the above four statements are proved as follows

after line 10, for𝑗 ′∈[𝑘]\{𝜋(𝑗′′)}𝑗′′∈[𝑗],𝑀 𝜇∩𝐵𝑑 𝜀′(𝜇𝑗′)will not be removed. When𝑗=1, the above four statements are proved as follows

work page

[6] [6]

Hence, we can indeed pick in line 8 a b𝜇𝑗∈𝑀𝜇 that |𝑀𝜇∩𝐵𝑑 2𝜀′(b𝜇𝑗)|≥3 5 𝑅> 2 5 𝑅, and by Lemma A.2, there is a𝑗 ′∈[𝑘]that∥b𝜇𝑗−𝜇𝑗′∥2≤3𝜀′≤𝜀

From the proof of Lemma A.1, we know for each true𝜇 𝑗′,𝑗 ′∈[𝑘],|𝑀𝜇∩𝐵𝑑 𝜀′(𝜇𝑗′)|≥3 5 𝑅, and thus for some b𝜇∈𝑀𝜇,|𝑀𝜇∩𝐵𝑑 2𝜀′(b𝜇)|≥3 5 𝑅. Hence, we can indeed pick in line 8 a b𝜇𝑗∈𝑀𝜇 that |𝑀𝜇∩𝐵𝑑 2𝜀′(b𝜇𝑗)|≥3 5 𝑅> 2 5 𝑅, and by Lemma A.2, there is a𝑗 ′∈[𝑘]that∥b𝜇𝑗−𝜇𝑗′∥2≤3𝜀′≤𝜀. Such𝑗 ′will be unique, as otherwise min 𝑖≠𝑗∥𝜇𝑖−𝜇𝑗∥2≤6𝜀′< 𝛾. Let𝜋(𝑗)=𝑗′. 51

work page

[7] [7]

Meanwhile,|𝑊|≤𝑅

Since𝐵 𝑑 𝜀′(𝜇𝜋(𝑗))⊆𝐵𝑑 4𝜀′(b𝜇𝑗), and at least3 5 𝑅points in𝐵 𝑑 𝜀′(𝜇𝜋(𝑗))are from some good rounds, the set𝑊:={ b𝑤:(b𝑤, b𝜇)∈𝑀,∥b𝜇𝑗−b𝜇∥2≤4𝜀′}contains at least 3 5 𝑅weights b𝑤that| b𝑤−𝑤𝜋(𝑗)|≤𝜀𝑤 . Meanwhile,|𝑊|≤𝑅. Otherwise, since there are at most𝑅rounds that have added the result into𝑀, there will be distinct( b𝑤′,b𝜇′),(b𝑤′′,b𝜇′′)that come from the same roun...

work page

[8] [8]

In line 10, we remove{(b𝑤, b𝜇):∥b𝜇𝑗−b𝜇∥2≤6𝜀′}from𝑀, while every𝑥∈𝐵𝑑 3𝜀′(𝜇𝜋(𝑗))will satisfy ∥b𝜇𝑗−𝑥∥2≤∥b𝜇𝑗−𝜇𝜋(𝑗)∥2+∥𝜇𝜋(𝑗)−𝑥∥2≤6𝜀′

work page

[9] [9]

When𝑗≥2, assume the four statements hold for all the previous iterations, the statements are proved as follows

Meanwhile, we will not remove any points in𝐵 𝑑 𝜀′(𝜇𝑗′′)for𝑗′′≠𝜋(𝑗), because otherwise if 𝑥∈𝐵𝑑 𝜀′(𝜇𝑗′′)is removed, then 𝜇𝜋(𝑗)−𝜇𝑗′′ 2≤ 𝜇𝜋(𝑗)−b𝜇𝑗 2+ b𝜇𝑗−𝑥 2+ 𝑥−𝜇𝑗′′ 2≤3𝜀′+6𝜀′+𝜀′< 𝛾, which is a contradiction. When𝑗≥2, assume the four statements hold for all the previous iterations, the statements are proved as follows

work page

[10] [10]

Hence, we can indeed pick in line 8 a b𝜇𝑗∈𝑀𝜇 that|𝑀𝜇∩𝐵𝑑 2𝜀′(b𝜇𝑗)|≥3 5 𝑅> 2 5 𝑅, and by Lemma A.2, there is a𝑗 ′∈[𝑘]that∥b𝜇𝑗−𝜇𝑗′∥2≤3𝜀′≤𝜀

From part 4 of the induction hypothesis and the proof of Lemma A.1, for𝑗′∈[𝑘]\{𝜋(𝑗′′)}𝑗′′∈[𝑗−1], |𝑀𝜇∩𝐵𝑑 𝜀′(𝜇𝑗′)|≥3 5 𝑅, and thus there is some b𝜇∈𝑀𝜇 that|𝑀𝜇∩𝐵𝑑 2𝜀′(b𝜇)|≥3 5 𝑅. Hence, we can indeed pick in line 8 a b𝜇𝑗∈𝑀𝜇 that|𝑀𝜇∩𝐵𝑑 2𝜀′(b𝜇𝑗)|≥3 5 𝑅> 2 5 𝑅, and by Lemma A.2, there is a𝑗 ′∈[𝑘]that∥b𝜇𝑗−𝜇𝑗′∥2≤3𝜀′≤𝜀. Similarly, such𝑗′will be unique. And from pa...

work page

[11] [11]

Since at least 3 5 𝑅points in𝐵 𝑑 𝜀′(𝜇𝜋(𝑗))are from good rounds, similarly as in the case for𝑗=1, |b𝑤 𝑗−𝑤𝜋(𝑗)|≤𝜀𝑤 . 3, 4. Also similarly, in line 10, we will remove any points b𝜇∈𝑀𝜇 that∥b𝜇−𝜇𝜋(𝑗)∥2≤3𝜀′, without removing any points in𝐵 𝑑 𝜀′(𝜇𝑗′′)for𝑗′′≠𝜋(𝑗). Therefore, we have showed that with probability 1−𝛿, Algorithm 2 outputs{(b𝑤 𝑗 ,b𝜇𝑗)}𝑗∈[𝑘]such that ...

work page

[12] [12]

Lap(𝜇𝑗′, 𝐼)has only exp(−poly(𝑘′))mass inside𝑆 𝑗, since Laplace distributions have exponential tail and we assume the separation between the Gaussian and Laplace components𝛾 SF = poly(𝑘′). 56

work page