Can we spot a fake?

Grigoris Paouris; Roman Vershynin; Shahar Mendelson

arxiv: 2410.18880 · v2 · submitted 2024-10-24 · 🧮 math.ST · math.PR· stat.TH

Can we spot a fake?

Shahar Mendelson , Grigoris Paouris , Roman Vershynin This is my paper

Pith reviewed 2026-05-23 19:21 UTC · model grok-4.3

classification 🧮 math.ST math.PRstat.TH

keywords detectability radiusGaussian widthadversarial corruptionfake data detectionhigh-dimensional statisticshypothesis testing

0 comments

The pith

For symmetric trick sets, the largest undetectable data corruption radius equals twice the scaled Gaussian width.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper studies the largest radius r at which an adversary can add a corruption vector from a fixed set T to a standard Gaussian vector X so that the result remains statistically indistinguishable from clean data. It proves that when T is highly symmetric the critical radius r(T) is approximately twice the scaled Gaussian width of T. The matching upper bound on r(T) holds for arbitrary sets T and extends to non-Gaussian source distributions, while the lower bound requires symmetry and leads to a conjecture involving a focused version of the Gaussian width.

Core claim

For highly symmetric sets T the detectability radius r(T) is approximately twice the scaled Gaussian width of T; the upper bound holds for arbitrary T and generalizes to arbitrary non-Gaussian distributions of the real data X.

What carries the argument

The detectability radius r(T), the largest r such that X + r t(X) is indistinguishable from X for any choice of t from T.

If this is right

The upper bound on the undetectable radius applies to every fixed set T.
The same upper bound continues to hold when the clean data X follows any distribution with sufficiently light tails instead of the standard Gaussian.
For sets that lack high symmetry the lower bound can fail, but a focused Gaussian width that emphasizes the most important directions may restore the two-sided characterization.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The result supplies a concrete geometric test for whether a given collection of possible corruptions can be hidden inside Gaussian noise.
The same radius calculation may bound the power of any statistical test that tries to detect low-dimensional adversarial perturbations without knowing T in advance.

Load-bearing premise

The lower bound on the radius requires the set T to be highly symmetric.

What would settle it

An explicit computation or simulation for a concrete non-symmetric set T showing that the smallest undetectable radius differs from twice the scaled Gaussian width by more than a constant factor.

read the original abstract

The problem of detecting fake data inspires the following seemingly simple mathematical question. Sample a data point $X$ from the standard normal distribution in $\mathbb{R}^n$. An adversary observes $X$ and corrupts it by adding a vector $rt$, where they can choose any vector $t$ from a fixed set $T$ of the adversary's ``tricks'', and where $r>0$ is a fixed radius. The adversary's choice of $t=t(X)$ may depend on the true data $X$. The adversary wants to hide the corruption by making the fake data $X+rt$ statistically indistinguishable from the real data $X$. What is the largest radius $r=r(T)$ for which the adversary can create an undetectable fake? We show that for highly symmetric sets $T$, the detectability radius $r(T)$ is approximately twice the scaled Gaussian width of $T$. The upper bound actually holds for arbitrary sets $T$ and generalizes to arbitrary, non-Gaussian distributions of real data $X$. The lower bound may fail for not highly symmetric $T$, but we conjecture that this problem can be solved by considering the focused version of the Gaussian width of $T$, which focuses on the most important directions of $T$.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper cleanly links the undetectable corruption radius to twice the scaled Gaussian width for symmetric T, with a general upper bound that extends to non-Gaussian data.

read the letter

The main point is that this paper defines the detectability radius r(T) as the largest corruption level an adversary can hide, and shows it equals roughly twice the scaled Gaussian width when T is highly symmetric. The upper bound holds for arbitrary T and carries over to non-Gaussian real data X. That separation of claims is stated clearly in the abstract. The new element is the problem setup itself, which turns fake-data detection into a precise question about this radius and its geometric threshold. The general upper bound is the part that stands on its own and broadens the result beyond the usual Gaussian setting. The softer spot is the lower bound, which requires high symmetry on T; for general T the paper only conjectures that a focused width version works. This is not hidden or circular, just a limitation the abstract flags up front. The math uses standard high-dimensional probability tools and the citation pattern fits the topic without obvious gaps. Readers working on robust statistics or geometric functional analysis would get value from the upper bound and the open conjecture. The work is coherent enough on its own terms to deserve referee time rather than a desk reject, even if the lower bound needs more development.

Referee Report

1 major / 0 minor

Summary. The manuscript studies the largest radius r=r(T) such that an adversary, given X~N(0,I_n), can choose t(X) in a fixed set T and form the corrupted vector X+rt that remains statistically indistinguishable from X. It establishes that for highly symmetric T the detectability radius satisfies r(T) approximately equal to twice the scaled Gaussian width of T. An upper bound on r(T) is proved for arbitrary T and is shown to extend to non-Gaussian distributions of the real data X; a matching lower bound is obtained only under the high-symmetry assumption, with a conjecture offered for the general case via a focused Gaussian width.

Significance. If the stated bounds hold, the work supplies a clean geometric characterization of an adversarial detectability threshold in terms of Gaussian width, a quantity already central to high-dimensional probability and convex geometry. The generality of the upper bound (arbitrary T, non-Gaussian X) and the explicit partitioning of the claim into a proved upper bound versus a symmetry-dependent lower bound are strengths. The conjecture concerning focused width identifies a concrete direction for subsequent research.

major comments (1)

[Abstract] The abstract and reader's summary indicate that the central claim equates r(T) to twice the scaled Gaussian width only for highly symmetric T, yet the precise definitions of both 'scaled Gaussian width' and 'highly symmetric' are not supplied in the available text. Without these definitions and the supporting proof details, the quantitative factor of 'approximately twice' cannot be verified as load-bearing for the stated result.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their review and for highlighting the need for clarity on key definitions. We address the single major comment below. The full manuscript provides the requested definitions and proofs in the body; the abstract is kept concise per standard practice.

read point-by-point responses

Referee: [Abstract] The abstract and reader's summary indicate that the central claim equates r(T) to twice the scaled Gaussian width only for highly symmetric T, yet the precise definitions of both 'scaled Gaussian width' and 'highly symmetric' are not supplied in the available text. Without these definitions and the supporting proof details, the quantitative factor of 'approximately twice' cannot be verified as load-bearing for the stated result.

Authors: The abstract is intentionally brief and does not repeat formal definitions. 'Highly symmetric' is defined in Definition 2.4 as the class of sets T that are invariant under arbitrary coordinate sign flips and permutations (i.e., the orthogonal group generated by signed permutation matrices leaves T invariant). The scaled Gaussian width appears in Section 2.2 as w(T)/sqrt(n), where w(T) := E[sup_{t in T} <g, t>] for g ~ N(0,I_n). The factor of approximately two is load-bearing and is proved as follows: the general upper bound (Theorem 3.1) shows r(T) <= 2 * (scaled width) + o(1) for arbitrary T (and extends to non-Gaussian X); the matching lower bound (Theorem 5.3) holds precisely when T is highly symmetric and uses a symmetry-based coupling argument to show that the adversary can achieve r(T) >= 2 * (scaled width) - o(1). Full proof details occupy Sections 3-5. We are happy to insert a one-sentence pointer to these definitions into the abstract if the referee prefers. revision: partial

Circularity Check

0 steps flagged

Derivation self-contained with no circular steps

full rationale

The paper establishes an upper bound on the detectability radius r(T) that holds for arbitrary sets T via general concentration inequalities applicable to non-Gaussian data, while the matching lower bound is restricted to highly symmetric T with a conjecture for the general case using focused width. No load-bearing step reduces by definition, fitted parameter, or self-citation chain to its own inputs; the claimed relations follow from standard high-dimensional probability tools without circular reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The paper builds on standard tools from high-dimensional probability without introducing new free parameters or entities.

axioms (2)

standard math Standard properties of the Gaussian distribution in high dimensions
The data X is sampled from standard normal, and results rely on concentration and width properties.
standard math Existence of Gaussian width as a well-defined geometric measure
The result is expressed in terms of this quantity.

pith-pipeline@v0.9.0 · 5755 in / 1196 out tokens · 26953 ms · 2026-05-23T19:21:38.841946+00:00 · methodology

discussion (0)

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

On Talagrand's Convexity Conjecture
math.PR 2026-05 unverdicted novelty 8.0

Any centered 1-subgaussian random vector equals the sum of a universal number of standard Gaussians, solving Talagrand's convexity conjecture.

Reference graph

Works this paper leans on

23 extracted references · 23 canonical work pages · cited by 1 Pith paper

[1]

Addario-Berry, N

L. Addario-Berry, N. Broutin, L. Devroye, G. Lugosi, On combinatorial testing prob- lems, Annals of Statistics 38 (2010), 3063–3092

work page 2010
[2]

Arias-Castro, E

E. Arias-Castro, E. Candes, H. Helgason, O. Zeitouni, Searching for a trail of evidence in a maze, Annals of Statistics 36 (2008), 1726–1757

work page 2008
[3]

Arias-Castro, E

E. Arias-Castro, E. Cand` es, A. Durand, Detection of an anomalous cluster in a network, Annals of Statistics 39 (2011), 278–304

work page 2011
[4]

Arias-Castro, E

E. Arias-Castro, E. Candes, Y. Plan, Global testing under sparse alternatives: ANOV A, multiple comparisons and the higher criticism, Annals of Statistics 39 (2011), 2533–2556

work page 2011
[5]

Artstein-Avidan, A

S. Artstein-Avidan, A. Giannopoulos, V. Milman, Asymptotic Geometric Analysis, Part I. Mathematical Surveys and Monographs, 2015

work page 2015
[6]

Artstein-Avidan, A

S. Artstein-Avidan, A. Giannopoulos, V. Milman, Asymptotic Geometric Analysis, Part II. American Mathematical Society, 2021

work page 2021
[7]

Baraud, Non-asymptotic minimax rates of testing in signal detectio n, Bernoulli 8 (2002), 577–606

Y. Baraud, Non-asymptotic minimax rates of testing in signal detectio n, Bernoulli 8 (2002), 577–606

work page 2002
[8]

Boucheron, G

S. Boucheron, G. Lugosi, P. Massart, Concentration Inequalities, A nonasymptotic theory of independence. Clarendon press, Oxford 2012

work page 2012
[9]

T. Cai, J. Jin, M. Low, Estimation and conﬁdence sets for sparse normal mixtures, Annals of Statistics 35 (2007), 2421–2449

work page 2007
[10]

Carpentier, O

A. Carpentier, O. Collier, L. Comminges, A. Tsybakov, Y . Wang, Minimax rate of testing in sparse linear regression, Automation and Remote Control 80 (2019), 1817– 1834

work page 2019
[11]

Donoho, J

D. Donoho, J. Jin, Higher criticism for detecting sparse heterogeneous mixtu res, An- nals of Statistics 32 (2004), 962–994

work page 2004
[12]

Donoho, J

D. Donoho, J. Jin, Higher criticism thresholding: Optimal feature selection when useful features are rare and weak, Proc. Natl. Acad. Sci. USA 105 (2008), 14790– 14795

work page 2008
[13]

Donoho, J

D. Donoho, J. Jin, Feature selection by higher criticism thresholding achiev es the optimal phase diagram, Philos. Trans. R. Soc. Lond. Ser. A Math. Phys. Eng. Sci. 367 (2009), 4449–4470

work page 2009
[14]

P. Hall, J. Jin, Innovated higher criticism for detecting sparse signals in correlated noise, Ann. Statist. 38 (2010), 1686–1732. 16 SHAHAR MENDELSON, GRIGORIS PAOURIS, AND ROMAN VERSHYNIN

work page 2010
[15]

Ingster, Minimax detection of a signal in ℓp metrics, Journal of Mathematical Sciences 68 (1994), 503–515

Yu. Ingster, Minimax detection of a signal in ℓp metrics, Journal of Mathematical Sciences 68 (1994), 503–515

work page 1994
[16]

Ingster, Adaptive detection of a signal of growing dimension, I, II, Math

Y. Ingster, Adaptive detection of a signal of growing dimension, I, II, Math. Methods Statist. 10 (2002), 395–421

work page 2002
[17]

Ingster, C

Y. Ingster, C. Pouet, A. Tsybakov, Classiﬁcation of sparse high-dimensional vectors, Philosophical Transactions: Mathematical, Physical and E ngineering Sciences 367 (2009), 4427–4448

work page 2009
[18]

Ingster, A

Y. Ingster, A. Tsybakov, N. Verzelen, Detection boundary in sparse regression, Elec- tronic Journal of Statistics 4 (2010), 1476–1526

work page 2010
[19]

Mukherjee, S

R. Mukherjee, S. Sen, On minimax exponents of sparse testing, preprint (2020)

work page 2020
[20]

Smirnov, Gaussian volume bounds under hypercube translations and ge neraliza- tions, preprint (2024)

G. Smirnov, Gaussian volume bounds under hypercube translations and ge neraliza- tions, preprint (2024)

work page 2024
[21]

Talagrand, A new look at independence, The Annals of probability (1996), 1–34

M. Talagrand, A new look at independence, The Annals of probability (1996), 1–34

work page 1996
[22]

Tukey, J. W. (1976). T13 N: The higher criticism. Course Notes, Statistics 411, Princeton Univ

work page 1976
[23]

Vershynin, High dimensional probability

R. Vershynin, High dimensional probability. An introduction with applic ations in Data Science. Cambridge University Press, 2018. The Australian National University Email address : shahar.mendelson@anu.edu.au Texas A&M University and Princeton University Email address : grigoris@tamu.edu University of California, Irvine Email address : rvershyn@uci.edu

work page 2018

[1] [1]

Addario-Berry, N

L. Addario-Berry, N. Broutin, L. Devroye, G. Lugosi, On combinatorial testing prob- lems, Annals of Statistics 38 (2010), 3063–3092

work page 2010

[2] [2]

Arias-Castro, E

E. Arias-Castro, E. Candes, H. Helgason, O. Zeitouni, Searching for a trail of evidence in a maze, Annals of Statistics 36 (2008), 1726–1757

work page 2008

[3] [3]

Arias-Castro, E

E. Arias-Castro, E. Cand` es, A. Durand, Detection of an anomalous cluster in a network, Annals of Statistics 39 (2011), 278–304

work page 2011

[4] [4]

Arias-Castro, E

E. Arias-Castro, E. Candes, Y. Plan, Global testing under sparse alternatives: ANOV A, multiple comparisons and the higher criticism, Annals of Statistics 39 (2011), 2533–2556

work page 2011

[5] [5]

Artstein-Avidan, A

S. Artstein-Avidan, A. Giannopoulos, V. Milman, Asymptotic Geometric Analysis, Part I. Mathematical Surveys and Monographs, 2015

work page 2015

[6] [6]

Artstein-Avidan, A

S. Artstein-Avidan, A. Giannopoulos, V. Milman, Asymptotic Geometric Analysis, Part II. American Mathematical Society, 2021

work page 2021

[7] [7]

Baraud, Non-asymptotic minimax rates of testing in signal detectio n, Bernoulli 8 (2002), 577–606

Y. Baraud, Non-asymptotic minimax rates of testing in signal detectio n, Bernoulli 8 (2002), 577–606

work page 2002

[8] [8]

Boucheron, G

S. Boucheron, G. Lugosi, P. Massart, Concentration Inequalities, A nonasymptotic theory of independence. Clarendon press, Oxford 2012

work page 2012

[9] [9]

T. Cai, J. Jin, M. Low, Estimation and conﬁdence sets for sparse normal mixtures, Annals of Statistics 35 (2007), 2421–2449

work page 2007

[10] [10]

Carpentier, O

A. Carpentier, O. Collier, L. Comminges, A. Tsybakov, Y . Wang, Minimax rate of testing in sparse linear regression, Automation and Remote Control 80 (2019), 1817– 1834

work page 2019

[11] [11]

Donoho, J

D. Donoho, J. Jin, Higher criticism for detecting sparse heterogeneous mixtu res, An- nals of Statistics 32 (2004), 962–994

work page 2004

[12] [12]

Donoho, J

D. Donoho, J. Jin, Higher criticism thresholding: Optimal feature selection when useful features are rare and weak, Proc. Natl. Acad. Sci. USA 105 (2008), 14790– 14795

work page 2008

[13] [13]

Donoho, J

D. Donoho, J. Jin, Feature selection by higher criticism thresholding achiev es the optimal phase diagram, Philos. Trans. R. Soc. Lond. Ser. A Math. Phys. Eng. Sci. 367 (2009), 4449–4470

work page 2009

[14] [14]

P. Hall, J. Jin, Innovated higher criticism for detecting sparse signals in correlated noise, Ann. Statist. 38 (2010), 1686–1732. 16 SHAHAR MENDELSON, GRIGORIS PAOURIS, AND ROMAN VERSHYNIN

work page 2010

[15] [15]

Ingster, Minimax detection of a signal in ℓp metrics, Journal of Mathematical Sciences 68 (1994), 503–515

Yu. Ingster, Minimax detection of a signal in ℓp metrics, Journal of Mathematical Sciences 68 (1994), 503–515

work page 1994

[16] [16]

Ingster, Adaptive detection of a signal of growing dimension, I, II, Math

Y. Ingster, Adaptive detection of a signal of growing dimension, I, II, Math. Methods Statist. 10 (2002), 395–421

work page 2002

[17] [17]

Ingster, C

Y. Ingster, C. Pouet, A. Tsybakov, Classiﬁcation of sparse high-dimensional vectors, Philosophical Transactions: Mathematical, Physical and E ngineering Sciences 367 (2009), 4427–4448

work page 2009

[18] [18]

Ingster, A

Y. Ingster, A. Tsybakov, N. Verzelen, Detection boundary in sparse regression, Elec- tronic Journal of Statistics 4 (2010), 1476–1526

work page 2010

[19] [19]

Mukherjee, S

R. Mukherjee, S. Sen, On minimax exponents of sparse testing, preprint (2020)

work page 2020

[20] [20]

Smirnov, Gaussian volume bounds under hypercube translations and ge neraliza- tions, preprint (2024)

G. Smirnov, Gaussian volume bounds under hypercube translations and ge neraliza- tions, preprint (2024)

work page 2024

[21] [21]

Talagrand, A new look at independence, The Annals of probability (1996), 1–34

M. Talagrand, A new look at independence, The Annals of probability (1996), 1–34

work page 1996

[22] [22]

Tukey, J. W. (1976). T13 N: The higher criticism. Course Notes, Statistics 411, Princeton Univ

work page 1976

[23] [23]

Vershynin, High dimensional probability

R. Vershynin, High dimensional probability. An introduction with applic ations in Data Science. Cambridge University Press, 2018. The Australian National University Email address : shahar.mendelson@anu.edu.au Texas A&M University and Princeton University Email address : grigoris@tamu.edu University of California, Irvine Email address : rvershyn@uci.edu

work page 2018