pith. sign in

arxiv: 2410.18880 · v2 · submitted 2024-10-24 · 🧮 math.ST · math.PR· stat.TH

Can we spot a fake?

Pith reviewed 2026-05-23 19:21 UTC · model grok-4.3

classification 🧮 math.ST math.PRstat.TH
keywords detectability radiusGaussian widthadversarial corruptionfake data detectionhigh-dimensional statisticshypothesis testing
0
0 comments X

The pith

For symmetric trick sets, the largest undetectable data corruption radius equals twice the scaled Gaussian width.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper studies the largest radius r at which an adversary can add a corruption vector from a fixed set T to a standard Gaussian vector X so that the result remains statistically indistinguishable from clean data. It proves that when T is highly symmetric the critical radius r(T) is approximately twice the scaled Gaussian width of T. The matching upper bound on r(T) holds for arbitrary sets T and extends to non-Gaussian source distributions, while the lower bound requires symmetry and leads to a conjecture involving a focused version of the Gaussian width.

Core claim

For highly symmetric sets T the detectability radius r(T) is approximately twice the scaled Gaussian width of T; the upper bound holds for arbitrary T and generalizes to arbitrary non-Gaussian distributions of the real data X.

What carries the argument

The detectability radius r(T), the largest r such that X + r t(X) is indistinguishable from X for any choice of t from T.

If this is right

  • The upper bound on the undetectable radius applies to every fixed set T.
  • The same upper bound continues to hold when the clean data X follows any distribution with sufficiently light tails instead of the standard Gaussian.
  • For sets that lack high symmetry the lower bound can fail, but a focused Gaussian width that emphasizes the most important directions may restore the two-sided characterization.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The result supplies a concrete geometric test for whether a given collection of possible corruptions can be hidden inside Gaussian noise.
  • The same radius calculation may bound the power of any statistical test that tries to detect low-dimensional adversarial perturbations without knowing T in advance.

Load-bearing premise

The lower bound on the radius requires the set T to be highly symmetric.

What would settle it

An explicit computation or simulation for a concrete non-symmetric set T showing that the smallest undetectable radius differs from twice the scaled Gaussian width by more than a constant factor.

read the original abstract

The problem of detecting fake data inspires the following seemingly simple mathematical question. Sample a data point $X$ from the standard normal distribution in $\mathbb{R}^n$. An adversary observes $X$ and corrupts it by adding a vector $rt$, where they can choose any vector $t$ from a fixed set $T$ of the adversary's ``tricks'', and where $r>0$ is a fixed radius. The adversary's choice of $t=t(X)$ may depend on the true data $X$. The adversary wants to hide the corruption by making the fake data $X+rt$ statistically indistinguishable from the real data $X$. What is the largest radius $r=r(T)$ for which the adversary can create an undetectable fake? We show that for highly symmetric sets $T$, the detectability radius $r(T)$ is approximately twice the scaled Gaussian width of $T$. The upper bound actually holds for arbitrary sets $T$ and generalizes to arbitrary, non-Gaussian distributions of real data $X$. The lower bound may fail for not highly symmetric $T$, but we conjecture that this problem can be solved by considering the focused version of the Gaussian width of $T$, which focuses on the most important directions of $T$.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The manuscript studies the largest radius r=r(T) such that an adversary, given X~N(0,I_n), can choose t(X) in a fixed set T and form the corrupted vector X+rt that remains statistically indistinguishable from X. It establishes that for highly symmetric T the detectability radius satisfies r(T) approximately equal to twice the scaled Gaussian width of T. An upper bound on r(T) is proved for arbitrary T and is shown to extend to non-Gaussian distributions of the real data X; a matching lower bound is obtained only under the high-symmetry assumption, with a conjecture offered for the general case via a focused Gaussian width.

Significance. If the stated bounds hold, the work supplies a clean geometric characterization of an adversarial detectability threshold in terms of Gaussian width, a quantity already central to high-dimensional probability and convex geometry. The generality of the upper bound (arbitrary T, non-Gaussian X) and the explicit partitioning of the claim into a proved upper bound versus a symmetry-dependent lower bound are strengths. The conjecture concerning focused width identifies a concrete direction for subsequent research.

major comments (1)
  1. [Abstract] The abstract and reader's summary indicate that the central claim equates r(T) to twice the scaled Gaussian width only for highly symmetric T, yet the precise definitions of both 'scaled Gaussian width' and 'highly symmetric' are not supplied in the available text. Without these definitions and the supporting proof details, the quantitative factor of 'approximately twice' cannot be verified as load-bearing for the stated result.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their review and for highlighting the need for clarity on key definitions. We address the single major comment below. The full manuscript provides the requested definitions and proofs in the body; the abstract is kept concise per standard practice.

read point-by-point responses
  1. Referee: [Abstract] The abstract and reader's summary indicate that the central claim equates r(T) to twice the scaled Gaussian width only for highly symmetric T, yet the precise definitions of both 'scaled Gaussian width' and 'highly symmetric' are not supplied in the available text. Without these definitions and the supporting proof details, the quantitative factor of 'approximately twice' cannot be verified as load-bearing for the stated result.

    Authors: The abstract is intentionally brief and does not repeat formal definitions. 'Highly symmetric' is defined in Definition 2.4 as the class of sets T that are invariant under arbitrary coordinate sign flips and permutations (i.e., the orthogonal group generated by signed permutation matrices leaves T invariant). The scaled Gaussian width appears in Section 2.2 as w(T)/sqrt(n), where w(T) := E[sup_{t in T} <g, t>] for g ~ N(0,I_n). The factor of approximately two is load-bearing and is proved as follows: the general upper bound (Theorem 3.1) shows r(T) <= 2 * (scaled width) + o(1) for arbitrary T (and extends to non-Gaussian X); the matching lower bound (Theorem 5.3) holds precisely when T is highly symmetric and uses a symmetry-based coupling argument to show that the adversary can achieve r(T) >= 2 * (scaled width) - o(1). Full proof details occupy Sections 3-5. We are happy to insert a one-sentence pointer to these definitions into the abstract if the referee prefers. revision: partial

Circularity Check

0 steps flagged

Derivation self-contained with no circular steps

full rationale

The paper establishes an upper bound on the detectability radius r(T) that holds for arbitrary sets T via general concentration inequalities applicable to non-Gaussian data, while the matching lower bound is restricted to highly symmetric T with a conjecture for the general case using focused width. No load-bearing step reduces by definition, fitted parameter, or self-citation chain to its own inputs; the claimed relations follow from standard high-dimensional probability tools without circular reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The paper builds on standard tools from high-dimensional probability without introducing new free parameters or entities.

axioms (2)
  • standard math Standard properties of the Gaussian distribution in high dimensions
    The data X is sampled from standard normal, and results rely on concentration and width properties.
  • standard math Existence of Gaussian width as a well-defined geometric measure
    The result is expressed in terms of this quantity.

pith-pipeline@v0.9.0 · 5755 in / 1196 out tokens · 26953 ms · 2026-05-23T19:21:38.841946+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. On Talagrand's Convexity Conjecture

    math.PR 2026-05 unverdicted novelty 8.0

    Any centered 1-subgaussian random vector equals the sum of a universal number of standard Gaussians, solving Talagrand's convexity conjecture.

Reference graph

Works this paper leans on

23 extracted references · 23 canonical work pages · cited by 1 Pith paper

  1. [1]

    Addario-Berry, N

    L. Addario-Berry, N. Broutin, L. Devroye, G. Lugosi, On combinatorial testing prob- lems, Annals of Statistics 38 (2010), 3063–3092

  2. [2]

    Arias-Castro, E

    E. Arias-Castro, E. Candes, H. Helgason, O. Zeitouni, Searching for a trail of evidence in a maze, Annals of Statistics 36 (2008), 1726–1757

  3. [3]

    Arias-Castro, E

    E. Arias-Castro, E. Cand` es, A. Durand, Detection of an anomalous cluster in a network, Annals of Statistics 39 (2011), 278–304

  4. [4]

    Arias-Castro, E

    E. Arias-Castro, E. Candes, Y. Plan, Global testing under sparse alternatives: ANOV A, multiple comparisons and the higher criticism, Annals of Statistics 39 (2011), 2533–2556

  5. [5]

    Artstein-Avidan, A

    S. Artstein-Avidan, A. Giannopoulos, V. Milman, Asymptotic Geometric Analysis, Part I. Mathematical Surveys and Monographs, 2015

  6. [6]

    Artstein-Avidan, A

    S. Artstein-Avidan, A. Giannopoulos, V. Milman, Asymptotic Geometric Analysis, Part II. American Mathematical Society, 2021

  7. [7]

    Baraud, Non-asymptotic minimax rates of testing in signal detectio n, Bernoulli 8 (2002), 577–606

    Y. Baraud, Non-asymptotic minimax rates of testing in signal detectio n, Bernoulli 8 (2002), 577–606

  8. [8]

    Boucheron, G

    S. Boucheron, G. Lugosi, P. Massart, Concentration Inequalities, A nonasymptotic theory of independence. Clarendon press, Oxford 2012

  9. [9]

    T. Cai, J. Jin, M. Low, Estimation and confidence sets for sparse normal mixtures, Annals of Statistics 35 (2007), 2421–2449

  10. [10]

    Carpentier, O

    A. Carpentier, O. Collier, L. Comminges, A. Tsybakov, Y . Wang, Minimax rate of testing in sparse linear regression, Automation and Remote Control 80 (2019), 1817– 1834

  11. [11]

    Donoho, J

    D. Donoho, J. Jin, Higher criticism for detecting sparse heterogeneous mixtu res, An- nals of Statistics 32 (2004), 962–994

  12. [12]

    Donoho, J

    D. Donoho, J. Jin, Higher criticism thresholding: Optimal feature selection when useful features are rare and weak, Proc. Natl. Acad. Sci. USA 105 (2008), 14790– 14795

  13. [13]

    Donoho, J

    D. Donoho, J. Jin, Feature selection by higher criticism thresholding achiev es the optimal phase diagram, Philos. Trans. R. Soc. Lond. Ser. A Math. Phys. Eng. Sci. 367 (2009), 4449–4470

  14. [14]

    P. Hall, J. Jin, Innovated higher criticism for detecting sparse signals in correlated noise, Ann. Statist. 38 (2010), 1686–1732. 16 SHAHAR MENDELSON, GRIGORIS PAOURIS, AND ROMAN VERSHYNIN

  15. [15]

    Ingster, Minimax detection of a signal in ℓp metrics, Journal of Mathematical Sciences 68 (1994), 503–515

    Yu. Ingster, Minimax detection of a signal in ℓp metrics, Journal of Mathematical Sciences 68 (1994), 503–515

  16. [16]

    Ingster, Adaptive detection of a signal of growing dimension, I, II, Math

    Y. Ingster, Adaptive detection of a signal of growing dimension, I, II, Math. Methods Statist. 10 (2002), 395–421

  17. [17]

    Ingster, C

    Y. Ingster, C. Pouet, A. Tsybakov, Classification of sparse high-dimensional vectors, Philosophical Transactions: Mathematical, Physical and E ngineering Sciences 367 (2009), 4427–4448

  18. [18]

    Ingster, A

    Y. Ingster, A. Tsybakov, N. Verzelen, Detection boundary in sparse regression, Elec- tronic Journal of Statistics 4 (2010), 1476–1526

  19. [19]

    Mukherjee, S

    R. Mukherjee, S. Sen, On minimax exponents of sparse testing, preprint (2020)

  20. [20]

    Smirnov, Gaussian volume bounds under hypercube translations and ge neraliza- tions, preprint (2024)

    G. Smirnov, Gaussian volume bounds under hypercube translations and ge neraliza- tions, preprint (2024)

  21. [21]

    Talagrand, A new look at independence, The Annals of probability (1996), 1–34

    M. Talagrand, A new look at independence, The Annals of probability (1996), 1–34

  22. [22]

    Tukey, J. W. (1976). T13 N: The higher criticism. Course Notes, Statistics 411, Princeton Univ

  23. [23]

    Vershynin, High dimensional probability

    R. Vershynin, High dimensional probability. An introduction with applic ations in Data Science. Cambridge University Press, 2018. The Australian National University Email address : shahar.mendelson@anu.edu.au Texas A&M University and Princeton University Email address : grigoris@tamu.edu University of California, Irvine Email address : rvershyn@uci.edu