pith. sign in

arxiv: 1907.05401 · v1 · pith:3V2RGOJ6new · submitted 2019-07-11 · 💻 cs.DS · cs.CC· cs.CG· cs.CR· cs.LG

Computational Concentration of Measure: Optimal Bounds, Reductions, and More

Pith reviewed 2026-05-24 22:40 UTC · model grok-4.3

classification 💻 cs.DS cs.CCcs.CGcs.CRcs.LG
keywords computational concentration of measureproduct measuresHamming distancemembership oraclepoisoning attacksevasion attacksMcDiarmid inequalitycoin tossing protocols
0
0 comments X

The pith

Given a random point and an S-membership oracle, a polynomial-time algorithm finds a point in S differing in O(sqrt(n ln(1/(eps delta)))) coordinates.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a computational version of concentration of measure for product probability spaces. Any set S with probability epsilon is known to have a point within Hamming distance O(sqrt(n ln(1/(eps delta)))) of a random sample with high probability. The new result shows this nearby point can be found in polynomial time by querying an oracle that reveals only whether candidate points belong to S. The construction yields direct algorithmic attacks on learning procedures when the original failure probability is non-negligible.

Core claim

There exists a polynomial-time algorithm that, given a random point and oracle access to membership in S, outputs a point in S that differs from the input in only O(sqrt(n ln(1/(eps delta)))) coordinates with high probability. The algorithm, called MUCIO, processes coordinates sequentially and decides each change according to a multiplicative version of the coordinate's conditional influence given prior decisions.

What carries the argument

MUCIO (MUltiplicative Conditional Influence Optimizer), which updates each coordinate by a multiplicative conditional-influence test computed from the membership oracle answers on previously decided coordinates.

If this is right

  • Polynomial-time poisoning attacks on learning algorithms become possible whenever the underlying vulnerability has any cryptographically non-negligible probability.
  • Polynomial-time evasion attacks are obtained in settings where the original distributional vulnerability is non-negligible.
  • An algorithmic version of McDiarmid's inequality holds, giving efficient concentration around the mean.
  • New tampering algorithms exist for collective coin-tossing protocols via the generalization to discrete random processes.
  • Computational concentration of measure holds for high-dimensional Gaussians under the l1 metric through an algorithmic reduction between metric probability spaces.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same oracle-driven search technique may transfer to other product-like spaces where membership testing is cheap but explicit construction of close points is hard.
  • The exponential lower bounds for non-adaptive queries indicate that sequential adaptivity is essential for the polynomial runtime.
  • The reduction framework could be used to obtain computational concentration results in additional continuous or weighted settings beyond the ones shown.
  • Concrete implementations on standard learning tasks would allow direct measurement of attack success rates under the derived distance bounds.

Load-bearing premise

The underlying probability space must be a product measure and an efficient membership oracle for the target set S must exist.

What would settle it

An explicit product distribution and small set S for which every polynomial-time adaptive oracle algorithm fails to output a point in S within the stated Hamming distance on a non-negligible fraction of random inputs.

read the original abstract

Product measures of dimension $n$ are known to be concentrated in Hamming distance: for any set $S$ in the product space of probability $\epsilon$, a random point in the space, with probability $1-\delta$, has a neighbor in $S$ that is different from the original point in only $O(\sqrt{n\ln(1/(\epsilon\delta))})$ coordinates. We obtain the tight computational version of this result, showing how given a random point and access to an $S$-membership oracle, we can find such a close point in polynomial time. This resolves an open question of [Mahloujifar and Mahmoody, ALT 2019]. As corollaries, we obtain polynomial-time poisoning and (in certain settings) evasion attacks against learning algorithms when the original vulnerabilities have any cryptographically non-negligible probability. We call our algorithm MUCIO ("MUltiplicative Conditional Influence Optimizer") since proceeding through the coordinates, it decides to change each coordinate of the given point based on a multiplicative version of the influence of that coordinate, where influence is computed conditioned on previously updated coordinates. We also define a new notion of algorithmic reduction between computational concentration of measure in different metric probability spaces. As an application, we get computational concentration of measure for high-dimensional Gaussian distributions under the $\ell_1$ metric. We prove several extensions to the results above: (1) Our computational concentration result is also true when the Hamming distance is weighted. (2) We obtain an algorithmic version of concentration around mean, more specifically, McDiarmid's inequality. (3) Our result generalizes to discrete random processes, and this leads to new tampering algorithms for collective coin tossing protocols. (4) We prove exponential lower bounds on the average running time of non-adaptive query algorithms.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims a polynomial-time algorithm MUCIO that, given a random point x and membership oracle for a set S of measure ε in an n-dimensional product space, outputs a point in S at Hamming distance O(√(n ln(1/(εδ)))) from x with probability 1-δ. This supplies the computational analogue of the classical concentration-of-measure statement and resolves an open question of Mahloujifar-Mahmoody (ALT 2019). Corollaries include poly-time poisoning/evasion attacks, a reduction yielding computational concentration for high-dimensional Gaussians under ℓ1, an algorithmic McDiarmid inequality, tampering algorithms for collective coin tossing, and exponential lower bounds for non-adaptive query algorithms. Weighted Hamming distance is also handled.

Significance. If the claimed tight bound and polynomial query complexity hold, the work supplies the first explicit algorithmic realization of concentration of measure, directly enabling efficient attacks on learning algorithms whenever the information-theoretic vulnerability probability is non-negligible. The new notion of algorithmic reduction between metric probability spaces and the lower-bound results add technical depth.

major comments (2)
  1. [MUCIO algorithm and analysis (main theorem)] The section describing MUCIO and its analysis: the central claim requires that additive sampling error in estimating each multiplicative conditional influence does not inflate the final distance beyond O(√(n ln(1/(εδ)))) or force super-polynomial total queries. No explicit error-propagation argument (union bound, multiplicative drift, or potential-function analysis) is supplied for the n sequential multiplicative updates; this is load-bearing for both the tight bound and the polynomial-time guarantee.
  2. [Main theorem statement and proof] Theorem on the main computational concentration result: the stated O(√(n ln(1/(εδ)))) distance is claimed to be preserved exactly under the sampling-based implementation, yet the manuscript supplies no quantitative bound relating per-coordinate sample size, failure probability, and the final distance deviation. This must be supplied to confirm the result is not merely information-theoretic.
minor comments (2)
  1. [Abstract and Theorem 1] The abstract states the running time is polynomial but does not record the precise dependence on 1/ε and 1/δ; the main theorem statement should make this explicit.
  2. [Preliminaries] Notation for the conditional influence (multiplicative version) is introduced without a displayed equation; adding a numbered definition would improve readability when the error analysis is revised.

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the careful and constructive review. We address the two major comments below. We agree that the analysis requires additional explicit arguments for error control and will revise the manuscript to supply them.

read point-by-point responses
  1. Referee: [MUCIO algorithm and analysis (main theorem)] The section describing MUCIO and its analysis: the central claim requires that additive sampling error in estimating each multiplicative conditional influence does not inflate the final distance beyond O(√(n ln(1/(εδ)))) or force super-polynomial total queries. No explicit error-propagation argument (union bound, multiplicative drift, or potential-function analysis) is supplied for the n sequential multiplicative updates; this is load-bearing for both the tight bound and the polynomial-time guarantee.

    Authors: We agree that an explicit error-propagation argument for the n sequential updates is not provided in the current manuscript and is necessary to rigorously support both the distance bound and polynomial query complexity. In the revision we will add a dedicated subsection that applies a potential-function analysis (tracking the product of conditional influences) together with a union bound over the n steps. With per-coordinate sample size polynomial in n, 1/ε and log(1/δ), the total additive deviation remains absorbed inside the O(√(n ln(1/(εδ)))) term with probability 1-δ. revision: yes

  2. Referee: [Main theorem statement and proof] Theorem on the main computational concentration result: the stated O(√(n ln(1/(εδ)))) distance is claimed to be preserved exactly under the sampling-based implementation, yet the manuscript supplies no quantitative bound relating per-coordinate sample size, failure probability, and the final distance deviation. This must be supplied to confirm the result is not merely information-theoretic.

    Authors: We concur that a quantitative relation between sample size, per-step failure probability, and final distance deviation must be stated explicitly. The revised proof will include a supporting lemma that, for m = O(n log(n/δ)/ε²) samples per coordinate, each multiplicative influence estimate deviates by at most an additive 1/poly(n) factor with high probability; the accumulated effect over n coordinates is then bounded by a term that does not increase the leading O(√(n ln(1/(εδ)))) distance. This will be derived via standard Chernoff bounds and a telescoping product argument. revision: yes

Circularity Check

0 steps flagged

New algorithmic construction (MUCIO) resolves prior open question; no load-bearing self-citation or definitional reduction

full rationale

The paper's core contribution is an explicit polynomial-time algorithm (MUCIO) that, given a membership oracle for S, finds a nearby point in S by sequentially deciding coordinate flips via estimated multiplicative conditional influences. This is presented as resolving the open question from the authors' prior ALT 2019 work rather than deriving the bound or runtime from any fitted parameter, ansatz, or self-cited uniqueness theorem. The self-citation is only to the statement of the open problem; the algorithm and its analysis are self-contained in the present manuscript. No equation reduces the claimed O(sqrt(n log(1/(εδ)))) distance or poly-query bound to a renaming or construction-internal fit. Minor self-citation exists but is not load-bearing for the central claim.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Review performed from abstract only; the central addition is an algorithmic procedure rather than new free parameters or invented entities. The result rests on the known (non-computational) concentration fact and on oracle access.

axioms (2)
  • domain assumption Product measures on discrete spaces exhibit concentration of measure in Hamming distance.
    Stated as known background in the abstract.
  • domain assumption An efficient membership oracle for the target set S is available.
    Required for the computational procedure described.

pith-pipeline@v0.9.0 · 5891 in / 1346 out tokens · 36785 ms · 2026-05-24T22:40:12.207061+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.