Smoothed Analysis of Learning from Positive Samples

Anay Mehrotra; Jane H. Lee; Manolis Zampetakis

arxiv: 2504.10428 · v2 · submitted 2025-04-14 · 📊 stat.ML · cs.DS· cs.LG· math.ST· stat.TH

Smoothed Analysis of Learning from Positive Samples

Jane H. Lee , Anay Mehrotra , Manolis Zampetakis This is my paper

Pith reviewed 2026-05-22 19:47 UTC · model grok-4.3

classification 📊 stat.ML cs.DScs.LGmath.STstat.TH

keywords smoothed analysispositive-only learningVC dimensionPAC learningbinary classificationtruncated estimationdistribution smoothness

0 comments

The pith

All VC classes become learnable from positive-only samples once the distribution is smooth relative to a reference.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Standard positive-only PAC learning yields mostly negative results: almost no natural classes, such as two-dimensional halfspaces, are learnable from positive samples alone. This paper replaces worst-case analysis with a smoothed model that assumes the unknown target distribution is smooth with respect to a known reference distribution. In this model every class of finite VC dimension becomes learnable, and the sample complexity is O(VC dimension over epsilon squared) to reach epsilon classification error. The authors also supply an efficient algorithm that works whenever the class admits a low-degree polynomial approximation in L1 norm relative to the reference. The same smoothness condition yields improved algorithms for estimating parameters of exponential families from unknown truncations and for detecting truncation in non-product distributions.

Core claim

In the smoothed positive-only learning model, where samples are drawn from a target distribution D* whose density is bounded relative to a reference distribution D, every concept class with finite VC dimension is PAC-learnable from positive samples alone using O(VC(C)/ε²) samples to achieve error at most ε.

What carries the argument

The smoothness condition that the density of the target distribution D* is upper-bounded by a constant multiple of the reference density of D; this condition supplies the uniform convergence needed for generalization from positive samples.

If this is right

Every finite-VC class admits a positive-only learner whose sample complexity is linear in the VC dimension and quadratic in 1/ε.
Any class that admits poly(ε) L1 approximation by degree-k polynomials has a polynomial-time positive-only learner running in time poly(d^k/ε).
Exponential-family parameters can be estimated in polynomial time from samples truncated to an unknown set that is L1-approximable by non-negative polynomials.
Truncation detection is possible for broad classes that include non-product distributions.
Learning is possible when samples come from a short list of reference distributions, one of which witnesses the smoothness of the target.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Real-world positive-only data sets in bioinformatics or ecology may often satisfy the smoothness condition, which would explain why heuristic methods succeed despite worst-case impossibility.
The same smoothness lens could be applied to other learning settings that currently have strong negative results, potentially recovering polynomial sample bounds.
Empirical checks of density ratios between observed positive samples and plausible reference distributions could validate or refute the modeling assumption on concrete data sets.

Load-bearing premise

The unknown target distribution must be smooth with respect to some known reference distribution.

What would settle it

A concrete VC class C together with a reference distribution D and a smooth D* such that no algorithm using o(VC(C)/ε²) positive samples achieves error ε on D*.

read the original abstract

Binary classification from positive-only samples is a variant of PAC learning where the learner receives i.i.d. positive samples and aims to learn a classifier with low error. Previous work by Natarajan, Gereb-Graus, and Shvaytser characterized learnability and revealed a largely negative picture: almost no interesting classes, including two-dimensional halfspaces, are learnable. This poses a challenge for applications from bioinformatics to ecology, where practitioners rely on heuristics. In this work, we initiate a smoothed analysis of positive-only learning. We assume samples from a reference distribution $D$ such that the true distribution $D^*$ is smooth with respect to it. In stark contrast to the worst-case setting, we show that all VC classes become learnable in the smoothed model, requiring $O(VC/\epsilon^2)$ positive samples for $\epsilon$ classification error. We also give an efficient algorithm for any class admitting $\mathrm{poly}(\epsilon)$-approximation by degree-$k$ polynomials whose range is lower-bounded by a constant with respect to $D$ in L1-norm. It runs in time $\mathrm{poly}(d^k/\epsilon)$, qualitatively matching L1-regression. Our results also imply faster or more general algorithms for: (1) estimation with unknown-truncation, giving the first polynomial-time algorithm for estimating exponential-family parameters from samples truncated to an unknown set approximable by non-negative polynomials in L1 norm, improving on [KTZ FOCS19; LMZ FOCS24], who required strong L2-approximation; (2) truncation detection for broad classes, including non-product distributions, improving on [DLNS STOC24]'s who required product distributions; and (3) learning from a list of reference distributions, where samples come from $O(1)$ distributions, one of which witnesses smoothness of $D^*$, as arises when list-decoding algorithms learn samplers for $D^*$ from corrupted data.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript initiates a smoothed analysis of positive-only binary classification. It assumes access to samples from a reference distribution D such that the unknown target distribution D* is smooth w.r.t. D, and claims that under this model every VC class becomes learnable from positive samples alone, with sample complexity O(VC/ε²) to achieve ε classification error. The authors also give a polynomial-time algorithm for classes admitting poly(ε)-approximation by degree-k polynomials that are lower-bounded by a positive constant in L1(D) norm, and derive corollaries for exponential-family estimation under unknown truncation, truncation detection, and learning from a short list of reference distributions.

Significance. If the central claims hold, the work is significant: it converts the largely negative worst-case picture for positive-only learning into a positive result for a natural smoothed model that is relevant to applications in bioinformatics and ecology. The O(VC/ε²) rate (when it holds) matches the standard PAC sample complexity up to constants, and the polynomial-approximation route to efficient algorithms is technically clean. The corollaries for truncated exponential-family estimation and list-decoding settings are concrete and improve on prior work that required stronger L2 approximation or product distributions.

major comments (1)

[Abstract and §3] Abstract and the statement of the main learnability theorem (likely Theorem 3.1 or §3): the claimed sample complexity O(VC/ε²) contains no visible factor of the smoothness parameter M (or equivalent quantitative measure such as ess sup dD*/dD). If the argument reduces uniform convergence over the VC class to reweighted samples whose weights are bounded by M, then either the O notation conceals a polynomial dependence on M or the theorem is stated only for constant-bounded M. The manuscript must make this dependence explicit, as it is load-bearing for the headline claim that 'all VC classes become learnable' with the stated rate.

minor comments (2)

[§2] The quantitative definition of smoothness (the precise role of M or the smoothing radius) should be stated with its parameter in the preliminaries section before any theorem statements that invoke it.
[§4] In the efficient-algorithm section, the running-time bound poly(d^k/ε) should explicitly list the dependence on the approximation degree k and on the smoothness parameter.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their careful reading and constructive comments, which help clarify the presentation of our results. We address the single major comment below and will revise the manuscript to make the dependence on the smoothness parameter explicit.

read point-by-point responses

Referee: [Abstract and §3] Abstract and the statement of the main learnability theorem (likely Theorem 3.1 or §3): the claimed sample complexity O(VC/ε²) contains no visible factor of the smoothness parameter M (or equivalent quantitative measure such as ess sup dD*/dD). If the argument reduces uniform convergence over the VC class to reweighted samples whose weights are bounded by M, then either the O notation conceals a polynomial dependence on M or the theorem is stated only for constant-bounded M. The manuscript must make this dependence explicit, as it is load-bearing for the headline claim that 'all VC classes become learnable' with the stated rate.

Authors: We agree that the dependence on the smoothness parameter must be stated explicitly. In the proof of Theorem 3.1, the argument proceeds by reweighting the loss with respect to the reference distribution D, where the Radon-Nikodym derivative is bounded by M. Standard Hoeffding-type bounds for the resulting bounded random variables then yield a sample complexity of O(M² VC / ε²) (up to logarithmic factors in 1/δ). The abstract and high-level statements in Section 3 used the O(VC/ε²) notation under the convention that M is a fixed constant, but this was imprecise. We will revise the abstract, the statement of Theorem 3.1, and the discussion in §3 to display the explicit M² factor. This change is purely presentational and does not alter the qualitative result that every VC class is learnable in the smoothed model. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained under external smoothness assumption

full rationale

The paper posits smoothness of D* w.r.t. reference D as an external modeling assumption and derives learnability of VC classes with O(VC/ε²) positive samples from uniform convergence after reweighting. No quoted step reduces the bound or the theorem to a fitted parameter, self-definition, or self-citation chain by construction. The smoothness parameter is not derived from the sample-complexity claim, and the analysis does not rename known results or smuggle ansatzes via prior self-work in a load-bearing manner. The central claim therefore retains independent content relative to its inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central positive results rest on the external modeling assumption that D* is smooth relative to a reference distribution D; no free parameters or new invented entities are introduced in the abstract.

axioms (1)

domain assumption The true distribution D* is smooth with respect to a reference distribution D.
This modeling choice is invoked to obtain learnability for all VC classes and is the key departure from the worst-case negative results.

pith-pipeline@v0.9.0 · 5911 in / 1282 out tokens · 63110 ms · 2026-05-22T19:47:36.669660+00:00 · methodology

Smoothed Analysis of Learning from Positive Samples

Core claim

What carries the argument

If this is right

Where Pith is reading between the lines

Load-bearing premise

What would settle it

discussion (0)