Efficient Statistics With Unknown Truncation, Polynomial Time Algorithms, Beyond Gaussians
Pith reviewed 2026-05-23 20:10 UTC · model grok-4.3
The pith
A d^{poly(ℓ/ε)} time algorithm estimates parameters of exponential families from samples truncated by any unknown set approximable by degree-ℓ polynomials.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
For any exponential family obeying the required structural conditions and any unknown truncation set S that is ε-close to a degree-ℓ polynomial, the parameters can be recovered to accuracy ε in time d^{poly(ℓ/ε)}; the same framework yields the first algorithms for arbitrary-covariance Gaussians under unknown truncation and for linear regression with Gaussian features under unknown truncation, while a separate poly(d/ε)-time algorithm exists when S is a halfspace or axis-aligned rectangle.
What carries the argument
A reduction that converts the truncated estimation task into a robust positive-unlabeled learning problem, then solves the latter via polynomial approximation of the truncation set.
If this is right
- The first algorithm exists that recovers the mean and covariance of an arbitrary Gaussian from samples observed only inside an unknown set S.
- The first algorithm exists for linear regression when the feature vectors are Gaussian and the responses are observed only inside an unknown truncation set.
- Polynomial-time recovery is possible for all Gaussians and a broader class of exponential families when the truncation set is a halfspace or an axis-aligned rectangle.
- A general reduction maps positive-unlabeled learning under covariate shift to ordinary positive-negative learning.
Where Pith is reading between the lines
- If the structural assumptions on the exponential family can be verified for additional families, the same polynomial-approximation approach would immediately give truncated estimators for those families as well.
- The reduction from positive-unlabeled to positive-negative learning may apply to other missing-data or selection-bias settings beyond truncation.
- Testing whether a concrete truncation set admits a low-degree polynomial approximation could become a practical preprocessing step for deciding which algorithm to run.
Load-bearing premise
The exponential family must obey unspecified structural conditions and the unknown truncation set S must admit an ε-approximation by a degree-ℓ polynomial.
What would settle it
An explicit exponential family obeying the structural conditions together with a truncation set S that cannot be approximated by any low-degree polynomial for which no d^{poly(ℓ/ε)}-time algorithm exists.
Figures
read the original abstract
We study the estimation of distributional parameters when samples are shown only if they fall in some unknown set $S \subseteq \mathbb{R}^d$. Kontonis, Tzamos, and Zampetakis (FOCS'19) gave a $d^{\mathrm{poly}(1/\varepsilon)}$ time algorithm for finding $\varepsilon$-accurate parameters for the special case of Gaussian distributions with diagonal covariance matrix. Recently, Diakonikolas, Kane, Pittas, and Zarifis (COLT'24) showed that this exponential dependence on $1/\varepsilon$ is necessary even when $S$ belongs to some well-behaved classes. These works leave the following open problems which we address in this work: Can we estimate the parameters of any Gaussian or even extend beyond Gaussians? Can we design $\mathrm{poly}(d/\varepsilon)$ time algorithms when $S$ is a simple set such as a halfspace? We make progress on both of these questions by providing the following results: 1. Toward the first question, we give a $d^{\mathrm{poly}(\ell/\varepsilon)}$ time algorithm for any exponential family that satisfies some structural assumptions and any unknown set $S$ that is $\varepsilon$-approximable by degree-$\ell$ polynomials. This result has two important applications: 1a) The first algorithm for estimating arbitrary Gaussian distributions from samples truncated to an unknown $S$; and 1b) The first algorithm for linear regression with unknown truncation and Gaussian features. 2. To address the second question, we provide an algorithm with runtime $\mathrm{poly}(d/\varepsilon)$ that works for a set of exponential families (containing all Gaussians) when $S$ is a halfspace or an axis-aligned rectangle. Along the way, we develop tools that may be of independent interest, including, a reduction from PAC learning with positive and unlabeled samples to PAC learning with positive and negative samples that is robust to certain covariate shifts.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims a d^{poly(ℓ/ε)} time algorithm for parameter estimation in exponential families satisfying structural assumptions, when samples are truncated to an unknown set S that is ε-approximable by degree-ℓ polynomials. This yields the first algorithms for arbitrary (non-diagonal) Gaussians under unknown truncation and for linear regression with unknown truncation and Gaussian features. It also gives a poly(d/ε) time algorithm for a class of exponential families (including Gaussians) when S is a halfspace or axis-aligned rectangle, plus a reduction from positive-unlabeled to positive-negative PAC learning that is robust to covariate shift.
Significance. If the central claims hold, the work is significant: it resolves open questions left by Kontonis et al. (FOCS'19) and Diakonikolas et al. (COLT'24) by extending efficient truncated estimation beyond diagonal Gaussians, supplies the first algorithms for arbitrary Gaussians and truncated linear regression, and provides a poly(d/ε) runtime for simple truncation sets. The PU-to-PN reduction robust to covariate shift is a tool of independent interest. Credit is due for the explicit algorithmic constructions and the reduction.
minor comments (3)
- [Abstract and §1] Abstract and introduction repeatedly refer to 'some structural assumptions' on the exponential family without a concise enumerated list; move or add an explicit bullet list of the assumptions (with forward reference to the theorem statement) to improve readability.
- [Theorem statements in §3 and §4] The runtime statements (e.g., d^{poly(ℓ/ε)}) should explicitly list dependence on all parameters including the failure probability δ and any constants hidden in the poly notation.
- [§2 (Preliminaries)] Notation for the polynomial approximability of S (ε-approximable by degree-ℓ polynomials) is introduced without a formal definition or reference to the precise norm or measure used; add a short definition paragraph or equation.
Simulated Author's Rebuttal
We thank the referee for their positive assessment of the paper, recognition of its significance in resolving open questions from prior work, and recommendation of minor revision. No specific major comments were raised in the report.
Circularity Check
No significant circularity in derivation chain
full rationale
The paper presents novel algorithmic results for estimating parameters of exponential families under unknown truncation sets that satisfy polynomial approximability or simplicity conditions (halfspaces, rectangles). These are framed as new constructions with explicit runtimes, including a reduction from positive-unlabeled to positive-negative PAC learning robust to covariate shift. Prior citations (e.g., Kontonis et al. FOCS'19 for the diagonal-Gaussian special case) provide context and contrast rather than load-bearing justification for the new claims; the central results do not reduce by definition or construction to fitted inputs, self-defined quantities, or unverified self-citations. The derivation chain is self-contained as independent algorithmic development.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
The Reverse Isoperimetric Problem for Gaussian Measure
ISBN : 9780486612720. URL: https://books.google.com/books?id=MtU8uP7XMvoC (cit. on p. 40). [Bal93] Keith Ball. “The Reverse Isoperimetric Problem for Gaussian Measure”. In: Discrete Comput. Geom. 10.4 (Dec. 1993), pp. 411–420.ISSN : 0179-5376. DOI: 10.1007/BF02573986. URL: https://doi.org/10.1007/BF02573986 (cit. on p. 16). [Ber46] Joseph Berkson. “Limita...
-
[2]
Learning From Satisfy- ing Assignments Under Continuous Distributions
Orlando, FL, USA: Association for Computing Machinery, 2023, pp. 1699–1712. ISBN : 9781450399135. DOI: 10.1145/3564246.3585177 (cit. on pp. 7, 20). [CDS20] Clément L. Canonne, Anindya De, and Rocco A. Servedio. “Learning From Satisfy- ing Assignments Under Continuous Distributions”. In: Proceedings of the 2020 ACM- SIAM Symposium on Discrete Algorithms (S...
-
[3]
New Results for Learning Noisy Parities and Halfspaces
URL: https : / / proceedings . neurips . cc / paper _ files / paper / 2021 / file / 0ed8861dc36bee580d100f91283d0559-Paper.pdf (cit. on pp. 1, 2, 19, 20). [Efr22] Bradley Efron. Exponential Families in Theory and Practice . Institute of Mathematical Statistics Textbooks. Cambridge University Press, 2022. DOI: https://doi.org/10. 1017/9781108773157 (cit. o...
-
[4]
An Investigation of the Consequences of Partial Aggregation of Micro-Economic Data
URL: https://api.semanticscholar.org/CorpusID:220363551 (cit. on p. 20). [FW72] Edgar L. Feige and Harold W. Watts. “An Investigation of the Consequences of Partial Aggregation of Micro-Economic Data”. In:Econometrica 40.2 (1972), pp. 343–360.ISSN : 00129682, 14680262. URL: http://www.jstor.org/stable/1909411 (cit. on p. 19). [FYC23] Jiaojiao Fan, Bo Yuan...
-
[5]
URL: http://www.nber.org/chapters/c10489 (cit. on p. 19). [IR15] Guido W. Imbens and Donald B. Rubin. Causal Inference for Statistics, Social, and Biomed- ical Sciences: An Introduction. Cambridge University Press, 2015.DOI: 10.1017/CBO9781139025751 (cit. on pp. 1, 19). [IZD20] Andrew Ilyas, Emmanouil Zampetakis, and Constantinos Daskalakis. “A Theoreti- ...
-
[6]
Learning Geometric Concepts via Gaussian Surface Area
DOI: 10.1137/060649057 . URL: https://doi.org/10.1137/060649057 (cit. on pp. 9, 15, 16, 35). [KOS08] Adam R. Klivans, Ryan O’Donnell, and Rocco A. Servedio. “Learning Geometric Concepts via Gaussian Surface Area”. In: 2008 49th Annual IEEE Symposium on Foun- dations of Computer Science . 2008, pp. 541–550. DOI: 10.1109/FOCS.2008.64 (cit. on p. 16). [KSST1...
-
[7]
A Computationally Efficient Method for Learning Exponential Family Distributions
DOI: 10.1007/s00477-011-0458-8 (cit. on p. 1). [SB14] Shai Shalev-Shwartz and Shai Ben-David. Understanding Machine Learning: From The- ory to Algorithms . Cambridge University Press, 2014. DOI: https : / / doi . org / 10 . 1017/CBO9781107298019 (cit. on pp. 9, 16, 36). [SSW21] Abhin Shah, Devavrat Shah, and Gregory Wornell. “A Computationally Efficient M...
-
[8]
[DGTZ18]) Σ−1/2 (bµ S − µ S) 2 ≤ ε and (1 − ε)ΣS ⪯bΣS ⪯ (1 + ε)ΣS
(Relation between (bµ ,bΣ) and (µ S, ΣS); Lemma 5 of Daskalakis et al. [DGTZ18]) Σ−1/2 (bµ S − µ S) 2 ≤ ε and (1 − ε)ΣS ⪯bΣS ⪯ (1 + ε)ΣS
-
[9]
[DGTZ18]) Σ−1/2 (bµ S − µ ) ≤ O log1/2 1 α , bΣS ⪰ Ω α2 · Σ, and Σ−1/2bΣSΣ−1/2 − I F ≤ O log 1 α
(Relation between (bµ ,bΣ) and (µ , Σ); Corollary 1 of Daskalakis et al. [DGTZ18]) Σ−1/2 (bµ S − µ ) ≤ O log1/2 1 α , bΣS ⪰ Ω α2 · Σ, and Σ−1/2bΣSΣ−1/2 − I F ≤ O log 1 α
-
[10]
(Further relations between (bµ ,bΣ) and (µ , Σ); Proposition 1 of Daskalakis et al. [DGTZ18]) I − Σ1/2bΣ−1Σ1/2 F ≤ O log (1/α) α2 and I −bΣ1/2 S Σ−1bΣ1/2 S F ≤ O log (1/α) α2 , Ω α2 · I ⪯ Σ−1/2bΣSΣ−1/2 ⪯ O 1 α2 · I and Ω α2 · I ⪯bΣ−1/2 S ΣbΣ−1/2 S ⪯ O 1 α2 · I , bΣ−1 S Σ1/2 (bµ S − µ ) 2 ≤ O log (1/α) α2 and Σ−1bΣ1/2 S (bµ S − µ ) 2 ≤ O log (1/α) α2 . B P...
-
[11]
Case A (t ≤ −1.05): Since H(·) ≥ 0 and t ≤ 0, 2H2(t) − 3tH (t) ≥ 0 and, hence, t2 − 1 + 2H2(t) − 3tH (t) ≥ t2 − 1 (t≤−1.05) > 1 10
-
[12]
Further, on t ≥ − 1.05, H(t) ≥ e−1.052/2 √ 2π(1−Φ(−1.05)) =: c(1.05)
Case B ( t ∈ (−1.05, −2/3]): Since H(·) ≥ 0 and t ≤ − 2/3, 2 H2(t) − 3tH (t) ≥ 2H2(t) + 2H(t). Further, on t ≥ − 1.05, H(t) ≥ e−1.052/2 √ 2π(1−Φ(−1.05)) =: c(1.05). Hence, 2 H2(t) − 3tH (t) ≥ 2H2(t) + 2H(t) ≥ 2c(1.05)2 + 2c(1.05) ≥ 0.68. Further, t2 − 1 ≥ − 5/9 ≥ − 0.56. It follows that t2 − 1 + 2H2(t) − 3tH (t) ≥ 0.1. 79
-
[13]
Further, on t ≥ − 2/3, H(t) ≥ e−(2/3)2/2 √ 2π(1−Φ(−2/3)) =: c(2/3)
Case C ( t ∈ (−2/3, −2/5]): Since H(·) ≥ 0 and t ≤ − 2/5, 2 H2(t) − 3tH (t) ≥ 2H2(t) + 6H(t)/5. Further, on t ≥ − 2/3, H(t) ≥ e−(2/3)2/2 √ 2π(1−Φ(−2/3)) =: c(2/3). Hence, 2 H2(t) − 3tH (t) ≥ 2H2(t) + 6H(t)/5 ≥ 2c(2/3)2 + (6/5)c(2/3) ≥ 0.878. Further, t2 − 1 ≥ − 0.85. It follows that t2 − 1 + 2H2(t) − 3tH (t) ≥ 0.02
-
[14]
Further, on t ≥ − 2/5, H(t) ≥ e−(2/5)2/2 √ 2π(1−Φ(−2/5)) =: c(2/5)
Case D (t ∈ (−2/5, −1/5]): Since H(·) ≥ 0 and t ≤ 1/5, 2H2(t) − 3tH (t) ≥ 2H2(t) + 3H(t)/5. Further, on t ≥ − 2/5, H(t) ≥ e−(2/5)2/2 √ 2π(1−Φ(−2/5)) =: c(2/5). Hence, 2 H2(t) − 3tH (t) ≥ 2H2(t) + 3H(t)/5 ≥ 2c(2/5)2 + (3/5)c(2/5) ≥ 0.968. Further, t2 − 1 ≥ − 0.96. It follows that t2 − 1 + 2H2(t) − 3tH (t) ≥ 0.008
-
[15]
Further, on t ≥ − 1/5, H(t) ≥ e−(1/5)2/2 √ 2π(1−Φ(−1/5)) =: c(1/5)
Case E ( t ∈ (−1/5, −1/20]): Since H(·) ≥ 0 and t ≤ 1/20, 2 H2(t) − 3tH (t) ≥ 2H2(t) + 3H(t)/20. Further, on t ≥ − 1/5, H(t) ≥ e−(1/5)2/2 √ 2π(1−Φ(−1/5)) =: c(1/5). Hence, 2 H2(t) − 3tH (t) ≥ 2H2(t) + 3H(t)/20 ≥ 2c(1/5)2 + (3/20)c(1/5) ≥ 1.01. Further, t2 − 1 ≥ − 1. It follows that t2 − 1 + 2H2(t) − 3tH (t) ≥ 0.01
-
[16]
Further, on t ≥ − 1/20, H(t) ≥ e−(1/20)2/2 √ 2π(1−Φ(−1/20)) =: c(1/20)
Case F ( t ∈ (−1/20, 0]): Since H(·) ≥ 0 and t ≤ 0, 2 H2(t) − 3tH (t) ≥ 2H2(t). Further, on t ≥ − 1/20, H(t) ≥ e−(1/20)2/2 √ 2π(1−Φ(−1/20)) =: c(1/20). Hence, 2H2(t) − 3tH (t) ≥ 2H2(t) ≥ 2c(1/20)2 ≥ 1.17. Further, t2 − 1 ≥ −1. It follows that t2 − 1 + 2H2(t) − 3tH (t) ≥ 0.17. B.6 Proof of Theorem 8.1: Sample Complexity of Estimation with Unknown Trunca- t...
-
[17]
(Strong Convexity) L (·) is Ω λ · k−k strongly convex
-
[18]
Moreover, given a sampler for E(·), v can be efficiently computed
(Stochastic gradient and their Second moments) The random variable v = t(x) − t(z) where x ∼ E(θ⋆, S⋆) and z ∼ E(eθ) is an unbiased estimate ofL (·) and satisfies E h ∥v∥2 2 | θ i ≤ DΛ. Moreover, given a sampler for E(·), v can be efficiently computed
-
[19]
(Starting point) Any point θ ∈ Θ can be used as a starting point and satisfies ∥θ −eθ∥2 ≤ D
-
[20]
(Projection to Θ) Finally, we inherit the projection oracle to the domain Θ which contains the optimizereθ from Assumption 3.3. The following result follows. Proposition C.1. Suppose Assumptions 1 and 2 and Assumptions 3.1, 3.2 and 3.4 hold. Supposediam(Θ) ≤ D. Fix any ε, δ ∈ (0, 1). There is an algorithm that, given n independent samples from E(θ⋆, S⋆) f...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.