On efficient robust regression with subquadratic samples
Pith reviewed 2026-05-20 00:57 UTC · model grok-4.3
The pith
A near-linear-time algorithm performs robust linear regression with subquadratic samples and O(sqrt(ε κ)) prediction error when ε κ is small.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
There is a near-linear-time algorithm that uses Õ(d/ε⁴) samples to achieve prediction error O(√(ε κ)) for robust linear regression under Gaussian covariates with covariance condition number κ, provided ε κ ≲ 1. This is complemented by an SQ lower bound that efficient algorithms cannot achieve o(√(ε κ)) error with subquadratic samples, and a low-degree lower bound suggesting higher sample needs without the ε κ condition.
What carries the argument
A near-linear-time robust estimation procedure that tolerates corruptions while producing a linear predictor with the stated error bound under unknown covariance.
If this is right
- The algorithm improves sample complexity and runtime over all prior efficient methods for the same error guarantee.
- Efficient statistical query algorithms require Ω(d²) samples to achieve error o(√(ε κ)) when ε κ ≲ 1.
- Without the ε κ ≲ 1 restriction, efficient algorithms may need Õ(min{d ε² κ², ε² d²}) samples to beat the trivial zero predictor.
- These bounds clarify the necessary trade-offs among sample complexity, runtime, condition number, and achievable prediction error.
Where Pith is reading between the lines
- The upper and lower bounds together indicate that subquadratic samples suffice for this robust estimation task only under the stated product condition on corruption and condition number.
- The same near-linear filtering approach might extend to related problems such as robust covariance estimation or mean estimation under similar assumptions.
- Empirical validation on synthetic instances with controlled κ and ε could confirm whether the O(√(ε κ)) error is tight in practice.
- The low-degree polynomial lower bound supplies fine-grained evidence that quadratic-sample barriers persist for many polynomial-time methods beyond statistical queries.
Load-bearing premise
The input samples come from a linear model with Gaussian covariates whose covariance matrix has bounded condition number κ, and the analysis requires that the corruption fraction ε satisfies ε κ ≲ 1.
What would settle it
Running the algorithm on Gaussian data with small ε κ and observing prediction error much larger than O(√(ε κ)), or finding that any subquadratic-sample SQ algorithm achieves only o(√(ε κ)) error, would falsify the main claims.
read the original abstract
We revisit the problem of robust linear regression under Gaussian covariates with an unknown covariance matrix of condition number $\kappa$. For this fundamental problem, significant gaps remain in our understanding of the trade-offs among sample complexity, condition number, runtime, and prediction error for efficient algorithms. Our first result is a near-linear-time algorithm that uses $\widetilde{O}(d/\epsilon^4)$ samples, where $d$ is the dimension and $\epsilon$ is the corruption rate, and achieves prediction error $O(\sqrt{\epsilon\kappa})$ under the condition $\epsilon\kappa \lesssim 1$, improving over all prior works. We complement this result with a Statistical Query (SQ) lower bound showing that efficient SQ algorithms achieving error $o(\sqrt{\epsilon\kappa})$ when $\epsilon \kappa \lesssim 1$ require queries that take $\Omega(d^2)$ samples to simulate. Finally, we prove a low-degree polynomial lower bound that gives fine-grained evidence that, without assumptions such as $\epsilon \kappa \lesssim 1$, efficient algorithms may require $\tilde{\Omega}\left(\min\{d\epsilon^{2}\kappa^{2},\ \epsilon^{2}d^{2}\}\right)$ samples to significantly outperform the trivial estimator that always guesses $0$.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript studies robust linear regression with Gaussian covariates whose covariance matrix has unknown condition number κ. It gives a near-linear-time algorithm using Õ(d/ε⁴) samples that achieves prediction error O(√(ε κ)) whenever ε κ ≲ 1. The algorithmic result is paired with an SQ lower bound showing that any efficient SQ algorithm achieving o(√(ε κ)) error in this regime requires Ω(d²) samples to simulate, and with a low-degree polynomial lower bound indicating that, absent the ε κ ≲ 1 assumption, efficient algorithms may need Õ(min{d ε² κ², ε² d²}) samples to beat the trivial zero estimator.
Significance. If the stated guarantees hold, the work closes several gaps in the sample-complexity/runtime/prediction-error trade-offs for efficient robust regression. The explicit matching of the algorithmic error O(√(ε κ)) with the SQ lower bound, together with the fine-grained low-degree evidence for the necessity of the ε κ ≲ 1 regime, supplies concrete evidence of near-optimality within the class of efficient algorithms.
major comments (2)
- [§3] §3 (main algorithmic theorem): the runtime analysis claims near-linear time after drawing Õ(d/ε⁴) samples; the dependence of the per-sample processing time on κ and on the failure probability must be stated explicitly, because any super-linear factor in κ would violate the near-linear claim once κ grows with d or 1/ε.
- [§5] §5 (SQ lower bound): the reduction is stated for the regime ε κ ≲ 1; it is unclear whether the Ω(d²) sample lower bound continues to hold (or strengthens) when ε κ ≫ 1, which would affect the interpretation of the low-degree bound in §6.
minor comments (2)
- [Preliminaries] The notation Õ(·) is used throughout without an explicit definition; a single sentence in the preliminaries would remove ambiguity for readers outside the subfield.
- [Table 1] Table 1 (comparison with prior work) lists sample complexities but omits the precise prediction-error guarantees of the cited algorithms; adding one column would make the improvement O(√(ε κ)) versus prior bounds immediately visible.
Simulated Author's Rebuttal
We thank the referee for their careful reading and constructive feedback on our manuscript. We address the major comments point by point below and will incorporate clarifications into the revised version.
read point-by-point responses
-
Referee: [§3] §3 (main algorithmic theorem): the runtime analysis claims near-linear time after drawing Õ(d/ε⁴) samples; the dependence of the per-sample processing time on κ and on the failure probability must be stated explicitly, because any super-linear factor in κ would violate the near-linear claim once κ grows with d or 1/ε.
Authors: We agree that the runtime dependencies should be stated explicitly to support the near-linear time claim. In the analysis of the main algorithmic result in §3, each sample is processed in time O(d · polylog(d, 1/ε, log(1/δ))), where δ is the failure probability; this is independent of κ. The total runtime is therefore Õ((d/ε⁴) · d · polylog(d, 1/ε, log(1/δ))), which is near-linear in the input size (n d with n = Õ(d/ε⁴)) up to polylog factors. We will add an explicit statement of these dependencies in the revised manuscript. revision: yes
-
Referee: [§5] §5 (SQ lower bound): the reduction is stated for the regime ε κ ≲ 1; it is unclear whether the Ω(d²) sample lower bound continues to hold (or strengthens) when ε κ ≫ 1, which would affect the interpretation of the low-degree bound in §6.
Authors: The SQ lower bound in §5 is constructed specifically for the regime ε κ ≲ 1 to match the setting where the algorithm achieves O(√(ε κ)) error. In this regime we prove that achieving o(√(ε κ)) error requires Ω(d²) samples for any efficient SQ algorithm. When ε κ ≫ 1 the algorithmic guarantee no longer applies, and the low-degree polynomial lower bound in §6 separately addresses the sample complexity needed to beat the trivial zero estimator without the ε κ ≲ 1 assumption. The Ω(d²) bound does not necessarily extend or strengthen outside the stated regime because the hard-instance reduction relies on ε κ being bounded. We will add a clarifying remark in the revised manuscript. revision: yes
Circularity Check
No significant circularity in derivation chain
full rationale
The paper's central results—an algorithmic construction achieving Õ(d/ε⁴) samples and O(√(ε κ)) prediction error under the explicit regime ε κ ≲ 1, together with matching SQ and low-degree lower bounds—are derived from standard Gaussian covariate assumptions and established SQ/low-degree techniques. No load-bearing step reduces by construction to a fitted parameter, self-referential definition, or self-citation chain; the derivation remains self-contained against external benchmarks and does not rename or smuggle in prior results as new predictions.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Covariates are drawn from a Gaussian distribution with unknown covariance matrix of condition number κ
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
Theorem 1.2: near-linear-time algorithm ... Õ(d/ε⁴) samples ... O(√(ε κ)) under ε κ ≲ 1
-
IndisputableMonolith/Foundation/ArithmeticFromLogic.leanembed_strictMono_of_one_lt unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
Lemma 5.7 and concentration arguments for Gaussian projections and fourth-moment bounds
-
IndisputableMonolith/Foundation/AlexanderDuality.leanalexander_duality_circle_linking unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
Low-degree lower bound via Hermite coefficients and two-dimensional corruption subspace (Theorem 7.5)
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Proceedings of the International Conference on Machine Learning (ICML) , year =
d’Orsi, Tommaso and Novikov, Gleb and Steurer, David , title =. Proceedings of the International Conference on Machine Learning (ICML) , year =
-
[2]
Hermite polynomial and Gaussian random variable , author =. MathOverflow , year =
-
[3]
The Thirty-ninth Annual Conference on Neural Information Processing Systems , year=
Information-Computation Tradeoffs for Noiseless Linear Regression with Oblivious Contamination , author=. The Thirty-ninth Annual Conference on Neural Information Processing Systems , year=
-
[4]
Proceedings of the International Conference on Machine Learning (ICML) , year =
Das, Aditya and Feldman, Dan and Honorio, Jorge , title =. Proceedings of the International Conference on Machine Learning (ICML) , year =
-
[5]
Robust regression via multivariate regression depth , author =. Bernoulli , volume =. 2020 , publisher =
work page 2020
-
[6]
Journal of the American Statistical Association , volume=
Robust regression with covariate filtering: Heavy tails and adversarial contamination , author=. Journal of the American Statistical Association , volume=. 2025 , publisher=
work page 2025
-
[7]
The Annals of Statistics , volume=
Generalized resilience and robust statistics , author=. The Annals of Statistics , volume=. 2022 , publisher=
work page 2022
-
[8]
and Stewart, Alistair , title =
Diakonikolas, Ilias and Kane, Daniel M. and Stewart, Alistair , title =. Proceedings of the 31st Conference On Learning Theory (COLT) , year =
-
[9]
Conference On Learning Theory , pages=
Efficient algorithms for outlier-robust regression , author=. Conference On Learning Theory , pages=. 2018 , organization=
work page 2018
- [10]
- [11]
- [12]
-
[13]
An introduction to matrix concentration inequalities , author=. Foundations and trends. 2015 , publisher=
work page 2015
-
[14]
International conference on machine learning , pages=
Robust sparse regression under adversarial corruption , author=. International conference on machine learning , pages=. 2013 , organization=
work page 2013
-
[15]
International Conference on Machine Learning , pages=
Sever: A robust meta-algorithm for stochastic optimization , author=. International Conference on Machine Learning , pages=. 2019 , organization=
work page 2019
- [16]
-
[17]
2021 IEEE 62nd Annual Symposium on Foundations of Computer Science (FOCS) , pages=
Online and distribution-free robustness: Regression and contextual bandits with huber contamination , author=. 2021 IEEE 62nd Annual Symposium on Foundations of Computer Science (FOCS) , pages=. 2022 , organization=
work page 2021
-
[18]
Advances in Neural Information Processing Systems , volume=
Relu regression with massart noise , author=. Advances in Neural Information Processing Systems , volume=
-
[19]
Breakthroughs in statistics: Methodology and distribution , pages=
Robust estimation of a location parameter , author=. Breakthroughs in statistics: Methodology and distribution , pages=. 1992 , publisher=
work page 1992
-
[20]
International Conference on Machine Learning , pages=
Consistent regression when oblivious outliers overwhelm , author=. International Conference on Machine Learning , pages=. 2021 , organization=
work page 2021
-
[21]
Advances in Neural Information Processing Systems , volume=
Near-optimal algorithms for gaussians with huber contamination: Mean estimation and linear regression , author=. Advances in Neural Information Processing Systems , volume=
-
[22]
The Thirty Sixth Annual Conference on Learning Theory , pages=
Distribution-independent regression for generalized linear models with oblivious corruptions , author=. The Thirty Sixth Annual Conference on Learning Theory , pages=. 2023 , organization=
work page 2023
-
[23]
Robust regression and outlier detection , author=. 2003 , publisher=
work page 2003
-
[24]
Proceedings of the 57th Annual ACM Symposium on Theory of Computing , pages=
Sos certificates for sparse singular values and their applications: Robust statistics, subspace distortion, and more , author=. Proceedings of the 57th Annual ACM Symposium on Theory of Computing , pages=
-
[25]
The Thirty Seventh Annual Conference on Learning Theory , pages=
Lasso with latents: Efficient estimation, covariate rescaling, and computational-statistical gaps , author=. The Thirty Seventh Annual Conference on Learning Theory , pages=. 2024 , organization=
work page 2024
-
[26]
Advances in Neural Information Processing Systems , volume=
Sparse PCA from sparse linear regression , author=. Advances in Neural Information Processing Systems , volume=. 2018 , pages=
work page 2018
-
[27]
Conference on Learning Theory , pages=
Reducibility and statistical-computational gaps from secret leakage , author=. Conference on Learning Theory , pages=. 2020 , organization=
work page 2020
-
[28]
The Thirty Seventh Annual Conference on Learning Theory , pages=
Computational-statistical gaps for improper learning in sparse linear regression , author=. The Thirty Seventh Annual Conference on Learning Theory , pages=. 2024 , organization=
work page 2024
-
[29]
The evaluation of the collision matrix , author=. Physical review , volume=. 1950 , publisher=
work page 1950
-
[30]
On a formula for the product-moment coefficient of any order of a normal frequency distribution in any number of variables , author=. Biometrika , volume=. 1918 , publisher=
work page 1918
- [31]
-
[32]
Hermite Polynomials , author =
- [33]
-
[34]
Advances in Neural Information Processing Systems , volume=
Statistical query lower bounds for list-decodable linear regression , author=. Advances in Neural Information Processing Systems , volume=
- [35]
-
[36]
Proceedings of the 34th Conference on Learning Theory , series=
Statistical Query Algorithms and Low-Degree Tests Are Almost Equivalent , author=. Proceedings of the 34th Conference on Learning Theory , series=. 2021 , publisher=
work page 2021
-
[37]
Conference On Learning Theory , pages=
Reducibility and computational lower bounds for problems with planted sparse structure , author=. Conference On Learning Theory , pages=. 2018 , organization=
work page 2018
-
[38]
Statistical inference and the sum of squares method , author=. 2018 , publisher=
work page 2018
-
[39]
ISAAC Congress (International Society for Analysis, its Applications and Computation) , pages=
Notes on computational hardness of hypothesis testing: Predictions using the low-degree likelihood ratio , author=. ISAAC Congress (International Society for Analysis, its Applications and Computation) , pages=. 2019 , organization=
work page 2019
-
[40]
Computational complexity of statistics: New insights from low- degree polynomials
Computational Complexity of Statistics: New Insights from Low-Degree Polynomials , author=. arXiv preprint arXiv:2506.10748 , year=
-
[41]
2025 IEEE 66th Annual Symposium on Foundations of Computer Science (FOCS) , year=
The quasi-polynomial low-degree conjecture is false , author=. 2025 IEEE 66th Annual Symposium on Foundations of Computer Science (FOCS) , year=
work page 2025
-
[42]
arXiv preprint arXiv:2007.06072 , year=
A spectral algorithm for robust regression with subgaussian rates , author=. arXiv preprint arXiv:2007.06072 , year=
-
[43]
arXiv preprint arXiv:2209.02856 , year=
A spectral least-squares-type method for heavy-tailed corrupted regression with unknown covariance & heterogeneous noise , author=. arXiv preprint arXiv:2209.02856 , year=
-
[44]
Advances in Neural Information Processing Systems , volume=
Sq lower bounds for non-gaussian component analysis with weaker assumptions , author=. Advances in Neural Information Processing Systems , volume=
- [45]
-
[46]
PeerJ Computer Science , volume=
SymPy: symbolic computing in Python , author=. PeerJ Computer Science , volume=. 2017 , publisher=
work page 2017
-
[47]
Optimal spectral recovery of a planted vector in a subspace , author=. Bernoulli , volume=. 2025 , publisher=
work page 2025
-
[48]
Advances in neural information processing systems , volume=
Robust regression via hard thresholding , author=. Advances in neural information processing systems , volume=. 2015 , pages=
work page 2015
-
[49]
Journal of the ACM (JACM) , volume=
Efficient noise-tolerant learning from statistical queries , author=. Journal of the ACM (JACM) , volume=. 1998 , publisher=
work page 1998
-
[50]
2024 IEEE 65th Annual Symposium on Foundations of Computer Science (FOCS) , pages=
Sum-of-squares lower bounds for non-gaussian component analysis , author=. 2024 IEEE 65th Annual Symposium on Foundations of Computer Science (FOCS) , pages=. 2024 , organization=
work page 2024
-
[51]
arXiv preprint arXiv:2511.19398 , year=
PTF Testing Lower Bounds for Non-Gaussian Component Analysis , author=. arXiv preprint arXiv:2511.19398 , year=
-
[52]
Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing , pages=
Continuous lwe , author=. Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing , pages=
-
[53]
SIAM Journal on Computing , volume=
A nearly tight sum-of-squares lower bound for the planted clique problem , author=. SIAM Journal on Computing , volume=. 2019 , publisher=
work page 2019
-
[54]
2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS) , pages=
Efficient bayesian estimation from few samples: community detection and related problems , author=. 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS) , pages=. 2017 , organization=
work page 2017
-
[55]
2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS) , pages=
The power of sum-of-squares for detecting hidden structures , author=. 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS) , pages=. 2017 , organization=
work page 2017
-
[56]
Advances in Neural Information Processing Systems , volume=
Consistent robust regression , author=. Advances in Neural Information Processing Systems , volume=. 2017 , pages=
work page 2017
-
[57]
Conference on Learning Theory , pages=
Adaptive hard thresholding for near-optimal consistent robust regression , author=. Conference on Learning Theory , pages=. 2019 , organization=
work page 2019
-
[58]
Journal of the Royal Statistical Society Series B: Statistical Methodology , volume=
Robust estimation via robust gradient estimation , author=. Journal of the Royal Statistical Society Series B: Statistical Methodology , volume=. 2020 , publisher=
work page 2020
-
[59]
Advances in Neural Information Processing Systems , volume=
Quantum entropy scoring for fast robust mean estimation and improved outlier detection , author=. Advances in Neural Information Processing Systems , volume=. 2019 , pages=
work page 2019
-
[60]
Journal of the ACM (JACM) , volume=
Statistical algorithms and a lower bound for detecting planted cliques , author=. Journal of the ACM (JACM) , volume=. 2017 , publisher=
work page 2017
-
[61]
Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing , pages=
Robust linear regression: Optimal rates in polynomial time , author=. Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing , pages=
-
[62]
Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms , pages=
Efficient algorithms and lower bounds for robust linear regression , author=. Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms , pages=. 2019 , organization=
work page 2019
-
[63]
Advances in Neural Information Processing Systems , volume=
Robust regression revisited: Acceleration and improved estimation rates , author=. Advances in Neural Information Processing Systems , volume=
-
[64]
SIAM Journal on Computing , volume=
Robust estimators in high-dimensions without the computational intractability , author=. SIAM Journal on Computing , volume=. 2019 , publisher=
work page 2019
-
[65]
Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms , pages=
Robustly learning a gaussian: Getting optimal error, efficiently , author=. Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms , pages=. 2018 , organization=
work page 2018
-
[66]
Thirty-seventh Conference on Neural Information Processing Systems , year=
Near-Optimal Algorithms for Gaussians with Huber Contamination: Mean Estimation and Linear Regression , author=. Thirty-seventh Conference on Neural Information Processing Systems , year=
-
[67]
Robust regression and outlier detection , author=. 2005 , publisher=
work page 2005
-
[68]
IEEE Transactions on Information Theory , year=
Robust Mean Estimation in High Dimensions: An Outlier-Fraction Agnostic and Efficient Algorithm , author=. IEEE Transactions on Information Theory , year=
-
[69]
Robust learning: Information theory and algorithms , author=. 2018 , publisher=
work page 2018
-
[70]
Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing , pages=
List-decodable robust mean estimation and learning mixtures of spherical gaussians , author=. Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing , pages=
-
[71]
Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms , pages=
List decodable learning via sum of squares , author=. Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms , pages=. 2020 , organization=
work page 2020
-
[72]
Proceedings of the 2025 ACM-SIAM Symposium on Discrete Algorithms (SODA) , year =
Outlier-Robust Mean Estimation near the Breakdown Point via Sum-of-Squares , author=. Proceedings of the 2025 ACM-SIAM Symposium on Discrete Algorithms (SODA) , year =
work page 2025
-
[73]
Advances in Neural Information Processing Systems , volume=
Private estimation algorithms for stochastic block models and mixture models , author=. Advances in Neural Information Processing Systems , volume=
-
[74]
Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing , pages=
Quantum entanglement, sum of squares, and the log rank conjecture , author=. Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing , pages=
-
[75]
Advances in Neural Information Processing Systems , volume=
List-decodable sparse mean estimation via difference-of-pairs filtering , author=. Advances in Neural Information Processing Systems , volume=
-
[76]
Better Agnostic Clustering Via Relaxed Tensor Norms
Better agnostic clustering via relaxed tensor norms , author=. arXiv preprint arXiv:1711.07465 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[77]
Conference on learning theory , pages=
How hard is robust mean estimation? , author=. Conference on learning theory , pages=. 2019 , organization=
work page 2019
-
[78]
Bretagnolle, Jean and Huber, Catherine , journal=. Estimation des densit
-
[79]
Information and information stability of random variables and processes , author=. Holden-Day , year=
-
[80]
Resilience: A Criterion for Learning in the Presence of Arbitrary Outliers
Resilience: A criterion for learning in the presence of arbitrary outliers , author=. arXiv preprint arXiv:1703.04940 , year=
work page internal anchor Pith review Pith/arXiv arXiv
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.