Convex Basins in Single-Index Model Loss Landscapes: Applications to Robust Recovery under Strong Adversarial Corruption

Jatin Batra; Sagnik Chatterjee; Santanu Das

arxiv: 2605.29497 · v1 · pith:V5USDIL7new · submitted 2026-05-28 · 💻 cs.LG

Pith reviewed 2026-06-29 09:00 UTC · model grok-4.3

classification 💻 cs.LG

keywords robust recovery · single-index models · non-monotonic link functions · adversarial corruption · convex basin · spectral initialization · Gaussian squared-loss

The pith

For generic non-monotonic single-index models, the squared-loss landscape retains a dimension-independent convex basin around the ground truth that robust spectral initialization reaches despite constant-fraction adversarial corruption.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proves that Gaussian single-index models with asymmetric non-monotonic link functions such as GeLU and Swish possess a constant-radius convex basin around the true parameter in their squared-loss landscape. This basin stays accessible by robust spectral methods even when a constant fraction of covariates and responses are adversarially corrupted and noise is heavy-tailed. The structural property supplies a warm start for robust gradient descent that converges to error O(σ√ε) using near-linear samples and time, delivering the first such guarantees for this family of models.

Core claim

For a broad class of nonlinear non-monotonic SIMs, a dimension-independent, constant-radius convex basin exists around the ground truth in the Gaussian squared-loss landscape and remains efficiently reachable via robust spectral initialization even under adversarial contamination, yielding a final estimation error of O(σ√ε) after robust gradient descent in Õ(nd) time with Õ(d) samples.
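For orientation, the claim can be written out in symbols. This is a sketch assembled from the abstract and from the Lemma D.2 fragment quoted in the reference graph; the factor 1/2 in the loss, the radius symbol r, the constant C, and the reading of σ as the noise scale are notational assumptions, not statements taken from the paper.

```latex
\begin{aligned}
&x \sim \mathcal{N}(0, I_d), \qquad y = f(x^\top \beta^\star) + \zeta
  && \text{(clean model; an } \epsilon \text{-fraction of pairs } (x, y) \text{ is then corrupted)} \\
&L(\beta) = \tfrac{1}{2}\,\mathbb{E}\big[(f(x^\top \beta) - y)^2\big]
  && \text{(Gaussian squared loss)} \\
&\nabla^2 L(\beta) \succeq 0 \quad \text{for all } \|\beta - \beta^\star\|_2 \le r
  && \text{(convex basin, } r \text{ independent of } d\text{)} \\
&\|\hat\beta - \beta^\star\|_2 \le C\,\sigma\sqrt{\epsilon}
  && \text{(final error after robust gradient descent)}
\end{aligned}
```

Read this way, the claim has two separable parts: a landscape statement (the third line) and a reachability statement, namely that the robust spectral initializer returns a point with distance at most r from the ground truth.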

What carries the argument

The constant-radius convex basin around the ground truth in the squared-loss landscape, preserved for non-monotonic links under contamination and reachable by robust spectral initialization.

If this is right

Robust gradient descent started inside the basin converges to the stated error bound.
The method achieves the first robust recovery guarantees for previously unhandled non-monotonic links.
Sample and time complexity remain near-linear in the dimension.
Recovery error scales as O(σ√ε) where ε is the contamination fraction.
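To make the warm-start-then-descend pipeline concrete, here is a minimal NumPy sketch. It is not the paper's algorithm: the coordinate-wise trimmed mean is a stand-in robust gradient aggregator, the corruption pattern is a crude fixed outlier rather than a true adversary, and the starting point at distance 0.3 is simply assumed to lie inside the basin. All names (`robust_gd`, `trimmed_mean`, the constants) are hypothetical.

```python
import numpy as np

def swish(z):
    return z / (1.0 + np.exp(-z))

def swish_grad(z):
    s = 1.0 / (1.0 + np.exp(-z))
    return s + z * s * (1.0 - s)

def trimmed_mean(G, trim):
    """Coordinate-wise mean after dropping the `trim` fraction of the
    smallest and of the largest entries in every column."""
    n = G.shape[0]
    k = int(trim * n)
    return np.sort(G, axis=0)[k:n - k].mean(axis=0)

def robust_gd(X, y, beta0, trim=0.1, step=0.5, iters=200):
    """Gradient descent on 0.5 * (f(x^T beta) - y)^2 with a trimmed-mean
    gradient (an assumed aggregator, not the paper's estimator)."""
    beta = beta0.copy()
    for _ in range(iters):
        u = X @ beta
        G = ((swish(u) - y) * swish_grad(u))[:, None] * X  # per-sample gradients
        beta = beta - step * trimmed_mean(G, trim)
    return beta

rng = np.random.default_rng(0)
n, d, eps, sigma = 4000, 5, 0.05, 0.1
beta_star = np.zeros(d)
beta_star[0] = 1.0
X = rng.standard_normal((n, d))
noise = sigma * rng.standard_t(3, size=n) / np.sqrt(3.0)  # heavy-tailed, std = sigma
y = swish(X @ beta_star) + noise

# Crude corruption of an eps-fraction of covariates and responses (illustrative only).
m = int(eps * n)
X[:m] = 3.0
y[:m] = 20.0

# Assumed warm start: a point at distance 0.3 from the ground truth.
direction = rng.standard_normal(d)
direction /= np.linalg.norm(direction)
beta0 = beta_star + 0.3 * direction

beta_hat = robust_gd(X, y, beta0)
err_init = np.linalg.norm(beta0 - beta_star)
err_final = np.linalg.norm(beta_hat - beta_star)
print(f"initial error {err_init:.3f}, final error {err_final:.3f}")
```

A trimmed mean is used here only because it is short to write down; it is not expected to attain the O(σ√ε) rate, so the final error of this sketch should not be compared against the paper's bound.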

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The basin property may extend to initialization strategies in deeper networks whose scalar gates use the same activations.
Similar landscape analysis could apply to other robust estimation problems that currently lack non-monotonic guarantees.
Empirical measurement of basin radius for concrete links would test the dimension-independence claim directly.
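The last point can be probed numerically. The sketch below estimates the Hessian of the empirical squared loss for a Swish link on clean, noise-free Gaussian data and reports its smallest eigenvalue at several distances from the ground truth. The sample size, dimension, radii, and number of probe directions are arbitrary choices, and the function names are hypothetical; a negative value at some radius would only indicate non-convexity for that sample and direction, not a formal counterexample.

```python
import numpy as np

def swish(z):
    return z / (1.0 + np.exp(-z))

def swish_d1(z):
    s = 1.0 / (1.0 + np.exp(-z))
    return s + z * s * (1.0 - s)

def swish_d2(z):
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1.0 - s) * (2.0 + z * (1.0 - 2.0 * s))

def min_hessian_eig(beta, X, y):
    """Smallest eigenvalue of the Hessian of (1/2n) * sum_i (f(x_i^T beta) - y_i)^2."""
    u = X @ beta
    # Per-sample curvature weight: f'(u)^2 + (f(u) - y) * f''(u).
    w = swish_d1(u) ** 2 + (swish(u) - y) * swish_d2(u)
    H = (X * w[:, None]).T @ X / X.shape[0]
    return np.linalg.eigvalsh(H)[0]

rng = np.random.default_rng(0)
n, d = 20000, 10
beta_star = np.zeros(d)
beta_star[0] = 1.0
X = rng.standard_normal((n, d))
y = swish(X @ beta_star)  # noise-free responses, so the residual vanishes at beta_star

for r in (0.0, 0.25, 0.5, 1.0, 2.0):
    worst = np.inf
    for _ in range(20):
        v = rng.standard_normal(d)
        v /= np.linalg.norm(v)
        worst = min(worst, min_hessian_eig(beta_star + r * v, X, y))
    print(f"radius {r:.2f}: smallest Hessian eigenvalue over 20 directions = {worst:.4f}")
```

Repeating the loop for several values of `d` (with `n` scaled proportionally) and recording the largest radius at which the smallest eigenvalue stays positive would give a rough empirical read on whether that radius depends on the dimension.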

Load-bearing premise

The link functions belong to the broad class of generic asymmetric non-monotonic functions for which the Gaussian squared-loss landscape possesses the stated convex basin property under the given noise and corruption model.

What would settle it

A concrete numerical check or analytic counterexample for a specific non-monotonic link such as Swish showing that, under the stated adversarial contamination model, either no constant-radius convex basin exists around the true parameter or robust spectral initialization fails to land inside it.
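The second half of that check, whether an initializer lands near the ground truth, can be sketched as well. The code below uses a plain, non-robust spectral estimator on clean data as a baseline; the paper's robust spectral initialization is not reproduced here, and the matrix, the sign-fixing rule, and the assumption that the ground truth has unit norm are all illustrative choices.

```python
import numpy as np

def swish(z):
    return z / (1.0 + np.exp(-z))

def spectral_init(X, y):
    """Unit eigenvector of M = (1/n) * sum_i y_i (x_i x_i^T - I) with the
    largest absolute eigenvalue, sign-fixed using (1/n) * sum_i y_i x_i."""
    n, d = X.shape
    M = (X * y[:, None]).T @ X / n - y.mean() * np.eye(d)
    eigvals, eigvecs = np.linalg.eigh(M)
    v = eigvecs[:, np.argmax(np.abs(eigvals))]
    if v @ (X.T @ y) < 0:  # resolve the +/- ambiguity of the eigenvector
        v = -v
    return v

rng = np.random.default_rng(0)
n, d = 20000, 10
beta_star = np.zeros(d)
beta_star[0] = 1.0  # assumed unit-norm ground truth
X = rng.standard_normal((n, d))
y = swish(X @ beta_star) + 0.1 * rng.standard_normal(n)

beta_init = spectral_init(X, y)
init_error = np.linalg.norm(beta_init - beta_star)
print(f"distance of the plain spectral initializer from beta_star: {init_error:.3f}")
```

To turn this into the test described above, one would corrupt a constant fraction of `(X, y)`, replace `spectral_init` with a robust variant, and compare `init_error` against an independently measured basin radius.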

Original abstract

We study the problem of robustly learning Gaussian Single Index Models (SIMs) in the presence of heavy-tailed noise and a constant fraction of adversarially corrupted covariates and responses. Prior work on robust recovery has considered settings such as linear regression (Pensia et al., JASA 2024), strictly monotonic link functions (Awasthi et al., NeurIPS 2022), and phase retrieval (Buna and Rebeschini, AISTATS 2025). However, these techniques do not extend to generic asymmetric non-monotonic link functions such as GeLU and Swish, which arise naturally as scalar primitives in modern gated neural architectures. We close this gap by giving the first robust recovery algorithm with near-linear sample and time complexity for generic non-monotonic link functions, thereby establishing the first robust recovery guarantees for a broad family of nonlinear SIMs for which no guarantees were previously known. Our central contribution is a new structural understanding of the Gaussian squared-loss landscape under adversarial contamination. Crucially, we prove that for a broad class of nonlinear non-monotonic SIMs, a dimension-independent, constant-radius convex basin exists around the ground truth and is efficiently reachable via robust spectral initialization even under adversarial contamination. Prior works fail to establish both guarantees simultaneously, thereby either breaking down under adversarial contamination or failing to handle generic non-monotonic link functions. Together, these structural insights yield a principled warm start for robust gradient descent that provably converges to a final estimation error of $O(\sigma\sqrt{\epsilon})$ in $\tilde{O}(nd)$ time with $\tilde{O}(d)$ samples, where $\epsilon$ is the contamination fraction.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper claims the first robust recovery guarantees for generic non-monotonic single-index models like GeLU under adversarial corruption by proving a constant-radius convex basin reachable via robust spectral initialization.

The main takeaway is that this work closes a gap by handling asymmetric non-monotonic links in robust SIM recovery, where earlier results stopped at monotonic cases or linear regression. It shows a dimension-independent convex basin around the ground truth in the contaminated squared-loss landscape and pairs it with a robust spectral method that reaches the basin, after which gradient descent gives O(σ√ε) error in near-linear time.

What stands out is the structural claim on the landscape under heavy-tailed noise and constant-fraction corruption; the abstract positions this as new for the broad class that includes practical activations. The sample complexity of Õ(d) and time Õ(nd) are attractive if the basin property holds without extra restrictions.

The soft spots are in the breadth of the link class and the noise model. The argument needs to confirm that GeLU and Swish fit the stated conditions without hidden monotonicity or symmetry requirements, and the heavy-tailed noise handling should be checked against the exact moment assumptions. The initialization step also needs to be verified as truly robust rather than relying on the basin radius being derived in a way that assumes the result.

This is for researchers working on robust estimation for gated networks or non-linear SIMs. The central claim is coherent on its own terms and the citation pattern covers the relevant prior work on monotonic and linear cases. It deserves a serious referee because the gap it targets is real and the claimed complexity is practical.

Referee Report

0 major / 2 minor

Summary. The paper claims to give the first robust recovery guarantees for Gaussian single-index models with generic asymmetric non-monotonic link functions (e.g., GeLU, Swish) under heavy-tailed noise and constant-fraction adversarial corruption of covariates and responses. The central technical contribution is a proof that the contaminated squared-loss landscape contains a dimension-independent, constant-radius convex basin around the ground truth that is reachable by a robust spectral initializer; this warm-start then permits robust gradient descent to achieve error O(σ√ε) in Õ(nd) time using Õ(d) samples.

Significance. If the claimed structural result on the convex basin holds for the stated class of links, the work would meaningfully extend robust SIM recovery beyond monotonic or phase-retrieval settings to link functions that appear as building blocks in modern gated networks, while retaining near-linear complexity. The combination of a provably reachable basin under contamination and explicit sample/time bounds would be a notable advance.

minor comments (2)

[Abstract] The phrase 'broad class of nonlinear non-monotonic SIMs' is used without a precise characterization of the link-function assumptions (e.g., growth conditions, asymmetry requirements) that would allow a reader to verify whether GeLU and Swish are covered by the stated theorems.
[Abstract] The error bound O(σ√ε) is stated without indicating whether σ is the noise scale or a different quantity, and without referencing the theorem that establishes the bound.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their summary of the paper and for recognizing the potential significance of establishing a reachable convex basin in the contaminated squared-loss landscape for generic non-monotonic links. We are encouraged by the assessment that this would extend robust SIM recovery beyond monotonic and phase-retrieval settings while retaining near-linear complexity. The report raises no major comments, so there are no individual points for us to address.

Circularity Check

0 steps flagged

No significant circularity identified

The abstract and claims present a proof of a dimension-independent convex basin around the ground truth for non-monotonic SIMs under contamination, along with reachability via robust spectral initialization. No equations, fitted parameters, self-citations, or ansatzes are provided that would reduce any prediction or uniqueness claim to the inputs by construction. The derivation is described as a new structural result independent of the target guarantees, with no visible self-definitional or fitted-input patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities; all technical details required for the ledger are absent.

pith-pipeline@v0.9.1-grok · 5848 in / 1050 out tokens · 24276 ms · 2026-06-29T09:00:17.618203+00:00 · methodology

Reference graph

Works this paper leans on

4 extracted references · 3 canonical work pages

[1]

URLhttps://dl.acm.org/doi/10.5555/3666122.3666569

Curran Associates, Inc. URL https://dl.acm.org/doi/10.5555/3666122.3666569. Bubeck, S. Convex optimization: Algorithms and complexity. Found. Trends Mach. Learn., 8(3–4):231–357, November

work page doi:10.5555/3666122.3666569
[2]

doi: 10.1561/2200000050

ISSN 1935-8237. doi: 10.1561/2200000050. URL https://doi.org/10.1561/2200000050. Buna, A. and Rebeschini, P. Robust gradient descent for phase retrieval. In Li, Y., Mandt, S., Agrawal, S., and Khan, E. (eds.), Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, volume 258 of Proceedings of Machine Learning Research,...

work page doi:10.1561/2200000050 1935
[3]

Klivans, A., Kothari, P

URL https://www.microsoft.com/en-us/research/publication/isotron-algorithm-high-dimensional-isotonic-regression/. Klivans, A., Kothari, P. K., and Meka, R. Efficient algorithms for outlier-robust regression. In Proceedings of the 31st Conference On Learning Theory, volume 75 of Proceedings of Machine Learning Research, pp. 1420–1430. PMLR, 06–09 Jul 2018. U...

work page doi:10.2307/2290563 2018
[4]

""Calculations

and Lemma 2.2 from (Buna & Rebeschini, 2025). Look at any of the above-mentioned references for the proof of the Lemma D.1. We are omitting the proof here. Our second key lemma computes the trace and operator norm of the covariance matrix of the gradient. Lemma D.2. Let's assume that β, β⋆ ∈ R^d are fixed vectors and X ∼ N(0, I_d), y = f(X^⊤β⋆) + ζ. Let Σ = Var f(...

2025
