On the Convergence of Stochastic Gradient Descent for Nonlinear Ill-Posed Problems

Bangti Jin; Jun Zou; Zehui Zhou

arxiv: 1907.03132 · v1 · pith:74UGH5IZ · submitted 2019-07-06 · 🧮 math.OC · cs.NA · math.NA

Pith reviewed 2026-05-25 01:43 UTC · model grok-4.3

classification 🧮 math.OC · cs.NA · math.NA

keywords stochastic gradient descent · nonlinear ill-posed problems · regularization · convergence rates · tangential cone condition · Landweber iteration · inverse problems · Hilbert spaces

The pith

Stochastic gradient descent regularizes nonlinear ill-posed inverse problems when stopped by a priori rules under the tangential cone condition.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that a randomized version of the Landweber iteration, which draws one equation at random per step to estimate the gradient, acts as a regularizer for nonlinear ill-posed problems in Hilbert spaces. The analysis proves that the iterates remain stable and converge to a solution of the noise-free problem when an a priori stopping rule is applied. Under additional sourcewise and range invariance conditions, explicit convergence rates are derived. Because each iteration uses only a single equation, the method avoids computing the full gradient and therefore scales to large systems.

Core claim

Under the canonical tangential cone condition, the stochastic gradient descent iteration satisfies the regularizing property for a priori stopping rules; convergence rates then follow from suitable sourcewise and range invariance conditions.

What carries the argument

The stochastic gradient descent iteration on the nonlinear system, which forms an unbiased gradient estimate by randomly selecting one equation at each step.
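The iteration described here can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the toy system F_i(x) = s + 0.1 sin(s) with s = a_i·x, the helper names `make_equations` and `sgd`, the (F_i, adjoint) pair interface, and the step-size schedule are all hypothetical choices made for the example. Each step draws one index uniformly at random and moves along F_i'(x)^*(F_i(x) - y_i), without touching the other equations.

```python
import numpy as np


def make_equations(A):
    """Hypothetical toy system: F_i(x) = s + 0.1*sin(s) with s = a_i . x.

    Returns a list of (F_i, adjoint_i) pairs; adjoint_i(x, r) returns
    F_i'(x)^* r, the adjoint derivative of equation i applied to a scalar
    residual r.
    """
    eqs = []
    for a in A:
        def F_i(x, a=a):  # default argument binds the current row
            s = a @ x
            return s + 0.1 * np.sin(s)

        def adjoint_i(x, r, a=a):
            return (1.0 + 0.1 * np.cos(a @ x)) * r * a

        eqs.append((F_i, adjoint_i))
    return eqs


def sgd(eqs, y, x0, n_iter, eta0=0.5, alpha=0.0, seed=0):
    """Stochastic Landweber-type iteration (sketch):

    x_{k+1} = x_k - eta_k * F_i'(x_k)^* (F_i(x_k) - y_i),  i uniform,
    with eta_k = eta0 / (k + 1)**alpha (alpha = 0 gives a constant step).
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for k in range(n_iter):
        i = rng.integers(len(eqs))  # one uniformly sampled equation per step
        F_i, adjoint_i = eqs[i]
        eta_k = eta0 / (k + 1) ** alpha
        x = x - eta_k * adjoint_i(x, F_i(x) - y[i])
    return x


# Usage on a small, well-conditioned toy problem with exact data.
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 20))
A /= np.linalg.norm(A, axis=1, keepdims=True)  # unit-norm rows keep eta0 = 0.5 stable
x_true = rng.standard_normal(20)
eqs = make_equations(A)
y = np.array([F_i(x_true) for F_i, _ in eqs])  # exact (noise-free) data
x = sgd(eqs, y, np.zeros(20), n_iter=5000)
print(np.linalg.norm(x - x_true))
```

With exact data and a well-conditioned toy system the iterates approach the exact solution; the ill-posed, noisy-data setting treated in the paper is where the stopping rule becomes essential.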

Load-bearing premise

The nonlinear forward operator satisfies the tangential cone condition.
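To make the premise concrete, the sketch below estimates the tangential cone constant η empirically for a toy operator, using the form of the condition quoted from Assumption 2.1(ii) further down this page. The operator F(x) = g(Ax) with g(s) = s + 0.1 sin(s), the ball radius, and the function names are assumptions for illustration. Random sampling only yields a lower bound on the smallest valid constant; for this toy map a Taylor estimate bounds the ratio by about 0.056 on a ball of radius 0.5 when the rows of A have unit norm. (Deterministic nonlinear Landweber analyses typically ask for η < 1/2.)

```python
import numpy as np


def F(A, x):
    """Hypothetical toy forward map F(x) = g(A x), g(s) = s + 0.1*sin(s)."""
    s = A @ x
    return s + 0.1 * np.sin(s)


def F_prime(A, x):
    """Jacobian of the toy map: diag(1 + 0.1*cos(A x)) @ A."""
    return (1.0 + 0.1 * np.cos(A @ x))[:, None] * A


def estimate_tcc_constant(A, x_center, rho, n_samples=2000, seed=0):
    """Largest sampled ratio
        ||F(x) - F(xt) - F'(xt)(x - xt)|| / ||F(x) - F(xt)||
    over random pairs x, xt in the ball of radius rho around x_center.
    Sampling gives a lower bound on the smallest valid constant eta."""
    rng = np.random.default_rng(seed)
    d = x_center.size

    def sample_ball():
        v = rng.standard_normal(d)
        radius = rho * rng.uniform() ** (1.0 / d)  # uniform distribution in the ball
        return x_center + radius * v / np.linalg.norm(v)

    worst = 0.0
    for _ in range(n_samples):
        x, xt = sample_ball(), sample_ball()
        diff = F(A, x) - F(A, xt)
        remainder = diff - F_prime(A, xt) @ (x - xt)
        worst = max(worst, np.linalg.norm(remainder) / np.linalg.norm(diff))
    return worst


rng = np.random.default_rng(1)
A = rng.standard_normal((50, 10))
A /= np.linalg.norm(A, axis=1, keepdims=True)
x_dagger = rng.standard_normal(10)
eta_hat = estimate_tcc_constant(A, x_dagger, rho=0.5)
print(f"sampled eta on the ball of radius 0.5: {eta_hat:.4f}")
```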

What would settle it

A concrete nonlinear operator satisfying the tangential cone condition for which the stochastic iterates diverge or fail to approach the true solution under the stated a priori stopping rule.
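A numerical version of this test can be sketched as follows. Everything here is hypothetical: the ill-conditioned toy operator (singular values 1/j behind a mild sine nonlinearity), the smooth exact solution, and especially the stopping index k(δ) = n·⌈δ^(-2/(2ν+1))⌉. That index borrows the shape of the classical a priori rule for deterministic Landweber iteration, counting roughly n stochastic steps per sweep, and is not the rule stated in the paper. The point is only the protocol: the stopping index depends on the noise level δ alone, never on the iterates or the residual.

```python
import numpy as np


def g(s):
    return s + 0.1 * np.sin(s)  # mild monotone nonlinearity (toy assumption)


def dg(s):
    return 1.0 + 0.1 * np.cos(s)  # g'(s), between 0.9 and 1.1


def make_problem(n=100, seed=0):
    """Hypothetical ill-conditioned toy: F(x) = g(A x), A = U diag(1/j) V^T."""
    rng = np.random.default_rng(seed)
    U, _ = np.linalg.qr(rng.standard_normal((n, n)))
    V, _ = np.linalg.qr(rng.standard_normal((n, n)))
    sigma = 1.0 / np.arange(1, n + 1)
    A = (U * sigma) @ V.T  # every row has norm <= 1
    x_true = V @ sigma**2  # smooth exact solution (source-like)
    return A, x_true


def a_priori_index(delta, n, nu=1.0):
    """Illustrative a priori rule: depends on the noise level delta only."""
    return n * int(np.ceil(delta ** (-2.0 / (2.0 * nu + 1.0))))


def sgd(A, y, k_stop, eta=1.0, seed=0):
    """Plain SGD with one uniformly sampled equation per step, stopped at k_stop."""
    rng = np.random.default_rng(seed)
    x = np.zeros(A.shape[1])
    for _ in range(k_stop):
        i = rng.integers(A.shape[0])
        s = A[i] @ x
        x -= eta * dg(s) * (g(s) - y[i]) * A[i]
    return x


def error_at_stop(delta, n=100, seed=0):
    """Error ||x_{k(delta)} - x_true|| for noisy data with ||noise|| = delta."""
    A, x_true = make_problem(n, seed)
    rng = np.random.default_rng(seed + 1)
    noise = rng.standard_normal(n)
    y_delta = g(A @ x_true) + delta * noise / np.linalg.norm(noise)
    x = sgd(A, y_delta, a_priori_index(delta, n), seed=seed + 2)
    return np.linalg.norm(x - x_true)


for delta in (1e-1, 1e-2, 1e-3):
    print(f"delta={delta:.0e}  k={a_priori_index(delta, 100)}  error={error_at_stop(delta):.3e}")
```

On a toy problem like this one would expect the error at the stopping index to shrink as δ decreases; a counterexample of the kind described above would be an operator satisfying the tangential cone condition for which it does not.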

Original abstract

In this work, we analyze the regularizing property of the stochastic gradient descent for the efficient numerical solution of a class of nonlinear ill-posed inverse problems in Hilbert spaces. At each step of the iteration, the method randomly chooses one equation from the nonlinear system to obtain an unbiased stochastic estimate of the gradient, and then performs a descent step with the estimated gradient. It is a randomized version of the classical Landweber method for nonlinear inverse problems, and it is highly scalable to the problem size and holds significant potential for solving large-scale inverse problems. Under the canonical tangential cone condition, we prove the regularizing property for a priori stopping rules, and then establish the convergence rates under a suitable sourcewise condition and a range invariance condition.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper extends stochastic gradient analysis to nonlinear ill-posed inverse problems under the tangential cone condition.

This paper's main contribution is a convergence theory for stochastic gradient descent on nonlinear ill-posed problems, using a randomized Landweber iteration that samples one equation per step. Under the tangential cone condition it proves the regularizing property for a priori stopping rules, and under additional sourcewise and range invariance conditions it establishes convergence rates. The new element is the extension to the nonlinear stochastic case, which builds on the deterministic nonlinear Landweber theory and on stochastic results for simpler settings. The paper does well to emphasize scalability to large problems, and the proofs of the rates are direct, with no circular reasoning.

The assumptions are inherited from the deterministic literature, so the tangential cone condition is still needed to control the nonlinearity. This is not a new weakness, but it limits the result to operators for which that condition holds. The treatment of the stochastic estimate appears to rely on its expectation recovering the full descent direction; a stress test of the argument finds no unhandled variance terms or internal contradictions. One minor point: practical performance may depend on how the random sampling behaves over finitely many iterations, whereas the theory addresses the asymptotic regularizing behavior. The citation pattern is standard for the field, covering the relevant Landweber and stochastic gradient papers.

The work will be useful to those developing scalable solvers for inverse problems, and readers working on optimization methods in that subfield will get the most from it. It engages honestly with the literature and presents its technical claims clearly. It is solid enough to go to peer review for a full check of the derivations.

Referee Report

0 major / 4 minor

Summary. The manuscript analyzes a stochastic gradient descent method (randomized Landweber iteration) for nonlinear ill-posed inverse problems in Hilbert spaces. At each step a single equation is chosen uniformly to form an unbiased estimator of the gradient; the iteration is stopped by an a priori rule. Under the tangential cone condition the method is shown to be regularizing; under additional sourcewise and range-invariance conditions convergence rates are derived that match the deterministic case.

Significance. If the proofs hold, the result supplies the first rigorous regularization theory for a scalable, unbiased stochastic variant of the classical Landweber method. The fact that the same structural hypotheses (tangential cone, source condition, range invariance) suffice, with the stochastic gradient entering only through its unbiasedness, is a clean and useful extension. The work directly addresses the need for theoretically justified iterative solvers on large-scale nonlinear inverse problems.

minor comments (4)

The statement of the tangential cone condition (presumably §2) should explicitly record the constant η and the radius r of the ball in which it holds; these parameters appear in the stopping-rule analysis but are not carried through the rate statements.
Notation for the stochastic index selection (uniform over the m equations) is introduced only in the abstract and the introduction; a dedicated paragraph in §2 would improve readability.
The range-invariance condition is used to obtain the rate but is not compared with the weaker conditions that suffice for the deterministic Landweber method; a short remark would clarify the trade-off.
Several displayed equations in the rate proof contain the same generic constant C without distinguishing its dependence on the source parameter; re-labeling would avoid confusion.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary, significance assessment, and recommendation of minor revision. No specific major comments were provided in the report.

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper derives regularizing properties and convergence rates for stochastic Landweber iteration directly from the tangential cone condition (plus sourcewise and range-invariance assumptions) that are stated as external structural hypotheses on the nonlinear forward operator. The argument treats the stochastic gradient as an unbiased estimator whose expectation recovers the deterministic descent direction, with the same nonlinearity control; this is a standard reduction and does not presuppose the target rates or reduce any claimed result to a fitted parameter or self-citation chain. No load-bearing step is shown to be equivalent to its inputs by construction.
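The reduction mentioned here can be written out in two lines. Assume n equations F_i(x) = y_i, an index i_k drawn uniformly from {1, …, n} independently of the past, and the objective normalized as J(x) = (1/2n) Σ_i ||F_i(x) - y_i||² (a normalization chosen for this sketch; the paper's may differ). Then the conditional expectation of one stochastic step is one Landweber step for J:

```latex
\begin{aligned}
\mathbb{E}\bigl[F_{i_k}'(x_k)^*\bigl(F_{i_k}(x_k)-y_{i_k}\bigr)\,\big|\,x_k\bigr]
  &= \frac{1}{n}\sum_{i=1}^{n} F_i'(x_k)^*\bigl(F_i(x_k)-y_i\bigr) = \nabla J(x_k),\\
\mathbb{E}\bigl[x_{k+1}\,\big|\,x_k\bigr] &= x_k - \eta_k \nabla J(x_k).
\end{aligned}
```

Because F is nonlinear, this does not make the mean iterate E[x_k] itself a Landweber iterate, and unbiasedness says nothing about the fluctuation E||x_k - E[x_k]||², which has to be bounded separately.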

Axiom & Free-Parameter Ledger

0 free parameters · 3 axioms · 0 invented entities

The claims rest on three domain-standard assumptions in nonlinear inverse-problem theory; no free parameters or new entities are introduced in the abstract.

axioms (3)

domain assumption Tangential cone condition on the nonlinear operator
Invoked to prove the regularizing property for a priori stopping rules
domain assumption Sourcewise condition
Required to obtain convergence rates
domain assumption Range invariance condition
Required to obtain convergence rates
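For orientation, typical forms of these three assumptions in the deterministic nonlinear Landweber literature (e.g., [6], [13]) are collected below, with x† the exact solution, x_0 the initial guess, and B_ρ(x†) a ball around x† (the center and radius vary between references). The first line matches the inequality quoted from Assumption 2.1(ii) later on this page; the other two are standard textbook forms and are assumptions here, since the paper's exact versions (for instance, whether they are imposed equation by equation, and the admissible range of ν) are not reproduced on this page.

```latex
\begin{aligned}
&\text{Tangential cone:} && \|F(x)-F(\tilde x)-F'(\tilde x)(x-\tilde x)\| \le \eta\,\|F(x)-F(\tilde x)\|,
   \quad x,\tilde x \in B_\rho(x^\dagger),\\
&\text{Sourcewise:} && x^\dagger - x_0 = \bigl(F'(x^\dagger)^*F'(x^\dagger)\bigr)^{\nu} w
   \quad \text{for some } \nu > 0 \text{ and some } w,\\
&\text{Range invariance:} && F'(x) = R_x F'(x^\dagger),\quad \|R_x - I\| \le c_R\,\|x - x^\dagger\|,
   \quad x \in B_\rho(x^\dagger).
\end{aligned}
```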

pith-pipeline@v0.9.0 · 5653 in / 1301 out tokens · 21978 ms · 2026-05-25T01:43:37.803380+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

Under the canonical tangential cone condition, we prove the regularizing property for a priori stopping rules, and then establish the convergence rates under suitable sourcewise condition and range invariance condition.
IndisputableMonolith/Foundation/BranchSelection.lean branch_selection unclear

Assumption 2.1(ii) ... $\|F(x) - F(\tilde{x}) - F'(\tilde{x})(x - \tilde{x})\| \le \eta \|F(x) - F(\tilde{x})\|$

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

30 extracted references · 30 canonical work pages

[1] L. Bottou, F. E. Curtis, and J. Nocedal. Optimization methods for large-scale machine learning. SIAM Rev., 60(2):223–311, 2018.
[2] K. Chen, Q. Li, and J.-G. Liu. Online learning in optical tomography: a stochastic approach. Inverse Problems, 34(7):075010, 26 pp., 2018.
[3] C. Clason and V. H. Nhu. Bouligand–Landweber iteration for a non-smooth ill-posed problem. Numer. Math., in press, 2019.
[4] A. Dieuleveut and F. Bach. Nonparametric stochastic approximation with large step-sizes. Ann. Statist., 44(4):1363–1399, 2016.
[5] H. W. Engl, M. Hanke, and A. Neubauer. Regularization of Inverse Problems. Kluwer, Dordrecht, 1996.
[6] M. Hanke, A. Neubauer, and O. Scherzer. A convergence analysis of the Landweber iteration for nonlinear ill-posed problems. Numer. Math., 72(1):21–37, 1995.
[7] G. T. Herman, A. Lent, and P. H. Lutz. Relaxation method for image reconstruction. Comm. ACM, 21(2):152–158, 1978.
[8] G. T. Herman and L. B. Meyer. Algebraic reconstruction techniques can be made computationally efficient. IEEE Trans. Medical Imag., 12(3):600–609, 1993.
[9] K. Ito and B. Jin. A new approach to nonlinear constrained Tikhonov regularization. Inverse Problems, 27(10):105005, 23 pp., 2011.
[10] K. Ito and B. Jin. Inverse Problems: Tikhonov Theory and Algorithms. World Scientific, Hackensack, NJ, 2015.
[11] Y. Jiao, B. Jin, and X. Lu. Preasymptotic convergence of randomized Kaczmarz method. Inverse Problems, 33(12):125012, 21 pp., 2017.
[12] B. Jin and X. Lu. On the regularizing property of stochastic gradient descent. Inverse Problems, 35(1):015004, 27 pp., 2019.
[13] B. Kaltenbacher, A. Neubauer, and O. Scherzer. Iterative Regularization Methods for Nonlinear Ill-posed Problems. Walter de Gruyter, Berlin, 2008.
[14] D. P. Kingma and J. Ba. Adam: a method for stochastic optimization. In Proceedings of the 3rd International Conference on Learning Representations (ICLR), 2015.
[15] H. J. Kushner and G. G. Yin. Stochastic Approximation and Recursive Algorithms and Applications. Springer-Verlag, New York, second edition, 2003.
[16] L. Landweber. An iteration formula for Fredholm integral equations of the first kind. Amer. J. Math., 73:615–624, 1951.
[17] J. Lin and L. Rosasco. Optimal rates for multi-pass stochastic gradient methods. J. Mach. Learn. Res., 18:1–47, 2017.
[18] A. K. Louis. Inverse und Schlecht Gestellte Probleme. B. G. Teubner, Stuttgart, 1989.
[19] S. F. McCormick and G. H. Rodrigue. A uniform approach to gradient methods for linear operator equations. J. Math. Anal. Appl., 49:275–285, 1975.
[20] D. Needell, N. Srebro, and R. Ward. Stochastic gradient descent, weighted sampling, and the randomized Kaczmarz algorithm. Math. Program., Ser. A, 155(1-2):549–573, 2016.
[21] H. Robbins and S. Monro. A stochastic approximation method. Ann. Math. Stat., 22:400–407, 1951.
[22] O. Scherzer, M. Grasmair, H. Grossauer, M. Haltmeier, and F. Lenzen. Variational Methods in Imaging. Springer, New York, 2009.
[23] T. Schuster, B. Kaltenbacher, B. Hofmann, and K. S. Kazimierski. Regularization Methods in Banach Spaces. Walter de Gruyter, Berlin, 2012.
[24] T. Strohmer and R. Vershynin. A randomized Kaczmarz algorithm with exponential convergence. J. Fourier Anal. Appl., 15(2):262–278, 2009.
[25] I. Sutskever, J. Martens, G. Dahl, and G. E. Hinton. On the importance of initialization and momentum in deep learning. In S. Dasgupta and D. McAllester, editors, Proceedings of the 30th International Conference on Machine Learning (ICML-13), pages 1139–1147, Atlanta, GA, 2013.
[26] Y. S. Tan and R. Vershynin. Phase retrieval via randomized Kaczmarz: theoretical guarantees. Inf. Inference, 8(1):97–123, 2019.
[27] V. V. Vasin. Iterative methods for solving ill-posed problems with a priori information in Hilbert spaces. Zh. Vychisl. Mat. i Mat. Fiz., 28(7):971–980, 1117, 1988.
[28] G. M. Vaĭnikko and A. Y. Veretennikov. Iteration Procedures in Ill-posed Problems. “Nauka”, Moscow, 1986.
[29] Y. Ying and M. Pontil. Online gradient descent learning algorithms. Found. Comput. Math., 8(5):561–596, 2008.
[30] T. Zhang. Solving large scale linear prediction problems using stochastic gradient descent algorithms. In C. Brodley, editor, Proceedings of the Twenty-First International Conference on Machine Learning, pages 919–926, Banff, Alberta, Canada, 2004.
