Linear convergence of Gearhart-Koshy accelerated Kaczmarz methods for tensor linear systems

Deren Han; Jiaxin Xie; Yijie Wang; Yonghan Sun

arxiv: 2604.05816 · v1 · submitted 2026-04-07 · 🧮 math.NA · cs.NA

Linear convergence of Gearhart-Koshy accelerated Kaczmarz methods for tensor linear systems

Yijie Wang , Yonghan Sun , Deren Han , Jiaxin Xie This is my paper

Pith reviewed 2026-05-10 18:38 UTC · model grok-4.3

classification 🧮 math.NA cs.NA

keywords Kaczmarz methodGearhart-Koshy accelerationtensor linear systemslinear convergenceleast-norm solutioniterative methodsnumerical linear algebra

0 comments

The pith

The generalized Gearhart-Koshy accelerated Kaczmarz method for tensor linear systems converges linearly to the unique least-norm solution.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper proves that adding the generalized Gearhart-Koshy acceleration to the Kaczmarz method produces linear convergence when the unknowns are arranged as tensors. The acceleration performs an exact affine search along the line of successive projections onto hyperplanes, tightening the error reduction at each step compared with plain cycling. A reader would care because tensor equations arise in imaging, data compression, and multi-way analysis, and a guaranteed linear rate plus an efficient implementation makes these solvers practical for large instances. The same acceleration works for incremental, shuffle-once, and random-reshuffling variants, and the analysis supplies a strictly smaller convergence factor than the unaccelerated method. A linear-time Gram-Schmidt realization is given together with an explicit link to Arnoldi-type Krylov techniques.

Core claim

The generalized Gearhart-Koshy acceleration applied to the Kaczmarz method for tensor linear systems yields linear convergence to the unique least-norm solution. This holds for cyclic, incremental, shuffle-once, and random-reshuffling schemes, with an improved convergence upper bound compared to the plain Kaczmarz method.

What carries the argument

The generalized Gearhart-Koshy acceleration, an exact affine search technique for successive orthogonal projections onto hyperplanes in the tensor setting.

If this is right

The accelerated scheme applies directly to incremental, shuffle-once, and random-reshuffling Kaczmarz variants on tensors.
It supplies a strictly better convergence upper bound than the unaccelerated Kaczmarz method for the same ordering.
A Gram-Schmidt implementation computes each new iterate in linear time relative to the problem size.
The acceleration framework is equivalent to a short Arnoldi-type Krylov step, opening the door to hybrid solvers.
Numerical tests on random and structured tensor systems confirm the predicted linear rates.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Because the acceleration is an exact line search, the same device could be inserted into other projection or splitting methods that operate on high-order arrays.
The linear-time Gram-Schmidt step suggests these methods remain feasible even when the tensor dimensions are thousands in each mode.
The Krylov connection implies that further acceleration may be obtained by enlarging the search subspace beyond one dimension while preserving the projection property.
Relaxing the unique-solution assumption would allow the same machinery to be analyzed for inconsistent systems or minimum-norm least-squares problems.

Load-bearing premise

The tensor linear system admits a unique least-norm solution, which requires consistency together with range conditions on the tensor operator that make the feasible set a singleton orthogonal to the null space.

What would settle it

A concrete tensor linear system together with an explicit starting point for which the accelerated iterates fail to exhibit linear error reduction or do not approach the least-norm solution.

Figures

Figures reproduced from arXiv: 2604.05816 by Deren Han, Jiaxin Xie, Yijie Wang, Yonghan Sun.

**Figure 1.** Figure 1: A geometric interpretation of the generalized Gearhart-Koshy acceleration. The next iterate X k+1 is determined by finding the point in the affine subspace Πk = aff X jk,τ , . . . , X k ,Pπk (X k ) [PITH_FULL_IMAGE:figures/full_fig_p009_1.png] view at source ↗

**Figure 2.** Figure 2: shows that both implementations require identical iteration counts for all tested κ and r, which is consistent with their mathematical equivalence. However, GS-TKGK-SO consistently achieves lower CPU time. This performance difference is consistent with the theoretical complexity analysis in Section 4.1. Tri-TKGK-SO involves repeated extraction and concatenation operations on tensor arrays, incurring additi… view at source ↗

**Figure 3.** Figure 3: The figures illustrate the evolution of RSE with respect to the number of iterations (top) and the CPU time (bottom). The title of each subplot indicates the corresponding values of κ and r. We set m = 100, ℓ = 75, n = 3, and p = 75. All experiments terminate once the RSE < 10−12 . 5.2. Real world data. In this section, we employ the built-in MATLAB video dataset traffic.avi as the ground truth to simulate… view at source ↗

**Figure 4.** Figure 4: Comparison of RSE versus number of full iterations (left) and CPU time (right) for TK, GS-TKGK, and tAKSHBM on blurred traffic.avi video reconstruction. The algorithm terminates after at most 2000 full iterations [PITH_FULL_IMAGE:figures/full_fig_p023_4.png] view at source ↗

**Figure 5.** Figure 5: Blurred traffic.avi video reconstruction results for tAKSHBM, GS-TKGK-SO and TK-SO on frames indexed by 20, 51, 72, and 100. The title of each subplot (except the first column of original frames) reports the PSNR and SSIM values of the recovered or blurred frame with respect to the corresponding original frame. The algorithms terminate if RSE < 5 × 10−3 [PITH_FULL_IMAGE:figures/full_fig_p024_5.png] view at source ↗

read the original abstract

The generalized Gearhart-Koshy acceleration is a recent exact affine search technique designed for the method of cyclic projections onto hyperplanes, i.e., the Kaczmarz method. However, its convergence properties, particularly the linear convergence rate, have not been thoroughly established. In this paper, we systematically establish the linear convergence of the generalized Gearhart-Koshy accelerated Kaczmarz method for tensor linear systems, proving that it converges linearly to the unique least-norm solution. Our analysis is general and applies to several popular Kaczmarz variants, including incremental, shuffle-once, and random-reshuffling schemes, and demonstrates that this acceleration approach yields a better convergence upper bound compared to the plain Kaczmarz method. We also propose an efficient Gram-Schmidt-based implementation that computes the next iterate in linear time. Building on this implementation, we establish a connection between this acceleration framework and Arnoldi-type Krylov subspace methods, further highlighting its efficiency and potential. Our theoretical results are supported by numerical experiments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives a solid linear convergence proof for Gearhart-Koshy accelerated Kaczmarz on consistent tensor systems, with a tighter rate, variant coverage, and a linear-time Gram-Schmidt implementation.

read the letter

The core result here is a general proof that the generalized Gearhart-Koshy acceleration applied to Kaczmarz-type methods on tensor linear systems converges linearly to the unique least-norm solution, and that the contraction factor improves on the plain cyclic version. The same argument covers incremental, shuffle-once, and random-reshuffling variants without extra assumptions beyond consistency and the stated range condition on the tensor operator. They also supply a Gram-Schmidt implementation that keeps the per-iteration cost linear and note a connection to Arnoldi-type Krylov methods as a byproduct. Numerical checks on small consistent instances line up with the predicted rates. The analysis rests on standard projection properties and error recursions, so it avoids circularity or hidden fitting. Extending the scalar-case argument to the multilinear setting is the non-routine part, and the paper handles it cleanly. The experiments are limited to small problems, which leaves open questions about scaling behavior on larger tensors. The unique least-norm solution requirement is stated up front, but it does restrict the result to consistent systems; inconsistent cases are left for future work or other techniques. The Krylov link is presented as a corollary rather than a central claim, so it adds flavor without carrying the main load. This is useful reading for anyone working on row-action or projection methods for tensor equations. A reader who already knows the scalar Kaczmarz literature will see exactly where the tensor extension sits and how the acceleration pays off. I would send it for peer review; the proof appears self-contained and the implementation detail is practical enough to justify referee time.

Referee Report

0 major / 3 minor

Summary. The manuscript establishes linear convergence of the generalized Gearhart-Koshy accelerated Kaczmarz method for consistent tensor linear systems possessing a unique least-norm solution. It extends the standard cyclic-projection argument to the exact line-search acceleration step, derives a strictly smaller contraction factor than the unaccelerated Kaczmarz iteration, and shows that the same analysis covers incremental, shuffle-once, and random-reshuffling variants. An efficient Gram-Schmidt implementation is proposed that runs in linear time per iteration, and a connection to Arnoldi-type Krylov methods is derived as a corollary. Numerical experiments on small consistent instances confirm the predicted rates.

Significance. If the central derivation holds, the work supplies a parameter-free acceleration with a provably improved linear rate for projection methods on tensor equations, together with explicit coverage of several popular Kaczmarz variants and an efficient implementation. These elements constitute a clear strengthening of existing convergence theory for Kaczmarz-type solvers in the tensor setting and may be useful for large-scale applications where tensor linear systems appear.

minor comments (3)

[§3.2] §3.2, after Eq. (3.7): the transition from the two-step error recursion to the single-step contraction factor is stated without an intermediate inequality; inserting the missing step would make the comparison with the plain Kaczmarz rate fully explicit.
[Table 1] Table 1: the reported iteration counts for the accelerated method are given only for one random seed; adding a second column with standard deviation over 10 independent runs would strengthen the empirical support for the theoretical rate improvement.
[§4.1] §4.1: the statement that the Gram-Schmidt implementation 'computes the next iterate in linear time' is correct only when the number of rows is fixed; a brief remark on the dependence on the tensor order would avoid potential misreading.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the careful reading and positive assessment of our manuscript, including the accurate summary of our contributions and the recommendation for minor revision. No specific major comments were raised.

read point-by-point responses

Referee: No major comments were provided; the report consists of a summary, significance statement, and minor-revision recommendation.

Authors: We appreciate the referee's recognition that the central derivation supplies a parameter-free acceleration with a provably improved linear rate, explicit coverage of the listed Kaczmarz variants, and an efficient Gram-Schmidt implementation. We will incorporate minor revisions to improve exposition, add a short remark on the connection to Arnoldi methods in the introduction, and correct any typographical issues. revision: yes

Circularity Check

0 steps flagged

No significant circularity in the convergence proof

full rationale

The manuscript establishes linear convergence of the generalized Gearhart-Koshy accelerated Kaczmarz iteration for consistent tensor linear systems possessing a unique least-norm solution. The derivation proceeds from explicit range conditions on the tensor operator, applies the standard cyclic-projection error recursion, and obtains a strictly improved contraction factor for the exact line-search step; the same recursion covers the listed variants without introducing fitted parameters or self-referential definitions. Assumptions are stated up front, the Gram-Schmidt implementation and Arnoldi link are corollaries rather than load-bearing steps, and no equation reduces the claimed rate to a quantity defined by the result itself.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The proof relies on standard properties of Hilbert-space projections and the assumption that the tensor system possesses a unique least-norm solution; no free parameters or invented entities are introduced.

axioms (2)

domain assumption The tensor linear system has a unique least-norm solution.
Explicitly stated as the convergence target in the abstract.
standard math Orthogonal projections onto hyperplanes satisfy the usual geometric properties in the underlying Hilbert space.
Implicit foundation of every Kaczmarz-type analysis.

pith-pipeline@v0.9.0 · 5487 in / 1297 out tokens · 45389 ms · 2026-05-10T18:38:39.159177+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We systematically establish the linear convergence of the generalized Gearhart-Koshy accelerated Kaczmarz method for tensor linear systems, proving that it converges linearly to the unique least-norm solution.
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

the iteration sequence {X_k} generated by Algorithm 1 satisfies ||X_{k+1}-X_0^*||_F² ≤ ρ_πk ||X_k-X_0^*||_F² with ρ_πk<1

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

71 extracted references · 71 canonical work pages

[1]

K. Ahn, C. Yun, and S. Sra. SGD with shuﬄing: optimal rates without component convexity and large epoch requirements. Advances in Neural Information Processing Systems , 33:17526–17535, 2020

work page 2020
[2]

Bai and W.-T

Z.-Z. Bai and W.-T. Wu. On greedy randomized Kaczmarz met hod for solving large sparse linear systems. SIAM Journal on Scientiﬁc Computing , 40(1):A592–A606, 2018

work page 2018
[3]

W. Bao, F. Zhang, W. Li, Q. Wang, and Y. Gao. Randomized ave rage Kaczmarz algorithm for tensor linear systems. Mathematics, 10(23):4594, 2022

work page 2022
[4]

Castillo, J

A. Castillo, J. Haddock, I. Hartsock, P. Hoyos, L. Kassab , A. Kryshchenko, K. Larripa, D. Needell, S. Suryanarayanan, and K. Y. Djima. Block Gauss-Seidel meth ods for t-product tensor regression. arXiv preprint arXiv:2503.19155 , 2025

work page arXiv 2025
[5]

Castillo, J

A. Castillo, J. Haddock, I. Hartsock, P. Hoyos, L. Kassab , A. Kryshchenko, K. Larripa, D. Needell, S. Suryanarayanan, and K. Yacoubou-Djima. Randomized Kacz marz methods for t-product tensor linear systems with factorized operators. arXiv preprint arXiv:2412.10583 , 2024

work page arXiv 2024
[6]

Chen and J

X. Chen and J. Qin. Regularized Kaczmarz algorithms for t ensor recovery. SIAM Journal on Imaging Sciences, 14(4):1439–1471, 2021

work page 2021
[7]

K. Du. Tight upper bounds for the convergence of the rando mized extended Kaczmarz and Gauss–Seidel algorithms. Numerical Linear Algebra with Applications , 26(3):e2233, 2019

work page 2019
[8]

W. B. Gearhart and M. Koshy. Acceleration schemes for the method of alternating projections. Journal of Computational and Applied Mathematics , 26(3):235–249, 1989

work page 1989
[9]

G. H. Golub and C. F. Van Loan. Matrix computations . JHU press, 2013

work page 2013
[10]

Gordon, R

R. Gordon, R. Bender, and G. T. Herman. Algebraic recons truction techniques (ART) for three- dimensional electron microscopy and X-ray photography. J. Theor. Biol. , 29:471–481, 1970

work page 1970
[11]

R. M. Gower, D. Molitor, J. Moorman, and D. Needell. On ad aptive sketch-and-project for solving linear systems. SIAM Journal on Matrix Analysis and Applications , 42(2):954–989, 2021

work page 2021
[12]

R. M. Gower and P. Richt´ arik. Randomized iterative met hods for linear systems. SIAM Journal on Matrix Analysis and Applications , 36(4):1660–1690, 2015. 26 YIJIE W ANG, YONGHAN SUN, DEREN HAN, AND JIAXIN XIE

work page 2015
[13]

M. E. Guide, A. E. Ichi, K. Jbilou, and R. Sadaka. Tensor K rylov subspace methods via the T-product for color image processing. arXiv preprint arXiv:2006.07133 , 2020

work page arXiv 2006
[14]

D. Han, Y. Su, and J. Xie. Randomized Douglas-Rachford m ethods for linear systems: Improved accuracy and eﬃciency. SIAM J. Optim. , 34(1):1045–1070, 2024

work page 2024
[15]

Han and J

D. Han and J. Xie. On pseudoinverse-free randomized met hods for linear systems: Uniﬁed framework and acceleration. Optimization Methods and Software , pages 1–36, 2025

work page 2025
[16]

Han and J.-X

D.-R. Han and J.-X. Xie. A simple linear convergence ana lysis of the randomized reshuﬄing Kaczmarz method. Journal of the Operations Research Society of China , pages 1–13, 2025

work page 2025
[17]

Hegland and J

M. Hegland and J. Rieger. Generalized Gearhart-Koshy a cceleration is a Krylov space method of a new type. arXiv preprint arXiv:2311.18305 , 2023

work page arXiv 2023
[18]

G. T. Herman and L. B. Meyer. Algebraic reconstruction t echniques can be made computationally eﬃcient (positron emission tomography application). IEEE Trans. Medical Imaging , 12:600–609, 1993

work page 1993
[19]

Jeong and D

H. Jeong and D. Needell. Linear convergence of reshuﬄin g Kaczmarz methods with sparse constraints. SIAM Journal on Scientiﬁc Computing , 47(4):A2323–A2352, 2025

work page 2025
[20]

Karczmarz

S. Karczmarz. Angen¨ aherte auﬂ¨ osung von systemen lin earer glei-chungen. Bull. Int. Acad. Pol. Sic. Let., Cl. Sci. Math. Nat. , pages 355–357, 1937

work page 1937
[21]

Karczmarz

S. Karczmarz. Angenaherte auﬂosung von systemen linea rer glei-chungen. Bull. Int. Acad. Pol. Sic. Let., Cl. Sci. Math. Nat. , pages 355–357, 1937

work page 1937
[22]

M. E. Kilmer, K. Braman, N. Hao, and R. C. Hoover. Third-o rder tensors as operators on matrices: A theoretical and computational framework with applicatio ns in imaging. SIAM Journal on Matrix Analysis and Applications , 34(1):148–172, 2013

work page 2013
[23]

M. E. Kilmer and C. D. Martin. Factorization strategies for third-order tensors. Linear Algebra and its Applications, 435(3):641–658, 2011

work page 2011
[24]

Y. Liao, W. Li, and D. Yang. The accelerated tensor Kaczm arz algorithm with adaptive parameters for solving tensor systems. Applied Numerical Mathematics , 202:100–119, 2024

work page 2024
[25]

Liu and S

J. Liu and S. Wright. An accelerated randomized Kaczmar z algorithm. Mathematics of Computation , 85(297):153–178, 2016

work page 2016
[26]

Loizou and P

N. Loizou and P. Richt´ arik. Momentum and stochastic mo mentum for stochastic gradient, Newton, proximal point and subspace descent methods. Comput. Optim. Appl. , 77:653–710, 2020

work page 2020
[27]

K. Lund. The tensor t-function: A deﬁnition for functions of third-order tensor s. Numerical Linear Algebra with Applications , 27(3):e2288, 2020

work page 2020
[28]

Luo and A

H. Luo and A. Ma. Frontal slice approaches for tensor lin ear systems. arXiv preprint arXiv:2408.13547 , 2024

work page arXiv 2024
[29]

Ma and D

A. Ma and D. Molitor. Randomized Kaczmarz for tensor lin ear systems. BIT Numerical Mathematics , 62(1):171–194, 2022

work page 2022
[30]

Ma and D

A. Ma and D. Needell. Stochastic gradient descent for li near systems with missing data. Numer. Math. Theory Methods Appl. , 12(1):1–20, 2019

work page 2019
[31]

A. Ma, D. Needell, and A. Ramdas. Convergence propertie s of the randomized extended Gauss-Seidel and Kaczmarz methods. SIAM J. Matrix Anal. A. , 36(4):1590–1604, 2015

work page 2015
[32]

Y. Miao, L. Qi, and Y. Wei. Generalized tensor function v ia the tensor singular value decomposition based on the T-product. Linear Algebra and its Applications , 590:258–303, 2020

work page 2020
[33]

Mishchenko, A

K. Mishchenko, A. Khaled, and P. Richt´ arik. Random res huﬄing: Simple analysis with vast improve- ments. Advances in Neural Information Processing Systems , 33:17309–17320, 2020

work page 2020
[34]

I. Necoara. Faster randomized block Kaczmarz algorith ms. SIAM Journal on Matrix Analysis and Applications, 40(4):1425–1452, 2019

work page 2019
[35]

Needell, N

D. Needell, N. Srebro, and R. Ward. Stochastic gradient descent, weighted sampling, and the randomized Kaczmarz algorithm. Math. Program., 155:549–573, 2016

work page 2016
[36]

Needell and J

D. Needell and J. A. Tropp. Paved with good intentions: a nalysis of a randomized block Kaczmarz method. Linear Algebra and its Applications , 441:199–221, 2014

work page 2014
[37]

Newman and M

E. Newman and M. E. Kilmer. Nonnegative tensor patch dic tionary approaches for image compression and deblurring applications. SIAM Journal on Imaging Sciences , 13(3):1084–1112, 2020. LINEAR CONVERGENCE OF TKGK 27

work page 2020
[38]

L. M. Nguyen, Q. Tran-Dinh, D. T. Phan, P. H. Nguyen, and M . Van Dijk. A uniﬁed convergence analysis for shuﬄing-type gradient methods. J. Mach. Learn. Res. , 22(207):1–44, 2021

work page 2021
[39]

Reichel and U

L. Reichel and U. O. Ugwu. The tensor Golub-Kahan-Tikho nov method applied to the solution of ill- posed problems with at-product structure. Numerical Linear Algebra with Applications , 29(1):e2412, 2022

work page 2022
[40]

J. Rieger. Generalized Gearhart-Koshy acceleration f or the Kaczmarz method. Mathematics of Compu- tation, 92(341):1251–1272, 2023

work page 2023
[41]

Robbins and S

H. Robbins and S. Monro. A stochastic approximation met hod. Ann. Math. Statistics , pages 400–407, 1951

work page 1951
[42]

Safran and O

I. Safran and O. Shamir. How good is SGD with random shuﬄi ng? In Conference on Learning Theory , pages 3250–3284. PMLR, 2020

work page 2020
[43]

Sch¨ opfer and D

F. Sch¨ opfer and D. Lorenz. Linear convergence of the ra ndomized sparse Kaczmarz method. Math. Program., 173:509–536, 2019

work page 2019
[44]

Soltani, M

S. Soltani, M. E. Kilmer, and P. C. Hansen. A tensor-base d dictionary learning approach to tomographic image reconstruction. BIT Numerical Mathematics , 56:1425–1454, 2016

work page 2016
[45]

Strohmer and R

T. Strohmer and R. Vershynin. A randomized Kaczmarz alg orithm with exponential convergence. Jour- nal of Fourier Analysis and Applications , 15(2):262–278, 2009

work page 2009
[46]

Y. Su, D. Han, Y. Zeng, and J. Xie. On greedy multi-step in ertial randomized Kaczmarz method for solving linear systems. Calcolo, 61(4):68, 2024

work page 2024
[47]

Y. Su, D. Han, Y. Zeng, and J. Xie. On the convergence anal ysis of the greedy randomized Kaczmarz method. Numerical Algorithms, pages 1–22, 2026

work page 2026
[48]

Y. Sun, D. Han, and J. Xie. Connecting randomized iterat ive methods with Krylov subspaces. arXiv preprint arXiv:2505.20602 , 2025

work page arXiv 2025
[49]

M. K. Tam. Gearhart-Koshy acceleration for aﬃne subspa ces. Operations Research Letters, 49(2):157– 163, 2021

work page 2021
[50]

L. Tang, Y. Yu, Y. Zhang, and H. Li. Sketch-and-project m ethods for tensor linear systems. Numerical Linear Algebra with Applications , 30(2):e2470, 2023

work page 2023
[51]

Xie, H.-D

J. Xie, H.-D. Qi, and D. Han. Randomized iterative metho ds for generalized absolute value equations: Solvability and error bounds. SIAM Journal on Optimization , 35(3):1731–1760, 2025

work page 2025
[52]

B. Ying, K. Yuan, S. Vlaski, and A. H. Sayed. Stochastic l earning under random reshuﬄing with constant step-sizes. IEEE Trans. Signal Process. , 67(2):474–489, 2018

work page 2018
[53]

Y. Zeng, D. Han, Y. Su, and J. Xie. Randomized Kaczmarz me thod with adaptive stepsizes for incon- sistent linear systems. Numer. Algorithms , 94:1403–1420, 2023

work page 2023
[54]

Y. Zeng, D. Han, Y. Su, and J. Xie. On adaptive stochastic heavy ball momentum for solving linear systems. SIAM Journal on Matrix Analysis and Applications , 45(3):1259–1286, 2024

work page 2024
[55]

Y. Zeng, D. Han, Y. Su, and J. Xie. On adaptive stochastic extended iterative methods for solving least squares. Mathematics of Computation , 2025

work page 2025
[56]

Y. Zeng, D. Han, Y. Su, and J. Xie. Stochastic dual coordi nate descent with adaptive heavy ball momentum for linearly constrained convex optimization. Numerische Mathematik, 158(2):749–794, 2026

work page 2026
[57]

P. Zhou, C. Lu, Z. Lin, and C. Zhang. Tensor factorizatio n for low-rank tensor completion. IEEE Transactions on Image Processing , 27(3):1152–1163, 2017

work page 2017
[58]

Zouzias and N

A. Zouzias and N. M. Freris. Randomized extended Kaczma rz for solving least squares. SIAM Journal on Matrix Analysis and Applications , 34(2):773–793, 2013

work page 2013
[59]

Proof of the main results 7.1

Appendix. Proof of the main results 7.1. Proof of Theorem 2.1. 28 YIJIE W ANG, YONGHAN SUN, DEREN HAN, AND JIAXIN XIE Proof of Theorem 2.1. To establish the results, we ﬁrst note that all iterates of Al gorithm 1 belong to the aﬃne subspace X 0 + rangeK (A†). Indeed, for any 1 ≤ i ≤ m, we have rangeK (A† i::) = range K(A⊤ i::) ⊆ rangeK(A⊤ ) = range K(A†),...

work page
[60]

A ∗ X k = B ⇒ rπ k (X k) = 0

and the inclusion follows from the fact that A⊤ i:: = A⊤ ∗ Ii::. Consequently, Algorithm 1 ensures that X k ∈ X 0 + rangeK(A†). Besides, X 0 ∗ = A† ∗ B + ( I − A † ∗ A ) ∗ X 0 = X 0 + A† ∗ (B − A ∗ X 0) ∈ X 0 + range K(A†), therefore X k − X 0 ∗ ∈ rangeK(A†). Since A† ∗ A is the projector onto range K(A†), it follows (25) A† ∗ A ∗ (X k − X 0 ∗ ) = X k − X...

work page
[61]

and ( 10), respectively. Given that ⟨X k − X 0 ∗ , X k − P π k (X k)⟩ = ∥X k − P π k(X k)∥2 F + ⟨Pπ k (X k) − X 0 ∗ , X k − P π k (X k)⟩, it follows that (28) ⟨X k − X 0 ∗ , X k − P π k (X k)⟩2 = ∥X k − P π k (X k)∥4 F + ⟨Pπ k (X k) − X 0 ∗ , X k − P π k (X k)⟩2 + 2∥X k − P π k (X k)∥2 F ⟨Pπ k (X k) − X 0 ∗ , X k − P π k (X k)⟩. From the deﬁnition of γk in (

work page
[62]

and its equivalent form in ( 11), we have ⟨X k − X 0 ∗ , X k − P π k (X k)⟩ = γk = 1 2 ∥rπ k (X k)∥2 F + 1 2 ∥X k − P π k (X k)∥2 F . Hence, we can rewrite ⟨Pπ k (X k) − X 0 ∗ , X k − P π k (X k)⟩ as (29) ⟨Pπ k (X k) − X 0 ∗ , X k − P π k (X k)⟩ = −∥X k − P π k (X k)∥2 F + ⟨X k − X 0 ∗ , X k − P π k (X k)⟩ = 1 2 ∥rπ k (X k)∥2 F − 1 2 ∥X k − P π k (X k)∥2 ...

work page
[63]

and ( 9), respectively. Using the equivalent form of sk k− jk,τ +1 in ( 16), we can further express it as sk k− jk,τ +1 = γk ∥(I − VkV † k )⃗dk∥2 2 = γk ∥⃗dk∥2 2 ( 1 − ∥VkV † k ⃗dk∥2 2 ∥⃗dk∥2 2 )− 1 , where the last equality follows from ∥(I − VkV † k )⃗dk∥2 2 = ∥⃗dk∥2 2 − ∥ VkV † k ∥2 2∥⃗dk∥2

work page
[64]

Recall the deﬁnitions of γk in (

work page
[65]

As βk = ( 1 − ∥VkV † k ⃗dk∥2 2 ∥ ⃗dk∥2 2 )− 1 in ( 14), we conclude that sk k− jk,τ +1 = β kζ2 k∥X k−X 0 ∗ ∥2 F γk and ∥X k+1 − X 0 ∗ ∥2 F ≤ (1 − βkζ2 k )∥X k − X 0 ∗ ∥2 F

and ζk in ( 14), it holds that γ 2 k ∥ ⃗dk∥2 2 = ζ2 k ∥X k − X 0 ∗ ∥2 F . As βk = ( 1 − ∥VkV † k ⃗dk∥2 2 ∥ ⃗dk∥2 2 )− 1 in ( 14), we conclude that sk k− jk,τ +1 = β kζ2 k∥X k−X 0 ∗ ∥2 F γk and ∥X k+1 − X 0 ∗ ∥2 F ≤ (1 − βkζ2 k )∥X k − X 0 ∗ ∥2 F . Next, we prove that 1 − βkζ2 k ≤ ∥ bcirc ( Tπ k ∗ A† ∗ A ) ∥2

work page
[66]

Consequently, because VkV † k is the orthogonal projector onto Range(Vk), it holds that 0 < 1 − ∥VkV † k ⃗dk∥2 2 ∥ ⃗dk∥2 2 ≤ 1, which implies that βk ≥ 1

Since Mk has full column rank, it follows that ⃗dk /∈ Range(Vk). Consequently, because VkV † k is the orthogonal projector onto Range(Vk), it holds that 0 < 1 − ∥VkV † k ⃗dk∥2 2 ∥ ⃗dk∥2 2 ≤ 1, which implies that βk ≥ 1. Therefore, to establish 1 − βkζ2 k ≤ ∥ bcirc ( Tπ k ∗ A† ∗ A ) ∥2 2, it suﬃces to prove the inequality 1 − ζ2 k ≤ LINEAR CONVERGENCE OF T...

work page
[67]

From the deﬁnition of ζk in (

work page
[68]

From the inequality (

and Lemma 7.1, 1− ζ2 k is expressed as 1 − ζ2 k = 1 −  rπ k (X k)  2 F ∥X k − X 0∗ ∥2 F − ⟨Pπ k (X k) − X 0 ∗ , X k − P π k (X k)⟩2 ∥X k − X 0∗ ∥2 F ∥X k − P π k (X k)∥2 F .(30) Using the equation in ( 27), We further obtain 1 − ζ2 k ≤ 1 −  rπ k (X k)  2 F ∥X k − X 0∗ ∥2 F =  Pπ k (X k) − X 0 ∗  2 F ∥X k − X 0∗ ∥2 F . From the inequality (

work page
[69]

If ∥rπ k (X k)∥2 F ̸= ∥X k − P π k (X k)∥2 F , then the last term of (

in the proof of Theorem 2.1, it holds that   Pπ k (X k) − X 0 ∗    2 F ≤   bcirc(Tπ k ∗ A† ∗ A)    2 2   X k − X 0 ∗    2 F , which results in 1 − ζ2 k ≤  bcirc(Tπ k ∗ A† ∗ A)  2 2 . If ∥rπ k (X k)∥2 F ̸= ∥X k − P π k (X k)∥2 F , then the last term of (

work page
[70]

Hence we complete the proof

gives ⟨Pπ k (X k) − X 0 ∗ , X k − P π k (X k)⟩2 ∥X k − X 0∗ ∥2 F ∥X k − P π k (X k)∥2 F = ( ∥rπ k (X k)∥2 F − ∥X k − P π k (X k)∥2 F )2 4 ∥X k − X 0∗ ∥2 F ∥X k − P π k (X k)∥2 F > 0, where the equality follows from ( 29). Hence we complete the proof. □ 7.4. Proof of Proposition 4.1 and Theorem 4.2. Proof of Proposition 4.1. We prove by induction on k ≥ 1 ...

work page
[71]

By induction, the sequences {⃗ xk}k≥ 0 coincide

Therefore, both algorithms generate the same ⃗ xk+1. By induction, the sequences {⃗ xk}k≥ 0 coincide. □ LINEAR CONVERGENCE OF TKGK 35 School of Mathematical Sciences, Beihang University, Beij ing, 100191, China. Email address : yj wang@buaa.edu.cn School of Mathematical Sciences, Beihang University, Beij ing, 100191, China. Email address : sunyonghan@buaa...

work page

[1] [1]

K. Ahn, C. Yun, and S. Sra. SGD with shuﬄing: optimal rates without component convexity and large epoch requirements. Advances in Neural Information Processing Systems , 33:17526–17535, 2020

work page 2020

[2] [2]

Bai and W.-T

Z.-Z. Bai and W.-T. Wu. On greedy randomized Kaczmarz met hod for solving large sparse linear systems. SIAM Journal on Scientiﬁc Computing , 40(1):A592–A606, 2018

work page 2018

[3] [3]

W. Bao, F. Zhang, W. Li, Q. Wang, and Y. Gao. Randomized ave rage Kaczmarz algorithm for tensor linear systems. Mathematics, 10(23):4594, 2022

work page 2022

[4] [4]

Castillo, J

A. Castillo, J. Haddock, I. Hartsock, P. Hoyos, L. Kassab , A. Kryshchenko, K. Larripa, D. Needell, S. Suryanarayanan, and K. Y. Djima. Block Gauss-Seidel meth ods for t-product tensor regression. arXiv preprint arXiv:2503.19155 , 2025

work page arXiv 2025

[5] [5]

Castillo, J

A. Castillo, J. Haddock, I. Hartsock, P. Hoyos, L. Kassab , A. Kryshchenko, K. Larripa, D. Needell, S. Suryanarayanan, and K. Yacoubou-Djima. Randomized Kacz marz methods for t-product tensor linear systems with factorized operators. arXiv preprint arXiv:2412.10583 , 2024

work page arXiv 2024

[6] [6]

Chen and J

X. Chen and J. Qin. Regularized Kaczmarz algorithms for t ensor recovery. SIAM Journal on Imaging Sciences, 14(4):1439–1471, 2021

work page 2021

[7] [7]

K. Du. Tight upper bounds for the convergence of the rando mized extended Kaczmarz and Gauss–Seidel algorithms. Numerical Linear Algebra with Applications , 26(3):e2233, 2019

work page 2019

[8] [8]

W. B. Gearhart and M. Koshy. Acceleration schemes for the method of alternating projections. Journal of Computational and Applied Mathematics , 26(3):235–249, 1989

work page 1989

[9] [9]

G. H. Golub and C. F. Van Loan. Matrix computations . JHU press, 2013

work page 2013

[10] [10]

Gordon, R

R. Gordon, R. Bender, and G. T. Herman. Algebraic recons truction techniques (ART) for three- dimensional electron microscopy and X-ray photography. J. Theor. Biol. , 29:471–481, 1970

work page 1970

[11] [11]

R. M. Gower, D. Molitor, J. Moorman, and D. Needell. On ad aptive sketch-and-project for solving linear systems. SIAM Journal on Matrix Analysis and Applications , 42(2):954–989, 2021

work page 2021

[12] [12]

R. M. Gower and P. Richt´ arik. Randomized iterative met hods for linear systems. SIAM Journal on Matrix Analysis and Applications , 36(4):1660–1690, 2015. 26 YIJIE W ANG, YONGHAN SUN, DEREN HAN, AND JIAXIN XIE

work page 2015

[13] [13]

M. E. Guide, A. E. Ichi, K. Jbilou, and R. Sadaka. Tensor K rylov subspace methods via the T-product for color image processing. arXiv preprint arXiv:2006.07133 , 2020

work page arXiv 2006

[14] [14]

D. Han, Y. Su, and J. Xie. Randomized Douglas-Rachford m ethods for linear systems: Improved accuracy and eﬃciency. SIAM J. Optim. , 34(1):1045–1070, 2024

work page 2024

[15] [15]

Han and J

D. Han and J. Xie. On pseudoinverse-free randomized met hods for linear systems: Uniﬁed framework and acceleration. Optimization Methods and Software , pages 1–36, 2025

work page 2025

[16] [16]

Han and J.-X

D.-R. Han and J.-X. Xie. A simple linear convergence ana lysis of the randomized reshuﬄing Kaczmarz method. Journal of the Operations Research Society of China , pages 1–13, 2025

work page 2025

[17] [17]

Hegland and J

M. Hegland and J. Rieger. Generalized Gearhart-Koshy a cceleration is a Krylov space method of a new type. arXiv preprint arXiv:2311.18305 , 2023

work page arXiv 2023

[18] [18]

G. T. Herman and L. B. Meyer. Algebraic reconstruction t echniques can be made computationally eﬃcient (positron emission tomography application). IEEE Trans. Medical Imaging , 12:600–609, 1993

work page 1993

[19] [19]

Jeong and D

H. Jeong and D. Needell. Linear convergence of reshuﬄin g Kaczmarz methods with sparse constraints. SIAM Journal on Scientiﬁc Computing , 47(4):A2323–A2352, 2025

work page 2025

[20] [20]

Karczmarz

S. Karczmarz. Angen¨ aherte auﬂ¨ osung von systemen lin earer glei-chungen. Bull. Int. Acad. Pol. Sic. Let., Cl. Sci. Math. Nat. , pages 355–357, 1937

work page 1937

[21] [21]

Karczmarz

S. Karczmarz. Angenaherte auﬂosung von systemen linea rer glei-chungen. Bull. Int. Acad. Pol. Sic. Let., Cl. Sci. Math. Nat. , pages 355–357, 1937

work page 1937

[22] [22]

M. E. Kilmer, K. Braman, N. Hao, and R. C. Hoover. Third-o rder tensors as operators on matrices: A theoretical and computational framework with applicatio ns in imaging. SIAM Journal on Matrix Analysis and Applications , 34(1):148–172, 2013

work page 2013

[23] [23]

M. E. Kilmer and C. D. Martin. Factorization strategies for third-order tensors. Linear Algebra and its Applications, 435(3):641–658, 2011

work page 2011

[24] [24]

Y. Liao, W. Li, and D. Yang. The accelerated tensor Kaczm arz algorithm with adaptive parameters for solving tensor systems. Applied Numerical Mathematics , 202:100–119, 2024

work page 2024

[25] [25]

Liu and S

J. Liu and S. Wright. An accelerated randomized Kaczmar z algorithm. Mathematics of Computation , 85(297):153–178, 2016

work page 2016

[26] [26]

Loizou and P

N. Loizou and P. Richt´ arik. Momentum and stochastic mo mentum for stochastic gradient, Newton, proximal point and subspace descent methods. Comput. Optim. Appl. , 77:653–710, 2020

work page 2020

[27] [27]

K. Lund. The tensor t-function: A deﬁnition for functions of third-order tensor s. Numerical Linear Algebra with Applications , 27(3):e2288, 2020

work page 2020

[28] [28]

Luo and A

H. Luo and A. Ma. Frontal slice approaches for tensor lin ear systems. arXiv preprint arXiv:2408.13547 , 2024

work page arXiv 2024

[29] [29]

Ma and D

A. Ma and D. Molitor. Randomized Kaczmarz for tensor lin ear systems. BIT Numerical Mathematics , 62(1):171–194, 2022

work page 2022

[30] [30]

Ma and D

A. Ma and D. Needell. Stochastic gradient descent for li near systems with missing data. Numer. Math. Theory Methods Appl. , 12(1):1–20, 2019

work page 2019

[31] [31]

A. Ma, D. Needell, and A. Ramdas. Convergence propertie s of the randomized extended Gauss-Seidel and Kaczmarz methods. SIAM J. Matrix Anal. A. , 36(4):1590–1604, 2015

work page 2015

[32] [32]

Y. Miao, L. Qi, and Y. Wei. Generalized tensor function v ia the tensor singular value decomposition based on the T-product. Linear Algebra and its Applications , 590:258–303, 2020

work page 2020

[33] [33]

Mishchenko, A

K. Mishchenko, A. Khaled, and P. Richt´ arik. Random res huﬄing: Simple analysis with vast improve- ments. Advances in Neural Information Processing Systems , 33:17309–17320, 2020

work page 2020

[34] [34]

I. Necoara. Faster randomized block Kaczmarz algorith ms. SIAM Journal on Matrix Analysis and Applications, 40(4):1425–1452, 2019

work page 2019

[35] [35]

Needell, N

D. Needell, N. Srebro, and R. Ward. Stochastic gradient descent, weighted sampling, and the randomized Kaczmarz algorithm. Math. Program., 155:549–573, 2016

work page 2016

[36] [36]

Needell and J

D. Needell and J. A. Tropp. Paved with good intentions: a nalysis of a randomized block Kaczmarz method. Linear Algebra and its Applications , 441:199–221, 2014

work page 2014

[37] [37]

Newman and M

E. Newman and M. E. Kilmer. Nonnegative tensor patch dic tionary approaches for image compression and deblurring applications. SIAM Journal on Imaging Sciences , 13(3):1084–1112, 2020. LINEAR CONVERGENCE OF TKGK 27

work page 2020

[38] [38]

L. M. Nguyen, Q. Tran-Dinh, D. T. Phan, P. H. Nguyen, and M . Van Dijk. A uniﬁed convergence analysis for shuﬄing-type gradient methods. J. Mach. Learn. Res. , 22(207):1–44, 2021

work page 2021

[39] [39]

Reichel and U

L. Reichel and U. O. Ugwu. The tensor Golub-Kahan-Tikho nov method applied to the solution of ill- posed problems with at-product structure. Numerical Linear Algebra with Applications , 29(1):e2412, 2022

work page 2022

[40] [40]

J. Rieger. Generalized Gearhart-Koshy acceleration f or the Kaczmarz method. Mathematics of Compu- tation, 92(341):1251–1272, 2023

work page 2023

[41] [41]

Robbins and S

H. Robbins and S. Monro. A stochastic approximation met hod. Ann. Math. Statistics , pages 400–407, 1951

work page 1951

[42] [42]

Safran and O

I. Safran and O. Shamir. How good is SGD with random shuﬄi ng? In Conference on Learning Theory , pages 3250–3284. PMLR, 2020

work page 2020

[43] [43]

Sch¨ opfer and D

F. Sch¨ opfer and D. Lorenz. Linear convergence of the ra ndomized sparse Kaczmarz method. Math. Program., 173:509–536, 2019

work page 2019

[44] [44]

Soltani, M

S. Soltani, M. E. Kilmer, and P. C. Hansen. A tensor-base d dictionary learning approach to tomographic image reconstruction. BIT Numerical Mathematics , 56:1425–1454, 2016

work page 2016

[45] [45]

Strohmer and R

T. Strohmer and R. Vershynin. A randomized Kaczmarz alg orithm with exponential convergence. Jour- nal of Fourier Analysis and Applications , 15(2):262–278, 2009

work page 2009

[46] [46]

Y. Su, D. Han, Y. Zeng, and J. Xie. On greedy multi-step in ertial randomized Kaczmarz method for solving linear systems. Calcolo, 61(4):68, 2024

work page 2024

[47] [47]

Y. Su, D. Han, Y. Zeng, and J. Xie. On the convergence anal ysis of the greedy randomized Kaczmarz method. Numerical Algorithms, pages 1–22, 2026

work page 2026

[48] [48]

Y. Sun, D. Han, and J. Xie. Connecting randomized iterat ive methods with Krylov subspaces. arXiv preprint arXiv:2505.20602 , 2025

work page arXiv 2025

[49] [49]

M. K. Tam. Gearhart-Koshy acceleration for aﬃne subspa ces. Operations Research Letters, 49(2):157– 163, 2021

work page 2021

[50] [50]

L. Tang, Y. Yu, Y. Zhang, and H. Li. Sketch-and-project m ethods for tensor linear systems. Numerical Linear Algebra with Applications , 30(2):e2470, 2023

work page 2023

[51] [51]

Xie, H.-D

J. Xie, H.-D. Qi, and D. Han. Randomized iterative metho ds for generalized absolute value equations: Solvability and error bounds. SIAM Journal on Optimization , 35(3):1731–1760, 2025

work page 2025

[52] [52]

B. Ying, K. Yuan, S. Vlaski, and A. H. Sayed. Stochastic l earning under random reshuﬄing with constant step-sizes. IEEE Trans. Signal Process. , 67(2):474–489, 2018

work page 2018

[53] [53]

Y. Zeng, D. Han, Y. Su, and J. Xie. Randomized Kaczmarz me thod with adaptive stepsizes for incon- sistent linear systems. Numer. Algorithms , 94:1403–1420, 2023

work page 2023

[54] [54]

Y. Zeng, D. Han, Y. Su, and J. Xie. On adaptive stochastic heavy ball momentum for solving linear systems. SIAM Journal on Matrix Analysis and Applications , 45(3):1259–1286, 2024

work page 2024

[55] [55]

Y. Zeng, D. Han, Y. Su, and J. Xie. On adaptive stochastic extended iterative methods for solving least squares. Mathematics of Computation , 2025

work page 2025

[56] [56]

Y. Zeng, D. Han, Y. Su, and J. Xie. Stochastic dual coordi nate descent with adaptive heavy ball momentum for linearly constrained convex optimization. Numerische Mathematik, 158(2):749–794, 2026

work page 2026

[57] [57]

P. Zhou, C. Lu, Z. Lin, and C. Zhang. Tensor factorizatio n for low-rank tensor completion. IEEE Transactions on Image Processing , 27(3):1152–1163, 2017

work page 2017

[58] [58]

Zouzias and N

A. Zouzias and N. M. Freris. Randomized extended Kaczma rz for solving least squares. SIAM Journal on Matrix Analysis and Applications , 34(2):773–793, 2013

work page 2013

[59] [59]

Proof of the main results 7.1

Appendix. Proof of the main results 7.1. Proof of Theorem 2.1. 28 YIJIE W ANG, YONGHAN SUN, DEREN HAN, AND JIAXIN XIE Proof of Theorem 2.1. To establish the results, we ﬁrst note that all iterates of Al gorithm 1 belong to the aﬃne subspace X 0 + rangeK (A†). Indeed, for any 1 ≤ i ≤ m, we have rangeK (A† i::) = range K(A⊤ i::) ⊆ rangeK(A⊤ ) = range K(A†),...

work page

[60] [60]

A ∗ X k = B ⇒ rπ k (X k) = 0

and the inclusion follows from the fact that A⊤ i:: = A⊤ ∗ Ii::. Consequently, Algorithm 1 ensures that X k ∈ X 0 + rangeK(A†). Besides, X 0 ∗ = A† ∗ B + ( I − A † ∗ A ) ∗ X 0 = X 0 + A† ∗ (B − A ∗ X 0) ∈ X 0 + range K(A†), therefore X k − X 0 ∗ ∈ rangeK(A†). Since A† ∗ A is the projector onto range K(A†), it follows (25) A† ∗ A ∗ (X k − X 0 ∗ ) = X k − X...

work page

[61] [61]

and ( 10), respectively. Given that ⟨X k − X 0 ∗ , X k − P π k (X k)⟩ = ∥X k − P π k(X k)∥2 F + ⟨Pπ k (X k) − X 0 ∗ , X k − P π k (X k)⟩, it follows that (28) ⟨X k − X 0 ∗ , X k − P π k (X k)⟩2 = ∥X k − P π k (X k)∥4 F + ⟨Pπ k (X k) − X 0 ∗ , X k − P π k (X k)⟩2 + 2∥X k − P π k (X k)∥2 F ⟨Pπ k (X k) − X 0 ∗ , X k − P π k (X k)⟩. From the deﬁnition of γk in (

work page

[62] [62]

and its equivalent form in ( 11), we have ⟨X k − X 0 ∗ , X k − P π k (X k)⟩ = γk = 1 2 ∥rπ k (X k)∥2 F + 1 2 ∥X k − P π k (X k)∥2 F . Hence, we can rewrite ⟨Pπ k (X k) − X 0 ∗ , X k − P π k (X k)⟩ as (29) ⟨Pπ k (X k) − X 0 ∗ , X k − P π k (X k)⟩ = −∥X k − P π k (X k)∥2 F + ⟨X k − X 0 ∗ , X k − P π k (X k)⟩ = 1 2 ∥rπ k (X k)∥2 F − 1 2 ∥X k − P π k (X k)∥2 ...

work page

[63] [63]

and ( 9), respectively. Using the equivalent form of sk k− jk,τ +1 in ( 16), we can further express it as sk k− jk,τ +1 = γk ∥(I − VkV † k )⃗dk∥2 2 = γk ∥⃗dk∥2 2 ( 1 − ∥VkV † k ⃗dk∥2 2 ∥⃗dk∥2 2 )− 1 , where the last equality follows from ∥(I − VkV † k )⃗dk∥2 2 = ∥⃗dk∥2 2 − ∥ VkV † k ∥2 2∥⃗dk∥2

work page

[64] [64]

Recall the deﬁnitions of γk in (

work page

[65] [65]

As βk = ( 1 − ∥VkV † k ⃗dk∥2 2 ∥ ⃗dk∥2 2 )− 1 in ( 14), we conclude that sk k− jk,τ +1 = β kζ2 k∥X k−X 0 ∗ ∥2 F γk and ∥X k+1 − X 0 ∗ ∥2 F ≤ (1 − βkζ2 k )∥X k − X 0 ∗ ∥2 F

and ζk in ( 14), it holds that γ 2 k ∥ ⃗dk∥2 2 = ζ2 k ∥X k − X 0 ∗ ∥2 F . As βk = ( 1 − ∥VkV † k ⃗dk∥2 2 ∥ ⃗dk∥2 2 )− 1 in ( 14), we conclude that sk k− jk,τ +1 = β kζ2 k∥X k−X 0 ∗ ∥2 F γk and ∥X k+1 − X 0 ∗ ∥2 F ≤ (1 − βkζ2 k )∥X k − X 0 ∗ ∥2 F . Next, we prove that 1 − βkζ2 k ≤ ∥ bcirc ( Tπ k ∗ A† ∗ A ) ∥2

work page

[66] [66]

Consequently, because VkV † k is the orthogonal projector onto Range(Vk), it holds that 0 < 1 − ∥VkV † k ⃗dk∥2 2 ∥ ⃗dk∥2 2 ≤ 1, which implies that βk ≥ 1

Since Mk has full column rank, it follows that ⃗dk /∈ Range(Vk). Consequently, because VkV † k is the orthogonal projector onto Range(Vk), it holds that 0 < 1 − ∥VkV † k ⃗dk∥2 2 ∥ ⃗dk∥2 2 ≤ 1, which implies that βk ≥ 1. Therefore, to establish 1 − βkζ2 k ≤ ∥ bcirc ( Tπ k ∗ A† ∗ A ) ∥2 2, it suﬃces to prove the inequality 1 − ζ2 k ≤ LINEAR CONVERGENCE OF T...

work page

[67] [67]

From the deﬁnition of ζk in (

work page

[68] [68]

From the inequality (

and Lemma 7.1, 1− ζ2 k is expressed as 1 − ζ2 k = 1 −  rπ k (X k)  2 F ∥X k − X 0∗ ∥2 F − ⟨Pπ k (X k) − X 0 ∗ , X k − P π k (X k)⟩2 ∥X k − X 0∗ ∥2 F ∥X k − P π k (X k)∥2 F .(30) Using the equation in ( 27), We further obtain 1 − ζ2 k ≤ 1 −  rπ k (X k)  2 F ∥X k − X 0∗ ∥2 F =  Pπ k (X k) − X 0 ∗  2 F ∥X k − X 0∗ ∥2 F . From the inequality (

work page

[69] [69]

If ∥rπ k (X k)∥2 F ̸= ∥X k − P π k (X k)∥2 F , then the last term of (

in the proof of Theorem 2.1, it holds that   Pπ k (X k) − X 0 ∗    2 F ≤   bcirc(Tπ k ∗ A† ∗ A)    2 2   X k − X 0 ∗    2 F , which results in 1 − ζ2 k ≤  bcirc(Tπ k ∗ A† ∗ A)  2 2 . If ∥rπ k (X k)∥2 F ̸= ∥X k − P π k (X k)∥2 F , then the last term of (

work page

[70] [70]

Hence we complete the proof

gives ⟨Pπ k (X k) − X 0 ∗ , X k − P π k (X k)⟩2 ∥X k − X 0∗ ∥2 F ∥X k − P π k (X k)∥2 F = ( ∥rπ k (X k)∥2 F − ∥X k − P π k (X k)∥2 F )2 4 ∥X k − X 0∗ ∥2 F ∥X k − P π k (X k)∥2 F > 0, where the equality follows from ( 29). Hence we complete the proof. □ 7.4. Proof of Proposition 4.1 and Theorem 4.2. Proof of Proposition 4.1. We prove by induction on k ≥ 1 ...

work page

[71] [71]

By induction, the sequences {⃗ xk}k≥ 0 coincide

Therefore, both algorithms generate the same ⃗ xk+1. By induction, the sequences {⃗ xk}k≥ 0 coincide. □ LINEAR CONVERGENCE OF TKGK 35 School of Mathematical Sciences, Beihang University, Beij ing, 100191, China. Email address : yj wang@buaa.edu.cn School of Mathematical Sciences, Beihang University, Beij ing, 100191, China. Email address : sunyonghan@buaa...

work page