Accelerated alternating minimization algorithm for low-rank approximations in the Chebyshev norm

Alexander Osinsky; Dmitry Zheltkov; Stanislav Morozov

arxiv: 2410.05247 · v2 · pith:YXYLTC4Wnew · submitted 2024-10-07 · 🧮 math.NA · cs.NA

Accelerated alternating minimization algorithm for low-rank approximations in the Chebyshev norm

Stanislav Morozov , Dmitry Zheltkov , Alexander Osinsky This is my paper

Pith reviewed 2026-05-23 19:55 UTC · model grok-4.3

classification 🧮 math.NA cs.NA

keywords low-rank approximationChebyshev normalternating minimization2-way alternancematrix approximationnumerical optimizationelement-wise error

0 comments

The pith

A 2-way alternance of rank r is necessary for any optimal low-rank matrix approximation in the Chebyshev norm.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces an accelerated version of alternating minimization to compute low-rank matrix approximations that minimize the maximum absolute entry-wise error, known as the Chebyshev norm. It defines a 2-way alternance of rank r on the residual and proves this pattern must appear in any optimal solution. The same condition holds for every limit point reached by the alternating minimization procedure. These results tie the practical iterative method to a concrete necessary condition for optimality in an element-wise error measure. Readers interested in scalable matrix approximations would see both a working algorithm and a test that candidate solutions must pass.

Core claim

We introduce the notion of a 2-way alternance of rank r. We show that the presence of a 2-way alternance of rank r is the necessary condition of the optimal low-rank approximation in the Chebyshev norm and that all limit points of the alternating minimization method satisfy this condition.

What carries the argument

The 2-way alternance of rank r, a sign-alternation pattern of rank r on the residual matrix that is shown to be required for optimality.

If this is right

Any limit point of the alternating minimization iteration satisfies the 2-way alternance condition.
The accelerated procedure produces usable low-rank approximations for large matrices, as verified by numerical tests.
The alternance pattern supplies a necessary optimality test independent of the particular algorithm used to generate the approximation.
Existence of an optimal approximant is presupposed when invoking the alternance condition.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

A computed approximation could be checked for optimality by testing whether its residual exhibits the required 2-way alternance.
The same necessary condition might be adapted to related entrywise approximation problems such as weighted Chebyshev norms.
Implementation could terminate the iteration early once a 2-way alternance is observed, since the limit-point property already holds.

Load-bearing premise

An optimal low-rank approximant exists and the Chebyshev-norm problem can be characterized by alternation properties without extra regularity conditions on the matrix entries or singular vectors.

What would settle it

Exhibit a concrete matrix A and integer r for which some optimal Chebyshev-norm rank-r approximant lacks any 2-way alternance of rank r.

Figures

Figures reproduced from arXiv: 2410.05247 by Alexander Osinsky, Dmitry Zheltkov, Stanislav Morozov.

**Figure 2.** Figure 2: Approximation error of the Hilbert matrix with alternating minimization [PITH_FULL_IMAGE:figures/full_fig_p018_2.png] view at source ↗

**Figure 3.** Figure 3: Approximation error of the identity matrix with alternating minimization [PITH_FULL_IMAGE:figures/full_fig_p019_3.png] view at source ↗

**Figure 4.** Figure 4: Approximation error for the function-generated matrix with alternating min [PITH_FULL_IMAGE:figures/full_fig_p020_4.png] view at source ↗

read the original abstract

Nowadays, low-rank approximations of matrices are an important component of many methods in science and engineering. Traditionally, low-rank approximations are considered in unitary invariant norms, however, recently element-wise approximations have also received significant attention in the literature. In this paper, we propose an accelerated alternating minimization algorithm for solving the problem of low-rank approximation of matrices in the Chebyshev norm. Through the numerical evaluation we demonstrate the effectiveness of the proposed procedure for large-scale problems. We also theoretically investigate the alternating minimization method and introduce the notion of a $2$-way alternance of rank $r$. We show that the presence of a $2$-way alternance of rank $r$ is the necessary condition of the optimal low-rank approximation in the Chebyshev norm and that all limit points of the alternating minimization method satisfy this condition.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper introduces a 2-way alternance condition as necessary for optimality in Chebyshev-norm low-rank approximation and pairs it with an accelerated alternating-minimization scheme, but the necessity claim sits on shaky ground for the non-convex rank-r set.

read the letter

The paper defines a 2-way alternance of rank r as a necessary condition for an optimal low-rank approximant in the entrywise max norm and proves that limit points of alternating minimization satisfy it. They also give an acceleration of the basic alternating-minimization loop and report that it handles large matrices effectively in their tests. That combination is the actual new material relative to earlier work on elementwise low-rank approximation. The numerical side looks practical for the applications where uniform error control matters, and the algorithm itself is a straightforward extension of a standard approach to this non-convex problem. The link between the alternance condition and the algorithm's stationary points is a clean theoretical observation on its own terms. The soft spot is the necessity statement. The set of rank-at-most-r matrices is nonlinear, so the usual equioscillation arguments do not transfer without extra structure on where the max-norm error is attained and how those positions interact with the tangent space. The abstract gives no indication that the proof supplies or verifies such regularity, which means the claim can fail on a positive-measure set of matrices, exactly as the stress-test note flags. Without the full derivations it is impossible to see whether they handled rank-deficient cases or sparse attainment patterns. This paper is for people already working on entrywise-norm low-rank problems in numerical linear algebra or approximation theory. A reader who needs a working method for the Chebyshev case will get a usable algorithm and a new diagnostic to check; the optimality condition may still be worth exploring even if it needs tightening. It deserves a serious referee to examine the proofs and the experimental setup rather than a desk rejection.

Referee Report

2 major / 2 minor

Summary. The paper proposes an accelerated alternating minimization algorithm for low-rank matrix approximation in the entrywise Chebyshev (max) norm. It introduces the notion of a 2-way alternance of rank r and proves that its presence is a necessary condition for optimality; it further shows that all limit points of the alternating minimization iterates satisfy this condition. Numerical experiments illustrate the method's performance on large-scale instances.

Significance. If the necessity result is established under verifiable conditions, the work supplies a new optimality characterization for low-rank approximation outside the usual unitarily invariant norms, together with a practical scalable algorithm. The explicit link between the alternance property and the limit points of AM is a concrete theoretical contribution that can be checked numerically.

major comments (2)

[theoretical investigation paragraph / § on optimality condition] Theoretical investigation section: the necessity proof that a 2-way alternance of rank r is required for optimality proceeds by an alternation argument on the nonlinear manifold of rank-at-most-r matrices. The argument does not state or verify the required transversality/attainment conditions on the positions where the error attains its maximum norm relative to the tangent space at the candidate approximant. Consequently the necessity statement does not hold for all matrices; it fails on the positive-measure set where the max-norm error is attained on fewer than 2r+1 linearly independent positions or when the approximant is rank-deficient.
[limit-point theorem] Definition of 2-way alternance and its use in the limit-point theorem: the paper asserts that every limit point satisfies the alternance condition, yet the proof sketch does not address the case in which the sequence of iterates approaches a rank-deficient matrix or when the active set of equioscillation positions changes discontinuously. This leaves open whether the claimed property is preserved under the stated convergence assumptions.

minor comments (2)

[preliminaries] Notation for the Chebyshev norm and the rank-r manifold should be introduced once with a single symbol rather than redefined in multiple places.
[algorithm description] The description of the acceleration step in the algorithm lacks an explicit line-search or step-size rule; a short paragraph clarifying how the acceleration parameter is chosen would improve reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments on our manuscript. We address the two major comments point by point below, indicating the revisions we will make.

read point-by-point responses

Referee: Theoretical investigation section: the necessity proof that a 2-way alternance of rank r is required for optimality proceeds by an alternation argument on the nonlinear manifold of rank-at-most-r matrices. The argument does not state or verify the required transversality/attainment conditions on the positions where the error attains its maximum norm relative to the tangent space at the candidate approximant. Consequently the necessity statement does not hold for all matrices; it fails on the positive-measure set where the max-norm error is attained on fewer than 2r+1 linearly independent positions or when the approximant is rank-deficient.

Authors: We agree that the necessity result is presented without an explicit statement of the transversality and attainment conditions required for the alternation argument to apply on the manifold. The manuscript implicitly assumes generic attainment of the maximum error at a sufficient number of independent positions. We will revise the theoretical section to state these conditions clearly (error attained at least at 2r+1 linearly independent positions and approximant of exact rank r) and note that the necessity holds under them. This clarifies the scope without misrepresenting the result for the cases covered by the argument. revision: partial
Referee: Definition of 2-way alternance and its use in the limit-point theorem: the paper asserts that every limit point satisfies the alternance condition, yet the proof sketch does not address the case in which the sequence of iterates approaches a rank-deficient matrix or when the active set of equioscillation positions changes discontinuously. This leaves open whether the claimed property is preserved under the stated convergence assumptions.

Authors: The limit-point result is derived under the standing assumption that the iterates converge to a matrix of exact rank r with a stable active equioscillation set. The proof sketch does not treat rank-deficient limits or discontinuous changes in the active set. We will revise the relevant section to explicitly list these assumptions and add a remark that the property holds when the limit has exact rank r and the active set stabilizes. Additional technical details can be supplied in an appendix if required. revision: partial

Circularity Check

0 steps flagged

No circularity: optimality condition derived from problem definition without reduction to inputs or self-citations

full rationale

The paper presents the 2-way alternance of rank r as a necessary condition for optimality in the Chebyshev norm, derived from the problem setup and alternation properties, with the additional claim that AM limit points satisfy it. No equations or steps in the provided abstract or description reduce a prediction or central claim to a fitted parameter, self-citation chain, or definitional tautology. The derivation is framed as a theoretical investigation assuming existence of an optimum, without evidence of smuggling ansatzes, renaming known results, or load-bearing self-citations. This is the common case of a self-contained theoretical argument against external benchmarks, warranting score 0.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claims rest on standard properties of matrix norms and alternating minimization; the new 2-way alternance is introduced as a defined concept rather than an invented physical entity. No explicit free parameters or ad-hoc axioms are stated in the abstract.

axioms (1)

domain assumption Existence of an optimal low-rank approximant in the Chebyshev norm for any input matrix.
Invoked implicitly when stating the necessary condition for optimality.

invented entities (1)

2-way alternance of rank r no independent evidence
purpose: Necessary optimality condition for low-rank Chebyshev approximation.
Newly defined in the paper; no independent evidence provided beyond the theoretical argument.

pith-pipeline@v0.9.0 · 5670 in / 1322 out tokens · 27317 ms · 2026-05-23T19:55:25.532294+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

23 extracted references · 23 canonical work pages · 1 internal anchor

[1]

Budzinskiy , Quasioptimal alternating projections and their use in low-rank approximation of matrices and tensors , arXiv preprint arXiv:2308.16097, (2023)

S. Budzinskiy , Quasioptimal alternating projections and their use in low-rank approximation of matrices and tensors , arXiv preprint arXiv:2308.16097, (2023)

work page arXiv 2023
[2]

Budzinskiy, Entrywise tensor-train approximation of large tensors via random embeddings , arXiv preprint arXiv:2403.11768, (2024)

S. Budzinskiy, Entrywise tensor-train approximation of large tensors via random embeddings , arXiv preprint arXiv:2403.11768, (2024)

work page arXiv 2024
[3]

Budzinskiy , On the distance to low-rank matrices in the maximum norm , Linear Algebra and its Applications, 688 (2024), pp

S. Budzinskiy , On the distance to low-rank matrices in the maximum norm , Linear Algebra and its Applications, 688 (2024), pp. 44–58

work page 2024
[4]

Budzinskiy , When big data actually are low-rank, or entrywise approximation of certain function-generated matrices, arXiv preprint arXiv:2407.03250, (2024)

S. Budzinskiy , When big data actually are low-rank, or entrywise approximation of certain function-generated matrices, arXiv preprint arXiv:2407.03250, (2024)

work page arXiv 2024
[5]

V. Daugavet, Uniform approximation of a function of two variables, tabulated as the product of functions of a single variable , USSR Computational Mathematics and Mathematical Physics, 11 (1971), pp. 1–16

work page 1971
[6]

V. K. Dzyadyk , On the approximation of functions on sets consisting of a finite number of points, Theory of Function Approximation and its Applications, (1974), p. 69—80

work page 1974
[7]

V. K. Dzyadyk , Introduction to the Theory of Uniform Approximation of Functions by Poly- nomials, Nauka, 1977

work page 1977
[8]

A. Y. Garnaev and E. D. Gluskin , The widths of a euclidean ball , in Doklady Akademii Nauk, vol. 277, 1984, pp. 1048–1052

work page 1984
[9]

Gillis and Y

N. Gillis and Y. Shitov , Low-rank matrix approximation in the infinity norm, Linear Algebra and its Applications, 581 (2019), pp. 367–382

work page 2019
[10]

E. D. Gluskin , The octahedron is badly approximated by random subspaces, Functional Analy- sis and Its Applications, 20 (1986), pp. 11–16

work page 1986
[11]

G. H. Golub and C. F. V an Loan , Matrix computations, JHU press, 2013. ACCELERATED ALTERNATING MINIMIZATION ALGORITHM 31

work page 2013
[12]

X. He, H. Zhang, M.-Y. Kan, and T.-S. Chua , Fast matrix factorization for online recom- mendation with implicit feedback , in Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval, 2016, pp. 549–558

work page 2016
[13]

B. S. Kashin , The diameters of octahedra , Uspekhi Matematicheskikh Nauk, 30 (1975), pp. 251–252

work page 1975
[14]

S. A. Matveev, A. P. Smirnov, and E. Tyrtyshnikov , A fast numerical method for the cauchy problem for the smoluchowski equation , Journal of Computational Physics, 282 (2015), pp. 23–32

work page 2015
[15]

M. J. Mohlenkamp , Musings on multilinear fitting , Linear Algebra and its Applications, 438 (2013), pp. 834–852

work page 2013
[16]

Morozov, M

S. Morozov, M. Smirnov, and N. Zamarashkin , On the optimal rank-1 approximation of matrices in the chebyshev norm, Linear Algebra and its Applications, 679 (2023), pp. 4–29

work page 2023
[17]

Rectangular maximum volume and projective volume search algorithms

A. Osinsky , Rectangular maximum volume and projective volume search algorithms , arXiv preprint arXiv:1809.02334, (2018)

work page internal anchor Pith review Pith/arXiv arXiv 2018
[18]

Pinkus, N-widths in Approximation Theory, Springer Berlin, Heidelberg, 1985, https://doi

A. Pinkus, N-widths in Approximation Theory, Springer Berlin, Heidelberg, 1985, https://doi. org/10.1007/978-3-642-69894-1

work page doi:10.1007/978-3-642-69894-1 1985
[19]

T. N. Sainath, B. Kingsbury, V. Sindhwani, E. Arisoy, and B. Ramabhadran , Low-rank matrix factorization for deep neural network training with high-dimensional output targets, in 2013 IEEE international conference on acoustics, speech and signal processing, IEEE, 2013, pp. 6655–6659

work page 2013
[20]

C. E. Shannon , Probability of error for optimal codes in a gaussian channel , Bell System Technical Journal, 38 (1959), pp. 611–656

work page 1959
[21]

S. W. Son, Z. Chen, W. Hendrix, A. Agrawal, W.-k. Liao, and A. Choudhary , Data com- pression for the exascale computing era-survey, Supercomputing frontiers and innovations, 1 (2014), pp. 76–88

work page 2014
[22]

Udell and A

M. Udell and A. Townsend , Why are big data matrices approximately low rank? , SIAM Journal on Mathematics of Data Science, 1 (2019), pp. 144–160

work page 2019
[23]

Zamarashkin, S

N. Zamarashkin, S. Morozov, and E. Tyrtyshnikov , On the best approximation algorithm by low-rank matrices in Chebyshev’s norm, Computational Mathematics and Mathematical Physics, 62 (2022), pp. 701–718

work page 2022

[1] [1]

Budzinskiy , Quasioptimal alternating projections and their use in low-rank approximation of matrices and tensors , arXiv preprint arXiv:2308.16097, (2023)

S. Budzinskiy , Quasioptimal alternating projections and their use in low-rank approximation of matrices and tensors , arXiv preprint arXiv:2308.16097, (2023)

work page arXiv 2023

[2] [2]

Budzinskiy, Entrywise tensor-train approximation of large tensors via random embeddings , arXiv preprint arXiv:2403.11768, (2024)

S. Budzinskiy, Entrywise tensor-train approximation of large tensors via random embeddings , arXiv preprint arXiv:2403.11768, (2024)

work page arXiv 2024

[3] [3]

Budzinskiy , On the distance to low-rank matrices in the maximum norm , Linear Algebra and its Applications, 688 (2024), pp

S. Budzinskiy , On the distance to low-rank matrices in the maximum norm , Linear Algebra and its Applications, 688 (2024), pp. 44–58

work page 2024

[4] [4]

Budzinskiy , When big data actually are low-rank, or entrywise approximation of certain function-generated matrices, arXiv preprint arXiv:2407.03250, (2024)

S. Budzinskiy , When big data actually are low-rank, or entrywise approximation of certain function-generated matrices, arXiv preprint arXiv:2407.03250, (2024)

work page arXiv 2024

[5] [5]

V. Daugavet, Uniform approximation of a function of two variables, tabulated as the product of functions of a single variable , USSR Computational Mathematics and Mathematical Physics, 11 (1971), pp. 1–16

work page 1971

[6] [6]

V. K. Dzyadyk , On the approximation of functions on sets consisting of a finite number of points, Theory of Function Approximation and its Applications, (1974), p. 69—80

work page 1974

[7] [7]

V. K. Dzyadyk , Introduction to the Theory of Uniform Approximation of Functions by Poly- nomials, Nauka, 1977

work page 1977

[8] [8]

A. Y. Garnaev and E. D. Gluskin , The widths of a euclidean ball , in Doklady Akademii Nauk, vol. 277, 1984, pp. 1048–1052

work page 1984

[9] [9]

Gillis and Y

N. Gillis and Y. Shitov , Low-rank matrix approximation in the infinity norm, Linear Algebra and its Applications, 581 (2019), pp. 367–382

work page 2019

[10] [10]

E. D. Gluskin , The octahedron is badly approximated by random subspaces, Functional Analy- sis and Its Applications, 20 (1986), pp. 11–16

work page 1986

[11] [11]

G. H. Golub and C. F. V an Loan , Matrix computations, JHU press, 2013. ACCELERATED ALTERNATING MINIMIZATION ALGORITHM 31

work page 2013

[12] [12]

X. He, H. Zhang, M.-Y. Kan, and T.-S. Chua , Fast matrix factorization for online recom- mendation with implicit feedback , in Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval, 2016, pp. 549–558

work page 2016

[13] [13]

B. S. Kashin , The diameters of octahedra , Uspekhi Matematicheskikh Nauk, 30 (1975), pp. 251–252

work page 1975

[14] [14]

S. A. Matveev, A. P. Smirnov, and E. Tyrtyshnikov , A fast numerical method for the cauchy problem for the smoluchowski equation , Journal of Computational Physics, 282 (2015), pp. 23–32

work page 2015

[15] [15]

M. J. Mohlenkamp , Musings on multilinear fitting , Linear Algebra and its Applications, 438 (2013), pp. 834–852

work page 2013

[16] [16]

Morozov, M

S. Morozov, M. Smirnov, and N. Zamarashkin , On the optimal rank-1 approximation of matrices in the chebyshev norm, Linear Algebra and its Applications, 679 (2023), pp. 4–29

work page 2023

[17] [17]

Rectangular maximum volume and projective volume search algorithms

A. Osinsky , Rectangular maximum volume and projective volume search algorithms , arXiv preprint arXiv:1809.02334, (2018)

work page internal anchor Pith review Pith/arXiv arXiv 2018

[18] [18]

Pinkus, N-widths in Approximation Theory, Springer Berlin, Heidelberg, 1985, https://doi

A. Pinkus, N-widths in Approximation Theory, Springer Berlin, Heidelberg, 1985, https://doi. org/10.1007/978-3-642-69894-1

work page doi:10.1007/978-3-642-69894-1 1985

[19] [19]

T. N. Sainath, B. Kingsbury, V. Sindhwani, E. Arisoy, and B. Ramabhadran , Low-rank matrix factorization for deep neural network training with high-dimensional output targets, in 2013 IEEE international conference on acoustics, speech and signal processing, IEEE, 2013, pp. 6655–6659

work page 2013

[20] [20]

C. E. Shannon , Probability of error for optimal codes in a gaussian channel , Bell System Technical Journal, 38 (1959), pp. 611–656

work page 1959

[21] [21]

S. W. Son, Z. Chen, W. Hendrix, A. Agrawal, W.-k. Liao, and A. Choudhary , Data com- pression for the exascale computing era-survey, Supercomputing frontiers and innovations, 1 (2014), pp. 76–88

work page 2014

[22] [22]

Udell and A

M. Udell and A. Townsend , Why are big data matrices approximately low rank? , SIAM Journal on Mathematics of Data Science, 1 (2019), pp. 144–160

work page 2019

[23] [23]

Zamarashkin, S

N. Zamarashkin, S. Morozov, and E. Tyrtyshnikov , On the best approximation algorithm by low-rank matrices in Chebyshev’s norm, Computational Mathematics and Mathematical Physics, 62 (2022), pp. 701–718

work page 2022