Binno: A 1st-order method for Bi-level Nonconvex Nonsmooth Optimization for Matrix Factorizations

Andersen Ang; Flavia Esposito; Laura Selicato

arxiv: 2510.21390 · v3 · submitted 2025-10-24 · 🧮 math.OC

Binno: A 1st-order method for Bi-level Nonconvex Nonsmooth Optimization for Matrix Factorizations

Laura Selicato , Flavia Esposito , Andersen Ang This is my paper

Pith reviewed 2026-05-18 04:50 UTC · model grok-4.3

classification 🧮 math.OC

keywords bi-level optimizationnonconvex nonsmooth optimizationproximal gradient methodmatrix factorizationalternating minimizationlinesearchdescent property

0 comments

The pith

Binno pairs proximal-gradient updates on each level with a linesearch-weighted convex combination to guarantee simultaneous descent for both in nonconvex nonsmooth bi-level problems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops Binno as a first-order method for bi-level optimization in which both the upper-level and lower-level objectives are nonconvex and nonsmooth. It extends the proximal alternate minimization framework by adding a descent-driven averaging mechanism that couples the levels after separate blockwise proximal-gradient steps. A linesearch then chooses the weights of a block-diagonal convex combination so that a surrogate of the overall bi-level objective decreases at every iteration. The method is demonstrated on sparse low-rank matrix factorization, where an l1 penalty sits at the upper level and a nuclear-norm penalty at the lower level, and is shown to produce lower reconstruction error than standard approaches on both synthetic matrices and a real traffic-video dataset.

Core claim

Binno induces a descent property for a suitable surrogate of the bi-level objective. Each iteration performs blockwise proximal-gradient updates for the upper and lower problems separately, then forms a calibrated, block-diagonal convex combination of the two iterates. A linesearch selects combination weights enforcing simultaneous descent of both objectives. Conditions are given ensuring that both these weights and the descent directions induced by the associated proximal-gradient maps exist.

What carries the argument

The calibrated block-diagonal convex combination of proximal-gradient iterates, with weights chosen by linesearch to enforce simultaneous descent of both levels.

If this is right

The procedure applies directly to sparse low-rank factorization with an elementwise l1 upper-level penalty and a nuclear-norm lower-level penalty coupled by a Frobenius data-fidelity term.
It produces lower reconstruction error and higher peak signal-to-noise ratio than standard methods on synthetic matrices and real traffic-video data.
It recovers policy-preferred equilibria on a regularized market-clearing problem and improves on existing bi-level baselines.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same descent-driven averaging step could be inserted into other proximal schemes that already handle single-level nonconvex nonsmooth problems.
If fixed weights can be derived analytically for particular regularizers, the linesearch could be removed to obtain a parameter-free variant.

Load-bearing premise

Linesearch can locate combination weights that turn the proximal-gradient maps into valid descent directions for both levels at once.

What would settle it

A run on the traffic-video dataset in which the linesearch returns no valid weights or the method yields higher reconstruction error than the compared bi-level baseline.

Figures

Figures reproduced from arXiv: 2510.21390 by Andersen Ang, Flavia Esposito, Laura Selicato.

**Figure 2.** Figure 2: Synthetic experiment: (left) sparse factors and resulting matrix; (right) Objec [PITH_FULL_IMAGE:figures/full_fig_p023_2.png] view at source ↗

**Figure 3.** Figure 3: Objective bi-level function convergence wrt iterates for the real dataset. [PITH_FULL_IMAGE:figures/full_fig_p024_3.png] view at source ↗

**Figure 4.** Figure 4: (a) Original clip (clip 5 as example) and its noisy observation; (b) Low-rank [PITH_FULL_IMAGE:figures/full_fig_p024_4.png] view at source ↗

**Figure 5.** Figure 5: Low-rank L and sparse S matrices into which the observed matrix is decomposed for all the tested algorithms, for 6 different clips [PITH_FULL_IMAGE:figures/full_fig_p025_5.png] view at source ↗

read the original abstract

Nonconvex and nonsmooth bi-level optimization poses critical theoretical challenges, while arising in several applications. In this work, we develop a method for nonconvex, nonsmooth bi-level optimization and introduce Binno, a first-order method that builds on proximal-gradient updates within the the proximal alternate minimization framework with descent conditions from variational analysis. Binno couples the two levels via a descent-driven averaging mechanism, extending single-level proximal schemes to the nonconvex nonsmooth bi-level setting. We show thatBinno induces a descent property for a suitable surrogate of the bi-level objective. Each iteration performs blockwise proximal-gradient updates for the upper and lower problems separately, then forms a calibrated, block-diagonal convex combination of the two iterates. A linesearch selects combination weights enforcing simultaneous descent of both objectives. We give conditions ensuring that both these weights and the descent directions induced by the associated proximal-gradient maps exist. We apply Binno to sparse low-rank factorization, where the upper level uses elementwise l1 penalties and the lower level uses nuclear norms, coupled via a Frobenius data term. We test Binno on synthetic matrices and a real traffic-video dataset, achieving lower reconstruction error and higher peak signal-to-noise ratio than standard methods. We also validate it on a regularized market-clearing problem, where it selects policy-preferred equilibria, and compare it with bi-level baseline, showing consistent improvements

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Binno adds a descent-driven averaging step to proximal alternate minimization for nonconvex nonsmooth bilevel problems and shows practical gains on matrix factorization, though the linesearch conditions for simultaneous descent need tighter verification.

read the letter

Binno extends single-level proximal schemes to the bilevel nonconvex nonsmooth case by coupling the levels through a calibrated block-diagonal convex combination after separate proximal-gradient updates, then using linesearch to pick weights that enforce descent on a surrogate objective. That averaging mechanism is the main new piece and looks like a direct, workable extension of existing variational tools rather than a complete reinvention. The paper applies it cleanly to sparse low-rank factorization, with an l1 upper level and nuclear-norm lower level tied by a Frobenius term, plus a quick test on regularized market clearing for equilibrium selection. On synthetic matrices and the traffic-video dataset it reports lower reconstruction error and higher PSNR than standard methods, which is the kind of concrete outcome that matters for downstream use. The experiments are limited but at least compare against relevant baselines and cover both synthetic and real data. The soft spot is the theory. The claim that variational analysis guarantees the existence of suitable weights and directions for the coupled nonconvex case is stated, but the abstract gives no derivations or explicit checks on whether the linesearch interval stays nonempty when the Frobenius coupling is active. If that interval empties on some iterates the descent property fails and the convergence argument does not go through. The stress-test concern about unverified conditions for the proximal maps therefore lands; the paper would be stronger with either a counter-example showing the interval is always nonempty or a more explicit bound. Overall this is for readers who need first-order methods for bilevel problems in matrix factorization or equilibrium selection and who are willing to treat the linesearch step as a practical heuristic until the conditions are pinned down. It is worth sending to a serious referee because the algorithmic idea is clear, the application is relevant, and the empirical results are at least directionally positive, even if the theory section will probably need expansion.

Referee Report

2 major / 2 minor

Summary. The paper introduces Binno, a first-order method for bi-level nonconvex nonsmooth optimization. It builds on proximal alternate minimization by performing blockwise proximal-gradient updates on the upper (l1) and lower (nuclear-norm) levels, then forming a calibrated block-diagonal convex combination of the resulting iterates. A linesearch selects the combination weights to enforce simultaneous descent on a surrogate of the bi-level objective. The authors state conditions from variational analysis that guarantee existence of such weights and directions, apply the method to sparse low-rank matrix factorization with a coupled Frobenius term, and report improved reconstruction error and PSNR on synthetic matrices, traffic-video data, and a regularized market-clearing problem relative to standard and bi-level baselines.

Significance. If the claimed descent property and linesearch guarantees hold, the work supplies a practical first-order scheme for a difficult class of bi-level problems that arise in matrix factorizations and equilibrium selection. The explicit construction of the block-diagonal averaging step and the empirical comparisons on both synthetic and real data constitute concrete contributions; however, the absence of explicit error bounds or rate statements in the provided material limits the assessed theoretical impact.

major comments (2)

[§3] §3 (Convergence analysis): The claim that variational-analysis conditions guarantee nonempty linesearch intervals for the block-diagonal weights is load-bearing for the simultaneous-descent property, yet the manuscript provides no explicit verification that these conditions remain satisfied when the proximal maps act on the coupled Frobenius term in the nonconvex nonsmooth regime; if the admissible weight interval is empty at any iterate, the descent argument collapses.
[§4] §4 (Application to matrix factorization): The upper-level l1 and lower-level nuclear-norm proximal operators are applied separately before the convex combination, but the paper does not quantify how the coupling through the data-fidelity term affects the Lipschitz constants or subdifferential inclusions used to justify the existence of descent directions; this detail is required to confirm that the stated conditions apply to the concrete problem.

minor comments (2)

[Abstract] Abstract: typographical errors include 'within the the proximal' and 'thatBinno' (missing space); these should be corrected for readability.
Notation: the surrogate objective whose descent is enforced is introduced without an explicit equation label; adding a numbered display equation would clarify subsequent references.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive comments on our manuscript. The points raised regarding the convergence analysis and its application to the matrix factorization problem are well-taken. We address each major comment below and will make revisions to the manuscript to provide the requested clarifications and verifications.

read point-by-point responses

Referee: [§3] §3 (Convergence analysis): The claim that variational-analysis conditions guarantee nonempty linesearch intervals for the block-diagonal weights is load-bearing for the simultaneous-descent property, yet the manuscript provides no explicit verification that these conditions remain satisfied when the proximal maps act on the coupled Frobenius term in the nonconvex nonsmooth regime; if the admissible weight interval is empty at any iterate, the descent argument collapses.

Authors: We appreciate the referee highlighting this critical aspect. The variational analysis conditions in §3 are formulated at a general level to ensure the existence of suitable weights for the block-diagonal convex combination that guarantee simultaneous descent on the surrogate bi-level objective. These conditions depend on the properties of the proximal-gradient directions and the smoothness of the coupling term. In the matrix factorization application, the Frobenius data-fidelity term is smooth and its gradient is Lipschitz continuous with constant equal to twice the square of the operator norm of the data matrix (or a bound thereof). The proximal operators for the l1 and nuclear norms are applied to the respective blocks after the gradient step on the coupled term, preserving the subdifferential inclusions. While the manuscript states the general conditions, we acknowledge the lack of an explicit check for this setting. In the revision, we will add a subsection or appendix that verifies the conditions hold for the coupled Frobenius term by providing bounds on the Lipschitz constant and confirming that the descent directions exist under mild assumptions on the data. This will ensure the linesearch intervals are nonempty. revision: yes
Referee: [§4] §4 (Application to matrix factorization): The upper-level l1 and lower-level nuclear-norm proximal operators are applied separately before the convex combination, but the paper does not quantify how the coupling through the data-fidelity term affects the Lipschitz constants or subdifferential inclusions used to justify the existence of descent directions; this detail is required to confirm that the stated conditions apply to the concrete problem.

Authors: We agree that more detail on the impact of the coupling is necessary. The data-fidelity term couples the upper and lower variables through the Frobenius norm, which enters the gradient computations for both levels. The Lipschitz constant for these gradients can be quantified as L = 2 ||A||^2 where A is the data matrix (or its appropriate reshaping), which is finite for any fixed data. The subdifferential inclusions for the proximal steps remain valid because the nonsmooth regularizers are block-separable, and the smooth coupling is fully accounted for in the explicit gradient steps prior to applying the proximal maps. To address the referee's concern, we will revise the application section to include these explicit quantifications and a short argument showing that the general conditions from §3 are satisfied in this case. This will not change the algorithm but will strengthen the theoretical justification for its use on the matrix factorization problem. revision: yes

Circularity Check

0 steps flagged

No circularity: derivation builds on external proximal and variational tools without self-reduction

full rationale

The paper constructs Binno by extending proximal-gradient updates and alternate minimization with linesearch on block-diagonal convex combinations to enforce simultaneous descent on a bi-level surrogate. The existence of suitable weights and directions is asserted under stated conditions from variational analysis, but the central descent property is derived from the proximal maps and linesearch mechanism rather than being presupposed by definition or by a fitted parameter. No equation reduces to its own input by construction, no prediction is statistically forced from a subset fit, and no load-bearing uniqueness theorem or ansatz is imported solely via self-citation. The matrix-factorization application and numerical tests serve as external validation rather than closing a self-referential loop. The derivation chain therefore remains independent of its own outputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The abstract relies on standard results from variational analysis and proximal operators without introducing new free parameters or invented entities; the existence of suitable linesearch weights is treated as a domain assumption.

axioms (1)

domain assumption Existence of proximal-gradient maps that induce descent directions under the stated conditions
Invoked to guarantee that the linesearch can select valid combination weights.

pith-pipeline@v0.9.0 · 5787 in / 1330 out tokens · 30106 ms · 2026-05-18T04:50:39.263746+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

27 extracted references · 27 canonical work pages · 1 internal anchor

[1]

Dempe, A

S. Dempe, A. Zemkoho (Eds.), Bilevel optimization, 2020th Edition, Springer optimization and its applications, Springer Nature, Cham, Switzerland, 2020

work page 2020
[2]

Del Buono, F

N. Del Buono, F. Esposito, L. Selicato, R. Zdunek, Bi-level algorithm for optimizing hyperparameters in penalized nonnegative matrix fac- torization, Applied Mathematics and Computation 457 (2023) 128184. doi:https://doi.org/10.1016/j.amc.2023.128184. 26

work page doi:10.1016/j.amc.2023.128184 2023
[3]

Del Buono, F

N. Del Buono, F. Esposito, L. Selicato, R. Zdunek, Penalty hyperpa- rameter optimization with diversity measure for nonnegative low-rank approximation, Applied Numerical Mathematics 208 (2025) 189–204, special Volume on Numerical Analysis and Scientific Computation with Applications.doi:https://doi.org/10.1016/j.apnum.2024.10.002

work page doi:10.1016/j.apnum.2024.10.002 2025
[4]

P. L. Combettes, J.-C. Pesquet, Proximal Splitting Methods in Signal Processing, Springer New York, New York, NY, 2011, pp. 185–212.doi: 10.1007/978-1-4419-9569-8_10

work page doi:10.1007/978-1-4419-9569-8_10 2011
[5]

Parikh, S

N. Parikh, S. Boyd, et al., Proximal algorithms, Foundations and Trends in Optimization 1 (3) (2014) 127–239.doi:10.1561/2400000003

work page doi:10.1561/2400000003 2014
[6]

Sabach, S

S. Sabach, S. Shtern, A first order method for solving convex bilevel optimization problems, SIAM Journal on Optimization 27 (2) (2017) 640–660.doi:10.1137/16M105592X

work page doi:10.1137/16m105592x 2017
[7]

J. Cao, R. Jiang, E. Y. Hamedani, A. Mokhtari, An accelerated gradient method for convex smooth simple bilevel optimization, in: The Thirty- eighth Annual Conference on Neural Information Processing Systems, 2024

work page 2024
[8]

Giang-Tran, N

K.-H. Giang-Tran, N. Ho-Nguyen, D. Lee, A projection-free method for solving convex bilevel optimization problems, Mathematical Program- ming 213 (1-2) (2024) 907–940.doi:10.1007/s10107-024-02157-1

work page doi:10.1007/s10107-024-02157-1 2024
[9]

Bolte, S

J. Bolte, S. Sabach, M. Teboulle, Proximal alternating linearized minimization for nonconvex and nonsmooth problems, Mathe- matical Programming 146 (1-2) (2013) 459–494.doi:10.1007/ s10107-013-0701-9

work page 2013
[10]

Martinet, Brève communication

B. Martinet, Brève communication. régularisation d’inéquations variationnelles par approximations successives, Revue francaise d’informatique et de recherche opérationnelle. Série rouge 4 (R3) (1970) 154–158

work page 1970
[11]

Society for Industrial and Applied Mathematics, Philadelphia, PA, 2017

A. Beck, First-Order Methods in Optimization, Society for Industrial and Applied Mathematics, 2017.doi:10.1137/1.9781611974997. 27

work page doi:10.1137/1.9781611974997 2017
[12]

H. H. Bauschke, P. L. Combettes, Convex Analysis and Monotone Oper- ator Theory in Hilbert Spaces, Springer International Publishing, 2017. doi:10.1007/978-3-319-48311-5

work page doi:10.1007/978-3-319-48311-5 2017
[13]

Nesterov, Lectures on Convex Optimization, Springer International Publishing, 2018.doi:10.1007/978-3-319-91578-4

Y. Nesterov, Lectures on Convex Optimization, Springer International Publishing, 2018.doi:10.1007/978-3-319-91578-4

work page doi:10.1007/978-3-319-91578-4 2018
[14]

PAMI38(7), 1425–1438 (2016).https://doi.org/10.1109/TPAMI

P. Sprechmann, A. M. Bronstein, G. Sapiro, Learning Efficient Sparse and Low Rank Models, IEEE Transactions on Pattern Analysis and Machine Intelligence 37 (9) (2015) 1821–1833.doi:10.1109/TPAMI. 2015.2392779

work page doi:10.1109/tpami 2015
[15]

Watson, Characterization of the subdifferential of some matrix norms, Linear Algebra and its Applications 170 (1992) 33–45.doi: https://doi.org/10.1016/0024-3795(92)90407-2

G. Watson, Characterization of the subdifferential of some matrix norms, Linear Algebra and its Applications 170 (1992) 33–45.doi: https://doi.org/10.1016/0024-3795(92)90407-2

work page doi:10.1016/0024-3795(92)90407-2 1992
[16]

J.-F. Cai, E. J. Candès, Z. Shen, A singular value thresholding algorithm for matrix completion, SIAM Journal on Optimization 20 (4) (2010) 1956–1982.doi:10.1137/080738970

work page doi:10.1137/080738970 2010
[17]

Y. Ji, J. Eisenstein, Discriminative improvements to distributional sen- tence similarity, in: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, 2013

work page 2013
[18]

N. S. Aybat, D. Goldfarb, G. Iyengar, Fast first-order methods for stable principal component pursuit (2011).doi:10.48550/ARXIV.1105.2126

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1105.2126 2011
[19]

D. D. Lee, H. S. Seung, Learning the parts of objects by non-negative matrix factorization, Nature 401 (6755) (1999) 788–791.doi:10.1038/ 44565

work page 1999
[20]

Z. Lin, H. Li, C. Fang, Alternating Direction Method of Multipliers for Machine Learning, Springer Nature Singapore, 2022.doi:10.1007/ 978-981-16-9840-8

work page 2022
[21]

N. S. Aybat, D. Goldfarb, S. Ma, Efficient algorithms for robust and stable principal component pursuit problems, Computational Optimiza- tion and Applications 58 (1) (2014) 1–29.doi:https://doi.org/10. 1007/s10589-013-9613-0. 28

work page 2014
[22]

N. S. Aybat, G. Iyengar, An alternating direction method with in- creasing penalty for stable principal component pursuit, Computa- tional Optimization and Applications 61 (3) (2015) 635–668.doi: 10.1007/s10589-015-9736-6

work page doi:10.1007/s10589-015-9736-6 2015
[23]

Sobral, T

A. Sobral, T. Bouwmans, E.-h. Zahzah, LRSLibrary: Low-Rank and Sparse tools for Background Modeling and Subtraction in Videos, in: Robust Low-Rank and Sparse Matrix Decomposition: Applications in Image and Video Processing, CRC Press, Taylor and Francis Group., 2015

work page 2015
[24]

A. Chan, N. Vasconcelos, Probabilistic kernels for the classification of auto-regressive visual processes, in: 2005 IEEE Computer Society Con- ferenceonComputerVisionandPatternRecognition(CVPR’05), Vol.1, IEEE, pp. 846–851.doi:10.1109/cvpr.2005.279

work page doi:10.1109/cvpr.2005.279 2005
[25]

A. B. Chan, N. Vasconcelos, Classification and retrieval of traffic video using auto-regressive stochastic processes, in: IEEE Proceedings. In- telligent Vehicles Symposium, 2005., IEEE, 2005, pp. 771–776.doi: 10.1109/IVS.2005.1505198

work page doi:10.1109/ivs.2005.1505198 2005
[26]

Halko, P

N. Halko, P. G. Martinsson, J. A. Tropp, Finding structure with ran- domness: Probabilistic algorithms for constructing approximate matrix decompositions, SIAM Review 53 (2) (2011) 217–288.doi:10.1137/ 090771806

work page 2011
[27]

Z. Wang, A. C. Bovik, Modern Image Quality Assessment, Morgan & Claypool, 2006.doi:https://doi.org/10.1007/978-3-031-02238-8. 29

work page doi:10.1007/978-3-031-02238-8 2006

[1] [1]

Dempe, A

S. Dempe, A. Zemkoho (Eds.), Bilevel optimization, 2020th Edition, Springer optimization and its applications, Springer Nature, Cham, Switzerland, 2020

work page 2020

[2] [2]

Del Buono, F

N. Del Buono, F. Esposito, L. Selicato, R. Zdunek, Bi-level algorithm for optimizing hyperparameters in penalized nonnegative matrix fac- torization, Applied Mathematics and Computation 457 (2023) 128184. doi:https://doi.org/10.1016/j.amc.2023.128184. 26

work page doi:10.1016/j.amc.2023.128184 2023

[3] [3]

Del Buono, F

N. Del Buono, F. Esposito, L. Selicato, R. Zdunek, Penalty hyperpa- rameter optimization with diversity measure for nonnegative low-rank approximation, Applied Numerical Mathematics 208 (2025) 189–204, special Volume on Numerical Analysis and Scientific Computation with Applications.doi:https://doi.org/10.1016/j.apnum.2024.10.002

work page doi:10.1016/j.apnum.2024.10.002 2025

[4] [4]

P. L. Combettes, J.-C. Pesquet, Proximal Splitting Methods in Signal Processing, Springer New York, New York, NY, 2011, pp. 185–212.doi: 10.1007/978-1-4419-9569-8_10

work page doi:10.1007/978-1-4419-9569-8_10 2011

[5] [5]

Parikh, S

N. Parikh, S. Boyd, et al., Proximal algorithms, Foundations and Trends in Optimization 1 (3) (2014) 127–239.doi:10.1561/2400000003

work page doi:10.1561/2400000003 2014

[6] [6]

Sabach, S

S. Sabach, S. Shtern, A first order method for solving convex bilevel optimization problems, SIAM Journal on Optimization 27 (2) (2017) 640–660.doi:10.1137/16M105592X

work page doi:10.1137/16m105592x 2017

[7] [7]

J. Cao, R. Jiang, E. Y. Hamedani, A. Mokhtari, An accelerated gradient method for convex smooth simple bilevel optimization, in: The Thirty- eighth Annual Conference on Neural Information Processing Systems, 2024

work page 2024

[8] [8]

Giang-Tran, N

K.-H. Giang-Tran, N. Ho-Nguyen, D. Lee, A projection-free method for solving convex bilevel optimization problems, Mathematical Program- ming 213 (1-2) (2024) 907–940.doi:10.1007/s10107-024-02157-1

work page doi:10.1007/s10107-024-02157-1 2024

[9] [9]

Bolte, S

J. Bolte, S. Sabach, M. Teboulle, Proximal alternating linearized minimization for nonconvex and nonsmooth problems, Mathe- matical Programming 146 (1-2) (2013) 459–494.doi:10.1007/ s10107-013-0701-9

work page 2013

[10] [10]

Martinet, Brève communication

B. Martinet, Brève communication. régularisation d’inéquations variationnelles par approximations successives, Revue francaise d’informatique et de recherche opérationnelle. Série rouge 4 (R3) (1970) 154–158

work page 1970

[11] [11]

Society for Industrial and Applied Mathematics, Philadelphia, PA, 2017

A. Beck, First-Order Methods in Optimization, Society for Industrial and Applied Mathematics, 2017.doi:10.1137/1.9781611974997. 27

work page doi:10.1137/1.9781611974997 2017

[12] [12]

H. H. Bauschke, P. L. Combettes, Convex Analysis and Monotone Oper- ator Theory in Hilbert Spaces, Springer International Publishing, 2017. doi:10.1007/978-3-319-48311-5

work page doi:10.1007/978-3-319-48311-5 2017

[13] [13]

Nesterov, Lectures on Convex Optimization, Springer International Publishing, 2018.doi:10.1007/978-3-319-91578-4

Y. Nesterov, Lectures on Convex Optimization, Springer International Publishing, 2018.doi:10.1007/978-3-319-91578-4

work page doi:10.1007/978-3-319-91578-4 2018

[14] [14]

PAMI38(7), 1425–1438 (2016).https://doi.org/10.1109/TPAMI

P. Sprechmann, A. M. Bronstein, G. Sapiro, Learning Efficient Sparse and Low Rank Models, IEEE Transactions on Pattern Analysis and Machine Intelligence 37 (9) (2015) 1821–1833.doi:10.1109/TPAMI. 2015.2392779

work page doi:10.1109/tpami 2015

[15] [15]

Watson, Characterization of the subdifferential of some matrix norms, Linear Algebra and its Applications 170 (1992) 33–45.doi: https://doi.org/10.1016/0024-3795(92)90407-2

G. Watson, Characterization of the subdifferential of some matrix norms, Linear Algebra and its Applications 170 (1992) 33–45.doi: https://doi.org/10.1016/0024-3795(92)90407-2

work page doi:10.1016/0024-3795(92)90407-2 1992

[16] [16]

J.-F. Cai, E. J. Candès, Z. Shen, A singular value thresholding algorithm for matrix completion, SIAM Journal on Optimization 20 (4) (2010) 1956–1982.doi:10.1137/080738970

work page doi:10.1137/080738970 2010

[17] [17]

Y. Ji, J. Eisenstein, Discriminative improvements to distributional sen- tence similarity, in: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, 2013

work page 2013

[18] [18]

N. S. Aybat, D. Goldfarb, G. Iyengar, Fast first-order methods for stable principal component pursuit (2011).doi:10.48550/ARXIV.1105.2126

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1105.2126 2011

[19] [19]

D. D. Lee, H. S. Seung, Learning the parts of objects by non-negative matrix factorization, Nature 401 (6755) (1999) 788–791.doi:10.1038/ 44565

work page 1999

[20] [20]

Z. Lin, H. Li, C. Fang, Alternating Direction Method of Multipliers for Machine Learning, Springer Nature Singapore, 2022.doi:10.1007/ 978-981-16-9840-8

work page 2022

[21] [21]

N. S. Aybat, D. Goldfarb, S. Ma, Efficient algorithms for robust and stable principal component pursuit problems, Computational Optimiza- tion and Applications 58 (1) (2014) 1–29.doi:https://doi.org/10. 1007/s10589-013-9613-0. 28

work page 2014

[22] [22]

N. S. Aybat, G. Iyengar, An alternating direction method with in- creasing penalty for stable principal component pursuit, Computa- tional Optimization and Applications 61 (3) (2015) 635–668.doi: 10.1007/s10589-015-9736-6

work page doi:10.1007/s10589-015-9736-6 2015

[23] [23]

Sobral, T

A. Sobral, T. Bouwmans, E.-h. Zahzah, LRSLibrary: Low-Rank and Sparse tools for Background Modeling and Subtraction in Videos, in: Robust Low-Rank and Sparse Matrix Decomposition: Applications in Image and Video Processing, CRC Press, Taylor and Francis Group., 2015

work page 2015

[24] [24]

A. Chan, N. Vasconcelos, Probabilistic kernels for the classification of auto-regressive visual processes, in: 2005 IEEE Computer Society Con- ferenceonComputerVisionandPatternRecognition(CVPR’05), Vol.1, IEEE, pp. 846–851.doi:10.1109/cvpr.2005.279

work page doi:10.1109/cvpr.2005.279 2005

[25] [25]

A. B. Chan, N. Vasconcelos, Classification and retrieval of traffic video using auto-regressive stochastic processes, in: IEEE Proceedings. In- telligent Vehicles Symposium, 2005., IEEE, 2005, pp. 771–776.doi: 10.1109/IVS.2005.1505198

work page doi:10.1109/ivs.2005.1505198 2005

[26] [26]

Halko, P

N. Halko, P. G. Martinsson, J. A. Tropp, Finding structure with ran- domness: Probabilistic algorithms for constructing approximate matrix decompositions, SIAM Review 53 (2) (2011) 217–288.doi:10.1137/ 090771806

work page 2011

[27] [27]

Z. Wang, A. C. Bovik, Modern Image Quality Assessment, Morgan & Claypool, 2006.doi:https://doi.org/10.1007/978-3-031-02238-8. 29

work page doi:10.1007/978-3-031-02238-8 2006