pith. sign in

arxiv: 2605.28980 · v1 · pith:LDWNKIVDnew · submitted 2026-05-27 · 🧮 math.OC · cs.LG· cs.NA· eess.SP· math.NA

Manifold-based Algorithms for the Hadamard Decomposition

Pith reviewed 2026-06-29 10:19 UTC · model grok-4.3

classification 🧮 math.OC cs.LGcs.NAeess.SPmath.NA
keywords Hadamard decompositionmanifold optimizationlow-rank approximationelement-wise productgradient methodssparse matricesinitializationnonnegative matrix factorization
0
0 comments X

The pith

Reformulating Hadamard decomposition as a manifold-constrained matrix product enables three new optimization algorithms that scale to large sparse data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that the Hadamard decomposition, seeking two low-rank factors whose elementwise product approximates a given matrix, admits a reformulation as X approximately equal to W H transpose with W and H constrained to manifolds that have r1 times r2 columns. This reformulation supports the design of a block projected gradient method and a manifold gradient descent procedure that avoids explicit projections, both suited to sparse inputs, plus a Manopt-based solver on the original form. New initialization strategies are introduced to raise solution quality. Numerical tests indicate the resulting methods match or exceed prior approaches on synthetic and real matrices.

Core claim

The Hadamard decomposition X approximately equal to X1 circle X2 with rank(X1) equal to r1 and rank(X2) equal to r2 can be rewritten as X approximately equal to W H transpose where W and H each possess r1 r2 columns and lie on prescribed manifolds; this identity permits three manifold optimization algorithms, one of which operates without projection steps, that compute the decomposition efficiently on large sparse data and remain competitive with the truncated SVD and existing solvers.

What carries the argument

The reformulation X approximately equal to W H transpose with W and H having r1 r2 columns and belonging to certain manifolds, which converts the elementwise-product problem into a standard low-rank factorization amenable to manifold gradient methods.

If this is right

  • The block projected gradient and manifold gradient descent algorithms become practical for large sparse matrices where projection steps are costly.
  • The proposed initializations raise the accuracy attainable by any of the three solvers.
  • The manifold reformulation makes the decomposition generically capable of representing matrices of rank up to r1 r2 with the same parameter count as a rank-r1 or rank-r2 factorization.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the manifold geometry can be exploited further, the same reformulation might yield closed-form updates for certain loss functions beyond squared error.
  • The approach could be tested on matrices that are known to require high rank for good approximation, to quantify how much the r1 r2 rank boost actually improves reconstruction error in practice.
  • Extending the initialization strategies to warm-start from a truncated SVD might further reduce the number of iterations needed on real-world data.

Load-bearing premise

The manifold reformulation preserves every solution of the original Hadamard decomposition without adding bias or restricting the set of attainable matrices.

What would settle it

A counter-example matrix whose known minimal-error Hadamard factors cannot be recovered to the same accuracy by any of the three new algorithms while the truncated SVD succeeds would falsify the claim of competitiveness.

Figures

Figures reproduced from arXiv: 2605.28980 by Arnaud Vandaele, Nicolas Gillis, Stefano Sicilia, Subhayan Saha.

Figure 1
Figure 1. Figure 1: Dog image (400 × 600 × 3): reconstruction by the different rank-20 HDs and by the rank-40 TSVD, relative error for each color band is reported. In all cases the best initialization is FS, except for the second layer with BCD for which it is the SVD-based initialization. 23 [PITH_FULL_IMAGE:figures/full_fig_p023_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Approximations of the cameraman image (256 × 256) with rank-16 HD and rank-32 TSVD. The relative error in percent is reported. In all cases, the best initialization is FS. 5.3 Sparse document datasets We now consider text datasets from [47], which are sparse matrices. The algorithms manBCD and projBCD exploit the sparsity structure, while Manopt and BCD cannot, as they both require building full m × n matr… view at source ↗
read the original abstract

Given a matrix $X$, and two ranks $r_1$ and $r_2$, the Hadamard decomposition (HD) looks for two low-rank matrices, $X_1$ of rank $r_1$ and $X_2$ of rank $r_2$, both of the same size as $X$, such that $X\approx X_1\circ X_2$, where $\circ$ is the Hadamard (element-wise) product. In most cases, HD is more expressive than standard low-rank approximations such as the truncated singular value decomposition (TSVD), as it can represent higher-rank matrices with the same number of parameters; this is because the rank of $X_1 \circ X_2$ is generically equal to $r_1 r_2$. In this paper, we first present some theoretical insights for HD, in particular a useful reformulation $X\approx WH^\top$ where $W$ and $H$ have $r_1 r_2$ columns and belong to certain manifolds. These allow us to develop three new algorithms for computing HD. The first one uses the representation $X\approx X_1\circ X_2$ and relies on the Manopt toolbox. The other two rely on the reformulation $X\approx WH^\top$: one is a block projected gradient method, and the other is a manifold-based gradient descent algorithm that does not require projection onto the feasible set. The last two algorithms are particularly effective for handling large sparse data. We also propose new initializations that allow us to improve the accuracy of the HD. We compare our algorithms and initialization strategies with the TSVD and with the state of the art. Numerical results show that the new methods are efficient and competitive on both synthetic and real data.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper presents theoretical insights into the Hadamard decomposition (HD) problem of finding low-rank matrices X1 (rank r1) and X2 (rank r2) such that X ≈ X1 ◦ X2. It introduces a reformulation as X ≈ WH^T where W and H have r1 r2 columns and lie on certain manifolds, develops three algorithms (a Manopt-based method on the original formulation, a block projected gradient method, and a manifold gradient descent method on the reformulation), proposes new initializations, and reports that the methods are efficient and competitive with TSVD and prior methods on synthetic and real data, especially for large sparse matrices.

Significance. If the reformulation is equivalent to the original HD problem, the manifold-based algorithms could provide practical, scalable tools for an approximation technique that is more expressive than TSVD (achieving effective rank up to r1 r2 with fewer parameters). The emphasis on handling large sparse data via the reformulation is a potential strength for applications in optimization and data analysis.

major comments (2)
  1. [theoretical insights / reformulation] The section on theoretical insights presents the reformulation X ≈ WH^T (with W, H having r1 r2 columns on specific manifolds) as enabling the algorithms, but provides no explicit verification that the mapping is surjective onto all feasible (X1, X2) pairs satisfying rank(X1)=r1 and rank(X2)=r2, or that objective values coincide exactly. This is load-bearing for the claim that the new algorithms solve the stated HD task and that numerical comparisons demonstrate superiority on it rather than a potentially restricted subproblem.
  2. [numerical results] The numerical results section claims the methods are 'efficient and competitive' on synthetic and real data, but the description lacks detailed experimental protocols, convergence analysis, or error bounds for the manifold-constrained formulations. Without these, it is unclear whether the reported performance directly validates the HD solutions or is affected by any restriction in the feasible set induced by the manifolds.
minor comments (2)
  1. [abstract] The abstract states that the last two algorithms 'are particularly effective for handling large sparse data' but does not specify the sparsity exploitation mechanism or provide scaling results with matrix dimensions.
  2. [reformulation] Notation for the manifolds on which W and H are constrained is introduced without an early definition or reference to standard manifold optimization literature.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help clarify the presentation of our results. We address the two major comments point by point below.

read point-by-point responses
  1. Referee: The section on theoretical insights presents the reformulation X ≈ WH^T (with W, H having r1 r2 columns on specific manifolds) as enabling the algorithms, but provides no explicit verification that the mapping is surjective onto all feasible (X1, X2) pairs satisfying rank(X1)=r1 and rank(X2)=r2, or that objective values coincide exactly. This is load-bearing for the claim that the new algorithms solve the stated HD task and that numerical comparisons demonstrate superiority on it rather than a potentially restricted subproblem.

    Authors: We appreciate this observation. The reformulation is obtained by substituting the low-rank factorizations of X1 and X2 into the Hadamard product and regrouping terms to obtain W and H; the manifold constraints are chosen precisely so that every pair (X1, X2) of the required ranks is attained. To make this equivalence fully explicit, we will insert a new proposition (with proof) in the theoretical insights section establishing surjectivity of the map and equality of the objective values. This addition will confirm that the algorithms address the original HD problem. revision: yes

  2. Referee: The numerical results section claims the methods are 'efficient and competitive' on synthetic and real data, but the description lacks detailed experimental protocols, convergence analysis, or error bounds for the manifold-constrained formulations. Without these, it is unclear whether the reported performance directly validates the HD solutions or is affected by any restriction in the feasible set induced by the manifolds.

    Authors: We agree that additional detail will strengthen the numerical section. We will expand it to include: (i) complete experimental protocols (parameter choices, stopping criteria, data generation procedures, and computational environment), (ii) a convergence analysis for the manifold gradient descent algorithm, and (iii) a brief discussion of error bounds for the constrained formulations, cross-referencing the new equivalence proposition. These revisions will clarify that the reported results pertain to the original HD task. revision: yes

Circularity Check

0 steps flagged

No circularity: reformulation and algorithms rest on independent manifold optimization and numerical validation

full rationale

The paper introduces a reformulation of Hadamard decomposition as X ≈ WH^T with W, H on specified manifolds, then develops three algorithms (one via Manopt on the original form, two on the reformulation) and compares them empirically to TSVD and prior methods. No derivation step reduces a claimed result to its own inputs by construction, no fitted parameters are relabeled as predictions, and no load-bearing claims depend on self-citations whose validity is internal to the paper. The equivalence of the reformulation is asserted as a theoretical insight but is not used to derive the numerical competitiveness claim; that rests on separate algorithmic implementation and experiments. This is a standard non-circular algorithmic paper.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper relies on standard assumptions from manifold optimization and low-rank matrix theory with no new free parameters, axioms, or invented entities introduced beyond the problem setup itself.

pith-pipeline@v0.9.1-grok · 5883 in / 1130 out tokens · 30469 ms · 2026-06-29T10:19:11.007373+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

47 extracted references · 6 canonical work pages · 1 internal anchor

  1. [1]

    A. M. S. Ang and N. Gillis. Accelerating nonnegative matrix factorization algorithms using extrap- olation.Neural Computation, 31(2):417–439, 2019

  2. [2]

    Alternating Direction Method of Multipliers for Nonlinear Matrix Decompositions

    A. Awari, N. Gillis, and A. Vandaele. Alternating Direction Method of Multipliers for Nonlinear Matrix Decompositions.arXiv preprint arXiv:2512.17473, 2025

  3. [3]

    Bocci, E

    C. Bocci, E. Carlini, and J. Kileel. Hadamard products of linear spaces.Journal of Algebra, 448:595–617, 2016

  4. [4]

    Boumal, B

    N. Boumal, B. Mishra, P.-A. Absil, and R. Sepulchre. Manopt, a Matlab toolbox for optimization on manifolds.The Journal of Machine Learning Research, 15(1):1455–1459, 2014

  5. [5]

    N. E. Breslow, J. H. Lubin, P. Marek, and B. Langholz. Multiplicative models and cohort analysis. Journal of the American Statistical Association, 78(381):1–12, 1983

  6. [6]

    Budzinskiy

    S. Budzinskiy. When big data actually are low-rank, or entrywise approximation of certain function- generated matrices.SIAM Journal on Mathematics of Data Science, 7(3):1098–1122, 2025

  7. [7]

    Ciaperoni, A

    M. Ciaperoni, A. Gionis, and H. Mannila. The Hadamard decomposition problem.Data Mining and Knowledge Discovery, 38(4):2306–2347, 2024

  8. [8]

    De Marinis, N

    A. De Marinis, N. Guglielmi, S. Sicilia, and F. Tudisco. Improving the robustness of neural ODEs with minimal weight perturbation.arXiv preprint arXiv:2501.10740, 2025

  9. [9]

    Durrieu, B

    J.-L. Durrieu, B. David, and G. Richard. A musically motivated mid-level representation for pitch estimation and musical audio source separation.IEEE Journal of Selected Topics in Signal Pro- cessing, 5(6):1180–1191, 2011

  10. [10]

    Durrieu, A

    J.-L. Durrieu, A. Ozerov, C. Févotte, G. Richard, and B. David. Main instrument separation from stereophonic audio signals using a source/filter model. InEuropean Signal Processing Conference, 2009

  11. [11]

    Durrieu, G

    J.-L. Durrieu, G. Richard, B. David, and C. Févotte. Source/filter model for unsupervised main melody extraction from polyphonic audio signals.IEEE Transactions on Audio, Speech and Lan- guage Processing, 18(3):564–575, 2010. 25

  12. [12]

    Eckart and G

    C. Eckart and G. Young. The approximation of one matrix by another of lower rank.Psychometrika, 1(3):211–218, 1936

  13. [13]

    Fawzi, J

    H. Fawzi, J. Gouveia, P. A. Parrilo, R. Z. Robinson, and R. R. Thomas. Positive semidefinite rank. Mathematical Programming, 153(1):133–177, 2015

  14. [14]

    Fischer and C

    A. Fischer and C. Igel. An introduction to restricted Boltzmann machines. InIberoamerican congress on pattern recognition, pages 14–36. Springer, 2012

  15. [15]

    Springer New York, New York, NY, 2017

    N.Friedenberg, A.Oneto, andR.L.Williams.Minkowski Sums and Hadamard Products of Algebraic Varieties, pages 133–157. Springer New York, New York, NY, 2017

  16. [16]

    Friedenberg, A

    N. Friedenberg, A. Oneto, and R. L. Williams. Minkowski sums and Hadamard products of algebraic varieties. InCombinatorial Algebraic Geometry: Selected Papers From the 2016 Apprenticeship Program, pages 133–157. Springer, 2017

  17. [17]

    Gillis.Nonnegative matrix factorization

    N. Gillis.Nonnegative matrix factorization. SIAM, Philadelphia, 2020

  18. [18]

    Gillis, M

    N. Gillis, M. Porcelli, and G. Seraghiti. An extrapolated and provably convergent algorithm for nonlinear matrix decomposition with the ReLU function.arXiv preprint arXiv:2503.23832, 2025

  19. [19]

    Guglielmi and C

    N. Guglielmi and C. Lubich. Matrix stabilization using differential equations.SIAM Journal on Numerical Analysis, 55(6):3097–3119, 2017

  20. [20]

    Guglielmi and C

    N. Guglielmi and C. Lubich. Matrix nearness problems and eigenvalue optimization.arXiv preprint arXiv:2503.14750, 2025

  21. [21]

    Guglielmi, C

    N. Guglielmi, C. Lubich, and S. Sicilia. Rank-1 Matrix Differential Equations for Structured Eigen- value Optimization.SIAM Journal on Numerical Analysis, 61(4):1737–1762, 2023

  22. [22]

    Guglielmi and S

    N. Guglielmi and S. Sicilia. Stabilization of a matrix via a low-rank-adaptive ODE.BIT Numerical Mathematics, 64(4):38, 2024

  23. [23]

    Guglielmi and S

    N. Guglielmi and S. Sicilia. A low-rank ODE for spectral clustering stability.Linear Algebra and its applications, 721:250–276, 2025

  24. [24]

    Huang, T

    Q. Huang, T. Ko, Z. Zhuang, L. Tang, and Y. Zhang. Hira: Parameter-efficient Hadamard high-rank adaptation for large language models. InInternational Conference on Learning Representations, 2025

  25. [25]

    Hyeon-Woo, M

    N. Hyeon-Woo, M. Ye-Bin, and T. Oh. Fedpara: Low-rank hadamard product parameterization for efficient federated learning.ICLR, 2022

  26. [26]

    Hyeon-Woo, M

    N. Hyeon-Woo, M. Ye-Bin, and T.-H. Oh. Fedpara: Low-rank Hadamard product for communication-efficient federated learning.arXiv preprint arXiv:2108.06098, 2021

  27. [27]

    Koch and C

    O. Koch and C. Lubich. Dynamical low-rank approximation.SIAM Journal on Matrix Analysis and Applications, 29(2):434–454, 2007

  28. [28]

    Koren, R

    Y. Koren, R. Bell, and C. Volinsky. Matrix factorization techniques for recommender systems. Computer, 42(8):30–37, 2009

  29. [29]

    Lefebvre, A

    J. Lefebvre, A. Vandaele, and N. Gillis. Component-Wise Squared Factorization. InInternational Workshop on Machine Learning for Signal Processing (MLSP), 2024

  30. [30]

    Y. Li, L. Balzano, D. Needell, and H. Lyu. Convergence and complexity of block majorization- minimization for constrained block-Riemannian optimization.Journal of Machine Learning Re- search, 27(42):1–77, 2026

  31. [31]

    Loconte, N

    L. Loconte, N. Di Mauro, R. Peharz, and A. Vergari. Your Knowledge Graph Embeddings are Se- cretly Circuits and You Should Treat Them as Such. InThe 5th Workshop on Tractable Probabilistic Modeling, 2022

  32. [32]

    Loconte, N

    L. Loconte, N. D. Mauro, R. Peharz, and A. Vergari. How to turn your knowledge graph embeddings into generative models, 2024

  33. [33]

    Mnih and R

    A. Mnih and R. R. Salakhutdinov. Probabilistic matrix factorization.Advances in Neural Informa- tion Processing Systems, 20, 2007. 26

  34. [34]

    Nesterov.Introductory lectures on convex optimization: A basic course, volume 137

    Y. Nesterov.Introductory lectures on convex optimization: A basic course, volume 137. Springer Science & Business Media, 2nd edition, 2018

  35. [35]

    Nguyen, A

    H. Nguyen, A. Awari, A. Vandaele, and N. Gillis. Nonlinear Matrix Decomposition with the Sigmoid Function. InInternational Workshop on Machine Learning for Signal Processing (MLSP), 2025

  36. [36]

    Oneto and N

    A. Oneto and N. Vannieuwenhoven. Hadamard-Hitchcock decompositions: identifiability and com- putation.arXiv preprint arXiv:2308.06597, 2023

  37. [37]

    Park and Y

    T. Park and Y. Nakatsukasa. Low-rank approximation of parameter-dependent matrices via cur decomposition.SIAM Journal on Scientific Computing, 47(3):A1858–A1887, 2025

  38. [38]

    Peng and R

    L. Peng and R. Vidal. Block coordinate descent on smooth manifolds: Convergence theory and twenty-one examples, 2023

  39. [39]

    L. K. Saul. A Nonlinear Matrix Decomposition for Mining the Zeros of Sparse Data.SIAM Journal on Mathematics of Data Science, 4(2):431–463, 2022

  40. [40]

    Seraghiti, A

    G. Seraghiti, A. Awari, A. Vandaele, M. Porcelli, and N. Gillis. Accelerated algorithms for nonlinear matrix decomposition with the ReLU function. InInternational Workshop on Machine Learning for Signal Processing (MLSP), 2023

  41. [41]

    Sicilia.Low-rank properties in structured matrix nearness problems

    S. Sicilia.Low-rank properties in structured matrix nearness problems. PhD Thesis, Gran Sasso Sci- ence Institute, January 2025. Available athttps://iris.gssi.it/handle/20.500.12571/33684? mode=simple

  42. [42]

    J. A. Tropp, A. Yurtsever, M. Udell, and V. Cevher. Practical sketching algorithms for low-rank matrix approximation.SIAM Journal on Matrix Analysis and Applications, 38(4):1454–1485, 2017

  43. [43]

    J. A. Tropp, A. Yurtsever, M. Udell, and V. Cevher. Streaming low-rank matrix approximation with an application to scientific simulation.SIAM Journal on Scientific Computing, 41(4):A2430–A2463, 2019

  44. [44]

    Udell, C

    M. Udell, C. Horn, R. Zadeh, S. Boyd, et al. Generalized low rank models.Foundations and Trends®in Machine Learning, 9(1):1–118, 2016

  45. [45]

    Udell and A

    M. Udell and A. Townsend. Why are big data matrices approximately low rank?SIAM Journal on Mathematics of Data Science, 1(1):144–160, 2019

  46. [46]

    Wertz, A

    S. Wertz, A. Vandaele, and N. Gillis. Efficient algorithms for the hadamard decomposition. In International Workshop on Machine Learning for Signal Processing (MLSP), 2025

  47. [47]

    Zhong and J

    S. Zhong and J. Ghosh. Generative model-based document clustering: a comparative study.Knowl- edge and Information Systems, 8(3):374–384, 2005. 27