On solving symmetric multi-type orthogonal non-negative matrix tri-factorization problem
Pith reviewed 2026-06-27 19:48 UTC · model grok-4.3
The pith
Two heuristic algorithms find high-quality solutions for the symmetric multi-type orthogonal non-negative matrix tri-factorization problem.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The two proposed algorithms, a fixed-point method derived from penalized KKT conditions and a three-stage ADAM method, reliably compute high-quality local solutions to the symmetric multi-type orthogonal non-negative matrix tri-factorization problem, recovering factorizations close to the optimum on synthetic data and producing competitive embeddings on real networks.
What carries the argument
The shared non-negative orthogonal matrix G that defines the common structure in the approximations G S_i G^T for each input matrix.
If this is right
- The learned G provides an assignment-type structure suitable for node clustering in networks.
- The embeddings perform competitively in link prediction and node classification tasks on citation networks.
- Both methods remain stable when input matrices contain noise.
- The factorization improves interpretability over standard methods due to non-negativity and orthogonality.
Where Pith is reading between the lines
- This joint factorization could be adapted for other multi-view network data beyond citation networks.
- The orthogonality constraint might be relaxed in future work to handle overlapping communities.
- Since the methods are heuristics, combining them with global optimization techniques could further improve solution quality.
Load-bearing premise
The non-convex problem has high-quality local solutions that the fixed-point and ADAM heuristics can consistently locate.
What would settle it
On synthetic instances where the true generating G and S_i are known, the algorithms fail to recover factors with small reconstruction error or large deviation from the ground truth.
read the original abstract
We study the symmetric multi-type orthogonal non-negative matrix tri-factorization problem, where several symmetric non-negative matrices are simultaneously approximated by factors of the form $GS_{i}G^{\top}$, with a shared non-negative and orthogonal factor $G$. This model is motivated by clustering and network analysis, where non-negativity improves interpretability and orthogonality gives a natural assignment-type structure to the latent factor. Since the resulting optimization problem is highly non-convex, we develop two heuristic algorithms for computing high-quality local solutions. The first one is a fixed point method derived from the Karush-Kuhn-Tucker conditions after adding a penalty term for the orthogonality constraint. The second one is a three-stage ADAM-based method that combines non-negativity-preserving optimization, orthogonalization, and restricted ADAM refinement on the feasible set. We evaluate both methods on synthetic data, including noisy instances, and on citation network benchmarks. The synthetic experiments show that both algorithms recover factorizations close to the optimum and remain stable under noise. On real networks, the learned embeddings are competitive with or better than standard baselines such as SVD, node2vec, and classical link prediction heuristics in link prediction, node clustering, and node classification tasks.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper studies the symmetric multi-type orthogonal non-negative matrix tri-factorization problem, in which multiple symmetric non-negative matrices are approximated simultaneously as GS_i G^T using a shared non-negative orthogonal factor G. Motivated by clustering and network analysis, it develops two heuristics for this non-convex problem: a fixed-point iteration derived from penalized KKT conditions, and a three-stage ADAM optimizer that enforces non-negativity then orthogonality then refines on the feasible set. Synthetic experiments (including noisy cases) are claimed to show recovery close to the optimum and stability; real-data experiments on citation networks claim competitive or superior performance versus SVD, node2vec, and classical heuristics on link prediction, node clustering, and node classification.
Significance. If the empirical claims hold, the work supplies practical, interpretable methods for a structured NMF variant useful in network embedding. A strength is the direct use of synthetic ground-truth recovery experiments to test whether the heuristics locate high-quality local minima rather than poor ones, together with multi-task real-data benchmarks against established baselines. This addresses the acknowledged non-convexity via concrete, evaluable procedures rather than theoretical guarantees alone.
major comments (2)
- [§4] §4 (synthetic experiments): the central claim that both algorithms 'recover factorizations close to the optimum' is load-bearing for the motivation of the heuristics, yet the manuscript reports no quantitative recovery metrics (e.g., relative Frobenius error to ground-truth factors, success rates across random initializations) or error bars. Without these numbers it is impossible to judge how close 'close' is or whether the methods reliably avoid poor minima.
- [§4] §4 (real-data experiments): the claim of competitiveness or superiority on link prediction, clustering, and classification rests on comparisons to SVD, node2vec, and heuristics, but the text provides no details on hyper-parameter tuning protocols, number of runs, or statistical significance tests. This weakens the downstream-performance part of the central empirical argument.
minor comments (2)
- [§3] The orthogonality penalty coefficient and ADAM stage-transition thresholds are free parameters; their selection procedure should be stated explicitly in §3.
- [§2] Notation for the multi-type matrices S_i and the shared G should be introduced once with a clear table or diagram rather than piecemeal.
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which highlight opportunities to strengthen the empirical sections. We address each major point below and will revise the manuscript to incorporate the requested details and metrics.
read point-by-point responses
-
Referee: §4 (synthetic experiments): the central claim that both algorithms 'recover factorizations close to the optimum' is load-bearing for the motivation of the heuristics, yet the manuscript reports no quantitative recovery metrics (e.g., relative Frobenius error to ground-truth factors, success rates across random initializations) or error bars. Without these numbers it is impossible to judge how close 'close' is or whether the methods reliably avoid poor minima.
Authors: We agree that quantitative metrics are necessary to make the recovery claims precise and verifiable. In the revision we will add tables and plots reporting relative Frobenius error to the ground-truth factors, success rates (fraction of runs recovering within a tolerance) across random initializations, and error bars (standard deviation over runs) for both the noiseless and noisy synthetic regimes. revision: yes
-
Referee: §4 (real-data experiments): the claim of competitiveness or superiority on link prediction, clustering, and classification rests on comparisons to SVD, node2vec, and heuristics, but the text provides no details on hyper-parameter tuning protocols, number of runs, or statistical significance tests. This weakens the downstream-performance part of the central empirical argument.
Authors: We acknowledge that the current manuscript lacks sufficient experimental-protocol transparency. The revised version will specify the hyper-parameter search ranges and selection criteria, state the number of independent runs performed for each method and dataset, and report statistical significance (e.g., paired t-tests or Wilcoxon signed-rank tests with p-values) for the performance differences on link prediction, clustering, and classification tasks. revision: yes
Circularity Check
No significant circularity detected
full rationale
The paper's core methods are derived from standard KKT conditions (with an added penalty) and off-the-shelf ADAM optimization applied in stages; neither reduces to a fitted parameter or self-citation that defines the claimed performance. Synthetic recovery experiments directly measure distance to known ground-truth factors, and real-data results compare against external baselines (SVD, node2vec, heuristics) on independent tasks. No equation or premise is equivalent to its inputs by construction, and the non-convexity handling is evaluated empirically rather than assumed via internal redefinition.
Axiom & Free-Parameter Ledger
free parameters (2)
- orthogonality penalty coefficient
- ADAM learning rates and stage transition thresholds
axioms (2)
- standard math KKT optimality conditions apply to the penalized problem
- domain assumption Non-negativity and orthogonality can be approximately enforced via staged projection and refinement
Reference graph
Works this paper leans on
-
[1]
J Optimiz Theory App190, 234–258 (2021) https://doi.org/10.1007/s10957-021-01880-5
Ahookhosh, M., Hien, L.T.K., Gillis, N., Patrinos, P.: A block inertial Bregman proximal algorithm for nonsmooth nonconvex problems with application to symmetric nonnegative matrix tri-factorization. J Optimiz Theory App190, 234–258 (2021) https://doi.org/10.1007/s10957-021-01880-5
-
[2]
Mathematics9(5), 540 (2021)
Asadi, S., Povh, J.: A block coordinate descent-based projected gradient algorithm for orthogonal non- negative matrix factorization. Mathematics9(5), 540 (2021)
2021
-
[3]
Adv Data Anal Classi13, 825–853 (2019) https://doi.org/10.1007/s11634-018-0348-8
Abe, H., Yadohisa, H.: Orthogonal nonnegative matrix tri-factorization based on Tweedie distributions. Adv Data Anal Classi13, 825–853 (2019) https://doi.org/10.1007/s11634-018-0348-8
-
[4]
Athena Scientific, Natick, MA (2016) 24
Bertsekas, D.P.: Nonlinear Programming. Athena Scientific, Natick, MA (2016) 24
2016
-
[5]
IEEE T Knowl Data En35(5), 5132–5146 (2022) https://doi.org/10.1109/ TKDE.2022.3145489
Chen, Y., Lei, Z., Rao, Y., Xie, H., Wang, F.L., Yin, J., Li, Q.: Parallel non-negative matrix tri-factorization for text data co-clustering. IEEE T Knowl Data En35(5), 5132–5146 (2022) https://doi.org/10.1109/ TKDE.2022.3145489
arXiv 2022
-
[6]
Cichocki, A., Zdunek, R., Phan, A.H., Amari, S.-i.: Nonnegative Matrix and Tensor Factorizations: Applica- tions to Exploratory Multi-way Data Analysis and Blind Source Separation. John Wiley & Sons, Hoboken, NJ (2009) ˇCopar, A., ˇZitnik, M., Zupan, B.: Scalable non-negative matrix tri-factorization. Biodata Min10, 41 (2017) https://doi.org/10.1186/s1304...
-
[7]
Found Comput Math20, 119–154 (2020) https://doi.org/10.1007/s10208-018-09409-5
Davis, D., Drusvyatskiy, D., Kakade, S., Lee, J.D.: Stochastic subgradient method converges on tame functions. Found Comput Math20, 119–154 (2020) https://doi.org/10.1007/s10208-018-09409-5
-
[8]
Ding, C., Li, T., Peng, W., Park, H.: Orthogonal nonnegative matrix t-factorizations for clustering. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 126–135. Association for Computing Machinery, Philadelphia, PA (2006). https://doi.org/10. 1145/1150402.1150420
arXiv 2006
-
[9]
Algorithms12(10) (2019) https://doi.org/10.3390/a12100216
Favati, P., Lotti, G., Menchi, O., Romani, F.: Adaptive clustering via symmetric nonnegative matrix factorization of the similarity matrix. Algorithms12(10) (2019) https://doi.org/10.3390/a12100216
-
[10]
Society for Industrial and Applied Mathematics, Philadelphia, PA (2020)
Gillis, N.: Nonnegative Matrix Factorization. Society for Industrial and Applied Mathematics, Philadelphia, PA (2020)
2020
-
[11]
KDD - node2vec: Scalable Feature Learning for Networks,
Grover, A., Leskovec, J.: node2vec: Scalable feature learning for networks. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 855–864. Association for Computing Machinery, San Francisco, CA (2016). https://doi.org/10.1145/2939672.2939754 Gligorijevi´ c, V., Malod-Dognin, N., Prˇ zulj, N.: Fuse: Mul...
-
[12]
Hajinezhad, D., Chang, T.-H., Wang, X., Shi, Q., Hong, M.: Nonnegative matrix factorization using ADMM: Algorithm and convergence analysis. In: Proceedings of the 2016 IEEE International Con- ference on Acoustics, Speech and Signal Processing, pp. 4742–4746. IEEE, Shanghai, China (2016). https://doi.org/10.1109/ICASSP.2016.7472577
-
[13]
J Global Optim82(2), 283–312 (2022) https://doi.org/10.1007/s10898-021-01074-3
Hribar, R., Hrga, T., Papa, G., Petelin, G., Povh, J., Prˇ zulj, N., Vukaˇ sinovi´ c, V.: Four algorithms to solve symmetric multi-type non-negative matrix tri-factorization problem. J Global Optim82(2), 283–312 (2022) https://doi.org/10.1007/s10898-021-01074-3
-
[14]
SIAM Rev53(2), 217–288 (2011) https://doi.org/10
Halko, N., Martinsson, P.-G., Tropp, J.A.: Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM Rev53(2), 217–288 (2011) https://doi.org/10. 1137/090771806
2011
-
[15]
https://www.cs.toronto.edu/∼tijmen/csc321/slides/lecture slides lec6.pdf (2012)
Hinton, G., Srivastava, N., Swersky, K.: Neural networks for machine learning (Lecture 6a): Overview of mini- batch gradient descent. https://www.cs.toronto.edu/∼tijmen/csc321/slides/lecture slides lec6.pdf (2012)
2012
-
[16]
Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization (2014) arXiv:1412.6980 [cs.LG]
Pith/arXiv arXiv 2014
-
[17]
SIAM J Matrix Anal A30(2), 713–730 (2008) https://doi.org/10.1137/ 07069239X 25
Kim, H., Park, H.: Nonnegative matrix factorization based on alternating nonnegativity constrained least squares and active set method. SIAM J Matrix Anal A30(2), 713–730 (2008) https://doi.org/10.1137/ 07069239X 25
2008
-
[18]
Method Inform Med55(4), 340–346 (2016) https://doi.org/10.3414/ ME15-01-0108
Kastrin, A., Rindflesch, T.C., Hristovski, D.: Link prediction on a network of co-occurring MeSH terms: Towards literature-based discovery. Method Inform Med55(4), 340–346 (2016) https://doi.org/10.3414/ ME15-01-0108
2016
-
[19]
Physica A553, 124289 (2020) https://doi.org/10.1016/j.physa.2020.124289
Kumar, A., Singh, S.S., Singh, K., Biswas, B.: Link prediction techniques, applications, and performance: A survey. Physica A553, 124289 (2020) https://doi.org/10.1016/j.physa.2020.124289
-
[20]
Knowl-Based Syst283, 111192 (2024) https://doi.org/10.1016/j.knosys
Kong, Q., Sun, J., Xu, Z.: Joint orthogonal symmetric non-negative matrix factorization for community detection in attribute network. Knowl-Based Syst283, 111192 (2024) https://doi.org/10.1016/j.knosys. 2023.111192
-
[21]
Bioinformatics 34(4), 652–659 (2018) https://doi.org/10.1093/bioinformatics/btx613
Lever, J., Gakkhar, S., Gottlieb, M., Rashnavadi, T., Lin, S., Siu, C., Smith, M., Jones, M.R., Krzywinski, M., Jones, S.J.: A collaborative filtering-based approach to biomedical knowledge discovery. Bioinformatics 34(4), 652–659 (2018) https://doi.org/10.1093/bioinformatics/btx613
-
[22]
Neural Computation 19(10), 2756–2779 (2007)
Lin, C.-J.: Projected gradient methods for nonnegative matrix factorization. Neural Comput19(10), 2756– 2779 (2007) https://doi.org/10.1162/neco.2007.19.10.2756
-
[23]
IEEE T Neur Net Lear33(3), 1203–1215 (2022) https: //doi.org/10.1109/TNNLS.2020.3041360
Luo, X., Liu, Z., Jin, L., Zhou, Y., Zhou, M.: Symmetric nonnegative matrix factorization-based community detection models and their convergence analysis. IEEE T Neur Net Lear33(3), 1203–1215 (2022) https: //doi.org/10.1109/TNNLS.2020.3041360
-
[24]
J Am Soc Inf Sci Tec58(7), 1019–1031 (2007) https://doi.org/10.1002/asi.20591
Liben-Nowell, D., Kleinberg, J.: The link-prediction problem for social networks. J Am Soc Inf Sci Tec58(7), 1019–1031 (2007) https://doi.org/10.1002/asi.20591
-
[25]
Liu, K., Wang, H.: Robust multi-relational clustering viaℓ 1-norm symmetric nonnegative matrix factoriza- tion. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, pp. 397–401. Association for Computational Linguistics, Beijing, China (2015)....
-
[26]
https://doi.org/10.24963/ijcai.2018/340
Stockholm, Sweden (2018). https://doi.org/10.24963/ijcai.2018/340
-
[27]
Sci Rep15, 7840 (2025) https://doi.org/10.1038/s41598-025-91757-8
Li, Q., Wang, Y., Wang, J., Zhao, C.: Improving drug repositioning accuracy using non-negative matrix tri-factorization. Sci Rep15, 7840 (2025) https://doi.org/10.1038/s41598-025-91757-8
-
[28]
Nat Commun10, 805 (2019) https://doi.org/10.1038/s41467-019-08797-8
Malod-Dognin, N., Petschnigg, J., Windels, S.F., Povh, J., Hemingway, H., Ketteler, R., Prˇ zulj, N.: Towards a data-integrated cell. Nat Commun10, 805 (2019) https://doi.org/10.1038/s41467-019-08797-8
-
[29]
Math Program95(2), 407–430 (2003) https://doi.org/10.1007/s10107-002-0355-5
Mittelmann, H.D.: An independent benchmarking of SDP and SOCP solvers. Math Program95(2), 407–430 (2003) https://doi.org/10.1007/s10107-002-0355-5
-
[30]
Int J Mol Sci25(17), 9576 (2024) https://doi.org/10.3390/ijms25179576
Messa, L., Testa, C., Carelli, S., Rey, F., Jacchetti, E., Cereda, C., Raimondi, M.T., Ceri, S., Pinoli, P.: Non- negative matrix tri-factorization for representation learning in multi-omics datasets with applications to drug repurposing and selection. Int J Mol Sci25(17), 9576 (2024) https://doi.org/10.3390/ijms25179576
-
[31]
In: Proceedings of the 14th Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp
Ma, H., Zhao, W., Tan, Q., Shi, Z.: Orthogonal nonnegative matrix tri-factorization for semi-supervised doc- ument co-clustering. In: Proceedings of the 14th Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 189–200. Springer, Hyderabad, India (2010). https://doi.org/10.1007/978-3-642-13672-6 19
-
[32]
Knowl-Based Syst201, 106054 (2020) https://doi.org/10.1016/j.knosys.2020.106054
Peng, S., Ser, W., Chen, B., Lin, Z.: Robust orthogonal nonnegative matrix tri-factorization for data representation. Knowl-Based Syst201, 106054 (2020) https://doi.org/10.1016/j.knosys.2020.106054
-
[33]
Journal of computational and applied mathematics , volume=
Rousseeuw, P.J.: Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. J 26 Comput Appl Math20, 53–65 (1987) https://doi.org/10.1016/0377-0427(87)90125-7
-
[34]
In: Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, pp
Salah, A., Ailem, M., Nadif, M.: Word co-occurrence regularized non-negative matrix tri-factorization for text data co-clustering. In: Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, pp. 3992–3999. AAAI Press, New Orleans, LA (2018). https://doi.org/10.1609/aaai.v32i1.11659
-
[35]
SIAM J Optimiz20(3), 1364–1377 (2010) https://doi.org/10.1137/070709967
Vavasis, S.A.: On the complexity of nonnegative matrix factorization. SIAM J Optimiz20(3), 1364–1377 (2010) https://doi.org/10.1137/070709967
-
[36]
Wang, H., Huang, H., Ding, C.: Simultaneous clustering of multi-type relational data via symmetric nonneg- ative matrix tri-factorization. In: Proceedings of the 20th ACM International Conference on Information and Knowledge Management, pp. 279–284. Association for Computing Machinery, Glasgow, UK (2011). https://doi.org/10.1145/2063576.2063621
-
[37]
Inform Process Manag46(5), 559–570 (2010) https://doi.org/10.1016/j.ipm.2009.12.007
Yoo, J., Choi, S.: Orthogonal nonnegative matrix tri-factorization for co-clustering: Multiplicative updates on stiefel manifolds. Inform Process Manag46(5), 559–570 (2010) https://doi.org/10.1016/j.ipm.2009.12.007
-
[38]
Big Data Min Anal8(1), 214–232 (2025) https://doi.org/10.26599/ BDMA.2024.9020055
Yu, J., Che, H., Leung, M.-F., Liu, C., Wu, W., Yan, Z.: Robust non-negative matrix tri-factorization with dual hyper-graph regularization. Big Data Min Anal8(1), 214–232 (2025) https://doi.org/10.26599/ BDMA.2024.9020055
arXiv 2025
-
[39]
In: Proceedings of the 33rd International Conference on Machine Learning, pp
Yang, Z., Cohen, W., Salakhudinov, R.: Revisiting semi-supervised learning with graph embeddings. In: Proceedings of the 33rd International Conference on Machine Learning, pp. 40–48. JMLR, New York, NY (2016). https://doi.org/10.5555/3045390.3045396
-
[40]
IEEE Trans Emerg Top Comput Intell9(6) (2025) https://doi.org/10.1109/TETCI.2025.3572129
Zhou, Q., Che, H., Guo, W., He, X., Leung, M.-F., Wen, S.: Robust low-rank tensor constrained orthogonal symmetric non-negative matrix factorization for multi-layer networks community detection. IEEE Trans Emerg Top Comput Intell9(6) (2025) https://doi.org/10.1109/TETCI.2025.3572129
-
[41]
In: Proceedings of the 2016 6th International Conference on Machinery, Materials, Envi- ronment, Biotechnology and Computer
Zhang, G., Cai, G., Wu, H., Zheng, S.: A nonnegative matrix tri-factorization technique for recommendation in microblog. In: Proceedings of the 2016 6th International Conference on Machinery, Materials, Envi- ronment, Biotechnology and Computer. Atlantis Press, Tianjin, China (2016). https://doi.org/10.2991/ mmebc-16.2016.293
2016
-
[42]
ACM Comput Surv55(2), 38 (2022) https://doi.org/10.1145/3491206 27
Zhou, J., Liu, L., Wei, W., Fan, J.: Network representation learning: From preprocessing, feature extraction to node embedding. ACM Comput Surv55(2), 38 (2022) https://doi.org/10.1145/3491206 27
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.