High-Performance Star-M SVD for Big Data Compression

Aditya Devarakonda; Grey Ballard; Md Taufique Hussain; Naman Pesricha; Srinivas Eswar; Vishwas Rao

arxiv: 2605.16058 · v1 · pith:QVGW34J5new · submitted 2026-05-15 · 💻 cs.DC · cs.MS

Pith reviewed 2026-05-19 18:47 UTC · model grok-4.3

classification 💻 cs.DC cs.MS

keywords: star-M SVD, tensor decomposition, big data compression, shared-memory parallel, high-performance computing, scientific datasets, optimality guarantees

The pith

A shared-memory parallel implementation of the star-M SVD enables high-performance compression of large scientific datasets.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops high-performance shared-memory parallel software for computing the star-M SVD. This tensor decomposition works in a matrix-mimetic way under the star-M framework and carries optimality guarantees for certain data types. Earlier implementations were mostly confined to productivity-oriented languages, which limited their use on big datasets. A sympathetic reader would care because effective compression lets scientists store and analyze much larger volumes of data while keeping essential accuracy.

Core claim

The authors present a shared-memory parallel high-performance solution for the algorithms that underlie the star-M SVD, a tensor decomposition that operates in matrix-mimetic fashion within the star-M tensor framework and carries optimality guarantees with demonstrated performance on specific data.

What carries the argument

The star-M SVD, a tensor singular-value decomposition that performs matrix-mimetic operations under the star-M tensor framework to deliver optimal compression.
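The mechanics can be sketched in a few lines of NumPy. This is a minimal illustration of the standard star-M SVD construction (transform along the third mode, SVD each frontal slice, truncate), not the authors' implementation. The function names, the choice of an orthonormal DCT-II matrix for M, and the decision to keep the factors in the transform domain are all assumptions made for the example.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix; an assumed, illustrative choice of M."""
    idx = np.arange(n)
    M = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * idx[None, :] + 1) * idx[:, None] / (2 * n))
    M[0, :] /= np.sqrt(2.0)
    return M

def mode3_apply(A, M):
    """Apply M along the third mode: out[i, j, :] = M @ A[i, j, :]."""
    return np.einsum("kl,ijl->ijk", M, A)

def star_m_svd(A, M, k):
    """Rank-k truncated star-M SVD of a real tensor; factors stay in the transform domain."""
    A_hat = mode3_apply(A, M)
    n1, n2, n3 = A_hat.shape
    U_hat = np.empty((n1, k, n3))
    s_hat = np.empty((k, n3))
    V_hat = np.empty((n2, k, n3))
    for i in range(n3):  # frontal slices are independent of one another
        U, s, Vt = np.linalg.svd(A_hat[:, :, i], full_matrices=False)
        U_hat[:, :, i], s_hat[:, i], V_hat[:, :, i] = U[:, :k], s[:k], Vt[:k, :].T
    return U_hat, s_hat, V_hat

def reconstruct(U_hat, s_hat, V_hat, M):
    """Rebuild each slice as U diag(s) V^T, then undo the transform (M^-1 = M^T here)."""
    A_hat = np.einsum("irk,rk,jrk->ijk", U_hat, s_hat, V_hat)
    return mode3_apply(A_hat, M.T)

# Small usage example on random data (hypothetical sizes).
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 5, 4))
M = dct_matrix(4)
full = reconstruct(*star_m_svd(A, M, 5), M)  # k = min(n1, n2): exact up to rounding
low = reconstruct(*star_m_svd(A, M, 2), M)   # k = 2: lossy compression
```

Because the per-slice SVDs are independent, the loop over `i` is the natural place for shared-memory parallelism, which is consistent with the slice-wise SVD timings the figure captions refer to.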

If this is right

Optimal compression of extensive scientific datasets becomes practical at scale.
Enhanced data analysis and insights follow from the ability to handle larger compressed volumes.
Complex mathematical operations on big data can run more efficiently than with traditional matrix methods.
Tensor-based compression achieves superior ratios with minimal accuracy loss compared to matrix approaches.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The shared-memory design may serve as a foundation for later distributed-memory extensions that address even larger problems.
Integration with existing high-performance linear-algebra libraries could further reduce development time for similar tensor tools.
The approach might generalize to other tensor operations that benefit from matrix-mimetic properties.

Load-bearing premise

The star-M SVD supplies optimality guarantees and exceptional performance on the targeted types of data.
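The kind of guarantee this premise presumably refers to is an Eckart-Young-type result from the star-M literature. The statement below is a sketch under the assumption that M is unitary; the notation (hats for the transform domain, \(\hat\sigma_j^{(i)}\) for the j-th singular value of the i-th transformed frontal slice) is illustrative and not taken from the paper.

```latex
\[
\begin{aligned}
\hat{\mathcal{A}} &= \mathcal{A} \times_3 M,
\qquad
\hat{\mathcal{A}}_{:,:,i} = \hat{U}_i \hat{\Sigma}_i \hat{V}_i^{H},
\quad i = 1, \dots, n_3, \\
\mathcal{A}_k &= \operatorname*{arg\,min}_{\text{t-rank}(\mathcal{B}) \le k}
\| \mathcal{A} - \mathcal{B} \|_F, \\
\| \mathcal{A} - \mathcal{A}_k \|_F^2
&= \sum_{i=1}^{n_3} \sum_{j > k} \bigl( \hat{\sigma}_j^{(i)} \bigr)^2 .
\end{aligned}
\]
```

Here \(\mathcal{A}_k\) is obtained by truncating every slice-wise SVD to its leading k terms and applying \(M^{-1}\) along the third mode. The error identity relies on a unitary M preserving the Frobenius norm.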

What would settle it

Benchmark runs of the new parallel code against prior productivity-language versions on representative large scientific datasets, checking both wall-clock time and achieved compression ratios against accuracy thresholds.
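A benchmark of this kind reduces to three measurements per run: wall-clock time, compression ratio, and reconstruction error. The helpers below are a hedged sketch of that bookkeeping. The storage model (each slice stores `r * (n1 + n2 + 1)` numbers for its rank `r`, ignoring the cost of storing M) and all function names are assumptions for illustration, not the paper's accounting.

```python
import time

def compression_ratio(shape, slice_ranks):
    """Original entries divided by stored entries, under the assumed storage model."""
    n1, n2, n3 = shape
    if len(slice_ranks) != n3:
        raise ValueError("need one rank per frontal slice")
    stored = sum(r * (n1 + n2 + 1) for r in slice_ranks)
    if stored == 0:
        raise ValueError("at least one rank must be positive")
    return (n1 * n2 * n3) / stored

def best_wall_time(fn, repeats=3):
    """Smallest wall-clock time in seconds over several calls of fn()."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best

def passes(rel_error, ratio, max_error, min_ratio):
    """True when a run meets both the accuracy threshold and the ratio target."""
    return rel_error <= max_error and ratio >= min_ratio

# Hypothetical example: a 100 x 80 x 10 tensor kept at rank 5 in every slice.
ratio = compression_ratio((100, 80, 10), [5] * 10)
```

Taking the minimum over repeats is one common convention for timing; a median is a reasonable alternative when runs are noisy.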

Figures

Figures reproduced from arXiv: 2605.16058 by Aditya Devarakonda, Grey Ballard, Md Taufique Hussain, Naman Pesricha, Srinivas Eswar, Vishwas Rao.

**Figure 2.** TTM performance of three variants (batched vs. loop vs. parfor) for the ncep-air-6 dataset across thread counts and TTM modes.

**Figure 3.** Slice-wise SVD wall time (parallel slices, sequential SVD vs. sequential slices, parallel …

**Figure 4.** Breakdown times for the different t-SVDM-II strategies. Depending on the time and …

**Figure 5.** Compression ratio of the ncep-air-6 tensor for different algorithms.

**Figure 6.** Strong scaling of both t-SVDM-I and t-SVDM-II for the …

**Figure 7.** Median pointwise relative error at extreme temperature events (850 hPa) for EOF (top) …

**Figure 8.** Iso-surfaces of the z-component of vorticity, ω_z = ∂v/∂x − ∂u/∂y, for the Taylor–Green Vortex flow: original (top-left), compressed reconstruction (bottom-left), and pointwise error (top-right). Reconstruction is via t-SVDM-II-DCT at tolerance 10⁻¹ (compression ratio ≈ 16×). Red and blue denote positive and negative values.

**Figure 9.** Compression ratio of the cfd tensor for different algorithms.

**Figure 10.** Strong scaling of both t-SVDM-I and t-SVDM-II for the …

**Figure 11.** Two-dimensional slice through the X-ray diffuse-scattering volume: original (top-left), …

**Figure 12.** Compression ratio of the xray tensor for different algorithms.

**Figure 13.** Strong scaling of both t-SVDM-I and t-SVDM-II for the …
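Several captions above mention t-SVDM-II and a reconstruction tolerance. The sketch below shows one way a single global tolerance could be turned into per-slice ranks. It assumes an orthogonal transform M, so that discarded energy in the transform domain equals squared Frobenius error, and it assumes "tolerance" means relative Frobenius error. The name `global_ranks` and the `(rank, n3)` layout of `s_hat` are hypothetical, and the paper's actual truncation rule may differ.

```python
import numpy as np

def global_ranks(s_hat, tol):
    """Per-slice ranks after dropping the globally smallest singular values.

    s_hat[j, i] is the j-th singular value of the i-th transformed frontal slice.
    Values are dropped, smallest first, while the discarded energy stays within
    tol**2 times the total energy.
    """
    energy = s_hat.ravel() ** 2
    order = np.argsort(energy, kind="stable")        # smallest energy first
    cumulative = np.cumsum(energy[order])
    budget = tol ** 2 * energy.sum()
    n_drop = int(np.searchsorted(cumulative, budget, side="right"))
    keep = np.ones(energy.size, dtype=bool)
    keep[order[:n_drop]] = False
    return keep.reshape(s_hat.shape).sum(axis=0)     # kept values per slice

# Two slices with singular values (3, 1) and (2, 0.5); total energy is 14.25.
s_hat = np.array([[3.0, 2.0],
                  [1.0, 0.5]])
loose = global_ranks(s_hat, 0.3)   # budget 1.2825: drops 0.5 and 1.0
tight = global_ranks(s_hat, 0.2)   # budget 0.57: drops only 0.5
```

The point of ranking all slices together is that slices carrying little energy can be truncated more aggressively than a uniform per-slice rank would allow.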

Original abstract

In the era of big data, effectively compressing large datasets while performing complex mathematical operations is crucial. Tensor-based decomposition methods have shown superior compression capabilities with minimal loss of accuracy compared to traditional matrix methods. Under the star-M tensor framework, tensors can be decomposed in a matrix-mimetic way, including using the star-M SVD. This tensor SVD has optimality guarantees and has shown exceptional performance on specific types of data, but software implementations have been mostly limited to productivity-oriented languages. In this work, we present our development of a shared-memory parallel, high-performance solution designed to efficiently implement the underlying algorithms. This software will enable optimal compression of extensive scientific datasets, paving the way for enhanced data analysis and insights.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Shared-memory parallel star-M SVD implementation lacks any benchmarks or scalability analysis for true big-data tensors.

The punchline is that this paper describes the development of a shared-memory parallel implementation for the star-M SVD but provides no empirical evidence or scalability details to back up its big-data compression promises.

What is new here is the focus on a high-performance version in a setting that supports parallel execution on shared-memory systems. Previous work on star-M SVD was mostly in languages not optimized for speed, so this effort to create an efficient implementation could help users who need to compress large tensors without too much accuracy loss. The paper does a good job highlighting the advantages of tensor-based methods over traditional ones for scientific data. The optimality guarantees mentioned for the star-M SVD on specific data types are a solid foundation to build on.

Where it falls short is in the lack of any results. The abstract talks about the software enabling optimal compression but doesn't show benchmarks, error rates, or even basic validation. On top of that, the shared-memory design limits its use for datasets that don't fit in one node's memory, and there's no mention of handling larger scales through other means. This makes the applicability to truly big data questionable.

This work is for practitioners in high-performance computing and data-intensive science who are looking for better tools to handle tensor decompositions. A reader focused on practical software solutions might find it worthwhile if the methods and code are described thoroughly. The paper shows honest engagement with the literature on tensor SVDs and the need for performance improvements.

I think it deserves a serious referee to evaluate the implementation quality and any hidden results. My recommendation is to send it for peer review instead of a desk reject, so we can see the full picture on performance and whether the shared-memory choice is justified for the target use cases.

Referee Report

2 major / 0 minor

Summary. The manuscript presents the development of a shared-memory parallel, high-performance implementation of the star-M SVD under the star-M tensor framework for decomposing and compressing large scientific datasets. It asserts that this software solution will enable optimal compression of extensive datasets with minimal accuracy loss, extending beyond existing productivity-language implementations.

Significance. If the implementation is shown through benchmarks to deliver high performance and the optimality guarantees translate to practical gains, the work could provide a valuable high-performance computing tool for tensor-based compression in scientific big data applications. It addresses a noted gap in efficient software for the star-M SVD.

major comments (2)

The abstract asserts that the software 'will enable optimal compression of extensive scientific datasets' and describes a 'high-performance solution,' yet the manuscript provides no benchmarks, performance numbers, error metrics, or validation results to support these claims.
The shared-memory parallel design is presented without any memory-footprint analysis, out-of-core strategy, or distributed-memory extension, leaving unsecured the central claim of applicability to tensors whose size exceeds single-node RAM.
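The memory-footprint analysis asked for in the second comment can start from a back-of-the-envelope estimate like the one below. Every term is an assumption made for illustration: the dense input tensor, one out-of-place transformed copy, rank-`rank` factors for every slice, and the `n3 x n3` transform matrix, all in 8-byte floats. A real implementation may use in-place transforms or extra workspace, so the numbers are indicative only.

```python
def dense_tensor_bytes(shape, itemsize=8):
    """Bytes needed to hold a dense tensor of the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count * itemsize

def star_m_svd_footprint(shape, rank, itemsize=8):
    """Rough peak-memory estimate in bytes under the assumptions stated above."""
    n1, n2, n3 = shape
    tensor = dense_tensor_bytes(shape, itemsize)
    transformed = tensor                              # assumed out-of-place copy
    factors = n3 * rank * (n1 + n2 + 1) * itemsize    # U, singular values, V per slice
    transform = n3 * n3 * itemsize                    # the matrix M itself
    return tensor + transformed + factors + transform

def fits_in_ram(shape, rank, ram_bytes, itemsize=8):
    """True when the estimated footprint does not exceed the available RAM."""
    return star_m_svd_footprint(shape, rank, itemsize) <= ram_bytes

# A 1000 x 1000 x 1000 tensor of doubles needs 8 GB before any workspace is counted.
one_billion = dense_tensor_bytes((1000, 1000, 1000))
```

Under this model the input and its transformed copy dominate, which is why the single-node RAM limit raised by the referee bites well before the factors become large.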

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive review. The comments highlight important areas where the manuscript can be strengthened with additional evidence and clarification. We address each major comment below and indicate the corresponding revisions.

Referee: The abstract asserts that the software 'will enable optimal compression of extensive scientific datasets' and describes a 'high-performance solution,' yet the manuscript provides no benchmarks, performance numbers, error metrics, or validation results to support these claims.

Authors: We agree that the current manuscript lacks empirical support for the performance and compression claims. In the revised version we will add a new experimental section that reports runtime, parallel speedup, memory usage, and reconstruction error metrics on representative large scientific datasets, directly comparing against existing productivity-language implementations of star-M SVD. revision: yes

Referee: The shared-memory parallel design is presented without any memory-footprint analysis, out-of-core strategy, or distributed-memory extension, leaving unsecured the central claim of applicability to tensors whose size exceeds single-node RAM.

Authors: The present work targets shared-memory systems for tensors that fit in single-node RAM, which already addresses a practical gap. We will add an explicit memory-footprint analysis and a limitations subsection that states the current scope and notes that out-of-core or distributed-memory extensions are required for tensors larger than available RAM; these extensions are identified as future work. revision: partial

Circularity Check

0 steps flagged

No circularity: software implementation paper with no derivations or self-referential predictions

The paper presents the development of a shared-memory parallel high-performance implementation of the star-M SVD for tensor compression. No mathematical derivation chain, fitted parameters, or predictions appear in the provided abstract or description. The optimality guarantees are referenced from prior work on the star-M framework rather than derived or fitted within this manuscript. The work is an engineering and software effort focused on efficient implementation, not a theoretical claim that reduces to its own inputs by construction. No self-citation load-bearing steps, ansatzes, or renamings of known results are present. The derivation is therefore self-contained with no circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no explicit free parameters, axioms, or invented entities; the contribution rests on the assumed properties of the star-M SVD framework referenced in the text.

