pith. sign in

arxiv: 1907.06470 · v1 · pith:IRHWLNPGnew · submitted 2019-07-15 · 💻 cs.MS · cs.NA· math.NA

Out-of-core singular value decomposition

Pith reviewed 2026-05-24 21:18 UTC · model grok-4.3

classification 💻 cs.MS cs.NAmath.NA
keywords singular value decompositionout-of-core algorithmsrandomized SVDlarge-scale matrix factorizationparallel computingexternal memory algorithms
0
0 comments X

The pith

An out-of-core randomized SVD factors arbitrarily large dense and sparse matrices using limited memory and external storage.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents an implementation of randomized singular value decomposition that operates out-of-core, allowing it to factor matrices too large to fit in main memory by relying on external storage such as HDDs or SSDs. The solution is designed to be fully scalable and efficiently parallelizable while supporting both dense and sparse matrices. It introduces a partitioning technique that enables memory and I/O planning, automatic load balancing, and the ability to resume interrupted computations without recalculating prior steps. A sympathetic reader would care because standard SVD methods assume fast random access or large intermediate storage, which fails for the very large matrices common in machine learning, data science, and signal processing.

Core claim

The central claim is that an out-of-core randomized SVD solution factors both dense and sparse matrices of arbitrarily large size within arbitrarily small memory limits by using an innovative technique for partitioning matrices that supports out-of-core and parallel processing, memory and I/O use planning, automatic load balancing, performance tuning, and resumption of interrupted operations from persistent external storage.

What carries the argument

The innovative technique for partitioning matrices that supports out-of-core processing, parallelization, and resumption from external storage.

If this is right

  • Very large matrices in applications like machine learning can be factored on commodity hardware without high-memory requirements.
  • Interrupted computations on large matrices can resume from the last completed partition without restarting from scratch.
  • The same partitioning supports automatic load balancing and performance tuning across parallel processes.
  • The approach applies to both dense and sparse matrices, removing the need for separate in-core algorithms for each type.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The partitioning approach might extend to other randomized linear algebra routines such as QR or eigenvalue decompositions.
  • Commodity hardware could handle SVD tasks previously requiring distributed clusters if the I/O efficiency holds.
  • Resumption from disk could change workflows for long-running factorizations in production data pipelines.

Load-bearing premise

The partitioning technique enables efficient out-of-core operations and parallelization without I/O or coordination overhead that would prevent scaling to arbitrarily large matrices in arbitrarily small memory.

What would settle it

Running the method on a matrix whose size exceeds available RAM plus disk capacity and observing that it either fails to complete or requires memory proportional to the full matrix size would falsify the central claim.

Figures

Figures reproduced from arXiv: 1907.06470 by Miroslav Ba\v{c}\'ak, Stefan Bordag, Vadim Demchik.

Figure 1
Figure 1. Figure 1: Sparse matrix representation 3.7. Multi-threading. ExB SVD Library uses the POSIX pthreads execution model to provide efficient process threading. Users can determine the number of threads at run-time, or at any stage during processing, including setting the number of threads used for individual operations or matrices. To avoid inefficiencies caused by threading overhead, there is an internal automatic re-… view at source ↗
Figure 2
Figure 2. Figure 2: displays the processing times calculating the approximated SVD factorizations of these two matrices, as a function of the memory limit we allocated to it. The experiments were performed on a dedicated system with dual Intel Xeon E5-2640 v4 @2.40GHz CPUs, 512GB RAM as core memory, and a 2TB NVMe-SSD as out-of-core storage. Memory limits were allocated per matrix (see Parameter 1 in Section 3.5) and not as a… view at source ↗
Figure 3
Figure 3. Figure 3: Image reconstruction via low rank SVD with and without power iterations. 4.2.1. Power iteration [PITH_FULL_IMAGE:figures/full_fig_p009_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Rank 50 image reconstruction with different precisions. All image compression experiments in this section were performed with the memory allocation re￾stricted to 1KB per matrix – less than the memory required for a single row or column of the source image – in order to demonstrate the implementation’s ability to function under very extreme memory restrictions. 5. Conclusions SVD is time-honored and well-e… view at source ↗
read the original abstract

Singular value decomposition (SVD) is a standard matrix factorization technique that produces optimal low-rank approximations of matrices. It has diverse applications, including machine learning, data science and signal processing. However, many common problems involve very large matrices that cannot fit in the main memory of commodity computers, making it impractical to use standard SVD algorithms that assume fast random access or large amounts of space for intermediate calculations. To address this issue, we have implemented an out-of-core (external memory) randomized SVD solution that is fully scalable and efficiently parallelizable. This solution factors both dense and sparse matrices of arbitrarily large size within arbitrarily small memory limits, efficiently using out-of-core storage as needed. It uses an innovative technique for partitioning matrices that lends itself to out-of-core and parallel processing, as well as memory and I/O use planning, automatic load balancing, performance tuning, and makes possible a number of other practical enhancements to the current state-of-the-art. Furthermore, by using persistent external storage (generally HDDs or SSDs), users can resume interrupted operations without having to recalculate previously performed steps, solving a major practical problem in factoring very large matrices.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The paper claims to have implemented an out-of-core randomized SVD algorithm that factors dense and sparse matrices of arbitrary size using arbitrarily small main memory by leveraging external storage (HDDs/SSDs), a novel matrix partitioning technique for out-of-core and parallel processing, automatic load balancing, performance tuning, and resumability of interrupted computations without recalculation.

Significance. If the implementation and its claimed properties hold under empirical validation, the work would provide a practical extension of randomized SVD methods to external-memory settings, enabling low-rank approximations for matrices that exceed main memory on commodity hardware and addressing a common bottleneck in machine learning, data science, and signal processing applications.

major comments (1)
  1. [Abstract] Abstract: the central claims of full scalability, efficient parallelizability, arbitrary matrix size within arbitrarily small memory limits, and practical enhancements via the partitioning technique are asserted without any accompanying benchmarks, accuracy measurements, I/O analysis, runtime comparisons, or validation details; this absence makes it impossible to assess whether the implementation actually achieves the stated properties.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their review and for highlighting the need for clearer linkage between claims and supporting evidence. We address the single major comment below.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the central claims of full scalability, efficient parallelizability, arbitrary matrix size within arbitrarily small memory limits, and practical enhancements via the partitioning technique are asserted without any accompanying benchmarks, accuracy measurements, I/O analysis, runtime comparisons, or validation details; this absence makes it impossible to assess whether the implementation actually achieves the stated properties.

    Authors: The abstract is a concise summary and does not contain the detailed empirical results; those appear in the experimental and implementation sections of the full manuscript, which include benchmarks, accuracy measurements, I/O analysis, runtime comparisons, and validation on both dense and sparse matrices. We agree that the abstract could better signal the existence of this supporting material. We will revise the abstract to reference the location of the empirical validation and, space permitting, include one or two key quantitative highlights so that readers can immediately assess the claims. revision: yes

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper describes an engineering implementation of an out-of-core randomized SVD using a matrix partitioning technique for external-memory and parallel processing. No derivation chain, equations, or first-principles claims are present that could reduce to self-definition, fitted inputs renamed as predictions, or self-citation load-bearing steps. All performance and scalability properties are stated as direct consequences of the implemented algorithm and storage strategy, with no mathematical reductions or uniqueness theorems invoked that loop back to the paper's own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the effectiveness of randomized SVD approximations and the unproven efficiency of the new partitioning scheme for out-of-core use; no free parameters or invented entities are introduced in the abstract.

axioms (1)
  • domain assumption Randomized SVD yields sufficiently accurate low-rank approximations for the intended applications
    The method builds on randomized SVD without deriving or validating its approximation quality in the abstract.

pith-pipeline@v0.9.0 · 5735 in / 1281 out tokens · 28272 ms · 2026-05-24T21:18:00.530091+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

18 extracted references · 18 canonical work pages · 1 internal anchor

  1. [1]

    Dongarra, M

    J. Dongarra, M. Gates, A. Haidar, J. Kurzak, P. Luszczek, S. Tomov, and I. Yamazaki , The singular value decomposition: Anatomy of optimizing an algorithm for extreme scale , SIAM Rev., 60 (2018), pp. 808–865

  2. [2]

    Kabir, A

    K. Kabir, A. Haidar, S. Tomov, A. Bouteiller, and J. Dongarra , A framework for out of memory SVD algo- rithms, in High Performance Computing, J. M. Kunkel, R. Yokota, P. Balaji, and D. Keyes, eds., Cham, 2017, Springer International Publishing, pp. 158–178

  3. [3]

    Haidar, K

    A. Haidar, K. Kabir, D. F ayad, S. Tomov, and J. Dongarra , Out of memory SVD solver for big data , in 2017 IEEE High Performance Extreme Computing Conference (HPEC), Sept 2017, pp. 1–7

  4. [4]

    Rabani and S

    E. Rabani and S. Toledo , Out-of-core SVD and QR decompositions , in Proceedings of the 10th SIAM Conference on Parallel Processing for Scientific Computing, Norfolk, Virginia, March 2001

  5. [5]

    Gates, S

    M. Gates, S. Tomov and J. Dongarra , Accelerating the SVD two stage bidiagonal reduction and divide and conquer using GPUs, in Parallel Computing, 74 (2018), pp. 3–18

  6. [6]

    D. I. Martin, J. C. Martin, M. W. Berry, and M. Browne , Out-of-core SVD performance for document indexing , Appl. Numer. Math., 57 (2007), pp. 1230–1239

  7. [7]

    Y. Lu, F. Ino, and Y. Matsushita , High-performance out-of-core block randomized singular value decomposition on GPU, arXiv:1706.07191, (2017)

  8. [8]

    Halko, P

    N. Halko, P. G. Martinsson, and J. A. Tropp , Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions, SIAM Review, 53 (2011), pp. 217–288

  9. [9]

    Dongarra and D

    J. Dongarra and D. W alker , The design of linear algebra libraries for high performance computers , tech. rep., United States Department of Energy: Office of Scientific and Technical Information, August 1993

  10. [10]

    Bouge, P

    L. Bouge, P. Fraigniaud, A. Mignotte, and Y. Robert , Efficient block cyclic data redistribution , in Euro-Par’96 Parallel Processing. Euro-Par 1996, vol. 1123 of Lecture Notes in Computer Science, 1996

  11. [11]

    E. F. D’Azevedo and J. J. Dongarra , The design and implementation of the parallel out-of-core scalapack lu, qr, and cholesky factorization routines , Concurrency - Practice and Experience, 12 (2000), pp. 1481–1493

  12. [12]

    N. P. Halko , Randomized methods for computing low-rank approximations of matrices , ProQuest LLC, Ann Arbor, MI, 2012. Thesis (Ph.D.)–University of Colorado at Boulder

  13. [13]

    G. H. Golub and C. F. V an Loan , Matrix Computations, Johns Hopkins Studies in the Mathematical Sciences, Johns Hopkins University Press, Baltimore, MD, fourth ed., 2013

  14. [14]

    I. S. Duff, A. M. Erisman, and J. K. Reid , Direct Methods for Sparse Matrices , Oxford University Press, 1986

  15. [15]

    T. A. Davis and Y. Hu , The University of Florida sparse matrix collection , ACM Trans. Math. Software, 38 (2011), pp. Art. 1, 25

  16. [16]

    R. F. Boisvert, R. Pozo, and K. A. Remington , The matrix market exchange formats: Initial design , tech. rep., US Dept of Commerce: National Institute of Standards and Technology, December 1996

  17. [17]

    Hooker , Centerfold of Lenna Sj¨ o¨ oblom, Playboy, (November 1972)

    D. Hooker , Centerfold of Lenna Sj¨ o¨ oblom, Playboy, (November 1972). Photograph

  18. [18]

    Demchik , Scalable out-of-core BLAS library from scratch , In preparation, (2019)

    V. Demchik , Scalable out-of-core BLAS library from scratch , In preparation, (2019). ExB Labs, Seeburgstr. 100, 04103 Leipzig, Germany E-mail address: demchik@exb.de E-mail address: bordag@exb.de 5https://www.exb.de/