Evaluating Rust for Sparse Matrix Kernels in Scientific Computing

Fabio Durastante; Luca Lombardo

arXiv: 2606.19213 · v1 · pith:PQ5YUVDDnew · submitted 2026-06-17 · 💻 cs.MS · cs.NA · math.NA

Pith reviewed 2026-06-26 18:30 UTC · model grok-4.3

classification: cs.MS, cs.NA, math.NA

keywords: Rust, sparse matrices, SpMV, Krylov methods, matrix exponential, performance evaluation, scientific computing, memory safety

The pith

Rust sparse kernels match Eigen and PSBLAS performance on core scientific workloads while trailing PETSc on blocked formats.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests Rust as a memory-safe systems language for the sparse matrix operations that underpin scientific computing. The authors implement sparse matrix-vector multiplication (SpMV), Lanczos-based Krylov methods, and matrix-exponential evaluation in Rust, then time them against Intel oneMKL, Eigen, PETSc, and PSBLAS on a collection of test matrices. The results show Rust reaching speeds comparable to Eigen and PSBLAS for CSC storage while falling behind PETSc's optimized blocked CSR routines. This matters because high-performance numerical code has long accepted memory-unsafe languages in exchange for speed; comparable performance would let developers keep Rust's safety guarantees without giving up the speed they expect from C, C++, or Fortran.
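For readers unfamiliar with the storage format, the following is a minimal sketch of SpMV over a compressed sparse column (CSC) matrix. It is not the authors' kernel: the `CscMatrix` type and its field names are hypothetical, chosen only to illustrate the column-wise scatter pattern that CSC storage implies.

```rust
/// Sparse matrix in compressed sparse column (CSC) form.
/// The type and field names are illustrative, not taken from the paper's code.
pub struct CscMatrix {
    pub nrows: usize,
    pub ncols: usize,
    /// Entries of column j live at positions col_ptr[j]..col_ptr[j + 1].
    pub col_ptr: Vec<usize>,
    /// Row index of each stored entry.
    pub row_idx: Vec<usize>,
    /// Numerical value of each stored entry.
    pub values: Vec<f64>,
}

impl CscMatrix {
    /// Computes y = A * x by scattering each column, scaled by x[j], into y.
    pub fn spmv(&self, x: &[f64]) -> Vec<f64> {
        assert_eq!(x.len(), self.ncols, "x must have one entry per column");
        let mut y = vec![0.0; self.nrows];
        for j in 0..self.ncols {
            let xj = x[j];
            for k in self.col_ptr[j]..self.col_ptr[j + 1] {
                y[self.row_idx[k]] += self.values[k] * xj;
            }
        }
        y
    }
}
```

The indexed accesses are bounds-checked by default, which is one place where the cost of Rust's safety model could show up in a kernel like this.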

Core claim

Rust implementations of the three workloads achieve performance comparable to Eigen and PSBLAS for CSC formats across the benchmark suite, while trailing PETSc's advanced blocked CSR optimizations. The study examines how compile-time monomorphization, SIMD vectorization, and FFI boundaries interact with Rust's safety model and finds that these features support competitive runtimes without prohibitive overhead.
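As a rough illustration of what compile-time monomorphization means here, the sketch below writes one generic CSC kernel that the compiler specializes separately for each scalar type it is called with (for example `f32` and `f64`). The function name `csc_spmv_acc` and its signature are assumptions made for this example, not the paper's API.

```rust
use std::ops::{AddAssign, Mul};

/// Generic CSC kernel computing y += A * x.
/// The compiler emits a separate, fully typed copy of this function for
/// every concrete `T` used at a call site (monomorphization), so no dynamic
/// dispatch remains in the inner loop.
pub fn csc_spmv_acc<T>(
    col_ptr: &[usize],
    row_idx: &[usize],
    values: &[T],
    x: &[T],
    y: &mut [T],
) where
    T: Copy + Mul<Output = T> + AddAssign,
{
    for (j, &xj) in x.iter().enumerate() {
        let (start, end) = (col_ptr[j], col_ptr[j + 1]);
        // Iterating over matching sub-slices avoids repeated index checks
        // on `row_idx` and `values` inside the loop body.
        for (&i, &v) in row_idx[start..end].iter().zip(&values[start..end]) {
            y[i] += v * xj;
        }
    }
}
```

Whether the optimizer then vectorizes the inner loop depends on the access pattern; the scattered write to `y[i]` generally limits auto-vectorization for this format.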

What carries the argument

The three workloads (SpMV, Lanczos-based Krylov methods, and matrix-exponential evaluation) implemented natively in Rust and timed against established C++ and Fortran libraries on representative sparse matrices.
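To show how the workloads relate, here is a minimal sketch of the textbook Lanczos iteration, written against an abstract `matvec` closure so that any SpMV kernel can be plugged in. It is the standard three-term recurrence without reorthogonalization, not the authors' implementation, and the function name and signature are assumptions. The returned coefficients define the tridiagonal matrix T_m; a Krylov approximation of the matrix exponential would then evaluate exp(T_m) on that small matrix, a step omitted here.

```rust
/// Euclidean inner product of two equally sized vectors.
fn dot(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Runs up to `m` Lanczos steps for a symmetric operator applied by `matvec`.
/// Returns (alpha, beta): the diagonal and off-diagonal of the tridiagonal T_m.
/// Sketch only: no reorthogonalization is performed.
pub fn lanczos<F>(matvec: F, v0: &[f64], m: usize) -> (Vec<f64>, Vec<f64>)
where
    F: Fn(&[f64]) -> Vec<f64>,
{
    let norm0 = dot(v0, v0).sqrt();
    assert!(norm0 > 0.0, "starting vector must be nonzero");
    let mut v: Vec<f64> = v0.iter().map(|x| x / norm0).collect();
    let mut v_prev: Vec<f64> = vec![0.0_f64; v.len()];
    let mut beta_prev = 0.0_f64;
    let mut alpha = Vec::with_capacity(m);
    let mut beta = Vec::with_capacity(m);

    for _ in 0..m {
        // w = A v_j - alpha_j v_j - beta_{j-1} v_{j-1}
        let mut w = matvec(&v);
        let a = dot(&w, &v);
        for i in 0..w.len() {
            w[i] -= a * v[i] + beta_prev * v_prev[i];
        }
        alpha.push(a);

        let b = dot(&w, &w).sqrt();
        if b < 1e-14 {
            break; // the Krylov subspace is invariant; stop early
        }
        beta.push(b);

        // Shift the basis: v_{j-1} <- v_j, v_j <- w / beta_j.
        let next: Vec<f64> = w.iter().map(|x| x / b).collect();
        v_prev = std::mem::replace(&mut v, next);
        beta_prev = b;
    }
    (alpha, beta)
}
```

Each step costs one operator application plus a few vector updates, which is why SpMV throughput tends to dominate the runtime of the other two workloads.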

If this is right

Rust can serve as a drop-in replacement for CSC-based sparse kernels without major performance loss relative to Eigen and PSBLAS.
Compile-time monomorphization and auto-vectorization in Rust suffice to reach state-of-the-art speeds for these operations.
FFI boundaries allow Rust code to interoperate with existing libraries while preserving safety invariants.
Adoption of Rust would be most immediate for codes already using CSC storage rather than advanced blocked CSR formats.
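The point about FFI boundaries can be made concrete with a small sketch of the usual pattern: a raw C-ABI routine wrapped by a safe function that checks the invariants the foreign side cannot. Everything named here is hypothetical. In particular, `hypothetical_daxpy` is a stand-in written in Rust so the example runs without linking an external library; a real binding would declare the routine in an `extern "C"` block instead.

```rust
use std::os::raw::c_int;

/// Stand-in for a foreign routine with a C ABI: y[i] += alpha * x[i].
/// Hypothetical: a real binding would link this from a C or Fortran library.
///
/// Safety: `x` and `y` must each be valid for `n` elements and must not overlap.
unsafe extern "C" fn hypothetical_daxpy(n: c_int, alpha: f64, x: *const f64, y: *mut f64) {
    for i in 0..n as usize {
        unsafe { *y.add(i) += alpha * *x.add(i) };
    }
}

/// Safe wrapper: validates lengths before crossing the ABI boundary.
pub fn axpy(alpha: f64, x: &[f64], y: &mut [f64]) -> Result<(), String> {
    if x.len() != y.len() {
        return Err(format!("length mismatch: {} vs {}", x.len(), y.len()));
    }
    if x.len() > c_int::MAX as usize {
        return Err("vector too long for a C int length".to_string());
    }
    let n = x.len() as c_int;
    // SAFETY: both pointers are valid for `n` elements (checked above), and
    // the borrow rules guarantee that `x` and `y` do not alias.
    unsafe { hypothetical_daxpy(n, alpha, x.as_ptr(), y.as_mut_ptr()) };
    Ok(())
}
```

The design choice is to confine `unsafe` to one audited call site, so callers of `axpy` keep the usual slice guarantees.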

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Teams maintaining large scientific codebases could incrementally replace unsafe kernels with Rust versions where memory safety bugs are a recurring concern.
The same evaluation approach could be applied to other candidate languages to map the current performance-safety frontier for numerical libraries.
Extending the benchmarks to GPU offload or distributed-memory settings would test whether Rust's ecosystem supports the next layer of scientific workloads.

Load-bearing premise

The selected matrices and three workloads represent the main computational patterns that dominate scientific computing applications.

What would settle it

A set of benchmarks on a wider collection of matrices showing Rust kernels more than 20 percent slower than all baselines on average would falsify the comparability claim.
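One way to make this criterion checkable is sketched below. It assumes that "on average" means the geometric mean of per-matrix runtime ratios (Rust time divided by baseline time), which is a common choice for ratios but is not specified in the text above; the function names and the threshold parameter are likewise illustrative.

```rust
/// Geometric mean of per-matrix runtime ratios rust[i] / baseline[i].
/// A value of 1.25 means Rust is 25 percent slower on (geometric) average.
pub fn geomean_ratio(rust: &[f64], baseline: &[f64]) -> f64 {
    assert_eq!(rust.len(), baseline.len(), "one timing per matrix for each side");
    assert!(!rust.is_empty(), "at least one matrix is required");
    let log_sum: f64 = rust
        .iter()
        .zip(baseline)
        .map(|(r, b)| (r / b).ln())
        .sum();
    (log_sum / rust.len() as f64).exp()
}

/// Returns true when Rust is more than `threshold` (e.g. 0.20) slower than
/// every baseline, which is the outcome that would count against the
/// comparability claim under the assumption stated above.
pub fn comparability_falsified(rust: &[f64], baselines: &[&[f64]], threshold: f64) -> bool {
    !baselines.is_empty()
        && baselines
            .iter()
            .all(|b| geomean_ratio(rust, b) > 1.0 + threshold)
}
```

With an arithmetic mean instead, a few large matrices could dominate the verdict, so the choice of average should be stated alongside any such test.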

Original abstract

Sparse matrix kernels form the computational backbone of scientific computing, traditionally relying on C/C++ and Fortran implementations that prioritize performance over memory safety. This work evaluates Rust as a systems-level alternative for sparse linear algebra by implementing and benchmarking three core workloads: sparse matrix-vector multiplication (SpMV), Lanczos-based Krylov methods, and matrix-exponential evaluation. We compare native Rust code against established baselines (Intel oneMKL, Eigen, PETSc, and PSBLAS) across a suite of representative matrices. Our results show that Rust's sparse kernels achieve performance comparable to Eigen and PSBLAS, tracking the state-of-the-art for CSC formats, while trailing PETSc's advanced blocked CSR optimizations. By analyzing compile-time monomorphization, SIMD vectorization, and FFI boundaries, we assess the practical impact of Rust's safety model and ecosystem readiness. The study provides concrete, evidence-based guidance for modernizing high-performance numerical software stacks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Rust matches Eigen and PSBLAS on CSC sparse kernels for the tested workloads but trails PETSc on blocked CSR, with the main open question being how representative the matrix suite actually is.

Rust kernels reach performance levels comparable to Eigen and PSBLAS on CSC formats for SpMV, Lanczos, and matrix exponential, while lagging PETSc's blocked CSR work. That is the concrete takeaway from the benchmarks they ran.

The paper supplies new numbers by implementing the kernels natively in Rust and running head-to-head tests against oneMKL, Eigen, PETSc, and PSBLAS. The discussion of monomorphization, SIMD vectorization, and FFI overhead adds useful detail on where Rust's safety features cost or save time. This kind of direct, workload-level comparison is what people need when weighing language options for numerical stacks.

The soft spot is the matrix suite. The abstract calls the matrices representative without spelling out selection criteria, sparsity pattern coverage, size range, or condition-number spread. If the set leans toward small or narrow classes of problems, the generalization to broader scientific computing patterns rests on weaker ground. The workloads themselves are standard, so the issue is mainly in how far the results can be pushed.

This is the sort of paper that belongs in a reading group focused on HPC language modernization or sparse linear algebra tooling. It contains enough fresh empirical data and practical analysis to justify sending it to peer review, provided the benchmark description is tightened up.

Referee Report

2 major / 2 minor

Summary. The manuscript evaluates Rust as a systems-level language for sparse matrix kernels in scientific computing. It implements three workloads—SpMV, Lanczos-based Krylov methods, and matrix-exponential evaluation—in native Rust and benchmarks them against Intel oneMKL, Eigen, PETSc, and PSBLAS across a suite of representative matrices. The central claim is that Rust kernels achieve performance comparable to Eigen and PSBLAS for CSC formats while trailing PETSc's blocked CSR optimizations; the work further analyzes the performance impact of Rust features including compile-time monomorphization, SIMD vectorization, and FFI boundaries to provide guidance on ecosystem readiness.

Significance. If the empirical comparisons hold and the matrix suite is representative, the paper supplies concrete evidence that Rust can serve as a competitive, memory-safe alternative for core numerical kernels without major performance penalties in CSC-based workloads. This has potential implications for modernizing scientific software stacks. The manuscript deserves credit for its direct analysis of Rust-specific mechanisms (monomorphization and FFI) and for framing its results as actionable guidance rather than abstract claims.

major comments (2)

[Abstract and benchmark description] Abstract, paragraph on benchmarks: the central performance claim (Rust tracks Eigen/PSBLAS for CSC and trails PETSc blocked CSR) rests on the assertion of 'a suite of representative matrices,' yet no selection criteria, coverage of sparsity structures (block-structured FEM matrices, high-condition-number PDE matrices), or scale diversity are supplied. This omission is load-bearing because the generalization to 'scientific computing applications' cannot be evaluated without it.
[Abstract] Abstract and results presentation: performance outcomes are stated without accompanying data tables, error bars, implementation details on CSC vs. CSR handling, or exclusion criteria for the matrix suite. This prevents verification that the comparison is fair and that post-hoc choices did not affect the reported conclusions.
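The error bars requested here usually come from repeated timed runs. The sketch below shows one plausible way to collect them, with untimed warm-up calls followed by a mean and sample standard deviation. It is an illustration under stated assumptions, not the paper's benchmarking harness; `time_kernel` and `mean_and_std` are hypothetical names.

```rust
use std::time::Instant;

/// Mean and sample standard deviation (n - 1 denominator) of measurements.
pub fn mean_and_std(samples: &[f64]) -> (f64, f64) {
    assert!(!samples.is_empty(), "need at least one sample");
    let n = samples.len() as f64;
    let mean = samples.iter().sum::<f64>() / n;
    if samples.len() < 2 {
        return (mean, 0.0);
    }
    let var = samples.iter().map(|s| (s - mean).powi(2)).sum::<f64>() / (n - 1.0);
    (mean, var.sqrt())
}

/// Calls `kernel` `warmup` times untimed, then `runs` times timed.
/// Returns the wall-clock duration of each timed run in seconds.
pub fn time_kernel<F: FnMut()>(mut kernel: F, warmup: usize, runs: usize) -> Vec<f64> {
    for _ in 0..warmup {
        kernel();
    }
    (0..runs)
        .map(|_| {
            let start = Instant::now();
            kernel();
            start.elapsed().as_secs_f64()
        })
        .collect()
}
```

Reporting the mean together with the standard deviation (or the minimum and median) for every library and matrix would let readers judge whether differences between Rust and a baseline exceed run-to-run noise.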

minor comments (2)

[Abstract] The abstract introduces SpMV, Lanczos, and matrix-exponential without first spelling out the acronyms or briefly defining the workloads for readers outside the immediate subfield.
[Abstract] The phrase 'state-of-the-art for CSC formats' would benefit from explicit version numbers or commit hashes for the baseline libraries to allow exact reproduction.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful review and specific suggestions for improving the clarity of our benchmark description and results presentation. We address each major comment below and commit to revisions that will make the matrix suite selection and performance data more transparent and verifiable.

Referee: [Abstract and benchmark description] Abstract, paragraph on benchmarks: the central performance claim (Rust tracks Eigen/PSBLAS for CSC and trails PETSc blocked CSR) rests on the assertion of 'a suite of representative matrices,' yet no selection criteria, coverage of sparsity structures (block-structured FEM matrices, high-condition-number PDE matrices), or scale diversity are supplied. This omission is load-bearing because the generalization to 'scientific computing applications' cannot be evaluated without it.

Authors: We agree that explicit documentation of matrix selection criteria is necessary to support generalization claims. In the revised manuscript we will add a new subsection (likely in Section 3 or 4) that details the selection process, including coverage of block-structured FEM matrices, high-condition-number PDE matrices, sparsity pattern diversity, matrix scale range, and any exclusion rules applied. This addition will directly address the load-bearing nature of the claim.

Revision: yes

Referee: [Abstract] Abstract and results presentation: performance outcomes are stated without accompanying data tables, error bars, implementation details on CSC vs. CSR handling, or exclusion criteria for the matrix suite. This prevents verification that the comparison is fair and that post-hoc choices did not affect the reported conclusions.

Authors: The abstract is a concise summary and cannot contain full tables or error bars. The full manuscript already presents performance tables, repeated-run statistics (error bars), CSC/CSR implementation differences, and matrix handling details in the Results section. To improve verifiability we will (1) revise the abstract to explicitly reference the Results section for these data and (2) expand the Results section with a dedicated paragraph on exclusion criteria and fairness safeguards if the current text is insufficiently explicit. We cannot embed tabular data in the abstract itself.

Revision: partial

Circularity Check

0 steps flagged

No circularity: empirical benchmarks with no derivations or fitted predictions

The paper reports measured runtime and performance numbers from direct comparisons of Rust sparse kernels against Eigen, PETSc, PSBLAS, and oneMKL on a fixed matrix suite for SpMV, Lanczos, and matrix-exponential workloads. No equations, first-principles derivations, parameter fits, or predictions appear; the central claim is simply that the observed timings are comparable or trailing. The representativeness of the matrix suite is an external assumption about coverage, not a self-referential definition or reduction of any result to its own inputs. No self-citation chains, uniqueness theorems, or ansatzes are invoked to support the performance statements. The study is therefore self-contained against external benchmarks and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are invoked; the work is an empirical performance study whose central claim rests on the representativeness of the test matrices and workloads.

Supporting Information

The code for running the benchmark is available from the GitHub repository lukefleed/hpla-rs.
