Evaluating Rust for Sparse Matrix Kernels in Scientific Computing
Pith reviewed 2026-06-26 18:30 UTC · model grok-4.3
The pith
Rust sparse kernels match Eigen and PSBLAS performance on core scientific workloads while trailing PETSc on blocked formats.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Rust implementations of the three workloads achieve performance comparable to Eigen and PSBLAS for CSC formats across the benchmark suite, while trailing PETSc's advanced blocked CSR optimizations. The study examines how compile-time monomorphization, SIMD vectorization, and FFI boundaries interact with Rust's safety model and finds that these features support competitive runtimes without prohibitive overhead.
What carries the argument
The three workloads (SpMV, Lanczos-based Krylov methods, and matrix-exponential evaluation) implemented natively in Rust and timed against established C++ and Fortran libraries on representative sparse matrices.
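For readers unfamiliar with the first workload, the following is a minimal sketch of SpMV over compressed sparse column (CSC) storage. The `CscMatrix` type, its field names, and the `example_matrix` helper are hypothetical illustrations, not the paper's data structures or the API of any Rust crate.

```rust
/// Hypothetical minimal CSC container (illustration only, not the paper's type).
struct CscMatrix {
    nrows: usize,
    ncols: usize,
    col_ptr: Vec<usize>, // length ncols + 1; column j owns entries col_ptr[j]..col_ptr[j + 1]
    row_idx: Vec<usize>, // row index of each stored entry
    values: Vec<f64>,    // value of each stored entry
}

impl CscMatrix {
    /// Computes y = A * x by scattering each column's contribution into y.
    fn spmv(&self, x: &[f64]) -> Vec<f64> {
        assert_eq!(x.len(), self.ncols);
        let mut y = vec![0.0; self.nrows];
        for j in 0..self.ncols {
            let xj = x[j];
            for k in self.col_ptr[j]..self.col_ptr[j + 1] {
                y[self.row_idx[k]] += self.values[k] * xj;
            }
        }
        y
    }
}

/// Small test matrix:
/// [[4, 0, 1],
///  [0, 3, 0],
///  [2, 0, 5]]
fn example_matrix() -> CscMatrix {
    CscMatrix {
        nrows: 3,
        ncols: 3,
        col_ptr: vec![0, 2, 3, 5],
        row_idx: vec![0, 2, 1, 0, 2],
        values: vec![4.0, 2.0, 3.0, 1.0, 5.0],
    }
}

fn main() {
    let a = example_matrix();
    let y = a.spmv(&[1.0, 2.0, 3.0]);
    println!("{:?}", y);
}
```

The column-wise traversal writes to scattered rows of `y`, which is one reason CSC and CSR kernels can behave differently under vectorization and parallelization.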
If this is right
- Rust can serve as a drop-in replacement for CSC-based sparse kernels without major performance loss relative to Eigen and PSBLAS.
- Compile-time monomorphization and auto-vectorization in Rust suffice to reach state-of-the-art speeds for these operations.
- FFI boundaries allow Rust code to interoperate with existing libraries while preserving safety invariants.
- Adoption of Rust would be most immediate for codes already using CSC storage rather than advanced blocked CSR formats.
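The monomorphization and FFI points above can be illustrated with a short sketch. It assumes a generic CSC kernel plus a C-ABI wrapper around its `f64` instantiation; the names `csc_spmv_acc` and `spmv_f64_c` and the argument layout are hypothetical and are not taken from the paper or from any existing library.

```rust
use std::ops::{AddAssign, Mul};
use std::slice;

/// Generic CSC kernel: y += A * x. The compiler emits one specialized copy
/// per concrete scalar type `T` that is actually used (monomorphization).
fn csc_spmv_acc<T>(col_ptr: &[usize], row_idx: &[usize], values: &[T], x: &[T], y: &mut [T])
where
    T: Copy + Mul<Output = T> + AddAssign,
{
    for (j, &xj) in x.iter().enumerate() {
        for k in col_ptr[j]..col_ptr[j + 1] {
            y[row_idx[k]] += values[k] * xj;
        }
    }
}

/// Hypothetical C-ABI entry point for the f64 instantiation.
/// A real export would also carry a `no_mangle` attribute.
///
/// # Safety
/// `col_ptr` must point to `ncols + 1` entries, `row_idx` and `values` to
/// `nnz` entries, `x` to `ncols` entries and `y` to `nrows` entries;
/// `y` must not alias any input.
pub unsafe extern "C" fn spmv_f64_c(
    nrows: usize,
    ncols: usize,
    nnz: usize,
    col_ptr: *const usize,
    row_idx: *const usize,
    values: *const f64,
    x: *const f64,
    y: *mut f64,
) {
    // The only unsafe step: turning raw pointers into bounds-checked slices.
    let (cp, ri, va, xs, ys) = unsafe {
        (
            slice::from_raw_parts(col_ptr, ncols + 1),
            slice::from_raw_parts(row_idx, nnz),
            slice::from_raw_parts(values, nnz),
            slice::from_raw_parts(x, ncols),
            slice::from_raw_parts_mut(y, nrows),
        )
    };
    // From here on the safe generic kernel runs with ordinary bounds checks.
    csc_spmv_acc::<f64>(cp, ri, va, xs, ys);
}

fn main() {
    // Matrix [[4, 0, 1], [0, 3, 0], [2, 0, 5]] in CSC form.
    let col_ptr = [0usize, 2, 3, 5];
    let row_idx = [0usize, 2, 1, 0, 2];

    // f32 instantiation, called directly from Rust.
    let mut y32 = [0.0f32; 3];
    csc_spmv_acc::<f32>(&col_ptr, &row_idx, &[4.0, 2.0, 3.0, 1.0, 5.0], &[1.0, 2.0, 3.0], &mut y32);

    // f64 instantiation, called through the C-ABI wrapper.
    let vals = [4.0f64, 2.0, 3.0, 1.0, 5.0];
    let x = [1.0f64, 2.0, 3.0];
    let mut y64 = [0.0f64; 3];
    unsafe {
        spmv_f64_c(3, 3, 5, col_ptr.as_ptr(), row_idx.as_ptr(), vals.as_ptr(), x.as_ptr(), y64.as_mut_ptr());
    }
    println!("{:?} {:?}", y32, y64);
}
```

Keeping the `unsafe` block confined to the pointer-to-slice conversion is one common way to preserve the kernel's safety invariants at an FFI boundary; whether the paper structures its bindings this way is not stated in the abstract.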
Where Pith is reading between the lines
- Teams maintaining large scientific codebases could incrementally replace unsafe kernels with Rust versions where memory safety bugs are a recurring concern.
- The same evaluation approach could be applied to other candidate languages to map the current performance-safety frontier for numerical libraries.
- Extending the benchmarks to GPU offload or distributed-memory settings would test whether Rust's ecosystem supports the next layer of scientific workloads.
Load-bearing premise
The selected matrices and three workloads represent the main computational patterns that dominate scientific computing applications.
What would settle it
Benchmarks on a wider collection of matrices in which the Rust kernels are, on average, more than 20 percent slower than every baseline would falsify the comparability claim.
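One way to make the 20 percent criterion concrete is sketched below. It assumes the average is a geometric mean of per-matrix runtime ratios, which is a choice made here for illustration; the source does not say which average is intended. The function names and the timings in `main` are made-up placeholders, not measurements from the paper.

```rust
/// Geometric mean of the per-matrix runtime ratios rust / baseline.
/// (Assumption: the geometric mean stands in for "on average".)
fn geo_mean_ratio(rust: &[f64], baseline: &[f64]) -> f64 {
    assert_eq!(rust.len(), baseline.len());
    assert!(!rust.is_empty());
    let log_sum: f64 = rust
        .iter()
        .zip(baseline)
        .map(|(r, b)| (r / b).ln())
        .sum();
    (log_sum / rust.len() as f64).exp()
}

/// True if Rust is more than `threshold` (e.g. 0.20) slower than every baseline.
fn comparability_falsified(rust: &[f64], baselines: &[(&str, Vec<f64>)], threshold: f64) -> bool {
    !baselines.is_empty()
        && baselines
            .iter()
            .all(|(_, times)| geo_mean_ratio(rust, times) > 1.0 + threshold)
}

fn main() {
    // Placeholder runtimes in seconds, one entry per matrix (NOT real data).
    let rust = vec![1.10, 2.30, 0.95];
    let baselines = vec![
        ("baseline_a", vec![1.00, 2.00, 1.00]),
        ("baseline_b", vec![0.90, 2.10, 0.80]),
    ];
    for (name, times) in &baselines {
        println!("{name}: mean slowdown ratio {:.3}", geo_mean_ratio(&rust, times));
    }
    println!("falsified: {}", comparability_falsified(&rust, &baselines, 0.20));
}
```

A single baseline that Rust keeps pace with is enough to leave the claim standing under this reading, since the criterion requires the slowdown to hold against all baselines.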
Original abstract
Sparse matrix kernels form the computational backbone of scientific computing, traditionally relying on C/C++ and Fortran implementations that prioritize performance over memory safety. This work evaluates Rust as a systems-level alternative for sparse linear algebra by implementing and benchmarking three core workloads: sparse matrix-vector multiplication (SpMV), Lanczos-based Krylov methods, and matrix-exponential evaluation. We compare native Rust code against established baselines (Intel oneMKL, Eigen, PETSc, and PSBLAS) across a suite of representative matrices. Our results show that Rust's sparse kernels achieve performance comparable to Eigen and PSBLAS, tracking the state-of-the-art for CSC formats, while trailing PETSc's advanced blocked CSR optimizations. By analyzing compile-time monomorphization, SIMD vectorization, and FFI boundaries, we assess the practical impact of Rust's safety model and ecosystem readiness. The study provides concrete, evidence-based guidance for modernizing high-performance numerical software stacks.
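As background for the second workload, here is a minimal sketch of the basic Lanczos recurrence for a symmetric operator, written against a closure so that any matrix-vector product can be plugged in. It omits reorthogonalization and restarting, and the function name, signature, and breakdown tolerance are assumptions made for illustration, not the paper's implementation.

```rust
fn dot(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Runs up to `m` Lanczos steps for a symmetric operator given as `matvec`.
/// Returns the diagonal (alpha) and off-diagonal (beta) of the tridiagonal
/// matrix T. No reorthogonalization is performed (sketch only).
fn lanczos<F>(matvec: F, v0: &[f64], m: usize) -> (Vec<f64>, Vec<f64>)
where
    F: Fn(&[f64]) -> Vec<f64>,
{
    let n = v0.len();
    let norm0 = dot(v0, v0).sqrt();
    let mut v: Vec<f64> = v0.iter().map(|x| x / norm0).collect();
    let mut v_prev = vec![0.0; n];
    let mut beta_prev = 0.0;
    let mut alpha = Vec::new();
    let mut beta = Vec::new();

    for j in 0..m {
        // Three-term recurrence: w = A v_j - alpha_j v_j - beta_{j-1} v_{j-1}.
        let mut w = matvec(&v);
        let a = dot(&w, &v);
        for i in 0..n {
            w[i] -= a * v[i] + beta_prev * v_prev[i];
        }
        alpha.push(a);
        if j + 1 == m {
            break;
        }
        let b = dot(&w, &w).sqrt();
        if b < 1e-12 {
            break; // invariant subspace reached (hypothetical tolerance)
        }
        beta.push(b);
        let next: Vec<f64> = w.iter().map(|x| x / b).collect();
        v_prev = std::mem::replace(&mut v, next);
        beta_prev = b;
    }
    (alpha, beta)
}

fn main() {
    // Dense 2x2 symmetric operator [[2, 1], [1, 2]] standing in for a sparse SpMV.
    let matvec = |v: &[f64]| vec![2.0 * v[0] + v[1], v[0] + 2.0 * v[1]];
    let (alpha, beta) = lanczos(matvec, &[1.0, 0.0], 5);
    println!("alpha = {:?}, beta = {:?}", alpha, beta);
}
```

In a Krylov approximation of a matrix function such as the exponential, the small tridiagonal matrix built from `alpha` and `beta` is the object on which the function is evaluated, so the cost is dominated by the `matvec` calls.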
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript evaluates Rust as a systems-level language for sparse matrix kernels in scientific computing. It implements three workloads—SpMV, Lanczos-based Krylov methods, and matrix-exponential evaluation—in native Rust and benchmarks them against Intel oneMKL, Eigen, PETSc, and PSBLAS across a suite of representative matrices. The central claim is that Rust kernels achieve performance comparable to Eigen and PSBLAS for CSC formats while trailing PETSc's blocked CSR optimizations; the work further analyzes the performance impact of Rust features including compile-time monomorphization, SIMD vectorization, and FFI boundaries to provide guidance on ecosystem readiness.
Significance. If the empirical comparisons hold and the matrix suite is representative, the paper supplies concrete evidence that Rust can serve as a competitive, memory-safe alternative for core numerical kernels without major performance penalties in CSC-based workloads. This has potential implications for modernizing scientific software stacks. The manuscript deserves credit for directly analyzing Rust-specific mechanisms (monomorphization and FFI) and for framing its results as actionable guidance rather than abstract claims.
Major comments (2)
- [Abstract and benchmark description] Abstract, paragraph on benchmarks: the central performance claim (Rust tracks Eigen/PSBLAS for CSC and trails PETSc blocked CSR) rests on the assertion of 'a suite of representative matrices,' yet no selection criteria, coverage of sparsity structures (block-structured FEM matrices, high-condition-number PDE matrices), or scale diversity are supplied. This omission is load-bearing because the generalization to 'scientific computing applications' cannot be evaluated without it.
- [Abstract] Abstract and results presentation: performance outcomes are stated without accompanying data tables, error bars, implementation details on CSC vs. CSR handling, or exclusion criteria for the matrix suite. This prevents verification that the comparison is fair and that post-hoc choices did not affect the reported conclusions.
Minor comments (2)
- [Abstract] The abstract introduces SpMV, Lanczos, and matrix-exponential without first spelling out the acronyms or briefly defining the workloads for readers outside the immediate subfield.
- [Abstract] The phrase 'state-of-the-art for CSC formats' would benefit from explicit version numbers or commit hashes for the baseline libraries to allow exact reproduction.
Simulated Author's Rebuttal
We thank the referee for the careful review and specific suggestions for improving the clarity of our benchmark description and results presentation. We address each major comment below and commit to revisions that will make the matrix suite selection and performance data more transparent and verifiable.
Point-by-point responses
- Referee: [Abstract and benchmark description] Abstract, paragraph on benchmarks: the central performance claim (Rust tracks Eigen/PSBLAS for CSC and trails PETSc blocked CSR) rests on the assertion of 'a suite of representative matrices,' yet no selection criteria, coverage of sparsity structures (block-structured FEM matrices, high-condition-number PDE matrices), or scale diversity are supplied. This omission is load-bearing because the generalization to 'scientific computing applications' cannot be evaluated without it.
  Authors: We agree that explicit documentation of matrix selection criteria is necessary to support generalization claims. In the revised manuscript we will add a new subsection (likely in Section 3 or 4) that details the selection process, including coverage of block-structured FEM matrices, high-condition-number PDE matrices, sparsity pattern diversity, matrix scale range, and any exclusion rules applied. This addition will directly address the load-bearing nature of the claim.
  Revision: yes
- Referee: [Abstract] Abstract and results presentation: performance outcomes are stated without accompanying data tables, error bars, implementation details on CSC vs. CSR handling, or exclusion criteria for the matrix suite. This prevents verification that the comparison is fair and that post-hoc choices did not affect the reported conclusions.
  Authors: The abstract is a concise summary and cannot contain full tables or error bars. The full manuscript already presents performance tables, repeated-run statistics (error bars), CSC/CSR implementation differences, and matrix handling details in the Results section. To improve verifiability we will (1) revise the abstract to explicitly reference the Results section for these data and (2) expand the Results section with a dedicated paragraph on exclusion criteria and fairness safeguards if the current text is insufficiently explicit. We cannot embed tabular data in the abstract itself.
  Revision: partial
Circularity Check
No circularity: empirical benchmarks with no derivations or fitted predictions
Full rationale
The paper reports measured runtime and performance numbers from direct comparisons of Rust sparse kernels against Eigen, PETSc, PSBLAS, and oneMKL on a fixed matrix suite for SpMV, Lanczos, and matrix-exponential workloads. No equations, first-principles derivations, parameter fits, or predictions appear; the central claim is simply that the observed timings are comparable or trailing. The representativeness of the matrix suite is an external assumption about coverage, not a self-referential definition or reduction of any result to its own inputs. No self-citation chains, uniqueness theorems, or ansatzes are invoked to support the performance statements. The study is therefore self-contained against external benchmarks and receives the default non-circularity finding.
Axiom & Free-Parameter Ledger
Supporting Information
The code for running the benchmark is available from the GitHub repository lukefleed/hpla-rs.