
arXiv: 2604.00736 · v2 · pith:GH5NRLXO · submitted 2026-04-01 · cs.DC · cs.ET

Is RISC-V Ready for Machine Learning? Portable Gaussian Processes Using Asynchronous Tasks

classification: cs.DC, cs.ET
keywords: performance, RISC-V, scaling, chip, Gaussian, parallel, scalability, single-core
abstract

Gaussian processes are widely used in machine learning domains but remain computationally demanding, limiting their efficient scalability across emerging hardware platforms. The GPRat library addresses these challenges using the HPX asynchronous many-task runtime system. In this work, we extend GPRat to enable portability across multiple hardware architectures and evaluate its performance on representative x86-64, ARM, and RISC-V chips. We conduct node-level strong scaling and problem size scaling benchmarks for Gaussian process prediction and hyperparameter optimization to assess single-core performance, parallel scalability, and architectural efficiency. Our results show that while the x86-64 Zen 2 chip achieves a 58% single-core performance advantage over the ARM-based Fujitsu A64FX, superior parallel scaling allows the 48-core ARM chip to outperform the 64-core Zen 2 by 9% at full node utilization. The evaluated SOPHON SG2042 RISC-V chip exhibits substantially lower performance and weaker scalability, with single-core performance lagging by up to a factor of 14 and large-scale parallel workloads showing slowdowns of up to a factor of 24. For problem size scaling, ARM and x86-64 systems demonstrate comparable performance within 23%. These findings highlight the growing competitiveness of purpose-built ARM chips. Furthermore, they underscore the importance of wide-register vectorization support and improvements to the memory subsystem for upcoming RISC-V platforms, especially when targeted by many-task runtimes.
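Both workloads named in the abstract, prediction and hyperparameter optimization, typically rest on a Cholesky factorization of the n×n training covariance matrix, which costs O(n³) and is why Gaussian processes are computationally demanding. The sketch below shows that linear algebra sequentially in pure Python. It assumes a squared-exponential kernel on scalar inputs, and every function name in it is hypothetical and illustrative, not GPRat's API. A library built on a many-task runtime such as HPX would typically split the matrix into tiles and run the tile operations as asynchronous tasks; this sketch does not attempt that.

```python
import math

def sq_exp_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential (RBF) kernel for scalar inputs (assumed kernel choice)."""
    return variance * math.exp(-0.5 * ((a - b) / lengthscale) ** 2)

def covariance_matrix(x, noise, lengthscale, variance):
    """Training covariance K + noise * I as a list of rows."""
    n = len(x)
    return [[sq_exp_kernel(x[i], x[j], lengthscale, variance) + (noise if i == j else 0.0)
             for j in range(n)] for i in range(n)]

def cholesky(a):
    """Return lower-triangular L with a = L L^T (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        L[j][j] = math.sqrt(a[j][j] - sum(L[j][k] ** 2 for k in range(j)))
        for i in range(j + 1, n):
            L[i][j] = (a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return L

def cholesky_solve(L, b):
    """Solve (L L^T) x = b with a forward and then a backward triangular solve."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):  # forward substitution: L y = b
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):  # backward substitution: L^T x = y
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_predict_mean(x_train, y_train, x_test, noise=1e-2, lengthscale=1.0, variance=1.0):
    """Predictive mean k_*^T (K + noise * I)^{-1} y for each test input."""
    L = cholesky(covariance_matrix(x_train, noise, lengthscale, variance))
    alpha = cholesky_solve(L, y_train)
    return [sum(sq_exp_kernel(xs, xt, lengthscale, variance) * a
                for xt, a in zip(x_train, alpha))
            for xs in x_test]

def neg_log_marginal_likelihood(x_train, y_train, noise=1e-2, lengthscale=1.0, variance=1.0):
    """Objective an optimizer would minimize over (noise, lengthscale, variance)."""
    n = len(x_train)
    L = cholesky(covariance_matrix(x_train, noise, lengthscale, variance))
    alpha = cholesky_solve(L, y_train)
    data_fit = 0.5 * sum(y * a for y, a in zip(y_train, alpha))
    half_log_det = sum(math.log(L[i][i]) for i in range(n))  # equals 0.5 * log|K|
    return data_fit + half_log_det + 0.5 * n * math.log(2.0 * math.pi)

if __name__ == "__main__":
    x = [0.0, 1.0, 2.0, 3.0]
    y = [0.0, 0.8, 0.9, 0.1]
    print(gp_predict_mean(x, y, [1.5]))
    print(neg_log_marginal_likelihood(x, y))
```

Every evaluation of `neg_log_marginal_likelihood` refactorizes the covariance matrix, so an optimizer that tries many hyperparameter settings repeats the O(n³) step many times; this is the part of the workload a tiled, task-parallel implementation targets.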
