Monte Cimone v3: Where RISC-V Stands in High-Performance Computing
Pith reviewed 2026-05-25 00:22 UTC · model grok-4.3
The pith
The SG2044 RISC-V processor in Monte Cimone v3 more than doubles single-core performance and delivers 3.08 GFLOPs/W efficiency comparable to x86 and Arm servers.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The SG2044 more than doubles single-core performance and improves scalability compared to the SG2042. MCv3 achieves an energy efficiency of 3.08 GFLOPs/W, a 10x improvement over MCv1 and within the range of x86-64 and Arm servers. On pure performance normalized to SIMD/vector length, MCv3 at its peak-efficiency point (16 cores) achieves 46% of the performance of an Intel Sapphire Rapids server and 91% of the performance of an NVIDIA Grace CPU Superchip.
What carries the argument
The Monte Cimone v3 cluster built on the SOPHGO Sophon SG2044 processor, measured via HPL and STREAM benchmarks paired with power instrumentation for cross-architecture comparison.
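To make the GFLOPs/W metric concrete, here is a minimal sketch of how an efficiency figure can be derived from an HPL run paired with a sampled power trace. It is not the authors' tooling. The function names are hypothetical, the operation count is the standard one HPL uses when reporting performance, and the sketch assumes the power trace covers exactly the timed HPL run.

```python
from typing import Sequence


def hpl_flops(n: int) -> float:
    """Operation count HPL credits for solving an n x n dense system."""
    return (2.0 / 3.0) * n**3 + (3.0 / 2.0) * n**2


def energy_joules(times_s: Sequence[float], power_w: Sequence[float]) -> float:
    """Integrate sampled power (W) over time (s) with the trapezoidal rule."""
    if len(times_s) != len(power_w) or len(times_s) < 2:
        raise ValueError("need at least two matching (time, power) samples")
    total = 0.0
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        total += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return total


def gflops_per_watt(n: int, times_s: Sequence[float], power_w: Sequence[float]) -> float:
    """Efficiency = sustained GFLOP/s divided by average power.

    Assumption: the first and last samples bracket the HPL run exactly,
    so the trace duration is used as the benchmark runtime.
    """
    runtime_s = times_s[-1] - times_s[0]
    gflops = hpl_flops(n) / runtime_s / 1e9
    avg_power_w = energy_joules(times_s, power_w) / runtime_s
    return gflops / avg_power_w
```

Whether the power samples come from the wall, the node, or the CPU package changes the denominator, so efficiency figures are only comparable when that measurement point is the same on every platform.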
If this is right
- RISC-V processors can now deliver energy efficiency in the same band as established x86-64 and Arm server CPUs on dense linear algebra and memory bandwidth workloads.
- Single-core gains from SG2042 to SG2044 translate into better cluster scalability under the same HPL and STREAM conditions.
- When normalized to vector length, RISC-V reaches over 90 percent of NVIDIA Grace performance at the 16-core efficiency point.
- Iterative hardware generations in open testbeds like Monte Cimone can close the absolute performance gap with proprietary server chips.
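The vector-length normalization behind the 46% and 91% figures can be illustrated with a short sketch. This summary does not state the exact formula the paper uses, so the version below assumes the simplest reading: divide measured performance by the SIMD/vector register width in bits, then compare platforms. The function names and the numbers in the usage note are hypothetical placeholders, not measurements from the paper.

```python
def normalized_perf(gflops: float, vector_bits: int) -> float:
    """Performance per bit of vector width (GFLOP/s per bit).

    Assumption: normalization is a plain division by register width.
    A real comparison might also account for the number of vector
    pipelines, core count, or clock frequency.
    """
    if vector_bits <= 0:
        raise ValueError("vector width must be positive")
    return gflops / vector_bits


def relative_normalized(gflops_a: float, bits_a: int,
                        gflops_b: float, bits_b: int) -> float:
    """Normalized performance of platform A as a fraction of platform B."""
    return normalized_perf(gflops_a, bits_a) / normalized_perf(gflops_b, bits_b)
```

With placeholder inputs, `relative_normalized(100.0, 128, 800.0, 512)` compares a 128-bit platform at 100 GFLOP/s against a 512-bit platform at 800 GFLOP/s. The raw ratio is 12.5%, while the width-normalized ratio is 50%, which shows how strongly the choice of normalization shapes the headline percentages.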
Where Pith is reading between the lines
- If the reported efficiency holds under production compilers and larger node counts, RISC-V clusters could become practical for power-limited HPC installations.
- The normalization step highlights that remaining gaps are largely in vector unit width and software maturity rather than fundamental architectural inefficiency.
- Extending the same measurement protocol to additional RISC-V chips would create a public performance trajectory that vendors could target.
Load-bearing premise
The HPL and STREAM benchmarks together with the chosen power measurement methodology provide a representative and architecture-fair comparison of HPC-relevant performance and efficiency across the RISC-V, x86, and Arm platforms tested.
What would settle it
Rerunning the same HPL and STREAM workloads on SG2044 hardware with an independent power meter, or running an expanded benchmark suite, and obtaining efficiency or normalized performance figures outside the reported ranges relative to the Intel and NVIDIA references.
Original abstract
The Monte Cimone project provides a RISC-V testbed for High-Performacne Computing cluster. This paper presents Monte Cimone v3 (MCv3), the third iteration of the Monte Cimone RISC-V HPC cluster, integrating the SOPHGO Sophon SG2044 processor, an evolution of the SG2042 used in MCv2. We characterize MCv3 using HPL and STREAM benchmarks coupled with power measurements, and compare it against two reference platforms: the Intel Xeon Platinum 8480+(Sapphire Rapids) and the NVIDIA Grace CPU Superchip. Our results show that the SG2044 more than doubles single-core performance and improves scalability compared to SG2042. MCv3 achieves an energy efficiency of 3.08GFLOPs/W which improves of 10x w.r.t. MCv1 and is in the range of x86-64 and Arm servers. On pure performance when normalized on the SIMD/Vector length MCv3 on its peak efficiency point (16 cores) achieves 46% performance of Intel Sapphire Rapids server and 91% performance of NVIDIA Grace CPU superchip.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents Monte Cimone v3, the third iteration of a RISC-V HPC cluster testbed now using the SOPHGO SG2044 processor (successor to the SG2042 in MCv2). It reports HPL and STREAM benchmark results together with power measurements, claiming that SG2044 more than doubles single-core performance and improves scalability versus SG2042, that MCv3 reaches 3.08 GFLOPs/W (10× improvement over MCv1 and comparable to x86-64/Arm servers), and that, after SIMD/vector-length normalization, MCv3 at its 16-core peak-efficiency point attains 46 % of an Intel Sapphire Rapids server and 91 % of an NVIDIA Grace CPU superchip.
Significance. If the cross-platform measurement protocols prove architecture-fair, the work supplies concrete empirical data on RISC-V HPC progress, documenting a substantial efficiency gain and relative performance standings that can inform both hardware development and procurement decisions.
major comments (3)
- [Methods / Experimental Setup] The power-measurement methodology (instrumentation, sampling rate, and measurement point—wall, package, or node) is not described with sufficient detail to verify that the reported 3.08 GFLOPs/W efficiency and the 10× improvement claim rest on comparable quantities across MCv1–MCv3 and the Intel/Arm reference platforms.
- [Results / Benchmark Configuration] The normalized performance ratios (46 % of Sapphire Rapids, 91 % of Grace) presuppose equivalent optimization effort for HPL and STREAM on all three architectures; the manuscript does not report compiler flags, vector-extension usage, or problem-size choices that would allow an independent assessment of optimization parity.
- [Results] Error bars, run-to-run variability, or statistical justification for the single-core doubling and scalability claims versus SG2042 are absent, making it impossible to judge whether the reported gains exceed measurement uncertainty.
minor comments (2)
- [Abstract] Abstract contains the typo “High-Performacne”.
- [Figures] Figure captions and axis labels should explicitly state the power-measurement domain (e.g., “CPU package power”) to avoid reader misinterpretation.
Simulated Author's Rebuttal
We thank the referee for the constructive comments that highlight areas where additional detail will strengthen the manuscript. We address each major comment below and commit to revisions that provide the requested information without altering the core claims.
Point-by-point responses
- Referee: [Methods / Experimental Setup] The power-measurement methodology (instrumentation, sampling rate, and measurement point—wall, package, or node) is not described with sufficient detail to verify that the reported 3.08 GFLOPs/W efficiency and the 10× improvement claim rest on comparable quantities across MCv1–MCv3 and the Intel/Arm reference platforms.
  Authors: We agree that the power-measurement methodology requires more detail to support the efficiency numbers and cross-platform comparisons. In the revised manuscript we will add a dedicated subsection specifying the instrumentation (power meters or on-board sensors), sampling rates, and measurement points (wall, package, or node) used for MCv1–MCv3 as well as the Intel Sapphire Rapids and NVIDIA Grace platforms.
  Revision: yes
- Referee: [Results / Benchmark Configuration] The normalized performance ratios (46 % of Sapphire Rapids, 91 % of Grace) presuppose equivalent optimization effort for HPL and STREAM on all three architectures; the manuscript does not report compiler flags, vector-extension usage, or problem-size choices that would allow an independent assessment of optimization parity.
  Authors: The observation is correct: the manuscript currently omits compiler flags, vector-extension details, and problem-size choices. We will revise the Results section to report these parameters explicitly for each architecture and benchmark, enabling readers to evaluate optimization parity for the normalized 46 % and 91 % figures.
  Revision: yes
- Referee: [Results] Error bars, run-to-run variability, or statistical justification for the single-core doubling and scalability claims versus SG2042 are absent, making it impossible to judge whether the reported gains exceed measurement uncertainty.
  Authors: We acknowledge that the current text lacks error bars or run-to-run statistics. The revised manuscript will include multiple-run data, standard deviations or error bars, and a brief statistical justification for the single-core performance doubling and scalability improvements relative to SG2042.
  Revision: yes
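The kind of run-to-run statistics requested above can be sketched as follows. This is an illustrative example, not the authors' analysis. The function names are hypothetical, and the uncertainty on the speedup uses first-order error propagation for a ratio of two independent quantities, which is only one of several reasonable choices.

```python
import math
import statistics
from typing import Sequence, Tuple


def summarize(runs: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, sample standard deviation) of repeated measurements."""
    if len(runs) < 2:
        raise ValueError("need at least two runs to estimate variability")
    return statistics.mean(runs), statistics.stdev(runs)


def speedup_with_uncertainty(new_runs: Sequence[float],
                             old_runs: Sequence[float]) -> Tuple[float, float]:
    """Speedup of new over old, with a propagated one-sigma uncertainty.

    Assumption: the two sets of runs are independent, so relative
    standard deviations add in quadrature for the ratio of means.
    """
    mean_new, sd_new = summarize(new_runs)
    mean_old, sd_old = summarize(old_runs)
    ratio = mean_new / mean_old
    rel = math.sqrt((sd_new / mean_new) ** 2 + (sd_old / mean_old) ** 2)
    return ratio, ratio * rel
```

A claim such as "more than doubles single-core performance" is supported when the speedup minus its uncertainty still exceeds 2; if the interval straddles 2, the data cannot distinguish a doubling from a somewhat smaller gain.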
Circularity Check
No circularity: empirical benchmark results with no derivations or self-referential reductions
Full rationale
The paper reports direct empirical measurements from HPL and STREAM benchmarks plus power figures on SG2044, SG2042, Sapphire Rapids, and Grace platforms. No equations, fitted parameters, predictions derived from inputs, or self-citation chains appear in the abstract or described claims. All reported values (e.g., 3.08 GFLOPs/W, 46% normalized performance) are stated as outcomes of running standard benchmarks, not reductions by construction. This matches the default expectation for measurement papers; the reader's score of 1.0 is consistent with absence of any load-bearing circular step.