Sarus Suite: Cloud-native Containers for HPC
Pith reviewed 2026-05-10 06:27 UTC · model grok-4.3
The pith
HPC containers achieve production performance and scalability using a standard Podman engine plus dedicated integration layers rather than a custom runtime.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Sarus Suite keeps the Podman container engine unmodified and supplies the missing HPC capabilities through four complementary layers: declarative runtime specification, scheduler-native execution, scalable shared-image access, and standards-based host capability injection. On a Cray EX GH200 system, it delivers performance and scaling equivalent to the production Enroot+Pyxis stack for PyFR, SPH-EXA, Megatron-LM, and Pynamic workloads, with consistently lower per-node container startup times. It further supports unmodified upstream OCI images, including those from NGC, and allows cloud-native multi-container workflows expressed as Kubernetes manifests.
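For illustration, a multi-container workflow of the kind mentioned above can be written as a standard Kubernetes Pod manifest. The sketch below builds one as a Python dictionary and serializes it to JSON, a format the Kubernetes API accepts alongside YAML. The pod name, container names, images, and commands are hypothetical placeholders; this summary does not say which manifest fields Sarus Suite honors or how it launches them.

```python
import json


def make_pod_manifest(name, containers):
    """Build a minimal Kubernetes Pod manifest (apiVersion v1) as a dict.

    `containers` is a list of (container_name, image, command) tuples.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {"name": cname, "image": image, "command": list(command)}
                for cname, image, command in containers
            ],
        },
    }


# Hypothetical two-container workflow: a trainer plus a metrics sidecar.
# The image references are placeholders, not images named in the paper.
manifest = make_pod_manifest(
    "training-job",
    [
        ("trainer", "registry.example.com/trainer:latest", ["python", "train.py"]),
        ("metrics", "registry.example.com/metrics:latest", ["python", "export_metrics.py"]),
    ],
)

if __name__ == "__main__":
    print(json.dumps(manifest, indent=2))
```

Both containers live in one Pod, so they share a lifecycle, which is the usual way to express a main process plus a helper in manifest form.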
What carries the argument
The layered architecture that places scheduler semantics, scalable image access, and host integration in separate system components around an unchanged Podman engine.
Load-bearing premise
The added integration layers can stay in sync with future upstream Podman releases without recreating the compatibility burden the design seeks to escape, and the observed performance parity will hold for other production workloads beyond the tested set.
What would settle it
A Podman update that forces changes to the integration layers or a new production workload that shows measurable scaling or startup degradation relative to the Enroot+Pyxis baseline on the same hardware.
Original abstract
High-performance computing (HPC) systems must support fast-moving software stacks, especially in AI/ML, while preserving scheduler control, scalable startup, and production performance. Yet many HPC container solutions rely on specialized runtime stacks that weaken continuity with mainstream cloud-native workflows and require ongoing effort to sustain compatibility with the evolving upstream ecosystem. We argue that HPC should specialize the integration layer while keeping the container engine aligned with upstream container evolution. We present Sarus Suite, an upstream-aligned HPC container architecture built around an unchanged Podman engine. Sarus Suite adds the HPC-specific functionality needed for production use through complementary system layers for declarative runtime specification, scheduler-native execution, scalable shared-image access, and standards-based host capability injection. We evaluate Sarus Suite on a Cray EX GH200 system using communication-intensive HPC workloads, large-scale AI training, metadata-heavy startup workloads, and container startup measurements. Across PyFR, SPH-EXA, Megatron-LM, and Pynamic, Sarus Suite matches the performance and scaling of the production Enroot+Pyxis baseline while delivering consistently faster per-node container startup. The architecture also enables direct use of upstream OCI images, including NGC-based images, and supports cloud-native multi-container workflows expressed through Kubernetes manifests. These results show that HPC-grade containers do not require an HPC-specific runtime, provided that scheduler semantics, scalable image access, and host integration are implemented in explicit system layers. This preserves upstream continuity and software agility while maintaining scheduler control, scalability, and production performance.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents Sarus Suite, an HPC container architecture that uses an unmodified Podman engine augmented by explicit system layers for declarative runtime specification, scheduler-native execution, scalable shared-image access, and standards-based host capability injection. It evaluates the system on a Cray EX GH200 using communication-intensive HPC workloads (PyFR, SPH-EXA), large-scale AI training (Megatron-LM), metadata-heavy startup (Pynamic), and container startup measurements, claiming performance parity with the Enroot+Pyxis production baseline, faster per-node startup, direct use of upstream OCI images including NGC, and support for Kubernetes manifests. The central thesis is that HPC-grade containers do not require a specialized runtime if scheduler semantics, image access, and host integration are handled in separate layers.
Significance. If the performance equivalence and architectural claims hold under scrutiny, the work provides a concrete demonstration that HPC systems can preserve upstream container continuity and agility for fast-moving AI/ML stacks while retaining scheduler control and production scalability. This could reduce the long-term maintenance cost of diverging from mainstream runtimes and enable more direct reuse of cloud-native tooling and images on HPC platforms.
major comments (3)
- [Evaluation] Evaluation section: The reported performance parity across PyFR, SPH-EXA, Megatron-LM, and Pynamic, as well as the faster startup claim, is presented without any description of measurement methodology, node counts, number of repetitions, error bars, statistical tests, or exclusion criteria. This absence makes it impossible to assess whether the equivalence to Enroot+Pyxis is robust or subject to post-hoc selection, directly undermining the central empirical support for the architecture.
- [Architecture] Architecture and implementation sections: The paper provides no concrete details on the API surface, hook points, or implementation depth of the four custom layers (declarative runtime specification, scheduler-native execution, scalable shared-image access, standards-based host capability injection) relative to Podman internals such as OCI spec handling or runtime hooks. Without this, the claim that the layers avoid reintroducing compatibility burdens cannot be evaluated.
- [Discussion] Discussion or conclusions: The sustainability argument—that the added layers can be maintained in sync with upstream Podman changes without recreating the maintenance cost the paper seeks to avoid—is asserted but unsupported by any analysis of update processes, version pinning strategy, or testing against Podman release cycles. This is load-bearing for the continuity claim.
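The reporting requested in the first major comment can be sketched as follows: summarize repeated runs per stack with a mean and a sample standard deviation, then report the relative difference between stacks with a percentile-bootstrap confidence interval. The function names, the 95% level, the resample count, and the sample timings are illustrative assumptions; they are not measurements or methods taken from the paper.

```python
import random
import statistics


def summarize(samples):
    """Return (mean, sample standard deviation) for repeated measurements."""
    return statistics.mean(samples), statistics.stdev(samples)


def bootstrap_rel_diff_ci(candidate, baseline, n_resamples=2000, level=0.95, seed=0):
    """Percentile-bootstrap CI for (mean(candidate) - mean(baseline)) / mean(baseline)."""
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    diffs = []
    for _ in range(n_resamples):
        # Resample each group with replacement, keeping its original size.
        c = [rng.choice(candidate) for _ in candidate]
        b = [rng.choice(baseline) for _ in baseline]
        diffs.append((statistics.mean(c) - statistics.mean(b)) / statistics.mean(b))
    diffs.sort()
    lo_idx = round((1 - level) / 2 * n_resamples)
    hi_idx = round((1 + level) / 2 * n_resamples) - 1
    return diffs[lo_idx], diffs[hi_idx]


# Made-up wall-clock times in seconds, for illustration only.
baseline_runs = [101.2, 99.8, 100.5, 100.9, 99.6]
candidate_runs = [100.7, 100.1, 99.9, 101.0, 100.3]

if __name__ == "__main__":
    for label, runs in (("baseline", baseline_runs), ("candidate", candidate_runs)):
        mean, sd = summarize(runs)
        print(f"{label}: mean={mean:.2f} s, sd={sd:.2f} s, n={len(runs)}")
    lo, hi = bootstrap_rel_diff_ci(candidate_runs, baseline_runs)
    print(f"relative difference, 95% CI: [{lo:+.3%}, {hi:+.3%}]")
```

A parity claim is easier to assess when the interval is reported next to a tolerance stated in advance, since "no significant difference" on a handful of runs does not by itself show equivalence.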
minor comments (2)
- [Abstract] The abstract and introduction would benefit from a brief table or diagram summarizing the division of responsibilities between the unmodified Podman engine and the four added layers.
- [Introduction] References to related HPC container runtimes (e.g., Shifter, Singularity/Apptainer) appear light; adding a short comparison table would help situate the contribution.
Simulated Author's Rebuttal
We thank the referee for their thorough review and positive assessment of the significance of our work. We appreciate the detailed feedback on the evaluation, architecture, and discussion sections. Below, we provide point-by-point responses to the major comments and outline the revisions we plan to make to address them.
- Referee: [Evaluation] Evaluation section: The reported performance parity across PyFR, SPH-EXA, Megatron-LM, and Pynamic, as well as the faster startup claim, is presented without any description of measurement methodology, node counts, number of repetitions, error bars, statistical tests, or exclusion criteria. This absence makes it impossible to assess whether the equivalence to Enroot+Pyxis is robust or subject to post-hoc selection, directly undermining the central empirical support for the architecture.
Authors: We agree that the evaluation section lacks a detailed description of the measurement methodology, including node counts, repetitions, error bars, statistical tests, and exclusion criteria, and that this limits any assessment of reproducibility and robustness. In the revised version, we will expand the evaluation section to describe how the experiments were conducted, the hardware configuration (e.g., the number of nodes used for each workload), the number of repetitions per measurement, error bars or variance, any statistical tests applied, and the criteria for data exclusion, if any. We believe this will strengthen the empirical support for the performance parity claims.
Revision: yes
- Referee: [Architecture] Architecture and implementation sections: The paper provides no concrete details on the API surface, hook points, or implementation depth of the four custom layers (declarative runtime specification, scheduler-native execution, scalable shared-image access, standards-based host capability injection) relative to Podman internals such as OCI spec handling or runtime hooks. Without this, the claim that the layers avoid reintroducing compatibility burdens cannot be evaluated.
Authors: We agree that more concrete implementation detail on the custom layers would help readers evaluate the compatibility and maintenance claims. The manuscript currently emphasizes the high-level design and its benefits. We will revise the architecture section to describe the API surfaces, integration hook points, and implementation depth of each layer relative to Podman, such as how declarative specifications map to OCI runtime specs and how runtime hooks are used for host capability injection. These additions document the existing design and do not require changes to the core architecture.
Revision: yes
- Referee: [Discussion] Discussion or conclusions: The sustainability argument, that the added layers can be maintained in sync with upstream Podman changes without recreating the maintenance cost the paper seeks to avoid, is asserted but unsupported by any analysis of update processes, version pinning strategy, or testing against Podman release cycles. This is load-bearing for the continuity claim.
Authors: We recognize that the continuity claim depends on the sustainability argument, which the current manuscript asserts without detailed analysis. In the revised manuscript, we will expand the discussion section with an analysis of our update processes, including how we track Podman releases, our version pinning strategy, and results from testing against recent Podman release cycles. This will provide evidence for the claim that the added layers can be maintained with lower overhead than a forked runtime. If the analysis is preliminary, we will note it as such.
Revision: partial
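One way to make a version pinning strategy of the kind discussed above checkable is to compare the installed engine version against an explicit list of release lines the integration layers were tested with. The sketch below parses text in the form printed by `podman --version` and checks it against a hypothetical tested set. The version numbers, the set name, and the function names are assumptions for illustration, not details from the paper.

```python
import re

# Hypothetical (major, minor) release lines the integration layers were tested against.
TESTED_RELEASE_LINES = {(5, 2), (5, 3)}


def parse_podman_version(text):
    """Extract (major, minor, patch) from text like 'podman version 5.2.1'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version found in {text!r}")
    return tuple(int(part) for part in match.groups())


def is_tested(version, tested=TESTED_RELEASE_LINES):
    """Return True if the version's (major, minor) line is in the tested set."""
    return version[:2] in tested


if __name__ == "__main__":
    # In practice this string would come from running `podman --version`;
    # a fixed sample keeps the sketch runnable without Podman installed.
    sample = "podman version 5.2.1"
    version = parse_podman_version(sample)
    status = "tested" if is_tested(version) else "untested"
    print(f"{'.'.join(map(str, version))}: {status}")
```

Running such a check in continuous integration against each new upstream release would produce the kind of release-cycle evidence the referee asks for.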
Circularity Check
No circularity: empirical performance claims rest on external benchmarks, not self-derived quantities
The paper advances an architectural argument (HPC containers via unmodified Podman plus explicit layers) and supports it with direct performance measurements against an independent production stack (Enroot+Pyxis) on PyFR, SPH-EXA, Megatron-LM, and Pynamic. No equations, fitted parameters, or predictions appear. No load-bearing self-citation chain is invoked to justify the central claim; the evaluation data are externally falsifiable and not constructed from the authors' own prior definitions. This is the normal case of a self-contained systems paper whose results do not reduce to their inputs by construction.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: An unmodified upstream Podman engine plus thin integration layers is sufficient to deliver production HPC performance and scheduler semantics.
invented entities (1)
- Sarus Suite declarative runtime specification, scheduler-native execution, scalable shared-image access, and standards-based host capability injection layers: no independent evidence