Beyond Pre-Training: The Full Lifecycle of Foundation Models on HPC Systems

Dino Conciatore; Elia Oggian; Federico Da Forno; Jerome Tissieres; Joost VandeVondele; Maxime Martinasso; Stefano Schuppli

arXiv: 2604.12599 · v1 · submitted 2026-04-14 · cs.DC

Pith reviewed 2026-05-10 14:34 UTC · model grok-4.3

classification: cs.DC

keywords: foundation models, HPC, Kubernetes, fine-tuning, inference, hybrid architecture, AI lifecycle, supercomputing

The pith

A hybrid HPC-cloud platform lets supercomputers run complete foundation model lifecycles.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Pre-training of large foundation models matches the batch-oriented design of traditional HPC, but fine-tuning and inference demand more flexible, service-like access that clashes with pure batch queues. The paper shows how a national supercomputing center can combine its diskless GPU Cray nodes with ordinary virtualized servers under Kubernetes orchestration to support both styles of work in one facility. This hybrid setup is presented as a practical way to turn supercomputers into end-to-end AI platforms rather than limiting them to the first training stage. If the approach scales, researchers and industry users gain a single sovereign environment for training, adapting, and deploying models without moving data off-site.

Core claim

The paper presents a hybrid cloud-native platform that pairs diskless GPU-enabled HPE Cray EX compute nodes with virtualized commodity infrastructure, all managed by Kubernetes, to bridge traditional HPC batch processing with the service-oriented workflows needed for fine-tuning and inference of foundation models.

What carries the argument

The hybrid service architecture that integrates specialized diskless HPC GPU nodes with virtualized commodity hardware under Kubernetes orchestration to handle mixed batch and interactive AI workloads.
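To make the division of labor concrete, here is a minimal Python sketch of a routing rule between the two host types. It is an illustration under stated assumptions, not CSCS's implementation: the pool names `cray-ex-gpu` and `virtualized-commodity`, the `node-pool` label key, and the rule "anything that requests GPUs goes to the diskless Cray EX nodes, everything else to virtualized commodity hosts" are all hypothetical. In an actual Kubernetes cluster such a rule would normally be expressed declaratively through node labels plus `nodeSelector` or node affinity in the pod spec.

```python
from dataclasses import dataclass

# Hypothetical pool names; the paper's actual node labels are not reproduced here.
GPU_POOL = "cray-ex-gpu"                  # diskless HPE Cray EX GPU nodes
COMMODITY_POOL = "virtualized-commodity"  # VMs on commodity servers


@dataclass
class Workload:
    name: str
    gpus: int  # number of GPUs the workload requests


def place(w: Workload) -> str:
    """Return the pool a workload is routed to under the assumed rule."""
    # GPU-hungry work (fine-tuning jobs, model servers) goes to the Cray EX pool;
    # GPU-free services (gateways, dashboards) stay on commodity VMs.
    return GPU_POOL if w.gpus > 0 else COMMODITY_POOL


def node_selector(w: Workload) -> dict:
    """Build the nodeSelector fragment a pod spec would carry for this workload."""
    # "node-pool" is a hypothetical label key, not a Kubernetes built-in label.
    return {"nodeSelector": {"node-pool": place(w)}}


if __name__ == "__main__":
    # Hypothetical example workloads, one per lifecycle role.
    examples = [Workload("fine-tune-job", 4), Workload("inference-server", 4), Workload("api-gateway", 0)]
    for w in examples:
        print(w.name, "->", place(w))
```

Running it prints `fine-tune-job -> cray-ex-gpu`, `inference-server -> cray-ex-gpu`, and `api-gateway -> virtualized-commodity`. Keying only on GPU requests is the simplest possible rule; a production policy would plausibly also weigh tenancy, availability requirements, and data locality.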

If this is right

Supercomputers can host fine-tuning pipelines that use substantial GPU resources yet need more interactive scheduling than pre-training jobs.
Highly available inference services become feasible inside the same HPC facility that performed the pre-training.
User productivity rises because researchers no longer need to export models to separate cloud environments for later lifecycle stages.
Other national facilities gain a concrete blueprint for adding AI-factory capabilities to existing capability-class machines.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Sovereign AI programs could keep more of the model lifecycle inside government-funded HPC systems rather than relying on commercial clouds.
Scientific workflows that combine simulation with AI inference might run end-to-end on the same machine without data movement.
The architecture may generalize to other mixed workloads such as interactive data analysis alongside traditional batch simulations.

Load-bearing premise

The hybrid diskless HPC plus virtualized commodity setup orchestrated by Kubernetes can be deployed in production without major performance penalties or operational conflicts.

What would settle it

Measure whether production fine-tuning and inference jobs on the hybrid platform show higher latency, lower GPU utilization, or more frequent failures than equivalent jobs on pure batch HPC or pure cloud systems.
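The proposed test amounts to comparing per-job statistics between the hybrid platform and a baseline. Below is a minimal Python sketch, assuming each job is summarized by an end-to-end latency, a mean GPU utilization, and a failure flag; the `JobRecord` fields, the nearest-rank 95th percentile, and the sample numbers are hypothetical and are not measurements from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class JobRecord:
    latency_s: float  # end-to-end latency in seconds
    gpu_util: float   # mean GPU utilization, 0.0 to 1.0
    failed: bool      # True if the job or request failed


def p95(values: list[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty list."""
    ordered = sorted(values)
    rank = math.ceil(0.95 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


def summarize(jobs: list[JobRecord]) -> dict:
    """Reduce a list of job records to the three metrics named in the test."""
    n = len(jobs)
    return {
        "p95_latency_s": p95([j.latency_s for j in jobs]),
        "mean_gpu_util": sum(j.gpu_util for j in jobs) / n,
        "failure_rate": sum(j.failed for j in jobs) / n,
    }


def penalties(hybrid: list[JobRecord], baseline: list[JobRecord]) -> dict:
    """Flag each metric on which the hybrid platform does worse than the baseline."""
    h, b = summarize(hybrid), summarize(baseline)
    return {
        "higher_latency": h["p95_latency_s"] > b["p95_latency_s"],
        "lower_gpu_util": h["mean_gpu_util"] < b["mean_gpu_util"],
        "more_failures": h["failure_rate"] > b["failure_rate"],
    }


if __name__ == "__main__":
    # Synthetic records for illustration only; not data from the paper.
    hybrid = [JobRecord(1.2, 0.80, False), JobRecord(1.5, 0.75, False), JobRecord(2.0, 0.70, True)]
    baseline = [JobRecord(1.0, 0.85, False), JobRecord(1.1, 0.82, False), JobRecord(1.4, 0.80, False)]
    print(penalties(hybrid, baseline))
```

With these made-up inputs all three flags come out `True`. A real evaluation would need many jobs per platform and a significance test rather than a bare comparison of point estimates.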

Figures

Figures reproduced from arXiv: 2604.12599 by Dino Conciatore, Elia Oggian, Federico Da Forno, Jerome Tissieres, Joost VandeVondele, Maxime Martinasso, Stefano Schuppli.

**Figure 2.** Kubernetes tenant distribution over host node types.

**Figure 3.** Sandbox provisioning and management leveraging IaC and GitOps methodologies.

**Figure 4.** Alpernetes Cluster hosting inference service workloads.

Original abstract

Large-scale pre-training of Foundational Models (FM) constitutes a computationally intensive first phase for enabling AI across diverse scientific and societal applications. This first phase has positioned High-Performance Computing (HPC) facilities as indispensable backbones of "Sovereign AI" initiatives. While the massive throughput requirements of FM pre-training align with the traditional capability-oriented mission of HPC, subsequent phases of the AI lifecycle, typically referred to as fine-tuning and inference, introduce operational paradigms that can conflict with established batch-processing environments. Moreover, these phases are not computationally trivial: they often require substantial high-end compute resources while exhibiting hardware utilization patterns that differ significantly from those of pre-training. This paper addresses the architectural and strategic challenges of operationalizing a complete AI lifecycle within a national supercomputing facility. We present a hybrid cloud-native platform being developed and deployed at the Swiss National Supercomputing Centre (CSCS) that combines diskless GPU-enabled HPE Cray EX compute nodes with virtualized commodity infrastructure. Orchestrated by Kubernetes, this novel service architecture bridges the gap between HPC batch processing and service-oriented workflows. We report our initial investigations into fine-tuning pipelines and highly available inference services, analyzing the associated trade-offs while improving user productivity. Our findings offer a blueprint for enabling supercomputers to integrate "AI Factories" services and workflows, supporting AI innovations into end-to-end scientific and industrial use cases.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This is a practical report on CSCS's hybrid Kubernetes setup for running full AI model lifecycles on Cray nodes, but it offers design choices without the measurements needed to show the approach actually works.

The paper describes an ongoing effort at the Swiss National Supercomputing Centre to build a platform that handles pre-training, fine-tuning, and inference for foundation models on the same HPC system. They use diskless HPE Cray EX GPU nodes combined with virtualized commodity hardware, all managed through Kubernetes to support both batch and service-oriented workflows. What stands out is the clear identification of the mismatch: pre-training loves the high-throughput batch model of HPC, but the later stages need more flexible, always-available resources with different utilization patterns. The authors outline their investigations into fine-tuning pipelines and high-availability inference services, including the trade-offs involved in this hybrid setup. This gives a blueprint that other centers working on sovereign AI initiatives might find directly applicable.

The main limitation is the absence of quantitative evidence. The work reports design choices and qualitative analysis but does not include any performance metrics, such as GPU utilization rates, orchestration overhead, or comparisons of productivity against pure batch or pure cloud alternatives. Without those numbers, it's difficult to assess whether the platform truly delivers on bridging the paradigms without the penalties the authors acknowledge as possible risks.

This paper is aimed at system architects and operators at HPC facilities who are adapting their infrastructure for AI workloads. Readers interested in real deployment stories will get value from the concrete architecture details, even if they have to look elsewhere for validated performance data. I think it deserves peer review because the topic is timely and the implementation choices are worth discussing, though the authors should be asked to add empirical results in revision.

Referee Report

2 major / 1 minor

Summary. The manuscript presents a hybrid cloud-native platform developed and deployed at the Swiss National Supercomputing Centre (CSCS) that combines diskless GPU-enabled HPE Cray EX compute nodes with virtualized commodity infrastructure, orchestrated by Kubernetes. This architecture is intended to support the full lifecycle of foundation models on HPC systems by bridging traditional batch processing with service-oriented workflows for fine-tuning and inference; the paper reports initial investigations into pipelines and services along with associated trade-off analyses and positions the work as a blueprint for integrating AI factory services into supercomputers.

Significance. If the hybrid platform can be shown through measurements to operate without major performance penalties or conflicts, the work would provide a practical blueprint for national HPC centers to incorporate dynamic AI workloads, supporting sovereign AI initiatives and end-to-end scientific use cases beyond pre-training.

major comments (2)

[Abstract] The claims that the platform 'bridges the gap between HPC batch processing and service-oriented workflows' and 'improves user productivity' are not supported by any quantitative benchmarks, GPU utilization data, orchestration latency, throughput deltas, or comparisons against pure-batch or pure-cloud baselines.
[Initial investigations into fine-tuning pipelines and highly available inference services] These remain at the level of design choices and qualitative trade-off discussion; no empirical evidence is provided to demonstrate acceptable overheads or the absence of operational conflicts in the hybrid diskless-plus-virtualized setup under Kubernetes, which is load-bearing for the central architectural claim.

minor comments (1)

The manuscript would benefit from explicit definitions of terms such as 'AI Factories' and a clearer description of how diskless nodes interact with the virtualized commodity layer.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive feedback on our manuscript. The comments correctly identify areas where the current presentation of claims exceeds the empirical content provided. We address each major comment below and describe the revisions we will make to better align the text with the scope of our initial investigations.

Referee: [Abstract] The claims that the platform 'bridges the gap between HPC batch processing and service-oriented workflows' and 'improves user productivity' are not supported by any quantitative benchmarks, GPU utilization data, orchestration latency, throughput deltas, or comparisons against pure-batch or pure-cloud baselines.

Authors: We agree that the abstract asserts bridging of workflows and productivity improvements without supporting quantitative evidence. The manuscript reports architectural design and initial qualitative investigations rather than completed benchmark studies. We will revise the abstract to remove or qualify these claims, limiting it to a description of the hybrid platform, the reported design choices, and the positioning as a blueprint for future AI-factory integration. revision: yes
Referee: [Initial investigations into fine-tuning pipelines and highly available inference services] These remain at the level of design choices and qualitative trade-off discussion; no empirical evidence is provided to demonstrate acceptable overheads or the absence of operational conflicts in the hybrid diskless-plus-virtualized setup under Kubernetes, which is load-bearing for the central architectural claim.

Authors: The referee accurately notes that the investigations are presented through design choices and qualitative trade-off analysis rather than measured overheads or conflict data. Because the platform deployment is at an early stage, the manuscript does not contain such empirical results. We will revise the relevant sections to state explicitly that the work is preliminary, to describe the evaluation framework we intend to apply, and to avoid implying validated performance parity with pure-batch or pure-cloud baselines. revision: yes

standing simulated objections not resolved

Empirical measurements demonstrating acceptable overheads and absence of operational conflicts in the hybrid diskless-plus-virtualized Kubernetes setup on HPE Cray EX nodes

Circularity Check

0 steps flagged

No circularity: purely descriptive systems architecture report

The paper presents an architectural blueprint and initial qualitative investigations for a hybrid Kubernetes-orchestrated HPC platform supporting the full AI model lifecycle. It contains no equations, derivations, fitted parameters, predictive models, or quantitative claims that could reduce to their own inputs by construction. All load-bearing statements are design choices and trade-off discussions rather than self-referential results, self-citation chains, or renamed empirical patterns. This is a standard non-circular descriptive systems paper.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is an engineering systems paper with no mathematical derivations, fitted parameters, or postulated entities. No free parameters, axioms, or invented entities are present.

pith-pipeline@v0.9.0 · 5574 in / 1121 out tokens · 33951 ms · 2026-05-10T14:34:25.486141+00:00 · methodology

Reference graph

Works this paper leans on

31 extracted references · 31 canonical work pages

[1] 2025. Apertus: Democratizing Open and Compliant LLMs for Global Language Environments. arXiv:2509.14233 [cs.CL] https://arxiv.org/abs/2509.14233

[2] Argo Project. 2026. Argo CD: Declarative GitOps Continuous Delivery for Kubernetes. https://argo-cd.readthedocs.io. Accessed: 2026-03-16

[3] BerriAI. 2024. LiteLLM: Open Source LLM Gateway to Call 100+ LLM APIs in a Unified Format. https://github.com/BerriAI/litellm. Proxy and SDK that provides a unified interface to multiple large language model providers such as OpenAI, Anthropic, Azure, and HuggingFace.

[4] Canonical Ltd. 2026. MAAS: Metal-as-a-Service for Bare-Metal Provisioning. https://maas.io. Accessed: 2026-03-16

[5] Antony Chazapis, Fotis Nikolaidis, Manolis Marazakis, and Angelos Bilas. 2023. Running Kubernetes Workloads on HPC. In High Performance Computing: ISC High Performance 2023 International Workshops, Hamburg, Germany, May 21–25, 2023, Revised Selected Papers (Hamburg, Germany). Springer-Verlag, Berlin, Heidelberg, 181–192. doi:10.1007/978-3-031-40843-4_14

[6] European Commission. 2025. AI Factories, Bridging AI Innovation and Trust. https://digital-strategy.ec.europa.eu/en/policies/ai-factories

[7] ColonyOS Contributors. 2026. ColonyOS: Distributed Meta-Orchestrator. https://github.com/colonyos/colonies

[8] Elia Palme et al. 2025. Model Spinning: FirecREST and CI/CD for hot model availability. https://github.com/swiss-ai/model-spinning. Accessed: 2026-03-15

[9] ETH Zurich, EPFL, and Swiss National Supercomputing Centre (CSCS). 2023. The Swiss AI Initiative. https://www.swiss-ai.org/. National open-science initiative to develop transparent and trustworthy foundation models using large-scale compute on the Alps supercomputer.

[10] European Centre for Medium-Range Weather Forecasts (ECMWF) and Consortium. 2025. WeatherGenerator: Building a European Foundation Model for Weather and Climate. https://weathergenerator.eu/. EU Horizon Europe Project, Grant Agreement No. 101187947; aims to develop an AI-driven Earth system model for improved weather and climate prediction, renewable ene...

[11] Gruntwork. 2026. Terragrunt: Thin Wrapper for Terraform/OpenTofu. https://terragrunt.gruntwork.io. Accessed: 2026-03-16

[12] HashiCorp. 2026. Vault: Secrets Management and Data Protection. https://www.vaultproject.io. Accessed: 2026-03-16

[13] Suhas Kotha, Jacob Mitchell Springer, and Aditi Raghunathan. 2024. Understanding Catastrophic Forgetting in Language Models via Implicit Inference. arXiv:2309.10105 [cs.CL] https://arxiv.org/abs/2309.10105

[14] Pedro Garcia Lopez, Daniel Barcelona Pons, Marcin Copik, Torsten Hoefler, Eduardo Quiñones, Maciej Malawski, Peter Pietzutch, Alberto Marti, Thomas Ohlson Timoudas, and Aleksander Slominski. 2025. AI Factories: It’s time to rethink the Cloud-HPC divide. (2025). arXiv:2509.12849 [cs.DC] https://arxiv.org/abs/2509.12849

[15] Maxime Martinasso, Mark Klein, Benjamin Cumming, Miguel Gila, Felipe Cruz, Alberto Madonna, Manuel Sopena Ballesteros, Sadaf R. Alam, and Thomas C. Schulthess. 2024. Versatile Software-Defined Cluster for HPC Using Cloud Abstractions. Computing in Science & Engineering 26, 3 (2024), 20–29. doi:10.1109/MCSE.2024.3394164

[16] Maxime Martinasso, Mark Klein, and Thomas Schulthess. 2025. Alps, a versatile research infrastructure. In Proceedings of the Cray User Group (CUG ’25). Association for Computing Machinery, New York, NY, USA, 156–165. doi:10.1145/3757348.3757365

[17] Andre Merzky, Mikhail Titov, Matteo Turilli, Ozgur Kilic, Tianle Wang, and Shantenu Jha. 2025. Scalable Runtime Architecture for Data-driven, Hybrid HPC and ML Workflow Applications. (2025). arXiv:2503.13343 [cs.DC] https://arxiv.org/abs/2503.13343

[18] OpenCHAMI Project. [n. d.]. OpenCHAMI: Open-Source Toolkit for HPC and AI Infrastructure Management. https://www.openchami.org. Cloud-native, composable microservice platform for provisioning and managing HPC and AI systems. Accessed: 2026-03-15

[19] OpenTofu Project. 2026. OpenTofu: Open Source Infrastructure as Code Tool. https://opentofu.org. Accessed: 2026-03-16

[20] Elia Palme, Juan Pablo Dorsch, Ali Khosravi, Giovanni Pizzi, Francesco Pagnamenta, Andrea Ceriani, Eirini Koutsaniti, Rafael Sarmiento, Ivano Bonesana, and Alejandro Dabin. 2025. FirecREST v2: lessons learned from redesigning an API for scalable HPC resource access. arXiv:2512.11634 [cs.DC] https://arxiv.org/abs/2512.11634

[21] Bhagyajit Pingua, Adyakanta Sahoo, Meenakshi Kandpal, Deepak Murmu, Jyotirmayee Rautaray, Rabindra Kumar Barik, and Manob Jyoti Saikia. 2025. Medical LLMs: Fine-Tuning vs. Retrieval-Augmented Generation. Bioengineering 12, 7 (2025). doi:10.3390/bioengineering12070687

[22] Public AI. 2025. With Love, From Switzerland: Launching Apertus. https://publicai.co/stories/apertus. Announcement of the Apertus open multilingual large language model and its deployment on the Public AI Inference Utility.

[23] PyTorch Contributors. 2017. Training a Classifier. https://docs.pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html. Official PyTorch tutorial demonstrating image classification on the CIFAR-10 dataset using torchvision and a convolutional neural network.

[24] Stefano Schuppli, Fawzi Mohamed, Henrique Mendonca, Nina Mujkanovic, Elia Palme, Dino Conciatore, Lukas Drescher, Miguel Gila, Pim Witlox, Joost VandeVondele, Maxime Martinasso, Thomas C. Schulthess, and Torsten Hoefler. 2025. Evolving HPC services to enable ML workloads on HPE Cray EX. In Proceedings of the Cray User Group (CUG ’25). Association for Com... doi:10.1145/3757348.3757366

[25] SUSE. 2026. Rancher: Enterprise Kubernetes Management Platform. https://rancher.com/. Accessed: 2026-03-16

[26] SUSE. 2026. SUSE Virtualization (formerly Harvester): Hyperconverged Infrastructure Platform. https://www.suse.com/products/virtualization/. Accessed: 2026-03-16

[27] Swiss AI Initiative. 2025. Apertus-70B-2509. https://huggingface.co/swiss-ai/Apertus-70B-2509. Model card and weights for the Apertus open multilingual large language model (70B parameters).

[28] Tim Trappen, Robert Keßler, Roland Pabel, Viktor Achter, and Stefan Wesner. 2025. Automated Dynamic AI Inference Scaling on HPC-Infrastructure: Integrating Kubernetes, Slurm and vLLM. (Dec. 2025), 13–18. doi:10.1145/3774902.3776632

[29] vLLM Project Contributors. 2025. vLLM Production Stack: Reference Stack for Production LLM Inference. https://github.com/vllm-project/production-stack. Open-source Kubernetes-native reference implementation for scalable LLM inference built on top of vLLM, including routing, KV-cache management, and observability components.

[30] Waldur Project. [n. d.]. Waldur: Open-Source Platform for Managing Hybrid Cloud and HPC Resources. https://waldur.com. Platform providing automation, self-service portals, billing, and resource management for private clouds, public clouds, and HPC infrastructures. Accessed: 2026-03-15

[31] Xiaozhe Yao. 2025. How SwissAI Uses OpenTela for Scalable LLM Serving. https://about.yao.sh/posts/opentela-swissai/. Accessed: 2026-03-15
