arxiv: 2504.10013 · v2 · submitted 2025-04-14 · 💻 cs.DC

Training LLMs on HPC Systems: Best Practices from the OpenGPT-X Project

Pith reviewed 2026-05-22 21:00 UTC · model grok-4.3

classification 💻 cs.DC
keywords LLM training · HPC systems · 3D parallelism · flash attention · training throughput · transformer model · scalability

The pith

Throughput measurements show how 3D parallelism configurations and flash attention affect training speed for 7B models on HPC systems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reports measured throughput numbers collected while training a 7-billion-parameter transformer under multiple combinations of 3D parallelism. It also quantifies the change in speed when flash attention is enabled. These numbers matter because they supply concrete data points that can inform configuration choices when similar models are trained on large computing clusters. The report further outlines the supporting software stack, profiling methods, and day-to-day operational issues encountered during the runs.

Core claim

Measured throughput data across varied 3D parallelism settings during training of a 7B-parameter model, together with the performance impact of flash attention, constitute the central empirical contribution.
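Throughput in this setting is usually expressed as tokens processed per second. A minimal sketch of that arithmetic follows; the function names and the example batch size, sequence length, step time, and GPU count are illustrative assumptions, not values taken from the paper.

```python
def tokens_per_second(global_batch_size: int, sequence_length: int, step_time_s: float) -> float:
    """Tokens processed per second for one optimizer step.

    global_batch_size counts sequences per step across all data-parallel replicas.
    """
    if step_time_s <= 0:
        raise ValueError("step_time_s must be positive")
    return global_batch_size * sequence_length / step_time_s


def tokens_per_second_per_gpu(
    global_batch_size: int, sequence_length: int, step_time_s: float, num_gpus: int
) -> float:
    """Normalize throughput by GPU count so runs of different sizes can be compared."""
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    return tokens_per_second(global_batch_size, sequence_length, step_time_s) / num_gpus


# Placeholder numbers: 512 sequences of 4096 tokens per step, 8 s per step, 64 GPUs.
total = tokens_per_second(512, 4096, 8.0)
per_gpu = tokens_per_second_per_gpu(512, 4096, 8.0, 64)
```

Normalizing per GPU is one common way to compare configurations that use different numbers of devices; the paper may report its numbers in a different unit.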

What carries the argument

3D parallelism (the combination of data, tensor, and pipeline parallelism) augmented by the flash attention optimization.
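In a 3D-parallel run the data, tensor, and pipeline degrees multiply to the total number of GPUs, so the search space of configurations is the set of factorizations of that number. The sketch below enumerates them under two illustrative constraints that are assumptions of this example rather than rules stated in the paper: tensor parallelism stays within one node, and the pipeline degree divides the layer count. The names `ParallelLayout` and `layouts` are hypothetical.

```python
from typing import Iterator, NamedTuple


class ParallelLayout(NamedTuple):
    data: int      # number of model replicas (data-parallel degree)
    tensor: int    # GPUs that share each layer's weight matrices
    pipeline: int  # number of consecutive layer stages


def layouts(num_gpus: int, gpus_per_node: int, num_layers: int) -> Iterator[ParallelLayout]:
    """Yield every (data, tensor, pipeline) split whose product equals num_gpus.

    Assumed constraints (illustrative only): the tensor degree divides
    gpus_per_node, and the pipeline degree divides num_layers so that
    every stage holds the same number of layers.
    """
    for tensor in range(1, gpus_per_node + 1):
        if num_gpus % tensor or gpus_per_node % tensor:
            continue
        remaining = num_gpus // tensor
        for pipeline in range(1, remaining + 1):
            if remaining % pipeline or num_layers % pipeline:
                continue
            yield ParallelLayout(remaining // pipeline, tensor, pipeline)


# Placeholder cluster: 8 GPUs, 4 per node, a 32-layer model.
candidates = list(layouts(num_gpus=8, gpus_per_node=4, num_layers=32))
```

With these placeholder inputs the generator yields nine candidate layouts. Each of them would still have to be benchmarked, since the enumeration says nothing about which one is fastest.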

If this is right

  • Particular mixes of the three parallelism dimensions deliver higher tokens processed per second than others.
  • Enabling flash attention reduces memory use and raises overall training throughput.
  • Systematic profiling during runs identifies the dominant bottlenecks in the training pipeline.
  • Careful management of the software environment and job scheduler reduces lost compute time.
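The memory saving attributed to flash attention comes from processing keys and values block by block with a running softmax, so the full score matrix is never stored. The following is a minimal single-query sketch of that online-softmax idea in plain Python. It is not the fused GPU kernel used in training, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def naive_attention(q, keys, values) -> List[float]:
    """Reference: compute all scores first, then one softmax over them."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * _dot(q, k) for k in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(len(values[0]))]


def tiled_attention(q, keys, values, block_size: int = 2) -> List[float]:
    """Same result, but only one block of scores exists at any time."""
    if block_size < 1:
        raise ValueError("block_size must be at least 1")
    scale = 1.0 / math.sqrt(len(q))
    running_max = -math.inf
    running_sum = 0.0
    acc = [0.0] * len(values[0])
    for start in range(0, len(keys), block_size):
        block_k = keys[start:start + block_size]
        block_v = values[start:start + block_size]
        scores = [scale * _dot(q, k) for k in block_k]
        new_max = max(running_max, max(scores))
        # Rescale earlier partial results to the new maximum (0.0 on the first block).
        correction = math.exp(running_max - new_max)
        running_sum *= correction
        acc = [a * correction for a in acc]
        for s, v in zip(scores, block_v):
            w = math.exp(s - new_max)
            running_sum += w
            acc = [a + w * x for a, x in zip(acc, v)]
        running_max = new_max
    return [a / running_sum for a in acc]
```

Because the rescaling step keeps the partial sums consistent, the tiled version matches the reference up to floating-point rounding for any block size.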

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same measurement approach could be applied to models larger than 7B parameters to test whether the same configuration trends hold.
  • Repeating the benchmarks on a different hardware generation would reveal how portable the observed optimal settings are.
  • The reported numbers supply a baseline for estimating total compute cost when planning multilingual model training campaigns.

Load-bearing premise

The throughput values recorded on this hardware and model size will remain representative on other hardware or with models of different sizes.

What would settle it

Repeating the exact same parallelism configurations on a second HPC cluster and obtaining throughput rankings that differ from those reported here.
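One way to make "rankings that differ" concrete is a rank correlation between the two clusters' throughput tables. The sketch below computes Kendall's tau-a over the configurations measured on both systems; the function name and the configuration labels and throughput values in the example are hypothetical placeholders, not data from the paper.

```python
from itertools import combinations
from typing import Dict


def kendall_tau(cluster_a: Dict[str, float], cluster_b: Dict[str, float]) -> float:
    """Kendall's tau-a between two throughput tables keyed by configuration label.

    Returns 1.0 for identical rankings and -1.0 for fully reversed rankings.
    Tied pairs count as neither concordant nor discordant.
    """
    shared = sorted(set(cluster_a) & set(cluster_b))
    if len(shared) < 2:
        raise ValueError("need at least two configurations measured on both clusters")
    concordant = discordant = 0
    for x, y in combinations(shared, 2):
        product = (cluster_a[x] - cluster_a[y]) * (cluster_b[x] - cluster_b[y])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    pairs = len(shared) * (len(shared) - 1) // 2
    return (concordant - discordant) / pairs


# Placeholder throughput values in arbitrary units.
first = {"dp8-tp1-pp1": 3.0, "dp4-tp2-pp1": 2.0, "dp2-tp2-pp2": 1.0}
second = {"dp8-tp1-pp1": 3.1, "dp4-tp2-pp1": 1.2, "dp2-tp2-pp2": 2.4}
agreement = kendall_tau(first, second)
```

A value well below 1.0 on the second cluster would indicate that the preferred configurations do not carry over unchanged.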

Original abstract

The training of large language models (LLMs) requires substantial computational resources, complex software stacks, and carefully designed workflows to achieve scalability and efficiency. This report presents best practices and insights gained from the OpenGPT-X project, a German initiative focused on developing open, multilingual LLMs optimized for European languages. We detail the use of high-performance computing (HPC) systems, primarily JUWELS Booster at JSC, for training Teuken-7B, a 7-billion-parameter transformer model. The report covers system architecture, training infrastructure, software choices, profiling and benchmarking tools, as well as engineering and operational challenges. It includes measured throughput data of various configurations of 3D parallelism during training and the impact of features such as flash attention.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript presents best practices from the OpenGPT-X project for training LLMs on HPC systems, focusing on the training of the Teuken-7B model on JUWELS Booster. It details the infrastructure, software stack, 3D parallelism strategies, flash attention implementation, and includes specific measured throughput numbers for different parallelism configurations and feature impacts.

Significance. The provision of real-world measured data from a production-scale training run is a strength, offering practical insights into scaling LLM training on European HPC resources. This can inform similar projects, especially those emphasizing open and multilingual models. The report's value lies in its concrete examples rather than theoretical derivations.

major comments (2)
  1. [Section on 3D Parallelism Configurations] The throughput measurements for different 3D parallelism setups are presented without error bars or details on experimental repetitions. This weakens the ability to confidently recommend specific configurations as best practices based on the data.
  2. [Discussion of Best Practices] The generalization of the reported practices beyond the JUWELS Booster system and 7B model size is not addressed. Since the central claim is to provide best practices, evidence or discussion on how these would apply to other hardware or scales is needed to support the claim.
minor comments (1)
  1. [Abstract] Consider adding one or two key quantitative results to the abstract to better convey the paper's empirical contributions.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below and have revised the manuscript accordingly to strengthen the presentation of our results and claims.

Point-by-point responses
  1. Referee: The throughput measurements for different 3D parallelism setups are presented without error bars or details on experimental repetitions. This weakens the ability to confidently recommend specific configurations as best practices based on the data.

    Authors: We agree that additional details on measurement methodology would improve confidence in the reported numbers. The throughput values were collected during extended stable phases of production training runs on JUWELS Booster, where full repeated trials with statistical error bars are resource-prohibitive. In the revised manuscript we have added a paragraph describing the measurement protocol (duration of each stable window, number of independent configuration tests performed where feasible, and observed run-to-run variability). We also note the practical constraints of production-scale experiments. This provides necessary context while preserving the original data. revision: partial

  2. Referee: The generalization of the reported practices beyond the JUWELS Booster system and 7B model size is not addressed. Since the central claim is to provide best practices, evidence or discussion on how these would apply to other hardware or scales is needed to support the claim.

    Authors: We acknowledge that the manuscript would benefit from explicit discussion of transferability. While the concrete numbers are tied to JUWELS Booster and the 7B scale, the underlying engineering choices (3D parallelism decomposition, software stack selection, and Flash Attention integration) rest on general principles of communication-computation overlap and memory hierarchy optimization. In the revised version we have added a dedicated subsection that discusses applicability to other European HPC systems with different interconnects, to larger model sizes, and to alternative accelerator architectures, including both the transferable elements and the system-specific caveats. revision: yes
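The run-to-run variability mentioned in the first response can be reported with a few summary statistics even when only a handful of stable windows are available. A minimal sketch, assuming each sample is the mean throughput of one stable window; the function name and the sample values are hypothetical and are not measurements from the paper.

```python
import math
import statistics
from typing import Dict, Sequence


def summarize_throughput(samples: Sequence[float]) -> Dict[str, float]:
    """Summarize repeated throughput measurements of one configuration.

    Returns the mean, the sample standard deviation, the standard error of
    the mean, and the coefficient of variation in percent.
    """
    if len(samples) < 2:
        raise ValueError("at least two samples are needed to estimate variability")
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples)  # sample standard deviation (n - 1)
    return {
        "n": float(len(samples)),
        "mean": mean,
        "stdev": stdev,
        "sem": stdev / math.sqrt(len(samples)),
        "cv_percent": 100.0 * stdev / mean,
    }


# Placeholder values in arbitrary throughput units.
summary = summarize_throughput([100.0, 102.0, 98.0, 100.0])
```

Two configurations whose means differ by less than a few standard errors are hard to rank with confidence, which is the concern behind the referee's request for error bars.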

Circularity Check

0 steps flagged

No circularity detected: the manuscript is a purely descriptive empirical report of measurements and project practices.

Full rationale

The manuscript is a project report presenting measured throughput numbers for 3D parallelism configurations and flash-attention impact on Teuken-7B running on JUWELS Booster, along with descriptions of system architecture, software choices, and operational challenges. No derivations, equations, fitted parameters renamed as predictions, or self-citation chains appear in the provided text. All content consists of direct reporting of project-specific data and choices with no load-bearing claims that reduce to inputs by construction. This matches the default expectation for non-circular empirical reports; the generalization limitation noted by the reader is a correctness/scope issue, not circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No mathematical model, free parameters, or invented entities; the work is an applied engineering report on infrastructure choices and measurements.

pith-pipeline@v0.9.0 · 5664 in / 926 out tokens · 39958 ms · 2026-05-22T21:00:44.180964+00:00 · methodology

