Implementing True MPI Sessions and Evaluating MPI Initialization Scalability
Pith reviewed 2026-05-07 14:03 UTC · model grok-4.3
The pith
True MPI Sessions implemented via MPICH refactoring remove the MPI_COMM_WORLD dependency and improve initialization scalability
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
True MPI Sessions, achieved through architectural refactoring that eliminates internal reliance on a global world communicator, allow explicit hierarchical communicator designs and produce measurable scalability gains in MPI initialization.
What carries the argument
The Sessions model, which builds communicators from process sets without depending on MPI_COMM_WORLD.
If this is right
- Applications written with the Sessions API can initialize without the overhead of constructing a global communicator.
- Hierarchical communicator layouts become usable without global-state costs.
- The traditional world model stays available for backward compatibility.
- Initialization time grows more slowly with process count in the Sessions path.
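To make the scaling argument concrete, here is a minimal sketch that counts how many communicator slots one process must track in a flat world layout versus a two-level hierarchy. The two-level layout (one node-local communicator plus one communicator of node leaders), the function names, and the `ppn` value are assumptions for illustration only; the abstract does not specify the hierarchy the authors evaluated, and this is not their measurement.

```python
def flat_slots(nprocs: int) -> int:
    """Slots one rank tracks when a single communicator spans every process."""
    return nprocs


def two_level_slots(nprocs: int, ppn: int) -> int:
    """Worst-case slots one rank tracks in an assumed two-level layout.

    Every rank joins a node-local communicator of `ppn` ranks. One leader per
    node also joins a communicator containing all node leaders, so a leader
    is the worst case: it holds slots in both communicators.
    """
    if ppn <= 0 or nprocs % ppn != 0:
        raise ValueError("this sketch needs nprocs to be a positive multiple of ppn")
    nodes = nprocs // ppn
    return ppn + nodes


if __name__ == "__main__":
    # ppn=64 is a hypothetical processes-per-node value, not taken from the paper.
    for n in (1_024, 65_536, 1_048_576):
        print(n, flat_slots(n), two_level_slots(n, ppn=64))
```

Under these assumptions the flat count grows linearly with the process count, while the hierarchical count grows with the number of nodes plus a constant. That is one way to read the claim that initialization cost can grow more slowly on the Sessions path.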
Where Pith is reading between the lines
- Other MPI implementations may need similar internal decoupling to realize the full intent of the Sessions feature.
- Reduced global state could simplify adding dynamic process management or fault tolerance in future MPI versions.
- The work suggests that minimizing shared global data structures is a general route to better scaling in parallel runtimes.
Load-bearing premise
The refactoring preserves full correctness and performance for all existing applications that continue to use the traditional world communicator model.
What would settle it
A direct measurement of MPI initialization time versus process count, comparing the traditional world-communicator path against the true Sessions path on systems with thousands to millions of processes.
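One way to summarize such a measurement is the slope of initialization time against process count on a log-log scale. The sketch below estimates that slope with an ordinary least-squares fit. The function name is hypothetical, and the timing lists are generated from closed-form curves purely to exercise the code; they are placeholders, not measurements from the paper.

```python
import math


def scaling_exponent(procs, seconds):
    """Least-squares slope of log(seconds) versus log(procs).

    A slope near 1 means time grows linearly with process count;
    a smaller slope means initialization scales better.
    """
    if len(procs) != len(seconds) or len(set(procs)) < 2:
        raise ValueError("need matching lists with at least two distinct process counts")
    xs = [math.log(p) for p in procs]
    ys = [math.log(s) for s in seconds]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    procs = [1_024, 4_096, 16_384, 65_536]
    # Placeholder curves (NOT measured data): one linear, one square-root.
    linear_like = [1e-4 * p for p in procs]
    sqrt_like = [1e-2 * math.sqrt(p) for p in procs]
    print(round(scaling_exponent(procs, linear_like), 3))
    print(round(scaling_exponent(procs, sqrt_like), 3))
```

Applied to real timings from both initialization paths on the same system, a clearly lower exponent for the Sessions path would support the claim, and comparable exponents would undercut it.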
Original abstract
Sessions is one of the major features introduced in the MPI-4 standard. It offers an alternative to the traditional world communicator model by allowing applications to construct communicators from process sets, thereby eliminating the dependency on MPI_COMM_WORLD. The Sessions model was proposed as a more scalable solution for exascale systems, where MPI_COMM_WORLD was viewed as a potential scalability bottleneck. However, supporting Sessions is a significant challenge for established codebases like MPICH due to the deep integration of the world model in traditional MPI implementations. Although MPICH added support for the MPI-4 standard upon its release, it still internally relied on a global world communicator. This approach enabled applications written using the Sessions model to function, but it did not fulfill the full design intent of Sessions, which was meant to decouple MPI from MPI_COMM_WORLD. We describe the MPICH effort to support true MPI Sessions, including a major internal refactoring. We describe the architectural changes required to support true Sessions and evaluate the resulting implementation's scalability. Our results demonstrate that true Sessions can offer significant scalability benefits by adopting explicit hierarchical designs.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper describes MPICH's implementation of true MPI Sessions via a major internal refactoring that removes the global MPI_COMM_WORLD dependency, allowing communicators to be built from process sets as intended by the MPI-4 standard. It details the required architectural changes and presents an evaluation of MPI initialization scalability, claiming that explicit hierarchical designs in the true Sessions model yield significant scalability benefits over prior 'fake Sessions' approaches that still relied on the world communicator.
Significance. If the reported scalability gains hold under broader testing, this work would be a meaningful contribution to exascale MPI design by validating the Sessions model as a practical, decoupled alternative to the traditional world-communicator approach. It provides concrete evidence from a production implementation that could inform both MPI library developers and application writers targeting large-scale systems.
major comments (1)
- [Evaluation section] The scalability results and claims focus exclusively on the new Sessions initialization path and hierarchical designs. No data, test-suite results, or performance comparisons are presented for legacy applications that continue to use MPI_COMM_WORLD and the traditional communicator model after the refactoring. This omission is load-bearing because the central claim, that the changes deliver true Sessions without side effects, requires evidence that the legacy paths retain full correctness and incur no additional overhead from the removal of global assumptions.
minor comments (2)
- [Architectural changes] The description of the refactoring would benefit from a clearer before/after diagram or pseudocode showing how global state was eliminated and how process-set-based communicator construction now operates.
- Ensure that all reported scalability numbers include the exact process counts, hardware configuration, and comparison baseline (e.g., pre-refactoring MPICH) so readers can reproduce the claimed benefits.
Simulated Author's Rebuttal
We thank the referee for the thoughtful review and the recommendation for major revision. The point raised about evaluating legacy paths is valid and we will strengthen the manuscript accordingly.
- Referee: [Evaluation section] The scalability results and claims focus exclusively on the new Sessions initialization path and hierarchical designs. No data, test-suite results, or performance comparisons are presented for legacy applications that continue to use MPI_COMM_WORLD and the traditional communicator model after the refactoring. This omission is load-bearing because the central claim, that the changes deliver true Sessions without side effects, requires evidence that the legacy paths retain full correctness and incur no additional overhead from the removal of global assumptions.
Authors: We agree that demonstrating the absence of side effects on legacy code is essential to support our claims. Although the primary contribution of the paper is the true Sessions implementation and its scalability advantages, the refactoring was designed to preserve full backward compatibility. In the revised version we will add a new subsection to the Evaluation section that reports: (1) results from the MPICH test suite confirming that all legacy MPI_Init, communicator creation, and collective operations continue to pass without modification, and (2) direct performance comparisons of MPI initialization latency for traditional MPI_COMM_WORLD-based codes before and after the refactoring, showing that the overhead remains within measurement noise.
Revision: yes
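The phrase "within measurement noise" needs a stated criterion to be checkable. The sketch below shows one possible criterion, assumed here and not taken from the rebuttal: the difference in mean initialization latency must not exceed `k` combined standard errors. The function name, the default `k`, and the sample values are hypothetical placeholders, not data from the paper.

```python
import math
import statistics


def within_noise(before, after, k=2.0):
    """Return True if mean(after) differs from mean(before) by at most
    k combined standard errors (an assumed criterion, not the authors')."""
    if len(before) < 2 or len(after) < 2:
        raise ValueError("need at least two samples per configuration")
    diff = statistics.mean(after) - statistics.mean(before)
    std_err = math.sqrt(
        statistics.variance(before) / len(before)
        + statistics.variance(after) / len(after)
    )
    return abs(diff) <= k * std_err


if __name__ == "__main__":
    # Placeholder latencies in seconds (NOT measured data).
    pre_refactor = [1.00, 1.02, 0.98, 1.01, 0.99]
    post_refactor = [1.01, 0.99, 1.00, 1.02, 0.98]
    regressed = [1.20, 1.22, 1.18, 1.21, 1.19]
    print(within_noise(pre_refactor, post_refactor))
    print(within_noise(pre_refactor, regressed))
```

Reporting the raw samples, or at least the mean, spread, and repetition count for each configuration, would let readers apply whichever criterion they prefer.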
Circularity Check
No circularity; implementation description and empirical evaluation are self-contained
The paper describes an engineering refactoring of MPICH to remove internal reliance on a global MPI_COMM_WORLD communicator and enable true MPI-4 Sessions. It reports architectural changes and scalability measurements from the resulting implementation. No equations, derivations, fitted parameters, or predictions appear. No self-citations are invoked as load-bearing premises for any result. The scalability claim rests on direct evaluation of the new code path rather than any reduction to the paper's own inputs by construction. This matches the default non-circular case for implementation-and-measurement papers.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: MPI-4 Sessions semantics can be realized without a global world communicator.