arxiv: 2606.09102 · v1 · pith:MZUB3ZL2new · submitted 2026-06-08 · 💻 cs.DC

Concepts in Practice: C++ MPI Bindings for the HPC Ecosystem. From a Standardizable Core to a Composable Interface

Pith reviewed 2026-06-27 15:03 UTC · model grok-4.3

classification 💻 cs.DC
keywords C++ MPI bindings · C++20 concepts · MPI wrappers · HPC ecosystem · KaMPIng · Kokkos · SYCL · GPU integration

The pith

A core layer of refined C++20 concepts delivers a low-level native C++ MPI interface that works directly with STL containers and extends to GPU libraries via adapters.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes the first concrete layered architecture for modern C++ MPI bindings based on previously proposed design principles using C++20 concepts. At its foundation is a core layer that formalizes MPI data buffers through refined concepts, automatically maps standard C++ constructs, supplies non-intrusive customization points, and provides concept-based procedure wrappers. This produces an extensible low-level interface compatible with STL containers and suitable for standardization. The core then supports a higher-level library with pipe-based syntax and lightweight adapters that allow direct MPI use of Kokkos views, Thrust device vectors, and SYCL buffers. Readers would care because it addresses the long-standing absence of official C++ bindings with a general-purpose, performance-preserving design backed by a reference implementation.

Core claim

The paper presents the first concrete realization of design principles for modern C++ MPI bindings in a layered architecture. At the foundation is a core layer of refined C++20 concepts that formalize the MPI standard's notion of data buffers, enable automatic mapping of standard C++ constructs, supply non-intrusive customization points for third-party types, and supply concept-based wrappers for MPI procedures. The result is a low-level native C++ MPI interface that works directly with STL containers, is highly extensible, and lends itself to standardization. Built on this core is KaMPIng-v2, a library offering convenience and memory-safety with composable pipe-based syntax. The core also s

What carries the argument

The core layer of refined C++20 concepts that formalize MPI's notion of data buffers, map C++ constructs automatically, and supply non-intrusive customization points together with concept-based wrappers for MPI procedures.

If this is right

  • The core layer produces a low-level native C++ MPI interface that accepts STL containers directly without additional boilerplate.
  • KaMPIng-v2 supplies memory-safe MPI programming with composable pipe-based syntax inspired by C++ ranges.
  • Lightweight adapters integrate Kokkos views, Thrust device vectors, and SYCL buffers as first-class participants in MPI calls.
  • The design remains self-contained for third-party libraries and supports potential standardization through its use of standard C++ mechanisms.
  • The architecture demonstrates practical viability through a fully functional open-source reference implementation.
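To make the pipe-based composition concrete, the following sketch shows one way adapters that override buffer metadata could be chained with `operator|`, in the style of `std::views`. The adapter names `with_count` and `with_offset` and the view types are assumptions made for illustration; they are not taken from KaMPIng-v2, and no MPI call is involved.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

namespace sketch {

// View that reports an overridden element count for an underlying buffer.
// Buf is an lvalue reference for named buffers and a value for temporaries.
template <class Buf>
struct counted_view {
    Buf base;
    std::size_t n;
    auto data() const { return base.data(); }
    std::size_t size() const { return n; }
};

// View that skips the first k elements (assumes k <= base.size()).
template <class Buf>
struct offset_view {
    Buf base;
    std::size_t k;
    auto data() const { return base.data() + k; }
    std::size_t size() const { return base.size() - k; }
};

// Closure objects produced by the adapter factories (hypothetical names).
struct with_count_closure { std::size_t n; };
struct with_offset_closure { std::size_t k; };

inline with_count_closure with_count(std::size_t n) { return {n}; }
inline with_offset_closure with_offset(std::size_t k) { return {k}; }

// Applying a closure wraps the left-hand side in the matching view.
template <class Buf>
auto operator|(Buf&& buf, with_count_closure c) {
    return counted_view<Buf>{std::forward<Buf>(buf), c.n};
}

template <class Buf>
auto operator|(Buf&& buf, with_offset_closure c) {
    return offset_view<Buf>{std::forward<Buf>(buf), c.k};
}

}  // namespace sketch
```

With these definitions, `v | sketch::with_offset(1) | sketch::with_count(2)` describes two elements starting at the second element of `v`, without copying any data.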

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The non-intrusive customization points could allow existing C++ codebases to adopt the bindings without modifying their own type definitions.
  • Similar concept-based layering might apply to bindings for other distributed communication standards beyond MPI.
  • The separation of core and adapters could simplify maintenance when new performance-portability libraries emerge.
  • Direct integration of GPU containers into MPI could reduce data movement overhead in heterogeneous HPC applications.

Load-bearing premise

Refined C++20 concepts can formalize MPI's notion of data buffers and provide non-intrusive customization points while preserving performance and compatibility with existing MPI implementations.

What would settle it

A test case in which the concept-based wrappers fail to compile or execute correctly with a standard STL container such as std::vector, or where the adapters for SYCL buffers introduce runtime incompatibility with an existing MPI implementation, would falsify the core claim.

Figures

Figures reproduced from arXiv: 2606.09102 by Daniel Brommer, Matthias Schimek, Tim Niklas Uhl.

Figure 1: Representative examples spanning the three layers. (1) The core interface …

Figure 2: The core buffer protocol.
  Surrounding text: … range. For contiguous ranges, mpi::ptr() is defined via std::ranges::data(), sized ranges default mpi::count() to std::ranges::size(), and for ranges whose range_value_t is an MPI builtin type, mpi::type() automatically returns the matching MPI_Datatype. The generalized data buffer concept resolves both shortcomings: the trait level allows any type (GPU containers, MPI_BOTTOM, ar…

Figure 3: Uniform use of MPI communicators across native handles, non-owning …

Figure 4: Concept-based implementation of mpi::allgatherv, illustrating the mapping of MPI C calls to the C++ core interface using the dispatch system.
  Surrounding text: It consists of three components: (1) pipe adapters in the style of std::views that attach or override MPI metadata (datatype, count, per-rank counts and displacements), turning arbitrary objects into data buffers or modifying existing ones, and composing freely with…

Figure 5: Standard library and KaMPIng-v2 buffer adaptors. Examples (1) uses …

Figure 6: Non-blocking irecv: a receive buffer is moved into iresult, which prevents access until .wait() returns it.
  Surrounding text: introduced in KaMPIng [23] and formalized in [2]; KaMPIng-v2 realizes it on top of the buffer concept and view pipeline introduced above. We address memory-safety for non-blocking communication through perfect forwarding of buffer arguments combined with a move-only iresult handle. When a buffer is p…

Figure 7: Ecosystem adapters: Kokkos (left) and SYCL (right).
Original abstract

The official C++ MPI bindings were removed from the standard in 2008, leaving a gap that numerous third-party libraries have attempted to fill. However, existing wrappers typically cover only a limited subset of MPI or target specific use cases, falling short of a general-purpose solution. A recent conceptual paper proposed general design principles for modern C++ bindings based on C++20 concepts, without committing to a concrete interface. We present the first concrete realization of these principles in a layered architecture. At the foundation, we define a core layer: refined C++20 concepts formalizing the MPI standard's notion of data buffers, automatic mapping of standard C++ constructs, non-intrusive customization points for third-party types, and concept-based wrappers for MPI procedures. The result is a low-level native C++ MPI interface that works directly with STL containers, is highly extensible, and lends itself to standardization. Built on this core, we present KaMPIng-v2 -- a C++ MPI library offering the convenience and memory-safety of KaMPIng with composable, pipe-based syntax inspired by C++ ranges for efficient, boilerplate-free MPI programming. Finally, we demonstrate the core layer's broad applicability by designing lightweight adapters for GPU and performance-portability libraries, making the HPC ecosystem a first-class citizen in MPI. Kokkos views, Thrust device vectors, and SYCL buffers can be passed directly to MPI procedures, with adapter logic remaining self-contained. All contributions are backed by a fully functional open-source reference implementation, demonstrating the practical viability of the proposed design.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper presents the first concrete realization of previously proposed design principles for modern C++ MPI bindings. It introduces a layered architecture whose core layer uses refined C++20 concepts to formalize MPI data buffers, support automatic mapping of STL constructs, and provide non-intrusive customization points; this core underpins KaMPIng-v2 (with pipe-based, ranges-inspired syntax) and lightweight adapters that allow direct use of Kokkos views, Thrust device vectors, and SYCL buffers with MPI procedures. All elements are backed by a fully functional open-source reference implementation.

Significance. If the design and implementation hold, the work supplies a practical, extensible, and potentially standardizable low-level C++ MPI interface that directly addresses the gap left by the 2008 removal of official bindings while integrating the HPC ecosystem (including GPU and performance-portability libraries) as first-class citizens. Explicit strengths include the open-source reference implementation, the working adapters for Kokkos/Thrust/SYCL, and the demonstration that C++20 concepts can serve as non-intrusive customization points without altering existing MPI implementations.

major comments (1)
  1. Abstract: the claim that the core layer 'preserves performance and compatibility' while using refined C++20 concepts for buffer handling is load-bearing for the stated practical viability, yet the manuscript supplies no benchmarks, timing data, or comparisons against existing wrappers or raw MPI; this prevents assessment of whether the concept-based dispatch and adapters incur measurable overhead.
minor comments (2)
  1. The manuscript would benefit from an explicit related-work section (or expanded discussion in the introduction) that systematically contrasts the proposed core against the 'numerous third-party libraries' mentioned, citing their coverage limitations.
  2. Notation for the concept definitions and customization points should be clarified with a small table or diagram early in the core-layer description to aid readers unfamiliar with C++20 concepts.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive review and positive assessment of the work's significance. We address the major comment below.

  1. Referee: [—] Abstract: the claim that the core layer 'preserves performance and compatibility' while using refined C++20 concepts for buffer handling is load-bearing for the stated practical viability, yet the manuscript supplies no benchmarks, timing data, or comparisons against existing wrappers or raw MPI; this prevents assessment of whether the concept-based dispatch and adapters incur measurable overhead.

    Authors: We agree that the absence of explicit benchmarks leaves the performance claim unquantified. The core layer is intentionally designed around C++20 concepts to enable zero-overhead static dispatch (resolved entirely at compile time) and thin, non-intrusive adapters that forward directly to the underlying MPI calls without additional copies or runtime indirection. This mirrors the zero-cost abstraction principle used in the STL and ranges library. Nevertheless, to allow readers to verify the claim, we will add a dedicated evaluation section containing microbenchmarks that compare the core layer and the Kokkos/Thrust/SYCL adapters against raw MPI and selected third-party wrappers. The revised manuscript will therefore include timing data and overhead measurements. revision: yes

Circularity Check

0 steps flagged

No significant circularity

The paper presents a concrete layered C++ design and reference implementation for MPI bindings, building on a prior conceptual paper's principles without committing to an interface. No equations, fitted parameters, or predictions are involved; the work supplies working code, adapters for Kokkos/Thrust/SYCL, and non-intrusive customization points. The derivation chain consists of design choices instantiated directly in the open-source implementation rather than reducing to self-citation, self-definition, or renamed inputs. The cited conceptual paper is treated as external motivation, not a load-bearing uniqueness theorem or ansatz smuggled in. This is a standard non-circular design/implementation paper.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The design rests on the assumption that C++20 concepts are suitable for formalizing MPI operations; no free parameters or invented entities are introduced.

axioms (1)
  • domain assumption C++20 concepts can be refined to formalize MPI data buffers and procedures while remaining compatible with the MPI standard
    The abstract states that the core layer uses refined C++20 concepts to formalize MPI notions.

pith-pipeline@v0.9.1-grok · 5827 in / 1249 out tokens · 19887 ms · 2026-06-27T15:03:24.879567+00:00 · methodology

Reference graph

Works this paper leans on

23 extracted references · 7 canonical work pages · 1 internal anchor

  [1] Avans, C.N., Ciesko, J., Pearson, C., Suggs, E.D., Olivier, S.L., Skjellum, A.: Performance insights into supporting Kokkos views in the Kokkos Comm MPI library. In: IEEE CLUSTER Workshops. pp. 186–187 (2024). https://doi.org/10.1109/CLUSTERWorkshops61563.2024.00051, https://github.com/kokkos/kokkos-comm
  [2] Avans, C.N., Correa, A.A., Ghosh, S., Schimek, M., Schuchart, J., Skjellum, A., Suggs, E.D., Uhl, T.N.: Concepts for designing modern C++ interfaces for MPI. In: EuroMPI. pp. 165–183. Lecture Notes in Computer Science, Springer (2025). https://doi.org/10.1007/978-3-032-07194-1_10
  [3] Bauke, H.: MPL - a message passing library (2015), https://github.com/rabauke/mpl
  [4] Bell, N., Hoberock, J.: Thrust: A productivity-oriented library for CUDA. In: GPU computing gems Jade edition, pp. 359–371. Elsevier (2012)
  [5] Beni, M.S., Crisci, L., Cosenza, B.: EMPI: Enhanced Message Passing Interface in Modern C++. In: IEEE/ACM CCGrid. pp. 141–153 (2023). https://doi.org/10.1109/CCGrid57682.2023.00023
  [6] Byrne, S., Wilcox, L.C., Churavy, V.: MPI.jl: Julia bindings for the message passing interface. Proceedings of the JuliaCon Conferences 1(1), 68 (2021). https://doi.org/10.21105/jcon.00068
  [7] Childers, W., Dimov, P., Katz, D., Revzin, B., Sutton, A., Vali, F., Vandevoorde, D.: Reflection for C++26. Standard Proposal P2996R13, ISO/IEC JTC1/SC22/WG21 (2025), https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2025/p2996r13.html
  [8] Correa, A.A.: B-MPI3 (2018), https://github.com/LLNL/b-mpi3
  [9] Demiralp, A.C., Martin, P., Sakic, N., Krüger, M., Gerrits, T.: A C++20 interface for MPI 4.0. CoRR abs/2306.11840 (2023)
  [10] Edwards, H.C., Trott, C.R., Sunderland, D.: Kokkos: Enabling manycore performance portability through polymorphic memory access patterns. Journal of Parallel and Distributed Computing pp. 3202–3216 (2014). https://doi.org/10.1016/j.jpdc.2014.07.003, domain-Specific Languages and High-Level Frameworks for High-Performance Computing Con...
  [11] Ghosh, S., Alsobrooks, C., Rüfenacht, M., Skjellum, A., Bangalore, P.V., Lumsdaine, A.: Towards modern C++ language support for MPI. In: 2021 Workshop on Exascale MPI (ExaMPI). pp. 27–35. IEEE (2021)
  [12] Gregor, D., Troyer, M.: Boost.MPI (2005–2007), https://www.boost.org/doc/libs/1_84_0/doc/html/mpi.html, version 1.84
  [13] Message Passing Interface Forum: MPI: A Message-Passing Interface Standard Version 5.0 (Jun 2025), https://www.mpi-forum.org/docs/mpi-5.0/mpi50-report.pdf
  [14] Niebler, E., Carter, C., Di Bella, C.: The one ranges proposal. Standard Proposal P0896R4, ISO/IEC JTC1/SC22/WG21 (Nov 2018), https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0896r4.pdf
  [15] NVIDIA: Thrust: The C++ parallel algorithms library (2025), https://github.com/NVIDIA/cccl, part of the CUDA Core Compute Libraries (CCCL)
  [16] Pellegrini, S., Prodan, R., Fahringer, T.: A lightweight C++ interface to MPI. In: Stotzka, R., Schiffers, M., Cotronis, Y. (eds.) Proc. of the 20th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP). pp. 3–10. IEEE (2012). https://doi.org/10.1109/PDP.2012.42
  [17] Polukhin, A.: Boost.pfr (2016), https://www.boost.org/doc/libs/1_84_0/doc/html/boost_pfr.html
  [18] Skjellum, A., Wooley, D.G., Lu, Z., Wolf, M., Bangalore, P.V., Lumsdaine, A., Squyres, J.M., McCandless, B.: Object-oriented analysis and design of the message passing interface. Concurrency and Computation: Practice and Experience 13(4), 245–292 (2001). https://doi.org/10.1002/cpe.556, https://onlinelibrary.wiley.com/doi/abs/10.1002/cpe.556
  [19] Steinbusch, B., Gaspar, A., Brown, J.: rsmpi - MPI bindings for rust (2015), https://github.com/rsmpi/rsmpi
  [20] Stroustrup, B.: The design and evolution of C++. Addison-Wesley (1994)
  [21] Stroustrup, B., Sutter, H., et al.: C++ core guidelines (2024), https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines.html
  [22] The Khronos SYCL Working Group: SYCL 2020 specification. Specification Revision 11, The Khronos Group Inc. (2025), https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html
  [23] Uhl, T.N., Schimek, M., Hübner, L., Hespe, D., Kurpicz, F., Seemaier, D., Stelz, C., Sanders, P.: KaMPIng: Flexible and (near) zero-overhead C++ bindings for MPI. In: Intl. Conf. for High Performance Computing, Networking, Storage, and Analysis (SC). IEEE (2024). https://doi.org/10.1109/SC41406.2024.00050