pith. sign in

arxiv: 2604.26624 · v1 · submitted 2026-04-29 · 💻 cs.DC

DMRlib: Easy-coding and Efficient Resource Management for Job Malleability

Pith reviewed 2026-05-07 12:41 UTC · model grok-4.3

classification 💻 cs.DC
keywords process malleabilityMPI libraryresource managementjob schedulingelastic computingdata center throughput
0
0 comments X

The pith

DMRlib gives MPI programs a simple way to resize jobs dynamically and raises overall data center throughput by more than 3x.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents DMRlib as a library that brings the resource-efficiency gains of process malleability to parallel scientific codes while keeping the coding effort low. It supplies an MPI-style interface plus a set of ready-made communication patterns so developers can add elasticity without rewriting large parts of their applications. Experiments test both rigid and moldable job submission modes and track metrics including resource allocation rate, jobs completed per second, and energy use. A sympathetic reader would care because conventional static allocation wastes machine time while malleability promises higher productivity once the extra programming cost is removed.

Core claim

DMRlib supplies the performance advantages of process malleability through a minimalist MPI-like syntax and predefined communication patterns, and the reported runs show that elastic workloads can raise global throughput by a factor higher than three relative to traditional non-malleable job streams.

What carries the argument

DMRlib library offering predefined communication patterns together with a minimalist MPI-like syntax for building malleable applications.

If this is right

  • Jobs that can grow or shrink match available resources more closely and raise the overall allocation rate.
  • Malleable workloads complete more jobs per second than fixed-size workloads under the same machine capacity.
  • Dynamic resizing reduces wasted energy by keeping fewer resources idle between job phases.
  • Both rigid and moldable submission modes show gains, with the moldable mode benefiting most from the added flexibility.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Codes with communication patterns not covered by the library may still require substantial extra work, narrowing the set of applications that see immediate benefit.
  • Combining DMRlib with cloud-style schedulers could carry the measured throughput improvements into multi-tenant environments.
  • Wider adoption would likely push data-center operators to support frequent, low-cost job resizing instead of long-lived static partitions.

Load-bearing premise

The library's predefined communication patterns are sufficient for a broad set of scientific applications without adding noticeable performance overhead or extra development work beyond the claimed simple syntax.

What would settle it

Take several production scientific codes whose communication patterns fall outside the library's predefined set and measure whether development time increases substantially or runtime efficiency drops compared with their non-malleable versions.

Figures

Figures reproduced from arXiv: 2604.26624 by Antonio J. Pe\~na, Enrique S. Quintana-Ort\'i, Rafael Mayo, Sergio Iserte.

Figure 1
Figure 1. Figure 1: Execution environment of a malleable application using DMRlib. view at source ↗
Figure 2
Figure 2. Figure 2: Example of an expansion from 5 to 10 processes. view at source ↗
Figure 3
Figure 3. Figure 3: Gain difference of each application and a 10% threshold (thick view at source ↗
Figure 4
Figure 4. Figure 4: Comparison of the four types of workloads. The lines show the view at source ↗
Figure 6
Figure 6. Figure 6: Execution and waiting times per job in the 1,000-job workload view at source ↗
Figure 7
Figure 7. Figure 7: Time difference between the 1,000-job pure moldable and the view at source ↗
Figure 10
Figure 10. Figure 10: Energy needed to complete a workload compared to the fixed view at source ↗
read the original abstract

Process malleability has proved to have a highly positive impact on the resource utilization and global productivity in data centers compared with the conventional static resource allocation policy. However, the non-negligible additional development effort this solution imposes has constrained its adoption by the scientific programming community. In this work, we present DMRlib, a library designed to offer the global advantages of process malleability while providing a minimalist MPI-like syntax. The library includes a series of predefined communication patterns that greatly ease the development of malleable applications. In addition, we deploy several scenarios to demonstrate the positive impact of process malleability featuring different scalability patterns. Concretely, we study two job submission modes (rigid and moldable) in order to identify the best-case scenarios for malleability using metrics such as resource allocation rate, completed jobs per second, and energy consumption. The experiments prove that our elastic approach may improve global throughput by a factor higher than 3x compared to the traditional workloads of non-malleable jobs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The manuscript presents DMRlib, a library offering a minimalist MPI-like syntax and predefined communication patterns to reduce the development effort for process-malleable applications. It evaluates malleability benefits under rigid and moldable job submission modes using metrics such as resource allocation rate, completed jobs per second, and energy consumption, claiming that the elastic approach improves global throughput by more than a factor of 3x relative to traditional non-malleable workloads.

Significance. If the empirical claims are substantiated with complete experimental details, DMRlib could meaningfully lower barriers to adopting malleable computing in scientific HPC workloads, improving data-center utilization and productivity. The emphasis on easy-to-use patterns addresses a documented adoption obstacle.

major comments (1)
  1. [Abstract] Abstract: the central claim that 'the experiments prove that our elastic approach may improve global throughput by a factor higher than 3x' is load-bearing yet unsupported by any description of the experimental setup, workload characteristics, application mapping to the predefined patterns, baselines, controls, number of runs, or statistical significance. This prevents verification of whether the reported gains generalize or depend on specially chosen cases.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the detailed review and the opportunity to clarify the presentation of our experimental results. We address the single major comment below and will incorporate revisions to improve accessibility of the key claims.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the central claim that 'the experiments prove that our elastic approach may improve global throughput by a factor higher than 3x' is load-bearing yet unsupported by any description of the experimental setup, workload characteristics, application mapping to the predefined patterns, baselines, controls, number of runs, or statistical significance. This prevents verification of whether the reported gains generalize or depend on specially chosen cases.

    Authors: We agree that the abstract's brevity omits sufficient context for the central claim. The full manuscript (Section 4) describes the experimental setup in detail: a 128-node cluster testbed, synthetic workloads exercising strong- and weak-scaling patterns mapped to DMRlib's predefined communication collectives (broadcast, all-to-all, reduce), rigid versus moldable submission modes, static non-malleable MPI baselines, 10 independent runs per configuration with reported means and standard deviations, and the three metrics (resource allocation rate, completed jobs/s, energy). The >3x global throughput gain is observed specifically under moldable submission with high system load and applications whose communication patterns match the library's templates. To address the concern directly, we will revise the abstract to include a concise summary of these conditions, the workload classes, and the submission modes under which the factor is achieved, while retaining the word limit. revision: yes

Circularity Check

0 steps flagged

No circularity: central claim rests on new empirical experiments

full rationale

The paper presents DMRlib as a new library with predefined patterns and reports experimental results comparing malleable vs. non-malleable job throughput under rigid and moldable submission modes. The >3x improvement claim is grounded in measured metrics (resource allocation rate, completed jobs per second, energy) rather than any derivation, self-definition, fitted parameter renamed as prediction, or load-bearing self-citation. No equations or theoretical chain exists that reduces to its own inputs; the work is self-contained against external benchmarks of traditional static allocation.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claim depends on the existence and correct functioning of the DMRlib implementation plus the validity of the described experimental scenarios; no free parameters or invented physical entities are introduced.

axioms (1)
  • domain assumption The MPI programming model is a standard and widely adopted interface for parallel applications.
    DMRlib is explicitly designed with MPI-like syntax, assuming programmers are familiar with MPI.
invented entities (1)
  • DMRlib no independent evidence
    purpose: Software library providing minimalist syntax and predefined patterns to enable process malleability.
    The library itself is the primary contribution; no independent evidence outside the paper is provided for its correctness or performance.

pith-pipeline@v0.9.0 · 5487 in / 1328 out tokens · 51257 ms · 2026-05-07T12:41:15.960084+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

26 extracted references · 26 canonical work pages

  1. [1]

    The workload on parallel su- percomputers: Modeling the characteristics of rigid jobs,

    U. Lublin and D. G. Feitelson, “The workload on parallel su- percomputers: Modeling the characteristics of rigid jobs,” JPDC, vol. 63, no. 11, pp. 1105–1122, Nov 2003

  2. [2]

    Infrastructure and API extensions for elastic execution of MPI applications,

    I. Comprés, A. Mo-Hellenbrand, M. Gerndt, and H.-J. Bungartz, “Infrastructure and API extensions for elastic execution of MPI applications,” in 23rd EuroMPI, 2016, pp. 82–97

  3. [3]

    Dynamic reconfiguration of noniterative scientific applications,

    S. Iserte, H. Martínez, S. Barrachina, M. Castillo, R. Mayo, and A. J. Peña, “Dynamic reconfiguration of noniterative scientific applications,” IJHPCA, Sep 2018

  4. [4]

    Dynamic malleability in iterative MPI applications,

    K. El Maghraoui, T. J. Desell, B. K. Szymanski, and C. A. Varela, “Dynamic malleability in iterative MPI applications,” in 7th IEEE CCGrid, May 2007, pp. 591–598

  5. [5]

    Ar- chitecting malleable MPI applications for priority-driven adaptive scheduling,

    P . Lemarinier, K. Hasanov , S. V enugopal, and K. Katrinis, “Ar- chitecting malleable MPI applications for priority-driven adaptive scheduling,” in 23rd EuroMPI, 2016, pp. 74–81

  6. [6]

    Parallel programming with migratable objects: CHARM++ in practice,

    B. Acun et al. , “Parallel programming with migratable objects: CHARM++ in practice,” in SC14. IEEE, Nov 2014, pp. 647–658

  7. [7]

    Towards realizing the potential of malleable jobs,

    A. Gupta, B. Acun, O. Sarood, and L. V . Kalé, “Towards realizing the potential of malleable jobs,” in 21st HiPC, 2014

  8. [8]

    A scalable double in-memory checkpoint and restart scheme towards exascale,

    G. Zheng, Xiang Ni, and L. V . Kale, “A scalable double in-memory checkpoint and restart scheme towards exascale,” in IEEE/IFIP DSN, Jun. 2012

  9. [9]

    High-throughput computation through efficient re- source management,

    S. Iserte, “High-throughput computation through efficient re- source management,” Ph.D. dissertation, UJI, Nov 2018

  10. [10]

    Autonomic malleability in iterative MPI applications,

    F. S. Ribeiro, A. P . Nascimento, C. Boeres, V . E. F. Rebello, and A. C. Sena, “Autonomic malleability in iterative MPI applications,” in SBAC-P AD, 2013, pp. 192–199. 14

  11. [11]

    FLEX- MPI: An MPI extension for supporting dynamic load balancing on heterogeneous non-dedicated systems,

    G. Martín, M.-C. Marinescu, D. E. Singh, and J. Carretero, “FLEX- MPI: An MPI extension for supporting dynamic load balancing on heterogeneous non-dedicated systems,” in Euro-Par Parallel Processing, Aug. 2013, pp. 138–149

  12. [12]

    Reshape: A framework for dy- namic resizing and scheduling of homogeneous applications in a parallel environment,

    R. Sudarsan and C. J. Ribbens, “Reshape: A framework for dy- namic resizing and scheduling of homogeneous applications in a parallel environment,” in ICPP, 2007

  13. [13]

    Maximizing throughput of overprovisioned HPC data centers under a strict power budget,

    O. Sarood, A. Langer, A. Gupta, and L. Kale, “Maximizing throughput of overprovisioned HPC data centers under a strict power budget,” in SC14. IEEE, Nov 2014, pp. 807–818

  14. [14]

    A batch system with efficient adaptive scheduling for malleable and evolving applications,

    S. Prabhakaran, M. Neumann, S. Rinke, F. Wolf, A. Gupta, and L. V . Kale, “A batch system with efficient adaptive scheduling for malleable and evolving applications,” in IPDPS, May 2015

  15. [15]

    Efficient scalable computing through flexible applications and adaptive workloads,

    S. Iserte, R. Mayo, E. S. Quintana-Ortí, V . Beltran, and A. J. Peña, “Efficient scalable computing through flexible applications and adaptive workloads,” in 46th ICPPW, Aug. 2017, pp. 180–189

  16. [16]

    Collective offload for heterogeneous clusters,

    F. Sainz, J. Bellon, V . Beltran, and J. Labarta, “Collective offload for heterogeneous clusters,” in 22nd HiPC, 2015

  17. [17]

    DMR API: Improving cluster productivity by turning applica- tions into malleable,

    S. Iserte, R. Mayo, E. S. Quintana-Ortí, V . Beltran, and A. J. Peña, “DMR API: Improving cluster productivity by turning applica- tions into malleable,” Parallel Computing, Oct 2018

  18. [18]

    An study of the effect of process malleabil- ity in the energy efficiency on GPU-based clusters,

    S. Iserte and K. Rojek, “An study of the effect of process malleabil- ity in the energy efficiency on GPU-based clusters,” The Journal of Supercomputing, pp. 1–20, oct 2019

  19. [19]

    Malleable iterative MPI applications,

    K. El Maghraoui, T. J. Desell, B. K. Szymanski, and C. A. Varela, “Malleable iterative MPI applications,” Concurrency and Computa- tion: Practice and Experience , vol. 21, no. 3, Mar. 2009

  20. [20]

    En- hancing the performance of malleable MPI applications by using performance-aware dynamic reconfiguration,

    G. Martín, D. E. Singh, M.-C. Marinescu, and J. Carretero, “En- hancing the performance of malleable MPI applications by using performance-aware dynamic reconfiguration,” Parallel Computing, vol. 46, pp. 60–77, Jul. 2015

  21. [21]

    Methods of conjugate gradients for solving linear systems,

    M. Hestenes and E. Stiefel, “Methods of conjugate gradients for solving linear systems,” Journal of Research of the National Bureau of Standards, vol. 49, no. 6, p. 409, Dec 1952

  22. [22]

    Saad, Iterative Methods for Sparse Linear Systems

    Y . Saad, Iterative Methods for Sparse Linear Systems . Society for Industrial and Applied Mathematics, Jan 2003

  23. [23]

    S. J. Aarseth, Gravitational N-body simulations: tools and algorithms . Cambridge University Press, 2009

  24. [24]

    Highly sensitive and ultrafast read mapping for RNA-seq analysis,

    I. Medina et al. , “Highly sensitive and ultrafast read mapping for RNA-seq analysis,” DNA Research, vol. 23, no. 2, Apr 2016

  25. [25]

    Toward convergence in job schedulers for parallel supercomputers,

    D. G. Feitelson and L. Rudolph, “Toward convergence in job schedulers for parallel supercomputers,” in Job Scheduling Strate- gies for Parallel Processing , vol. 1162, no. 5, 1996, pp. 1–26

  26. [26]

    Dynamic load balancing for hybrid applica- tions,

    M. Garcia-Gasulla, “Dynamic load balancing for hybrid applica- tions,” Ph.D. dissertation, UPC, 2017. Sergio Iserte holds the degrees of BS in Com- puter Engineering (2011), MS in Intelligent Sys- tems (2014), and PhD in Computer Science (2018) from Universitat Jaume I, Spain. He is currently postdoc researcher (APOSTD20) in the Mechanical and Engineering...