Promise-Future Synchronization for Cluster Asynchronous Many-Task Runtimes via MPI One-Sided Communication

Mia Reitz

arxiv: 2607.00303 · v1 · pith:6JCHTRKVnew · submitted 2026-07-01 · 💻 cs.DC

Promise-Future Synchronization for Cluster Asynchronous Many-Task Runtimes via MPI One-Sided Communication

Mia Reitz This is my paper

Pith reviewed 2026-07-02 01:03 UTC · model grok-4.3

classification 💻 cs.DC

keywords asynchronous many-task runtimepromise-future modelMPI one-sided communicationhierarchical LU factorizationdynamic dependenciesItoyoriFBC

0 comments

The pith

Promise-future model via MPI one-sided communication extends AMT runtime to support dynamic algorithms like HLU.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper addresses limitations in the ItoyoriFBC AMT runtime where futures are bound to producers at creation and reader counts are fixed at compile time. It extends this with a promise-future model using MPI one-sided communication to allow dynamic dependency creation. This enables direct expression of algorithms with dynamic dependencies such as Hierarchical LU factorization. Experimental evaluation on up to 16 nodes shows near-ideal scaling and a 15.6x speedup.

Core claim

The ItoyoriFBC variant with promise-future synchronization supports dynamic algorithms such as Hierarchical LU factorization by using MPI one-sided communication, achieving near-ideal scaling with a 15.6x speedup on up to 16 nodes.

What carries the argument

Promise-future model implemented with MPI one-sided communication to handle dynamic task dependencies.

Load-bearing premise

The promise-future model can be realized with MPI one-sided communication in a way that preserves correctness and delivers the reported scaling without hidden costs from the synchronization mechanism itself.

What would settle it

A test showing that the HLU implementation fails to scale near-ideally or produces incorrect results due to synchronization issues would falsify the claim.

Figures

Figures reproduced from arXiv: 2607.00303 by Mia Reitz.

**Figure 3.** Figure 3: Task interaction sequence: future-only vs. promise-future model. Listing 1.3. Pseudocode of push 1 bool push ( State * S , Continuation c) { 2 Node * n = new Node (c ); 3 Node * head = read_remote (S -> head ); 4 while ( head != MARKED_DONE ) { 5 n -> next = head ; 6 Node * old = CAS_remote ( 7 &S -> head , head , n ); 8 if ( old == head ) return true ; 9 head = old ; 10 } 11 delete n; 12 return false ; 13… view at source ↗

**Figure 4.** Figure 4: HLU speedup relative to one node on up to 16 nodes. 1–16 nodes (40–640 cores) and 10 runs per configuration. We ran one worker per core. Here, speedup is defined as the one-node running time divided by the running time on n nodes [PITH_FULL_IMAGE:figures/full_fig_p004_4.png] view at source ↗

read the original abstract

Asynchronous Many-Task (AMT) runtimes use futures as placeholders for values produced by other tasks. In the ItoyoriFBC AMT runtime, the existing future-only model binds each future to its producer at creation time and requires the number of tasks that read each future to be fixed at compile time. This prevents directly expressing algorithms that create dependencies dynamically. We extend ItoyoriFBC with an implementation of a promise-future model that lifts these limitations. Thereby, our ItoyoriFBC variant supports dynamic algorithms such as Hierarchical LU factorization (HLU). We experimentally evaluated our implementation using HLU on up to 16 nodes and observed near-ideal scaling with a 15.6x speedup.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This extends ItoyoriFBC with promise-futures over MPI one-sided to handle dynamic dependencies and reports 15.6x speedup on HLU at 16 nodes.

read the letter

The core contribution is lifting two hard limits in the original ItoyoriFBC future model—producer binding at creation and fixed reader count at compile time—by adding a promise-future pair implemented with MPI one-sided operations. That change lets the runtime express algorithms whose task graph is not known statically, and the authors demonstrate it on hierarchical LU factorization.

The implementation itself looks like the main piece of new work. They keep the existing task runtime intact and layer the promise mechanism on top, which is a reasonable engineering choice for an AMT system that already uses MPI. The reported scaling result—near-ideal on 16 nodes with a 15.6x speedup—gives concrete evidence that the added synchronization does not destroy performance at that scale.

The evaluation is narrow. Sixteen nodes is modest for cluster work, and the abstract supplies no baseline description, run count, or error bars, so it is hard to judge how much of the speedup comes from the new model versus other factors. The claim that the one-sided realization preserves correctness without hidden costs is plausible but rests entirely on the experiment; if the full paper shows the measurement methodology clearly, that concern shrinks.

This paper is aimed at people who build or tune distributed asynchronous task runtimes and already know ItoyoriFBC or similar MPI-based systems. A reader outside that niche will not find broad new theory or application results.

It is solid enough on its own terms to go to peer review. The extension is well-motivated, the implementation is described at a usable level, and the performance data, while limited in scope, is a real measurement rather than a fitted claim.

Referee Report

1 major / 0 minor

Summary. The paper extends the ItoyoriFBC AMT runtime, which previously bound futures to producers at creation time and required compile-time fixed reader counts, by implementing a promise-future model via MPI one-sided communication. This change enables dynamic task graphs, including Hierarchical LU factorization (HLU). The authors report an experimental evaluation on up to 16 nodes showing near-ideal scaling and a 15.6x speedup.

Significance. If the implementation correctly realizes promise-future synchronization without introducing hidden costs and the scaling result holds under standard HPC evaluation practices, the work would be a useful engineering contribution to cluster-scale AMT systems. It demonstrates a concrete path to support dynamic dependencies while retaining the performance characteristics of MPI RMA, which could benefit other task-based runtimes that currently restrict dependency creation.

major comments (1)

[Abstract] Abstract: the central performance claim ('near-ideal scaling with a 15.6x speedup' for HLU) is presented with no description of the experimental setup, baseline (original ItoyoriFBC or other runtimes), node configuration, measurement protocol, number of trials, or error bars. This detail is load-bearing for assessing whether the MPI one-sided promise-future implementation preserves correctness and delivers the reported scaling without synchronization overhead.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the single major comment below and will revise the manuscript accordingly.

read point-by-point responses

Referee: [Abstract] Abstract: the central performance claim ('near-ideal scaling with a 15.6x speedup' for HLU) is presented with no description of the experimental setup, baseline (original ItoyoriFBC or other runtimes), node configuration, measurement protocol, number of trials, or error bars. This detail is load-bearing for assessing whether the MPI one-sided promise-future implementation preserves correctness and delivers the reported scaling without synchronization overhead.

Authors: We agree that the abstract would benefit from additional context on the experimental setup to support the central claim. In the revised version we will expand the abstract to explicitly state the baseline (original ItoyoriFBC), the scale (up to 16 nodes), and that full details of the measurement protocol, number of trials, and error bars appear in the experimental evaluation section. Abstract length limits preclude including every protocol detail directly in the abstract, but the added sentence will make the claim more self-contained while preserving conciseness. revision: yes

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper is an engineering description of an implementation extension to ItoyoriFBC that adds promise-future support via MPI one-sided communication, followed by an empirical evaluation on HLU showing measured scaling. No equations, derivations, parameter fits, or predictions appear in the supplied text. The central claim reduces to an experimental observation rather than any self-referential construction or self-citation chain. No load-bearing steps match any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract supplies no information on free parameters, axioms, or invented entities; all details are absent.

pith-pipeline@v0.9.1-grok · 5646 in / 1027 out tokens · 34776 ms · 2026-07-02T01:03:51.923361+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

297 extracted references · 48 canonical work pages

[1]

Hamouda , title =

Sara S. Hamouda , title =
[2]

Sadayappan and Chau-Wen Tseng , booktitle =

James Dinan and Stephen Olivier and Gerald Sabin and Jan Prins and P. Sadayappan and Chau-Wen Tseng , booktitle =. Dynamic Load Balancing of Unbalanced Computations Using Message Passing , year =
[3]

On the Inevitability of Integrated HPC Systems and How They Will Change HPC System Operations , year =

Martin Schulz and Dieter Kranzlm\". On the Inevitability of Integrated HPC Systems and How They Will Change HPC System Operations , year =. doi:10.1145/3468044.3468046 , booktitle =

work page doi:10.1145/3468044.3468046
[4]

Mia Reitz , title =. Proc. Int. Conf. on Cluster Computing (CLUSTER), Extended Abstract and Poster , publisher =. 2021 , pages =. doi:10.1109/Cluster48925.2021.00075 , keywords =

work page doi:10.1109/cluster48925.2021.00075 2021
[5]

Mia Reitz , title =. Proc. Int. Parallel and Distributed Processing Symp. (IPDPS), Ph.D. Forum, Extended Abstract , year =. doi:10.1109/IPDPSW52791.2021.00160 , keywords =

work page doi:10.1109/ipdpsw52791.2021.00160 2021
[6]

Containment Domains: A Scalable, Efficient, and Flexible Resilience Scheme for Exascale Systems , year =

Jinsuk Chung and Ikhwan Lee and Michael Sullivan and Jee Ho Ryoo and Dong Wan Kim and Doe Hyun Yoon and Larry Kaplan and Mattan Erez , booktitle =. Containment Domains: A Scalable, Efficient, and Flexible Resilience Scheme for Exascale Systems , year =
[7]

, title =

Kapil Arya and Gene Cooperman and Rohan Garg and Jiajun Cao and and Artem Polyakov. , title =
[8]

Embracing Explicit Communication in Work-Stealing Runtime Systems , author =
[9]

2008 , isbn =

Monte Carlo methods , author =. 2008 , isbn =

2008
[10]

PCJ: HPC Challenge Award at Supecomputing'14 , urldate =

Piotr Ba. PCJ: HPC Challenge Award at Supecomputing'14 , urldate =
[11]

Reinit++: Evaluating the Performance of Global-Restart Recovery Methods For MPI Fault Tolerance , author =. Proc. Int. Conf. on High Performance Computing (ISC) , year =
[12]

CRAFT: A Library for Easier Application-Level Checkpoint/Restart and Automatic Fault Tolerance , year =

Faisal Shahzad and Jonas Thies and Moritz Kreutzer and Thomas Zeiser and Georg Hager and Gerhard Wellein , journal =. CRAFT: A Library for Easier Application-Level Checkpoint/Restart and Automatic Fault Tolerance , year =
[13]

Katz and Hemanth Kolla and Jacqueline Chen and Scott Klasky and Manish Parashar , booktitle =

Marc Gamell and Daniel S. Katz and Hemanth Kolla and Jacqueline Chen and Scott Klasky and Manish Parashar , booktitle =. Exploring Automatic, Online Failure Recovery for Scientific Applications at Extreme Scales , year =
[14]

Abraham , title =

Kuang-Hua Huang and Jacob A. Abraham , title =. 1984 , publisher =. doi:10.1109/TC.1984.1676475 , journal =

work page doi:10.1109/tc.1984.1676475 1984
[15]

Distributed Task-Based Runtime Systems - Current State and Micro-Benchmark Performance , year =

Reazul Hoque and Pavel Shamis , booktitle =. Distributed Task-Based Runtime Systems - Current State and Micro-Benchmark Performance , year =
[16]

PufferFish: NUMA-Aware Work-stealing Library using Elastic Tasks , year =

Vivek Kumar , booktitle =. PufferFish: NUMA-Aware Work-stealing Library using Elastic Tasks , year =
[17]

From tasks graphs to asynchronous distributed checkpointing with local restart , author =. Proc. Int. Conf. on High Performance Computing, Networking, Storage and Analysis (SC) Workshops (FTXS) , publisher =. 2020 , doi =

2020
[18]

Kale , booktitle =

Gengbin Zheng and Lixia Shi and L.V. Kale , booktitle =. FTC-Charm++: An In-Memory Checkpoint-Based Fault Tolerant Runtime for Charm++ and MPI , year =. doi:10.1109/CLUSTR.2004.1392606 , publisher =

work page doi:10.1109/clustr.2004.1392606 2004
[19]

Proceeding Euro-Par: Parallel Processing , year =

Sri Raj Paul and Akihiro Hayashi and Nicole Slattengren and Hemanth Kolla and Matthew Whitlock and Seonmyeong Bak and Keita Teranishi and Jackson Mayo and Vivek Sarkar , title =. Proceeding Euro-Par: Parallel Processing , year =
[20]

Concurrency and Computation: Practice and Experience (CCPE) , publisher =

Aurelien Bouteiller and George Bosilca and Jack Dongarra , title =. Concurrency and Computation: Practice and Experience (CCPE) , publisher =. doi:10.1002/cpe.1589 , year =

work page doi:10.1002/cpe.1589
[21]

Amina Guermouche and Thomas Ropars and Elisabeth Brunet and Marc Snir and Franck Cappello , title =. Int. Parallel and Distributed Processing Symp. (IPDPS) , publisher =. 2011 , pages =

2011
[22]

Checkpoint/Restart for Lagrangian particle mesh with AMR in community code FLASH-X , author =. Int. Symp. on Checkpointing for Supercomputing (SuperCheck) , year =
[23]

2019 , publisher =

Sihuan Li and Hongbo Li and Xin Liang and Jieyang Chen and Elisabeth Giem and Kaiming Ouyang and Kai Zhao and Sheng Di and Franck Cappello and Zizhong Chen , title =. 2019 , publisher =. doi:10.1145/3295500.3356195 , booktitle =

work page doi:10.1145/3295500.3356195 2019
[24]

Reazul Hoque and Thomas Herault and George Bosilca and Jack Dongarra , title =. Proc. Int. Conf. on High Performance Computing, Networking, Storage and Analysis (SC) Workshops (ScalA) , publisher =. 2017 , doi =

2017
[25]

StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures , journal =

C. StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures , journal =. 2011 , publisher =

2011
[26]

2018 , doi =

Marco Bungart , title =. 2018 , doi =

2018
[27]

Brian Larkins , booktitle =

Hannah Cartier and James Dinan and D. Brian Larkins , booktitle =. Optimizing Work Stealing Communication with Structured Atomic Operations , year =
[28]

2019 , type =

Mia Reitz , title =. 2019 , type =

2019
[29]

2018 , type =

Mia Reitz , title =. 2018 , type =

2018
[30]

2010 , isbn =

Georg Hager and Gerhard Wellein , title =. 2010 , isbn =

2010
[31]

2018 , isbn =

Thomas Sterling and Matthew Anderson and Maciej Brodowicz , title =. 2018 , isbn =

2018
[32]

ServiceSs: An Interoperable Programming Framework for the Cloud , year =

Francesc Lordan and Enric Tejedor and Jorge Ejarque and Roger Rafanell and Javier \'. ServiceSs: An Interoperable Programming Framework for the Cloud , year =. doi:10.1007/s10723-013-9272-5 , journal =

work page doi:10.1007/s10723-013-9272-5
[33]

Numrich and John Reid , title =

Robert W. Numrich and John Reid , title =. 2005 , publisher =. doi:10.1145/1080399.1080400 , journal =

work page doi:10.1145/1080399.1080400 2005
[34]

Yelick and Luigi Semenzato and Geoff Pike and Carleton Miyamoto and Ben Liblit and Arvind Krishnamurthy and Paul N

Katherine A. Yelick and Luigi Semenzato and Geoff Pike and Carleton Miyamoto and Ben Liblit and Arvind Krishnamurthy and Paul N. Hilfinger and Susan L. Graham and David Gay and Phillip Colella and Alexander Aiken , title =. Concurrency: Practice and Experience , pages =. 1998 , volume =

1998
[35]

TOP500.org , title =
[36]

Competence Center for High Performance Computing in Hessen (HKHLR) , title =
[37]

Leibniz Supercomputing Centre (LRZ) , title =
[38]

Linux Cluster Kassel , urldate =

Competence Center for High Performance Computing in Hessen (HKHLR) , url =. Linux Cluster Kassel , urldate =
[39]

Morris and Qinglei Cao and George Bosilca and Seema Mirchandaney and Wonchan Lee and Sean Treichler and Patrick McCormick and Alex Aiken , title =

Elliott Slaughter and Wei Wu and Yuankun Fu and Legend Brandenburg and Nicolai Garcia and Wilhem Kautz and Emily Marx and Kaleb S. Morris and Qinglei Cao and George Bosilca and Seema Mirchandaney and Wonchan Lee and Sean Treichler and Patrick McCormick and Alex Aiken , title =. Proc. Int. Conf. for High Performance Computing, Networking, Storage and Analy...

2020
[40]

2020 , howpublished =

Claudia Fohry , title =. 2020 , howpublished =

2020
[41]

Encyclopedia of Parallel Computing , year =

George Almasi , title =. Encyclopedia of Parallel Computing , year =
[42]

Baden and Steven Hofmeyr and Mathias Jacquelin and Amir Kamil and Dan Bonachea and Paul H

John Bachan and Scott B. Baden and Steven Hofmeyr and Mathias Jacquelin and Amir Kamil and Dan Bonachea and Paul H. Hargrove and Hadia Ahmed , booktitle =. UPC++: A High-Performance Communication Framework for Asynchronous Computation , year =
[43]

HiPEAC: European Network on High-performance Embedded Architecture and Compilation , title =
[44]

The European Technology Platform For High-Performance Computing ETP4HPC Association , title =
[45]

Series , title =

SC Conf. Series , title =
[46]

Brad McCredie , title =
[47]

2018 , doi =

On the road to exascale: Advances in High Performance Computing and Simulations - An overview and editorial , journal =. 2018 , doi =

2018
[48]

Oak Ridge National Laboratory , title =
[49]

Resilient Application Co-scheduling with Processor Redistribution , year =

Anne Benoit and Lo. Resilient Application Co-scheduling with Processor Redistribution , year =. Proc. Int. Conf. on Parallel Processing (ICPP) , pages =
[50]

E.N. Elnozahy and Ricardo Bianchini and Tarek El-Ghazawi and Armando Fox and Forest Godf and Adolfy Hoisie and Kathryn McKinley and Rami Melhem and James Plank and Partha Ranganathan and Josh Simons , title =
[51]

Terracotta Open Source Community , urldate =

Terracotta , url =. Terracotta Open Source Community , urldate =
[52]

Infinispan, In-Memory Distributed Data Store , urldate =

Infinispan , url =. Infinispan, In-Memory Distributed Data Store , urldate =
[53]

Infinispan , howpublished =
[54]

Lakshman and Ramana Kompella , booktitle =

Advait Abhay Dixit and Fang Hao and Sarit Mukherjee and T.V. Lakshman and Ramana Kompella , booktitle =. ElastiCon: An Elastic Distributed Sdn Controller , year =
[55]

U U Multi-Tenanted Framework: Distributed Near Duplicate Detection for Big Data , year =

Pradeeban Kathiravelu and Helena Galhardas and Lu\'. U U Multi-Tenanted Framework: Distributed Near Duplicate Detection for Big Data , year =. On the Move to Meaningful Internet Systems , pages =
[56]

Concurrent and Distributed CloudSim Simulations , year =

Pradeeban Kathiravelu and Luis Veiga , booktitle =. Concurrent and Distributed CloudSim Simulations , year =
[57]

An Adaptive Distributed Simulator for Cloud and MapReduce Algorithms and Architectures , year =

Pradeeban Kathiravelu and Luis Veiga , booktitle =. An Adaptive Distributed Simulator for Cloud and MapReduce Algorithms and Architectures , year =
[58]

A Work-Stealing Scheduling Framework Supporting Fault Tolerance , year =

Yizhuo Wang and Weixing Ji and Feng Shi and Qi Zuo , booktitle =. A Work-Stealing Scheduling Framework Supporting Fault Tolerance , year =
[59]

A Programming Language Approach to Fault Tolerance for Fork-Join Parallelism , year =

Mustafa Zengin and Viktor Vafeiadis , booktitle =. A Programming Language Approach to Fault Tolerance for Fork-Join Parallelism , year =
[60]

Journal of Parallel and Distributed Computing (JPDC) , volume =

Rajesh Sudarsana and Calvin J.Ribbens , title =. Journal of Parallel and Distributed Computing (JPDC) , volume =. 2016 , doi =

2016
[61]

Equilibrium: An Elasticity Controller for Parallel Tree Search in the Cloud , year =

Stefan Kehrer and Wolfgang Blochinger , journal =. Equilibrium: An Elasticity Controller for Parallel Tree Search in the Cloud , year =
[62]

, booktitle =

Zheng, Gengbin and Xiang Ni and Kalé, Laxmikant V. , booktitle =. A Scalable Double In-Memory Checkpoint and Restart Scheme Towards Exascale , year =
[63]

and Szymanski, Boleslaw K

El Maghraoui, Kaoutar and Desell, Travis J. and Szymanski, Boleslaw K. and Varela, Carlos A. , booktitle =. Dynamic Malleability in Iterative MPI Applications , year =
[64]

Transactions on Architecture and Code Optimization (TACO) , publisher =

Peter Pirkelbauer and Amalee Wilson and Christina Peterson and Damian Dechev , title =. Transactions on Architecture and Code Optimization (TACO) , publisher =. 2019 , volume =

2019
[65]

SchedMD LLC , title =
[66]

Patrick Finnerty , title =
[67]

2005 , publisher =

Philippe Charles and Christian Grothoff and Vijay Saraswat and Christopher Donawa and Allan Kielstra and Kemal Ebcioglu and Christoph von Praun and Vivek Sarkar , title =. 2005 , publisher =. doi:10.1145/1103845.1094852 , journal =

work page doi:10.1145/1103845.1094852 2005
[68]

Barcelona Supercomputing Center , title =
[69]

GLB: Lifeline-based Global Load Balancing Library in X10 , year =

Wei Zhang and Olivier Tardieu and David Grove and Benjamin Herta and Tomio Kamada and Vijay Saraswat and Mikio Takeuchi , booktitle =. GLB: Lifeline-based Global Load Balancing Library in X10 , year =
[70]

Self-Adjusting Task Granularity for Global Load Balancer Library on Clusters of Many-Core Processors , year =

Patrick Finnerty and Tomio Kamada and Chikara Ohta , booktitle =. Self-Adjusting Task Granularity for Global Load Balancer Library on Clusters of Many-Core Processors , year =
[71]

Quintana-Ort\'

Sergio Iserte and Rafael Mayo and Enrique S. Quintana-Ort\'. DMRlib Easy-coding and Efficient Resource Management for Job Malleability , year =. doi:10.1109/TC.2020.3022933 , journal =

work page doi:10.1109/tc.2020.3022933 2020
[72]

Extending Slurm for Dynamic Resource-Aware Adaptive Batch Scheduling , year =

Mohak Chadha and Jophin John and Michael Gerndt , booktitle =. Extending Slurm for Dynamic Resource-Aware Adaptive Batch Scheduling , year =
[73]

Feitelson , title =

Uri Lublin and Dror G. Feitelson , title =. Journal of Parallel and Distributed Computing , volume =. 2003 , doi =

2003
[74]

Harrison and Gregory Beylkin and Florian A

Robert J. Harrison and Gregory Beylkin and Florian A. Bischoff and Justus A. Calvin and George I. Fann and Jacob Fosso-Tande and Diego Galindo and Jeff R. Hammond and Rebecca Hartman-Baker and Judith C. Hill and Jun Jia and Jakob S. Kottmann and M-J. Yvonne Ou and Junchen Pei and Laura E. Ratcliff and Matthew G. Reuter and Adam C. Richie-Halford and Nicho...

2016
[75]

Parallel Programming with Migratable Objects: Charm++ in Practice , year =

Bilge Acun and Abhishek Gupta and Nikhil Jain and Akhil Langer and Harshitha Menon and Eric Mikida and Xiang Ni and Michael Robson and Yanhua Sun and Ehsan Totoni and Lukasz Wesolowski and Laxmikant Kale , booktitle =. Parallel Programming with Migratable Objects: Charm++ in Practice , year =
[76]

Enhancing the performance of malleable MPI applications by using performance-aware dynamic reconfiguration , journal =

Gonzalo Mart\'. Enhancing the performance of malleable MPI applications by using performance-aware dynamic reconfiguration , journal =. 2015 , doi =

2015
[77]

Dynamic Reconfiguration of Noniterative Scientific Applications: A Case Study with HPG Aligner , journal =

Sergio Iserte and H\'. Dynamic Reconfiguration of Noniterative Scientific Applications: A Case Study with HPG Aligner , journal =. 2019 , doi =

2019
[78]

Kale , booktitle =

Suraj Prabhakaran and Marcel Neumann and Sebastian Rinke and Felix Wolf and Abhishek Gupta and Laxmikant V. Kale , booktitle =. A Batch System with Efficient Adaptive Scheduling for Malleable and Evolving Applications , year =
[79]

Algorithms and Computation , year =

Marc Tchiboukdjian and Nicolas Gast and Denis Trystram and Jean-Louis Roch and Julien Bernard , title =. Algorithms and Computation , year =
[80]

The Implementation of the Cilk-5 Multithreaded Language , author =. Proc. Conf. on Programming Language Design and Implementation (PLDI) , year =

Showing first 80 references.

[1] [1]

Hamouda , title =

Sara S. Hamouda , title =

[2] [2]

Sadayappan and Chau-Wen Tseng , booktitle =

James Dinan and Stephen Olivier and Gerald Sabin and Jan Prins and P. Sadayappan and Chau-Wen Tseng , booktitle =. Dynamic Load Balancing of Unbalanced Computations Using Message Passing , year =

[3] [3]

On the Inevitability of Integrated HPC Systems and How They Will Change HPC System Operations , year =

Martin Schulz and Dieter Kranzlm\". On the Inevitability of Integrated HPC Systems and How They Will Change HPC System Operations , year =. doi:10.1145/3468044.3468046 , booktitle =

work page doi:10.1145/3468044.3468046

[4] [4]

Mia Reitz , title =. Proc. Int. Conf. on Cluster Computing (CLUSTER), Extended Abstract and Poster , publisher =. 2021 , pages =. doi:10.1109/Cluster48925.2021.00075 , keywords =

work page doi:10.1109/cluster48925.2021.00075 2021

[5] [5]

Mia Reitz , title =. Proc. Int. Parallel and Distributed Processing Symp. (IPDPS), Ph.D. Forum, Extended Abstract , year =. doi:10.1109/IPDPSW52791.2021.00160 , keywords =

work page doi:10.1109/ipdpsw52791.2021.00160 2021

[6] [6]

Containment Domains: A Scalable, Efficient, and Flexible Resilience Scheme for Exascale Systems , year =

Jinsuk Chung and Ikhwan Lee and Michael Sullivan and Jee Ho Ryoo and Dong Wan Kim and Doe Hyun Yoon and Larry Kaplan and Mattan Erez , booktitle =. Containment Domains: A Scalable, Efficient, and Flexible Resilience Scheme for Exascale Systems , year =

[7] [7]

, title =

Kapil Arya and Gene Cooperman and Rohan Garg and Jiajun Cao and and Artem Polyakov. , title =

[8] [8]

Embracing Explicit Communication in Work-Stealing Runtime Systems , author =

[9] [9]

2008 , isbn =

Monte Carlo methods , author =. 2008 , isbn =

2008

[10] [10]

PCJ: HPC Challenge Award at Supecomputing'14 , urldate =

Piotr Ba. PCJ: HPC Challenge Award at Supecomputing'14 , urldate =

[11] [11]

Reinit++: Evaluating the Performance of Global-Restart Recovery Methods For MPI Fault Tolerance , author =. Proc. Int. Conf. on High Performance Computing (ISC) , year =

[12] [12]

CRAFT: A Library for Easier Application-Level Checkpoint/Restart and Automatic Fault Tolerance , year =

Faisal Shahzad and Jonas Thies and Moritz Kreutzer and Thomas Zeiser and Georg Hager and Gerhard Wellein , journal =. CRAFT: A Library for Easier Application-Level Checkpoint/Restart and Automatic Fault Tolerance , year =

[13] [13]

Katz and Hemanth Kolla and Jacqueline Chen and Scott Klasky and Manish Parashar , booktitle =

Marc Gamell and Daniel S. Katz and Hemanth Kolla and Jacqueline Chen and Scott Klasky and Manish Parashar , booktitle =. Exploring Automatic, Online Failure Recovery for Scientific Applications at Extreme Scales , year =

[14] [14]

Abraham , title =

Kuang-Hua Huang and Jacob A. Abraham , title =. 1984 , publisher =. doi:10.1109/TC.1984.1676475 , journal =

work page doi:10.1109/tc.1984.1676475 1984

[15] [15]

Distributed Task-Based Runtime Systems - Current State and Micro-Benchmark Performance , year =

Reazul Hoque and Pavel Shamis , booktitle =. Distributed Task-Based Runtime Systems - Current State and Micro-Benchmark Performance , year =

[16] [16]

PufferFish: NUMA-Aware Work-stealing Library using Elastic Tasks , year =

Vivek Kumar , booktitle =. PufferFish: NUMA-Aware Work-stealing Library using Elastic Tasks , year =

[17] [17]

From tasks graphs to asynchronous distributed checkpointing with local restart , author =. Proc. Int. Conf. on High Performance Computing, Networking, Storage and Analysis (SC) Workshops (FTXS) , publisher =. 2020 , doi =

2020

[18] [18]

Kale , booktitle =

Gengbin Zheng and Lixia Shi and L.V. Kale , booktitle =. FTC-Charm++: An In-Memory Checkpoint-Based Fault Tolerant Runtime for Charm++ and MPI , year =. doi:10.1109/CLUSTR.2004.1392606 , publisher =

work page doi:10.1109/clustr.2004.1392606 2004

[19] [19]

Proceeding Euro-Par: Parallel Processing , year =

Sri Raj Paul and Akihiro Hayashi and Nicole Slattengren and Hemanth Kolla and Matthew Whitlock and Seonmyeong Bak and Keita Teranishi and Jackson Mayo and Vivek Sarkar , title =. Proceeding Euro-Par: Parallel Processing , year =

[20] [20]

Concurrency and Computation: Practice and Experience (CCPE) , publisher =

Aurelien Bouteiller and George Bosilca and Jack Dongarra , title =. Concurrency and Computation: Practice and Experience (CCPE) , publisher =. doi:10.1002/cpe.1589 , year =

work page doi:10.1002/cpe.1589

[21] [21]

Amina Guermouche and Thomas Ropars and Elisabeth Brunet and Marc Snir and Franck Cappello , title =. Int. Parallel and Distributed Processing Symp. (IPDPS) , publisher =. 2011 , pages =

2011

[22] [22]

Checkpoint/Restart for Lagrangian particle mesh with AMR in community code FLASH-X , author =. Int. Symp. on Checkpointing for Supercomputing (SuperCheck) , year =

[23] [23]

2019 , publisher =

Sihuan Li and Hongbo Li and Xin Liang and Jieyang Chen and Elisabeth Giem and Kaiming Ouyang and Kai Zhao and Sheng Di and Franck Cappello and Zizhong Chen , title =. 2019 , publisher =. doi:10.1145/3295500.3356195 , booktitle =

work page doi:10.1145/3295500.3356195 2019

[24] [24]

Reazul Hoque and Thomas Herault and George Bosilca and Jack Dongarra , title =. Proc. Int. Conf. on High Performance Computing, Networking, Storage and Analysis (SC) Workshops (ScalA) , publisher =. 2017 , doi =

2017

[25] [25]

StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures , journal =

C. StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures , journal =. 2011 , publisher =

2011

[26] [26]

2018 , doi =

Marco Bungart , title =. 2018 , doi =

2018

[27] [27]

Brian Larkins , booktitle =

Hannah Cartier and James Dinan and D. Brian Larkins , booktitle =. Optimizing Work Stealing Communication with Structured Atomic Operations , year =

[28] [28]

2019 , type =

Mia Reitz , title =. 2019 , type =

2019

[29] [29]

2018 , type =

Mia Reitz , title =. 2018 , type =

2018

[30] [30]

2010 , isbn =

Georg Hager and Gerhard Wellein , title =. 2010 , isbn =

2010

[31] [31]

2018 , isbn =

Thomas Sterling and Matthew Anderson and Maciej Brodowicz , title =. 2018 , isbn =

2018

[32] [32]

ServiceSs: An Interoperable Programming Framework for the Cloud , year =

Francesc Lordan and Enric Tejedor and Jorge Ejarque and Roger Rafanell and Javier \'. ServiceSs: An Interoperable Programming Framework for the Cloud , year =. doi:10.1007/s10723-013-9272-5 , journal =

work page doi:10.1007/s10723-013-9272-5

[33] [33]

Numrich and John Reid , title =

Robert W. Numrich and John Reid , title =. 2005 , publisher =. doi:10.1145/1080399.1080400 , journal =

work page doi:10.1145/1080399.1080400 2005

[34] [34]

Yelick and Luigi Semenzato and Geoff Pike and Carleton Miyamoto and Ben Liblit and Arvind Krishnamurthy and Paul N

Katherine A. Yelick and Luigi Semenzato and Geoff Pike and Carleton Miyamoto and Ben Liblit and Arvind Krishnamurthy and Paul N. Hilfinger and Susan L. Graham and David Gay and Phillip Colella and Alexander Aiken , title =. Concurrency: Practice and Experience , pages =. 1998 , volume =

1998

[35] [35]

TOP500.org , title =

[36] [36]

Competence Center for High Performance Computing in Hessen (HKHLR) , title =

[37] [37]

Leibniz Supercomputing Centre (LRZ) , title =

[38] [38]

Linux Cluster Kassel , urldate =

Competence Center for High Performance Computing in Hessen (HKHLR) , url =. Linux Cluster Kassel , urldate =

[39] [39]

Morris and Qinglei Cao and George Bosilca and Seema Mirchandaney and Wonchan Lee and Sean Treichler and Patrick McCormick and Alex Aiken , title =

Elliott Slaughter and Wei Wu and Yuankun Fu and Legend Brandenburg and Nicolai Garcia and Wilhem Kautz and Emily Marx and Kaleb S. Morris and Qinglei Cao and George Bosilca and Seema Mirchandaney and Wonchan Lee and Sean Treichler and Patrick McCormick and Alex Aiken , title =. Proc. Int. Conf. for High Performance Computing, Networking, Storage and Analy...

2020

[40] [40]

2020 , howpublished =

Claudia Fohry , title =. 2020 , howpublished =

2020

[41] [41]

Encyclopedia of Parallel Computing , year =

George Almasi , title =. Encyclopedia of Parallel Computing , year =

[42] [42]

Baden and Steven Hofmeyr and Mathias Jacquelin and Amir Kamil and Dan Bonachea and Paul H

John Bachan and Scott B. Baden and Steven Hofmeyr and Mathias Jacquelin and Amir Kamil and Dan Bonachea and Paul H. Hargrove and Hadia Ahmed , booktitle =. UPC++: A High-Performance Communication Framework for Asynchronous Computation , year =

[43] [43]

HiPEAC: European Network on High-performance Embedded Architecture and Compilation , title =

[44] [44]

The European Technology Platform For High-Performance Computing ETP4HPC Association , title =

[45] [45]

Series , title =

SC Conf. Series , title =

[46] [46]

Brad McCredie , title =

[47] [47]

2018 , doi =

On the road to exascale: Advances in High Performance Computing and Simulations - An overview and editorial , journal =. 2018 , doi =

2018

[48] [48]

Oak Ridge National Laboratory , title =

[49] [49]

Resilient Application Co-scheduling with Processor Redistribution , year =

Anne Benoit and Lo. Resilient Application Co-scheduling with Processor Redistribution , year =. Proc. Int. Conf. on Parallel Processing (ICPP) , pages =

[50] [50]

E.N. Elnozahy and Ricardo Bianchini and Tarek El-Ghazawi and Armando Fox and Forest Godf and Adolfy Hoisie and Kathryn McKinley and Rami Melhem and James Plank and Partha Ranganathan and Josh Simons , title =

[51] [51]

Terracotta Open Source Community , urldate =

Terracotta , url =. Terracotta Open Source Community , urldate =

[52] [52]

Infinispan, In-Memory Distributed Data Store , urldate =

Infinispan , url =. Infinispan, In-Memory Distributed Data Store , urldate =

[53] [53]

Infinispan , howpublished =

[54] [54]

Lakshman and Ramana Kompella , booktitle =

Advait Abhay Dixit and Fang Hao and Sarit Mukherjee and T.V. Lakshman and Ramana Kompella , booktitle =. ElastiCon: An Elastic Distributed Sdn Controller , year =

[55] [55]

U U Multi-Tenanted Framework: Distributed Near Duplicate Detection for Big Data , year =

Pradeeban Kathiravelu and Helena Galhardas and Lu\'. U U Multi-Tenanted Framework: Distributed Near Duplicate Detection for Big Data , year =. On the Move to Meaningful Internet Systems , pages =

[56] [56]

Concurrent and Distributed CloudSim Simulations , year =

Pradeeban Kathiravelu and Luis Veiga , booktitle =. Concurrent and Distributed CloudSim Simulations , year =

[57] [57]

An Adaptive Distributed Simulator for Cloud and MapReduce Algorithms and Architectures , year =

Pradeeban Kathiravelu and Luis Veiga , booktitle =. An Adaptive Distributed Simulator for Cloud and MapReduce Algorithms and Architectures , year =

[58] [58]

A Work-Stealing Scheduling Framework Supporting Fault Tolerance , year =

Yizhuo Wang and Weixing Ji and Feng Shi and Qi Zuo , booktitle =. A Work-Stealing Scheduling Framework Supporting Fault Tolerance , year =

[59] [59]

A Programming Language Approach to Fault Tolerance for Fork-Join Parallelism , year =

Mustafa Zengin and Viktor Vafeiadis , booktitle =. A Programming Language Approach to Fault Tolerance for Fork-Join Parallelism , year =

[60] [60]

Journal of Parallel and Distributed Computing (JPDC) , volume =

Rajesh Sudarsana and Calvin J.Ribbens , title =. Journal of Parallel and Distributed Computing (JPDC) , volume =. 2016 , doi =

2016

[61] [61]

Equilibrium: An Elasticity Controller for Parallel Tree Search in the Cloud , year =

Stefan Kehrer and Wolfgang Blochinger , journal =. Equilibrium: An Elasticity Controller for Parallel Tree Search in the Cloud , year =

[62] [62]

, booktitle =

Zheng, Gengbin and Xiang Ni and Kalé, Laxmikant V. , booktitle =. A Scalable Double In-Memory Checkpoint and Restart Scheme Towards Exascale , year =

[63] [63]

and Szymanski, Boleslaw K

El Maghraoui, Kaoutar and Desell, Travis J. and Szymanski, Boleslaw K. and Varela, Carlos A. , booktitle =. Dynamic Malleability in Iterative MPI Applications , year =

[64] [64]

Transactions on Architecture and Code Optimization (TACO) , publisher =

Peter Pirkelbauer and Amalee Wilson and Christina Peterson and Damian Dechev , title =. Transactions on Architecture and Code Optimization (TACO) , publisher =. 2019 , volume =

2019

[65] [65]

SchedMD LLC , title =

[66] [66]

Patrick Finnerty , title =

[67] [67]

2005 , publisher =

Philippe Charles and Christian Grothoff and Vijay Saraswat and Christopher Donawa and Allan Kielstra and Kemal Ebcioglu and Christoph von Praun and Vivek Sarkar , title =. 2005 , publisher =. doi:10.1145/1103845.1094852 , journal =

work page doi:10.1145/1103845.1094852 2005

[68] [68]

Barcelona Supercomputing Center , title =

[69] [69]

GLB: Lifeline-based Global Load Balancing Library in X10 , year =

Wei Zhang and Olivier Tardieu and David Grove and Benjamin Herta and Tomio Kamada and Vijay Saraswat and Mikio Takeuchi , booktitle =. GLB: Lifeline-based Global Load Balancing Library in X10 , year =

[70] [70]

Self-Adjusting Task Granularity for Global Load Balancer Library on Clusters of Many-Core Processors , year =

Patrick Finnerty and Tomio Kamada and Chikara Ohta , booktitle =. Self-Adjusting Task Granularity for Global Load Balancer Library on Clusters of Many-Core Processors , year =

[71] [71]

Quintana-Ort\'

Sergio Iserte and Rafael Mayo and Enrique S. Quintana-Ort\'. DMRlib Easy-coding and Efficient Resource Management for Job Malleability , year =. doi:10.1109/TC.2020.3022933 , journal =

work page doi:10.1109/tc.2020.3022933 2020

[72] [72]

Extending Slurm for Dynamic Resource-Aware Adaptive Batch Scheduling , year =

Mohak Chadha and Jophin John and Michael Gerndt , booktitle =. Extending Slurm for Dynamic Resource-Aware Adaptive Batch Scheduling , year =

[73] [73]

Feitelson , title =

Uri Lublin and Dror G. Feitelson , title =. Journal of Parallel and Distributed Computing , volume =. 2003 , doi =

2003

[74] [74]

Harrison and Gregory Beylkin and Florian A

Robert J. Harrison and Gregory Beylkin and Florian A. Bischoff and Justus A. Calvin and George I. Fann and Jacob Fosso-Tande and Diego Galindo and Jeff R. Hammond and Rebecca Hartman-Baker and Judith C. Hill and Jun Jia and Jakob S. Kottmann and M-J. Yvonne Ou and Junchen Pei and Laura E. Ratcliff and Matthew G. Reuter and Adam C. Richie-Halford and Nicho...

2016

[75] [75]

Parallel Programming with Migratable Objects: Charm++ in Practice , year =

Bilge Acun and Abhishek Gupta and Nikhil Jain and Akhil Langer and Harshitha Menon and Eric Mikida and Xiang Ni and Michael Robson and Yanhua Sun and Ehsan Totoni and Lukasz Wesolowski and Laxmikant Kale , booktitle =. Parallel Programming with Migratable Objects: Charm++ in Practice , year =

[76] [76]

Enhancing the performance of malleable MPI applications by using performance-aware dynamic reconfiguration , journal =

Gonzalo Mart\'. Enhancing the performance of malleable MPI applications by using performance-aware dynamic reconfiguration , journal =. 2015 , doi =

2015

[77] [77]

Dynamic Reconfiguration of Noniterative Scientific Applications: A Case Study with HPG Aligner , journal =

Sergio Iserte and H\'. Dynamic Reconfiguration of Noniterative Scientific Applications: A Case Study with HPG Aligner , journal =. 2019 , doi =

2019

[78] [78]

Kale , booktitle =

Suraj Prabhakaran and Marcel Neumann and Sebastian Rinke and Felix Wolf and Abhishek Gupta and Laxmikant V. Kale , booktitle =. A Batch System with Efficient Adaptive Scheduling for Malleable and Evolving Applications , year =

[79] [79]

Algorithms and Computation , year =

Marc Tchiboukdjian and Nicolas Gast and Denis Trystram and Jean-Louis Roch and Julien Bernard , title =. Algorithms and Computation , year =

[80] [80]

The Implementation of the Cilk-5 Multithreaded Language , author =. Proc. Conf. on Programming Language Design and Implementation (PLDI) , year =