
arxiv: 1801.01843 · v1 · pith:YOER2WWHnew · submitted 2018-01-05 · 💻 cs.DC

Design and Performance Characterization of RADICAL-Pilot on Titan

classification 💻 cs.DC
keywords: tasks, pilot, radical-pilot, abstraction, comprised, design, execution, large

Many extreme-scale scientific applications have workloads composed of a large number of individual high-performance tasks. The Pilot abstraction decouples workload specification, resource management, and task execution via job placeholders and late binding. Suitable implementations of the Pilot abstraction can therefore support the collective execution of a large number of tasks on supercomputers. We introduce RADICAL-Pilot (RP) as a portable, modular, and extensible Python-based Pilot system. We describe RP's design, architecture, and implementation. We characterize its performance and show its ability to scalably execute workloads composed of thousands of MPI tasks on Titan, a DOE leadership-class facility. Specifically, we investigate RP's weak scaling up to 131K cores with 4096 32-core tasks, and its strong scaling up to 65K cores with 16384 32-core tasks. RADICAL-Pilot can be used stand-alone or integrated with other tools as a runtime system.
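The scaling figures quoted in the abstract can be related by simple arithmetic. The sketch below is illustrative only and is not taken from the paper: it assumes "131K" and "65K" mean 131,072 and 65,536 cores, that every task occupies exactly 32 cores, and that tasks run in back-to-back "generations" filling the whole pilot. The function names are hypothetical and are not part of RADICAL-Pilot's API.

```python
# Illustrative arithmetic only; the helper names are hypothetical.
CORES_PER_TASK = 32  # from the abstract: 32-core tasks


def concurrent_tasks(pilot_cores, cores_per_task=CORES_PER_TASK):
    """Number of tasks that fit on the pilot's cores at the same time."""
    return pilot_cores // cores_per_task


def generations(n_tasks, pilot_cores, cores_per_task=CORES_PER_TASK):
    """Number of back-to-back batches needed to run all tasks (ceiling division)."""
    slots = concurrent_tasks(pilot_cores, cores_per_task)
    return -(-n_tasks // slots)


# Assumption: 131K = 131,072 cores and 65K = 65,536 cores.
weak_generations = generations(4096, 131072)     # largest weak-scaling point
strong_generations = generations(16384, 65536)   # largest strong-scaling point

print(concurrent_tasks(131072), weak_generations)
print(concurrent_tasks(65536), strong_generations)
```

Under these assumptions, the largest weak-scaling point fits all 4096 tasks on the pilot at once, while the largest strong-scaling point needs several generations of tasks on the same pilot. That reuse of an already acquired allocation is what late binding makes possible.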
