Monitoring Extreme-scale Lustre Toolkit
read the original abstract
We discuss the design and ongoing development of the Monitoring Extreme-scale Lustre Toolkit (MELT), a unified Lustre performance monitoring and analysis infrastructure that provides continuous, low-overhead summary information on the health and performance of Lustre, as well as on-demand, in- depth problem diagnosis and root-cause analysis. The MELT infrastructure leverages a distributed overlay network to enable monitoring of center-wide Lustre filesystems where clients are located across many network domains. We preview interactive command-line utilities that help administrators and users to observe Lustre performance at various levels of resolution, from individual servers or clients to whole filesystems, including job-level reporting. Finally, we discuss our future plans for automating the root-cause analysis of common Lustre performance problems.
This paper has not been read by Pith yet.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.