arxiv: 1801.03872 · v1 · pith:QJR2F2OZnew · submitted 2018-01-11 · ✦ hep-ex · cs.DL

The archive solution for distributed workflow management agents of the CMS experiment at LHC

classification ✦ hep-ex cs.DL
keywords management, system, workflow, agents, archive, CERN, computing, data

The CMS experiment at the CERN LHC developed the Workflow Management Archive system to persistently store unstructured framework job report documents produced by distributed workflow management agents. In this paper we present its architecture, implementation, deployment, and integration with the CMS and CERN computing infrastructures, such as the central HDFS and the Hadoop Spark cluster. The system leverages modern technologies such as a document-oriented database and the Hadoop ecosystem to provide the flexibility needed to reliably process, store, and aggregate $\mathcal{O}$(1M) documents on a daily basis. We describe the data transformation, the short- and long-term storage layers, the query language, and the aggregation pipeline developed to visualize performance metrics that help CMS data operators assess the performance of the CMS computing system.
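
To make the transform-then-aggregate idea in the abstract concrete, the following is a minimal sketch in plain Python, not the system's actual implementation. It flattens nested job-report-like documents into dot-separated keys and then computes a per-group count and mean for one metric. The document layout, the field names (`meta.site`, `performance.cpu_seconds`), the site labels, and the helper names `flatten` and `aggregate_by` are all hypothetical and chosen only for illustration; the abstract does not specify the real schema or query language.

```python
from collections import defaultdict


def flatten(doc, prefix=""):
    """Flatten a nested dict into a single-level dict with dot-separated keys."""
    flat = {}
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def aggregate_by(docs, group_key, metric_key):
    """Group flattened documents by group_key; return count and mean of metric_key.

    Documents missing either key are skipped, since unstructured reports
    cannot be assumed to carry every field.
    """
    totals = defaultdict(lambda: {"count": 0, "sum": 0.0})
    for doc in docs:
        flat = flatten(doc)
        if group_key not in flat or metric_key not in flat:
            continue
        bucket = totals[flat[group_key]]
        bucket["count"] += 1
        bucket["sum"] += flat[metric_key]
    return {
        group: {"count": b["count"], "mean": b["sum"] / b["count"]}
        for group, b in totals.items()
    }


# Hypothetical sample documents; the real job report schema is not given here.
SAMPLE_DOCS = [
    {"meta": {"site": "T1_A"}, "performance": {"cpu_seconds": 100.0}},
    {"meta": {"site": "T1_A"}, "performance": {"cpu_seconds": 300.0}},
    {"meta": {"site": "T2_B"}, "performance": {"cpu_seconds": 50.0}},
    {"meta": {"site": "T2_B"}},  # no metric recorded, so it is skipped
]

summary = aggregate_by(SAMPLE_DOCS, "meta.site", "performance.cpu_seconds")
```

At the scale the abstract mentions, the same grouping would presumably run as a distributed job on the Spark cluster rather than in a single process; the sketch only shows the shape of the computation.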
