arxiv: 1808.06094 · v2 · pith:PNEFWLBE · submitted 2018-08-18 · 💻 cs.DC

Pangea: Monolithic Distributed Storage for Data Analytics

classification 💻 cs.DC
keywords data, system, like, pangea, performance, storage, systems, analytics
Storage and memory systems for modern data analytics are heavily layered: shared persistent data, cached data, and non-shared execution data are managed in separate systems, such as a distributed file system like HDFS, an in-memory file system like Alluxio, and a computation framework like Spark. Such layering introduces significant performance and management costs, both from redundantly copying data across layers and from deciding how to allocate resources among all the layers. In this paper we propose a single system called Pangea that can manage all data (both intermediate and long-lived data, together with their buffering/caching, data placement optimization, and failure recovery) in one monolithic storage system, without any layering. We present a detailed performance evaluation of Pangea and show that its performance compares favorably with several widely used layered systems such as Spark.
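To make the redundant-copy cost concrete, here is a toy model in Python. It is only an illustration of the argument in the abstract, not Pangea's design or API: the names `Layer`, `UnifiedStore`, and `layered_read` are hypothetical, and the assumption is simply that each layer in a stack (for example DFS, then in-memory FS, then compute engine) materializes its own private copy of a block, while a single monolithic store keeps one copy and hands out zero-copy views.

```python
from dataclasses import dataclass, field


@dataclass
class Layer:
    """Hypothetical storage tier that keeps a private copy of each block."""
    name: str
    blocks: dict[str, bytearray] = field(default_factory=dict)

    def store(self, block_id: str, data: bytes) -> None:
        # bytearray(...) allocates a new buffer, modelling a physical copy.
        self.blocks[block_id] = bytearray(data)

    def resident_bytes(self) -> int:
        return sum(len(b) for b in self.blocks.values())


def layered_read(layers: list[Layer], block_id: str, data: bytes) -> int:
    """Pass one block through every layer; each layer copies it.
    Returns the total bytes resident across all layers."""
    for layer in layers:
        layer.store(block_id, data)
    return sum(layer.resident_bytes() for layer in layers)


@dataclass
class UnifiedStore:
    """Hypothetical single buffer pool shared by persistent, cached,
    and execution data; readers get a view instead of a copy."""
    pool: dict[str, bytes] = field(default_factory=dict)

    def store(self, block_id: str, data: bytes) -> None:
        self.pool[block_id] = data

    def view(self, block_id: str) -> memoryview:
        # memoryview exposes the stored buffer without copying it.
        return memoryview(self.pool[block_id])

    def resident_bytes(self) -> int:
        return sum(len(b) for b in self.pool.values())


if __name__ == "__main__":
    block = b"x" * 1024
    tiers = [Layer("dfs"), Layer("memfs"), Layer("engine")]
    print(layered_read(tiers, "b0", block))  # 3072

    unified = UnifiedStore()
    unified.store("b0", block)
    v = unified.view("b0")
    print(unified.resident_bytes(), len(v))  # 1024 1024
```

In this simplified model, memory use in the layered stack grows with the number of layers, while the unified store holds one copy regardless of how many consumers read it. Real systems can avoid some copies even when layered, so this should be read as a sketch of the abstract's reasoning rather than a performance claim.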