
arxiv: 1210.8242 · v1 · pith:CVXFVVQWnew · submitted 2012-10-31 · 💻 cs.DB · cs.DC

Pipelined Workflow in Hybrid MPI/Pthread runtime for External Memory Graph Construction

keywords: list, construction, edge, external, memory, processing, edges, graph

Graph construction from a given set of edges is a data-intensive operator that appears in social network analysis, ontology-enabled databases, and other analytics processing. The operator converts an edge list to a compressed sparse row (CSR) representation (or sometimes to an adjacency list or clustered B-Tree storage). In this work, we show how to scale CSR construction to massive scale on SSD-enabled supercomputers such as Gordon using pipelined processing. We develop several abstractions and operations for external-memory and parallel processing of edge lists and integer arrays, and use them to build a scalable algorithm for creating the CSR representation. Our experiments demonstrate that this scheme is four to six times faster than the currently available implementation. Moreover, by using external memory, our scheme can handle up to 8 billion edges (128GB), whereas the performance of prior schemes degrades considerably for edge lists of size 26 million and beyond.
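
For readers unfamiliar with the target format, the following is a minimal in-memory sketch of converting an edge list to CSR using the standard count, prefix-sum, and scatter steps. It is not the paper's external-memory, pipelined MPI/Pthread algorithm; the function name `build_csr` and the array names `row_ptr` and `col_idx` are illustrative assumptions, not taken from the paper.

```python
from typing import List, Tuple


def build_csr(num_vertices: int,
              edges: List[Tuple[int, int]]) -> Tuple[List[int], List[int]]:
    """Convert a directed edge list to CSR (hypothetical helper, in-memory only)."""
    # Pass 1: count the out-degree of each source vertex,
    # stored one slot to the right so the prefix sum yields row offsets.
    row_ptr = [0] * (num_vertices + 1)
    for src, _dst in edges:
        row_ptr[src + 1] += 1

    # Prefix sum: row_ptr[v] becomes the start of vertex v's neighbor block.
    for v in range(num_vertices):
        row_ptr[v + 1] += row_ptr[v]

    # Pass 2: scatter each destination into its source's block.
    col_idx = [0] * len(edges)
    cursor = row_ptr[:-1]  # slicing copies; next free slot per vertex
    for src, dst in edges:
        col_idx[cursor[src]] = dst
        cursor[src] += 1

    return row_ptr, col_idx


edges = [(0, 1), (2, 0), (0, 2), (1, 2)]
row_ptr, col_idx = build_csr(3, edges)
# Neighbors of vertex v are col_idx[row_ptr[v]:row_ptr[v + 1]].
neighbors_of_0 = col_idx[row_ptr[0]:row_ptr[1]]
```

Both passes read the edge list sequentially, but the scatter step writes to scattered positions in `col_idx`. Random access of this kind is typically costly once the data no longer fits in memory, which is the setting the paper targets.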
