AXS: A framework for fast astronomical data processing based on Apache Spark

Andrew J. Connolly; Colin T. Slater; Eric C. Bellm; Krzysztof Suberlak; Mario Jurić; Petar Zečević; Sven Lončarić; V. Zach Golkhou

arxiv: 1905.09034 · v2 · pith:UY3O7MOAnew · submitted 2019-05-22 · 🌌 astro-ph.IM · cs.DC

Petar Zečević, Colin T. Slater, Mario Jurić, Andrew J. Connolly, Sven Lončarić, Eric C. Bellm, V. Zach Golkhou, Krzysztof Suberlak

keywords data · spark · astronomical · cross-matching · python · allwise · analysis · apache

We introduce AXS (Astronomy eXtensions for Spark), a scalable open-source astronomical data analysis framework built on Apache Spark, a widely used industry-standard engine for big data processing. Building on capabilities present in Spark, AXS aims to enable querying and analyzing almost arbitrarily large astronomical catalogs using familiar Python/AstroPy concepts, DataFrame APIs, and SQL statements. We achieve this by i) adding support to Spark for efficient on-line positional cross-matching and ii) supplying a Python library supporting commonly used operations for astronomical data analysis. To support scalable cross-matching, we developed a variant of the ZONES algorithm (Gray et al. 2004) capable of operating in a distributed, shared-nothing architecture. We couple this to a data partitioning scheme that enables fast catalog cross-matching and handles the data skew often present in deep all-sky data sets. The cross-match and other often-used functionalities are exposed to end users through an easy-to-use Python API. We demonstrate AXS' technical and scientific performance on SDSS, ZTF, Gaia DR2, and AllWise catalogs. Using AXS we were able to perform an on-the-fly cross-match of the Gaia DR2 (1.8 billion rows) and AllWise (900 million rows) data sets in ~30 seconds. We discuss how cloud-ready distributed systems like AXS provide a natural way to enable comprehensive end-user analyses of large datasets such as LSST.
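The abstract names a zone-based cross-match but does not spell out the mechanics. The following is a minimal single-machine sketch of the general zones idea: bucket objects into declination stripes, then compare each object only against candidates in its own and the two adjacent stripes. It is not the AXS implementation or its API. All function names, the default zone height, and the toy coordinates are assumptions made for illustration, and the sketch omits the right-ascension window, the Spark partitioning, and the skew handling the abstract mentions.

```python
import math
from collections import defaultdict


def zone_of(dec_deg, zone_height_deg):
    """Index of the declination stripe (zone) containing dec_deg."""
    return int(math.floor((dec_deg + 90.0) / zone_height_deg))


def angular_sep_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = (math.radians(v) for v in (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2.0) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2.0) ** 2)
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(a))))


def build_zone_index(catalog, zone_height_deg):
    """Bucket (obj_id, ra_deg, dec_deg) rows by zone index."""
    index = defaultdict(list)
    for obj_id, ra, dec in catalog:
        index[zone_of(dec, zone_height_deg)].append((obj_id, ra, dec))
    return index


def crossmatch(left, right, radius_deg, zone_height_deg=1.0 / 60.0):
    """Return (left_id, right_id, separation_arcsec) for pairs within radius_deg.

    Hypothetical helper: with zone height >= radius, any true match lies in
    the same zone or an adjacent one, so only three zones are scanned.
    """
    if radius_deg > zone_height_deg:
        raise ValueError("zone height must be at least the match radius")
    right_index = build_zone_index(right, zone_height_deg)
    matches = []
    for left_id, ra, dec in left:
        zone = zone_of(dec, zone_height_deg)
        for neighbor in (zone - 1, zone, zone + 1):
            for right_id, ra2, dec2 in right_index.get(neighbor, ()):
                sep = angular_sep_deg(ra, dec, ra2, dec2)
                if sep <= radius_deg:
                    matches.append((left_id, right_id, sep * 3600.0))
    return matches


# Toy catalogs with made-up coordinates: (id, ra_deg, dec_deg).
catalog_a = [(1, 10.0, 20.0), (2, 200.0, -45.0)]
catalog_b = [(101, 10.0002, 20.0001), (102, 10.5, 20.0), (103, 200.0, -45.0003)]

# Match within 2 arcseconds.
matches = crossmatch(catalog_a, catalog_b, radius_deg=2.0 / 3600.0)
```

Because every candidate still goes through the exact separation test, the zone lookup only prunes the search and never changes which pairs qualify. In a distributed setting the zone index is a natural key for partitioning and joining, which is presumably why a zones variant suits a shared-nothing engine.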

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

SETI in the Spatio-Temporal Survey Domain
astro-ph.IM · 2019-07 · unverdicted · novelty 5.0

Proposes that synoptic time domain surveys can probe 10-100 times more Cosmic Haystack volume for technosignatures than traditional radio SETI by searching spatially resolved or multi-star signals over time.