arxiv: 1409.0798 · v1 · pith:CJJ6KBFDnew · submitted 2014-09-02 · 💻 cs.DB

DataHub: Collaborative Data Science & Dataset Version Management at Scale

classification 💻 cs.DB
keywords version, control, data, dataset, ability, collaborative, datahub, datasets

Relational databases have limited support for data collaboration, where teams collaboratively curate and analyze large datasets. Inspired by software version control systems like git, we propose (a) a dataset version control system, giving users the ability to create, branch, merge, difference and search large, divergent collections of datasets, and (b) a platform, DataHub, that gives users the ability to perform collaborative data analysis building on this version control system. We outline the challenges in providing dataset version control at scale.
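
To make the operations named in the abstract concrete, here is a minimal in-memory sketch of create, branch, commit, diff and merge over keyed records. It is an illustration only: the class `ToyDatasetRepo`, its method names, the dict-of-records data model and the "source wins" merge rule are all assumptions made for this example and are not DataHub's API or the paper's design. It also stores a full copy of every version, which sidesteps the storage and scale challenges the paper is concerned with.

```python
from dataclasses import dataclass


@dataclass
class Version:
    """One snapshot of a dataset: primary key -> record (hypothetical model)."""
    vid: int
    parents: tuple
    rows: dict


class ToyDatasetRepo:
    """Hypothetical in-memory version store; not DataHub's actual interface."""

    def __init__(self):
        self.versions = {}   # version id -> Version
        self.branches = {}   # branch name -> version id at the branch head
        self._next_id = 0

    def _new_version(self, parents, rows):
        version = Version(self._next_id, tuple(parents), dict(rows))
        self.versions[version.vid] = version
        self._next_id += 1
        return version.vid

    def create(self, branch, rows):
        """Start a new dataset on `branch` with a root version."""
        self.branches[branch] = self._new_version((), rows)
        return self.branches[branch]

    def branch(self, source, new_branch):
        """Point a new branch at the head of an existing branch."""
        self.branches[new_branch] = self.branches[source]

    def commit(self, branch, rows):
        """Record a new full snapshot on top of the branch head."""
        parent = self.branches[branch]
        self.branches[branch] = self._new_version((parent,), rows)
        return self.branches[branch]

    def rows(self, branch):
        """Return a copy of the records at the branch head."""
        return dict(self.versions[self.branches[branch]].rows)

    def diff(self, a, b):
        """Record-level difference going from branch `a` to branch `b`."""
        ra, rb = self.rows(a), self.rows(b)
        return {
            "added": {k: rb[k] for k in rb.keys() - ra.keys()},
            "removed": {k: ra[k] for k in ra.keys() - rb.keys()},
            "changed": {k: (ra[k], rb[k])
                        for k in ra.keys() & rb.keys() if ra[k] != rb[k]},
        }

    def merge(self, target, source):
        """Naive merge (assumption): union by key, source wins on conflict."""
        merged = self.rows(target)
        merged.update(self.rows(source))
        parents = (self.branches[target], self.branches[source])
        self.branches[target] = self._new_version(parents, merged)
        return self.branches[target]


# Usage: branch a dataset, edit the branch, inspect the diff, merge back.
repo = ToyDatasetRepo()
repo.create("main", {1: "alice", 2: "bob"})
repo.branch("main", "cleanup")
repo.commit("cleanup", {1: "Alice", 2: "bob", 3: "carol"})
d = repo.diff("main", "cleanup")
repo.merge("main", "cleanup")
```

The merge version keeps both parent ids, so the history forms a graph of versions rather than a single chain, which is the sense in which a collection of datasets can diverge and later be reconciled.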
