
arxiv: 1808.07541 · v1 · pith:UMG33SAUnew · submitted 2018-08-22 · 💻 cs.DL

Reproducible data citations for computational research

classification 💻 cs.DL
keywords data, publication, result, sources, code, computational, citation, citations

The general purpose of a scientific publication is the exchange and spread of knowledge. A publication usually reports a scientific result and tries to convince the reader that it is valid. With an ever-growing number of papers relying on computational methods that make use of large quantities of data and sophisticated statistical modeling techniques, a textual description of the result is often not enough for a publication to be transparent and reproducible. While there are efforts to encourage sharing of code and data, we currently lack conventions for linking data sources to a computational result that is stated in the main publication text or used to generate a figure or table. Thus, here I propose a data citation format that allows for an automatic reproduction of all computations. A data citation consists of a descriptor that refers to the functional program code and the input that generated the result. The input itself may be a set of other data citations, such that all data transformations, from the original data sources to the final result, are transparently expressed by a directed graph. Functions can be implemented in a variety of programming languages since data sources are expected to be stored in open and standardized text-based file formats. A publication is then an online file repository consisting of a Hypertext Markup Language (HTML) document and additional data and code source files, together with a summarization of all data sources, similar to a list of references in a bibliography.
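
The abstract describes a data citation as a descriptor that names a function and its input, where the input may itself be other data citations, so that the whole chain forms a directed graph. A minimal Python sketch of that idea follows. It is an illustration under stated assumptions, not the paper's actual citation format: the class names `Source` and `Citation`, the helpers `resolve` and `list_sources`, the file names, and the inline CSV data are all hypothetical, and data is passed between steps as plain CSV text to mirror the text-based file formats mentioned above.

```python
import csv
import io
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple, Union


@dataclass
class Source:
    """Leaf node: an original data source (hypothetical; inline CSV text here)."""
    name: str
    text: str


@dataclass
class Citation:
    """Hypothetical data citation: a function plus the inputs it is applied to."""
    name: str
    func: Callable[..., str]
    inputs: Tuple[Union["Source", "Citation"], ...]


Node = Union[Source, Citation]


def resolve(node: Node, cache: Optional[Dict[str, str]] = None) -> str:
    """Recompute a result by walking the graph from the sources to this node."""
    if cache is None:
        cache = {}
    if node.name not in cache:
        if isinstance(node, Source):
            cache[node.name] = node.text
        else:
            args = [resolve(dep, cache) for dep in node.inputs]
            cache[node.name] = node.func(*args)
    return cache[node.name]


def list_sources(node: Node, seen: Optional[List[str]] = None) -> List[str]:
    """List every node the result depends on, dependencies first."""
    if seen is None:
        seen = []
    if isinstance(node, Citation):
        for dep in node.inputs:
            list_sources(dep, seen)
    if node.name not in seen:
        seen.append(node.name)
    return seen


def keep_large(csv_text: str) -> str:
    """Keep only rows whose 'value' column exceeds 2.5 (illustrative step)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        if float(row["value"]) > 2.5:
            writer.writerow(row)
    return out.getvalue()


def column_mean(csv_text: str) -> str:
    """Return the mean of the 'value' column as text (illustrative step)."""
    values = [float(r["value"]) for r in csv.DictReader(io.StringIO(csv_text))]
    return f"{sum(values) / len(values):.2f}"


# Made-up example data: one source, two derived results.
raw = Source("raw.csv", "id,value\n1,2.0\n2,4.0\n3,6.0\n")
filtered = Citation("filtered.csv", keep_large, (raw,))
mean = Citation("mean.txt", column_mean, (filtered,))

print(resolve(mean))        # the number a publication would quote
print(list_sources(mean))   # plays the role of the list of data sources
```

In this sketch, `resolve` stands in for the automatic reproduction of a stated result, and `list_sources` produces something akin to the bibliography-like summary of data sources. A real implementation would read files from the publication's repository rather than hold text in memory.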
