pith.

arxiv: 1907.02335 · v1 · pith:OPON47XM · submitted 2019-07-04 · 🌌 astro-ph.IM · cs.DC

Development of a data infrastructure for a global data and analysis center in astroparticle physics

Pith reviewed 2026-05-25 09:18 UTC · model grok-4.3

classification 🌌 astro-ph.IM cs.DC
keywords astroparticle physics · cosmic rays · data management · KASCADE · TAIGA · distributed storage · open access · air showers

The pith

The GRADLCI project applies distributed data management to give open access to KASCADE and TAIGA cosmic ray data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper describes the development of a data infrastructure built on modern distributed data management technologies within the German-Russian Astroparticle Data Life Cycle Initiative (GRADLCI). The aim is to provide reliable open access to data from the KASCADE experiment in Germany and the Tunka-133 setup of TAIGA in Russia. The two experiments lie at the same latitude and have overlapping operation periods, which makes joint analysis of their air-shower data valuable for studying cosmic rays in the range from hundreds of TeV to hundreds of PeV. A sympathetic reader would care because the work addresses the growing data volumes in astroparticle physics and enables multi-messenger approaches and machine learning on combined datasets.

Core claim

In GRADLCI, modern distributed data management technologies are being employed to establish reliable open access to the experimental cosmic-ray physics data collected by KASCADE and the Tunka-133 setup of TAIGA.

What carries the argument

The German-Russian Astroparticle Data Life Cycle Initiative (GRADLCI), which employs distributed data management technologies to enable open access and joint analysis.

If this is right

  • Joint analysis of data from the two experiments, motivated by their shared latitude and overlapping runs, becomes practical.
  • Supports testing of theoretical models for cosmic-ray origins using a multi-messenger approach.
  • Facilitates investigation of phenomena with rare statistics in particle detection.
  • Allows cross-calibration between different experiments and testing new hadronic interaction hypotheses.
  • Enables high-performance data processing and accurate data mapping for large volumes.
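The "overlapping operation runs" argument reduces to an interval intersection. A minimal Python sketch, assuming each experiment's operation is given as a list of closed date intervals; the dates below are placeholders invented for illustration, not the actual KASCADE or Tunka-133 run periods:

```python
from datetime import date

# Hypothetical operation periods (placeholders, NOT the real run dates of
# KASCADE or Tunka-133); each period is a closed interval [start, end].
KASCADE_RUNS = [(date(2000, 1, 1), date(2005, 6, 30)),
                (date(2006, 1, 1), date(2010, 12, 31))]
TUNKA_RUNS = [(date(2004, 10, 1), date(2007, 4, 30)),
              (date(2009, 10, 1), date(2012, 4, 30))]


def overlapping_periods(runs_a, runs_b):
    """Return the sorted pairwise intersections of two lists of closed date intervals."""
    overlaps = []
    for start_a, end_a in runs_a:
        for start_b, end_b in runs_b:
            start = max(start_a, start_b)
            end = min(end_a, end_b)
            if start <= end:  # keep only non-empty intersections
                overlaps.append((start, end))
    return sorted(overlaps)


for start, end in overlapping_periods(KASCADE_RUNS, TUNKA_RUNS):
    # +1 because both endpoints are inclusive
    print(f"{start} -> {end}: {(end - start).days + 1} days of common operation")
```

The nested loop is quadratic in the number of periods, which is harmless for a handful of run campaigns; a sweep over sorted intervals would scale better for per-run granularity.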

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Combined datasets could reveal new insights into air-shower physics not visible in single-experiment data.
  • Similar data infrastructures might be adopted by other global physics collaborations facing data volume challenges.
  • Open access could accelerate machine learning applications in cosmic ray research.
  • The feasibility could be tested by measuring the rate of new publications using the joint data.

Load-bearing premise

That the shared latitude and overlapping operation runs make joint analysis of KASCADE and TAIGA data particularly valuable once distributed management is implemented.

What would settle it

Implementation of the infrastructure followed by no increase in joint analyses or no new scientific results from combining the two datasets.

Figures

Figures reproduced from arXiv: 1907.02335 by A. Haungs, D. Kang, D. Kostunin, D. Wochele, F. Polgart, J. Wochele, V. Tokareva.

Figure 1. KASCADE data processing workflow (data life cycle). Data processing is performed by means of special software developed for the experiment: a data reconstruction program KRETA [31], a program for detector output simulation CRES [18] based on GEANT3 [11], and a program for detailed EAS simulation CORSIKA [19,28]. A scheme of the data reconstruction process is presented in fig. 1. The open access data are sto…
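The processing chain in the Figure 1 caption can be read as a dependency graph. The following Python sketch assumes that CORSIKA output feeds CRES, and that CRES output and measured raw detector data both feed KRETA; the `raw_data` and `open_access_store` nodes and this exact wiring are illustrative assumptions, not the paper's diagram. It uses the standard-library `graphlib.TopologicalSorter` (Python 3.9+) to derive one valid execution order:

```python
from graphlib import TopologicalSorter

# Each key lists the steps whose output it consumes (its predecessors).
# The wiring and the raw_data / open_access_store nodes are assumptions.
WORKFLOW = {
    "CRES": {"CORSIKA"},             # detector output simulated from EAS showers
    "KRETA": {"CRES", "raw_data"},   # reconstruction of simulated and measured events
    "open_access_store": {"KRETA"},  # reconstructed quantities published for access
}


def execution_order(graph):
    """Return one valid run order; raises graphlib.CycleError if the graph has a cycle."""
    return list(TopologicalSorter(graph).static_order())


print(execution_order(WORKFLOW))
```

Nodes that appear only as predecessors (here `CORSIKA` and `raw_data`) are included in the order automatically, so the graph only needs to list each step's inputs.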
Figure 2. KCDC IT structure [27]. Expanding the experimental data by adding new detector components could require changing the structure of a stored event. To do this without the restraint of a fixed database schema, the NoSQL database MongoDB has been chosen to store the experimental data. MongoDB uses JSON-like documents with optional schemas. It supports field, range query, and regular expression searches, and in…
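The Figure 2 caption notes that a schema-less store lets new detector fields be added per event, and that MongoDB supports field, range, and regular-expression queries. The Python sketch below emulates that query style in memory so it runs without a database; the event fields are hypothetical, and only the operator names (`$gte`, `$lt`, `$regex`, `$exists`) follow MongoDB's query language. With pymongo, a filter dict of this shape would instead be passed to `Collection.find`.

```python
import re

# Hypothetical, schema-less event documents: the second event carries an
# extra field, as if a new detector component had been added later.
EVENTS = [
    {"run": 101, "energy_log10": 15.2, "detector": "array"},
    {"run": 102, "energy_log10": 16.8, "detector": "array", "muon_number": 3.1e4},
    {"run": 203, "energy_log10": 17.1, "detector": "grande"},
]


def matches(doc, query):
    """Check one dict against a small subset of MongoDB-style filter syntax."""
    for field, cond in query.items():
        present = field in doc
        if not isinstance(cond, dict):  # plain equality match on a field
            if not present or doc[field] != cond:
                return False
            continue
        for op, arg in cond.items():
            if op == "$exists":
                ok = present == arg
            elif not present:
                ok = False  # a missing field never satisfies a comparison
            elif op == "$gte":
                ok = doc[field] >= arg
            elif op == "$lt":
                ok = doc[field] < arg
            elif op == "$regex":
                # like MongoDB, only string values can match a regex
                ok = isinstance(doc[field], str) and re.search(arg, doc[field]) is not None
            else:
                raise ValueError(f"unsupported operator: {op}")
            if not ok:
                return False
    return True


def find(docs, query):
    """Return the documents that satisfy every condition in the query."""
    return [doc for doc in docs if matches(doc, query)]


# Range query on one field, regular-expression query on another.
print(find(EVENTS, {"energy_log10": {"$gte": 16.0, "$lt": 17.0}}))
print(find(EVENTS, {"detector": {"$regex": "^gr"}}))
```

Because documents need not share a schema, `$exists` is the natural way to select only events recorded after a new component (here the hypothetical `muon_number`) was introduced.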
Original abstract

Nowadays astroparticle physics faces a rapid increase in data volume. At the same time, challenges remain in testing the theoretical models for clarifying the origin of cosmic rays by applying a multi-messenger approach, machine learning, and investigation of the phenomena related to the rare statistics in detecting incoming particles. The problems are related to accurate data mapping and data management as well as to distributed storage and high-performance data processing. In particular, one could be interested in employing such solutions in the study of air showers induced by ultra-high energy cosmic and gamma rays, testing new hypotheses of hadronic interaction, or cross-calibration of different experiments. KASCADE (Karlsruhe, Germany) and TAIGA (Tunka valley, Russia) are experiments in the field of astroparticle physics, aiming at the detection of cosmic-ray air showers induced by primaries in the energy range of about hundreds of TeV to hundreds of PeV. They are located at the same latitude and have an overlap in operation runs. These factors determine the interest in performing a joint analysis of these data. In the German-Russian Astroparticle Data Life Cycle Initiative (GRADLCI), modern distributed data management technologies are being employed to establish reliable open access to the experimental cosmic-ray physics data collected by KASCADE and the Tunka-133 setup of TAIGA.
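The abstract lists accurate data mapping among the open problems. One common approach, shown here as a hypothetical Python sketch rather than the GRADLCI design, is a declarative per-experiment mapping from source fields to a shared schema with unit conversion; every field name and unit below is invented for illustration:

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CommonEvent:
    """Hypothetical shared event schema for a joint KASCADE/Tunka-133 analysis."""
    experiment: str
    unix_time: int
    log10_energy_ev: float
    zenith_deg: float


# Hypothetical source layouts: source field -> (common field, converter).
MAPPINGS = {
    "KASCADE": {
        "t_unix": ("unix_time", int),
        "energy_pev": ("log10_energy_ev", lambda e: math.log10(e * 1e15)),  # PeV -> log10(E/eV)
        "zenith_rad": ("zenith_deg", math.degrees),
    },
    "Tunka-133": {
        "time_s": ("unix_time", int),
        "lg_energy_ev": ("log10_energy_ev", float),
        "zenith_deg": ("zenith_deg", float),
    },
}


def to_common(experiment, record):
    """Map one raw record (a dict) onto CommonEvent; raises KeyError if a field is missing."""
    mapping = MAPPINGS[experiment]
    fields = {target: convert(record[source])
              for source, (target, convert) in mapping.items()}
    return CommonEvent(experiment=experiment, **fields)


k = to_common("KASCADE", {"t_unix": 1262304000, "energy_pev": 10.0, "zenith_rad": math.pi / 6})
t = to_common("Tunka-133", {"time_s": 1262304000, "lg_energy_ev": 16.0, "zenith_deg": 30.0})
print(k)
print(t)
```

Keeping the mapping as data rather than code makes each conversion easy to document and review; source fields not listed in a mapping are simply ignored.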

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

1 major / 0 minor

Summary. The manuscript describes the German-Russian Astroparticle Data Life Cycle Initiative (GRADLCI), which applies modern distributed data management technologies to establish reliable open access to cosmic-ray air-shower data collected by the KASCADE experiment (Karlsruhe) and the Tunka-133 setup of TAIGA (Tunka valley). The motivation rests on the experiments sharing latitude and having overlapping operation periods, which the authors argue makes joint analysis of data in the hundreds of TeV to hundreds of PeV range scientifically valuable for multi-messenger studies, model testing, and cross-calibration.

Significance. If the described infrastructure is successfully deployed and provides the claimed open access, the work could facilitate joint analyses that are otherwise difficult due to data volume and distribution challenges in astroparticle physics. The initiative targets real needs in high-performance processing and distributed storage. However, the manuscript supplies no implementation details, performance metrics, or validation results, limiting assessment of its technical contribution.

major comments (1)
  1. [Abstract] Abstract: the central claim that 'modern technologies of the distributed data management are being employed for establishing a reliable open access' is stated without any description of the specific technologies, architecture, data-mapping methods, or performance benchmarks. This absence makes it impossible to evaluate whether the claimed open access has been or can be achieved.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the detailed review and constructive feedback on our manuscript describing the GRADLCI initiative. We address the single major comment below.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the central claim that 'modern technologies of the distributed data management are being employed for establishing a reliable open access' is stated without any description of the specific technologies, architecture, data-mapping methods, or performance benchmarks. This absence makes it impossible to evaluate whether the claimed open access has been or can be achieved.

    Authors: We agree that the abstract would be strengthened by referencing the key technologies and approach. The body of the manuscript describes the distributed data management framework, including the specific systems and data-mapping strategies applied to the KASCADE and Tunka-133 datasets. We will revise the abstract to include a concise mention of these elements. Performance benchmarks and validation results are not yet available, as the infrastructure remains under active development; the current manuscript focuses on the design and initial deployment rather than quantitative metrics. revision: yes

Circularity Check

0 steps flagged

No significant circularity identified

Full rationale

The manuscript is a descriptive project-overview paper announcing the GRADLCI initiative and the application of distributed data-management technologies to KASCADE and Tunka-133 data. It contains no equations, derivations, fitted parameters, predictions, or uniqueness theorems. The stated motivations (overlapping runs, same latitude) are background context rather than load-bearing premises that reduce to self-citation or self-definition. No step in the text reduces by construction to its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The description rests on the domain assumption that joint analysis of the two datasets is scientifically worthwhile and that standard distributed technologies can deliver reliable open access; no free parameters or new entities are introduced.

axioms (1)
  • Domain assumption: Data from KASCADE and TAIGA can be usefully combined for joint analysis because the experiments share a latitude and have overlapping operation periods.
    Explicitly stated in the abstract as the factor determining interest in joint analysis.

pith-pipeline@v0.9.0 · 5796 in / 1096 out tokens · 25527 ms · 2026-05-25T09:18:21.029815+00:00 · methodology


Reference graph

Works this paper leans on

38 extracted references · 38 canonical work pages · 2 internal anchors

  1. [1]

    Antoni, T., et al.: The cosmic-ray experiment KASCADE. Nucl. Instrum. Methods Phys. Res. Sect. A 513(3), 490–510 (2003)

  2. [2]

    Apel, W.D., et al.: The KASCADE-Grande experiment. Nucl. Instrum. Methods Phys. Res. Sect. A 620(2), 202–216 (2010)

  3. [3]

    Apel, W.D., et al.: Kneelike structure in the spectrum of the heavy component of cosmic rays observed with KASCADE-Grande. Phys. Rev. Lett. 107(17), 171104 (2011)

  4. [4]

    Apel, W.D., et al.: A comparison of the cosmic-ray energy scales of Tunka-133 and KASCADE-Grande via their radio extensions Tunka-Rex and LOPES. Phys. Lett. B 763, 179–185 (2016)

  5. [5]

    Apel, W.D., et al.: KASCADE-Grande limits on the isotropic diffuse gamma-ray flux between 100 TeV and 1 EeV. Astrophys. J. 848(1), 1 (2017)

  6. [6]

    AstroGrid: UK’s Virtual Observatory Service. http://www.astrogrid.org

  7. [7]

    Berezhnev, S., et al.: The Tunka-133 EAS cherenkov light array: Status of 2011. Nucl. Instrum. Methods Phys. Res. Sect. A 692, 98–105 (2012)

  8. [8]

    Berghöfer, T., et al.: Towards a model for computing in European astroparticle physics. arXiv:1512.00988 [astro-ph.IM] (2015)

  9. [9]

    Berlin declaration on open access to knowledge in the sciences and humanities. https://openaccess.mpg.de/Berlin-Declaration, published: January 2015

  10. [10]

    Bezyazeekov, P.A., et al.: Measurement of cosmic-ray air showers with the Tunka Radio Extension (Tunka-Rex). Nucl. Instrum. Methods Phys. Res. Sect. A 802, 89–96 (2015)

  11. [11]

    Brun, R., Bruyant, F., Maire, M., McPherson, A.C., Zanarini, P.: GEANT 3: user’s guide Geant 3.10, Geant 3.11. Tech. rep., CERN, Geneva (1987)

  12. [12]

    Budnev, N., et al.: The TAIGA experiment: from cosmic ray to gamma-ray astronomy in the Tunka valley. J. Phys. Conf. Ser. 718(5), 052006 (2016)

  13. [13]

    Budnev, N., et al.: The Tunka-Grande experiment. J. of Instr. 12(06), C06019 (2017)

  14. [14]

    Bychkov, I., et al.: Using binary file format description languages for documenting, parsing, and verifying raw data in TAIGA experiment. In: Proceedings of the VIII International Conference “Distributed Computing and Grid-technologies in Science and Education” (GRID 2018). Dubna, Russia (2018), https://arxiv.org/abs/1812.01324

  15. [15]

    Bychkov, I., et al.: Russian-German Astroparticle Data Life Cycle Initiative. Data J. 3(4), 56 (2018)

  16. [16]

    Celery distributed task queue. http://www.celeryproject.org/

  17. [17]

    CERN Open Data. http://opendata.cern.ch

  18. [18]

    Cosmic Ray Event Simulation (CRES). https://kcdc.ikp.kit.edu/static/pdf/kcdc_mainpage/kcdc-Simulation-Manual.pdf, p. 13

  19. [19]

    COsmic Ray SImulations for KAscade (CORSIKA). https://www.ikp.kit.edu/corsika

  20. [20]

    Django web framework documentation. https://docs.djangoproject.com/en/2.2/

  21. [21]

    Euro-VO: the European Virtual Observatory (a partnership of VOs including AstroGrid, the French-VO, ESO, ESA, etc.). http://www.euro-vo.org

  22. [22]

    European Open Science Cloud (EOSC). https://ec.europa.eu/research/openscience/index.cfm?pg=open-science-cloud

  23. [23]

    European Science Cluster of Astronomy & Particle physics (ESCAPE). https://www.escape2020.eu

  24. [24]

    Gress, O., et al.: Tunka-HiSCORE – a new array for multi-TeV γ-ray astronomy and cosmic-ray physics. Nucl. Instrum. Methods Phys. Res. Sect. A 732, 290–294 (2013)

  25. [25]

    Haungs, A.: Towards a global analysis and data center in Astroparticle Physics. these proceedings (2019)

  26. [26]

    Haungs, A., et al.: Air shower measurements with the LOPES radio antenna array. Nucl. Instrum. Methods Phys. Res. Sect. A 604(1-2), S1–S8 (2009)

  27. [27]

    Haungs, A., et al.: The KASCADE Cosmic-ray Data Centre KCDC: granting open access to astroparticle physics research data. Eur. Phys. J. C 78(9), 741 (2018)

  28. [28]

    Heck, D., Schatz, G., Knapp, J., Thouw, T., Capdevielle, J.: CORSIKA: a Monte Carlo code to simulate extensive air showers. Tech. rep., Forschungszentrum Karlsruhe GmbH, Karlsruhe (1998)

  29. [29]

    Kang, D., et al.: A new release of the KASCADE cosmic ray data centre (KCDC). In: 35th International Cosmic Ray Conference (ICRC2017). p. 452. Proceedings of Science (2017)

  30. [30]

    KASCADE Cosmic-ray Data Center (KCDC). https://kcdc.ikp.kit.edu

  31. [31]

    KASCADE Reconstruction for ExTensive Airshowers (KRETA). https://kcdc.ikp.kit.edu/static/pdf/kcdc_mainpage/kcdc-Simulation-Manual.pdf, p. 13

  32. [32]

    Kostunin, D., et al.: Tunka Advanced Instrument for cosmic rays and Gamma Astronomy. In: 18th International Baikal Summer School on Physics of Elementary Particles and Astrophysics: Exploring the Universe through multiple messengers (ISAPP-Baikal 2018) Bolshie Koty, Lake Baikal, Russia, July 12-21, 2018 (2019)

  33. [33]

    NGINX documentation. https://docs.nginx.com/

  34. [34]

    RabbitMQ message-broker. https://www.rabbitmq.com/

  35. [35]

    SPASE: Space Physics Archive Search and Extract. http://www.spase-group.org

  36. [36]

    Wilkinson, M.D., et al.: The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data 3, 160018 (Mar 2016)

  37. [37]

    ZEBRA reference manual. https://cdsweb.cern.ch/record/2296399/files/zebra.pdf

  38. [38]

    Zhurov, D., et al.: First results of the tracking system calibration of the TAIGA-IACT telescope. J. Phys.: Conf. Ser. 1181, 012045 (2019)