arxiv: 2605.15437 · v1 · pith:GDRVRLJCnew · submitted 2026-05-14 · 💻 cs.DC

Open Science Data Federation -- operation and monitoring

Pith reviewed 2026-05-19 14:40 UTC · model grok-4.3

classification 💻 cs.DC
keywords Open Science Data Federation · StashCache · data access network · cyberinfrastructure · data sharing · NSF solicitations · global data federation · monitoring and accounting

The pith

The Open Science Data Federation has become an integral part of the U.S. national cyberinfrastructure landscape due to the sharing requirements of recent NSF solicitations, which the OSDF is uniquely positioned to enable.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that the Open Science Data Federation creates a global data access network by building on the StashCache project. It adds new data origins and caches, access methods, monitoring, and accounting mechanisms to support efficient data distribution and sharing. A sympathetic reader would care because extensive data processing is now common across science fields, making methods to share data with collaborators essential. The OSDF addresses these needs and meets data-sharing mandates from recent NSF solicitations, positioning it as a key element in national research infrastructure used by many collaborations and projects.

Core claim

The Open Science Data Federation builds upon the successful StashCache project to create a global data access network. The OSDF expands the StashCache project to add new data origins and caches, access methods, monitoring, and accounting mechanisms. Additionally, the OSDF has become an integral part of the U.S. national cyberinfrastructure landscape due to the sharing requirements of recent NSF solicitations, which the OSDF is uniquely positioned to enable. The OSDF continues to be utilized by many research collaborations and individual users, which pull the data to many research infrastructures and projects.

What carries the argument

The network of data origins, caches, access methods, monitoring, and accounting mechanisms that extend StashCache into the Open Science Data Federation for global sharing.
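
As a rough illustration of how origins, caches, and accounting relate, the following is a minimal toy sketch in Python. It is not the OSDF or XRootD implementation: the class names `Origin` and `Cache`, the in-memory dictionaries, and the rule that the first path component is the accounting namespace are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Origin:
    """Hypothetical authoritative store: maps object paths to their bytes."""
    objects: dict[str, bytes]

    def read(self, path: str) -> bytes:
        return self.objects[path]


@dataclass
class Cache:
    """Hypothetical cache that pulls from an origin on a miss and keeps counters."""
    origin: Origin
    store: dict[str, bytes] = field(default_factory=dict)
    hits: int = 0
    misses: int = 0
    bytes_served: dict[str, int] = field(default_factory=dict)

    def get(self, path: str) -> bytes:
        if path in self.store:
            self.hits += 1  # monitoring: served from local copy
        else:
            self.misses += 1  # monitoring: had to contact the origin
            self.store[path] = self.origin.read(path)
        data = self.store[path]
        # Accounting (assumed rule): charge bytes to the first path component.
        namespace = path.strip("/").split("/")[0]
        self.bytes_served[namespace] = self.bytes_served.get(namespace, 0) + len(data)
        return data


# Illustrative data only; the paths and sizes are made up.
origin = Origin({"/projA/file1": b"x" * 10, "/projB/file2": b"y" * 4})
cache = Cache(origin)
cache.get("/projA/file1")  # miss: fetched from the origin
cache.get("/projA/file1")  # hit: served from the cache
cache.get("/projB/file2")  # miss
print(cache.hits, cache.misses, cache.bytes_served)
```

In this toy model, repeated reads of the same object reach the origin only once, and the per-namespace byte counts stand in for the kind of usage accounting the paper describes.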

If this is right

  • Data can be distributed efficiently to processing sites across research infrastructures.
  • Methods become available to share data with collaborators in a global network.
  • Recent NSF solicitations' sharing requirements are met through the OSDF design.
  • Utilization grows among research collaborations and individual users pulling data to projects.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The added monitoring and accounting features may support improved tracking of data usage across large scientific projects.
  • Extending an existing system like StashCache could encourage similar incremental federations for other data-intensive fields.
  • Wider adoption might promote standardized access methods that reduce duplication in distributed research environments.

Load-bearing premise

The OSDF's combination of origins, caches, and accounting mechanisms is uniquely able to satisfy the data-sharing mandates in recent NSF solicitations without comparable alternatives existing.

What would settle it

Demonstration of another system that fulfills the data-sharing requirements of recent NSF solicitations with comparable efficiency, coverage, and features for distribution, monitoring, and accounting.

Figures

Figures reproduced from arXiv: 2605.15437 by Derek Weitzel, Fabio Andrijauskas, Frank Wuerthwein.

Figure 1: OSDF caches and origins worldwide. (Section 2, Background:) In order to serve data to computational workflows running on the distributed computing infrastructure, the OSG utilizes the Open Science Data Federation [10]. At the crux of this data delivery framework are the concepts of "origin," "caches," and "redirectors," all implemented as services via the XrootD [6] software framework, which allows for low latency an…
Figure 2: The Open Science Data Federation uses XrootD, and several other tools to serve data to execution
Figure 3: The monthly evolution of cache storage per project namespace shows the steady growth of the OSDF
Original abstract

Extensive data processing is becoming commonplace in many fields of science. Distributing data to processing sites and providing methods to share the data with collaborators efficiently has become essential. The Open Science Data Federation (OSDF) builds upon the successful StashCache project to create a global data access network. The OSDF expands the StashCache project to add new data origins and caches, access methods, monitoring, and accounting mechanisms. Additionally, the OSDF has become an integral part of the U.S. national cyberinfrastructure landscape due to the sharing requirements of recent NSF solicitations, which the OSDF is uniquely positioned to enable. The OSDF continues to be utilized by many research collaborations and individual users, which pull the data to many research infrastructures and projects.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The manuscript describes the Open Science Data Federation (OSDF) as an expansion of the StashCache project, detailing additions of data origins, caches, access methods, monitoring, and accounting mechanisms. It reports ongoing utilization by research collaborations and asserts that OSDF has become integral to U.S. national cyberinfrastructure because recent NSF solicitations impose data-sharing requirements that OSDF is uniquely positioned to satisfy.

Significance. A deployed global data-access network supporting scientific workflows is relevant to distributed computing and open-science infrastructure. The manuscript provides a concrete operational account of origins, caches, and accounting that could serve as a reference implementation; however, the absence of quantitative usage metrics, error analysis, or adoption statistics tied to specific NSF mandates limits the strength of the impact assessment.

major comments (1)
  1. [Abstract] The assertion that OSDF is 'uniquely positioned to enable' the data-sharing requirements of recent NSF solicitations is load-bearing for the claim of national-cyberinfrastructure integration, yet the text supplies neither an explicit mapping of solicitation language to OSDF features nor any comparison with peer systems (e.g., Globus, XRootD federations, or other cache networks).
minor comments (1)
  1. The manuscript would benefit from a concise table or diagram summarizing the added components (origins, caches, monitoring, accounting) and their interfaces.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive feedback on the manuscript. The major comment highlights an important area where the abstract's claim requires additional support to fully substantiate OSDF's role in national cyberinfrastructure. We address this point below and will incorporate revisions to strengthen the presentation.

Point-by-point responses
  1. Referee: [Abstract] The assertion that OSDF is 'uniquely positioned to enable' the data-sharing requirements of recent NSF solicitations is load-bearing for the claim of national-cyberinfrastructure integration, yet the text supplies neither an explicit mapping of solicitation language to OSDF features nor any comparison with peer systems (e.g., Globus, XRootD federations, or other cache networks).

    Authors: We agree that the abstract makes a concise but strong assertion without an accompanying explicit mapping or comparison, which weakens the supporting evidence for the national-cyberinfrastructure claim. The manuscript body describes OSDF features such as expanded data origins, caches, access methods, monitoring, and accounting, but does not directly link these to specific NSF solicitation language or differentiate from peer systems. To address this, we will revise the manuscript by expanding the introduction with a dedicated paragraph that maps key elements of recent NSF data-sharing requirements (e.g., mandates for accessible data management and collaboration support) to OSDF capabilities like its global caching and accounting infrastructure. We will also add a brief comparison subsection noting distinctions from Globus (focused on transfer services) and XRootD federations (emphasizing high-energy physics data access), highlighting OSDF's open-science orientation and StashCache heritage. These changes will be included in the revised version.

    Revision: yes

Circularity Check

0 steps flagged

No circularity detected; operational report with no derivations or self-referential reductions

The paper is a descriptive operational report on the OSDF system, its expansion from StashCache, components (origins, caches, monitoring, accounting), and usage by research collaborations. The assertion that OSDF is integral to U.S. cyberinfrastructure due to NSF solicitations and 'uniquely positioned' is presented as an empirical statement without any derivation chain, equations, predictions, or fitted parameters. No self-definitional loops, fitted inputs renamed as predictions, or load-bearing self-citations reducing to unverified prior claims by the same authors are present. The content is self-contained as a factual description of system operation and does not rely on internal reductions for its central claims.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a descriptive systems paper with no mathematical derivations, fitted parameters, or postulated entities; the central statements rest on operational claims about adoption and positioning within NSF requirements.

Reference graph

Works this paper leans on

12 extracted references · 12 canonical work pages

  1. [1] 2024. Home - Access — access-ci.org. https://access-ci.org/. [Accessed 20-Apr-2024]

  2. [2] 2024. National Research Platform. https://nationalresearchplatform.org/. [Accessed 20-Apr-2024]

  3. [3] 2024. Pelican Platform. https://pelicanplatform.org/. [Accessed 4-Jun-2024]

  4. [4] Vitor R. Coluci, Fabio Andrijauskas, and Sócrates O. Dantas. 2023. 8 - Modeling thermal conductivity with Green’s function molecular dynamics simulations. In Modeling, Characterization, and Production of Nanomaterials (Second Edition) (second edition ed.), Vinod K. Tewary and Yong Zhang (Eds.). Woodhead Publishing, 171–187. https://doi.org/10.1016/B978-0-1...

  5. [5] Ziyue Deng, Alex Sim, Kesheng Wu, Chin Guok, Damian Hazen, Inder Monga, Fabio Andrijauskas, Frank Würthwein, and Derek Weitzel. 2023. Analyzing Transatlantic Network Traffic over Scientific Data Caches. In Proceedings of the 2023 on Systems and Network Telemetry and Analytics (HPDC ’23). ACM. https://doi.org/10.1145/3589012.3594897

  6. [6] Alvise Dorigo, Peter Elmer, Fabrizio Furano, and Andrew Hanushevsky. 2005. XROOTD-A Highly scalable architecture for data access. WSEAS Transactions on Computers 1, 4.3, 348–353

  7. [7] E Fajardo, A Tadel, M Tadel, B Steer, T Martin, and F Würthwein. 2018. A federated Xrootd cache. Journal of Physics: Conference Series 1085, 3, 032025. https://doi.org/10.1088/1742-6596/1085/3/032025

  8. [8] David Schultz, Igor Sfiligoi, Benedikt Riedel, Fabio Andrijauskas, Derek Weitzel, and Frank Würthwein. 2023. IceCube experience using XRootD-based Origins with GPU workflows in PNRP. arXiv:2308.07999 [physics.comp-ph]

  9. [9] Derek Weitzel, Brian Bockelman, Duncan A. Brown, Peter Couvares, Frank Wurthwein, and Edgar Fajardo Hernandez

  10. [10] Data Access for LIGO on the OSG. In Proceedings of the Practice and Experience in Advanced Research Computing 2017 on Sustainability, Success and Impact (New Orleans, LA, USA) (PEARC17). Association for Computing Machinery, New York, NY, USA, Article 24, 6 pages. https://doi.org/10.1145/3093338.3093363

  11. [11] Derek Weitzel, Marian Zvada, Ilija Vukotic, Rob Gardner, Brian Bockelman, Mats Rynge, Edgar Fajardo Hernandez, Brian Lin, and Mátyás Selmeci. 2019. StashCache: A Distributed Caching Federation for the Open Science Grid. In Proceedings of the Practice and Experience in Advanced Research Computing on Rise of the Machines (Learning) (Chicago, IL, USA) (PEARC ’...

  12. [12] Alex Withers, Brian Bockelman, Derek Weitzel, Duncan Brown, Jeff Gaynor, Jim Basney, Todd Tannenbaum, and Zach Miller. 2018. SciTokens: Capability-Based Secure Access to Remote Scientific Data. In Proceedings of the Practice and Experience on Advanced Research Computing (Pittsburgh, PA, USA) (PEARC ’18). Association for Computing Machinery, New York, NY, USA...