Open Science Data Federation -- operation and monitoring
Pith reviewed 2026-05-19 14:40 UTC · model grok-4.3
The pith
The Open Science Data Federation has become an integral part of the U.S. national cyberinfrastructure landscape due to the sharing requirements of recent NSF solicitations, which the OSDF is uniquely positioned to enable.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The Open Science Data Federation builds upon the successful StashCache project to create a global data access network. The OSDF expands the StashCache project to add new data origins and caches, access methods, monitoring, and accounting mechanisms. Additionally, the OSDF has become an integral part of the U.S. national cyberinfrastructure landscape due to the sharing requirements of recent NSF solicitations, which the OSDF is uniquely positioned to enable. The OSDF continues to be utilized by many research collaborations and individual users, which pull the data to many research infrastructures and projects.
What carries the argument
The network of data origins, caches, access methods, monitoring, and accounting mechanisms that extend StashCache into the Open Science Data Federation for global sharing.
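As a rough illustration of how origins, caches, and accounting fit together, here is a toy in-memory model. It is a sketch under stated assumptions, not the OSDF's actual software: the `Origin` and `Cache` classes, the counters, and the `/demo/sample.dat` path are all hypothetical names invented for this example.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Origin:
    """Hypothetical authoritative store for a set of objects."""
    objects: Dict[str, bytes]

    def read(self, path: str) -> bytes:
        return self.objects[path]


@dataclass
class Cache:
    """Hypothetical cache: fetches from the origin on a miss, keeps simple accounting."""
    origin: Origin
    store: Dict[str, bytes] = field(default_factory=dict)
    hits: int = 0
    misses: int = 0
    bytes_served: int = 0

    def get(self, path: str) -> bytes:
        if path in self.store:
            self.hits += 1
        else:
            # Cache miss: pull the object from the origin and keep a copy.
            self.misses += 1
            self.store[path] = self.origin.read(path)
        data = self.store[path]
        # Accounting: count every byte handed to a client, hit or miss.
        self.bytes_served += len(data)
        return data


origin = Origin({"/demo/sample.dat": b"0123456789"})
cache = Cache(origin)
cache.get("/demo/sample.dat")  # miss: fetched from the origin
cache.get("/demo/sample.dat")  # hit: served from the cache
print(cache.hits, cache.misses, cache.bytes_served)  # prints "1 1 20"
```

The second request never reaches the origin, which is the property that lets a cache network offload origins, and the counters stand in for the kind of per-cache usage data that monitoring and accounting would collect.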
If this is right
- Data can be distributed efficiently to processing sites across research infrastructures.
- Methods become available to share data with collaborators in a global network.
- Recent NSF solicitations' sharing requirements are met through the OSDF design.
- Utilization grows among research collaborations and individual users pulling data to projects.
Where Pith is reading between the lines
- The added monitoring and accounting features may support improved tracking of data usage across large scientific projects.
- Extending an existing system like StashCache could encourage similar incremental federations for other data-intensive fields.
- Wider adoption might promote standardized access methods that reduce duplication in distributed research environments.
Load-bearing premise
The OSDF's combination of origins, caches, and accounting mechanisms is uniquely able to satisfy the data-sharing mandates in recent NSF solicitations, and no comparable alternative exists.
What would settle it
Demonstration of another system that fulfills the data-sharing requirements of recent NSF solicitations with comparable efficiency, coverage, and features for distribution, monitoring, and accounting.
Original abstract
Extensive data processing is becoming commonplace in many fields of science. Distributing data to processing sites and providing methods to share the data with collaborators efficiently has become essential. The Open Science Data Federation (OSDF) builds upon the successful StashCache project to create a global data access network. The OSDF expands the StashCache project to add new data origins and caches, access methods, monitoring, and accounting mechanisms. Additionally, the OSDF has become an integral part of the U.S. national cyberinfrastructure landscape due to the sharing requirements of recent NSF solicitations, which the OSDF is uniquely positioned to enable. The OSDF continues to be utilized by many research collaborations and individual users, which pull the data to many research infrastructures and projects.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript describes the Open Science Data Federation (OSDF) as an expansion of the StashCache project, detailing additions of data origins, caches, access methods, monitoring, and accounting mechanisms. It reports ongoing utilization by research collaborations and asserts that OSDF has become integral to U.S. national cyberinfrastructure because recent NSF solicitations impose data-sharing requirements that OSDF is uniquely positioned to satisfy.
Significance. A deployed global data-access network supporting scientific workflows is relevant to distributed computing and open-science infrastructure. The manuscript provides a concrete operational account of origins, caches, and accounting that could serve as a reference implementation; however, the absence of quantitative usage metrics, error analysis, or adoption statistics tied to specific NSF mandates limits the strength of the impact assessment.
major comments (1)
- [Abstract] Abstract: the assertion that OSDF is 'uniquely positioned to enable' the data-sharing requirements of recent NSF solicitations is load-bearing for the claim of national-cyberinfrastructure integration, yet the text supplies neither explicit mapping of solicitation language to OSDF features nor any comparison with peer systems (e.g., Globus, XRootD federations, or other cache networks).
minor comments (1)
- The manuscript would benefit from a concise table or diagram summarizing the added components (origins, caches, monitoring, accounting) and their interfaces.
Simulated Author's Rebuttal
We thank the referee for their constructive feedback on the manuscript. The major comment highlights an important area where the abstract's claim requires additional support to fully substantiate OSDF's role in national cyberinfrastructure. We address this point below and will incorporate revisions to strengthen the presentation.
- Referee: [Abstract] Abstract: the assertion that OSDF is 'uniquely positioned to enable' the data-sharing requirements of recent NSF solicitations is load-bearing for the claim of national-cyberinfrastructure integration, yet the text supplies neither explicit mapping of solicitation language to OSDF features nor any comparison with peer systems (e.g., Globus, XRootD federations, or other cache networks).
  Authors: We agree that the abstract makes a concise but strong assertion without an accompanying explicit mapping or comparison, which weakens the supporting evidence for the national-cyberinfrastructure claim. The manuscript body describes OSDF features such as expanded data origins, caches, access methods, monitoring, and accounting, but does not directly link these to specific NSF solicitation language or differentiate from peer systems. To address this, we will revise the manuscript by expanding the introduction with a dedicated paragraph that maps key elements of recent NSF data-sharing requirements (e.g., mandates for accessible data management and collaboration support) to OSDF capabilities like its global caching and accounting infrastructure. We will also add a brief comparison subsection noting distinctions from Globus (focused on transfer services) and XRootD federations (emphasizing high-energy physics data access), highlighting OSDF's open-science orientation and StashCache heritage. These changes will be included in the revised version.
  Revision: yes
Circularity Check
No circularity detected; operational report with no derivations or self-referential reductions
The paper is a descriptive operational report on the OSDF system, its expansion from StashCache, components (origins, caches, monitoring, accounting), and usage by research collaborations. The assertion that OSDF is integral to U.S. cyberinfrastructure due to NSF solicitations and 'uniquely positioned' is presented as an empirical statement without any derivation chain, equations, predictions, or fitted parameters. No self-definitional loops, fitted inputs renamed as predictions, or load-bearing self-citations reducing to unverified prior claims by the same authors are present. The content is self-contained as a factual description of system operation and does not rely on internal reductions for its central claims.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel [unclear]
  unclear: Relation between the paper passage and the cited Recognition theorem.
  Passage: "The Open Science Data Federation (OSDF) builds upon the successful StashCache project to create a global data access network... origins, caches, access methods, monitoring, and accounting mechanisms."
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, reality_from_one_distinction [unclear]
  unclear: Relation between the paper passage and the cited Recognition theorem.
  Passage: "The OSDF has become an integral part of the U.S. national cyberinfrastructure landscape due to the sharing requirements of recent NSF solicitations"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.