arxiv: 2605.15378 · v1 · pith:4NGIECU2new · submitted 2026-05-14 · 💻 cs.DC · astro-ph.IM

Using the Open Science Data Federation for data distribution: Big Bear Solar Observatory use case

Pith reviewed 2026-05-19 15:19 UTC · model grok-4.3

classification 💻 cs.DC astro-ph.IM
keywords Open Science Data Federation · Big Bear Solar Observatory · data distribution · data caching · image processing pipeline · scientific data access · cyber-infrastructure

The pith

Integrating Big Bear Solar Observatory data into the Open Science Data Federation provides standard, reliable access and enables image processing pipelines that can run anywhere in the world.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that placing BBSO solar data into the Open Science Data Federation creates a dependable route for researchers to reach large datasets from any location. A network of caches keeps copies of the data close to processing sites around the world, which cuts down on transfer delays. Once this access layer is in place, the authors were able to build a pipeline that runs image processing on every BBSO image no matter where the computation occurs. Readers in data-heavy fields would care because the same pattern could remove a common bottleneck when moving and analyzing big scientific collections.

Core claim

The central claim is that integrating the Big Bear Solar Observatory data into the Open Science Data Federation provided standard and reliable data access. The OSDF caches deliver local copies of the data worldwide. This integration made it possible to create a pipeline that applies image processing techniques to all images from BBSO from any point on the planet.

What carries the argument

The argument rests on the Open Science Data Federation (OSDF), a global network that expands StashCache to twenty data origins and thirty caches and adds new access methods and monitoring tools. The OSDF carries the data from BBSO origins to local caches for worldwide use.
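
As a minimal sketch of the origin-to-cache path described above, the toy model below shows a read-through cache: the first request for a file goes to the origin, and later requests are served from the local copy. The `Origin` and `Cache` classes and the file path are hypothetical illustrations, not OSDF software or an actual BBSO path.

```python
from dataclasses import dataclass, field


@dataclass
class Origin:
    """Hypothetical stand-in for a data origin holding the authoritative files."""
    files: dict
    reads: int = 0

    def read(self, path: str) -> bytes:
        self.reads += 1
        return self.files[path]


@dataclass
class Cache:
    """Hypothetical stand-in for a regional cache placed in front of an origin."""
    origin: Origin
    store: dict = field(default_factory=dict)
    hits: int = 0
    misses: int = 0

    def get(self, path: str) -> bytes:
        # Serve the local copy when one exists.
        if path in self.store:
            self.hits += 1
            return self.store[path]
        # Otherwise read from the origin once and keep a copy for later requests.
        self.misses += 1
        data = self.origin.read(path)
        self.store[path] = data
        return data


# Illustrative path only; it does not refer to a real BBSO file.
example_path = "/hypothetical/bbso/image_0001"
origin = Origin(files={example_path: b"pixels"})
cache = Cache(origin)

first = cache.get(example_path)   # miss: fetched from the origin
second = cache.get(example_path)  # hit: served from the cache
```

In this model, two reads of the same file cost a single origin read, which is the effect that lets processing sites far from the observatory work from nearby copies.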

If this is right

  • Researchers obtain standard and reliable access to BBSO data through the OSDF.
  • OSDF caches place local copies of the data at sites worldwide.
  • An image-processing pipeline for all BBSO images can run from any global location.
  • The OSDF meets sharing requirements in recent NSF solicitations for national cyber-infrastructure.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same integration pattern could be tested with data from other solar or astronomical observatories to check whether comparable pipelines become feasible.
  • Wider adoption might reduce the need for each research group to maintain its own data mirrors for large image collections.
  • Performance under higher data volumes from future instruments could be measured to see where the current cache network reaches limits.

Load-bearing premise

The OSDF infrastructure of origins, caches, and access methods will deliver reliable performance and local availability for the volume of BBSO data without failures or bottlenecks.

What would settle it

A measurement showing that data retrieval from remote locations still incurs long delays or that the global image-processing pipeline fails to finish when it relies on OSDF caches for BBSO data would falsify the claim.
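
One way such a measurement could be summarized, assuming per-transfer records are available from monitoring, is sketched below. The record fields (`cache_hit`, `seconds`, `bytes`) and the sample values are assumptions made for illustration; they are not OSDF log fields or measured BBSO results.

```python
from statistics import median


def summarize_transfers(records):
    """Summarize transfer records given as dicts with the hypothetical keys
    'cache_hit' (bool), 'seconds' (float), and 'bytes' (int)."""
    records = list(records)
    if not records:
        raise ValueError("no transfer records to summarize")
    total_seconds = sum(r["seconds"] for r in records)
    if total_seconds <= 0:
        raise ValueError("total transfer time must be positive")
    hits = sum(1 for r in records if r["cache_hit"])
    total_bytes = sum(r["bytes"] for r in records)
    return {
        "cache_hit_rate": hits / len(records),
        "median_seconds": median(r["seconds"] for r in records),
        "throughput_bytes_per_s": total_bytes / total_seconds,
    }


# Made-up sample: one cold read followed by three reads served from a cache.
sample = [
    {"cache_hit": False, "seconds": 4.0, "bytes": 8_000_000},
    {"cache_hit": True, "seconds": 1.0, "bytes": 8_000_000},
    {"cache_hit": True, "seconds": 1.0, "bytes": 8_000_000},
    {"cache_hit": True, "seconds": 2.0, "bytes": 8_000_000},
]
summary = summarize_transfers(sample)
```

A low hit rate or a median transfer time close to the cold-read time for remote sites would point toward the failure case described above.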

Figures

Figures reproduced from arXiv: 2605.15378 by Alexsandra Guadarrama, Fabio Andrijauskas, Sydney Montiel.

Figure 1: Map featuring the locations of current OSDF architectural components (https://osg…).
Figure 2: Data flowchart of the gathered images from the BBSO to perform processing.
Figure 3: Sun image processed by the diffusion filter with a manually detached filament [3].
Original abstract

The growing demand for extensive data processing is now a standard in many scientific fields. Efficiently distributing data to processing sites and enabling seamless sharing has become crucial. The Open Science Data Federation (OSDF) builds on the success of the StashCache project to establish a global data distribution network. By expanding StashCache, OSDF integrates additional data origins and caches, enhancing accessibility and performance (20 origins and 30 caches), new access methods, and monitoring and accounting mechanisms. Additionally, the OSDF has become essential to the US national cyber-infrastructure landscape due to the sharing requirements of recent NSF solicitations. One use case for the OSDF is the data access to the Big Bear Solar Observatory (BBSO). Integrating the BBSO data into the OSDF provided standard and reliable data access. Moreover, the OSDF caches provide local data worldwide. Using the OSDF and the BBSO data, creating a pipeline to apply image processing techniques to all images from BBSO anywhere on the planet was possible.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper describes the Open Science Data Federation (OSDF) as an expansion of StashCache into a global data distribution network incorporating 20 origins and 30 caches along with new access methods and monitoring/accounting mechanisms. It presents the integration of Big Bear Solar Observatory (BBSO) data as a use case, asserting that this integration delivers standard and reliable data access, that OSDF caches enable local data availability worldwide, and that the combination permits creation of a pipeline to apply image processing techniques to all BBSO images from any location on the planet.

Significance. If the integration and pipeline function as described, the manuscript provides a practical demonstration of federated data infrastructure supporting distributed scientific workflows in solar physics. It illustrates how OSDF can meet NSF-mandated data-sharing requirements and enable global accessibility for observatory datasets, offering a reusable template for similar integrations in other domains.

major comments (1)
  1. [BBSO use-case section] The central claims that integrating BBSO data into OSDF 'provided standard and reliable data access' and enabled 'a pipeline to apply image processing techniques to all images from BBSO anywhere on the planet' are load-bearing for the use-case narrative but rest on descriptive assertion alone. No quantitative metrics (e.g., cache-hit rates, transfer latencies, error logs, throughput measurements, or before/after comparisons) are supplied to substantiate reliability or worldwide locality at BBSO data volumes.
minor comments (2)
  1. [Abstract] The abstract states '20 origins and 30 caches' without indicating whether these figures are current, projected, or measured at the time of writing; the manuscript should clarify the exact status and any growth trajectory.
  2. [Pipeline description] The description of the image-processing pipeline is high-level; adding a brief schematic or pseudocode of the pipeline steps would improve reproducibility without altering the core narrative.
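
A pseudocode-level sketch of the kind the referee asks for might look like the following. It is an assumption-laden illustration rather than the authors' pipeline: `fetch` is a hypothetical placeholder for an OSDF client call, and a plain linear diffusion step stands in for the diffusion filter, whose exact form is not described here.

```python
def diffuse(image, steps=1, rate=0.2):
    """Explicit linear (heat-equation) diffusion on a 2D list of floats.

    Borders are replicated, and rate should stay at or below 0.25 for stability.
    This is a simple stand-in, not the filter used in the paper.
    """
    rows, cols = len(image), len(image[0])
    current = [row[:] for row in image]  # copy so the input is left untouched
    for _ in range(steps):
        nxt = [row[:] for row in current]
        for r in range(rows):
            for c in range(cols):
                up = current[max(r - 1, 0)][c]
                down = current[min(r + 1, rows - 1)][c]
                left = current[r][max(c - 1, 0)]
                right = current[r][min(c + 1, cols - 1)]
                centre = current[r][c]
                nxt[r][c] = centre + rate * (up + down + left + right - 4 * centre)
        current = nxt
    return current


def run_pipeline(paths, fetch, process):
    """Fetch each image by path, process it, and return results keyed by path."""
    return {path: process(fetch(path)) for path in paths}


# In-memory stand-in for remote storage; the path is illustrative only.
fake_store = {
    "/hypothetical/bbso/image_0001": [
        [0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0],
    ],
}
results = run_pipeline(fake_store, fetch=lambda path: fake_store[path], process=diffuse)
smoothed = results["/hypothetical/bbso/image_0001"]
```

Passing `fetch` in as a parameter keeps the processing step independent of where the data lives, which mirrors the claim that the pipeline can run from any location once access goes through the caches.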

Simulated Author's Rebuttal

1 response · 0 unresolved

Thank you for reviewing our manuscript. We appreciate the positive assessment of the significance of the work and the specific feedback on the BBSO use-case section. Below we address the major comment.

Point-by-point responses
  1. Referee: [BBSO use-case section] The central claims that integrating BBSO data into OSDF 'provided standard and reliable data access' and enabled 'a pipeline to apply image processing techniques to all images from BBSO anywhere on the planet' are load-bearing for the use-case narrative but rest on descriptive assertion alone. No quantitative metrics (e.g., cache-hit rates, transfer latencies, error logs, throughput measurements, or before/after comparisons) are supplied to substantiate reliability or worldwide locality at BBSO data volumes.

    Authors: We concur that the claims about standard and reliable data access and the global pipeline are central to the use case and would be better supported by quantitative evidence. The manuscript as submitted emphasizes the architectural integration and the conceptual enablement of worldwide processing rather than empirical performance data. To address this concern, we will revise the manuscript to include quantitative metrics drawn from the OSDF monitoring and accounting mechanisms, such as cache hit rates and transfer statistics for BBSO data where available. We will also provide before-and-after context based on prior access methods if such information can be obtained. This will be added as a dedicated paragraph or subsection in the use-case section.

    Revision: yes

Circularity Check

0 steps flagged

No circularity in descriptive infrastructure use-case report

The manuscript is a high-level narrative describing the integration of BBSO data into the existing OSDF infrastructure and the resulting ability to run image-processing pipelines. It contains no equations, fitted parameters, derivations, or self-citations that could form a load-bearing chain. All claims are presented as direct outcomes of the described integration steps rather than results obtained by reducing prior self-referential inputs to themselves. This is a standard non-circular use-case report whose central assertions rest on external infrastructure behavior rather than internal definitional or fitted loops.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The report rests on the prior success and availability of the OSDF network without introducing new free parameters, axioms beyond standard infrastructure assumptions, or invented entities.

axioms (1)
  • Domain assumption: OSDF provides reliable global data distribution and local caching for observatory data.
    Invoked when stating that the integration provided standard, reliable access and local data worldwide.

pith-pipeline@v0.9.0 · 5715 in / 1127 out tokens · 48778 ms · 2026-05-19T15:19:21.609535+00:00 · methodology

Reference graph

Works this paper leans on

13 extracted references · 13 canonical work pages

  1. [1] Weitzel, Derek; Zvada, Marian; Vukotic, Ilija; Gardner, Rob; Bockelman, Brian; Rynge, Mats; Hernandez, Edgar Fajardo; Lin, Brian; Selmeci, M… StashCache: A Distributed Caching Federation for the Open Science Grid. doi:10.1145/3332186.3332212

  2. [2] Weitzel, Derek; Bockelman, Brian; Brown, Duncan A.; Couvares, Peter; W… Data Access for LIGO on the OSG. Proceedings of the Practice and Experience in Advanced Research Computing 2017 on Sustainability, Success and Impact. doi:10.1145/3093338.3093363

  3. [4] IceCube experience using XRootD-based Origins with GPU workflows in PNRP. 2023.

  4. [5] The LIGO-Virgo-KAGRA Computing Infrastructure for Gravitational-wave Research. 2023.

  5. [6] E Fajardo; A Tadel; M Tadel; B Steer; T Martin; F Würthwein. Journal of Physics: Conference Series, 2018. doi:10.1088/1742-6596/1085/3/032025

  6. [7] XROOTD-A Highly scalable architecture for data access. WSEAS Transactions on Computers.

  7. [8] OSDF Shoveler. 2024.

  8. [9] XRootD. 2024.

  9. [10] Vitor R. Coluci; Fabio Andrijauskas; Sócrates O. Dantas. 8 - Modeling thermal conductivity with Green’s function molecular dynamics simulations. Modeling, Characterization, and Production of Nanomaterials (Second Edition), 2023. doi:10.1016/B978-0-12-819905-3.00008-7

  10. [11] 2024.

  11. [12] Withers, Alex; Bockelman, Brian; Weitzel, Derek; Brown, Duncan; Gaynor, Jeff; Basney, Jim; Tannenbaum, Todd; Miller, Zach. Proceedings of the Practice and Experience on Advanced Research Computing, 2018. doi:10.1145/3219104.3219135

  12. [13] Andrijauskas, Fabio; Gradvohl, Andr… Solar filaments detection using parallel programming in hybrid architectures. Proceedings of the 2012 Workshop on High-Performance Computing for Astronomy Date. doi:10.1145/2286976.2286987

  13. [14] Deng, Ziyue; Sim, Alex; Wu, Kesheng; Guok, Chin; Hazen, Damian; Monga, Inder; Andrijauskas, Fabio; Würthwein, Frank; Weitzel, Derek. Analyzing Transatlantic Network Traffic over Scientific Data Caches. doi:10.1145/3589012.3594897