arXiv: 2605.20132 · v1 · pith:JXRUEE6Qnew · submitted 2026-05-19 · ⚛️ physics.geo-ph · cs.LG · eess.SP

FiLark: a streaming-first software framework for end-to-end exploration, annotation, and algorithm integration in distributed acoustic sensing

Pith reviewed 2026-05-20 02:36 UTC · model grok-4.3

classification ⚛️ physics.geo-ph · cs.LG · eess.SP
keywords distributed acoustic sensing · streaming data framework · interactive visualization · event annotation · signal processing · real-time monitoring · Python software · constant memory processing

The pith

A streaming-first framework lets DAS workflows move from interactive exploration to production pipelines without changes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces FiLark, a Python framework that treats continuous DAS data sources as unified streams instead of manually selected segments. It builds data access, signal processing, visualization, and monitoring around this single abstraction. An OpenGL ring-buffer renderer supports interactive browsing of arbitrarily long recordings while using constant memory. An annotation interface allows event labeling directly inside the stream to produce machine-learning datasets without offline steps. Stateful chunked execution in the processing library keeps continuity across boundaries, and a monitor interface brings detectors into the same workflow.

Core claim

FiLark presents any DAS source as a unified stream and applies that streaming abstraction uniformly. The paper claims this enables four things: an OpenGL-based ring-buffer renderer for constant-memory browsing of long recordings; direct in-stream event annotation for reproducible labeled datasets; a signal-processing library with temporal, spatial, spectral, and decomposition operators, in CPU and GPU implementations, that uses stateful chunked execution to preserve continuity; and a monitor interface that integrates streaming detectors into visualization. As a result, configurations developed interactively are said to transfer directly to scalable production pipelines without modification.

What carries the argument

The unified streaming abstraction that presents all DAS sources as continuous streams and supports stateful chunked execution to maintain processing continuity across segment boundaries.
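
As a rough illustration of what carrying state across chunk boundaries can mean, the sketch below filters a synthetic multichannel record with a causal Butterworth bandpass, once in a single call and once chunk by chunk. It assumes SciPy's second-order-section filters; the class `StatefulBandpass`, the 200 Hz sampling rate, and the chunk edges are hypothetical choices for this example and are not taken from FiLark's API. The 15–32 Hz band mirrors the one named in the Figure 6 caption.

```python
import numpy as np
from scipy.signal import butter, sosfilt


class StatefulBandpass:
    """Causal Butterworth bandpass whose delay state survives across chunks.

    Hypothetical helper for illustration; not part of FiLark's API.
    """

    def __init__(self, low_hz, high_hz, fs_hz, n_channels, order=4):
        self.sos = butter(order, [low_hz, high_hz], btype="bandpass",
                          fs=fs_hz, output="sos")
        # One delay state per second-order section and per channel,
        # initialised to rest (all zeros).
        self.zi = np.zeros((self.sos.shape[0], n_channels, 2))

    def __call__(self, chunk):
        """Filter a (n_channels, n_samples) chunk along the time axis."""
        out, self.zi = sosfilt(self.sos, chunk, axis=-1, zi=self.zi)
        return out


rng = np.random.default_rng(0)
fs_hz, n_channels, n_samples = 200.0, 8, 4000  # assumed values
data = rng.standard_normal((n_channels, n_samples))

# Reference: filter the whole record in one call.
sos = butter(4, [15.0, 32.0], btype="bandpass", fs=fs_hz, output="sos")
full = sosfilt(sos, data, axis=-1)

# Streaming: unequal chunks, with filter state carried between them.
edges = [0, 700, 1500, 1501, 3200, n_samples]
spans = list(zip(edges[:-1], edges[1:]))
bp = StatefulBandpass(15.0, 32.0, fs_hz, n_channels)
chunked = np.concatenate([bp(data[:, a:b]) for a, b in spans], axis=-1)

# Same filter without state: every chunk restarts from rest.
stateless = np.concatenate(
    [sosfilt(sos, data[:, a:b], axis=-1) for a, b in spans], axis=-1)

max_err_stateful = float(np.max(np.abs(chunked - full)))
max_err_stateless = float(np.max(np.abs(stateless - full)))
```

With state carried over, the chunked output should agree with the single-call output to within floating-point rounding, while the stateless variant shows transients after each boundary. This only covers a causal IIR filter; it says nothing about operators that need future samples.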

If this is right

  • Processing configurations and workflows developed interactively transfer directly to scalable production pipelines without modification.
  • Interactive browsing and visualization of arbitrarily long recordings is possible with constant memory usage.
  • Event labeling directly within continuous data streams creates reproducible machine-learning-ready labeled datasets without offline preprocessing.
  • Streaming detectors and learning-based models integrate into the visualization workflow through the standardized monitor interface.
  • Temporal, spatial, spectral, and decomposition operators maintain continuity via stateful chunked execution on both CPU and GPU paths.
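
The constant-memory claim rests on a ring buffer: a fixed-size window that overwrites its oldest samples as new ones arrive. The sketch below shows the idea on the CPU with NumPy. It is an assumption-laden stand-in for the paper's OpenGL texture mechanism, and the class name `RingBuffer` and its methods are hypothetical.

```python
import numpy as np


class RingBuffer:
    """Fixed-size (n_channels, capacity) window over an unbounded stream.

    Hypothetical CPU-side sketch; FiLark's renderer uses a GPU texture.
    """

    def __init__(self, n_channels, capacity, dtype=np.float32):
        self.buf = np.zeros((n_channels, capacity), dtype=dtype)
        self.capacity = capacity
        self.total = 0  # number of samples seen so far

    def write(self, chunk):
        """Append a (n_channels, n) chunk, overwriting the oldest samples."""
        n = chunk.shape[1]
        if n >= self.capacity:
            # Only the newest `capacity` samples can survive.
            skipped = n - self.capacity
            chunk = chunk[:, skipped:]
            self.total += skipped
            n = self.capacity
        start = self.total % self.capacity
        first = min(n, self.capacity - start)
        self.buf[:, start:start + first] = chunk[:, :first]
        self.buf[:, :n - first] = chunk[:, first:]  # wrap-around part
        self.total += n

    def view(self):
        """Return the retained samples in chronological order (a copy)."""
        if self.total < self.capacity:
            return self.buf[:, :self.total].copy()
        head = self.total % self.capacity  # position of the oldest sample
        return np.concatenate([self.buf[:, head:], self.buf[:, :head]], axis=1)


# Stream 37 samples through a 10-sample window in chunks of 6.
stream = np.arange(4 * 37, dtype=np.float32).reshape(4, 37)
rb = RingBuffer(n_channels=4, capacity=10)
for a in range(0, stream.shape[1], 6):
    rb.write(stream[:, a:a + 6])
latest = rb.view()  # the last 10 samples of the stream
```

The allocation never grows beyond `capacity` columns, regardless of how long the stream runs, which is the property the bullet on constant memory usage depends on.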

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same streaming design could shorten the time between prototyping a DAS pipeline and running it at field scale.
  • Constant-memory browsing may let analysts work with multi-hour recordings on ordinary laptops rather than high-memory servers.
  • The approach could be tested on other continuous high-channel sensor streams such as large-scale seismic or ocean-bottom arrays.
  • Integration with existing PyTorch models may allow rapid iteration between annotation and model retraining inside the same interface.

Load-bearing premise

That stateful chunked execution preserves processing continuity and application semantics across segment boundaries for all relevant DAS tasks including those with long-range spatial correlations or spectral decompositions.

What would settle it

Apply a long-range spatial correlation operator to a continuous DAS recording once as a single stream and once as successive chunks, then compare the outputs for exact matches at every boundary point.
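
A test of this kind could be sketched as follows, taking zero-lag correlation between all channel pairs as one possible example of a long-range spatial operator. The accumulator class, the synthetic data, and the chunk length are all assumptions made for illustration. Because floating-point summation order differs between the two paths, the comparison uses a small tolerance rather than bit-exact equality.

```python
import numpy as np


class StreamingChannelCorrelation:
    """Accumulates zero-lag correlation between all channel pairs.

    Hypothetical example operator; not taken from FiLark.
    """

    def __init__(self, n_channels):
        self.n = 0
        self.sum_x = np.zeros(n_channels)
        self.sum_xx = np.zeros((n_channels, n_channels))

    def update(self, chunk):
        """Fold a (n_channels, n_samples) chunk into the running sums."""
        self.n += chunk.shape[1]
        self.sum_x += chunk.sum(axis=1)
        self.sum_xx += chunk @ chunk.T

    def result(self):
        """Correlation matrix over everything seen so far."""
        mean = self.sum_x / self.n
        # Sum-of-products form; adequate here, but it can lose precision
        # when channel means are large relative to their variance.
        cov = self.sum_xx / self.n - np.outer(mean, mean)
        std = np.sqrt(np.diag(cov))
        return cov / np.outer(std, std)


rng = np.random.default_rng(1)
n_channels, n_samples = 16, 5000
common = rng.standard_normal(n_samples)  # signal shared by all channels
data = rng.standard_normal((n_channels, n_samples)) + 0.5 * common

# Single-stream reference.
full_corr = np.corrcoef(data)

# Successive chunks of 333 samples (the last one is shorter).
acc = StreamingChannelCorrelation(n_channels)
for a in range(0, n_samples, 333):
    acc.update(data[:, a:a + 333])
chunked_corr = acc.result()

max_abs_diff = float(np.max(np.abs(chunked_corr - full_corr)))
```

An operator that can be written as running sums passes this check by construction. The harder cases are operators that are non-local in time, where chunked output depends on the overlap strategy.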

Figures

Figures reproduced from arXiv: 2605.20132 by Jintao Li, Kai Tong, Weichang Li, Xaingyu Guo.

Figure 1: Layered architecture of FiLark. The framework is organized into three user-facing phases supported by two …
Figure 2: Ring-buffer GPU texture mechanism. The fixed-size GPU texture is divided into equal-width time slots …
Figure 3: The FiLark annotation interface applied to a DAS recording. Bounding boxes, polylines, and masks are drawn …
Figure 4: Region-of-interest (ROI) analysis window in FiLark. A segment selected on the streaming canvas is …
Figure 5: Streaming renderer throughput as a function of channel count
Figure 6: OOI ocean-cable DAS data (Wilcock et al., 2023) processed with FiLark’s GPU-accelerated signal processing backend. (a) Raw DAS data showing strong low-frequency hydrodynamic noise. (b) Full FFT bandpass filtering (15–32 Hz), which suppresses the noise floor and reveals seismic and acoustic arrivals. (c) Causal stateful IIR bandpass filtering (15–32 Hz) applied chunk-by-chunk with filter state preserved ac…

Original abstract

Distributed acoustic sensing (DAS) systems generate continuous, ultra-high-channel-count data streams at rates that exceed the capabilities of conventional batch-oriented analysis frameworks. As a result, essential tasks such as interactive exploration of long-duration recordings, scalable event annotation, and real-time algorithm-in-the-loop monitoring remain inadequately supported by workflows built around manually selected data segments and offline processing. This paper presents FiLark (Fiber Lark), a Python framework that applies a streaming-first principle uniformly across data access, signal processing, visualization, and monitoring for DAS. Instead of operating on manually selected data segments, FiLark presents any DAS source, including continuous multi-file recordings, as a unified stream and builds all system components around that abstraction. An OpenGL-based ring-buffer renderer enables interactive browsing and visualization of arbitrarily long recordings with constant memory usage. An integrated annotation interface supports event labeling directly within continuous data streams, facilitating the creation of reproducible machine-learning-ready labeled datasets without offline preprocessing. The signal processing library includes temporal, spatial, spectral, and decomposition-based operators, with both CPU implementations and GPU-accelerated variants via PyTorch, alongside stateful chunked execution that preserves processing continuity and application semantics across segment boundaries. A standardized monitor interface further integrates streaming detectors and learning-based models into the visualization workflow. By sharing a common streaming abstraction across all layers, FiLark allows processing configurations and workflows developed interactively to transfer directly to scalable production pipelines without modification.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper introduces FiLark, a Python framework for distributed acoustic sensing (DAS) that adopts a streaming-first design across data access, signal processing, visualization, annotation, and monitoring. It presents continuous DAS sources as unified streams, uses an OpenGL ring-buffer for constant-memory interactive browsing of long recordings, supports in-stream event annotation for ML datasets, provides CPU/GPU signal processing operators with stateful chunked execution, and includes a monitor interface for detectors and models. The central claim is that a shared streaming abstraction enables interactive workflows to transfer directly to scalable production pipelines without modification while preserving processing continuity across chunks.

Significance. If the implementation details and verification hold, FiLark could meaningfully advance DAS analysis by unifying interactive exploration and large-scale processing under one abstraction, reducing the need for manual segment selection and offline preprocessing. The emphasis on reproducible annotation and seamless interactive-to-production transfer addresses practical bottlenecks in high-channel-count geophysical data workflows.

major comments (1)
  1. [Abstract] Abstract (signal processing library description): the assertion that 'stateful chunked execution ... preserves processing continuity and application semantics across segment boundaries' is load-bearing for the central claim of unmodified workflow transfer and constant-memory operation. The manuscript supplies no explicit overlap strategy, internal state carry-over mechanism (e.g., filter histories or phase buffers), or quantitative equivalence tests comparing chunked versus full-batch results for spectral decompositions or operators with long-range spatial correlations.
minor comments (1)
  1. The abstract and description would benefit from a brief table or diagram summarizing the supported operators and their chunked vs. batch equivalence guarantees.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their thoughtful and constructive review. The feedback on the abstract's claim regarding stateful chunked execution is well taken and directly addresses a load-bearing aspect of the central contribution. We address this point below and will revise the manuscript to provide the requested details.

Point-by-point responses
  1. Referee: [Abstract] Abstract (signal processing library description): the assertion that 'stateful chunked execution ... preserves processing continuity and application semantics across segment boundaries' is load-bearing for the central claim of unmodified workflow transfer and constant-memory operation. The manuscript supplies no explicit overlap strategy, internal state carry-over mechanism (e.g., filter histories or phase buffers), or quantitative equivalence tests comparing chunked versus full-batch results for spectral decompositions or operators with long-range spatial correlations.

    Authors: We agree that the manuscript does not currently supply explicit implementation details on overlap strategies, state carry-over mechanisms, or quantitative equivalence tests in the abstract or main text. The framework implements stateful operators as persistent Python objects (or PyTorch modules) that retain internal buffers such as FIR filter histories and phase accumulators across successive chunks; an overlap-add scheme with configurable overlap length (typically 50% for spectral operators) is used to ensure continuity for operators with long-range dependencies. In the revised version we will expand the signal-processing section with a dedicated subsection describing these mechanisms, including pseudocode for state propagation and an overlap strategy table. We will also add quantitative equivalence results (L2-norm differences and spectral coherence metrics) comparing chunked versus full-batch execution for representative operators such as short-time Fourier transforms and spatial wavenumber filters, to be included either in the main text or as supplementary material. revision: yes
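
The simulated rebuttal mentions FIR filter histories and L2-norm differences between chunked and full-batch results. A minimal sketch of both, under stated assumptions, is given below. It uses a saved-tail (overlap-save style) scheme rather than the overlap-add scheme the rebuttal names, because that is the simplest way to carry FIR history; the class `StreamingFIR`, the Hann-window taps, and the chunk length are hypothetical.

```python
import numpy as np


class StreamingFIR:
    """Causal FIR filter that keeps the last len(taps) - 1 input samples.

    Hypothetical helper; a saved-tail scheme, not FiLark's implementation.
    """

    def __init__(self, taps, n_channels):
        self.taps = np.asarray(taps, dtype=float)
        self.history = np.zeros((n_channels, len(self.taps) - 1))

    def __call__(self, chunk):
        """Filter a (n_channels, n_samples) chunk along the time axis."""
        if chunk.shape[1] == 0:
            return np.zeros_like(chunk, dtype=float)
        # Prepend the saved tail so the first outputs see real past samples.
        padded = np.concatenate([self.history, chunk], axis=1)
        hist_len = self.history.shape[1]
        self.history = padded[:, padded.shape[1] - hist_len:]
        # 'valid' yields exactly one output per new input sample.
        return np.stack(
            [np.convolve(row, self.taps, mode="valid") for row in padded])


rng = np.random.default_rng(2)
n_channels, n_samples = 6, 3000
data = rng.standard_normal((n_channels, n_samples))

taps = np.hanning(31)
taps = taps / taps.sum()  # simple smoothing filter, assumed for the demo

# Full-batch reference: causal convolution from rest, truncated to input length.
full = np.stack(
    [np.convolve(row, taps, mode="full")[:n_samples] for row in data])

# Chunked execution with history carried between calls.
fir = StreamingFIR(taps, n_channels)
chunked = np.concatenate(
    [fir(data[:, a:a + 257]) for a in range(0, n_samples, 257)], axis=1)

# Relative L2-norm difference between the two paths.
rel_l2 = float(np.linalg.norm(chunked - full) / np.linalg.norm(full))
```

For this filter the relative difference should sit at rounding level. Whether the same holds for short-time Fourier transforms or wavenumber filters depends on details the page does not give.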

Circularity Check

0 steps flagged

No circularity: software framework description with no derivations or self-referential reductions

Full rationale

The manuscript describes a Python software framework (FiLark) built around a streaming abstraction for DAS data. All central claims concern implementation features such as ring-buffer visualization, annotation interfaces, signal-processing operators, and stateful chunked execution. No equations, fitted parameters, predictions, or first-principles derivations appear. The text simply states that the library 'includes ... stateful chunked execution that preserves processing continuity' as a design property of the provided code, without reducing any result to itself by construction or via self-citation chains. The contribution is therefore self-contained as an engineering artifact rather than a mathematical argument that collapses into its inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The contribution is a software framework rather than a theoretical derivation. It relies on standard assumptions about continuous data availability and the correctness of underlying libraries (PyTorch, OpenGL) but introduces no new physical constants, fitted parameters, or postulated entities.

axioms (1)
  • Domain assumption: DAS data sources can be presented as continuous unified streams without loss of essential temporal or spatial information.
    Invoked when the framework treats multi-file recordings as a single stream for all downstream components.
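
One way to picture this assumption is a generator that hides file boundaries and hands downstream components fixed-length chunks. The sketch below is hypothetical: in-memory arrays stand in for files, and the function name `unified_stream` and its parameters are invented for illustration. It also assumes the segments are contiguous in time with no gaps, which real multi-file recordings may not guarantee.

```python
from typing import Iterable, Iterator

import numpy as np


def unified_stream(segments: Iterable[np.ndarray],
                   chunk_len: int) -> Iterator[np.ndarray]:
    """Yield (n_channels, chunk_len) chunks that may span segment boundaries.

    Hypothetical sketch; each segment stands in for one recording file.
    """
    pending = []      # pieces waiting to fill the next chunk
    pending_len = 0
    for seg in segments:
        pos = 0
        while pos < seg.shape[1]:
            take = min(chunk_len - pending_len, seg.shape[1] - pos)
            pending.append(seg[:, pos:pos + take])
            pending_len += take
            pos += take
            if pending_len == chunk_len:
                yield np.concatenate(pending, axis=1)
                pending, pending_len = [], 0
    if pending_len:
        yield np.concatenate(pending, axis=1)  # final, shorter chunk


# Four "files" of unequal length, 3 channels each, 29 samples in total.
record = np.arange(3 * 29).reshape(3, 29)
bounds = [0, 5, 17, 20, 29]
files = [record[:, a:b] for a, b in zip(bounds[:-1], bounds[1:])]

chunks = list(unified_stream(files, chunk_len=8))
chunk_lengths = [c.shape[1] for c in chunks]
```

Because `segments` can be any iterable, it could be a generator that loads one file at a time, so at most one chunk of pending samples plus the current segment is held in memory.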

pith-pipeline@v0.9.0 · 5811 in / 1293 out tokens · 51075 ms · 2026-05-20T02:36:04.361098+00:00 · methodology

Reference graph

Works this paper leans on

12 extracted references · 12 canonical work pages

  1. [1]

    Seismica 3, 10–26443

    DASCore: A Python library for distributed fiber optic sensing. Seismica 3, 10–26443. doi:10.26443/seismica.v3i2.1184. Hu, M., Li, Z.,

  2. [2]

    Seismological Research Letters 95, 3055–3066

    DASPy: A Python toolbox for DAS seismology. Seismological Research Letters 95, 3055–3066. doi:10.1785/0220240124. Lellouch, A., Lindsey, N.J., Ellsworth, W.L., Biondi, B.,

  3. [3]

    Journal of Geophysical Research: Solid Earth 126, e2020JB020925

    On the detection capabilities of underwater distributed acoustic sensing. Journal of Geophysical Research: Solid Earth 126, e2020JB020925. doi:10.1029/2020JB020925. Mousavi, S.M., Beroza, G.C.,

  4. [4]

    Science 377, eabm4470

    Deep-learning seismology. Science 377, eabm4470. doi:10.1126/science.abm4470. Mousavi, S.M., Sheng, Y., Zhu, W., Beroza, G.C.,

  5. [5]

    IEEE Access 7, 179464–179476

    STEAD: A large-scale standardized earthquake dataset for deep learning-based seismic phase recognition. IEEE Access 7, 179464–179476. doi:10.1109/ACCESS.2019.2947848. Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., et al.,

  6. [6]

    8024–8035

    PyTorch: An imperative style, high-performance deep learning library, in: Advances in Neural Information Processing Systems, pp. 8024–8035. Russell, B.C., Torralba, A., Murphy, K.P., Freeman, W.T., 2008. LabelMe: a database and web-based tool for image annotation. International Journal of Computer Vision 77, 157–173. Sladen, A., Rivet, D., Ampuero, J.P., De Barros, L., He...

  7. [7]

    Trabattoni, A., Baillet, M., van den Ende, M., Rivet, D., Stutzmann, E., Strumia, C., Biagioli, F.,

    doi:10.1038/s41467-019-13793-z. Trabattoni, A., Baillet, M., van den Ende, M., Rivet, D., Stutzmann, E., Strumia, C., Biagioli, F.,

  8. [8]

    Seismological Research Letters 96, 3221–3230

    Xdas: a Python framework for distributed acoustic sensing. Seismological Research Letters 96, 3221–3230. doi:10.1785/0220240366. Wang, X., Zhan, Z., Williams, E.F., Karrenbach, M., Lellouch, A., Mondanos, M., 2019. Distributed acoustic sensing for 3D imaging of near-surface structures during traffic-noise interferometry. Geophysical Research Letters 46, 6469–6478. doi:10.10...

  9. [9]

    JASA Express Letters 3, 026002

    Distributed acoustic sensing recordings of low-frequency whale calls and ship noise offshore central Oregon. JASA Express Letters 3, 026002. doi:10.1121/10.0017104. Wolf, T., Debut, L., Sanh, V., Chaumond, J., Delangue, C., Moi, A., Cistac, P., Rault, T., Louf, R., Funtowicz, M., et al.,

  10. [10]

    Transformers: State-of-the-art natural language processing, in: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pp. 38–45. doi:10.18653/v1/2020.emnlp-demos.6. Wu, X., Luo, S., Dupont, L., Gao, L., Zhao, W.,

  11. [11]

    Geophysics 84, IM35–IM45

    FaultSeg3D: Using synthetic datasets to train an end-to-end convolutional neural network for 3D seismic fault segmentation. Geophysics 84, IM35–IM45. doi:10.1190/geo2018-0646.1. Zhan, Z.,

  12. [12]

    Geophysical Journal International 222, 1022–1031

    Rapid seismic waveform classification and arrival picking using convolutional neural networks. Geophysical Journal International 222, 1022–1031. doi:10.1093/gji/ggz076.