Icicle: Scalable Metadata Indexing and Real-Time Monitoring for HPC File Systems
Pith reviewed 2026-05-10 15:09 UTC · model grok-4.3
The pith
Icicle uses a Kafka-Flink pipeline to index and monitor metadata in billion-file HPC systems with order-of-magnitude throughput gains.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Icicle maintains a unified, up-to-date, and queryable view of file system state while supporting both periodic snapshot-based ingestion for bulk metadata updates and event-based ingestion for real-time synchronization from production systems such as Lustre and IBM Storage Scale. Built on Apache Kafka and Apache Flink, Icicle provides high-throughput, fault-tolerant, and horizontally scalable ingestion of metadata events into two complementary search indexes, enabling both individual file discovery and aggregate summary statistics by user, group, and directory. This architecture enables efficient support for both coarse-grained administrative queries and interactive analytics over billions of objects.
What carries the argument
Dual-mode ingestion pipeline that combines snapshot-based bulk loads with event-driven real-time streams, routed through Kafka for durability and Flink for scalable processing into two complementary search indexes.
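The essential invariant of such a dual path is that a stale snapshot row must never overwrite a fresher event. A hedged sketch (not Icicle's code; all names and fields are hypothetical) of that merge logic:

```python
# Illustrative dual-mode ingestion: a bulk snapshot and a live event stream
# both feed one metadata index, with newer timestamps winning so that a stale
# snapshot row cannot clobber a fresher event. (Hypothetical schema.)

def apply_record(index, path, record):
    """Upsert a metadata record, keeping the newest version per path."""
    current = index.get(path)
    if current is None or record["mtime"] >= current["mtime"]:
        index[path] = record

def ingest_snapshot(index, snapshot_rows):
    """Bulk load: replay a periodic full-system metadata scan."""
    for path, record in snapshot_rows:
        apply_record(index, path, record)

def ingest_events(index, events):
    """Streaming load: apply create/update/delete events as they arrive."""
    for event in events:
        if event["op"] == "delete":
            index.pop(event["path"], None)
        else:
            apply_record(index, event["path"],
                         {"mtime": event["mtime"], "size": event["size"],
                          "uid": event["uid"]})

index = {}
snapshot = [("/proj/a.dat", {"mtime": 100, "size": 4096, "uid": 1001})]
events = [
    {"op": "update", "path": "/proj/a.dat", "mtime": 150, "size": 8192, "uid": 1001},
    {"op": "create", "path": "/proj/b.dat", "mtime": 160, "size": 512, "uid": 1002},
]
ingest_events(index, events)      # events may land before the snapshot does
ingest_snapshot(index, snapshot)  # the stale snapshot row must not win
assert index["/proj/a.dat"]["size"] == 8192
```

The same last-writer-wins rule is what lets snapshot and event paths run concurrently without coordination, at the cost of needing a trustworthy timestamp on every record.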
If this is right
- Administrators gain fast aggregate statistics by user or directory without scanning the entire file system.
- Real-time event ingestion keeps the index synchronized with rapidly changing environments that batch tools cannot track.
- Tunable consistency and freshness options let operators balance query speed against metadata lag for different workloads.
- Horizontal scaling of the Kafka-Flink layer supports continued growth in file count and metadata volume.
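The consistency/freshness trade-off in the third point can be made concrete with a toy micro-batcher (illustrative only; Icicle's actual tuning knobs are not specified here): a longer flush interval means fewer, larger bulk writes into the index, at the cost of added metadata lag.

```python
# Toy freshness knob: buffer events and flush to the index on an interval.
# Larger flush_interval_s -> higher bulk-write throughput, staler queries.

class MicroBatcher:
    def __init__(self, flush_interval_s, flush_fn):
        self.flush_interval_s = flush_interval_s  # the freshness knob
        self.flush_fn = flush_fn                  # e.g., a bulk index write
        self.buffer = []
        self.last_flush = 0.0

    def submit(self, event, now):
        self.buffer.append(event)
        if now - self.last_flush >= self.flush_interval_s:
            self.flush(now)

    def flush(self, now):
        if self.buffer:
            self.flush_fn(self.buffer)
            self.buffer = []
        self.last_flush = now

flushed = []
b = MicroBatcher(flush_interval_s=5.0,
                 flush_fn=lambda batch: flushed.append(len(batch)))
for t in range(12):               # one event per second for 12 s
    b.submit({"t": t}, now=float(t))
# With a 5 s interval the 12 events arrive at the index in a few bulk
# writes rather than 12 individual ones; unflushed events are the lag.
```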
Where Pith is reading between the lines
- The same dual-ingestion pattern could be adapted to other distributed storage systems that expose event streams, not just the two named here.
- Over time the indexed metadata might support trend analysis or anomaly detection without additional data collection.
- Interactive queries over the indexes could reduce reliance on periodic full-system reports in operations dashboards.
Load-bearing premise
The Kafka and Flink components can ingest and index metadata events from live Lustre and IBM Storage Scale systems at full production scale without unacceptable latency, data loss, or consistency failures.
What would settle it
Deploy Icicle on a production Lustre or IBM Storage Scale system containing billions of files under realistic user load, then measure whether ingestion throughput, query latency, and metadata completeness meet the reported order-of-magnitude improvements without data loss or excessive lag.
Original abstract
Modern HPC file systems can contain billions of files and hundreds of petabytes of data, making even simple questions increasingly intractable to answer. Traditional file system utilities such as find and du fail to scale to these sizes. While external indexing tools like GUFI and Brindexer improve query performance, they remain batch-oriented and unsuitable for heterogeneous, rapidly evolving environments. We present Icicle, a scalable framework for continuous file system metadata indexing and monitoring. Icicle maintains a unified, up-to-date, and queryable view of file system state while supporting both periodic snapshot-based ingestion for bulk metadata updates and event-based ingestion for real-time synchronization from production systems such as Lustre and IBM Storage Scale. Built on Apache Kafka and Apache Flink, Icicle provides high-throughput, fault-tolerant, and horizontally scalable ingestion of metadata events into two complementary search indexes, enabling both individual file discovery and aggregate summary statistics by user, group, and directory. This architecture enables efficient support for both coarse-grained administrative queries and interactive analytics over billions of objects. Our experimental evaluation on production-scale HPC datasets demonstrates order-of-magnitude throughput improvements over existing monitoring and indexing approaches, with tunable options for balancing consistency, latency, and metadata freshness.
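The aggregate index the abstract describes can be pictured as a rollup over metadata records. A minimal sketch, assuming each record carries a path, owner uid, and size (field names are illustrative, not Icicle's schema):

```python
from collections import defaultdict

# Roll file metadata up into per-user and per-directory summaries so that
# "how much does user X store under /proj?" needs no full-tree scan.

def rollup(records):
    by_user = defaultdict(lambda: {"files": 0, "bytes": 0})
    by_dir = defaultdict(lambda: {"files": 0, "bytes": 0})
    for rec in records:
        u = by_user[rec["uid"]]
        u["files"] += 1
        u["bytes"] += rec["size"]
        # credit every ancestor directory, like a per-level du -s
        parts = rec["path"].strip("/").split("/")[:-1]
        for depth in range(1, len(parts) + 1):
            d = by_dir["/" + "/".join(parts[:depth])]
            d["files"] += 1
            d["bytes"] += rec["size"]
    return by_user, by_dir

records = [
    {"path": "/proj/climate/run1.nc", "uid": 1001, "size": 1 << 20},
    {"path": "/proj/climate/run2.nc", "uid": 1001, "size": 2 << 20},
    {"path": "/proj/bio/cells.h5", "uid": 1002, "size": 4 << 20},
]
by_user, by_dir = rollup(records)
assert by_user[1001]["bytes"] == 3 << 20
assert by_dir["/proj"]["files"] == 3
```

In a streaming deployment this rollup would be maintained incrementally per event rather than recomputed, which is precisely what makes the aggregate index cheap to query but sensitive to event loss.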
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents Icicle, a scalable framework for continuous file system metadata indexing and real-time monitoring in large-scale HPC environments. Built on Apache Kafka and Apache Flink, it supports both periodic snapshot-based ingestion for bulk updates and event-based ingestion for synchronization from production systems such as Lustre and IBM Storage Scale. The architecture maintains a unified, queryable view supporting individual file discovery and aggregate statistics by user, group, and directory. The central claim is that experiments on production-scale HPC datasets demonstrate order-of-magnitude throughput improvements over existing monitoring and indexing approaches, with tunable options for balancing consistency, latency, and metadata freshness.
Significance. If the performance and scalability claims hold under realistic conditions, Icicle addresses a pressing need in HPC for real-time metadata management at exascale, where traditional utilities and batch indexers like GUFI fail. The practical integration of established streaming technologies offers a viable path for fault-tolerant, horizontally scalable ingestion and querying over billions of objects.
major comments (2)
- [Abstract and experimental evaluation] The headline claim of order-of-magnitude throughput gains on production-scale datasets is reported without details on workload characteristics (e.g., event rates, file counts), comparison baselines, measurement methodology, or error bars, and without stating whether tests used live high-rate streams from Lustre/Storage Scale or synthetic traces and post-facto replays. This is load-bearing for the central performance result.
- [Architecture and ingestion sections] The Kafka+Flink ingestion design lacks concrete measurements or analysis of data loss rates, end-to-end latency under peak production loads, and consistency guarantees when handling real Lustre and IBM Storage Scale event streams at observed HPC rates. This directly affects the viability of the real-time monitoring claim.
minor comments (2)
- [Abstract] Clarify in the abstract or introduction the exact scale of the production datasets used (e.g., number of files, events/sec) to better contextualize the reported gains.
- [Related work and evaluation] Ensure quantitative comparisons to GUFI, Brindexer, and other baselines appear in the evaluation rather than only qualitative discussion in related work.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. The comments highlight important areas where additional clarity and data will strengthen the presentation of our performance and real-time monitoring claims. We address each major comment below and will revise the manuscript accordingly.
Point-by-point responses
-
Referee: [Abstract and experimental evaluation] The headline claim of order-of-magnitude throughput gains on production-scale datasets is reported without details on workload characteristics (e.g., event rates, file counts), comparison baselines, measurement methodology, or error bars, and without stating whether tests used live high-rate streams from Lustre/Storage Scale or synthetic traces and post-facto replays. This is load-bearing for the central performance result.
Authors: We agree that the experimental evaluation requires more explicit detail to support the central performance claims. In the revised manuscript we will expand the relevant section to report workload characteristics (event rates and file counts), the precise comparison baselines (including GUFI and other batch tools), the measurement methodology for throughput, error bars on reported figures, and clarification on whether the production-scale datasets were processed via live streams or post-facto replays of traces. These additions will make the order-of-magnitude gains fully reproducible and evaluable. revision: yes
-
Referee: [Architecture and ingestion sections] The Kafka+Flink ingestion design lacks concrete measurements or analysis of data loss rates, end-to-end latency under peak production loads, and consistency guarantees when handling real Lustre and IBM Storage Scale event streams at observed HPC rates. This directly affects the viability of the real-time monitoring claim.
Authors: We acknowledge that quantitative evidence on these operational aspects is needed to substantiate the real-time monitoring claims. While the current text describes the fault-tolerant Kafka+Flink architecture, we will add concrete measurements and analysis in the revised architecture and evaluation sections. This will include reported data loss rates, end-to-end latency figures under peak loads, and the consistency guarantees observed when ingesting live event streams from Lustre and IBM Storage Scale at the rates encountered in our production-scale experiments. revision: yes
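The end-to-end latency measurement promised here reduces to comparing, for each event, its file-system emission time with the time it became visible in the index. A minimal nearest-rank percentile sketch under that assumption (timestamps and field names are hypothetical):

```python
# Ingest lag per event = indexed_at - emitted_at; report tail percentiles,
# since a healthy median can hide a slow tail that violates freshness goals.

def latency_percentiles(events, quantiles=(0.5, 0.99)):
    """Return ingest lag at the given quantiles (nearest-rank method)."""
    lags = sorted(e["indexed_at"] - e["emitted_at"] for e in events)
    out = {}
    for q in quantiles:
        idx = min(int(q * len(lags)), len(lags) - 1)  # zero-based rank
        out[q] = lags[idx]
    return out

# Synthetic workload: most events indexed within 1 s, every 10th lags 30 s.
events = [{"emitted_at": t, "indexed_at": t + (30 if t % 10 == 0 else 1)}
          for t in range(100)]
p = latency_percentiles(events)
assert p[0.5] == 1 and p[0.99] == 30  # the slow tail only shows at p99
```

At production event rates the sorted-list approach would give way to a mergeable sketch (the paper's references include DDSketch and t-digest), but the quantity being estimated is the same.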
Circularity Check
No circularity: system description and empirical results with no derivations or self-referential fits
full rationale
The paper describes an architecture (Kafka + Flink for metadata ingestion from Lustre/Storage Scale) and reports experimental throughput results on production-scale datasets. No equations, fitted parameters, predictions derived from inputs, or load-bearing self-citations appear in the provided text. Claims rest on external benchmarks and measured performance rather than any reduction to the paper's own definitions or prior self-work by construction. This is the expected non-finding for a systems/implementation paper.
Reference graph
Works this paper leans on
-
[1]
The Lustre Storage Architecture,
P. Braam, “The Lustre Storage Architecture,” Mar. 2019, arXiv:1903.01955 [cs]. [Online]. Available: http://arxiv.org/abs/1903.01955
-
[2]
OLCF announces storage specifications for Frontier exascale system,
M. Lakin, “OLCF announces storage specifications for Frontier exascale system,” 2021, retrieved Mar 25, 2026 from https://www.olcf.ornl.gov/2021/05/20/olcf-announces-storage-specifications-for-frontier-exascale-system/
2021
-
[3]
ALCF deploys powerful new file storage systems,
N. Heinonen, “ALCF deploys powerful new file storage systems,” 2021, retrieved Mar 25, 2026 from https://www.alcf.anl.gov/news/alcf-deploys-powerful-new-file-storage-systems
2021
-
[4]
Storage,
National Energy Research Scientific Computing Center (NERSC), “Storage,” 2025, retrieved Mar 25, 2026 from https://www.nersc.gov/what-we-do/computing-for-science/data-resources/storage
2025
-
[5]
Monitoring Tools for Large Scale Systems,
R. Miller, J. Hill, D. A. Dillow, R. Gunasekaran, G. Shipman, and D. Maxwell, “Monitoring Tools for Large Scale Systems,” in Cray User Group Conference (CUG 2010), Edinburgh, Scotland, May 2010.
2010
-
[6]
[Online]. Available: https://cug.org/5-publications/proceedings attendee lists/CUG10CD/pages/1-program/final program/CUG10 Proceedings/pages/authors/06-10Tuesday/8C-Shipman-paper.pdf
-
[7]
GUFI: Fast, Secure File System Metadata Search for Both Privileged and Unprivileged Users,
D. Manno, J. Lee, P. Challa, Q. Zheng, D. Bonnie, G. Grider, and B. Settlemyer, “GUFI: Fast, Secure File System Metadata Search for Both Privileged and Unprivileged Users,” in SC22: International Conference for High Performance Computing, Networking, Storage and Analysis, Nov. 2022, pp. 1–14. [Online]. Available: https://ieeexplore.ieee.org/document/10046106/
-
[8]
Efficient Metadata Indexing for HPC Storage Systems,
A. K. Paul, B. Wang, N. Rutman, C. Spitz, and A. R. Butt, “Efficient Metadata Indexing for HPC Storage Systems,” in 2020 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID), May 2020, pp. 162–171. [Online]. Available: https://ieeexplore.ieee.org/document/9139660/
-
[9]
GPFS: A shared-disk file system for large computing clusters,
F. B. Schmuck and R. L. Haskin, “GPFS: A shared-disk file system for large computing clusters,” in Proceedings of the Conference on File and Storage Technologies, ser. FAST ’02. USA: USENIX Association, 2002, pp. 231–244. [Online]. Available: https://www.usenix.org/legacy/publications/library/proceedings/fast02/full_papers/schmuck/schmuck.pdf
2002
-
[10]
mmwatch command — IBM Storage Scale 5.2.3 documentation,
IBM, “mmwatch command — IBM Storage Scale 5.2.3 documentation,” 2025, retrieved Mar 25, 2026 from https://www.ibm.com/docs/en/storage-scale/5.2.3?topic=reference-mmwatch-command
2025
-
[11]
Kafka: A distributed messaging system for log processing,
J. Kreps, N. Narkhede, J. Rao et al., “Kafka: A distributed messaging system for log processing,” in Proceedings of the NetDB, vol. 11, no. 2011, Athens, Greece, 2011, pp. 1–7. [Online]. Available: https://notes.stephenholiday.com/Kafka.pdf
2011
-
[12]
Apache flink: Stream and batch processing in a single engine,
P. Carbone, A. Katsifodimos, S. Ewen, V. Markl, S. Haridi, and K. Tzoumas, “Apache flink: Stream and batch processing in a single engine,” The Bulletin of the Technical Committee on Data Engineering, vol. 38, no. 4, 2015. [Online]. Available: https://asterios.katsifodimos.com/assets/publications/flink-deb.pdf
2015
-
[13]
FSMonitor: Scalable File System Monitoring for Arbitrary Storage Systems,
A. K. Paul, R. Chard, K. Chard, S. Tuecke, A. R. Butt, and I. Foster, “FSMonitor: Scalable File System Monitoring for Arbitrary Storage Systems,” in 2019 IEEE International Conference on Cluster Computing (CLUSTER). Albuquerque, NM, USA: IEEE, Sep. 2019, pp. 1–11. [Online]. Available: https://ieeexplore.ieee.org/document/8891045/
-
[14]
Globus platform services for data publication,
R. Ananthakrishnan, B. Blaiszik, K. Chard, R. Chard, B. McCollam, J. Pruyne, S. Rosen, S. Tuecke, and I. Foster, “Globus platform services for data publication,” in Proceedings of the Practice and Experience on Advanced Research Computing: Seamless Creativity, ser. PEARC ’18. New York, NY, USA: Association for Computing Machinery, 2018.
-
[15]
[Online]. Available: https://doi.org/10.1145/3219104.3219127
-
[16]
Elasticsearch,
Elasticsearch, “Elasticsearch,” 2010, retrieved Mar 25, 2026 from https://www.elastic.co/elasticsearch
2010
-
[17]
Opensearch,
OpenSearch, “Opensearch,” 2021, retrieved Mar 25, 2026 from https://opensearch.org/
2021
-
[18]
Octopus: Experiences with a hybrid event-driven architecture for distributed scientific computing,
H. Pan, R. Chard, S. Zhou, A. Kamatar, R. Vescovi, V. Hayot-Sasson, A. Bauer, M. Gonthier, K. Chard, and I. Foster, “Octopus: Experiences with a hybrid event-driven architecture for distributed scientific computing,” in SC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2024, pp. 496–507. [O...
-
[19]
confluent-kafka-python,
Confluent Inc., “confluent-kafka-python,” 2016, retrieved Mar 25, 2026 from https://github.com/confluentinc/confluent-kafka-python
2016
-
[20]
orjson,
orjson Contributors, “orjson,” 2018, retrieved Mar 25, 2026 from https://github.com/ijl/orjson
2018
-
[21]
DDSketch: a fast and fully-mergeable quantile sketch with relative-error guarantees,
C. Masson, J. E. Rim, and H. K. Lee, “DDSketch: a fast and fully-mergeable quantile sketch with relative-error guarantees,” Proc. VLDB Endow., vol. 12, no. 12, pp. 2195–2205, Aug. 2019. [Online]. Available: https://doi.org/10.14778/3352063.3352135
-
[22]
Optimal quantile approximation in streams,
Z. Karnin, K. Lang, and E. Liberty, “Optimal quantile approximation in streams,” in 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS), 2016, pp. 71–78. [Online]. Available: https://arxiv.org/abs/1603.05346
-
[23]
Relative error streaming quantiles,
G. Cormode, Z. Karnin, E. Liberty, J. Thaler, and P. Veselý, “Relative error streaming quantiles,” J. ACM, vol. 70, no. 5, Oct. 2023. [Online]. Available: https://doi.org/10.1145/3617891
-
[24]
The t-digest: Efficient estimates of distributions,
T. Dunning, “The t-digest: Efficient estimates of distributions,” Software Impacts, vol. 7, p. 100049, 2021. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S2665963820300403
2021
-
[25]
sketches-py,
Datadog, “sketches-py,” 2020, retrieved Mar 25, 2026 from https://github.com/DataDog/sketches-py
2020
-
[26]
datasketches-python,
Apache Software Foundation, “datasketches-python,” 2024, retrieved Mar 25, 2026 from https://github.com/apache/datasketches-python
2024
-
[27]
Filebench: A flexible framework for file system benchmarking,
V. Tarasov, “Filebench: A flexible framework for file system benchmarking,” ;login: The USENIX Magazine, vol. 41, no. 1, p. 6,
-
[28]
[Online]. Available: https://www.usenix.org/publications/login/spring2016/tarasov
-
[29]
Spyglass: fast, scalable metadata search for large-scale storage systems,
A. W. Leung, M. Shao, T. Bisson, S. Pasupathy, and E. L. Miller, “Spyglass: fast, scalable metadata search for large-scale storage systems,” in Proceedings of the 7th conference on File and storage technologies, ser. FAST ’09. USA: USENIX Association, Feb. 2009, pp. 153–
2009
-
[30]
[Online]. Available: https://www.usenix.org/conference/fast-09/spyglass-fast-scalable-metadata-search-large-scale-storage-systems
-
[31]
SmartStore: a new metadata organization paradigm with semantic-awareness for next-generation file systems,
Y. Hua, H. Jiang, Y. Zhu, D. Feng, and L. Tian, “SmartStore: a new metadata organization paradigm with semantic-awareness for next-generation file systems,” in Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, ser. SC ’09. New York, NY, USA: Association for Computing Machinery, Nov. 2009, pp. 1–12. [Online]. A...
-
[32]
Security Aware Partitioning for efficient file system search,
A. Parker-Wood, C. Strong, E. L. Miller, and D. D. E. Long, “Security Aware Partitioning for efficient file system search,” in 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST). Incline Village, NV, USA: IEEE, May 2010, pp. 1–14. [Online]. Available: http://ieeexplore.ieee.org/document/5496990/
-
[33]
Scale and concurrency of GIGA+: file system directories with millions of files,
S. Patil and G. Gibson, “Scale and concurrency of GIGA+: file system directories with millions of files,” in Proceedings of the 9th USENIX conference on File and storage technologies, ser. FAST ’11. USA: USENIX Association, Feb. 2011, pp. 177–190
2011
-
[34]
TABLEFS: Enhancing metadata efficiency in the local file system,
K. Ren and G. Gibson, “TABLEFS: Enhancing metadata efficiency in the local file system,” in 2013 USENIX Annual Technical Conference (USENIX ATC 13). San Jose, CA: USENIX Association, Jun. 2013, pp. 145–156. [Online]. Available: https://www.usenix.org/conference/atc13/technical-sessions/presentation/ren
2013
-
[35]
Indexfs: Scaling file system metadata performance with stateless caching and bulk insertion,
K. Ren, Q. Zheng, S. Patil, and G. Gibson, “Indexfs: Scaling file system metadata performance with stateless caching and bulk insertion,” in SC ’14: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE, 2014, pp. 237–248. [Online]. Available: https://doi.org/10.1109/SC.2014.25
2014
-
[36]
Deltafs: Exascale file systems scale better without dedicated servers,
Q. Zheng, K. Ren, G. Gibson, B. W. Settlemyer, and G. Grider, “Deltafs: Exascale file systems scale better without dedicated servers,” in Proceedings of the 10th Parallel Data Storage Workshop, 2015, pp. 1–6. [Online]. Available: https://doi.org/10.1145/2834976.2834977
-
[37]
InfiniFS: An efficient metadata service for large-scale distributed filesystems,
W. Lv, Y. Lu, Y. Zhang, P. Duan, and J. Shu, “InfiniFS: An efficient metadata service for large-scale distributed filesystems,” in 20th USENIX Conference on File and Storage Technologies (FAST 22). Santa Clara, CA: USENIX Association, Feb. 2022, pp. 313–328. [Online]. Available: https://www.usenix.org/conference/fast22/presentation/lv
2022
-
[38]
LazyBase: trading freshness for performance in a scalable database,
J. Cipar, G. Ganger, K. Keeton, C. B. Morrey, C. A. Soules, and A. Veitch, “LazyBase: trading freshness for performance in a scalable database,” in Proceedings of the 7th ACM european conference on Computer Systems, ser. EuroSys ’12. New York, NY, USA: Association for Computing Machinery, Apr. 2012, pp. 169–182. [Online]. Available: https://dl.acm.org/doi...
-
[39]
Borgfs: File system metadata index search,
SNIA, “Borgfs: File system metadata index search,” 2014, retrieved Mar 25, 2026 from https://www.snia.org/educational-library/borgfs-file-system-metadata-index-search-2014
2014
-
[40]
Taking back control of HPC file systems with Robinhood Policy Engine,
T. Leibovici, “Taking back control of HPC file systems with Robinhood Policy Engine,” May 2015, arXiv:1505.01448 [cs]. [Online]. Available: http://arxiv.org/abs/1505.01448
-
[41]
QuickSilver: A Distributed Policy Driven Data Management System,
C. Brumgard, A. George, R. Mohr, K. Maheshwari, J. Simmons, and S. Oral, “QuickSilver: A Distributed Policy Driven Data Management System,” in Workshop: Women in HPC: Diversifying the HPC Community and Engaging Male Allies. Dallas, TX: Association for Computing Machinery, 2022. [Online]. Available: https://sc22.supercomputing.org/proceedings/workshops/w...
2022
-
[42]
Polimor: A policy engine made-to-order for automated and scalable data management in lustre,
A. George, C. Brumgard, R. Mohr, K. Maheshwari, J. Simmons, S. Oral, and J. Hanley, “Polimor: A policy engine made-to-order for automated and scalable data management in lustre,” in Proceedings of the SC ’23 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis, ser. SC-W ’23. New York, NY, USA: Associatio...
-
[43]
Cray clusterstor data services user guide,
Hewlett Packard Enterprise, “Cray clusterstor data services user guide,” 2021, retrieved Mar 25, 2026 from https://support.hpe.com/hpesc/public/docDisplay?docId=a00114855en_us&docLocale=en_US
2021
-
[44]
IBM Spectrum Scale information lifecycle management policies: Practical guide,
IBM, “IBM Spectrum Scale information lifecycle management policies: Practical guide,” 2021, retrieved Mar 25, 2026 from https://www.ibm.com/support/pages/ibm-spectrum-scale-information-lifecycle-management-policies-practical-guide
2021
-
[45]
Kernel korner: intro to inotify,
R. Love, “Kernel korner: intro to inotify,” Linux J., vol. 2005, no. 139, p. 8, Nov. 2005. [Online]. Available: https://www.linuxjournal.com/article/8478
2005
-
[46]
Kqueue - A Generic and Scalable Event Notification Facility,
J. Lemon, “Kqueue - A Generic and Scalable Event Notification Facility,” in Proceedings of the FREENIX Track: 2001 USENIX Annual Technical Conference. USA: USENIX Association, Jun. 2001, pp. 141–153. [Online]. Available: https://people.freebsd.org/~jlemon/papers/kqueue.pdf
2001
-
[47]
File system events,
Apple, “File system events,” 2012, retrieved Mar 25, 2026 from https://developer.apple.com/documentation/coreservices/file_system_events
2012