Icicle: Scalable Metadata Indexing and Real-Time Monitoring for HPC File Systems
Pith reviewed 2026-05-10 15:09 UTC · model grok-4.3
The pith
Icicle uses a Kafka-Flink pipeline to index and monitor metadata in billion-file HPC systems with order-of-magnitude throughput gains.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Icicle maintains a unified, up-to-date, and queryable view of file system state while supporting both periodic snapshot-based ingestion for bulk metadata updates and event-based ingestion for real-time synchronization from production systems such as Lustre and IBM Storage Scale. Built on Apache Kafka and Apache Flink, Icicle provides high-throughput, fault-tolerant, and horizontally scalable ingestion of metadata events into two complementary search indexes, enabling both individual file discovery and aggregate summary statistics by user, group, and directory. This architecture enables efficient support for both coarse-grained administrative queries and interactive analytics over billions of objects.
What carries the argument
Dual-mode ingestion pipeline that combines snapshot-based bulk loads with event-driven real-time streams, routed through Kafka for durability and Flink for scalable processing into two complementary search indexes.
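The essential invariant of such a dual path is that a stale snapshot row must never overwrite a fresher event. A hedged sketch (not Icicle's code; all names and fields are hypothetical) of that merge logic:

```python
# Illustrative dual-mode ingestion: a bulk snapshot and a live event stream
# both feed one metadata index, with newer timestamps winning so that a stale
# snapshot row cannot clobber a fresher event. (Hypothetical schema.)

def apply_record(index, path, record):
    """Upsert a metadata record, keeping the newest version per path."""
    current = index.get(path)
    if current is None or record["mtime"] >= current["mtime"]:
        index[path] = record

def ingest_snapshot(index, snapshot_rows):
    """Bulk load: replay a periodic full-system metadata scan."""
    for path, record in snapshot_rows:
        apply_record(index, path, record)

def ingest_events(index, events):
    """Streaming load: apply create/update/delete events as they arrive."""
    for event in events:
        if event["op"] == "delete":
            index.pop(event["path"], None)
        else:
            apply_record(index, event["path"],
                         {"mtime": event["mtime"], "size": event["size"],
                          "uid": event["uid"]})

index = {}
snapshot = [("/proj/a.dat", {"mtime": 100, "size": 4096, "uid": 1001})]
events = [
    {"op": "update", "path": "/proj/a.dat", "mtime": 150, "size": 8192, "uid": 1001},
    {"op": "create", "path": "/proj/b.dat", "mtime": 160, "size": 512, "uid": 1002},
]
ingest_events(index, events)      # events may land before the snapshot does
ingest_snapshot(index, snapshot)  # the stale snapshot row must not win
assert index["/proj/a.dat"]["size"] == 8192
```

The same last-writer-wins rule is what lets snapshot and event paths run concurrently without coordination, at the cost of needing a trustworthy timestamp on every record.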
If this is right
- Administrators gain fast aggregate statistics by user or directory without scanning the entire file system.
- Real-time event ingestion keeps the index synchronized with rapidly changing environments that batch tools cannot track.
- Tunable consistency and freshness options let operators balance query speed against metadata lag for different workloads.
- Horizontal scaling of the Kafka-Flink layer supports continued growth in file count and metadata volume.
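The consistency/freshness trade-off in the third point can be made concrete with a toy micro-batcher (illustrative only; Icicle's actual tuning knobs are not specified here): a longer flush interval means fewer, larger bulk writes into the index, at the cost of added metadata lag.

```python
# Toy freshness knob: buffer events and flush to the index on an interval.
# Larger flush_interval_s -> higher bulk-write throughput, staler queries.

class MicroBatcher:
    def __init__(self, flush_interval_s, flush_fn):
        self.flush_interval_s = flush_interval_s  # the freshness knob
        self.flush_fn = flush_fn                  # e.g., a bulk index write
        self.buffer = []
        self.last_flush = 0.0

    def submit(self, event, now):
        self.buffer.append(event)
        if now - self.last_flush >= self.flush_interval_s:
            self.flush(now)

    def flush(self, now):
        if self.buffer:
            self.flush_fn(self.buffer)
            self.buffer = []
        self.last_flush = now

flushed = []
b = MicroBatcher(flush_interval_s=5.0,
                 flush_fn=lambda batch: flushed.append(len(batch)))
for t in range(12):               # one event per second for 12 s
    b.submit({"t": t}, now=float(t))
# With a 5 s interval the 12 events arrive at the index in a few bulk
# writes rather than 12 individual ones; unflushed events are the lag.
```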
Where Pith is reading between the lines
- The same dual-ingestion pattern could be adapted to other distributed storage systems that expose event streams, not just the two named here.
- Over time the indexed metadata might support trend analysis or anomaly detection without additional data collection.
- Interactive queries over the indexes could reduce reliance on periodic full-system reports in operations dashboards.
Load-bearing premise
The Kafka and Flink components can ingest and index metadata events from live Lustre and IBM Storage Scale systems at full production scale without unacceptable latency, data loss, or consistency failures.
What would settle it
Deploy Icicle on a production Lustre or IBM Storage Scale system containing billions of files under realistic user load, then measure whether ingestion throughput, query latency, and metadata completeness meet the reported order-of-magnitude improvements without data loss or excessive lag.
Original abstract
Modern HPC file systems can contain billions of files and hundreds of petabytes of data, making even simple questions increasingly intractable to answer. Traditional file system utilities such as find and du fail to scale to these sizes. While external indexing tools like GUFI and Brindexer improve query performance, they remain batch-oriented and unsuitable for heterogeneous, rapidly evolving environments. We present Icicle, a scalable framework for continuous file system metadata indexing and monitoring. Icicle maintains a unified, up-to-date, and queryable view of file system state while supporting both periodic snapshot-based ingestion for bulk metadata updates and event-based ingestion for real-time synchronization from production systems such as Lustre and IBM Storage Scale. Built on Apache Kafka and Apache Flink, Icicle provides high-throughput, fault-tolerant, and horizontally scalable ingestion of metadata events into two complementary search indexes, enabling both individual file discovery and aggregate summary statistics by user, group, and directory. This architecture enables efficient support for both coarse-grained administrative queries and interactive analytics over billions of objects. Our experimental evaluation on production-scale HPC datasets demonstrates order-of-magnitude throughput improvements over existing monitoring and indexing approaches, with tunable options for balancing consistency, latency, and metadata freshness.
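The aggregate index the abstract describes can be pictured as a rollup over metadata records. A minimal sketch, assuming each record carries a path, owner uid, and size (field names are illustrative, not Icicle's schema):

```python
from collections import defaultdict

# Roll file metadata up into per-user and per-directory summaries so that
# "how much does user X store under /proj?" needs no full-tree scan.

def rollup(records):
    by_user = defaultdict(lambda: {"files": 0, "bytes": 0})
    by_dir = defaultdict(lambda: {"files": 0, "bytes": 0})
    for rec in records:
        u = by_user[rec["uid"]]
        u["files"] += 1
        u["bytes"] += rec["size"]
        # credit every ancestor directory, like a per-level du -s
        parts = rec["path"].strip("/").split("/")[:-1]
        for depth in range(1, len(parts) + 1):
            d = by_dir["/" + "/".join(parts[:depth])]
            d["files"] += 1
            d["bytes"] += rec["size"]
    return by_user, by_dir

records = [
    {"path": "/proj/climate/run1.nc", "uid": 1001, "size": 1 << 20},
    {"path": "/proj/climate/run2.nc", "uid": 1001, "size": 2 << 20},
    {"path": "/proj/bio/cells.h5", "uid": 1002, "size": 4 << 20},
]
by_user, by_dir = rollup(records)
assert by_user[1001]["bytes"] == 3 << 20
assert by_dir["/proj"]["files"] == 3
```

In a streaming deployment this rollup would be maintained incrementally per event rather than recomputed, which is precisely what makes the aggregate index cheap to query but sensitive to event loss.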
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents Icicle, a scalable framework for continuous file system metadata indexing and real-time monitoring in large-scale HPC environments. Built on Apache Kafka and Apache Flink, it supports both periodic snapshot-based ingestion for bulk updates and event-based ingestion for synchronization from production systems such as Lustre and IBM Storage Scale. The architecture maintains a unified, queryable view supporting individual file discovery and aggregate statistics by user, group, and directory. The central claim is that experiments on production-scale HPC datasets demonstrate order-of-magnitude throughput improvements over existing monitoring and indexing approaches, with tunable options for balancing consistency, latency, and metadata freshness.
Significance. If the performance and scalability claims hold under realistic conditions, Icicle addresses a pressing need in HPC for real-time metadata management at exascale, where traditional utilities and batch indexers like GUFI fail. The practical integration of established streaming technologies offers a viable path for fault-tolerant, horizontally scalable ingestion and querying over billions of objects.
major comments (2)
- [Abstract and experimental evaluation] The headline claim of order-of-magnitude throughput gains on production-scale datasets is reported without details on workload characteristics (e.g., event rates, file counts), comparison baselines, measurement methodology, or error bars, and without stating whether tests used live high-rate streams from Lustre/Storage Scale or synthetic traces and post-facto replays. This is load-bearing for the central performance result.
- [Architecture and ingestion sections] The Kafka+Flink ingestion design lacks concrete measurements or analysis of data loss rates, end-to-end latency under peak production loads, and consistency guarantees when handling real Lustre and IBM Storage Scale event streams at observed HPC rates. This directly affects the viability of the real-time monitoring claim.
minor comments (2)
- [Abstract] Clarify in the abstract or introduction the exact scale of the production datasets used (e.g., number of files, events/sec) to better contextualize the reported gains.
- [Related work and evaluation] Ensure quantitative comparisons to GUFI, Brindexer, and other baselines appear in the evaluation rather than only qualitative discussion in related work.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. The comments highlight important areas where additional clarity and data will strengthen the presentation of our performance and real-time monitoring claims. We address each major comment below and will revise the manuscript accordingly.
Point-by-point responses
-
Referee: [Abstract and experimental evaluation] The headline claim of order-of-magnitude throughput gains on production-scale datasets is reported without details on workload characteristics (e.g., event rates, file counts), comparison baselines, measurement methodology, or error bars, and without stating whether tests used live high-rate streams from Lustre/Storage Scale or synthetic traces and post-facto replays. This is load-bearing for the central performance result.
Authors: We agree that the experimental evaluation requires more explicit detail to support the central performance claims. In the revised manuscript we will expand the relevant section to report workload characteristics (event rates and file counts), the precise comparison baselines (including GUFI and other batch tools), the measurement methodology for throughput, error bars on reported figures, and clarification on whether the production-scale datasets were processed via live streams or post-facto replays of traces. These additions will make the order-of-magnitude gains fully reproducible and evaluable. revision: yes
-
Referee: [Architecture and ingestion sections] The Kafka+Flink ingestion design lacks concrete measurements or analysis of data loss rates, end-to-end latency under peak production loads, and consistency guarantees when handling real Lustre and IBM Storage Scale event streams at observed HPC rates. This directly affects the viability of the real-time monitoring claim.
Authors: We acknowledge that quantitative evidence on these operational aspects is needed to substantiate the real-time monitoring claims. While the current text describes the fault-tolerant Kafka+Flink architecture, we will add concrete measurements and analysis in the revised architecture and evaluation sections. This will include reported data loss rates, end-to-end latency figures under peak loads, and the consistency guarantees observed when ingesting live event streams from Lustre and IBM Storage Scale at the rates encountered in our production-scale experiments. revision: yes
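The end-to-end latency measurement promised here reduces to comparing, for each event, its file-system emission time with the time it became visible in the index. A minimal nearest-rank percentile sketch under that assumption (timestamps and field names are hypothetical):

```python
# Ingest lag per event = indexed_at - emitted_at; report tail percentiles,
# since a healthy median can hide a slow tail that violates freshness goals.

def latency_percentiles(events, quantiles=(0.5, 0.99)):
    """Return ingest lag at the given quantiles (nearest-rank method)."""
    lags = sorted(e["indexed_at"] - e["emitted_at"] for e in events)
    out = {}
    for q in quantiles:
        idx = min(int(q * len(lags)), len(lags) - 1)  # zero-based rank
        out[q] = lags[idx]
    return out

# Synthetic workload: most events indexed within 1 s, every 10th lags 30 s.
events = [{"emitted_at": t, "indexed_at": t + (30 if t % 10 == 0 else 1)}
          for t in range(100)]
p = latency_percentiles(events)
assert p[0.5] == 1 and p[0.99] == 30  # the slow tail only shows at p99
```

At production event rates the sorted-list approach would give way to a mergeable sketch (the paper's references include DDSketch and t-digest), but the quantity being estimated is the same.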
Circularity Check
No circularity: system description and empirical results with no derivations or self-referential fits
full rationale
The paper describes an architecture (Kafka + Flink for metadata ingestion from Lustre/Storage Scale) and reports experimental throughput results on production-scale datasets. No equations, fitted parameters, predictions derived from inputs, or load-bearing self-citations appear in the provided text. Claims rest on external benchmarks and measured performance rather than any reduction to the paper's own definitions or prior self-work by construction. This is the expected non-finding for a systems/implementation paper.
Reference graph
Works this paper leans on
-
[1]
The Lustre Storage Architecture,
P. Braam, “The Lustre Storage Architecture,” Mar. 2019, arXiv:1903.01955 [cs]. [Online]. Available: http://arxiv.org/abs/1903.01955
-
[2]
OLCF announces storage specifications for Frontier exascale system,
M. Lakin, “OLCF announces storage specifications for Frontier exascale system,” 2021, retrieved Mar 25, 2026 from https://www.olcf.ornl.gov/2021/05/20/olcf-announces-storage-specifications-for-frontier-exascale-system/
2021
-
[3]
ALCF deploys powerful new file storage systems,
N. Heinonen, “ALCF deploys powerful new file storage systems,” 2021, retrieved Mar 25, 2026 from https://www.alcf.anl.gov/news/alcf-deploys-powerful-new-file-storage-systems
2021
-
[4]
Storage,
National Energy Research Scientific Computing Center (NERSC), “Storage,” 2025, retrieved Mar 25, 2026 from https://www.nersc.gov/what-we-do/computing-for-science/data-resources/storage
2025
-
[5]
Monitoring Tools for Large Scale Systems,
R. Miller, J. Hill, D. A. Dillow, R. Gunasekaran, G. Shipman, and D. Maxwell, “Monitoring Tools for Large Scale Systems,” in Cray User Group Conference (CUG 2010), Edinburgh, Scotland, May 2010.
2010
-
[6]
[Online]. Available: https://cug.org/5-publications/proceedings attendee lists/CUG10CD/pages/1-program/final program/CUG10 Proceedings/pages/authors/06-10Tuesday/8C-Shipman-paper.pdf
-
[7]
GUFI: Fast, Secure File System Metadata Search for Both Privileged and Unprivileged Users,
D. Manno, J. Lee, P. Challa, Q. Zheng, D. Bonnie, G. Grider, and B. Settlemyer, “GUFI: Fast, Secure File System Metadata Search for Both Privileged and Unprivileged Users,” in SC22: International Conference for High Performance Computing, Networking, Storage and Analysis, Nov. 2022, pp. 1–14. [Online]. Available: https://ieeexplore.ieee.org/document/10046106/
-
[8]
Efficient Metadata Indexing for HPC Storage Systems,
A. K. Paul, B. Wang, N. Rutman, C. Spitz, and A. R. Butt, “Efficient Metadata Indexing for HPC Storage Systems,” in 2020 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID), May 2020, pp. 162–171. [Online]. Available: https://ieeexplore.ieee.org/document/9139660/
-
[9]
GPFS: A shared-disk file system for large computing clusters,
F. B. Schmuck and R. L. Haskin, “GPFS: A shared-disk file system for large computing clusters,” in Proceedings of the Conference on File and Storage Technologies, ser. FAST ’02. USA: USENIX Association, 2002, pp. 231–244. [Online]. Available: https://www.usenix.org/legacy/publications/library/proceedings/fast02/full_papers/schmuck/schmuck.pdf
2002
-
[10]
mmwatch command — IBM Storage Scale 5.2.3 documentation,
IBM, “mmwatch command — IBM Storage Scale 5.2.3 documentation,” 2025, retrieved Mar 25, 2026 from https://www.ibm.com/docs/en/storage-scale/5.2.3?topic=reference-mmwatch-command
2025
-
[11]
Kafka: A distributed messaging system for log processing,
J. Kreps, N. Narkhede, J. Rao et al., “Kafka: A distributed messaging system for log processing,” in Proceedings of the NetDB, vol. 11, no. 2011, Athens, Greece, 2011, pp. 1–7. [Online]. Available: https://notes.stephenholiday.com/Kafka.pdf
2011
-
[12]
Apache flink: Stream and batch processing in a single engine,
P. Carbone, A. Katsifodimos, S. Ewen, V. Markl, S. Haridi, and K. Tzoumas, “Apache flink: Stream and batch processing in a single engine,” The Bulletin of the Technical Committee on Data Engineering, vol. 38, no. 4, 2015. [Online]. Available: https://asterios.katsifodimos.com/assets/publications/flink-deb.pdf
2015
-
[13]
FSMonitor: Scalable File System Monitoring for Arbitrary Storage Systems,
A. K. Paul, R. Chard, K. Chard, S. Tuecke, A. R. Butt, and I. Foster, “FSMonitor: Scalable File System Monitoring for Arbitrary Storage Systems,” in 2019 IEEE International Conference on Cluster Computing (CLUSTER). Albuquerque, NM, USA: IEEE, Sep. 2019, pp. 1–11. [Online]. Available: https://ieeexplore.ieee.org/document/8891045/
-
[14]
Globus platform services for data publication,
R. Ananthakrishnan, B. Blaiszik, K. Chard, R. Chard, B. McCollam, J. Pruyne, S. Rosen, S. Tuecke, and I. Foster, “Globus platform services for data publication,” in Proceedings of the Practice and Experience on Advanced Research Computing: Seamless Creativity, ser. PEARC ’18. New York, NY, USA: Association for Computing Machinery, 2018.
-
[15]
[Online]. Available: https://doi.org/10.1145/3219104.3219127
-
[16]
Elasticsearch,
Elasticsearch, “Elasticsearch,” 2010, retrieved Mar 25, 2026 from https://www.elastic.co/elasticsearch
2010
-
[17]
Opensearch,
OpenSearch, “Opensearch,” 2021, retrieved Mar 25, 2026 from https://opensearch.org/
2021
-
[18]
Octopus: Experiences with a hybrid event-driven architecture for distributed scientific computing,
H. Pan, R. Chard, S. Zhou, A. Kamatar, R. Vescovi, V. Hayot-Sasson, A. Bauer, M. Gonthier, K. Chard, and I. Foster, “Octopus: Experiences with a hybrid event-driven architecture for distributed scientific computing,” in SC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2024, pp. 496–507. [O...
-
[19]
confluent-kafka-python,
Confluent Inc., “confluent-kafka-python,” 2016, retrieved Mar 25, 2026 from https://github.com/confluentinc/confluent-kafka-python
2016
-
[20]
orjson,
orjson Contributors, “orjson,” 2018, retrieved Mar 25, 2026 from https://github.com/ijl/orjson
2018
-
[21]
DDSketch: a fast and fully-mergeable quantile sketch with relative-error guarantees,
C. Masson, J. E. Rim, and H. K. Lee, “DDSketch: a fast and fully-mergeable quantile sketch with relative-error guarantees,” Proc. VLDB Endow., vol. 12, no. 12, pp. 2195–2205, Aug. 2019. [Online]. Available: https://doi.org/10.14778/3352063.3352135
-
[22]
Optimal quantile approximation in streams,
Z. Karnin, K. Lang, and E. Liberty, “Optimal quantile approximation in streams,” in 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS), 2016, pp. 71–78. [Online]. Available: https://arxiv.org/abs/1603.05346
-
[23]
Relative error streaming quantiles,
G. Cormode, Z. Karnin, E. Liberty, J. Thaler, and P. Veselý, “Relative error streaming quantiles,” J. ACM, vol. 70, no. 5, Oct. 2023. [Online]. Available: https://doi.org/10.1145/3617891
-
[24]
The t-digest: Efficient estimates of distributions,
T. Dunning, “The t-digest: Efficient estimates of distributions,” Software Impacts, vol. 7, p. 100049, 2021. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S2665963820300403
2021
-
[25]
sketches-py,
Datadog, “sketches-py,” 2020, retrieved Mar 25, 2026 from https://github.com/DataDog/sketches-py
2020
-
[26]
datasketches-python,
Apache Software Foundation, “datasketches-python,” 2024, retrieved Mar 25, 2026 from https://github.com/apache/datasketches-python
2024
-
[27]
Filebench: A flexible framework for file system benchmarking,
V. Tarasov, “Filebench: A flexible framework for file system benchmarking,” ;login: The USENIX Magazine, vol. 41, no. 1, p. 6,
-
[28]
[Online]. Available: https://www.usenix.org/publications/login/spring2016/tarasov
-
[29]
Spyglass: fast, scalable metadata search for large-scale storage systems,
A. W. Leung, M. Shao, T. Bisson, S. Pasupathy, and E. L. Miller, “Spyglass: fast, scalable metadata search for large-scale storage systems,” in Proceedings of the 7th conference on File and storage technologies, ser. FAST ’09. USA: USENIX Association, Feb. 2009, pp. 153–
2009
-
[30]
[Online]. Available: https://www.usenix.org/conference/fast-09/spyglass-fast-scalable-metadata-search-large-scale-storage-systems
-
[31]
SmartStore: a new metadata organization paradigm with semantic-awareness for next-generation file systems,
Y. Hua, H. Jiang, Y. Zhu, D. Feng, and L. Tian, “SmartStore: a new metadata organization paradigm with semantic-awareness for next-generation file systems,” in Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, ser. SC ’09. New York, NY, USA: Association for Computing Machinery, Nov. 2009, pp. 1–12. [Online]. A...
-
[32]
Security Aware Partitioning for efficient file system search,
A. Parker-Wood, C. Strong, E. L. Miller, and D. D. E. Long, “Security Aware Partitioning for efficient file system search,” in 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST). Incline Village, NV, USA: IEEE, May 2010, pp. 1–14. [Online]. Available: http://ieeexplore.ieee.org/document/5496990/
-
[33]
Scale and concurrency of GIGA+: file system directories with millions of files,
S. Patil and G. Gibson, “Scale and concurrency of GIGA+: file system directories with millions of files,” in Proceedings of the 9th USENIX conference on File and storage technologies, ser. FAST ’11. USA: USENIX Association, Feb. 2011, pp. 177–190
2011
-
[34]
TABLEFS: Enhancing metadata efficiency in the local file system,
K. Ren and G. Gibson, “TABLEFS: Enhancing metadata efficiency in the local file system,” in 2013 USENIX Annual Technical Conference (USENIX ATC 13). San Jose, CA: USENIX Association, Jun. 2013, pp. 145–156. [Online]. Available: https://www.usenix.org/conference/atc13/technical-sessions/presentation/ren
2013
-
[35]
Indexfs: Scaling file system metadata performance with stateless caching and bulk insertion,
K. Ren, Q. Zheng, S. Patil, and G. Gibson, “Indexfs: Scaling file system metadata performance with stateless caching and bulk insertion,” in SC ’14: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE, 2014, pp. 237–248. [Online]. Available: https://doi.org/10.1109/SC.2014.25
2014
-
[36]
Deltafs: Exascale file systems scale better without dedicated servers,
Q. Zheng, K. Ren, G. Gibson, B. W. Settlemyer, and G. Grider, “Deltafs: Exascale file systems scale better without dedicated servers,” in Proceedings of the 10th Parallel Data Storage Workshop, 2015, pp. 1–6. [Online]. Available: https://doi.org/10.1145/2834976.2834977
-
[37]
InfiniFS: An efficient metadata service for large-scale distributed filesystems,
W. Lv, Y. Lu, Y. Zhang, P. Duan, and J. Shu, “InfiniFS: An efficient metadata service for large-scale distributed filesystems,” in 20th USENIX Conference on File and Storage Technologies (FAST 22). Santa Clara, CA: USENIX Association, Feb. 2022, pp. 313–328. [Online]. Available: https://www.usenix.org/conference/fast22/presentation/lv
2022
-
[38]
LazyBase: trading freshness for performance in a scalable database,
J. Cipar, G. Ganger, K. Keeton, C. B. Morrey, C. A. Soules, and A. Veitch, “LazyBase: trading freshness for performance in a scalable database,” in Proceedings of the 7th ACM european conference on Computer Systems, ser. EuroSys ’12. New York, NY, USA: Association for Computing Machinery, Apr. 2012, pp. 169–182. [Online]. Available: https://dl.acm.org/doi...
-
[39]
Borgfs: File system metadata index search,
SNIA, “Borgfs: File system metadata index search,” 2014, retrieved Mar 25, 2026 from https://www.snia.org/educational-library/borgfs-file-system-metadata-index-search-2014
2014
-
[40]
Taking back control of HPC file systems with Robinhood Policy Engine,
T. Leibovici, “Taking back control of HPC file systems with Robinhood Policy Engine,” May 2015, arXiv:1505.01448 [cs]. [Online]. Available: http://arxiv.org/abs/1505.01448
-
[41]
QuickSilver: A Distributed Policy Driven Data Management System,
C. Brumgard, A. George, R. Mohr, K. Maheshwari, J. Simmons, and S. Oral, “QuickSilver: A Distributed Policy Driven Data Management System,” in Workshop: Women in HPC: Diversifying the HPC Community and Engaging Male Allies. Dallas, TX: Association for Computing Machinery, 2022. [Online]. Available: https://sc22.supercomputing.org/proceedings/workshops/w...
2022
-
[42]
Polimor: A policy engine made-to-order for automated and scalable data management in lustre,
A. George, C. Brumgard, R. Mohr, K. Maheshwari, J. Simmons, S. Oral, and J. Hanley, “Polimor: A policy engine made-to-order for automated and scalable data management in lustre,” in Proceedings of the SC ’23 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis, ser. SC-W ’23. New York, NY, USA: Associatio...
-
[43]
Cray clusterstor data services user guide,
Hewlett Packard Enterprise, “Cray clusterstor data services user guide,” 2021, retrieved Mar 25, 2026 from https://support.hpe.com/hpesc/public/docDisplay?docId=a00114855en_us&docLocale=en_US
2021
-
[44]
IBM Spectrum Scale information lifecycle management policies: Practical guide,
IBM, “IBM Spectrum Scale information lifecycle management policies: Practical guide,” 2021, retrieved Mar 25, 2026 from https://www.ibm.com/support/pages/ibm-spectrum-scale-information-lifecycle-management-policies-practical-guide
2021
-
[45]
Kernel korner: intro to inotify,
R. Love, “Kernel korner: intro to inotify,” Linux J., vol. 2005, no. 139, p. 8, Nov. 2005. [Online]. Available: https://www.linuxjournal.com/article/8478
2005
-
[46]
Kqueue - A Generic and Scalable Event Notification Facility,
J. Lemon, “Kqueue - A Generic and Scalable Event Notification Facility,” in Proceedings of the FREENIX Track: 2001 USENIX Annual Technical Conference. USA: USENIX Association, Jun. 2001, pp. 141–153. [Online]. Available: https://people.freebsd.org/~jlemon/papers/kqueue.pdf
2001
-
[47]
File system events,
Apple, “File system events,” 2012, retrieved Mar 25, 2026 from https://developer.apple.com/documentation/coreservices/file_system_events
2012