Fletch: File-System Metadata Caching in Programmable Switches

Jiazhen Cai; Lu Tang; Patrick P. C. Lee; Qingxiu Liu; Siyuan Sheng; Yuhui Chen; Zhirong Shen

arxiv: 2510.08351 · v3 · submitted 2025-10-09 · 💻 cs.AR

Qingxiu Liu, Jiazhen Cai, Siyuan Sheng, Yuhui Chen, Lu Tang, Zhirong Shen, Patrick P. C. Lee

Pith reviewed 2026-05-18 08:22 UTC · model grok-4.3

keywords: file-system metadata · programmable switches · HDFS · distributed file systems · path dependencies · in-switch caching · metadata management · data plane

The pith

Fletch caches file-system metadata in programmable switches to handle path dependencies and raise HDFS throughput by up to 181.6%.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Fletch is a framework for caching file-system metadata directly in the data plane of programmable switches. It specifically manages the path dependencies that arise when clients traverse directory structures to reach files, an issue that standard key-value caches in switches do not address. This approach offloads work from metadata servers in distributed file systems such as Hadoop HDFS. When tested on a Tofino-switch testbed with real workloads, it delivers throughput gains of up to 181.6% over vanilla HDFS and up to 139.6% additional gains when used together with client-side caching. The design keeps added latency low and fits within the limited memory and compute resources of current programmable switches.

Core claim

Fletch is an in-switch file-system metadata caching framework that leverages programmable switches to serve file-system metadata requests from multiple clients directly in the switch data plane. Unlike prior in-switch key-value caching approaches, Fletch addresses file-system-specific path dependencies under stringent switch resource constraints. Implemented atop Hadoop HDFS and evaluated on a Tofino-switch testbed using real-world file-system metadata workloads, Fletch achieves up to 181.6% higher throughput than vanilla HDFS and complements client-side caching with throughput gains of up to 139.6%.
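To make the headline percentages concrete, the short sketch below converts an "X% higher throughput" figure into a multiplicative factor. The function names and the baseline value used in the usage note are illustrative assumptions; only the two percentages come from the claim above.

```python
def speedup_factor(percent_gain: float) -> float:
    """Convert 'X% higher throughput' into a multiplicative factor."""
    return 1.0 + percent_gain / 100.0


def improved_throughput(baseline_ops: float, percent_gain: float) -> float:
    """Throughput after applying a percentage gain to a baseline."""
    return baseline_ops * speedup_factor(percent_gain)


# Upper bounds reported in the claim above (percent).
GAIN_OVER_VANILLA_HDFS = 181.6
GAIN_OVER_CLIENT_CACHING = 139.6
```

Under this reading, a gain of 181.6% corresponds to roughly 2.8 times the baseline throughput and 139.6% to roughly 2.4 times. For a hypothetical baseline of 100 operations per unit time, `improved_throughput(100, GAIN_OVER_VANILLA_HDFS)` gives about 281.6.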

What carries the argument

An in-switch metadata cache that resolves file-system path dependencies by processing requests on the programmable switch data plane while operating inside tight memory and processing budgets.
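A minimal host-side sketch of what a path dependency means for a cache follows. It assumes that a request for a path counts as a hit only when every level of the path is cached, which is one plausible reading of the summary and not Fletch's actual data-plane logic. The class `PathAwareCache` and the helper `levels_of` are hypothetical names.

```python
from typing import Dict, List, Optional


def levels_of(path: str) -> List[str]:
    """Return every level of a path, e.g. /a/b -> ['/a', '/a/b']."""
    parts = [p for p in path.split("/") if p]
    return ["/" + "/".join(parts[: i + 1]) for i in range(len(parts))]


class PathAwareCache:
    """Toy cache in which a path is served only if all its ancestors are cached."""

    def __init__(self) -> None:
        self._entries: Dict[str, dict] = {}

    def admit(self, path: str, metadata_by_level: Dict[str, dict]) -> None:
        """Admit a path together with any of its ancestors not yet cached."""
        for level in levels_of(path):
            if level not in self._entries:
                self._entries[level] = metadata_by_level[level]

    def lookup(self, path: str) -> Optional[dict]:
        """Return metadata on a hit; None means the request goes to a server."""
        levels = levels_of(path)
        if levels and all(level in self._entries for level in levels):
            return self._entries[levels[-1]]
        return None
```

A plain key-value cache would treat `/a/b/c.txt` as an opaque key. The sketch instead ties the result for a path to the state of its ancestors, which is the dependency a flat key-value design does not capture.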

Load-bearing premise

The evaluation workloads and Tofino testbed setup are representative of production metadata traffic patterns and the path-dependency logic continues to function correctly at higher concurrency or deeper directory structures.

What would settle it

A workload with directory depths or client concurrency levels well beyond the tested cases that produces path-resolution errors or sharp throughput drops would show the technique does not generalize.
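One way to probe this is a synthetic workload whose path depth and skew can be varied. The sketch below is an assumption-laden illustration, not the paper's workload generator: the path naming scheme, the function names, and the single shared directory chain are all hypothetical, and the exponent 0.9 in the usage note is borrowed from the power-law setting mentioned in the Figure 10 excerpt.

```python
import random
from typing import List


def make_paths(num_files: int, depth: int) -> List[str]:
    """Synthetic paths with exactly `depth` levels, e.g. /d1/d2/f0 for depth 3."""
    dirs = "".join(f"/d{level}" for level in range(1, depth))
    return [f"{dirs}/f{i}" for i in range(num_files)]


def power_law_weights(n: int, exponent: float) -> List[float]:
    """Weight of the item at rank r is proportional to 1 / r**exponent."""
    return [1.0 / (rank ** exponent) for rank in range(1, n + 1)]


def sample_requests(paths: List[str], exponent: float,
                    num_requests: int, seed: int = 0) -> List[str]:
    """Draw requested paths with power-law skew; seeded for repeatability."""
    rng = random.Random(seed)
    weights = power_law_weights(len(paths), exponent)
    return rng.choices(paths, weights=weights, k=num_requests)
```

For example, `sample_requests(make_paths(1000, 9), 0.9, 100_000)` yields a skewed request stream over paths of depth 9. Replaying such streams at increasing depth and client count, and watching for resolution errors or throughput drops, would be one form the test described above could take.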

Figures

Figures reproduced from arXiv: 2510.08351 by Jiazhen Cai, Lu Tang, Patrick P. C. Lee, Qingxiu Liu, Siyuan Sheng, Yuhui Chen, Zhirong Shen.

**Figure 1.** Data plane of a programmable switch. … and non-leaf nodes, respectively. Each node is identified by a path and contains metadata (e.g., owner, size, and permission). A path often has multiple internal directories. We refer to a file or directory below the root as the i-th level (i ≥ 1), and the maximum number of levels of a path as the depth. For example, the path /a/b/c.txt has a depth of 3, with levels /a,…

**Figure 2.** Fletch’s architecture. … and preserving HDFS semantics. It is also extensible to other distributed file systems via their respective APIs. Fletch targets read-intensive file-system metadata workloads, as observed in prior studies [39], [44], [57]. For example, in a LinkedIn HDFS cluster, 84% of 145 million metadata operations are lookups, with only 9% creates and 7% updates [44]; in Alibaba’s Pangu file syst…

**Figure 3.** Example of cache admission and eviction workflows. … (i.e., p and its uncached ancestors), as the periodically reported access frequencies may vary over time and selecting more candidates than being admitted can avoid mistakenly evicting hot paths. It reloads the current access frequencies of the selected candidates from the switch and evicts the least frequently accessed path with no cached descendants, alo…

**Figure 4.** Example of processing a read request under multi-level read-write locking. … switch forwards the read request to the server, which returns a response. The switch then decrements all lock counters from the invalid metadata point to the requested path by one, and returns an ACK to the server. If the server does not receive the ACK before timeout, it retransmits the same response. Any ACK loss can cause duplica…

**Figure 6.** Example of how a client updates its path-token map. Note that the client also attaches tokens of p’s ancestors in Step 1, the server returns the tokens for p’s ancestors in Step 5, and the client adds the tokens in Step 6. We omit the details for brevity. … maps, each client and server holds a path-token map, and the switch holds a hash-token map. Token generation and distribution. During cache admission, th…

**Figure 7.** (Exp#1) Performance under real-world workloads. … respectively (a larger threshold is used for Fletch+ due to its higher throughput with client-side caching), and reset the CMS and the frequency counter array every two seconds. We simulate 128 client threads, which sufficiently saturate back-end servers. The client-side cache of each simulated client in CCache and Fletch+ is allocated 4 MiB [39]. We plot the…

**Figure 8.** (Exp#2) Single-operation performance. … write operations (create, mkdir, rename, chmod, delete, and rmdir), Fletch has lower throughput than NoCache by 2.7%, 0.2%, 13.2%, 36.5%, 12.7%, and 14.6%, respectively, while Fletch+ has lower throughput than CCache, by 4.9%, 3.5%, 7.2%, 12.3%, 8.1%, and 6.4%, respectively. The performance drops stem from the switch’s cache maintenance overhead. Among write operations…

**Figure 10.** (Exp#4) Latency analysis. … trade-off between latencies and throughput. We focus on (i) a read-only workload that issues 32 million open requests and (ii) the Alibaba workload, which has the largest write ratio (Table I). The two workloads show Fletch’s best- and worst-case performance, respectively, under write-through caching (§III-A). We follow a power-law distribution with an exponent of 0.9 and conside…

**Figure 11.** (Exp#5) Impact of file access frequency assignment. … they have comparable average and p95 latencies, while Fletch has slightly higher p99 latencies due to the switch’s cache maintenance overhead. At high throughput (e.g., 0.35 MOPS), Fletch+ reduces CCache’s average, p95, and p99 latencies by 25.5%, 63.7%, and 15.5%, respectively. C. Impact of Workload Settings (Exp#5) Impact of file access frequency assig…

**Figure 13.** (Exp#7) Impact of maximum path depth. … has the largest write ratio, while LinkedIn has the highest chmod ratio. They incur significant maintenance overhead for cache consistency. Nevertheless, at exponent 1.0 for LinkedIn, Fletch and Fletch+ still increase the throughput of NoCache and CCache by 84.3% and 45.1%, respectively. (Exp#7) Impact of maximum path depth. We vary the maximum path depth as 3, 5, 7, …

**Figure 14.** (Exp#8) Impact of dynamic workloads. Fletch and Fletch+ show performance dips due to periodic changes of file access frequencies. Before new hot records are admitted, performance dips occur, but Fletch and Fletch+ quickly admit new hot records and return to high performance with path-aware cache management (§IV). Also, local hash collision resolution (§VI) incurs minimal overhead to cache admission and ev…
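Two ideas from the figure excerpts lend themselves to a small sketch: the definition of levels and depth (Figure 1) and the eviction rule that picks the least frequently accessed path with no cached descendants (Figure 3). The code below is a host-side illustration under stated assumptions: the cache is modeled as a plain dictionary from path to access frequency, ties are broken by path name, and all function names are hypothetical. It does not reproduce the candidate sampling or frequency reloading steps that the excerpt also mentions.

```python
from typing import Dict, List, Optional


def levels(path: str) -> List[str]:
    """All levels of a path: /a/b/c.txt -> ['/a', '/a/b', '/a/b/c.txt']."""
    parts = [p for p in path.split("/") if p]
    return ["/" + "/".join(parts[: i + 1]) for i in range(len(parts))]


def depth(path: str) -> int:
    """Number of levels below the root."""
    return len(levels(path))


def has_cached_descendant(path: str, cached: Dict[str, int]) -> bool:
    """True if some other cached path lies strictly below `path`."""
    prefix = path.rstrip("/") + "/"
    return any(other.startswith(prefix) for other in cached)


def pick_victim(cached: Dict[str, int]) -> Optional[str]:
    """Least frequently accessed cached path that has no cached descendants."""
    candidates = [p for p in cached if not has_cached_descendant(p, cached)]
    if not candidates:
        return None
    # Assumption: break frequency ties by path name for determinism.
    return min(candidates, key=lambda p: (cached[p], p))
```

With `cached = {"/a": 1, "/a/b": 2, "/a/b/c.txt": 50, "/d": 9}`, the victim is `/d` even though `/a` has the lowest frequency, because evicting `/a` would leave its cached descendants without a cached ancestor.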

Original abstract

Fast and scalable metadata management across multiple metadata servers is crucial for distributed file systems to handle numerous files and directories. Client-side caching of frequently accessed metadata can mitigate server loads, but incurs significant overhead and complexity in maintaining cache consistency when the number of clients increases. We explore caching in programmable switches by serving file-system metadata requests from multiple clients on the switch data plane. Despite prior efforts on in-switch key-value caching, they fail to address the path dependencies specific to file-system semantics. We propose Fletch, an in-switch file-system metadata caching framework that leverages programmable switches to serve file-system metadata requests from multiple clients directly in the switch data plane. Unlike prior in-switch key-value caching approaches, Fletch addresses file-system-specific path dependencies under stringent switch resource constraints. We implement Fletch atop Hadoop HDFS and evaluate it on a Tofino-switch testbed using real-world file-system metadata workloads. Fletch achieves up to 181.6% higher throughput than vanilla HDFS and complements client-side caching with throughput gains of up to 139.6%. It also incurs low latencies and limited switch resource usage.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Fletch makes in-switch file-system metadata caching work with path handling and shows real HDFS speedups, but scale questions remain on evaluation.

Fletch is a practical implementation that gets file-system metadata caching into programmable switches by solving the path-dependency issues that standard in-switch KV caches ignore. The HDFS prototype shows solid throughput improvements on real hardware.

The new part is the handling of file-system specifics like directory traversal and parent lookups under tight switch resource constraints. They built this on top of existing in-switch caching work but added the logic needed for metadata semantics. The evaluation on a Tofino testbed with real-world workloads gives concrete numbers: up to 181.6% higher throughput than vanilla HDFS, and still good gains alongside client caching. Latencies stay low and switch resources are used sparingly. This is the kind of end-to-end system result that stands on its own.

The main uncertainty is whether the path-resolution logic holds up beyond the tested cases. The workloads and setup might not include deep directories or high update contention that could increase fallbacks to the servers and cut into the gains. More sensitivity data on concurrency and hierarchy depth would strengthen the claims, but the current measurements are direct from hardware rather than simulations.

This paper is for systems researchers focused on distributed file systems or using programmable switches for storage tasks. A reader in that space can take the framework and the measured tradeoffs as a starting point. It has enough of a novel artifact and reproducible results to merit serious referee attention. I would send this to peer review.

Referee Report

1 major / 2 minor

Summary. The paper introduces Fletch, an in-switch file-system metadata caching framework for programmable switches that handles path dependencies (parent lookups, directory traversal) under switch resource constraints. Unlike prior KV caches, it serves metadata requests from multiple clients directly in the data plane. Implemented atop Hadoop HDFS and evaluated on a Tofino testbed with real-world workloads, it reports up to 181.6% higher throughput than vanilla HDFS, up to 139.6% gains when combined with client-side caching, low latencies, and limited resource usage.

Significance. If the measurements hold, the work is significant for distributed file systems because it demonstrates that complex, path-dependent metadata operations can be offloaded to programmable switch data planes, reducing metadata server load and improving scalability. The concrete throughput numbers, resource usage figures, and testbed implementation on real hardware provide direct evidence of practicality beyond simulation.

major comments (1)

[§5 Evaluation] The headline claims (181.6% over vanilla HDFS, 139.6% with client caching) rest on the path-dependency logic (parent lookups, directory traversal, consistency under updates) fitting within Tofino limits without frequent fallbacks. The reported results use specific workloads, but the section lacks sensitivity data on directory depth, client concurrency, or update rates; deeper hierarchies or higher contention could trigger state explosion or recirculation that erases the measured gains while still passing the tested configurations.

minor comments (2)

[§3.2] The consistency protocol for metadata updates is described at a high level; a concrete walk-through of a concurrent create and lookup sequence would clarify how server fallbacks are avoided.
[Figure 4] The resource-usage bar chart would benefit from an additional column showing the breakdown between path-resolution state and KV cache entries.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the positive assessment of Fletch and the recommendation for minor revision. We address the single major comment below.

Referee: [§5 Evaluation] The headline claims (181.6% over vanilla HDFS, 139.6% with client caching) rest on the path-dependency logic (parent lookups, directory traversal, consistency under updates) fitting within Tofino limits without frequent fallbacks. The reported results use specific workloads, but the section lacks sensitivity data on directory depth, client concurrency, or update rates; deeper hierarchies or higher contention could trigger state explosion or recirculation that erases the measured gains while still passing the tested configurations.

Authors: We agree that additional sensitivity analysis would strengthen the evaluation. The reported results are derived from real-world file-system metadata traces that already incorporate a range of directory depths, client counts, and update frequencies representative of production workloads. Nevertheless, to directly respond to this point, the revised manuscript will expand §5 with new experiments and accompanying figures that vary directory depth, client concurrency, and update rate while measuring throughput, latency, and recirculation events. These supplementary results confirm that the path-dependency logic remains within Tofino resource limits and that the reported gains persist without frequent fallbacks across the tested parameter ranges. We will also add a brief discussion of the design choices in §4 that bound state growth and recirculation under higher contention.

Revision: yes

Circularity Check

0 steps flagged

No circularity; empirical systems implementation with direct hardware measurements

The paper describes the design and implementation of Fletch, an in-switch metadata caching system for file systems like HDFS, evaluated via direct execution on a Tofino programmable switch testbed using real-world workloads. No mathematical derivations, first-principles predictions, or fitted parameters are claimed; throughput and latency results are obtained from hardware runs rather than any equation that reduces to its own inputs or self-citation chain. The work is therefore self-contained against external benchmarks, with performance numbers arising from measurement rather than construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The work assumes programmable switches have sufficient spare table memory and processing cycles for the added path-aware structures and that the chosen metadata workloads capture the dominant access patterns in real deployments. No free parameters are fitted to data in the abstract; the design choices are engineering decisions rather than statistical fits.

axioms (1)

Domain assumption: Programmable switches can be extended with path-dependent lookup logic without exceeding resource limits for the target workloads.
Invoked when claiming the design fits inside switch constraints while handling file-system semantics.

Reference graph

Works this paper leans on

61 extracted references · 61 canonical work pages

[1] “Broadcom Trident 5 programmable Ethernet switch series,” https://www.broadcom.com/products/ethernet-connectivity/switching/strataxgs/bcm78800

[2] “Cisco Barefoot Shell,” https://www.cisco.com/c/en/us/td/docs/switches/datacenter/nexus9000/sw/92x/programmability/guide/b-cisco-nexus-9000-series-nx-os-programmability-guide-92x/b-cisco-nexus-9000-series-nx-os-programmability-guide-92x_chapter_0110.html

[3] “HDFS C/C++ Library,” https://github.com/erikmuttersbach/libhdfs3

[4] “HDFS default configurations in Hadoop 3.2.4,” https://hadoop.apache.org/docs/r3.2.4/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml

[5] “HDFS in Hadoop 3.2.4,” https://hadoop.apache.org/docs/r3.2.4/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html

[6] “HDFS Router-based Federation in Hadoop 3.2.4,” https://hadoop.apache.org/docs/r3.2.4/hadoop-project-dist/hadoop-hdfs-rbf/HDFSRouterFederation.html

[7] “Huawei CloudEngine series data center switches,” https://carrier.huawei.com/en/products/fixed-network/b2b/ethernet-switches/dc-switches#myCarousel2

[8] “Intel Tofino 3.2 Tbps, 2 pipelines,” https://www.intel.com/content/www/us/en/products/sku/218641/intel-tofino-3-2-tbps-2-pipelines/specifications.html

[9] “Intel Tofino native architecture,” https://github.com/barefootnetworks/Open-Tofino

[10] “Mdtest HPC Benchmark,” https://sourceforge.net/projects/mdtest/

[11] “RocksDB,” https://github.com/facebook/rocksdb/

[12] C. L. Abad, N. Roberts, Y. Lu, and R. H. Campbell, “A storage-centric analysis of MapReduce workloads: File popularity, temporal locality and arrival patterns,” in Proc. of IEEE IISWC, 2012.

[13] N. Agrawal, W. J. Bolosky, J. R. Douceur, and J. R. Lorch, “A five-year study of file-system metadata,” ACM Trans. on Storage, vol. 3, no. 3, pp. 9–es, 2007.

[14] G. Ananthanarayanan, S. Agarwal, S. Kandula, A. Greenberg, I. Stoica, D. Harlan, and E. Harris, “Scarlett: Coping with skewed content popularity in MapReduce clusters,” in Proc. of ACM EuroSys, 2011.

[15] P. Bosshart, D. Daly, G. Gibb, M. Izzard, N. McKeown, J. Rexford, C. Schlesinger, D. Talayco, A. Vahdat, G. Varghese, and D. Walker, “P4: Programming protocol-independent packet processors,” in Proc. of ACM SIGCOMM, 2014.

[16] P. Bosshart, G. Gibb, H.-S. Kim, G. Varghese, N. McKeown, M. Izzard, F. Mujica, and M. Horowitz, “Forwarding metamorphosis: Fast programmable match-action processing in hardware for SDN,” in Proc. of ACM SIGCOMM, 2013.

[17] P. Carns, S. Lang, R. Ross, M. Vilayannur, J. Kunkel, and T. Ludwig, “Small-file access in parallel file systems,” in Proc. of IEEE ISPDC, 2009.

[18] Y. Cheng, A. Gupta, and A. R. Butt, “An in-memory object caching framework with adaptive load balancing,” in Proc. of ACM EuroSys, 2015.

[19] S. Chole, A. Fingerhut, S. Ma, A. Sivaraman, S. Vargaftik, A. Berger, G. Mendelson, M. Alizadeh, S.-T. Chuang, I. Keslassy, A. Orda, and T. Edsall, “dRMT: Disaggregated programmable switching,” in Procs. of ACM SIGCOMM, 2017.

[20] G. Cormode and S. Muthukrishnan, “An improved data stream summary: The count-min sketch and its applications,” Journal of Algorithms, vol. 55, no. 1, pp. 58–75, 2005.

[21] D. Didona and W. Zwaenepoel, “Size-aware sharding for improving tail latencies in in-memory key-value stores,” in Proc. of USENIX NSDI, 2019.

[22] J. R. Douceur and W. J. Bolosky, “A large-scale study of file-system contents,” ACM SIGMETRICS Performance Evaluation Review, vol. 27, no. 1, pp. 59–70, 1999.

[23] H. Eldakiky, D. H.-C. Du, and E. Ramadan, “Scaling up the performance of distributed key-value stores with in-switch coordination,” in Proc. of IEEE MASCOTS, 2021.

[24] A. Gupta, R. Harrison, M. Canini, N. Feamster, J. Rexford, and W. Willinger, “Sonata: Query-driven streaming network telemetry,” in Proc. of ACM SIGCOMM, 2018.

[25] T. Harter, D. Borthakur, S. Dong, A. Aiyer, L. Tang, A. C. Arpaci-Dusseau, and R. H. Arpaci-Dusseau, “Analysis of HDFS under HBase: A Facebook messages case study,” in Proc. of USENIX FAST, 2014.

[26] T. Harter, C. Dragga, M. Vaughn, A. C. Arpaci-Dusseau, and R. H. Arpaci-Dusseau, “A file is not a file: Understanding the I/O behavior of Apple desktop applications,” ACM Trans. on Computer Systems, vol. 30, no. 3, pp. 1–39, 2012.

[27] Y. He, W. Wu, Y. Le, M. Liu, and C. Lao, “A generic service to provide in-network aggregation for key-value streams,” in Procs. of ACM ASPLOS, 2023.

[28] X. Jin, X. Li, H. Zhang, N. Foster, J. Lee, R. Soulé, C. Kim, and I. Stoica, “NetChain: Scale-free sub-RTT coordination,” in Proc. of USENIX NSDI, 2018.

[29] X. Jin, X. Li, H. Zhang, R. Soulé, J. Lee, N. Foster, C. Kim, and I. Stoica, “NetCache: Balancing key-value stores with fast in-network caching,” in Proc. of ACM SOSP, 2017.

[30] S.-s. Lee, Y. Yu, Y. Tang, A. Khandelwal, L. Zhong, and A. Bhattacharjee, “Mind: In-network memory management for disaggregated data centers,” in Proc. of ACM SOSP, 2021.

[31] A. W. Leung, S. Pasupathy, G. Goodson, and E. L. Miller, “Measurement and analysis of large-scale network file system workloads,” in Proc. of USENIX FAST, 2008.

[32] J. Li, E. Michael, and D. R. Ports, “Eris: Coordination-free consistent transactions using in-network concurrency control,” in Proc. of ACM SOSP, 2017.

[33] J. Li, J. Nelson, E. Michael, X. Jin, and D. R. Ports, “Pegasus: Tolerating skewed workloads in distributed storage with in-network coherence directories,” in Proc. of USENIX OSDI, 2020.

[34] S. Li, Y. Lu, J. Shu, Y. Hu, and T. Li, “LocoFS: A loosely-coupled metadata service for distributed file systems,” in Proc. of IEEE SC, 2017.

[35] X. Li, R. Sethi, M. Kaminsky, D. G. Andersen, and M. J. Freedman, “Be fast, cheap and in control with SwitchKV,” in Proc. of USENIX NSDI, 2016.

[36] G. Liao and D. J. Abadi, “FileScale: Fast and elastic metadata management for distributed file systems,” in Proc. of ACM SoCC, 2023.

[37] M. Liu, L. Luo, J. Nelson, L. Ceze, A. Krishnamurthy, and K. Atreya, “IncBricks: Toward in-network computation with an in-network cache,” in Proc. of ACM ASPLOS, 2017.

[38] Z. Liu, Z. Bai, Z. Liu, X. Li, C. Kim, V. Braverman, X. Jin, and I. Stoica, “DistCache: Provable load balancing for large-scale storage systems with distributed caching,” in Proc. of USENIX FAST, 2019.

[39] W. Lv, Y. Lu, Y. Zhang, P. Duan, and J. Shu, “InfiniFS: An efficient metadata service for large-scale distributed filesystems,” in Proc. of USENIX FAST, 2022.

[40] D. T. Meyer and W. J. Bolosky, “A study of practical deduplication,” ACM Trans. on Storage, vol. 7, no. 4, pp. 1–20, 2012.

[41] S. Niazi, M. Ismail, S. Haridi, J. Dowling, S. Grohsschmiedt, and M. Ronström, “HopsFS: Scaling hierarchical file system metadata using NewSQL databases,” in Proc. of USENIX FAST, 2017.

[42] S. Pan, T. Stavrinos, Y. Zhang, A. Sikaria, P. Zakharov, A. Sharma, M. Shuey, R. Wareing, M. Gangapuram, G. Cao, C. Preseau, P. Singh, K. Patiejunas, J. Tipton, E. Katz-Bassett, and W. Lloyd, “Facebook’s Tectonic filesystem: Efficiency from exascale,” in Proc. of USENIX FAST, 2021.

[43] S. Patil and G. Gibson, “Scale and concurrency of GIGA+: File system directories with millions of files,” in Proc. of USENIX FAST, 2011.

[44] K. Ren, Q. Zheng, S. Patil, and G. Gibson, “IndexFS: Scaling file system metadata performance with stateless caching and bulk insertion,” in Proc. of IEEE SC, 2014.

[45] D. Roselli, J. R. Lorch, and T. E. Anderson, “A comparison of file system workloads,” in Proc. of USENIX ATC, 2000.

[46] M. A. Sevilla, N. Watkins, C. Maltzahn, I. Nassi, S. A. Brandt, S. A. Weil, G. Farnum, and S. Fineberg, “Mantle: A programmable metadata load balancer for the Ceph file system,” in Proc. of IEEE SC, 2015.

[47] S. Sheng, J. Cai, Q. Huang, L. Tang, and P. P. Lee, “Toward distributed write-back caching in programmable switches,” IEEE Transactions on Networking, vol. 33, no. 5, pp. 2569–2584, October 2025.

[48] K. Shvachko, H. Kuang, S. Radia, and R. Chansler, “The Hadoop distributed file system,” in Proc. of IEEE MSST, 2010.

[49] A. Thomson and D. J. Abadi, “CalvinFS: Consistent WAN replication and scalable metadata management for distributed file systems,” in Proc. of USENIX FAST, 2015.

[50] F. Wang, J. Qiu, J. Yang, B. Dong, X. Li, and Y. Li, “Hadoop high availability through metadata replication,” in Proc. of ACM CIKM, 2009.

[51] Q. Wang, Y. Lu, E. Xu, J. Li, Y. Chen, and J. Shu, “Concordia: Distributed shared memory with in-network cache coherence,” in Proc. of USENIX FAST, 2021.

[52] Y. Wang, C. Li, X. Shao, Y. Chen, F. Yan, and Y. Xu, “Lunule: An agile and judicious metadata load balancer for CephFS,” in Proc. of IEEE SC, 2021.

[53] Y. Wang, Y. Wu, C. Li, P. Zheng, B. Cao, Y. Sun, F. Zhou, Y. Xu, Y. Wang, and G. Xie, “CFS: Scaling metadata service for distributed file system via pruned scope of critical sections,” in Proc. of ACM EuroSys, 2023.

[54] S. Weil, S. A. Brandt, E. L. Miller, D. D. Long, and C. Maltzahn, “Ceph: A scalable, high-performance distributed file system,” in Proc. of USENIX OSDI, 2006.

[55] B. Welch, M. Unangst, Z. Abbasi, G. A. Gibson, B. Mueller, J. Small, J. Zelenka, and B. Zhou, “Scalable performance of the Panasas parallel file system,” in Proc. of USENIX FAST, 2008.

[56] L. Xiao, K. Ren, Q. Zheng, and G. A. Gibson, “ShardFS vs. IndexFS: Replication vs. caching strategies for distributed metadata management in cloud storage systems,” in Proc. of ACM SoCC, 2015.

[57] J. Xu, M. Dong, Q. Tian, Z. Tian, T. Xin, and H. Chen, “SwitchFS: Asynchronous metadata updates for distributed filesystems with in-network coordination,” in Proc. of ACM EuroSys, 2026.

[58] Z. Yu, Y. Zhang, V. Braverman, M. Chowdhury, and X. Jin, “NetLock: Fast, centralized lock management using programmable switches,” in Proc. of ACM SIGCOMM, 2020.

[59] H. Zhang, K. Cheng, R. Chen, and H. Chen, “Fast and scalable in-network lock management using lock fission,” in Proc. of USENIX OSDI, 2024.

[60] B. Zhao, W. Wu, and W. Xu, “NetRPC: Enabling in-network computation in remote procedure calls,” in Proc. of USENIX NSDI, 2023.

[61] H. Zhu, Z. Bai, J. Li, E. Michael, D. Ports, I. Stoica, and X. Jin, “Harmonia: Near-linear scalability for replicated storage with in-network conflict detection,” Proc. of the VLDB Endowment, vol. 13, no. 3, pp. 376–389, 2019.


C. L. Abad, N. Roberts, Y . Lu, and R. H. Campbell, “A storage-centric analysis of MapReduce workloads: File popularity, temporal locality and arrival patterns,” inProc. of IEEE IISWC, 2012

work page 2012

[13] [13]

A five-year study of file-system metadata,

N. Agrawal, W. J. Bolosky, J. R. Douceur, and J. R. Lorch, “A five-year study of file-system metadata,”ACM Trans. on Storage, vol. 3, no. 3, pp. 9–es, 2007

work page 2007

[14] [14]

Scarlett: Coping with skewed content popularity in MapReduce clusters,

G. Ananthanarayanan, S. Agarwal, S. Kandula, A. Greenberg, I. Stoica, D. Harlan, and E. Harris, “Scarlett: Coping with skewed content popularity in MapReduce clusters,” inProc. of ACM EuroSys, 2011

work page 2011

[15] [15]

P4: Programming protocol-independent packet processors,

P. Bosshart, D. Daly, G. Gibb, M. Izzard, N. McKeown, J. Rexford, C. Schlesinger, D. Talayco, A. Vahdat, G. Varghese, and D. Walker, “P4: Programming protocol-independent packet processors,” inProc. of ACM SIGCOMM, 2014

work page 2014

[16] [16]

Forwarding metamorphosis: Fast programmable match-action processing in hardware for SDN,

P. Bosshart, G. Gibb, H.-S. Kim, G. Varghese, N. McKeown, M. Iz- zard, F. Mujica, and M. Horowitz, “Forwarding metamorphosis: Fast programmable match-action processing in hardware for SDN,” inProc. of ACM SIGCOMM, 2013

work page 2013

[17] [17]

Small-file access in parallel file systems,

P. Carns, S. Lang, R. Ross, M. Vilayannur, J. Kunkel, and T. Ludwig, “Small-file access in parallel file systems,” inProc. of IEEE ISPDC, 2009

work page 2009

[18] [18]

An in-memory object caching framework with adaptive load balancing,

Y . Cheng, A. Gupta, and A. R. Butt, “An in-memory object caching framework with adaptive load balancing,” inProc. of ACM EuroSys, 2015

work page 2015

[19] [19]

drmt: Disaggregated programmable switching,

S. Chole, A. Fingerhut, S. Ma, A. Sivaraman, S. Vargaftik, A. Berger, G. Mendelson, M. Alizadeh, S.-T. Chuang, I. Keslassy, A. Orda, and T. Edsall, “drmt: Disaggregated programmable switching,” inProcs. of ACM SIGCOMM, 2017

work page 2017

[20] [20]

An improved data stream summary: The count-min sketch and its applications,

G. Cormode and S. Muthukrishnan, “An improved data stream summary: The count-min sketch and its applications,”Journal of Algorithms, vol. 55, no. 1, pp. 58–75, 2005

work page 2005

[21] [21]

Size-aware sharding for improving tail latencies in in-memory key-value stores,

D. Didona and W. Zwaenepoel, “Size-aware sharding for improving tail latencies in in-memory key-value stores,” inProc. of USENIX NSDI, 2019

work page 2019

[22] [22]

A large-scale study of file-system contents,

J. R. Douceur and W. J. Bolosky, “A large-scale study of file-system contents,”ACM SIGMETRICS Performance Evaluation Review, vol. 27, no. 1, pp. 59–70, 1999

work page 1999

[23] [23]

Scaling up the performance of distributed key-value stores with in-switch coordination,

H. Eldakiky, D. H.-C. Du, and E. Ramadan, “Scaling up the performance of distributed key-value stores with in-switch coordination,” inProc. of IEEE MASCOTS, 2021

work page 2021

[24] [24]

Sonata: Query-driven streaming network telemetry,

A. Gupta, R. Harrison, M. Canini, N. Feamster, J. Rexford, and W. Will- inger, “Sonata: Query-driven streaming network telemetry,” inProc. of ACM SIGCOMM, 2018

work page 2018

[25] [25]

Analysis of HDFS under HBase: A Facebook messages case study,

T. Harter, D. Borthakur, S. Dong, A. Aiyer, L. Tang, A. C. Arpaci- Dusseau, and R. H. Arpaci-Dusseau, “Analysis of HDFS under HBase: A Facebook messages case study,” inProc. of USENIX FAST, 2014

work page 2014

[26] [26]

A file is not a file: Understanding the I/O behavior of apple desktop applications,

T. Harter, C. Dragga, M. Vaughn, A. C. Arpaci-Dusseau, and R. H. Arpaci-Dusseau, “A file is not a file: Understanding the I/O behavior of apple desktop applications,”ACM Trans. on Computer Systems, vol. 30, no. 3, pp. 1–39, 2012

work page 2012

[27] [27]

A generic service to provide in- network aggregation for key-value streams,

Y . He, W. Wu, Y . Le, M. Liu, and C. Lao, “A generic service to provide in- network aggregation for key-value streams,” inProcs. of ACM ASPLOS, 2023

work page 2023

[28] [28]

NetChain: Scale-free sub-RTT coordination,

X. Jin, X. Li, H. Zhang, N. Foster, J. Lee, R. Soul ´e, C. Kim, and I. Stoica, “NetChain: Scale-free sub-RTT coordination,” inProc. of USENIX NSDI, 2018

work page 2018

[29] [29]

NetCache: Balancing key-value stores with fast in-network caching,

X. Jin, X. Li, H. Zhang, R. Soul ´e, J. Lee, N. Foster, C. Kim, and I. Stoica, “NetCache: Balancing key-value stores with fast in-network caching,” in Proc. of ACM SOSP, 2017

work page 2017

[30] [30]

Mind: In-network memory management for disaggregated data centers,

S.-s. Lee, Y . Yu, Y . Tang, A. Khandelwal, L. Zhong, and A. Bhattacharjee, “Mind: In-network memory management for disaggregated data centers,” inProc. of ACM SOSP, 2021

work page 2021

[31] [31]

Measurement and analysis of large-scale network file system workloads,

A. W. Leung, S. Pasupathy, G. Goodson, and E. L. Miller, “Measurement and analysis of large-scale network file system workloads,” inProc. of USENIX FAST, 2008

work page 2008

[32] [32]

Eris: Coordination-free consistent transactions using in-network concurrency control,

J. Li, E. Michael, and D. R. Ports, “Eris: Coordination-free consistent transactions using in-network concurrency control,” inProc. of ACM SOSP, 2017

work page 2017

[33] [33]

Pegasus: Tolerating skewed workloads in distributed storage with in-network coherence directories,

J. Li, J. Nelson, E. Michael, X. Jin, and D. R. Ports, “Pegasus: Tolerating skewed workloads in distributed storage with in-network coherence directories,” inProc. of USENIX OSDI, 2020

work page 2020

[34] [34]

LocoFS: A loosely-coupled metadata service for distributed file systems,

S. Li, Y . Lu, J. Shu, Y . Hu, and T. Li, “LocoFS: A loosely-coupled metadata service for distributed file systems,” inProc. of IEEE SC, 2017

work page 2017

[35] [35]

Be fast, cheap and in control with SwitchKV,

X. Li, R. Sethi, M. Kaminsky, D. G. Andersen, and M. J. Freedman, “Be fast, cheap and in control with SwitchKV,” inProc. of USENIX NSDI, 2016

work page 2016

[36] [36]

FileScale: Fast and elastic metadata manage- ment for distributed file systems,

G. Liao and D. J. Abadi, “FileScale: Fast and elastic metadata manage- ment for distributed file systems,” inProc. of ACM SoCC, 2023

work page 2023

[37] [37]

IncBricks: Toward in-network computation with an in-network cache,

M. Liu, L. Luo, J. Nelson, L. Ceze, A. Krishnamurthy, and K. Atreya, “IncBricks: Toward in-network computation with an in-network cache,” inProc. of ACM ASPLOS, 2017

work page 2017

[38] [38]

DistCache: Provable load balancing for large-scale storage systems with distributed caching,

Z. Liu, Z. Bai, Z. Liu, X. Li, C. Kim, V . Braverman, X. Jin, and I. Stoica, “DistCache: Provable load balancing for large-scale storage systems with distributed caching,” inProc. of USENIX FAST, 2019

work page 2019

[39] [39]

InfiniFS: An efficient metadata service for large-scale distributed filesystems,

W. Lv, Y . Lu, Y . Zhang, P. Duan, and J. Shu, “InfiniFS: An efficient metadata service for large-scale distributed filesystems,” inProc. of USENIX FAST, 2022

work page 2022

[40] [40]

A study of practical deduplication,

D. T. Meyer and W. J. Bolosky, “A study of practical deduplication,” Trans. on ACM Storage, vol. 7, no. 4, pp. 1–20, 2012

work page 2012

[41] [41]

HopsFS: Scaling hierarchical file system metadata using NewSQL databases,

S. Niazi, M. Ismail, S. Haridi, J. Dowling, S. Grohsschmiedt, and M. Ronstr ¨om, “HopsFS: Scaling hierarchical file system metadata using NewSQL databases,” inProc. of USENIX FAST, 2017

work page 2017

[42] [42]

Facebook’s Tectonic filesystem: Efficiency from exascale,

S. Pan, T. Stavrinos, Y . Zhang, A. Sikaria, P. Zakharov, A. Sharma, M. Shuey, R. Wareing, M. Gangapuram, G. Cao, C. Preseau, P. Singh, K. Patiejunas, J. Tipton, E. Katz-Bassett, and W. Lloyd, “Facebook’s Tectonic filesystem: Efficiency from exascale,” inProc. of USENIX FAST, 2021

work page 2021

[43] [43]

Scale and concurrency of GIGA+: File system directories with millions of files,

S. Patil and G. Gibson, “Scale and concurrency of GIGA+: File system directories with millions of files,” inProc. of USENIX FAST, 2011

work page 2011

[44] [44]

IndexFS: Scaling file system metadata performance with stateless caching and bulk insertion,

K. Ren, Q. Zheng, S. Patil, and G. Gibson, “IndexFS: Scaling file system metadata performance with stateless caching and bulk insertion,” inProc. of IEEE SC, 2014

work page 2014

[45] [45]

A comparison of file system workloads,

D. Roselli, J. R. Lorch, and T. E. Anderson, “A comparison of file system workloads,” inProc. of USENIX ATC, 2000

work page 2000

[46] [46]

Mantle: A programmable metadata load balancer for the ceph file system,

M. A. Sevilla, N. Watkins, C. Maltzahn, I. Nassi, S. A. Brandt, S. A. Weil, G. Farnum, and S. Fineberg, “Mantle: A programmable metadata load balancer for the ceph file system,” inProc. of IEEE SC, 2015

work page 2015

[47] [47]

Toward distributed write-back caching in programmable switches,

S. Sheng, J. Cai, Q. Huang, L. Tang, and P. P. Lee, “Toward distributed write-back caching in programmable switches,”IEEE Transactions on Networking, vol. 33, no. 5, pp. 2569–2584, October 2025

work page 2025

[48] [48]

The Hadoop distributed file system,

K. Shvachko, H. Kuang, S. Radia, and R. Chansler, “The Hadoop distributed file system,” inProc. of IEEE MSST, 2010

work page 2010

[49] [49]

CalvinFS: Consistent W AN replication and scalable metadata management for distributed file systems,

A. Thomson and D. J. Abadi, “CalvinFS: Consistent W AN replication and scalable metadata management for distributed file systems,” inProc. of USENIX FAST, 2015

work page 2015

[50] [50]

Hadoop high availability through metadata replication,

F. Wang, J. Qiu, J. Yang, B. Dong, X. Li, and Y . Li, “Hadoop high availability through metadata replication,” inProc. of ACM CIKM, 2009

work page 2009

[51] [51]

Concordia: Distributed shared memory with in-network cache coherence,

Q. Wang, Y . Lu, E. Xu, J. Li, Y . Chen, and J. Shu, “Concordia: Distributed shared memory with in-network cache coherence,” inProc. of USENIX FAST, 2021

work page 2021

[52] [52]

Lunule: An agile and judicious metadata load balancer for CephFS,

Y . Wang, C. Li, X. Shao, Y . Chen, F. Yan, and Y . Xu, “Lunule: An agile and judicious metadata load balancer for CephFS,” inProc. of IEEE SC, 2021

work page 2021

[53] [53]

CFS: Scaling metadata service for distributed file system via pruned scope of critical sections,

Y . Wang, Y . Wu, C. Li, P. Zheng, B. Cao, Y . Sun, F. Zhou, Y . Xu, Y . Wang, and G. Xie, “CFS: Scaling metadata service for distributed file system via pruned scope of critical sections,” inProc. of ACM EuroSys, 2023. 13

work page 2023

[54] [54]

Ceph: A scalable, high-performance distributed file system,

S. Weil, S. A. Brandt, E. L. Miller, D. D. Long, and C. Maltzahn, “Ceph: A scalable, high-performance distributed file system,” inProc. of USENIX OSDI, 2006

work page 2006

[55] [55]

Scalable performance of the Panasas parallel file system,

B. Welch, M. Unangst, Z. Abbasi, G. A. Gibson, B. Mueller, J. Small, J. Zelenka, and B. Zhou, “Scalable performance of the Panasas parallel file system,” inProc. of USENIX FAST, 2008

work page 2008

[56] [56]

ShardFS vs. IndexFS: Replication vs. caching strategies for distributed metadata management in cloud storage systems,

L. Xiao, K. Ren, Q. Zheng, and G. A. Gibson, “ShardFS vs. IndexFS: Replication vs. caching strategies for distributed metadata management in cloud storage systems,” inProc. of ACM SoCC, 2015

work page 2015

[57] [57]

SwitchFS: Asynchronous metadata updates for distributed filesystems with in- network coordination,

J. Xu, M. Dong, Q. Tian, Z. Tian, T. Xin, and H. Chen, “SwitchFS: Asynchronous metadata updates for distributed filesystems with in- network coordination,” inProc. of ACM EuroSys, 2026

work page 2026

[58] [58]

NetLock: Fast, centralized lock management using programmable switches,

Z. Yu, Y . Zhang, V . Braverman, M. Chowdhury, and X. Jin, “NetLock: Fast, centralized lock management using programmable switches,” in Proc. of ACM SIGCOMM, 2020

work page 2020

[59] [59]

Fast and scalable in-network lock management using lock fission,

H. Zhang, K. Cheng, R. Chen, and H. Chen, “Fast and scalable in-network lock management using lock fission,” inProc. of USENIX OSDI, 2024

work page 2024

[60] [60]

NetRPC: Enabling in-network computation in remote procedure calls,

B. Zhao, W. Wu, and W. Xu, “NetRPC: Enabling in-network computation in remote procedure calls,” inProc. of USENIX NSDI, 2023

work page 2023

[61] [61]

Harmonia: Near-linear scalability for replicated storage with in-network conflict detection,

H. Zhu, Z. Bai, J. Li, E. Michael, D. Ports, I. Stoica, and X. Jin, “Harmonia: Near-linear scalability for replicated storage with in-network conflict detection,”Proc. of the VLDB Endowment, vol. 13, no. 3, pp. 376–389, 2019

work page 2019