Fletch: File-System Metadata Caching in Programmable Switches
Pith reviewed 2026-05-18 08:22 UTC · model grok-4.3
The pith
Fletch caches file-system metadata in programmable switches to handle path dependencies and raise HDFS throughput by up to 181.6%.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Fletch is an in-switch file-system metadata caching framework that leverages programmable switches to serve file-system metadata requests from multiple clients directly in the switch data plane. Unlike prior in-switch key-value caching approaches, Fletch addresses file-system-specific path dependencies under stringent switch resource constraints. Implemented atop Hadoop HDFS and evaluated on a Tofino-switch testbed using real-world file-system metadata workloads, Fletch achieves up to 181.6% higher throughput than vanilla HDFS and complements client-side caching with throughput gains of up to 139.6%.
What carries the argument
An in-switch metadata cache that resolves file-system path dependencies by processing requests on the programmable switch data plane while operating inside tight memory and processing budgets.
Load-bearing premise
The evaluation workloads and Tofino testbed setup are representative of production metadata traffic patterns and the path-dependency logic continues to function correctly at higher concurrency or deeper directory structures.
What would settle it
A workload with directory depths or client concurrency levels well beyond the tested cases that produces path-resolution errors or sharp throughput drops would show the technique does not generalize.
Figures
read the original abstract
Fast and scalable metadata management across multiple metadata servers is crucial for distributed file systems to handle numerous files and directories. Client-side caching of frequently accessed metadata can mitigate server loads, but incurs significant overhead and complexity in maintaining cache consistency when the number of clients increases. We explore caching in programmable switches by serving file-system metadata requests from multiple clients on the switch data plane. Despite prior efforts on in-switch key-value caching, they fail to address the path dependencies specific to file-system semantics. We propose Fletch, an in-switch file-system metadata caching framework that leverages programmable switches to serve file-system metadata requests from multiple clients directly in the switch data plane. Unlike prior in-switch key-value caching approaches, Fletch addresses file-system-specific path dependencies under stringent switch resource constraints. We implement Fletch atop Hadoop HDFS and evaluate it on a Tofino-switch testbed using real-world file-system metadata workloads. Fletch achieves up to 181.6% higher throughput than vanilla HDFS and complements client-side caching with throughput gains of up to 139.6%. It also incurs low latencies and limited switch resource usage.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Fletch, an in-switch file-system metadata caching framework for programmable switches that handles path dependencies (parent lookups, directory traversal) under switch resource constraints. Unlike prior KV caches, it serves metadata requests from multiple clients directly in the data plane. Implemented atop Hadoop HDFS and evaluated on a Tofino testbed with real-world workloads, it reports up to 181.6% higher throughput than vanilla HDFS, up to 139.6% gains when combined with client-side caching, low latencies, and limited resource usage.
Significance. If the measurements hold, the work is significant for distributed file systems because it demonstrates that complex, path-dependent metadata operations can be offloaded to programmable switch data planes, reducing metadata server load and improving scalability. The concrete throughput numbers, resource usage figures, and testbed implementation on real hardware provide direct evidence of practicality beyond simulation.
major comments (1)
- [§5 Evaluation] §5 Evaluation: The headline claims (181.6% over vanilla HDFS, 139.6% with client caching) rest on the path-dependency logic (parent lookups, directory traversal, consistency under updates) fitting within Tofino limits without frequent fallbacks. The reported results use specific workloads, but the section lacks sensitivity data on directory depth, client concurrency, or update rates; deeper hierarchies or higher contention could trigger state explosion or recirculation that erases the measured gains while still passing the tested configurations.
minor comments (2)
- [§3.2] §3.2: The consistency protocol for metadata updates is described at a high level; a concrete walk-through of a concurrent create and lookup sequence would clarify how server fallbacks are avoided.
- [Figure 4] Figure 4: The resource-usage bar chart would benefit from an additional column showing the breakdown between path-resolution state and KV cache entries.
Simulated Author's Rebuttal
We thank the referee for the positive assessment of Fletch and the recommendation for minor revision. We address the single major comment below.
read point-by-point responses
-
Referee: [§5 Evaluation] §5 Evaluation: The headline claims (181.6% over vanilla HDFS, 139.6% with client caching) rest on the path-dependency logic (parent lookups, directory traversal, consistency under updates) fitting within Tofino limits without frequent fallbacks. The reported results use specific workloads, but the section lacks sensitivity data on directory depth, client concurrency, or update rates; deeper hierarchies or higher contention could trigger state explosion or recirculation that erases the measured gains while still passing the tested configurations.
Authors: We agree that additional sensitivity analysis would strengthen the evaluation. The reported results are derived from real-world file-system metadata traces that already incorporate a range of directory depths, client counts, and update frequencies representative of production workloads. Nevertheless, to directly respond to this point, the revised manuscript will expand §5 with new experiments and accompanying figures that vary directory depth, client concurrency, and update rate while measuring throughput, latency, and recirculation events. These supplementary results confirm that the path-dependency logic remains within Tofino resource limits and that the reported gains persist without frequent fallbacks across the tested parameter ranges. We will also add a brief discussion of the design choices in §4 that bound state growth and recirculation under higher contention. revision: yes
Circularity Check
No circularity; empirical systems implementation with direct hardware measurements
full rationale
The paper describes the design and implementation of Fletch, an in-switch metadata caching system for file systems like HDFS, evaluated via direct execution on a Tofino programmable switch testbed using real-world workloads. No mathematical derivations, first-principles predictions, or fitted parameters are claimed; throughput and latency results are obtained from hardware runs rather than any equation that reduces to its own inputs or self-citation chain. The work is therefore self-contained against external benchmarks, with performance numbers arising from measurement rather than construction.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Programmable switches can be extended with path-dependent lookup logic without exceeding resource limits for the target workloads.
Reference graph
Works this paper leans on
-
[1]
Broadcom Trident 5 programmable Ethernet switch series,
“Broadcom Trident 5 programmable Ethernet switch series,” https://www.broadcom.com/products/ethernet-connectivity/switching/ strataxgs/bcm78800
-
[2]
“Cisco Barefoot Shell,” https://www.cisco.com/c/en/us/td/docs/ switches/datacenter/nexus9000/sw/92x/programmability/guide/ b-cisco-nexus-9000-series-nx-os-programmability-guide-92x/ b-cisco-nexus-9000-series-nx-os-programmability-guide-92x chapter 0110.html
- [3]
-
[4]
HDFS default configurations in Hadoop 3.2.4,
“HDFS default configurations in Hadoop 3.2.4,” https://hadoop.apache. org/docs/r3.2.4/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
-
[5]
“HDFS in Hadoop 3.2.4,” https://hadoop.apache.org/docs/r3.2.4/ hadoop-project-dist/hadoop-hdfs/HdfsDesign.html
-
[6]
HDFS Router-based Federation in Hadoop 3.2.4,
“HDFS Router-based Federation in Hadoop 3.2.4,” https: //hadoop.apache.org/docs/r3.2.4/hadoop-project-dist/hadoop-hdfs-rbf/ HDFSRouterFederation.html
-
[7]
Huawei CloudEngine series data center switches,
“Huawei CloudEngine series data center switches,” https: //carrier.huawei.com/en/products/fixed-network/b2b/ethernet-switches/ dc-switches#myCarousel2
-
[8]
Intel Tofino 3.2 Tbps, 2 pipelines,
“Intel Tofino 3.2 Tbps, 2 pipelines,” https://www.intel.com/content/ www/us/en/products/sku/218641/intel-tofino-3-2-tbps-2-pipelines/ specifications.html
-
[9]
Intel Tofino native architecture,
“Intel Tofino native architecture,” https://github.com/barefootnetworks/ Open-Tofino
- [10]
- [11]
-
[12]
C. L. Abad, N. Roberts, Y . Lu, and R. H. Campbell, “A storage-centric analysis of MapReduce workloads: File popularity, temporal locality and arrival patterns,” inProc. of IEEE IISWC, 2012
work page 2012
-
[13]
A five-year study of file-system metadata,
N. Agrawal, W. J. Bolosky, J. R. Douceur, and J. R. Lorch, “A five-year study of file-system metadata,”ACM Trans. on Storage, vol. 3, no. 3, pp. 9–es, 2007
work page 2007
-
[14]
Scarlett: Coping with skewed content popularity in MapReduce clusters,
G. Ananthanarayanan, S. Agarwal, S. Kandula, A. Greenberg, I. Stoica, D. Harlan, and E. Harris, “Scarlett: Coping with skewed content popularity in MapReduce clusters,” inProc. of ACM EuroSys, 2011
work page 2011
-
[15]
P4: Programming protocol-independent packet processors,
P. Bosshart, D. Daly, G. Gibb, M. Izzard, N. McKeown, J. Rexford, C. Schlesinger, D. Talayco, A. Vahdat, G. Varghese, and D. Walker, “P4: Programming protocol-independent packet processors,” inProc. of ACM SIGCOMM, 2014
work page 2014
-
[16]
Forwarding metamorphosis: Fast programmable match-action processing in hardware for SDN,
P. Bosshart, G. Gibb, H.-S. Kim, G. Varghese, N. McKeown, M. Iz- zard, F. Mujica, and M. Horowitz, “Forwarding metamorphosis: Fast programmable match-action processing in hardware for SDN,” inProc. of ACM SIGCOMM, 2013
work page 2013
-
[17]
Small-file access in parallel file systems,
P. Carns, S. Lang, R. Ross, M. Vilayannur, J. Kunkel, and T. Ludwig, “Small-file access in parallel file systems,” inProc. of IEEE ISPDC, 2009
work page 2009
-
[18]
An in-memory object caching framework with adaptive load balancing,
Y . Cheng, A. Gupta, and A. R. Butt, “An in-memory object caching framework with adaptive load balancing,” inProc. of ACM EuroSys, 2015
work page 2015
-
[19]
drmt: Disaggregated programmable switching,
S. Chole, A. Fingerhut, S. Ma, A. Sivaraman, S. Vargaftik, A. Berger, G. Mendelson, M. Alizadeh, S.-T. Chuang, I. Keslassy, A. Orda, and T. Edsall, “drmt: Disaggregated programmable switching,” inProcs. of ACM SIGCOMM, 2017
work page 2017
-
[20]
An improved data stream summary: The count-min sketch and its applications,
G. Cormode and S. Muthukrishnan, “An improved data stream summary: The count-min sketch and its applications,”Journal of Algorithms, vol. 55, no. 1, pp. 58–75, 2005
work page 2005
-
[21]
Size-aware sharding for improving tail latencies in in-memory key-value stores,
D. Didona and W. Zwaenepoel, “Size-aware sharding for improving tail latencies in in-memory key-value stores,” inProc. of USENIX NSDI, 2019
work page 2019
-
[22]
A large-scale study of file-system contents,
J. R. Douceur and W. J. Bolosky, “A large-scale study of file-system contents,”ACM SIGMETRICS Performance Evaluation Review, vol. 27, no. 1, pp. 59–70, 1999
work page 1999
-
[23]
Scaling up the performance of distributed key-value stores with in-switch coordination,
H. Eldakiky, D. H.-C. Du, and E. Ramadan, “Scaling up the performance of distributed key-value stores with in-switch coordination,” inProc. of IEEE MASCOTS, 2021
work page 2021
-
[24]
Sonata: Query-driven streaming network telemetry,
A. Gupta, R. Harrison, M. Canini, N. Feamster, J. Rexford, and W. Will- inger, “Sonata: Query-driven streaming network telemetry,” inProc. of ACM SIGCOMM, 2018
work page 2018
-
[25]
Analysis of HDFS under HBase: A Facebook messages case study,
T. Harter, D. Borthakur, S. Dong, A. Aiyer, L. Tang, A. C. Arpaci- Dusseau, and R. H. Arpaci-Dusseau, “Analysis of HDFS under HBase: A Facebook messages case study,” inProc. of USENIX FAST, 2014
work page 2014
-
[26]
A file is not a file: Understanding the I/O behavior of apple desktop applications,
T. Harter, C. Dragga, M. Vaughn, A. C. Arpaci-Dusseau, and R. H. Arpaci-Dusseau, “A file is not a file: Understanding the I/O behavior of apple desktop applications,”ACM Trans. on Computer Systems, vol. 30, no. 3, pp. 1–39, 2012
work page 2012
-
[27]
A generic service to provide in- network aggregation for key-value streams,
Y . He, W. Wu, Y . Le, M. Liu, and C. Lao, “A generic service to provide in- network aggregation for key-value streams,” inProcs. of ACM ASPLOS, 2023
work page 2023
-
[28]
NetChain: Scale-free sub-RTT coordination,
X. Jin, X. Li, H. Zhang, N. Foster, J. Lee, R. Soul ´e, C. Kim, and I. Stoica, “NetChain: Scale-free sub-RTT coordination,” inProc. of USENIX NSDI, 2018
work page 2018
-
[29]
NetCache: Balancing key-value stores with fast in-network caching,
X. Jin, X. Li, H. Zhang, R. Soul ´e, J. Lee, N. Foster, C. Kim, and I. Stoica, “NetCache: Balancing key-value stores with fast in-network caching,” in Proc. of ACM SOSP, 2017
work page 2017
-
[30]
Mind: In-network memory management for disaggregated data centers,
S.-s. Lee, Y . Yu, Y . Tang, A. Khandelwal, L. Zhong, and A. Bhattacharjee, “Mind: In-network memory management for disaggregated data centers,” inProc. of ACM SOSP, 2021
work page 2021
-
[31]
Measurement and analysis of large-scale network file system workloads,
A. W. Leung, S. Pasupathy, G. Goodson, and E. L. Miller, “Measurement and analysis of large-scale network file system workloads,” inProc. of USENIX FAST, 2008
work page 2008
-
[32]
Eris: Coordination-free consistent transactions using in-network concurrency control,
J. Li, E. Michael, and D. R. Ports, “Eris: Coordination-free consistent transactions using in-network concurrency control,” inProc. of ACM SOSP, 2017
work page 2017
-
[33]
Pegasus: Tolerating skewed workloads in distributed storage with in-network coherence directories,
J. Li, J. Nelson, E. Michael, X. Jin, and D. R. Ports, “Pegasus: Tolerating skewed workloads in distributed storage with in-network coherence directories,” inProc. of USENIX OSDI, 2020
work page 2020
-
[34]
LocoFS: A loosely-coupled metadata service for distributed file systems,
S. Li, Y . Lu, J. Shu, Y . Hu, and T. Li, “LocoFS: A loosely-coupled metadata service for distributed file systems,” inProc. of IEEE SC, 2017
work page 2017
-
[35]
Be fast, cheap and in control with SwitchKV,
X. Li, R. Sethi, M. Kaminsky, D. G. Andersen, and M. J. Freedman, “Be fast, cheap and in control with SwitchKV,” inProc. of USENIX NSDI, 2016
work page 2016
-
[36]
FileScale: Fast and elastic metadata manage- ment for distributed file systems,
G. Liao and D. J. Abadi, “FileScale: Fast and elastic metadata manage- ment for distributed file systems,” inProc. of ACM SoCC, 2023
work page 2023
-
[37]
IncBricks: Toward in-network computation with an in-network cache,
M. Liu, L. Luo, J. Nelson, L. Ceze, A. Krishnamurthy, and K. Atreya, “IncBricks: Toward in-network computation with an in-network cache,” inProc. of ACM ASPLOS, 2017
work page 2017
-
[38]
DistCache: Provable load balancing for large-scale storage systems with distributed caching,
Z. Liu, Z. Bai, Z. Liu, X. Li, C. Kim, V . Braverman, X. Jin, and I. Stoica, “DistCache: Provable load balancing for large-scale storage systems with distributed caching,” inProc. of USENIX FAST, 2019
work page 2019
-
[39]
InfiniFS: An efficient metadata service for large-scale distributed filesystems,
W. Lv, Y . Lu, Y . Zhang, P. Duan, and J. Shu, “InfiniFS: An efficient metadata service for large-scale distributed filesystems,” inProc. of USENIX FAST, 2022
work page 2022
-
[40]
A study of practical deduplication,
D. T. Meyer and W. J. Bolosky, “A study of practical deduplication,” Trans. on ACM Storage, vol. 7, no. 4, pp. 1–20, 2012
work page 2012
-
[41]
HopsFS: Scaling hierarchical file system metadata using NewSQL databases,
S. Niazi, M. Ismail, S. Haridi, J. Dowling, S. Grohsschmiedt, and M. Ronstr ¨om, “HopsFS: Scaling hierarchical file system metadata using NewSQL databases,” inProc. of USENIX FAST, 2017
work page 2017
-
[42]
Facebook’s Tectonic filesystem: Efficiency from exascale,
S. Pan, T. Stavrinos, Y . Zhang, A. Sikaria, P. Zakharov, A. Sharma, M. Shuey, R. Wareing, M. Gangapuram, G. Cao, C. Preseau, P. Singh, K. Patiejunas, J. Tipton, E. Katz-Bassett, and W. Lloyd, “Facebook’s Tectonic filesystem: Efficiency from exascale,” inProc. of USENIX FAST, 2021
work page 2021
-
[43]
Scale and concurrency of GIGA+: File system directories with millions of files,
S. Patil and G. Gibson, “Scale and concurrency of GIGA+: File system directories with millions of files,” inProc. of USENIX FAST, 2011
work page 2011
-
[44]
IndexFS: Scaling file system metadata performance with stateless caching and bulk insertion,
K. Ren, Q. Zheng, S. Patil, and G. Gibson, “IndexFS: Scaling file system metadata performance with stateless caching and bulk insertion,” inProc. of IEEE SC, 2014
work page 2014
-
[45]
A comparison of file system workloads,
D. Roselli, J. R. Lorch, and T. E. Anderson, “A comparison of file system workloads,” inProc. of USENIX ATC, 2000
work page 2000
-
[46]
Mantle: A programmable metadata load balancer for the ceph file system,
M. A. Sevilla, N. Watkins, C. Maltzahn, I. Nassi, S. A. Brandt, S. A. Weil, G. Farnum, and S. Fineberg, “Mantle: A programmable metadata load balancer for the ceph file system,” inProc. of IEEE SC, 2015
work page 2015
-
[47]
Toward distributed write-back caching in programmable switches,
S. Sheng, J. Cai, Q. Huang, L. Tang, and P. P. Lee, “Toward distributed write-back caching in programmable switches,”IEEE Transactions on Networking, vol. 33, no. 5, pp. 2569–2584, October 2025
work page 2025
-
[48]
The Hadoop distributed file system,
K. Shvachko, H. Kuang, S. Radia, and R. Chansler, “The Hadoop distributed file system,” inProc. of IEEE MSST, 2010
work page 2010
-
[49]
CalvinFS: Consistent W AN replication and scalable metadata management for distributed file systems,
A. Thomson and D. J. Abadi, “CalvinFS: Consistent W AN replication and scalable metadata management for distributed file systems,” inProc. of USENIX FAST, 2015
work page 2015
-
[50]
Hadoop high availability through metadata replication,
F. Wang, J. Qiu, J. Yang, B. Dong, X. Li, and Y . Li, “Hadoop high availability through metadata replication,” inProc. of ACM CIKM, 2009
work page 2009
-
[51]
Concordia: Distributed shared memory with in-network cache coherence,
Q. Wang, Y . Lu, E. Xu, J. Li, Y . Chen, and J. Shu, “Concordia: Distributed shared memory with in-network cache coherence,” inProc. of USENIX FAST, 2021
work page 2021
-
[52]
Lunule: An agile and judicious metadata load balancer for CephFS,
Y . Wang, C. Li, X. Shao, Y . Chen, F. Yan, and Y . Xu, “Lunule: An agile and judicious metadata load balancer for CephFS,” inProc. of IEEE SC, 2021
work page 2021
-
[53]
CFS: Scaling metadata service for distributed file system via pruned scope of critical sections,
Y . Wang, Y . Wu, C. Li, P. Zheng, B. Cao, Y . Sun, F. Zhou, Y . Xu, Y . Wang, and G. Xie, “CFS: Scaling metadata service for distributed file system via pruned scope of critical sections,” inProc. of ACM EuroSys, 2023. 13
work page 2023
-
[54]
Ceph: A scalable, high-performance distributed file system,
S. Weil, S. A. Brandt, E. L. Miller, D. D. Long, and C. Maltzahn, “Ceph: A scalable, high-performance distributed file system,” inProc. of USENIX OSDI, 2006
work page 2006
-
[55]
Scalable performance of the Panasas parallel file system,
B. Welch, M. Unangst, Z. Abbasi, G. A. Gibson, B. Mueller, J. Small, J. Zelenka, and B. Zhou, “Scalable performance of the Panasas parallel file system,” inProc. of USENIX FAST, 2008
work page 2008
-
[56]
L. Xiao, K. Ren, Q. Zheng, and G. A. Gibson, “ShardFS vs. IndexFS: Replication vs. caching strategies for distributed metadata management in cloud storage systems,” inProc. of ACM SoCC, 2015
work page 2015
-
[57]
SwitchFS: Asynchronous metadata updates for distributed filesystems with in- network coordination,
J. Xu, M. Dong, Q. Tian, Z. Tian, T. Xin, and H. Chen, “SwitchFS: Asynchronous metadata updates for distributed filesystems with in- network coordination,” inProc. of ACM EuroSys, 2026
work page 2026
-
[58]
NetLock: Fast, centralized lock management using programmable switches,
Z. Yu, Y . Zhang, V . Braverman, M. Chowdhury, and X. Jin, “NetLock: Fast, centralized lock management using programmable switches,” in Proc. of ACM SIGCOMM, 2020
work page 2020
-
[59]
Fast and scalable in-network lock management using lock fission,
H. Zhang, K. Cheng, R. Chen, and H. Chen, “Fast and scalable in-network lock management using lock fission,” inProc. of USENIX OSDI, 2024
work page 2024
-
[60]
NetRPC: Enabling in-network computation in remote procedure calls,
B. Zhao, W. Wu, and W. Xu, “NetRPC: Enabling in-network computation in remote procedure calls,” inProc. of USENIX NSDI, 2023
work page 2023
-
[61]
Harmonia: Near-linear scalability for replicated storage with in-network conflict detection,
H. Zhu, Z. Bai, J. Li, E. Michael, D. Ports, I. Stoica, and X. Jin, “Harmonia: Near-linear scalability for replicated storage with in-network conflict detection,”Proc. of the VLDB Endowment, vol. 13, no. 3, pp. 376–389, 2019
work page 2019
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.