NetCAS: Dynamic Cache and Backend Device Management in Networked Environments
Pith reviewed 2026-05-18 14:42 UTC · model grok-4.3
The pith
NetCAS splits I/O between cache and remote backend using real-time network feedback to improve throughput under variable conditions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
NetCAS chooses the split ratio between cache and backend from real-time network feedback and a precomputed Perf Profile, then enforces that ratio with a low-overhead batched round-robin scheduler. The paper reports up to 174% higher performance than traditional caching and up to 3.5X better performance than converging schemes under fluctuating network conditions.
What carries the argument
Dynamic split-ratio adjustment driven by network feedback and the Perf Profile, enforced by a batched round-robin scheduler that avoids per-request costs.
If this is right
- Storage systems can maintain higher throughput by actively using both cache and backend even when the backend is remote and contended.
- Split ratios should respond to both workload patterns and current networking performance rather than hit rates alone.
- Low-overhead batch scheduling makes frequent ratio changes practical without adding measurable latency.
- Performance improves most when network conditions vary, because the system can shift load away from temporarily slow paths.
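The last point can be illustrated with a toy model, which is a sketch under stated assumptions and not the paper's evaluation. It assumes both devices serve their share of requests in parallel, so aggregate throughput is limited by whichever device finishes its share last. The function names `split_throughput` and `proportional_split` and all throughput numbers are hypothetical.

```python
def split_throughput(rho, cache_tput, back_tput):
    """Aggregate throughput when a fraction rho of requests goes to the cache.

    Toy model (an assumption, not from the paper): both devices work in
    parallel and the stream completes when the slower share finishes.
    """
    if rho <= 0.0:
        return back_tput
    if rho >= 1.0:
        return cache_tput
    return min(cache_tput / rho, back_tput / (1.0 - rho))


def proportional_split(cache_tput, back_tput):
    """Cache share that balances both devices in the toy model."""
    return cache_tput / (cache_tput + back_tput)


CACHE_TPUT = 600.0                     # hypothetical cache throughput, MB/s
BACKEND_TRACE = [400.0, 100.0, 300.0]  # hypothetical backend throughput per epoch

# Ratio tuned once for the first epoch and never updated afterwards.
fixed_rho = proportional_split(CACHE_TPUT, BACKEND_TRACE[0])

results = []
for back in BACKEND_TRACE:
    # Ratio re-tuned every epoch from the currently observed backend throughput.
    adaptive_rho = proportional_split(CACHE_TPUT, back)
    results.append({
        "backend": back,
        "cache_only": split_throughput(1.0, CACHE_TPUT, back),
        "fixed": split_throughput(fixed_rho, CACHE_TPUT, back),
        "adaptive": split_throughput(adaptive_rho, CACHE_TPUT, back),
    })

for row in results:
    print(row)
```

In this made-up trace, the split tuned for the first epoch falls below cache-only throughput once the backend slows to 100 MB/s, while the re-tuned split stays above it. That is the behavior the bullet describes, though real devices will not follow such a simple model.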
Where Pith is reading between the lines
- The same feedback-driven splitting could be tested on storage hierarchies with more than two device tiers.
- Combining the method with lightweight network prediction might reduce dependence on continuous real-time measurements.
- Similar dynamic balancing may help other networked resources such as distributed file systems or object stores facing variable interconnect delays.
Load-bearing premise
Real-time network feedback can be collected at negligible cost and the precomputed performance profile remains accurate for actual runtime workloads without frequent re-profiling.
What would settle it
Deploy NetCAS and a hit-rate baseline on a testbed with controlled network fluctuations, then measure whether throughput gains exceed traditional methods by the claimed margins while tracking profiling and feedback overhead.
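A minimal sketch of how such a comparison might be summarized, assuming repeated throughput runs per configuration. The sample values are made up for illustration and are not results from the paper; `summarize` and `percent_gain` are hypothetical helper names.

```python
import statistics


def summarize(samples):
    """Return (mean, sample standard deviation) for repeated throughput runs."""
    return statistics.mean(samples), statistics.stdev(samples)


def percent_gain(baseline_mean, candidate_mean):
    """Percent improvement of candidate over baseline.

    Note that "174% higher" corresponds to 2.74x the baseline, not 1.74x.
    """
    return (candidate_mean / baseline_mean - 1.0) * 100.0


# Made-up throughput samples (MB/s), purely illustrative.
baseline_runs = [100.0, 98.0, 102.0]
candidate_runs = [248.0, 250.0, 252.0]

base_mean, base_sd = summarize(baseline_runs)
cand_mean, cand_sd = summarize(candidate_runs)
gain = percent_gain(base_mean, cand_mean)

print(f"baseline  {base_mean:.1f} +/- {base_sd:.1f} MB/s")
print(f"candidate {cand_mean:.1f} +/- {cand_sd:.1f} MB/s")
print(f"gain      {gain:.0f}%")
```

Reporting the spread alongside the mean is what lets a reader judge whether a gain exceeds run-to-run noise. Profiling and feedback overhead would need to be measured separately.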
Original abstract
Modern storage systems often combine fast cache with slower backend devices to accelerate I/O. As performance gaps narrow, concurrently accessing both devices, rather than relying solely on cache hits, can improve throughput. However, in data centers, remote backend storage accessed over networks suffers from unpredictable contention, complicating this split. We present NetCAS, a framework that dynamically splits I/O between cache and backend devices based on real-time network feedback and a precomputed Perf Profile. Unlike traditional hit-rate-based policies, NetCAS adapts split ratios to workload configuration and networking performance. NetCAS employs a low-overhead batched round-robin scheduler to enforce splits, avoiding per-request costs. It achieves up to 174% higher performance than traditional caching in remote storage environments and outperforms converging schemes like Orthus by up to 3.5X under fluctuating network conditions.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces NetCAS, a framework for dynamic I/O splitting between fast cache and slower remote backend storage in networked environments. It combines real-time network feedback with a precomputed Perf Profile to select split ratios, enforced by a low-overhead batched round-robin scheduler rather than per-request decisions. The authors claim up to 174% higher performance than traditional hit-rate caching and up to 3.5X improvement over converging schemes such as Orthus under fluctuating network conditions.
Significance. If the evaluation holds, the work addresses a practical gap in remote storage systems where network contention makes static or hit-rate-only policies suboptimal. The batched scheduler and profile-based adaptation offer a concrete mechanism for runtime splitting that could improve throughput in data-center settings without high per-request overhead.
major comments (2)
- [Abstract and §5 (Evaluation)] The headline performance numbers (174% over traditional caching, 3.5X over Orthus) are stated without visible experimental setup details, workload descriptions, baseline configurations, or error bars. These omissions make it impossible to assess whether the measured gains are attributable to the dynamic split or to other factors.
- [§3.2 (Perf Profile construction)] The central claim that split ratios chosen from the precomputed profile plus real-time feedback outperform static policies rests on the assumption that the profile remains accurate under runtime shifts in I/O mix or network contention. No mechanism for detecting profile staleness or low-cost re-profiling is described, which directly undermines the adaptability advantage asserted for fluctuating conditions.
minor comments (2)
- [§4 (Scheduler)] The description of the batched round-robin enforcement would benefit from a small pseudocode listing or timing diagram to clarify how batching avoids per-request overhead.
- [Related Work] Ensure explicit comparison of the Perf Profile approach against other profile-based or feedback-driven caching systems beyond Orthus.
Simulated Author's Rebuttal
We thank the referee for their thoughtful review and constructive comments on our work. We address each of the major comments in detail below, and have made revisions to the manuscript to improve clarity and completeness.
Point-by-point responses
- Referee: [Abstract and §5 (Evaluation)] The headline performance numbers (174% over traditional caching, 3.5X over Orthus) are stated without visible experimental setup details, workload descriptions, baseline configurations, or error bars. These omissions make it impossible to assess whether the measured gains are attributable to the dynamic split or to other factors.
  Authors: We agree with the referee that additional details are necessary to properly contextualize the reported performance improvements. In the revised manuscript, we have updated the abstract to briefly describe the experimental setup, the workloads (such as varying I/O sizes and access patterns under simulated network fluctuations), and the baseline configurations for traditional caching and Orthus, and we have added error bars to the relevant graphs in §5. These changes help demonstrate that the gains are due to the dynamic I/O splitting mechanism.
  Revision: yes
- Referee: [§3.2 (Perf Profile construction)] The central claim that split ratios chosen from the precomputed profile plus real-time feedback outperform static policies rests on the assumption that the profile remains accurate under runtime shifts in I/O mix or network contention. No mechanism for detecting profile staleness or low-cost re-profiling is described, which directly undermines the adaptability advantage asserted for fluctuating conditions.
  Authors: The Perf Profile is precomputed to provide a mapping from network conditions to optimal split ratios, and real-time network feedback is used to select from this profile at runtime. We recognize that the manuscript did not explicitly describe handling for potential profile staleness. To address this, we have added text in §3.2 explaining that the profile incorporates a range of conditions to provide robustness, and we include a low-cost sampling mechanism that uses recent network metrics to trigger re-selection or minor adjustments without full re-profiling. This maintains the low-overhead nature while enhancing adaptability.
  Revision: partial
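The staleness concern raised above could be checked along the following lines. This is a hypothetical sketch, not the mechanism the authors describe: it assumes the profile yields a predicted throughput that can be compared against what is observed, and the class name `StalenessMonitor`, the window size, and the threshold are all illustrative choices.

```python
from collections import deque


class StalenessMonitor:
    """Flags a profile as stale when recent predictions drift from observations.

    Hypothetical helper: window size and threshold are illustrative,
    not values from the paper.
    """

    def __init__(self, window=8, threshold=0.2):
        self.errors = deque(maxlen=window)  # most recent relative errors
        self.threshold = threshold

    def record(self, predicted_tput, observed_tput):
        """Store the relative error between predicted and observed throughput."""
        self.errors.append(abs(observed_tput - predicted_tput) / predicted_tput)

    def is_stale(self):
        """True once the window is full and its mean relative error exceeds the threshold."""
        if len(self.errors) < self.errors.maxlen:
            return False
        return sum(self.errors) / len(self.errors) > self.threshold


monitor = StalenessMonitor(window=4, threshold=0.2)

# Observations close to the profile's prediction of 1000 MB/s.
for observed in (980.0, 1010.0, 990.0, 1005.0):
    monitor.record(1000.0, observed)
healthy = monitor.is_stale()

# Observations after a shift in workload or network conditions.
for observed in (700.0, 650.0, 720.0, 680.0):
    monitor.record(1000.0, observed)
drifted = monitor.is_stale()

print(healthy, drifted)
```

Averaging over a window rather than reacting to a single sample is one way to keep transient network blips from triggering re-profiling. Whether that trade-off suits NetCAS is not something the excerpt settles.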
Circularity Check
No significant circularity; claims rest on implementation and empirical measurement
full rationale
The provided abstract and description contain no equations, derivations, or mathematical claims. NetCAS is presented as a practical framework that uses a precomputed Perf Profile and real-time feedback as inputs to a scheduler, with performance numbers (174% and 3.5X) reported from measurements. No step reduces a prediction or result to a fitted parameter or self-citation by construction. The central claims are therefore self-contained against external benchmarks rather than circular.
Axiom & Free-Parameter Ledger
free parameters (1)
- split ratios
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (tag: unclear)
  unclear: Relation between the paper passage and the cited Recognition theorem.
  NetCAS relies on a precomputed Perf Profile that stores empirically optimal ratios for different workload configurations... $\rho_{\text{base}} = I_{\text{cache}} / (I_{\text{cache}} + I_{\text{back}})$
- IndisputableMonolith/Foundation/ArithmeticFromLogic.lean, LogicNat.induction (tag: unclear)
  unclear: Relation between the paper passage and the cited Recognition theorem.
  Batched Weighted Round Robin (BWRR) scheduler... pattern_size ← min(W / gcd(a, b), B)
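The pattern_size expression quoted above can be read as precomputing a short repeating dispatch pattern. The sketch below is one interpretation under assumptions the excerpt does not confirm: `a` and `b` are taken to be integer weights for cache and backend, `W` is taken to be `a + b`, and `max_batch` plays the role of `B`. The names `build_pattern` and `BatchedDispatcher` are hypothetical and this is not the paper's implementation.

```python
from math import gcd


def build_pattern(a, b, max_batch):
    """Precompute a repeating dispatch pattern of "cache" / "backend" targets.

    Assumptions (not confirmed by the excerpt): a and b are integer weights
    for cache and backend, W = a + b, and max_batch corresponds to B.
    """
    if a < 0 or b < 0 or a + b == 0 or max_batch < 1:
        raise ValueError("need non-negative weights, a + b > 0, max_batch >= 1")
    window = a + b
    # Smallest pattern that reproduces the ratio exactly, capped at max_batch.
    pattern_size = min(window // gcd(a, b), max_batch)
    # If the cap applies, the ratio is only approximated by rounding.
    n_cache = round(pattern_size * a / window)
    pattern = []
    for i in range(pattern_size):
        # Emit "cache" whenever the running cache quota crosses an integer
        # boundary, which spreads cache slots evenly through the pattern.
        crossed = (i + 1) * n_cache // pattern_size > i * n_cache // pattern_size
        pattern.append("cache" if crossed else "backend")
    return pattern


class BatchedDispatcher:
    """Routes requests by walking a precomputed pattern (one index bump per request)."""

    def __init__(self, pattern):
        self.pattern = pattern
        self.pos = 0

    def next_target(self):
        target = self.pattern[self.pos]
        self.pos = (self.pos + 1) % len(self.pattern)
        return target


pattern = build_pattern(a=60, b=40, max_batch=64)  # 60:40 reduces to a 3:2 pattern
dispatcher = BatchedDispatcher(pattern)
targets = [dispatcher.next_target() for _ in range(10)]
print(pattern, targets.count("cache"), targets.count("backend"))
```

Under this reading, the ratio arithmetic runs only when the weights change. Each request then costs a table lookup and an index increment, which is one plausible way batching could avoid per-request decision overhead.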
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.