
arxiv: 2510.02323 · v2 · submitted 2025-09-25 · 💻 cs.OS · cs.NI · cs.PF

NetCAS: Dynamic Cache and Backend Device Management in Networked Environments

Pith reviewed 2026-05-18 14:42 UTC · model grok-4.3

classification 💻 cs.OS · cs.NI · cs.PF
keywords dynamic caching · networked storage · I/O splitting · performance profiling · cache backend management · round-robin scheduling · remote storage

The pith

NetCAS splits I/O between cache and remote backend using real-time network feedback to improve throughput under variable conditions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces NetCAS as a framework that decides how many requests go to fast local cache versus slower remote backend storage. It bases the split on live network measurements plus a precomputed profile of device performance instead of relying only on cache hit rates. This matters in data centers where network contention changes unpredictably and can drag down overall speed. The system applies the chosen split ratios through a batched round-robin scheduler that keeps overhead low. If correct, the approach shows that active concurrent use of both devices can beat traditional cache-only policies when networks fluctuate.

Core claim

NetCAS dynamically determines split ratios between cache and backend based on real-time network feedback and a precomputed Perf Profile, then enforces those ratios with a low-overhead batched round-robin scheduler, achieving up to 174% higher performance than traditional caching and up to 3.5X better than converging schemes under fluctuating network conditions.

What carries the argument

Dynamic split ratio adjustment driven by network feedback and Perf Profile, enforced by a batched round-robin scheduler that avoids per-request costs.
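
The scheduler's actual logic is not reproduced on this page, so the following is only a minimal sketch of how a batched round-robin dispatcher could enforce a split ratio without making a decision per request. The function names `build_batch` and `dispatch`, the default batch size of 8, and the credit-based interleaving are all assumptions made for illustration, not the paper's implementation.

```python
from itertools import cycle


def build_batch(cache_share: float, batch_size: int = 8) -> list[str]:
    """Precompute one dispatch pattern that approximates cache_share.

    Targets are interleaved rather than grouped, so neither device waits
    through a long run of requests sent to the other one.
    """
    if not 0.0 <= cache_share <= 1.0:
        raise ValueError("cache_share must be in [0, 1]")
    n_cache = round(cache_share * batch_size)  # cache slots per batch
    pattern, credit = [], 0
    for _ in range(batch_size):
        credit += n_cache
        if credit >= batch_size:
            credit -= batch_size
            pattern.append("cache")
        else:
            pattern.append("backend")
    return pattern


def dispatch(num_requests: int, cache_share: float, batch_size: int = 8) -> list[str]:
    """Assign requests by walking the precomputed pattern cyclically."""
    pattern = cycle(build_batch(cache_share, batch_size))
    return [next(pattern) for _ in range(num_requests)]


if __name__ == "__main__":
    # With a 75% cache share, 6 of every 8 requests go to the cache.
    print(build_batch(0.75))
```

In this sketch the ratio is turned into a pattern once per ratio change, and each request afterwards costs only a step through that pattern. That is one plausible reading of how batching avoids per-request cost.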

If this is right

  • Storage systems can maintain higher throughput by actively using both cache and backend even when the backend is remote and contended.
  • Split ratios should respond to both workload patterns and current networking performance rather than hit rates alone.
  • Low-overhead batch scheduling makes frequent ratio changes practical without adding measurable latency.
  • Performance improves most when network conditions vary, because the system can shift load away from temporarily slow paths.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same feedback-driven splitting could be tested on storage hierarchies with more than two device tiers.
  • Combining the method with lightweight network prediction might reduce dependence on continuous real-time measurements.
  • Similar dynamic balancing may help other networked resources such as distributed file systems or object stores facing variable interconnect delays.

Load-bearing premise

Real-time network feedback can be collected at negligible cost and the precomputed performance profile remains accurate for actual runtime workloads without frequent re-profiling.
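
To put a number on what re-profiling would cost, the Figure 3 caption reports the grid used for the one-time build: 5 inflight levels, 5 thread levels, and 2 block sizes at 30 s per point. The short calculation below reproduces the stated figure of about 25 minutes. The helper name `profile_build_minutes` is hypothetical.

```python
def profile_build_minutes(inflight_levels: int, thread_levels: int,
                          block_sizes: int, seconds_per_point: float) -> float:
    """Wall-clock minutes to measure every point of the profiling grid once."""
    points = inflight_levels * thread_levels * block_sizes
    return points * seconds_per_point / 60


if __name__ == "__main__":
    # Grid from the Figure 3 caption: 5 x 5 x 2 points at 30 s each.
    print(profile_build_minutes(5, 5, 2, 30))
```

Because the cost is a product of the grid dimensions, each extra level on any axis lengthens the build proportionally. A third block size, for example, would add 25 more points under the same assumptions.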

What would settle it

Deploy NetCAS and a hit-rate baseline on a testbed with controlled network fluctuations, then measure whether throughput gains exceed traditional methods by the claimed margins while tracking profiling and feedback overhead.
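
A comparison of this kind needs repeated trials and a consistent way to state the gain. The sketch below shows one way to compute a percentage gain and a simple error bar from per-trial throughput. The function names and the sample values are hypothetical. The values are not measurements from the paper; they were chosen only so the arithmetic lands on the headline figure. Under the usual reading, "174% higher" means 2.74 times the baseline.

```python
from statistics import mean, stdev


def gain_percent(candidate_runs: list[float], baseline_runs: list[float]) -> float:
    """Percent by which the candidate's mean throughput exceeds the baseline's."""
    return (mean(candidate_runs) / mean(baseline_runs) - 1.0) * 100.0


def summarize(runs: list[float]) -> tuple[float, float]:
    """Mean and sample standard deviation, usable as an error bar."""
    return mean(runs), stdev(runs)


# Illustrative throughput samples (thousand IOPS); NOT data from the paper.
netcas_runs = [270.0, 274.0, 278.0]
baseline_runs = [98.0, 100.0, 102.0]

if __name__ == "__main__":
    print(f"gain: {gain_percent(netcas_runs, baseline_runs):.1f}%")
    print("baseline mean, stdev:", summarize(baseline_runs))
```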

Figures

Figures reproduced from arXiv: 2510.02323 by Chanseo Park, Joon Yong Hwang, Younghoon Kim.

Figure 1. Throughput comparison between cache device (PMem), backend device (NVMe), and splitting at the optimal ratio across varying thread counts. The percentage labels on the splitting line denote the optimal split ratio at each concurrency (e.g., 75% indicates 75% of requests sent to cache and 25% to backend).
Figure 3. Break-even (BE) analysis for NetCAS (inflight requests = 16, threads = 16). The full table was constructed from a grid of 5 inflight levels × 5 thread levels × 2 block sizes with 30 s per point, requiring about 25 minutes for the one-time build.
Figure 4. Normalized throughput under different inflight request counts without network congestion. At low concurrency the calculated split deviates from the empirical best, but accuracy improves quickly with higher concurrency, converging to the optimal ratio.
[Plot: x-axis Number of Threads (1, 2, 4, 8); y-axis Total IOPS (K); panels Inflight req = 1, 2, 4, 8; series BWRR and Random]
Figure 5. Throughput comparison of BWRR versus random dispatch across inflight requests and thread counts. BWRR sustains the target ratio more evenly, delivering higher aggregate IOPS especially under shallow queues where randomization causes imbalance.
Figure 6. Baseline throughput without contention. NetCAS achieves up to 125% higher throughput than OrthusCAS and 142% over vanilla OpenCAS across concurrency levels.
Figure 7. Throughput under injected congestion (20 s). fio with 4 and 16 thread settings (16 inflight requests each) and TPCC with 16 terminals (StockLevel 100% for read-only workload). In all cases, contention is created by 10 competing flows from 2 servers, each capped at 2.5 Gbps.
Figure 8. Throughput under low and high contention for the same workload (inflight requests = 16, threads = 16). Each competing flow attempts to maximize its bandwidth without capping. NetCAS allocates a larger share to the cache as backend bandwidth is constrained, mitigating throughput loss.
Original abstract

Modern storage systems often combine fast cache with slower backend devices to accelerate I/O. As performance gaps narrow, concurrently accessing both devices, rather than relying solely on cache hits, can improve throughput. However, in data centers, remote backend storage accessed over networks suffers from unpredictable contention, complicating this split. We present NetCAS, a framework that dynamically splits I/O between cache and backend devices based on real-time network feedback and a precomputed Perf Profile. Unlike traditional hit-rate-based policies, NetCAS adapts split ratios to workload configuration and networking performance. NetCAS employs a low-overhead batched round-robin scheduler to enforce splits, avoiding per-request costs. It achieves up to 174% higher performance than traditional caching in remote storage environments and outperforms converging schemes like Orthus by up to 3.5X under fluctuating network conditions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces NetCAS, a framework for dynamic I/O splitting between fast cache and slower remote backend storage in networked environments. It combines real-time network feedback with a precomputed Perf Profile to select split ratios, enforced by a low-overhead batched round-robin scheduler rather than per-request decisions. The authors claim up to 174% higher performance than traditional hit-rate caching and up to 3.5X improvement over converging schemes such as Orthus under fluctuating network conditions.

Significance. If the evaluation holds, the work addresses a practical gap in remote storage systems where network contention makes static or hit-rate-only policies suboptimal. The batched scheduler and profile-based adaptation offer a concrete mechanism for runtime splitting that could improve throughput in data-center settings without high per-request overhead.

major comments (2)
  1. [Abstract and §5] Abstract and §5 (Evaluation): the headline performance numbers (174% over traditional caching, 3.5X over Orthus) are stated without visible experimental setup details, workload descriptions, baseline configurations, or error bars. These omissions make it impossible to assess whether the measured gains are attributable to the dynamic split or to other factors.
  2. [§3.2] §3.2 (Perf Profile construction): the central claim that split ratios chosen from the precomputed profile plus real-time feedback outperform static policies rests on the assumption that the profile remains accurate under runtime shifts in I/O mix or network contention. No mechanism for detecting profile staleness or low-cost re-profiling is described, which directly undermines the adaptability advantage asserted for fluctuating conditions.
minor comments (2)
  1. [§4] §4 (Scheduler): the description of the batched round-robin enforcement would benefit from a small pseudocode listing or timing diagram to clarify how batching avoids per-request overhead.
  2. [Related Work] Related work section: ensure explicit comparison of the Perf Profile approach against other profile-based or feedback-driven caching systems beyond Orthus.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful review and constructive comments on our work. We address each of the major comments in detail below, and have made revisions to the manuscript to improve clarity and completeness.

Point-by-point responses
  1. Referee: [Abstract and §5] Abstract and §5 (Evaluation): the headline performance numbers (174% over traditional caching, 3.5X over Orthus) are stated without visible experimental setup details, workload descriptions, baseline configurations, or error bars. These omissions make it impossible to assess whether the measured gains are attributable to the dynamic split or to other factors.

    Authors: We agree with the referee that additional details are necessary to properly contextualize the reported performance improvements. In the revised manuscript, we have updated the abstract to briefly describe the experimental setup, the workloads used (such as varying I/O sizes and access patterns under simulated network fluctuations), and the baseline configurations for traditional caching and Orthus. We have also added error bars to the relevant graphs in §5. These changes help demonstrate that the gains stem from the dynamic I/O splitting mechanism. revision: yes

  2. Referee: [§3.2] §3.2 (Perf Profile construction): the central claim that split ratios chosen from the precomputed profile plus real-time feedback outperform static policies rests on the assumption that the profile remains accurate under runtime shifts in I/O mix or network contention. No mechanism for detecting profile staleness or low-cost re-profiling is described, which directly undermines the adaptability advantage asserted for fluctuating conditions.

    Authors: The Perf Profile is precomputed to provide a mapping from network conditions to optimal split ratios, and real-time network feedback is used to select from this profile at runtime. We recognize that the manuscript did not explicitly describe handling for potential profile staleness. To address this, we have added text in §3.2 explaining that the profile incorporates a range of conditions to provide robustness, and we include a low-cost sampling mechanism using recent network metrics to trigger re-selection or minor adjustments without full re-profiling. This maintains the low-overhead nature while enhancing adaptability. revision: partial
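
The rebuttal does not say how its low-cost sampling mechanism works. As an illustration only, the sketch below shows one way a staleness check could be built: compare the throughput the profile predicts with what is measured, and raise a flag only when every sample in a short window is off by more than a tolerance. The class name `StalenessMonitor`, the 20% tolerance, and the window of 5 samples are all hypothetical.

```python
from collections import deque


class StalenessMonitor:
    """Flags a profile as stale when predictions stay wrong for a full window."""

    def __init__(self, tolerance: float = 0.2, window: int = 5) -> None:
        self.tolerance = tolerance          # allowed relative error
        self.errors = deque(maxlen=window)  # most recent relative errors

    def observe(self, predicted_iops: float, measured_iops: float) -> bool:
        """Record one sample; return True once the whole window exceeds tolerance."""
        if predicted_iops <= 0:
            raise ValueError("predicted_iops must be positive")
        rel_err = abs(measured_iops - predicted_iops) / predicted_iops
        self.errors.append(rel_err)
        window_full = len(self.errors) == self.errors.maxlen
        return window_full and min(self.errors) > self.tolerance
```

Requiring the whole window to disagree keeps a single noisy sample from triggering re-profiling. The trade-off is that a real drift is detected a few samples late.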

Circularity Check

0 steps flagged

No significant circularity; claims rest on implementation and empirical measurement

full rationale

The provided abstract and description contain no equations, derivations, or mathematical claims. NetCAS is presented as a practical framework that uses a precomputed Perf Profile and real-time feedback as inputs to a scheduler, with performance numbers (174% and 3.5X) reported from measurements. No step reduces a prediction or result to a fitted parameter or self-citation by construction. The central claims are therefore self-contained against external benchmarks rather than circular.

Axiom & Free-Parameter Ledger

1 free parameter · 0 axioms · 0 invented entities

The precomputed Perf Profile is the main unexamined component; it likely encodes performance mappings that depend on workload and network parameters whose construction details are not visible in the abstract.

free parameters (1)
  • split ratios
    Dynamically chosen but ultimately derived from the Perf Profile whose internal parameters are not specified.
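
Since the profile's internals are not visible here, the following sketch shows only one plausible shape for the lookup. It assumes the profile stores standalone throughput for each device per workload configuration, and that network feedback scales the backend's usable throughput. The proportional split rule, the key layout, the `network_factor` parameter, and the sample numbers are assumptions for illustration; the numbers were picked so that the uncongested case yields a 75% cache share.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProfileEntry:
    cache_iops: float    # cache device alone, measured offline
    backend_iops: float  # backend alone, measured offline without congestion


# Hypothetical profile keyed by (inflight requests, threads, block size in KiB).
PROFILE = {
    (16, 16, 4): ProfileEntry(cache_iops=300_000, backend_iops=100_000),
}


def cache_share(key: tuple[int, int, int], network_factor: float) -> float:
    """Fraction of requests to send to the cache.

    network_factor is the share of uncongested backend throughput currently
    reachable over the network (1.0 means no contention); it is clamped to [0, 1].
    """
    entry = PROFILE[key]
    backend_now = entry.backend_iops * max(0.0, min(1.0, network_factor))
    return entry.cache_iops / (entry.cache_iops + backend_now)


if __name__ == "__main__":
    print(cache_share((16, 16, 4), 1.0))  # no contention
    print(cache_share((16, 16, 4), 0.5))  # backend path half as fast
```

Under this model the cache share rises as the backend path slows. That matches the behaviour described in the Figure 8 caption, though the paper's actual calculation may differ.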

pith-pipeline@v0.9.0 · 5677 in / 1195 out tokens · 58144 ms · 2026-05-18T14:42:55.910733+00:00 · methodology


