arxiv: 2605.18066 · v1 · pith:57GF3772new · submitted 2026-05-18 · 💻 cs.OS · cs.DC

TIDAL: Recovering Temporal Phase for Cloud Block Storage Placement from LLM-Derived Semantics

Pith reviewed 2026-05-20 00:19 UTC · model grok-4.3

classification 💻 cs.OS cs.DC
keywords cloud block storage · virtual disk placement · temporal phase · LLM semantics · complementary placement · overload reduction · metadata inference · cold-start placement

The pith

TIDAL recovers temporal phases from LLM-derived semantics in provisioning names to enable complementary placement of cloud virtual disks and cut overloads.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that overload risk in cloud block storage stems from aligned temporal peaks of co-located disks rather than average spatial load. Existing methods lack history for new disks at provisioning time, so they cannot achieve complementary placement that offsets those peaks. TIDAL extracts application semantics from tenant names and identifiers using LLMs, translates the semantics into phase-aware signals, and uses those signals for placement decisions. An offline-to-online architecture with distillation and caching keeps inference fast enough for the control plane. If correct, this approach would allow much lower overload frequency and duration without waiting for runtime observations.
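
To make the peak-alignment argument concrete, here is a minimal Python sketch. The three six-slot load curves are hypothetical and chosen only for illustration; they do not come from the paper's traces. Every disk carries the same total load, yet the co-located peak depends on whether the phases align.

```python
def peak_of_sum(curves):
    """Return the maximum of the element-wise sum of equal-length load curves."""
    return max(sum(values) for values in zip(*curves))


# Hypothetical 6-slot daily load curves (arbitrary bandwidth units).
day_disk = [1, 1, 9, 9, 1, 1]      # peaks mid-day
day_disk_2 = [1, 1, 9, 9, 1, 1]    # same phase as day_disk
night_disk = [9, 1, 1, 1, 1, 9]    # peaks at the day boundary

# Same total load per disk, so a purely spatial balancer sees them as equal.
assert sum(day_disk) == sum(night_disk)

aligned_peak = peak_of_sum([day_disk, day_disk_2])   # 18: peaks coincide
offset_peak = peak_of_sum([day_disk, night_disk])    # 10: peaks are offset
```

With a hypothetical pod budget of 12 units, the aligned pair would overload in two slots while the offset pair never would, even though both pairs carry the same total load.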

Core claim

TIDAL is a CVD placement framework that recovers phase-aware temporal signals for cold-start placement from tenant-provided names and identifiers by first using LLMs to extract application semantics from noisy metadata and then translating those semantics into temporal phase estimates that guide complementary placement across pods.

What carries the argument

LLM-driven semantic recovery from provisioning metadata, translated into phase-aware temporal signals, with an offline-to-online teacher-student distillation and prefix-aware caching pipeline that enables millisecond CPU-only inference.
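
One way the online path might fit together is sketched below in Python. This is an illustration under stated assumptions, not the paper's implementation: the filter pattern, the `PrefixCache` class, its trailing-counter normalization rule, and the `student` callable are all hypothetical stand-ins for the regex-based filtering, prefix-aware caching, and distilled student model that the abstract names.

```python
import re

# Assumption: names that are only digits/separators, or long hex runs,
# carry no recoverable semantics and can skip inference entirely.
MEANINGLESS = re.compile(r"(?:[\d\-_.]+|[0-9a-f]{8,})", re.IGNORECASE)


class PrefixCache:
    """Maps a normalized name prefix to a previously inferred label."""

    def __init__(self):
        self._labels = {}

    @staticmethod
    def key(name):
        # Hypothetical rule: drop a trailing instance counter such as "-01"
        # so that replicas of one service share a single cache entry.
        return re.sub(r"[-_.]?\d+$", "", name.lower())

    def get(self, name):
        return self._labels.get(self.key(name))

    def put(self, name, label):
        self._labels[self.key(name)] = label


def infer_label_online(name, cache, student):
    """Filter, then cache lookup, then fall back to the student model."""
    if not name or MEANINGLESS.fullmatch(name):
        return "unknown"
    cached = cache.get(name)
    if cached is not None:
        return cached
    label = student(name)   # hypothetical callable: str -> label
    cache.put(name, label)
    return label
```

Under these assumptions, `prod-mysql-01` and `prod-mysql-02` share the cache key `prod-mysql`, so only the first request reaches the student model.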

If this is right

  • Complementary placement becomes feasible at provisioning time even for disks that have no prior runtime history.
  • Pods experience fewer transient congestion events while still satisfying spatial balance constraints.
  • Control-plane latency requirements are met through distillation and caching so that the method can run online.
  • Resource efficiency and performance isolation both improve because peak alignment is reduced without extra hardware.
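
A greedy placement step of this kind could look like the following Python sketch. The objective (minimize the pod's resulting peak), the `budget` cap, and all names are assumptions for illustration; the paper compares several greedy objectives (Figure 9b), and these excerpts do not show which one TIDAL adopts or how spatial balance constraints are enforced.

```python
def place_greedy(new_curve, pods, budget):
    """Pick the pod whose peak load after adding new_curve is lowest.

    pods maps a pod name to its current aggregate load curve (list of numbers).
    Returns (pod_name, resulting_peak), or (None, None) if every pod would
    exceed the budget. The chosen pod's curve is updated in place.
    """
    best_name, best_peak = None, float("inf")
    for name, load in pods.items():
        peak = max(a + b for a, b in zip(load, new_curve))
        if peak <= budget and peak < best_peak:
            best_name, best_peak = name, peak
    if best_name is None:
        return None, None
    pods[best_name] = [a + b for a, b in zip(pods[best_name], new_curve)]
    return best_name, best_peak


# Hypothetical pods: pod-a peaks in slot 1, pod-b peaks in slot 3.
pods = {"pod-a": [2, 8, 2, 2], "pod-b": [2, 2, 2, 8]}
choice = place_greedy([1, 6, 1, 1], pods, budget=20)
```

Here the new disk peaks in slot 1, so it lands on `pod-b`, whose existing peak falls in a different slot. Both pods have the same total load before placement, so a purely spatial rule would have no reason to prefer either.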

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar metadata-driven semantic extraction could apply to other cold-start decisions such as VM or container scheduling.
  • The reliance on names raises questions about how changes in tenant naming conventions would affect accuracy over time.
  • If the correlation between semantics and phases proves stable, the same signals might support predictive auto-scaling rather than only placement.

Load-bearing premise

Tenant-provided names and identifiers contain recoverable semantic information that correlates reliably with the actual temporal load phases of the underlying applications.

What would settle it

Production traces in which the phases inferred by the LLM from names show no better correlation with observed load peaks than random assignment would falsify the central claim.
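
Such a falsification check could be run as a permutation test. The Python sketch below assumes each disk's phase is summarized as a single peak hour, which is a simplification not taken from the paper; the function names and sample values are hypothetical.

```python
import random


def circular_gap(a, b, period=24):
    """Distance between two hours on a clock of the given period."""
    d = abs(a - b) % period
    return min(d, period - d)


def mean_gap(predicted, observed):
    """Average circular distance between predicted and observed peak hours."""
    return sum(circular_gap(p, o) for p, o in zip(predicted, observed)) / len(observed)


def permutation_p_value(predicted, observed, trials=2000, seed=0):
    """Estimate how often a random reassignment matches at least as well."""
    rng = random.Random(seed)
    real = mean_gap(predicted, observed)
    shuffled = list(predicted)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if mean_gap(shuffled, observed) <= real:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (trials + 1)


# Hypothetical peak hours for eight disks.
observed = [2, 3, 10, 11, 14, 15, 21, 22]
predicted = [2, 2, 10, 10, 14, 14, 22, 22]
p_value = permutation_p_value(predicted, observed)
```

A small p-value would indicate that the inferred phases track observed peaks better than random assignment; a p-value near 1 on production traces would be the kind of evidence that undermines the central claim.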

Figures

Figures reproduced from arXiv: 2605.18066 by Changlin Wan, Difan Tan, Hua Wang, Jiawen Liu, Ke Zhou.

Figure 1
Figure 1: Comparison between existing CVD placement schemes (a) and TIDAL (b).
… Machines (CVMs) across providers such as AWS [3], Microsoft Azure [48], and Alibaba Cloud [83]. In CBS, tenants provision Cloud Virtual Disks (CVDs) [13, 44, 70, 73, 77] to scale storage resources, which are placed onto backend storage clusters partitioned into pods. The placement of a newly created CVD, that is, deciding which pod hosts…
Figure 2
Figure 2: Cloud block storage architecture. CVMs access CVDs over the datacenter network; the storage cluster is partitioned into pods spanning multiple physical servers.
… semantic cache short-circuits redundant runtime inferences. Together, these techniques enable sub-10 ms placement decisions while preserving the benefits of LLM-based semantic understanding. This paper makes the following contributions: • We formu…
Figure 3
Figure 3: Temporal characteristics of CVD workloads. (a) CDF of peak/valley stability. (b) Distribution of peak/valley windows over weekdays and weekends.
… to smooth aggregate pod load through complementary placement. 2.3 The Cold-Start Phase Gap. These observations point to an opportunity: use complementary placement to assign a new CVD to a pod whose existing workloads offset the disk’s future peaks. The difficulty…
Figure 4
Figure 4(a) shows representative daily profiles for several application classes, illustrating that different semantic groups… (Footnote 1: The semantic distribution in…)
Figure 5
Figure 5 illustrates its end-to-end workflow. Upon receiving a provisioning request, TIDAL first infers an application label from tenant-provided identifiers in provisioning metadata (Step ❶: Semantic inference). It then maps the label to a canonical temporal pattern (Step ❷: Pattern mapping) and predicts load intensity from resource specifications (Step ❸: Intensity prediction). These two components are combined…
Figure 6
Figure 6: Offline-to-online semantic inference pipeline in TIDAL. The offline stage constructs the taxonomy and distills supervision from a teacher LLM, while the online stage serves semantic inference through filtering, prefix-aware caching, and a lightweight student model.
… where 𝑃, 𝑉, and 𝐷 denote the project, VM, and disk identifiers. The core difficulty is that these identifiers are noisy, open-vocabulary, an…
Figure 7
Figure 7: Construction process of the offline profile library.
We formulate this task as regression rather than classification. Prior schemes often discretize intensity into coarse levels (e.g., Low/Medium/High) and formulate prediction as classification [33, 37, 67, 70], but such quantization introduces packing error. Since overload avoidance depends on numerical mismatch against a fixed pod bandwidth budget, red…
Figure 8
Figure 8: Effectiveness of TIDAL in load smoothing. (a) Overload time fraction (OTF) as placement progresses. (b) CDF of overload duration. (c) Spatial vs. temporal imbalance. (d) Distribution of temporal imbalance across pods.
TELA performs better by dispersing bursty disks, but TIDAL consistently achieves the lowest OTF among all practical schemes, with only 3.19% OTF at full placement, and closely tracking Oracle…
Figure 9
Figure 9: Ablation study. (a) Impact of intensity prediction and semantic inference. (b) Comparison of greedy objectives.
… load under the threshold more often, but also produces intrinsically smoother pod-level load curves. 5.3 Dissecting TIDAL’s Gains. We next isolate which components are responsible for TIDAL’s gains. Impact of system components. We compare three progressively stronger variants: (i) TIDAL-Cap, whi…
Figure 11
Figure 11: Robustness to semantically weak metadata. (a) Placement latency and (b) OTF under injected meaningless metadata, with and without regex-based filtering.
Figure 12
Figure 12: Hyperparameter sensitivity. (a) Sensitivity to temporal resolution 𝐾. (b) Sensitivity to candidate size 𝑀.
… into provisioning metadata, simulating requests whose names carry little recoverable semantics…
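
The overload metrics reported in Figure 8 can be illustrated with a short Python sketch. It assumes that OTF is the share of time slots in which pod load exceeds a fixed budget and that an overload duration is a maximal run of consecutive overloaded slots; the paper's exact definitions are not visible in these excerpts, and the sample series is made up.

```python
def overload_time_fraction(load, budget):
    """Share of time slots in which load exceeds budget."""
    return sum(1 for x in load if x > budget) / len(load)


def overload_durations(load, budget):
    """Length, in slots, of each maximal run of consecutive overloaded slots."""
    runs, current = [], 0
    for x in load:
        if x > budget:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:          # series ended while still overloaded
        runs.append(current)
    return runs


# Hypothetical pod load over ten slots with a budget of 10 units.
pod_load = [5, 12, 13, 6, 4, 11, 7, 3, 2, 14]
otf = overload_time_fraction(pod_load, budget=10)
durations = overload_durations(pod_load, budget=10)
```

In this made-up series four of ten slots exceed the budget, grouped into three separate overload episodes. Reporting both numbers matters because two placements with the same OTF can differ in whether overloads come as many short blips or a few long episodes.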
Original abstract

Cloud Virtual Disk (CVD) placement in Cloud Block Storage (CBS) is critical for resource efficiency and performance isolation. Existing schemes prioritize spatial load balancing by dispersing disks across pods based on configuration-derived load estimates. However, overload risk in CBS is fundamentally temporal. Even when average load is balanced, pods can still suffer transient congestion when the peaks of co-located disks align in time. Achieving complementary placement, which co-locates CVDs with offset peaks, is hard at provisioning time because new disks have no history from which to infer temporal phase. We present TIDAL, a CVD placement framework that recovers phase-aware signals for cold-start placement from an underused source: tenant-provided names and identifiers in provisioning metadata. TIDAL first uses LLMs to recover application semantics from noisy metadata such as project, VM, and disk names. It then translates these semantics into phase-aware temporal signals to guide complementary placement. To satisfy control-plane constraints, TIDAL adopts an offline-to-online design with teacher-student distillation, regex-based filtering, and prefix-aware caching, enabling CPU-only inference with millisecond-level latency. Evaluations driven by production traces show that TIDAL reduces overload frequency by 79.1% and P95 overload duration by 73.7% compared with the strongest baselines.
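
The label-to-signal translation can be sketched in Python as follows. Everything here is a hypothetical stand-in: the keyword lookup replaces the LLM-distilled classifier, `PATTERNS` holds invented four-slot profiles, `predict_intensity` is a made-up linear rule, and multiplying pattern by intensity is an assumed way of combining the two, since the excerpts on this page do not show how TIDAL combines them.

```python
# Hypothetical canonical daily patterns, each normalized to a peak of 1.0.
PATTERNS = {
    "online-service": [0.2, 0.6, 1.0, 0.5],
    "batch-job": [1.0, 0.3, 0.1, 0.4],
    "unknown": [1.0, 1.0, 1.0, 1.0],   # conservative: assume always at peak
}

# Hypothetical keyword table standing in for semantic inference.
KEYWORDS = {
    "web": "online-service",
    "api": "online-service",
    "etl": "batch-job",
    "backup": "batch-job",
}


def label_from_names(project, vm, disk):
    """Stand-in for semantic inference over project, VM, and disk names."""
    text = f"{project} {vm} {disk}".lower()
    for keyword, label in KEYWORDS.items():
        if keyword in text:
            return label
    return "unknown"


def predict_intensity(size_gb):
    """Stand-in for intensity regression: an invented linear rule."""
    return 0.5 * size_gb


def predicted_curve(project, vm, disk, size_gb):
    """Combine the pattern and the intensity into a predicted load curve."""
    label = label_from_names(project, vm, disk)
    peak = predict_intensity(size_gb)
    return label, [peak * share for share in PATTERNS[label]]


label, curve = predicted_curve("shop", "web-frontend", "data-01", 200)
```

The resulting curve is what a placement step would compare against each pod's existing load. Falling back to a flat pattern for unrecognized names is one conservative choice among several; it makes the disk look like a constant load rather than guessing a phase.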

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper presents TIDAL, a CVD placement framework for cloud block storage that recovers phase-aware temporal signals from tenant-provided metadata names and identifiers using LLMs. It translates these semantics into complementary placement decisions to offset load peaks and reduce transient overloads, employing an offline teacher-student distillation design with regex filtering and prefix caching for low-latency CPU inference. Evaluations on production traces report 79.1% reduction in overload frequency and 73.7% reduction in P95 overload duration relative to the strongest baselines.

Significance. If the results hold, TIDAL offers a practical way to incorporate temporal phase information into cold-start placement where history is unavailable, addressing a gap in existing spatial load-balancing schemes. The offline-to-online architecture with distillation and caching is a strength for meeting control-plane constraints. This could improve resource efficiency and isolation in CBS deployments if the LLM-derived signals prove reliable across workloads.

major comments (2)
  1. [§4] §4, Evaluation: The headline reductions (79.1% overload frequency, 73.7% P95 duration) are measured against external baselines on production traces, but the section provides insufficient detail on LLM prompting strategy, the precise phase derivation method from semantics, baseline definitions, or controls for LLM output variability; these elements are load-bearing for assessing whether the gains stem from the claimed semantic recovery.
  2. [§3.2] §3.2, Semantic-to-Phase Translation: The mapping from LLM-extracted application semantics to temporal phase signals relies on an assumed correlation between tenant metadata and actual load phases; without an ablation isolating this component or explicit validation of the correlation on the traces, the central claim that metadata semantics enable reliable complementary placement remains difficult to verify.
minor comments (2)
  1. [§3.3] The description of prefix-aware caching in §3.3 could include concrete latency measurements or cache hit rates to better substantiate the millisecond-level inference claim.
  2. [Figure 3] Figure 3 (placement comparison) would benefit from error bars or statistical significance tests on the overload metrics to strengthen the quantitative comparison.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We respond to each major comment below and indicate the revisions we will make to address the concerns raised.

Point-by-point responses
  1. Referee: [§4] §4, Evaluation: The headline reductions (79.1% overload frequency, 73.7% P95 duration) are measured against external baselines on production traces, but the section provides insufficient detail on LLM prompting strategy, the precise phase derivation method from semantics, baseline definitions, or controls for LLM output variability; these elements are load-bearing for assessing whether the gains stem from the claimed semantic recovery.

    Authors: We agree that the evaluation section would benefit from greater transparency on these points to allow independent assessment of the results. In the revised version we will expand §4 to include the exact prompting templates and system instructions provided to the LLM, the rule-based procedure that converts extracted semantic categories into phase offset signals, the precise configurations and parameter settings used for each baseline, and additional runs that quantify sensitivity to LLM output stochasticity. These additions will make explicit how the reported reductions are tied to the semantic recovery mechanism. revision: yes

  2. Referee: [§3.2] §3.2, Semantic-to-Phase Translation: The mapping from LLM-extracted application semantics to temporal phase signals relies on an assumed correlation between tenant metadata and actual load phases; without an ablation isolating this component or explicit validation of the correlation on the traces, the central claim that metadata semantics enable reliable complementary placement remains difficult to verify.

    Authors: The current manuscript motivates the mapping from domain knowledge of typical cloud workload patterns (e.g., distinguishing interactive services from batch jobs) and shows that the resulting placement decisions improve outcomes on production traces. We acknowledge that an isolated ablation of the translation step and direct statistical validation of the metadata-to-phase correlation are not presented. In revision we will add a dedicated paragraph in §3.2 that spells out the mapping rules with examples and, using the available trace metadata, report a targeted comparison of placement quality with and without the phase signals to provide the requested validation. revision: partial

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

Full rationale

The paper derives phase-aware placement signals by applying LLMs to tenant-provided metadata names and identifiers, then using the resulting semantics for complementary CVD placement. This chain operates on external inputs (provisioning metadata) and is evaluated via direct measurement on production traces against independent baselines, with no fitted parameters renamed as predictions, no self-definitional loops in the equations, and no load-bearing self-citations or ansatz smuggling. The offline teacher-student distillation and caching are implementation details that do not reduce the core claim to its own outputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The framework rests on the unverified assumption that names encode temporal behavior and that LLMs can extract it; no free parameters or new entities are introduced in the abstract.

axioms (1)
  • Domain assumption: LLMs can recover application semantics from noisy tenant-provided names and identifiers that correlate with temporal load phases.
    Invoked when the abstract states that LLMs translate semantics into phase-aware signals.

pith-pipeline@v0.9.0 · 5766 in / 1209 out tokens · 45515 ms · 2026-05-20T00:19:22.755410+00:00 · methodology

Reference graph

Works this paper leans on

96 extracted references · 96 canonical work pages · 5 internal anchors

  1. [1]

    Abdelzaher and Nina Bhatti

    Tarek F. Abdelzaher and Nina Bhatti. 1999. Web content adaptation to improve server overload behavior.Comput. Netw.31, 11–16 (May 1999), 1563–1577. doi:10.1016/S1389-1286(99)00031-6

  2. [2]

    Jennifer Abel and Birger Lantow. 2019. A Methodological Framework for Dictionary and Rule-based Text Classification. InProceedings of the 11th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2019) - Volume 1: KDIR. INSTICC, SciTePress, 330–337. doi:10.5220/0008121503300337

  3. [3]

    2025.What is Amazon Elastic Block Store.https://docs.aws

    Amazon. 2025.What is Amazon Elastic Block Store.https://docs.aws. amazon.com/ebs/latest/userguide/what-is-ebs.html

  4. [4]

    Pradeep Ambati, Inigo Goiri, Felipe Frujeri, Alper Gun, Ke Wang, Brian Dolan, Brian Corell, Sekhar Pasupuleti, Thomas Moscibroda, Sameh Elnikety, Marcus Fontoura, and Ricardo Bianchini. 2020. Pro- viding SLOs for Resource-Harvesting VMs in Cloud Platforms. In14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20). USENIX Association...

  5. [5]

    Jinze Bai, Shuai Bai, Yunfei Chu, Zeyu Cui, Kai Dang, Xiaodong Deng, et al . 2023. Qwen Technical Report.arXiv preprint(2023). arXiv:2309.16609 [cs.CL]

  6. [6]

    Maguire Jr., Panagiotis Papadimitratos, and Marco Chiesa

    Tom Barbette, Chen Tang, Haoran Yao, Dejan Kostić, Gerald Q. Maguire Jr., Panagiotis Papadimitratos, and Marco Chiesa. 2020. A High-Speed Load-Balancer Design with Guaranteed Per-Connection- Consistency. In17th USENIX Symposium on Networked Systems Design and Implementation (NSDI 20). USENIX Association, Santa Clara, CA, 667–683.https://www.usenix.org/con...

  7. [7]

    Bogdanov, Waleed Reda, Gerald Q

    Kirill L. Bogdanov, Waleed Reda, Gerald Q. Maguire, Dejan Kostić, and Marco Canini. 2018. Fast and Accurate Load Balancing for Geo- Distributed Storage Systems. InProceedings of the ACM Symposium on Cloud Computing(Carlsbad, CA, USA)(SoCC ’18). Association for Computing Machinery, New York, NY, USA, 386–400. doi:10.1145/ 3267809.3267820

  8. [8]

    Leo Breiman. 2001. Random Forests.Machine Learning45, 1 (Oct. 2001), 5–32. doi:10.1023/A:1010933404324

  9. [9]

    Brendan Burns, Brian Grant, David Oppenheimer, Eric Brewer, and John Wilkes. 2016. Borg, Omega, and Kubernetes.Commun. ACM59, 5 (April 2016), 50–57. doi:10.1145/2890784

  10. [10]

    Liuhua Chen and Haiying Shen. 2014. Consolidating complementary VMs with spatial/temporal-awareness in cloud datacenters. InIEEE INFOCOM 2014 - IEEE Conference on Computer Communications. 1033–

  11. [11]

    doi:10.1109/INFOCOM.2014.6848033

  12. [12]

    Ruobing Chen, Haosen Shi, Yusen Li, Xiaoguang Liu, and Gang Wang

  13. [13]

    In Proceedings of the Eighteenth European Conference on Computer Sys- tems(Rome, Italy)(EuroSys ’23)

    OLPart: Online Learning based Resource Partitioning for Colo- cating Multiple Latency-Critical Jobs on Commodity Computers. In Proceedings of the Eighteenth European Conference on Computer Sys- tems(Rome, Italy)(EuroSys ’23). Association for Computing Machinery, New York, NY, USA, 347–364. doi:10.1145/3552326.3567490

  14. [14]

    Tianqi Chen and Carlos Guestrin. 2016. XGBoost: A Scalable Tree Boosting System. InProceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining(San Francisco, California, USA)(KDD ’16). Association for Computing Machinery, New York, NY, USA, 785–794. doi:10.1145/2939672.2939785

  15. [15]

    Xinqi Chen, Yu Zhang, Erci Xu, Changhong Wang, Jifei Yi, Qiup- ing Wang, Shizhuo Sun, Zhongyu Wang, Haonan Wu, Junping Wu, et al. 2026. How Soon is Now? Preloading Images for Virtual Disks with {ThinkAhead}. In24th USENIX Conference on File and Storage Technologies (FAST 26). 399–414

  16. [16]

    Ho-Ren Chuang, Karim Manaouil, Tong Xing, Antonio Barbalace, Pierre Olivier, Balvansh Heerekar, and Binoy Ravindran. 2023. Ag- gregate VM: Why Reduce or Evict VM’s Resources When You Can Borrow Them From Other Nodes?. InProceedings of the Eighteenth European Conference on Computer Systems(Rome, Italy)(EuroSys ’23). Association for Computing Machinery, New...

  17. [17]

    Christopher Clark, Keir Fraser, Steven Hand, Jacob Gorm Hansen, Eric Jul, Christian Limpach, Ian Pratt, and Andrew Warfield. 2005. Live Migration of Virtual Machines. In2nd Symposium on Networked Sys- tems Design & Implementation (NSDI 05). USENIX Association, Boston, MA.https://www.usenix.org/conference/nsdi-05/live-migration- virtual-machines

  18. [18]

    Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaud- hary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer, and Veselin Stoyanov. 2020. Unsupervised Cross-lingual Representation Learning at Scale. InProceedings of the 58th Annual Meeting of the Association for Computational Linguis- tics, Dan Jurafsky, Joyce Chai,...

  19. [19]

    Eli Cortez, Anand Bonde, Alexandre Muzio, Mark Russinovich, Mar- cus Fontoura, and Ricardo Bianchini. 2017. Resource Central: Un- derstanding and Predicting Workloads for Improved Resource Man- agement in Large Cloud Platforms. InProceedings of the 26th Sym- posium on Operating Systems Principles(Shanghai, China)(SOSP ’17). Association for Computing Machi...

  20. [20]

    Woeginger

    János Csirik and Gerhard J. Woeginger. 1998. On-line Packing and Cov- ering Problems. InDevelopments from a June 1996 Seminar on Online Algorithms: The State of the Art. Springer-Verlag, Berlin, Heidelberg, 147–177

  21. [21]

    Yiming Cui, Wanxiang Che, Ting Liu, Bing Qin, and Ziqing Yang

  22. [22]

    IEEE/ACM Transactions on Audio, Speech, and Language Processing29 (2021), 3504–3514

    Pre-Training With Whole Word Masking for Chinese BERT. IEEE/ACM Transactions on Audio, Speech, and Language Processing29 (2021), 3504–3514. doi:10.1109/TASLP.2021.3124365

  23. [23]

    Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova

  24. [24]

    BERT: Pre-training of Deep Bidirectional Transformers for Lan- guage Understanding. InProceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguis- tics: Human Language Technologies, Volume 1 (Long and Short Papers), TIDAL: Recovering Temporal Phase for Cloud Block Storage Placement from LLM-Derived Semanti...

  25. [25]

    David Domingo, Hugo Barbalho, Marco Molinaro, Kuan Liu, Abhisek Pan, David Dion, Thomas Moscibroda, Sudarsun Kannan, and Ishai Menache. 2025. Kamino: efficient VM allocation at scale with latency- driven cache-aware scheduling. InProceedings of the 19th USENIX Conference on Operating Systems Design and Implementation(Boston, MA, USA)(OSDI ’25). USENIX Ass...

  26. [26]

    Eisenbud, Cheng Yi, Carlo Contavalli, Cody Smith, Ro- man Kononov, Eric Mann-Hielscher, Ardas Cilingiroglu, Bin Cheyney, Wentao Shang, and Jinnah Dylan Hosein

    Danielle E. Eisenbud, Cheng Yi, Carlo Contavalli, Cody Smith, Ro- man Kononov, Eric Mann-Hielscher, Ardas Cilingiroglu, Bin Cheyney, Wentao Shang, and Jinnah Dylan Hosein. 2016. Maglev: A Fast and Reliable Software Network Load Balancer. In13th USENIX Sympo- sium on Networked Systems Design and Implementation (NSDI 16). USENIX Association, Santa Clara, CA...

  27. [27]

    Zishuo Feng and Feng Cao. 2025. CNMBERT: A Model for Converting Hanyu Pinyin Abbreviations to Chinese Characters. arXiv:2411.11770 [cs.CL]https://arxiv.org/abs/2411.11770

  28. [28]

    Yao Fu, Leyang Xue, Yeqi Huang, Andrei-Octavian Brabete, Dmitrii Ustiugov, Yuvraj Patel, and Luo Mai. 2024. ServerlessLLM: Low- Latency Serverless Inference for Large Language Models. In18th USENIX Symposium on Operating Systems Design and Implemen- tation (OSDI 24). USENIX Association, Santa Clara, CA, 135–153. https://www.usenix.org/conference/osdi24/pr...

  29. [29]

    Shayna Gardiner, Tania Habib, Kevin Humphreys, Masha Azizi, Fred- eric Mailhot, Anne Paling, Preston Thomas, and Nathan Zhang. 2024. Data Anonymization for Privacy-Preserving Large Language Model Fine-Tuning on Call Transcripts. InProceedings of the Workshop on Computational Approaches to Language Data Pseudonymization (CALD- pseudo 2024), Elena Volodina,...

  30. [30]

    Gemini Team. 2023. Gemini: A Family of Highly Capable Multimodal Models.arXiv preprint(2023). arXiv:2312.11805 [cs.CL]

  31. [31]

    Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Peiyi Wang, Qihao Zhu, Runxin Xu, Ruoyu Zhang, Shirong Ma, Xiao Bi, et al. 2025. Deepseek-r1 incentivizes reasoning in llms through reinforcement learning.Nature645, 8081 (2025), 633–638

  32. [32]

    Ori Hadary, Luke Marshall, Ishai Menache, Abhisek Pan, Esaias E Greeff, David Dion, Star Dorminey, Shailesh Joshi, Yang Chen, Mark Russinovich, and Thomas Moscibroda. 2020. Protean: VM Allocation Service at Scale. In14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20). USENIX Association, 845–861. https://www.usenix.org/conferenc...

  33. [33]

    Fabien Hermenier, Xavier Lorca, Jean-Marc Menaud, Gilles Muller, and Julia Lawall. 2009. Entropy: a consolidation manager for clusters. In Proceedings of the 2009 ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments(Washington, DC, USA)(VEE ’09). Association for Computing Machinery, New York, NY, USA, 41–50. doi:10.1145/1508293.1508300

  34. [34]

    Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang, and Qun Liu. 2020. TinyBERT: Distilling BERT for Natural Language Understanding. InFindings of the Association for Computational Linguistics: EMNLP 2020, Trevor Cohn, Yulan He, and Yang Liu (Eds.). Association for Computational Linguistics, Online, 4163–4174. doi:10.18653/v...

  35. [35]

    Armand Joulin, Edouard Grave, Piotr Bojanowski, and Tomas Mikolov

  36. [36]

    Bag of Tricks for Efficient Text Classification. InProceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, Mirella Lapata, Phil Blunsom, and Alexander Koller (Eds.). Association for Computational Linguistics, Valencia, Spain, 427–431.https://aclanthology.org/E17- 2068/

  37. [37]

    Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Wei- dong Ma, Qiwei Ye, and Tie-Yan Liu. 2017. LightGBM: a highly efficient gradient boosting decision tree. InProceedings of the 31st International Conference on Neural Information Processing Systems(Long Beach, Cal- ifornia, USA)(NIPS’17). Curran Associates Inc., Red Hook, NY, USA, 3149–3157

  38. [38]

    Misra, Seyyed Ah- mad Javadi, Bianca Schroeder, Marcus Fontoura, and Ricardo Bian- chini

    Alok Gautam Kumbhare, Reza Azimi, Ioannis Manousakis, Anand Bonde, Felipe Frujeri, Nithish Mahalingam, Pulkit A. Misra, Seyyed Ah- mad Javadi, Bianca Schroeder, Marcus Fontoura, and Ricardo Bian- chini. 2021. Prediction-Based Power Oversubscription in Cloud Plat- forms. In2021 USENIX Annual Technical Conference (USENIX ATC 21). USENIX Association, 473–487...

  39. [39]

    Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph Gonzalez, Hao Zhang, and Ion Stoica

  40. [40]

    Efficient Memory Management for Large Language Model Serving with PagedAttention , booktitle =

    Efficient Memory Management for Large Language Model Serving with PagedAttention. InProceedings of the 29th Symposium on Operating Systems Principles(Koblenz, Germany)(SOSP ’23). As- sociation for Computing Machinery, New York, NY, USA, 611–626. doi:10.1145/3600006.3613165

  41. [41]

    Jinhong Li, Qiuping Wang, Patrick P. C. Lee, and Chao Shi. 2023. An In-depth Comparative Analysis of Cloud Block Storage Workloads: Findings and Implications.ACM Trans. Storage19, 2, Article 16 (March 2023), 32 pages. doi:10.1145/3572779

  42. [42]

    Suyi Li, Luping Wang, Wei Wang, Yinghao Yu, and Bo Li. 2021. George: Learning to Place Long-Lived Containers in Large Clusters with Oper- ation Constraints. InProceedings of the ACM Symposium on Cloud Com- puting(Seattle, WA, USA)(SoCC ’21). Association for Computing Ma- chinery, New York, NY, USA, 258–272. doi:10.1145/3472883.3486971

  43. [43]

    Bush, Prakash Ramanan, Rajesh Kumar, Thomas Chestna, Yajing Liu, Ying Liu, Ye Zhao, Kathryn S

    Jianheng Ling, Pratik Worah, Yawen Wang, Yunchuan Kong, Chunlei Wang, Clifford Stein, Diwakar Gupta, Jason Behmer, Logan A. Bush, Prakash Ramanan, Rajesh Kumar, Thomas Chestna, Yajing Liu, Ying Liu, Ye Zhao, Kathryn S. McKinley, Meeyoung Park, and Martin Maas

  44. [44]

    InEighth Conference on Machine Learning and Systems.https://openreview.net/forum?id= 9vyyfVNW1E

    LAVA: Lifetime-Aware VM Allocation with Learned Distri- butions and Adaptation to Mispredictions. InEighth Conference on Machine Learning and Systems.https://openreview.net/forum?id= 9vyyfVNW1E

  45. [45]

    Shutian Luo, Huanle Xu, Kejiang Ye, Guoyao Xu, Liping Zhang, Guodong Yang, and Chengzhong Xu. 2022. The power of predic- tion: microservice auto scaling via workload learning. InProceedings of the 13th Symposium on Cloud Computing(San Francisco, California) (SoCC ’22). Association for Computing Machinery, New York, NY, USA, 355–369. doi:10.1145/3542929.3563477

  46. [46]

    Martin Maas. 2020. A Taxonomy of ML for Systems Problems.IEEE Micro40, 5 (2020), 8–16. doi:10.1109/MM.2020.3012883

  47. [47]

    Heax: An architecture for computing on encrypted data,

    Martin Maas, David G. Andersen, Michael Isard, Mohammad Mahdi Javanmard, Kathryn S. McKinley, and Colin Raffel. 2020. Learning- based Memory Allocation for C++ Server Workloads. InProceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems(Lausanne, Switzer- land)(ASPLOS ’20). Associati...

  48. [48]

    Hongzi Mao, Mohammad Alizadeh, Ishai Menache, and Srikanth Kan- dula. 2016. Resource Management with Deep Reinforcement Learning. InProceedings of the 15th ACM Workshop on Hot Topics in Networks(At- lanta, GA, USA)(HotNets ’16). Association for Computing Machinery, New York, NY, USA, 50–56. doi:10.1145/3005745.3005750

  49. [49]

    Haoyu Mao, Yongkun Li, Wenzhe Zhu, Fei Li, and Yinlong Xu. 2023. On Optimizing Traffic Imbalance in Large-scale Block-based Cloud Storage: Trace Analysis and Algorithm Design. In2022 IEEE 28th International Conference on Parallel and Distributed Systems (ICPADS). 728–736. doi:10.1109/ICPADS56603.2022.00100 Trovato et al

  50. [50]

    Hongzi Mao, Malte Schwarzkopf, Shaileshh Bojja Venkatakrishnan, Zili Meng, and Mohammad Alizadeh. 2019. Learning scheduling algo- rithms for data processing clusters. InProceedings of the ACM Special Interest Group on Data Communication(Beijing, China)(SIGCOMM ’19). Association for Computing Machinery, New York, NY, USA, 270–288. doi:10.1145/3341302.3342080

  51. [51]

    Meyer, Gitika Aggarwal, Brendan Cully, Geoffrey Lefeb- vre, Michael J

    Dutch T. Meyer, Gitika Aggarwal, Brendan Cully, Geoffrey Lefeb- vre, Michael J. Feeley, Norman C. Hutchinson, and Andrew Warfield

  52. [52]

    Parallax: virtual disks for virtual machines. In Proceedings of the 3rd ACM SIGOPS/EuroSys European Conference on Computer Systems 2008 (Glasgow, Scotland UK) (EuroSys ’08). Association for Computing Machinery, New York, NY, USA, 41–54. doi:10.1145/1352592.1352598

  53. [53]

    Justin J. Meza, Thote Gowda, Ahmed Eid, Tomiwa Ijaware, Dmitry Chernyshev, Yi Yu, Md Nazim Uddin, Rohan Das, Chad Nachiappan, Sari Tran, Shuyang Shi, Tina Luo, David Ke Hong, Sankaralingam Panneerselvam, Hans Ragas, Svetlin Manavski, Weidong Wang, and Francois Richard. 2023. Defcon: Preventing Overload with Graceful Feature Degradation. In 17th USENIX ...

  54. [54]

    Rui Miao, Hongyi Zeng, Changhoon Kim, Jeongkeun Lee, and Minlan Yu. 2017. SilkRoad: Making Stateful Layer-4 Load Balancing Fast and Cheap Using Switching ASICs. In Proceedings of the Conference of the ACM Special Interest Group on Data Communication (Los Angeles, CA, USA) (SIGCOMM ’17). Association for Computing Machinery, New York, NY, USA, 15–28. doi:10.11...

  55. [55]

    Xupeng Miao, Gabriele Oliaro, Zhihao Zhang, Xinhao Cheng, Zeyu Wang, Zhengxin Zhang, Rae Ying Yee Wong, Alan Zhu, Lijie Yang, Xiaoxiang Shi, Chunan Shi, Zhuoming Chen, Daiyaan Arfeen, Reyna Abhyankar, and Zhihao Jia. 2024. SpecInfer: Accelerating Large Language Model Serving with Tree-based Speculative Inference and Verification. In Proceedings of the ...

  56. [56]

    Microsoft. 2025. Introduction to Azure managed disks. https://learn.microsoft.com/en-us/azure/virtual-machines/managed-disks-overview

  57. [57]

    Vladimir Olteanu, Alexandru Agache, Andrei Voinescu, and Costin Raiciu. 2018. Stateless Datacenter Load-balancing with Beamer. In 15th USENIX Symposium on Networked Systems Design and Implementation (NSDI 18). USENIX Association, Renton, WA, 125–139. https://www.usenix.org/conference/nsdi18/presentation/olteanu

  58. [58]

    OpenAI. 2023. GPT-4 Technical Report. arXiv preprint (2023). arXiv:2303.08774 [cs.CL]

  59. [59]

    Parveen Patel, Deepak Bansal, Lihua Yuan, Ashwin Murthy, Albert Greenberg, David A. Maltz, Randy Kern, Hemant Kumar, Marios Zikos, Hongyu Wu, Changhoon Kim, and Naveen Karri. 2013. Ananta: cloud scale load balancing. In Proceedings of the ACM SIGCOMM 2013 Conference on SIGCOMM (Hong Kong, China) (SIGCOMM ’13). Association for Computing Machinery, New York,...

  60. [60]

    Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, Jake Vanderplas, Alexandre Passos, David Cournapeau, Matthieu Brucher, Matthieu Perrot, and Édouard Duchesnay. 2011. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 12, nul...

  61. [61]

    Yanghua Peng, Yixin Bao, Yangrui Chen, Chuan Wu, and Chuanxiong Guo. 2018. Optimus: an efficient dynamic resource scheduler for deep learning clusters. In Proceedings of the Thirteenth EuroSys Conference (Porto, Portugal) (EuroSys ’18). Association for Computing Machinery, New York, NY, USA, Article 3, 14 pages. doi:10.1145/3190508.3190517

  62. [62]

    Liudmila Prokhorenkova, Gleb Gusev, Aleksandr Vorobev, Anna Veronika Dorogush, and Andrey Gulin. 2018. CatBoost: unbiased boosting with categorical features. In Proceedings of the 32nd International Conference on Neural Information Processing Systems (Montréal, Canada) (NIPS ’18). Curran Associates Inc., Red Hook, NY, USA, 6639–6649

  63. [63]

    Aurick Qiao, Sang Keun Choe, Suhas Jayaram Subramanya, Willie Neiswanger, Qirong Ho, Hao Zhang, Gregory R. Ganger, and Eric P. Xing. 2021. Pollux: Co-adaptive Cluster Scheduling for Goodput-Optimized Deep Learning. In 15th USENIX Symposium on Operating Systems Design and Implementation (OSDI 21). USENIX Association, 1–18. https://www.usenix.org/conference/...

  64. [64]

    Haoran Qiu, Subho S. Banerjee, Saurabh Jha, Zbigniew T. Kalbarczyk, and Ravishankar K. Iyer. 2020. FIRM: An Intelligent Fine-grained Resource Management Framework for SLO-Oriented Microservices. In 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20). USENIX Association, 805–825. https://www.usenix.org/conference/osdi20/pre...

  65. [65]

    Benjamin Reidys, Jinghan Sun, Anirudh Badam, Shadi Noghabi, and Jian Huang. 2022. BlockFlex: Enabling Storage Harvesting with Software-Defined Flash in Modern Cloud Platforms. In 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI 22). USENIX Association, Carlsbad, CA, 17–33. https://www.usenix.org/conference/osdi22/presentation/reidys

  66. [66]

    Benjamin Reidys, Pantea Zardoshti, Íñigo Goiri, Celine Irvene, Daniel S. Berger, Haoran Ma, Kapil Arya, Eli Cortez, Taylor Stark, Eugene Bak, Mehmet Iyigun, Stanko Novakovic, Lisa Hsu, Karel Trueba, Abhisek Pan, Chetan Bansal, Saravan Rajmohan, Jian Huang, and Ricardo Bianchini. 2025. Coach: Exploiting Temporal Patterns for All-Resource Oversubscription i...

  67. [67]

    Victor Sanh, Lysandre Debut, Julien Chaumond, and Thomas Wolf

  68. [68]

    DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv:1910.01108 [cs.CL] https://arxiv.org/abs/1910.01108

  69. [69]

    Steven S. Seiden. 2002. On the online bin packing problem. J. ACM 49, 5 (Sept. 2002), 640–671. doi:10.1145/585265.585269

  70. [70]

    Bin Shi, Haiying Shen, Bo Dong, and Qinghua Zheng. 2022. Memory/Disk Operation Aware Lightweight VM Live Migration. IEEE/ACM Trans. Netw. 30, 4 (March 2022), 1895–1910. doi:10.1109/TNET.2022.3155935

  71. [71]

    Jiuchen Shi, Kaihua Fu, Quan Chen, Changpeng Yang, Pengfei Huang, Mosong Zhou, Jieru Zhao, Chen Chen, and Minyi Guo. 2022. Characterizing and orchestrating VM reservation in geo-distributed clouds to improve the resource efficiency. In Proceedings of the 13th Symposium on Cloud Computing (San Francisco, California) (SoCC ’22). Association for Computing M...

  72. [72]

    Junyi Shu, Kun Qian, Ennan Zhai, Xuanzhe Liu, and Xin Jin. 2024. Burstable Cloud Block Storage with Data Processing Units. In 18th USENIX Symposium on Operating Systems Design and Implementation (OSDI 24). USENIX Association, Santa Clara, CA, 783–799. https://www.usenix.org/conference/osdi24/presentation/shu

  73. [73]

    Jovan Stojkovic, Chaojie Zhang, Íñigo Goiri, Esha Choukse, Haoran Qiu, Rodrigo Fonseca, Josep Torrellas, and Ricardo Bianchini. 2025. TAPAS: Thermal- and Power-Aware Scheduling for LLM Inference in Cloud Platforms. Association for Computing Machinery, New York, NY, USA, 1266–1281. https://doi.org/10.1145/3676641.3716025

  74. [74]

    Jinghan Sun, Benjamin Reidys, Daixuan Li, Jichuan Chang, Marc Snir, and Jian Huang. 2025. FleetIO: Managing Multi-Tenant Cloud Storage with Multi-Agent Reinforcement Learning. In Proceedings of the 30th ACM International Conference on Architectural Support for P...

  75. [75]

    Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, and Denny Zhou. 2020. MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Dan Jurafsky, Joyce Chai, Natalie Schluter, and Joel Tetreault (Eds.). Association for Computational Lingui...

  76. [76]

    Difan Tan, Jiawei Li, Hua Wang, Xiaoxiao Li, Wenbo Liu, Zijin Qin, Ke Zhou, Ming Xie, and Mengling Tao. 2025. Tela: A Temporal Load-Aware Cloud Virtual Disk Placement Scheme. In Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1 (Rotterdam, Netherlands) (ASPLOS ’25). Asso...

  77. [77]

    Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, et al. 2023. Llama 2: Open Foundation and Fine-Tuned Chat Models. arXiv preprint (2023). arXiv:2307.09288 [cs.CL]

  78. [78]

    Akshat Verma, Puneet Ahuja, and Anindya Neogi. 2008. pMapper: Power and Migration Cost Aware Application Placement in Virtualized Systems. In Middleware 2008, Valérie Issarny and Richard Schantz (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 243–264

  79. [79]

    Hua Wang, Yang Yang, Ping Huang, Yu Zhang, Ke Zhou, Mengling Tao, and Bin Cheng. 2020. S-CDA: A Smart Cloud Disk Allocation Approach in Cloud Block Storage System. In 2020 57th ACM/IEEE Design Automation Conference (DAC). 1–6. doi:10.1109/DAC18072.2020.9218702

  80. [80]

    Lu Wang, Mayukh Das, Fangkai Yang, Bo Qiao, Hang Dong, Si Qin, Victor Rühle, Chetan Bansal, Eli Cortez, Íñigo Goiri, Saravan Rajmohan, Qingwei Lin, and Dongmei Zhang. 2025. ProtoRAIL: A Risk-cognizant Imitation Agent for Adaptive vCPU Oversubscription In the Cloud. In Eighth Conference on Machine Learning and Systems. https://openreview.net/forum?id=Dt8s7CIsEu
