TIDAL: Recovering Temporal Phase for Cloud Block Storage Placement from LLM-Derived Semantics
Pith reviewed 2026-05-20 00:19 UTC · model grok-4.3
The pith
TIDAL recovers temporal phases from LLM-derived semantics in provisioning names to enable complementary placement of cloud virtual disks and cut overloads.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
TIDAL is a CVD placement framework that recovers phase-aware temporal signals for cold-start placement from tenant-provided names and identifiers by first using LLMs to extract application semantics from noisy metadata and then translating those semantics into temporal phase estimates that guide complementary placement across pods.
What carries the argument
LLM-driven semantic recovery from provisioning metadata, translated into phase-aware temporal signals, with an offline-to-online teacher-student distillation and prefix-aware caching pipeline that enables millisecond CPU-only inference.
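A minimal sketch of what prefix-aware caching could look like, under stated assumptions. The page does not say how TIDAL defines a prefix, so `name_prefix`, `PrefixCache`, and `toy_classifier` are hypothetical names, and the keyword rule stands in for the distilled student model rather than reproducing it.

```python
import re
from typing import Callable, Dict

# Assumption: a "prefix" is the name with a trailing numeric suffix removed,
# so "web-frontend-01" and "web-frontend-02" share the key "web-frontend".
_SUFFIX = re.compile(r"[-_.]?\d+$")


def name_prefix(name: str) -> str:
    """Return a lower-cased cache key with any trailing digits stripped."""
    return _SUFFIX.sub("", name.strip().lower())


class PrefixCache:
    """Memoise a classifier on the normalised prefix of a disk name."""

    def __init__(self, classify: Callable[[str], str]) -> None:
        self._classify = classify
        self._store: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    def lookup(self, name: str) -> str:
        key = name_prefix(name)
        if key in self._store:
            self.hits += 1
        else:
            # Only a cache miss pays for a classifier call.
            self.misses += 1
            self._store[key] = self._classify(key)
        return self._store[key]


def toy_classifier(prefix: str) -> str:
    """Hypothetical stand-in for the student model: a plain keyword rule."""
    return "batch" if "etl" in prefix or "backup" in prefix else "interactive"
```

Names that differ only in a numeric suffix reach the classifier once, which is the property that would let repeated provisioning requests skip inference.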
If this is right
- Complementary placement becomes feasible at provisioning time even for disks that have no prior runtime history.
- Pods experience fewer transient congestion events while still satisfying spatial balance constraints.
- Control-plane latency requirements are met through distillation and caching so that the method can run online.
- Resource efficiency and performance isolation both improve because peak alignment is reduced without extra hardware.
Where Pith is reading between the lines
- Similar metadata-driven semantic extraction could apply to other cold-start decisions such as VM or container scheduling.
- The reliance on names raises questions about how changes in tenant naming conventions would affect accuracy over time.
- If the correlation between semantics and phases proves stable, the same signals might support predictive auto-scaling rather than only placement.
Load-bearing premise
Tenant-provided names and identifiers contain recoverable semantic information that correlates reliably with the actual temporal load phases of the underlying applications.
What would settle it
Production traces in which the phases inferred by the LLM from names show no better correlation with observed load peaks than random assignment would falsify the central claim.
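One way to make this test concrete is a permutation test, sketched below. It assumes each disk has an inferred peak slot and an observed peak slot, both encoded as integers; the function names and the exact-match score are illustrative choices, not taken from the paper.

```python
import random
from typing import Sequence


def match_rate(predicted: Sequence[int], observed: Sequence[int]) -> float:
    """Fraction of disks whose inferred slot equals the observed peak slot."""
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)


def permutation_p_value(
    predicted: Sequence[int],
    observed: Sequence[int],
    trials: int = 2000,
    seed: int = 0,
) -> float:
    """Estimate how often a random reassignment matches at least as well."""
    rng = random.Random(seed)
    actual = match_rate(predicted, observed)
    shuffled = list(predicted)
    at_least = 0
    for _ in range(trials):
        # Shuffling breaks any link between a disk and its inferred slot.
        rng.shuffle(shuffled)
        if match_rate(shuffled, observed) >= actual:
            at_least += 1
    # Add-one smoothing keeps the estimate strictly positive.
    return (at_least + 1) / (trials + 1)
```

A large p-value would mean the inferred phases are indistinguishable from random assignment, which is the outcome described above as falsifying the central claim.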
Original abstract
Cloud Virtual Disk (CVD) placement in Cloud Block Storage (CBS) is critical for resource efficiency and performance isolation. Existing schemes prioritize spatial load balancing by dispersing disks across pods based on configuration-derived load estimates. However, overload risk in CBS is fundamentally temporal. Even when average load is balanced, pods can still suffer transient congestion when the peaks of co-located disks align in time. Achieving complementary placement, which co-locates CVDs with offset peaks, is hard at provisioning time because new disks have no history from which to infer temporal phase. We present TIDAL, a CVD placement framework that recovers phase-aware signals for cold-start placement from an underused source: tenant-provided names and identifiers in provisioning metadata. TIDAL first uses LLMs to recover application semantics from noisy metadata such as project, VM, and disk names. It then translates these semantics into phase-aware temporal signals to guide complementary placement. To satisfy control-plane constraints, TIDAL adopts an offline-to-online design with teacher-student distillation, regex-based filtering, and prefix-aware caching, enabling CPU-only inference with millisecond-level latency. Evaluations driven by production traces show that TIDAL reduces overload frequency by 79.1% and P95 overload duration by 73.7% compared with the strongest baselines.
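The idea of complementary placement can be illustrated with a small greedy sketch. It assumes each disk and pod carries a per-slot load profile and that the placer picks the pod with the lowest resulting peak. The abstract does not describe TIDAL's actual placement algorithm, so `place`, `peak_after`, and `profile` are hypothetical; `K = 12` follows the two-hour slot setting quoted from the paper elsewhere on this page.

```python
from typing import List

K = 12  # time slots per day (two-hour slots)


def peak_after(pod_load: List[float], disk_load: List[float]) -> float:
    """Peak per-slot load of a pod if the disk were added to it."""
    return max(p + d for p, d in zip(pod_load, disk_load))


def place(pods: List[List[float]], disk_load: List[float]) -> int:
    """Greedy rule: pick the pod with the lowest resulting peak, then update it."""
    best = min(range(len(pods)), key=lambda i: peak_after(pods[i], disk_load))
    pods[best] = [p + d for p, d in zip(pods[best], disk_load)]
    return best


def profile(peak_slot: int, height: float = 1.0, base: float = 0.1) -> List[float]:
    """Toy load profile: a flat baseline with one elevated slot."""
    return [height if s == peak_slot else base for s in range(K)]
```

With two pods peaking at slots 3 and 9, a new disk that peaks at slot 3 goes to the second pod, because stacking it on the first would align the peaks.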
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents TIDAL, a CVD placement framework for cloud block storage that recovers phase-aware temporal signals from tenant-provided metadata names and identifiers using LLMs. It translates these semantics into complementary placement decisions to offset load peaks and reduce transient overloads, employing an offline teacher-student distillation design with regex filtering and prefix caching for low-latency CPU inference. Evaluations on production traces report 79.1% reduction in overload frequency and 73.7% reduction in P95 overload duration relative to the strongest baselines.
Significance. If the results hold, TIDAL offers a practical way to incorporate temporal phase information into cold-start placement where history is unavailable, addressing a gap in existing spatial load-balancing schemes. The offline-to-online architecture with distillation and caching is a strength for meeting control-plane constraints. This could improve resource efficiency and isolation in CBS deployments if the LLM-derived signals prove reliable across workloads.
major comments (2)
- [§4] §4, Evaluation: The headline reductions (79.1% overload frequency, 73.7% P95 duration) are measured against external baselines on production traces, but the section provides insufficient detail on LLM prompting strategy, the precise phase derivation method from semantics, baseline definitions, or controls for LLM output variability; these elements are load-bearing for assessing whether the gains stem from the claimed semantic recovery.
- [§3.2] §3.2, Semantic-to-Phase Translation: The mapping from LLM-extracted application semantics to temporal phase signals relies on an assumed correlation between tenant metadata and actual load phases; without an ablation isolating this component or explicit validation of the correlation on the traces, the central claim that metadata semantics enable reliable complementary placement remains difficult to verify.
minor comments (2)
- [§3.3] The description of prefix-aware caching in §3.3 could include concrete latency measurements or cache hit rates to better substantiate the millisecond-level inference claim.
- [Figure 3] Figure 3 (placement comparison) would benefit from error bars or statistical significance tests on the overload metrics to strengthen the quantitative comparison.
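As an illustration of the error bars requested for Figure 3, the sketch below computes a percentile bootstrap confidence interval for a mean. The input would be per-pod or per-run overload measurements, and `bootstrap_ci` is a hypothetical helper. The percentile bootstrap is one common choice and is not something the paper is reported to use.

```python
import random
from typing import List, Tuple


def bootstrap_ci(
    values: List[float],
    reps: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(sum(rng.choices(values, k=n)) / n for _ in range(reps))
    lower = means[int((alpha / 2) * reps)]
    upper = means[int((1 - alpha / 2) * reps) - 1]
    return lower, upper
```

Non-overlapping intervals for TIDAL and a baseline would give readers a quick visual check, although a paired test on the same traces would be a stronger comparison.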
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We respond to each major comment below and indicate the revisions we will make to address the concerns raised.
- Referee: [§4] §4, Evaluation: The headline reductions (79.1% overload frequency, 73.7% P95 duration) are measured against external baselines on production traces, but the section provides insufficient detail on LLM prompting strategy, the precise phase derivation method from semantics, baseline definitions, or controls for LLM output variability; these elements are load-bearing for assessing whether the gains stem from the claimed semantic recovery.
  Authors: We agree that the evaluation section would benefit from greater transparency on these points to allow independent assessment of the results. In the revised version we will expand §4 to include the exact prompting templates and system instructions provided to the LLM, the rule-based procedure that converts extracted semantic categories into phase offset signals, the precise configurations and parameter settings used for each baseline, and additional runs that quantify sensitivity to LLM output stochasticity. These additions will make explicit how the reported reductions are tied to the semantic recovery mechanism.
  Revision: yes
- Referee: [§3.2] §3.2, Semantic-to-Phase Translation: The mapping from LLM-extracted application semantics to temporal phase signals relies on an assumed correlation between tenant metadata and actual load phases; without an ablation isolating this component or explicit validation of the correlation on the traces, the central claim that metadata semantics enable reliable complementary placement remains difficult to verify.
  Authors: The current manuscript motivates the mapping from domain knowledge of typical cloud workload patterns (e.g., distinguishing interactive services from batch jobs) and shows that the resulting placement decisions improve outcomes on production traces. We acknowledge that an isolated ablation of the translation step and direct statistical validation of the metadata-to-phase correlation are not presented. In revision we will add a dedicated paragraph in §3.2 that spells out the mapping rules with examples and, using the available trace metadata, report a targeted comparison of placement quality with and without the phase signals to provide the requested validation.
  Revision: partial
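The first response promises additional runs that quantify sensitivity to LLM output stochasticity. A simple way to report that is the share of repeated runs that agree on the majority label for each name, sketched below. The helper names, the example labels, and the 0.8 threshold are assumptions for illustration only.

```python
from collections import Counter
from typing import Dict, List


def majority_agreement(runs: List[str]) -> float:
    """Share of repeated runs that return the most common label."""
    return Counter(runs).most_common(1)[0][1] / len(runs)


def unstable_names(
    labels_by_name: Dict[str, List[str]],
    threshold: float = 0.8,
) -> List[str]:
    """Names whose majority label falls below the agreement threshold."""
    return sorted(
        name
        for name, runs in labels_by_name.items()
        if majority_agreement(runs) < threshold
    )
```

Reporting the distribution of this agreement score over all names, alongside the headline overload metrics, would show how much of the gain survives run-to-run variation.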
Circularity Check
No significant circularity; derivation is self-contained
The paper derives phase-aware placement signals by applying LLMs to tenant-provided metadata names and identifiers, then using the resulting semantics for complementary CVD placement. This chain operates on external inputs (provisioning metadata) and is evaluated via direct measurement on production traces against independent baselines, with no fitted parameters renamed as predictions, no self-definitional loops in the equations, and no load-bearing self-citations or ansatz smuggling. The offline teacher-student distillation and caching are implementation details that do not reduce the core claim to its own outputs by construction.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: LLMs can recover application semantics from noisy tenant-provided names and identifiers that correlate with temporal load phases.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel
  Relation between the paper passage and the cited Recognition theorem: unclear
  Paper passage: TIDAL first uses LLMs to recover application semantics from noisy metadata such as project, VM, and disk names. It then translates these semantics into phase-aware temporal signals to guide complementary placement.
- IndisputableMonolith/Foundation/DimensionForcing.lean: alexander_duality_circle_linking
  Relation between the paper passage and the cited Recognition theorem: unclear
  Paper passage: We set the temporal resolution to K=12 (2-hour time slots).
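For readers unfamiliar with the slot convention, the quoted setting K=12 divides a 24-hour day into two-hour slots. The sketch below shows the arithmetic. The helpers `slot_of` and `circular_distance` are hypothetical, and treating phase distance as circular is an assumption based on the daily cycle, not a detail confirmed by the page.

```python
K = 12                 # slots per day, as quoted from the paper
SLOT_HOURS = 24 // K   # two hours per slot


def slot_of(hour: int) -> int:
    """Map an hour of day (0-23) to its slot index (0 to K-1)."""
    return (hour % 24) // SLOT_HOURS


def circular_distance(a: int, b: int) -> int:
    """Slot distance on a daily cycle, so slots 0 and K-1 are neighbours."""
    d = abs(a - b) % K
    return min(d, K - d)
```

Under this convention the largest possible offset between two peaks is six slots, or twelve hours.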
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.