SemaTune: Semantic-Aware Online OS Tuning with Large Language Models
Pith reviewed 2026-05-15 02:33 UTC · model grok-4.3
The pith
SemaTune uses language models to reason over OS knob meanings and tuning history, improving stable-phase performance by 72.5 percent over defaults across 13 workloads.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
SemaTune shows that bounded language-model guidance, combined with typed validation and dual-loop control, turns OS tuning into a semantically aware process that improves stable-phase performance by 72.5 percent over defaults and 153.3 percent over the strongest non-LLM baseline on 13 live workloads while tuning up to 41 parameters. The same controller still outperforms direct-application-objective baselines by 93.7 percentage points when restricted to host-level metrics alone and avoids the severe degraded regions reached by black-box exploration.
What carries the argument
Dual-loop controller that packs knob schemas, telemetry, configuration, history, and retrieved runs into a compact context for an LLM, then validates every proposed change before kernel or sysctl application.
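As we read it, "validates every proposed change" amounts to per-knob typed checks before anything touches sysctl. A minimal sketch of that gate, with hypothetical schema entries (the paper does not publish its schema table):

```python
from dataclasses import dataclass

@dataclass
class KnobSchema:
    """Type and range metadata for one tunable parameter (hypothetical entries)."""
    name: str   # sysctl-style path, e.g. "vm.swappiness"
    type_: type # expected value type
    lo: int     # inclusive lower bound
    hi: int     # inclusive upper bound

# Hypothetical schema table; the paper tunes up to 41 such parameters.
SCHEMAS = {
    "vm.swappiness": KnobSchema("vm.swappiness", int, 0, 100),
    "kernel.sched_latency_ns": KnobSchema("kernel.sched_latency_ns", int, 100_000, 1_000_000_000),
}

def validate(name: str, value) -> bool:
    """Typed validation: reject unknown knobs, wrong types, out-of-range values."""
    schema = SCHEMAS.get(name)
    if schema is None:
        return False
    if not isinstance(value, schema.type_) or isinstance(value, bool):
        return False
    return schema.lo <= value <= schema.hi

# An LLM proposal reaches sysctl only if it passes validation.
assert validate("vm.swappiness", 60)
assert not validate("vm.swappiness", 200)   # out of range
assert not validate("net.unknown.knob", 1)  # not in schema
```

The check constrains the model's authority per knob; as the rebuttal below notes, it does not reason about combinations of knobs.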
If this is right
- Tuning decisions can now incorporate cross-knob policy structure and indirect performance signals instead of scalar rewards alone.
- Host-level controllers become viable for services that do not expose application metrics.
- Exploration can be constrained to prevent entry into degraded states that continue after the bad setting is removed.
- Model cost stays low, around 20 cents for a 30-window session, while still outperforming structure-blind methods.
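The cost bullet follows from simple arithmetic on the paper's numbers; the per-token price below is a hypothetical blend we introduce for illustration, not a figure from the paper:

```python
# Back-of-envelope on the reported model cost: about $0.20 for a 30-window session.
session_cost = 0.20
windows = 30
per_window = session_cost / windows
print(f"~${per_window:.4f} per decision window")

# Under a hypothetical blended price of $2 per million tokens, that budget
# covers roughly this many tokens of context plus output per window:
price_per_token = 2 / 1_000_000
tokens_per_window = per_window / price_per_token
print(f"~{tokens_per_window:,.0f} tokens per window")
```

At that scale, keeping the decision context compact is what keeps the controller cheap enough to run always-on.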
Where Pith is reading between the lines
- The validation layer could be extended to other system interfaces such as network or storage stacks where semantic constraints are similarly available.
- History retrieval might allow the slower loop to detect workload phase changes and switch strategies without additional human input.
- Combining the semantic proposals with lightweight local models could reduce latency further while preserving the safety guarantees of typed checks.
Load-bearing premise
The language model will generate changes that improve or at least maintain performance after typed validation, even when only host-level metrics are available.
What would settle it
A workload where SemaTune, after validation, enters a degraded performance region that persists longer or more severely than the strongest non-LLM baseline under identical host-metric inputs.
Original abstract
Online OS tuning can improve long-running services, but existing controllers are poorly matched to live hosts. They treat scheduler, power, memory, and I/O controls as black-box variables and optimize a scalar reward. This view ignores cross-knob policy structure, breaks down when application metrics are unavailable, and can send a running service into degraded regions that persist after the bad setting is removed. We present SemaTune, a host-side framework for steady-state OS tuning with bounded language-model guidance. SemaTune turns knob schemas, telemetry, current configuration, recent action–response history, and retrieved prior runs into a compact decision context. A fast loop proposes low-latency updates, a slower loop periodically revises the search strategy, and every proposed change passes through typed validation before reaching kernel or sysctl interfaces. This lets the controller reason about OS-control meaning and indirect performance signals while keeping model cost, latency, and authority constrained. We evaluate SemaTune on 13 live workloads from five benchmark suites while tuning up to 41 Linux parameters. Across the suite, SemaTune improves stable-phase performance by 72.5% over default settings and by 153.3% relative to the strongest non-LLM baseline. A 30-window session costs about $0.20 in model calls. With only host-level metrics, SemaTune still outperforms baselines given direct application objectives by 93.7 percentage points, while avoiding severe degraded regions reached by structure-blind exploration.
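The abstract's "compact decision context" can be pictured as a bounded payload built from the five named inputs; every field name and value below is our own illustration, not the paper's actual encoding:

```python
import json

def build_context(schemas, telemetry, config, history, retrieved, max_history=5):
    """Pack the five inputs named in the abstract into one compact JSON context.

    Field names are illustrative; the paper does not show its prompt format.
    History is truncated so the context stays small and cheap per model call.
    """
    return json.dumps({
        "knob_schemas": schemas,                   # types, ranges, descriptions
        "telemetry": telemetry,                    # host-level metrics snapshot
        "current_config": config,                  # knob -> current value
        "recent_history": history[-max_history:],  # bounded action-response log
        "retrieved_runs": retrieved,               # similar prior sessions
    }, separators=(",", ":"))

ctx = build_context(
    schemas={"vm.swappiness": {"type": "int", "range": [0, 100]}},
    telemetry={"cpu_util": 0.82, "run_queue_len": 3.1},
    config={"vm.swappiness": 60},
    history=[{"action": {"vm.swappiness": 30}, "response": {"cpu_util": 0.79}}],
    retrieved=[],
)
assert len(ctx) < 2000  # compactness is the point: keep the per-window prompt small
```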
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces SemaTune, a host-side framework for online OS tuning of Linux parameters that incorporates LLM guidance informed by knob schemas, telemetry, current configuration, action-response history, and retrieved prior runs. It uses a fast proposal loop and a slower strategy-revision loop, with all changes passing typed validation before application. Evaluated on 13 live workloads from five benchmark suites while tuning up to 41 parameters, SemaTune reports 72.5% stable-phase improvement over defaults and 153.3% over the strongest non-LLM baseline, at low model cost, while claiming to avoid persistent degraded regions even with only host-level metrics.
Significance. If the central claims hold under rigorous validation, the work would represent a meaningful advance in practical online systems tuning by demonstrating how constrained LLM reasoning over semantic and historical context can outperform black-box controllers, particularly in settings without direct application metrics. The bounded-cost design and explicit handling of cross-knob structure address known failure modes of prior methods.
major comments (2)
- [Evaluation] Evaluation section: the reported 72.5% and 153.3% stable-phase gains are presented without per-workload traces, post-tuning monitoring beyond the 30-window sessions, or statistical tests confirming absence of regression after the tuning window closes; this leaves the claim that semantic context reliably prevents persistent degradation unverified.
- [System Design and Evaluation] The typed-validation mechanism is described as checking schemas and interfaces, yet no analysis or experiments demonstrate that it catches emergent cross-knob interactions (e.g., scheduler-memory-I/O combinations producing sustained high latency); the abstract notes that structure-blind methods reach such regions, but the evaluation provides no concrete evidence that SemaTune avoids them.
minor comments (2)
- The abstract and evaluation could more explicitly list the 41 Linux parameters and the five benchmark suites to improve reproducibility.
- [System Design] Notation for the fast and slow loops is introduced without a compact diagram or pseudocode, making the control flow harder to follow on first reading.
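For readers who share the second minor comment, a minimal sketch of the two-loop control flow as we read it from the abstract; the stub model, intervals, and knob values are our own, not the paper's:

```python
def run_session(llm, windows=6, revise_every=3):
    """Dual-loop schedule: a fast proposal each window, a strategy revision
    every `revise_every` windows. Names and intervals are illustrative."""
    strategy = "conservative"
    applied = []
    for w in range(windows):
        context = {"window": w, "strategy": strategy}  # stands in for telemetry etc.
        proposal = llm.propose(context)                # fast loop: low-latency update
        if llm.validate(proposal):                     # typed gate before sysctl
            applied.append(proposal)
        if (w + 1) % revise_every == 0:                # slow loop: revise strategy
            strategy = llm.revise(context)
    return applied, strategy

class StubLLM:
    """Deterministic stand-in; SemaTune calls a real language model here."""
    def propose(self, ctx):
        return {"vm.swappiness": 10 * ctx["window"]}
    def validate(self, proposal):
        return all(0 <= v <= 100 for v in proposal.values())
    def revise(self, ctx):
        return "aggressive"

applied, strategy = run_session(StubLLM())
assert len(applied) == 6 and strategy == "aggressive"
```

The separation lets the fast loop stay cheap while the slow loop absorbs the expensive reasoning about where to search next.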
Simulated Author's Rebuttal
Thank you for the constructive feedback. We address each major comment below with targeted revisions to strengthen the evaluation and clarify the design.
Point-by-point responses
-
Referee: [Evaluation] Evaluation section: the reported 72.5% and 153.3% stable-phase gains are presented without per-workload traces, post-tuning monitoring beyond the 30-window sessions, or statistical tests confirming absence of regression after the tuning window closes; this leaves the claim that semantic context reliably prevents persistent degradation unverified.
Authors: We agree that additional per-workload detail and statistical support would strengthen the claims. In the revised manuscript we will add an appendix with per-workload performance traces for all 13 workloads and include statistical tests (paired t-tests with p-values) on the stable-phase improvements. The 30-window sessions define the evaluation window, with stable phase measured in the final windows; we did not collect extended post-session monitoring data. The avoidance of persistent degradation is evidenced by the absence of the regressions observed in baselines during these sessions, but we will add an explicit limitations paragraph noting that longer-term post-tuning monitoring remains future work. revision: partial
-
Referee: [System Design and Evaluation] The typed-validation mechanism is described as checking schemas and interfaces, yet no analysis or experiments demonstrate that it catches emergent cross-knob interactions (e.g., scheduler-memory-I/O combinations producing sustained high latency); the abstract notes that structure-blind methods reach such regions, but the evaluation provides no concrete evidence that SemaTune avoids them.
Authors: Typed validation performs schema conformance and interface compatibility checks to reject syntactically invalid settings, but does not model or detect emergent cross-knob interactions at runtime. Avoidance of degraded regions is achieved by the LLM's semantic reasoning over knob schemas, telemetry, action history, and retrieved runs in both the fast proposal and strategy-revision loops. The evaluation shows SemaTune outperforming structure-blind baselines without entering severe degradation, yet we do not isolate a specific cross-knob failure case. In revision we will clarify this distinction in Section 3 and add a qualitative example illustrating how semantic context steers away from a known harmful scheduler-memory-I/O combination. revision: partial
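The distinction the authors draw can be made concrete: each knob below passes its per-knob range check, yet the pair is semantically inconsistent. The sysctl names are real, but treating this pair as the illustrative failure mode is our example, not the paper's:

```python
# Each change passes a typed per-knob check, but the combination is inconsistent.
SCHEMAS = {
    "vm.dirty_ratio":            (0, 100),
    "vm.dirty_background_ratio": (0, 100),
}

def typed_ok(change):
    """Per-knob range check only; no cross-knob reasoning."""
    return all(lo <= v <= hi
               for k, v in change.items()
               for lo, hi in [SCHEMAS[k]])

change = {"vm.dirty_ratio": 5, "vm.dirty_background_ratio": 90}
assert typed_ok(change)  # passes typed validation...
# ...yet a background-writeback threshold above the foreground threshold is
# incoherent as a pair. Catching this needs cross-knob reasoning, which
# SemaTune delegates to the model's semantic context, not the type checker.
```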
Circularity Check
No significant circularity in derivation chain
full rationale
The paper describes an empirical system for LLM-guided OS tuning and reports concrete performance gains from live workload experiments against external baselines and defaults. No equations, fitted parameters, self-citations, or ansatzes are invoked as load-bearing steps in any derivation; the central claims rest on measured improvements (72.5% and 153.3%) rather than reductions to inputs by construction. The evaluation is self-contained against independent benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Large language models possess sufficient semantic understanding of OS controls and telemetry to propose effective tuning actions.