LiveStack: OS Support for Cluster-Scale Full-Stack Live Simulation

Antoine Kaufmann; Haifeng Sun; Jialin Li; Jonas Kaufmann; Yihan Yang; Yiliang Wan

arxiv: 2606.18958 · v1 · pith:GN4JVX4L · submitted 2026-06-17 · 💻 cs.DC · cs.OS

Pith reviewed 2026-06-26 19:11 UTC · model grok-4.3

classification 💻 cs.DC cs.OS

keywords: full-stack simulation · cluster-scale simulation · OS virtualization · live simulation · distributed systems · simulation orchestration · Linux kernel extensions

The pith

LiveStack extends Linux virtualization with four subsystems to run unmodified production stacks in cluster-scale simulation while preserving both fidelity and iterative speed.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to demonstrate that cluster-scale full-stack simulation can deliver both complete fidelity for unmodified production software and the performance needed to explore many configurations quickly. No prior method has combined the two at this scale. LiveStack achieves the combination by layering four new mechanisms on the existing Linux virtualization stack to keep live and modeled parts synchronized under one simulated clock while limiting their mutual interference. If the approach holds, developers could test entire distributed systems and new hardware designs on real code before any physical hardware exists. The work frames simulation management itself as a natural operating-system duty rather than an external tool.

Core claim

LiveStack is an OS-level approach to cluster-scale full-stack simulation built on top of the Linux virtualization stack. LiveStack comprises four subsystems: simulation-oriented scheduling, live memory hierarchy management, simulation-aware IPC, and distributed simulation orchestration. Together, they coordinate live and modeled components under shared simulated time while controlling interference among co-located live hosts. These mechanisms point toward simulation-native OS support, where simulation control and orchestration become core OS responsibilities.

What carries the argument

Four coordinated subsystems (simulation-oriented scheduling, live memory hierarchy management, simulation-aware IPC, and distributed simulation orchestration) that keep live and modeled components synchronized under a single simulated timeline.

If this is right

Unmodified production software stacks can be evaluated at cluster scale with full fidelity.
Iterative exploration of hardware and software configurations becomes feasible at usable speeds.
Interference among multiple live hosts sharing simulation resources remains controllable.
Simulation orchestration can be treated as a native operating-system service rather than an external layer.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same coordination pattern could be adapted to other virtualization or container runtimes beyond Linux.
Shared simulated time might simplify debugging of timing-sensitive distributed bugs that are hard to reproduce on real clusters.
Once simulation is inside the OS, hardware-in-the-loop experiments could be scheduled alongside ordinary workloads without separate toolchains.

Load-bearing premise

The Linux virtualization stack can be extended with the four described mechanisms without unacceptable interference or loss of fidelity when live and simulated components share the same resources.

What would settle it

A direct comparison in which an unmodified production distributed application running under LiveStack exhibits either fidelity loss relative to bare hardware or simulation throughput too low for iterative configuration sweeps would disprove the central claim.
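The two failure conditions named above can be written as a small check. This is a minimal sketch, not anything taken from the paper: the function names, the choice of relative error as the fidelity measure, wall-clock seconds per simulated second as the throughput measure, and both thresholds are assumptions made for illustration.

```python
def relative_error(simulated, reference):
    """Fidelity gap: how far a metric measured under simulation is from bare hardware."""
    return abs(simulated - reference) / reference


def slowdown(wall_seconds, simulated_seconds):
    """Wall-clock seconds spent per second of simulated time."""
    return wall_seconds / simulated_seconds


def claim_survives(sim_metric, ref_metric, wall_s, sim_s, max_error, max_slowdown):
    """True only if BOTH conditions hold: fidelity within max_error and
    slowdown within max_slowdown. Both thresholds are hypothetical and
    would have to be chosen by whoever runs the comparison."""
    fidelity_ok = relative_error(sim_metric, ref_metric) <= max_error
    speed_ok = slowdown(wall_s, sim_s) <= max_slowdown
    return fidelity_ok and speed_ok
```

Either condition failing on its own is enough to count against the central claim, which is why the check is a conjunction rather than a weighted score.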

Figures

Figures reproduced from arXiv: 2606.18958 by Antoine Kaufmann, Haifeng Sun, Jialin Li, Jonas Kaufmann, Yihan Yang, Yiliang Wan.

**Figure 1.** The overall architecture of LiveStack. … four subsystems: simulation-oriented scheduling for virtual-time coordination, live memory hierarchy management for performance isolation, simulation-aware IPC (Inter-Process Communication) for visibility-controlled communication, and distributed orchestration for multi-host scale-out. This allows LiveStack to preserve full-stack fidelity while avoiding the fine-grai…
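One possible reading of "visibility-controlled communication" is that a message becomes visible to its receiver only once the receiver's virtual time has reached the message's delivery time. The sketch below illustrates that reading only. The excerpt does not describe the actual IPC mechanism, so the class `VisibilityQueue`, its methods, and the latency parameter are all hypothetical.

```python
import heapq
import itertools


class VisibilityQueue:
    """Holds messages until the receiver's virtual time reaches their delivery time.

    Hypothetical illustration; not LiveStack's interface.
    """

    def __init__(self):
        self._heap = []
        # Sequence numbers keep FIFO order among messages with equal delivery times.
        self._seq = itertools.count()

    def send(self, send_vtime, latency, payload):
        """Queue a message sent at send_vtime with a modeled latency; return its delivery time."""
        deliver_at = send_vtime + latency
        heapq.heappush(self._heap, (deliver_at, next(self._seq), payload))
        return deliver_at

    def receive(self, receiver_vtime):
        """Return, in delivery order, every message visible at receiver_vtime."""
        visible = []
        while self._heap and self._heap[0][0] <= receiver_vtime:
            _, _, payload = heapq.heappop(self._heap)
            visible.append(payload)
        return visible
```

Under this reading, a receiver that runs ahead in wall-clock time still cannot observe a message "early" in simulated time, because visibility depends only on the virtual timestamps.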

**Figure 2.** An example LiveStack scheduling timeline. Dispatch. LiveStack coordinates vtasks through configurable synchronization scopes, each of which groups vtasks that should progress together within a bounded virtual-time skew. A vtask may participate in multiple scopes. Common-case dispatch rule. Each scope maintains a cached vtime, defined as the minimum vtime among its runnable members. During dispatch, a runna…
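The scope structure described in this caption can be sketched as follows. The caption is truncated before it states the dispatch rule, so the rule in `may_dispatch` is an assumption: a vtask is allowed to run only if it is within the skew bound of the minimum vtime in every scope it belongs to. The names `VTask`, `Scope`, `max_skew`, and `may_dispatch` are hypothetical, and the minimum is recomputed on each call for simplicity rather than cached.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare tasks by identity, not by field values
class VTask:
    name: str
    vtime: int = 0        # virtual time consumed so far (units are arbitrary here)
    runnable: bool = True


@dataclass(eq=False)
class Scope:
    name: str
    max_skew: int         # assumed bound on virtual-time skew within the scope
    members: list = field(default_factory=list)

    def cached_vtime(self):
        """Minimum vtime among runnable members, or None if no member is runnable."""
        times = [t.vtime for t in self.members if t.runnable]
        return min(times) if times else None


def may_dispatch(task, scopes):
    """Assumed rule: the task must stay within max_skew of every scope it is in."""
    if not task.runnable:
        return False
    for scope in scopes:
        if any(member is task for member in scope.members):
            base = scope.cached_vtime()
            if base is not None and task.vtime - base > scope.max_skew:
                return False
    return True
```

Because a vtask may sit in several scopes, the tightest scope decides: a task that is comfortably within one scope's bound can still be held back by another scope whose slowest runnable member lags further behind.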

Original abstract

Cluster-scale full-stack simulation is essential for evaluating distributed software stacks and emerging hardware components before deployment. Such simulation must achieve both full-stack fidelity for the unmodified production stack and the simulation performance required for iterative configuration exploration. However, no existing method achieves both. We present LiveStack, an OS-level approach to cluster-scale full-stack simulation built on top of the Linux virtualization stack. LiveStack comprises four subsystems: simulation-oriented scheduling, live memory hierarchy management, simulation-aware IPC, and distributed simulation orchestration. Together, they coordinate live and modeled components under shared simulated time while controlling interference among co-located live hosts. These mechanisms point toward simulation-native OS support, where simulation control and orchestration become core OS responsibilities.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

LiveStack proposes four OS subsystems on Linux virtualization to enable cluster-scale live simulation with both fidelity and speed, but the work is still at the architecture stage with no results shown.

LiveStack proposes four OS subsystems on top of the Linux virtualization stack to run live and modeled components together at cluster scale under shared simulated time. The subsystems are simulation-oriented scheduling, live memory hierarchy management, simulation-aware IPC, and distributed simulation orchestration, with the goal of keeping interference low while preserving fidelity for unmodified stacks.

The paper states the requirements clearly (no prior method delivers both full fidelity and the performance needed for iterative exploration) and then maps each subsystem to one part of the coordination problem. The combination is presented as new relative to the cited prior work, and treating simulation control as an OS responsibility frames the contribution directly.

The main soft spot is the absence of any implementation measurements or error analysis. The architecture description does not yet show how much interference actually occurs when live hosts share resources with simulated ones, or how close fidelity stays to the production stack. The central assumption—that these extensions can be added without unacceptable side effects—remains untested in the material available, so the practical payoff is still open.

This is the kind of paper that would interest OS and distributed-systems researchers who build or rely on large-scale simulators. A reader looking for concrete subsystem ideas to adapt or extend would get value from the breakdown even before seeing numbers.

It deserves serious peer review because the problem is real for the field and the proposal is specific enough for referees to assess feasibility and suggest the right experiments.

Referee Report

1 major / 0 minor

Summary. The manuscript presents LiveStack, an OS-level approach to cluster-scale full-stack live simulation built on the Linux virtualization stack. It comprises four subsystems—simulation-oriented scheduling, live memory hierarchy management, simulation-aware IPC, and distributed simulation orchestration—that coordinate live and modeled components under shared simulated time while controlling interference among co-located live hosts, with the goal of achieving both full-stack fidelity for unmodified production stacks and the performance needed for iterative configuration exploration.

Significance. If the subsystems deliver the claimed fidelity and performance without unacceptable interference, the work would be significant for enabling pre-deployment evaluation of distributed software stacks and emerging hardware at cluster scale. It proposes making simulation control and orchestration core OS responsibilities, addressing a gap where existing methods fail to achieve both requirements simultaneously.

major comments (1)

[Abstract] The central claim that the four subsystems achieve both full-stack fidelity and required simulation performance rests on the unverified assumption that the Linux virtualization stack can be extended with the described mechanisms without loss of fidelity or unacceptable interference; the manuscript provides no implementation details, evaluation data, or error analysis to substantiate this.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the review and the identification of this issue with the abstract. We address the comment below and will revise the manuscript accordingly.

Referee: [Abstract] The central claim that the four subsystems achieve both full-stack fidelity and required simulation performance rests on the unverified assumption that the Linux virtualization stack can be extended with the described mechanisms without loss of fidelity or unacceptable interference; the manuscript provides no implementation details, evaluation data, or error analysis to substantiate this.

Authors: We agree that the abstract states the central claim without sufficient qualification. The manuscript is a design paper whose contribution is the description of the four subsystems and their coordination under shared simulated time. It contains no implementation, no performance measurements, and no error analysis. We will revise the abstract to state that LiveStack is a proposed OS architecture whose mechanisms are designed to achieve the stated goals, with the design arguments for fidelity and interference control presented in the body; we will also add an explicit statement that empirical validation remains future work. This change will remove the unsubstantiated claim from the abstract.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity

The paper presents a systems architecture for LiveStack, an OS-level approach built on the Linux virtualization stack, comprising four described subsystems (simulation-oriented scheduling, live memory hierarchy management, simulation-aware IPC, and distributed simulation orchestration) that coordinate live and modeled components. No equations, fitted parameters, predictions, or derivation chains appear in the provided text. Claims rest on the design of these mechanisms rather than any reduction to prior fitted quantities or self-citation chains. The architecture is self-contained as a proposal for new OS responsibilities, with no load-bearing step that equates outputs to inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Abstract-only review; no free parameters, invented entities, or non-standard axioms are identifiable. The approach rests on the domain assumption that Linux virtualization can be safely extended for simulation control.

axioms (1)

Domain assumption: the Linux virtualization stack provides a suitable base for adding simulation-oriented scheduling, memory management, IPC, and orchestration without breaking production fidelity.
Rationale: the entire system is described as built on top of this stack.


[1] [1]

Ultra Ethernet Specification v1.0.1.https://ultraethernet.org/ uec-1-0-specAccessed: 2026-05-18

2025. Ultra Ethernet Specification v1.0.1.https://ultraethernet.org/ uec-1-0-specAccessed: 2026-05-18

2025

[2] [2]

Ultra Accelerator Link Specifications.https://ualinkconsortium

2026. Ultra Accelerator Link Specifications.https://ualinkconsortium. org/specification/. Accessed: 2026-05-18

2026

[3] [3]

2024.AMD64 Architecture Programmer’s Manual

Advanced Micro Devices, Inc. 2024.AMD64 Architecture Programmer’s Manual. Accessed: 2026-05-19

2024

[4] [4]

Apache Software Foundation. 2026. Apache Hadoop: Open-Source Framework for Distributed Storage and Processing.https://hadoop. apache.org/Accessed: 2026-05-18

2026

[5] [5]

Apache Software Foundation. 2026. Apache Spark: Unified Analytics Engine for Large-Scale Data Processing.https://spark.apache.org/ Accessed: 2026-05-18

2026

[6] [6]

Songyuan Bai, Hao Zheng, Chen Tian, Xiaoliang Wang, Chang Liu, Xin Jin, Fu Xiao, Qiao Xiang, Wanchun Dou, and Guihai Chen. 2024. Unison: A Parallel-Efficient and User-Transparent Network Simula- tion Kernel. InProceedings of the Nineteenth European Conference on Computer Systems (EuroSys ’24)

2024

[7] [7]

Ben Romdhanne Bilel and Nikaein Navid. 2012. Cunetsim: A gpu based simulation testbed for large scale mobile networks. In2012 In- ternational Conference on Communications and Information Technology (ICCIT). IEEE, 374–378

2012

[8] [8]

T. L. Borden, J. P. Hennessy, and J. W. Rymarczyk. 1989. Multiple Operating Systems on One Processor Complex.IBM Systems Journal 28, 1 (1989), 104–123.https://doi.org/10.1147/sj.281.0104

work page doi:10.1147/sj.281.0104 1989

[9] [9]

CXL Consortium. 2025. Compute Express Link (CXL) Specification Revision 4.0.https://www.computeexpresslink.org/download-the- specification. Accessed: 2026-05-19

2025

[10] [10]

Fares Elsabbagh, Shabnam Sheikhha, Victor A Ying, Quan M Nguyen, Joel S Emer, and Daniel Sanchez. 2023. Accelerating rtl simulation with hardware-software co-design. InProceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture. 153–166

2023

[11] [11]

Miquel Ferriol-Galmés, Jordi Paillisse, José Suárez-Varela, Krzysztof Rusek, Shihan Xiao, Xiang Shi, Xiangle Cheng, Pere Barlet-Ros, and Albert Cabellos-Aparicio. 2023. RouteNet-Fermi: Network modeling with graph neural networks.IEEE/ACM transactions on networking31, 6 (2023), 3080–3095

2023

[12] [12]

Daniel Firestone, Andrew Putnam, Sambhrama Mundkur, Derek Chiou, Alireza Dabagh, Mike Andrewartha, Hari Angepat, Vivek Bhanu, Adrian Caulfield, Eric Chung, et al. 2018. Azure accelerated networking:{SmartNICs} in the public cloud. In15th USENIX Sym- posium on Networked Systems Design and Implementation (NSDI 18). 51–66

2018

[13] [13]

Kaihui Gao, Li Chen, Dan Li, Vincent Liu, Xizheng Wang, Ran Zhang, and Lu Lu. 2023. Dons: Fast and affordable discrete event network simulation with automatic parallelization. InProceedings of the ACM SIGCOMM 2023 Conference. 167–181

2023

[14] [14]

gem5 Project. 2026. The gem5 Simulator Project.https://www.gem5. org/. Accessed: 2026-05-19

2026

[15] [15]

Fei Gui, Kaihui Gao, Li Chen, Dan Li, Vincent Liu, Ran Zhang, Hong- bing Yang, and Dian Xiong. 2025. Accelerating Design Space Explo- ration for {LLM} Training Systems with Multi-experiment Parallel Simulation. In22nd USENIX Symposium on Networked Systems Design and Implementation (NSDI 25). 473–488

2025

[16] [16]

Zizheng Guo, Yanqing Zhang, Runsheng Wang, Yibo Lin, and Haoxing Ren. 2025. GEM: GPU-accelerated emulator-inspired RTL simulation. In2025 62nd ACM/IEEE Design Automation Conference (DAC). IEEE, 1–7

2025

[17] [17]

2026.Intel(R) 64 and IA-32 Architectures Software Developer’s Manual

Intel Corporation. 2026.Intel(R) 64 and IA-32 Architectures Software Developer’s Manual. Accessed: 2026-05-19

2026

[18] [18]

Intel Corporation. 2026. Introduction to Memory Bandwidth Alloca- tion.https://www.intel.com/content/www/us/en/developer/articles/ technical/introduction-to-memory-bandwidth-allocation.html. Ac- cessed: 2026-05-20

2026

[19] [19]

Intel Simics. 2026. Simics Full-System Simulator.https://www. intel.com/content/www/us/en/developer/articles/tool/simics- simulator.html. Accessed: 2026-05-19

2026

[20] [20]

Norman P Jouppi, Cliff Young, Nishant Patil, David Patterson, Gaurav Agrawal, Raminder Bajwa, Sarah Bates, Suresh Bhatia, Nan Boden, Al Borchers, et al . 2017. In-datacenter performance analysis of a tensor processing unit. InProceedings of the 44th annual international symposium on computer architecture. 1–12

2017

[21] [21]

Sagar Karandikar, Howard Mao, Donggyu Kim, David Biancolin, Alon Amid, Dayeol Lee, Nathan Pemberton, Emmanuel Amaro, Colin Schmidt, Aditya Chopra, et al . 2018. FireSim: FPGA-accelerated cycle-exact scale-out system simulation in the public cloud. In2018 ACM/IEEE 45th Annual International Symposium on Computer Archi- tecture (ISCA). IEEE, 29–42

2018

[22] [22]

Eric Keller, Jakub Szefer, Jennifer Rexford, and Ruby B. Lee. 2010. NoHype: Virtualized Cloud Infrastructure without the Virtualization. InProceedings of the 37th Annual International Symposium on Computer Architecture (ISCA ’10). 350–361.https://doi.org/10.1145/1815961. 1816010

work page doi:10.1145/1815961 2010

[23] [23]

Sajy Khashab, Hariharan Sezhiyan, Rani Abboud, Alex Normatov, Stefan Kaestle, Eliav Bar-Ilan, Mohammad Nassar, Omer Shabtai, Wei Bai, Matty Kadosh, et al. 2025. NSX: Large-Scale Network Simulation on an AI Server. InProceedings of the 2nd Workshop on Networks for AI Computing. 19–25

2025

[24] [24]

Chenning Li, Arash Nasr-Esfahany, Kevin Zhao, Kimia Noorbakhsh, Prateesh Goyal, Mohammad Alizadeh, and Thomas E Anderson. 2024. m3: Accurate flow-level performance estimation using machine learn- ing. InProceedings of the ACM SIGCOMM 2024 Conference. 813–827

2024

[25] [25]

Hejing Li, Jialin Li, and Antoine Kaufmann. 2022. SimBricks: end-to- end network system evaluation with modular simulation. InProceed- ings of the ACM SIGCOMM 2022 Conference. 380–396

2022

[26] [26]

Hejing Li, Marvin Meiers, Jialin Li, and Antoine Kaufmann. 2025. SplitSim: Towards Practical Large-Scale Full-System Simulation for Systems Research.Proceedings of the ACM on Networking3, CoNEXT4 (2025), 1–19.https://doi.org/10.1145/3768999

work page doi:10.1145/3768999 2025

[27]

Qinyong Li, Zhiwei Zhao, Geyong Min, Zi Wang, and Luwei Fu. 2026. GeDES: GPU-Driven Discrete Event Network Simulator. In Proceedings of the 21st European Conference on Computer Systems. 468–483.

[28]

Tiantian Lin, Cheng Qiu, Xiaohang Wang, Ling Wang, Zhulin Zheng, Yingtao Jiang, Amit Kumar Singh, Jieming Yin, Sihai Qiu, Xiaodong Li, et al. 2025. LEGOSim: A Unified Parallel Simulation Framework for Multi-chiplet Heterogeneous Integration. In Proceedings of the 58th IEEE/ACM International Symposium on Microarchitecture. 1347–1362.

[29]

Jiacheng Ma, Jonas Kaufmann, Emilien Guandalino, Rishabh Iyer, Thomas Bourgeat, and George Candea. 2025. Fast End-to-End Performance Simulation of Accelerated Hardware–Software Stacks. In Proceedings of the ACM SIGOPS 31st Symposium on Operating Systems Principles (SOSP ’25).

[30]

Tie Ma, Long Luo, Hongfang Yu, Xi Chen, Jingzhao Xie, Chongxi Ma, Yunhan Xie, Gang Sun, Tianxi Wei, Li Chen, et al. 2024. Klonet: an Easy-to-Use and scalable platform for computer networks education. In 21st USENIX Symposium on Networked Systems Design and Implementation (NSDI 24). 2025–2046.

[31]

José Martins, Adriano Tavares, Marco Solieri, Marko Bertogna, and Sandro Pinto. 2020. Bao: A Lightweight Static Partitioning Hypervisor for Modern Multi-Core Embedded Systems. In Workshop on Next Generation Real-Time Embedded Systems (NG-RES), co-located with HiPEAC. 3:1–3:14. https://doi.org/10.4230/OASIcs.NG-RES.2020.3

[32]

NVIDIA Corporation. 2016. NVIDIA NVLink: High-Speed GPU Interconnect. Technical Report. Accessed: 2026-05-18.

[33]

NVIDIA Corporation. 2026. NVIDIA Virtual GPU (vGPU) Software. https://docs.nvidia.com/vgpu/. Accessed: 2026-05-19.

[34]

OpenInfra Foundation. 2026. The Most Widely Deployed Open Source Cloud Software in the World. https://www.openstack.org. Accessed: 2026-05-19.

[35]

QEMU Project. 2026. QEMU: The Fast Processor Emulator. https://www.qemu.org/. Accessed: 2026-05-18.

[36]

Yicheng Qian, Ran Shu, Rui Ma, Yang Wang, Derek Chiou, Nadeen Gebara, Luca Piccolboni, Miriam Leeser, and Yongqiang Xiong. 2025. Miniature: Fast AI Supercomputer Networks Simulation on FPGAs. In Proceedings of the 9th Asia-Pacific Workshop on Networking. 114–120.

[37]

Jianxing Qin, Jingrong Chen, Xinhao Kong, Yongji Wu, Tianjun Yuan, Liang Luo, Zhaodong Wang, Ying Zhang, Tingjun Chen, Alvin R Lebeck, et al. 2026. Phantora: Maximizing Code Reuse in Simulation-based Machine Learning System Performance Estimation. In 23rd USENIX Symposium on Networked Systems Design and Implementation (NSDI 26). 1809–1825.

[38]

Ralf Ramsauer, Jan Kiszka, Daniel Lohmann, and Wolfgang Mauerer. Look Mum, no VM Exits! (Almost). In Proceedings of the 13th Workshop on Operating Systems Platforms for Embedded Real-Time Applications (OSPERT). https://arxiv.org/abs/1705.06932

[40]

George F. Riley and Thomas R. Henderson. 2010. The ns-3 Network Simulator. In Modeling and Tools for Network Simulation. Springer, 15–34. https://doi.org/10.1007/978-3-642-12331-3_2

[41]

Krzysztof Rusek, José Suárez-Varela, Paul Almasan, Pere Barlet-Ros, and Albert Cabellos-Aparicio. 2020. RouteNet: Leveraging graph neural networks for network modeling and optimization in SDN. IEEE Journal on Selected Areas in Communications 38, 10 (2020), 2260–2270.

[42]

Weihang Shen, Mingcong Han, Jialong Liu, Rong Chen, and Haibo Chen. 2025. XSched: Preemptive Scheduling for Diverse XPUs. In 19th USENIX Symposium on Operating Systems Design and Implementation (OSDI 25). 671–692.

[43]

Jovan Stojkovic, Abraham Farrell, Zhangxiaowen Gong, Christopher J Hughes, and Josep Torrellas. 2026. AccelFlow: Orchestrating an On-Package Ensemble of Fine-Grained Accelerators for Microservices. In 2026 IEEE International Symposium on High Performance Computer Architecture (HPCA). IEEE, 1–17.

[44]

Yan Sun, Yifan Yuan, Zeduo Yu, Reese Kuper, Chihun Song, Jinghan Huang, Houxiang Ji, Siddharth Agarwal, Jiaqi Lou, Ipoom Jeong, et al. Demystifying CXL memory with genuine CXL-ready systems and devices. In Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture. 105–121.

[46]

The libvirt Project. 2026. libvirt virtualization API. https://libvirt.org/. Accessed: 2026-05-19.

[47]

The Linux Foundation. 2026. Kubernetes: Production-Grade Container Orchestration. https://kubernetes.io/. Accessed: 2026-05-19.

[48]

The Linux Kernel Foundation. 2026. Cpusets. https://docs.kernel.org/admin-guide/cgroup-v1/cpusets.html. Accessed: 2026-05-20.

[49]

The Linux Kernel Foundation. 2026. KVM: Kernel-based Virtual Machine. https://www.linux-kvm.org/. Accessed: 2026-05-18.

[50]

The Linux Kernel Foundation. 2026. KVM PVclock. https://www.linux-kvm.org/page/KVMClock. Accessed: 2026-05-20.

[51]

András Varga and Rudolf Hornig. 2008. An Overview of the OMNeT++ Simulation Environment. In Proceedings of the 1st International Conference on Simulation Tools and Techniques for Communications, Networks and Systems (SIMUTools ’08). ICST, Marseille, France, 1–10. https://doi.org/10.4108/ICST.SIMUTOOLS2008.3027

[52]

Xizheng Wang, Qingxu Li, Yichi Xu, Gang Lu, Dan Li, Li Chen, Heyang Zhou, Linkang Zheng, Sen Zhang, Yikai Zhu, et al. 2025. SimAI: unifying architecture design and performance tuning for Large-Scale large language model training with scalability and precision. In 22nd USENIX Symposium on Networked Systems Design and Implementation (NSDI 25). 541–558.

[53]

Qingqing Yang, Xi Peng, Li Chen, Libin Liu, Jingze Zhang, Hong Xu, Baochun Li, and Gong Zhang. 2022. DeepQueueNet: Towards scalable and generalized network performance estimation with packet-level visibility. In Proceedings of the ACM SIGCOMM 2022 Conference. 441–457.

[54]

Qizhen Zhang, Kelvin KW Ng, Charles Kazer, Shen Yan, João Sedoc, and Vincent Liu. 2021. MimicNet: Fast performance estimates for data center networks with machine learning. In Proceedings of the 2021 ACM SIGCOMM 2021 Conference. 287–304.