arxiv: 2606.18958 · v1 · pith:GN4JVX4Lnew · submitted 2026-06-17 · 💻 cs.DC · cs.OS

LiveStack: OS Support for Cluster-Scale Full-Stack Live Simulation

Pith reviewed 2026-06-26 19:11 UTC · model grok-4.3

classification 💻 cs.DC cs.OS
keywords full-stack simulation · cluster-scale simulation · OS virtualization · live simulation · distributed systems · simulation orchestration · Linux kernel extensions

The pith

LiveStack extends Linux virtualization with four subsystems to run unmodified production stacks in cluster-scale simulation while preserving both fidelity and iterative speed.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to demonstrate that cluster-scale full-stack simulation can deliver both complete fidelity for unmodified production software and the performance needed to explore many configurations quickly. No prior method has combined the two at this scale. LiveStack achieves the combination by layering four new mechanisms on the existing Linux virtualization stack to keep live and modeled parts synchronized under one simulated clock while limiting their mutual interference. If the approach holds, developers could test entire distributed systems and new hardware designs on real code before any physical hardware exists. The work frames simulation management itself as a natural operating-system duty rather than an external tool.

Core claim

LiveStack is an OS-level approach to cluster-scale full-stack simulation built on top of the Linux virtualization stack. LiveStack comprises four subsystems: simulation-oriented scheduling, live memory hierarchy management, simulation-aware IPC, and distributed simulation orchestration. Together, they coordinate live and modeled components under shared simulated time while controlling interference among co-located live hosts. These mechanisms point toward simulation-native OS support, where simulation control and orchestration become core OS responsibilities.

What carries the argument

Four coordinated subsystems (simulation-oriented scheduling, live memory hierarchy management, simulation-aware IPC, and distributed simulation orchestration) that keep live and modeled components synchronized under a single simulated timeline.
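
As a rough illustration of what keeping components "synchronized under a single simulated timeline" can mean, the sketch below advances a mix of live and modeled components in fixed quanta of simulated time. It is not LiveStack's mechanism: the names `Component` and `SharedClock`, the `quantum_ns` parameter, and the lockstep policy itself are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """A live host or a modeled device (hypothetical abstraction)."""
    name: str
    kind: str               # "live" or "modeled"
    local_time_ns: int = 0  # this component's view of simulated time

    def advance_to(self, target_ns: int) -> None:
        # A live component would execute real code until its virtual clock
        # reaches target_ns; a modeled one would process events up to it.
        # In this sketch both simply move their local clock forward.
        if target_ns < self.local_time_ns:
            raise ValueError("simulated time must not move backwards")
        self.local_time_ns = target_ns


@dataclass
class SharedClock:
    """Advances every component in lockstep quanta of simulated time."""
    components: list
    quantum_ns: int = 1_000  # assumed synchronization quantum
    now_ns: int = 0

    def step(self) -> int:
        """Move all components forward by one quantum and return the new time."""
        target = self.now_ns + self.quantum_ns
        for component in self.components:
            component.advance_to(target)
        self.now_ns = target
        return self.now_ns

    def max_skew_ns(self) -> int:
        """Largest gap between any two components' local clocks."""
        times = [c.local_time_ns for c in self.components]
        return max(times) - min(times)


clock = SharedClock([Component("host0", "live"), Component("nic0", "modeled")])
for _ in range(3):
    clock.step()
```

A smaller quantum tightens the skew between components at the cost of more synchronization points, which is one way to picture the fidelity versus speed tension the paper targets.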

If this is right

  • Unmodified production software stacks can be evaluated at cluster scale with full fidelity.
  • Iterative exploration of hardware and software configurations becomes feasible at usable speeds.
  • Interference among multiple live hosts sharing simulation resources remains controllable.
  • Simulation orchestration can be treated as a native operating-system service rather than an external layer.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same coordination pattern could be adapted to other virtualization or container runtimes beyond Linux.
  • Shared simulated time might simplify debugging of timing-sensitive distributed bugs that are hard to reproduce on real clusters.
  • Once simulation is inside the OS, hardware-in-the-loop experiments could be scheduled alongside ordinary workloads without separate toolchains.

Load-bearing premise

The Linux virtualization stack can be extended with the four described mechanisms without unacceptable interference or loss of fidelity when live and simulated components share the same resources.

What would settle it

A direct comparison in which an unmodified production distributed application running under LiveStack exhibits either fidelity loss relative to bare hardware or simulation throughput too low for iterative configuration sweeps would disprove the central claim.
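
The comparison described above can be phrased as two simple measurements. The sketch below is only one way to frame it: the function names and the thresholds `max_error` and `max_slowdown` are hypothetical, since the page gives no numeric criteria for acceptable fidelity or speed.

```python
def relative_error(reference: float, simulated: float) -> float:
    """Fidelity gap between a bare-hardware measurement and its simulated counterpart."""
    if reference == 0:
        raise ValueError("reference measurement must be non-zero")
    return abs(simulated - reference) / abs(reference)


def slowdown(wall_clock_s: float, simulated_s: float) -> float:
    """Wall-clock seconds spent per second of simulated time."""
    if simulated_s <= 0:
        raise ValueError("simulated duration must be positive")
    return wall_clock_s / simulated_s


def claim_survives(reference: float, simulated: float,
                   wall_clock_s: float, simulated_s: float,
                   max_error: float = 0.05,        # assumed fidelity tolerance
                   max_slowdown: float = 100.0):   # assumed speed tolerance
    """True if the run is both faithful enough and fast enough under the assumed thresholds."""
    faithful = relative_error(reference, simulated) <= max_error
    fast = slowdown(wall_clock_s, simulated_s) <= max_slowdown
    return faithful and fast
```

For instance, a run that reports 204 against a bare-hardware value of 200 and takes 600 wall-clock seconds to cover 10 simulated seconds would pass under these assumed thresholds, while a run that reports 260 would not.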

Figures

Figures reproduced from arXiv: 2606.18958 by Antoine Kaufmann, Haifeng Sun, Jialin Li, Jonas Kaufmann, Yihan Yang, Yiliang Wan.

Figure 1: The overall architecture of LiveStack. … four subsystems: simulation-oriented scheduling for virtual-time coordination, live memory hierarchy management for performance isolation, simulation-aware IPC (Inter-Process Communication) for visibility-controlled communication, and distributed orchestration for multi-host scale-out. This allows LiveStack to preserve full-stack fidelity while avoiding the fine-grai…
Figure 2: An example LiveStack scheduling timeline. Dispatch. LiveStack coordinates vtasks through configurable synchronization scopes, each of which groups vtasks that should progress together within a bounded virtual-time skew. A vtask may participate in multiple scopes. Common-case dispatch rule. Each scope maintains a cached vtime, defined as the minimum vtime among its runnable members. During dispatch, a runna…
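
The part of the Figure 2 excerpt that survives truncation (scopes that group vtasks, membership in several scopes, and a cached vtime equal to the minimum vtime of runnable members) can be sketched as below. The dispatch rule itself is cut off in the excerpt, so `within_skew` is a guess at one plausible eligibility check and not the paper's rule; the class names, the `max_skew` field, and the example scopes are likewise assumptions.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare vtasks by identity, not by field values
class VTask:
    name: str
    vtime: int            # virtual time this task has reached
    runnable: bool = True


@dataclass
class Scope:
    name: str
    max_skew: int                          # assumed bound on virtual-time skew
    members: list = field(default_factory=list)

    def cached_vtime(self):
        """Minimum vtime among runnable members, or None if none are runnable.

        Recomputed on every call here; a real scheduler would presumably
        keep this value cached and update it incrementally.
        """
        vtimes = [t.vtime for t in self.members if t.runnable]
        return min(vtimes) if vtimes else None


def within_skew(task, scopes):
    """Hypothetical check: task stays within max_skew in every scope it belongs to."""
    for scope in scopes:
        if task not in scope.members:
            continue
        base = scope.cached_vtime()
        if base is not None and task.vtime - base > scope.max_skew:
            return False
    return True


a = VTask("a", vtime=100)
b = VTask("b", vtime=140)
c = VTask("c", vtime=100)
rack = Scope("rack", max_skew=50, members=[a, b])
cluster = Scope("cluster", max_skew=30, members=[b, c])  # b is in both scopes
```

With these values, `b` is 40 units ahead of the slowest runnable member in both scopes, so it passes the `rack` bound of 50 but fails the `cluster` bound of 30. If `c` stops being runnable, the `cluster` minimum rises to `b`'s own vtime and the check passes.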
Original abstract

Cluster-scale full-stack simulation is essential for evaluating distributed software stacks and emerging hardware components before deployment. Such simulation must achieve both full-stack fidelity for the unmodified production stack and the simulation performance required for iterative configuration exploration. However, no existing method achieves both. We present LiveStack, an OS-level approach to cluster-scale full-stack simulation built on top of the Linux virtualization stack. LiveStack comprises four subsystems: simulation-oriented scheduling, live memory hierarchy management, simulation-aware IPC, and distributed simulation orchestration. Together, they coordinate live and modeled components under shared simulated time while controlling interference among co-located live hosts. These mechanisms point toward simulation-native OS support, where simulation control and orchestration become core OS responsibilities.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The manuscript presents LiveStack, an OS-level approach to cluster-scale full-stack live simulation built on the Linux virtualization stack. It comprises four subsystems—simulation-oriented scheduling, live memory hierarchy management, simulation-aware IPC, and distributed simulation orchestration—that coordinate live and modeled components under shared simulated time while controlling interference among co-located live hosts, with the goal of achieving both full-stack fidelity for unmodified production stacks and the performance needed for iterative configuration exploration.

Significance. If the subsystems deliver the claimed fidelity and performance without unacceptable interference, the work would be significant for enabling pre-deployment evaluation of distributed software stacks and emerging hardware at cluster scale. It proposes making simulation control and orchestration core OS responsibilities, addressing a gap where existing methods fail to achieve both requirements simultaneously.

major comments (1)
  1. [Abstract] The central claim that the four subsystems achieve both full-stack fidelity and required simulation performance rests on the unverified assumption that the Linux virtualization stack can be extended with the described mechanisms without loss of fidelity or unacceptable interference; the manuscript provides no implementation details, evaluation data, or error analysis to substantiate this.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the review and the identification of this issue with the abstract. We address the comment below and will revise the manuscript accordingly.

Point-by-point responses
  1. Referee: [Abstract] The central claim that the four subsystems achieve both full-stack fidelity and required simulation performance rests on the unverified assumption that the Linux virtualization stack can be extended with the described mechanisms without loss of fidelity or unacceptable interference; the manuscript provides no implementation details, evaluation data, or error analysis to substantiate this.

    Authors: We agree that the abstract states the central claim without sufficient qualification. The manuscript is a design paper whose contribution is the description of the four subsystems and their coordination under shared simulated time. It contains no implementation, no performance measurements, and no error analysis. We will revise the abstract to state that LiveStack is a proposed OS architecture whose mechanisms are designed to achieve the stated goals, with the design arguments for fidelity and interference control presented in the body; we will also add an explicit statement that empirical validation remains future work. This change will remove the unsubstantiated claim from the abstract.

    Revision: yes

Circularity Check

0 steps flagged

No significant circularity

The paper presents a systems architecture for LiveStack, an OS-level approach built on the Linux virtualization stack, comprising four described subsystems (simulation-oriented scheduling, live memory hierarchy management, simulation-aware IPC, and distributed simulation orchestration) that coordinate live and modeled components. No equations, fitted parameters, predictions, or derivation chains appear in the provided text. Claims rest on the design of these mechanisms rather than any reduction to prior fitted quantities or self-citation chains. The architecture is self-contained as a proposal for new OS responsibilities, with no load-bearing step that equates outputs to inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Abstract-only review; no free parameters, invented entities, or non-standard axioms are identifiable. The approach rests on the domain assumption that Linux virtualization can be safely extended for simulation control.

axioms (1)
  • Domain assumption: the Linux virtualization stack provides a suitable base for adding simulation-oriented scheduling, memory management, IPC, and orchestration without breaking production fidelity.
    Basis: the entire system is described as built on top of this stack.

pith-pipeline@v0.9.1-grok · 5655 in / 1173 out tokens · 17934 ms · 2026-06-26T19:11:27.285875+00:00 · methodology
