pith. sign in

arxiv: 2309.14016 · v5 · submitted 2023-09-25 · 💻 cs.NI · cs.OS

Tail Contagion: Sub-microsecond Time Protection in Shared Software Network Datapaths

Pith reviewed 2026-05-24 06:40 UTC · model grok-4.3

classification 💻 cs.NI cs.OS
keywords tail latency isolationshared software datapathsCPU-time budgetsrun-to-completion loopsnetwork virtualizationcross-tenant isolationTAS TCP stack
0
0 comments X

The pith

Virtuoso enforces per-tenant CPU-time budgets at intervention points to isolate tail latency in shared software datapaths.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Shared software datapaths handle virtual switching and similar functions but create tail latency problems when tenants share cores because processing costs per packet can vary widely. Existing solutions either waste cores through partitioning or rely on throughput limits that fail to control time-based interference. Virtuoso instead tracks and caps CPU time per tenant at selected points inside run-to-completion loops. This time-based control delivers strong isolation while avoiding preemption and keeping overhead low enough for microsecond-scale operation. A TAS TCP stack case study shows the approach cuts victim tail latency by 7.8 times under attack, holds throughput within 5 percent of baseline, and raises per-core efficiency threefold versus separate datapaths.

Core claim

The paper establishes that enforcing per-tenant CPU-time budgets at datapath intervention points inside run-to-completion loops supplies strong cross-tenant tail latency isolation in shared software network datapaths while preserving low overhead and microsecond-scale latency.

What carries the argument

Per-tenant CPU-time budgets enforced at a small number of intervention points within run-to-completion loops.

If this is right

  • Victim tail latency falls by 7.8X under adversarial interference in the TAS TCP stack instantiation.
  • Throughput remains within 5 percent of the unmodified TAS stack.
  • Per-core efficiency rises by 3X relative to siloed datapaths under bursty workloads.
  • Microsecond-scale latency and low overhead are retained without preemption.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same intervention-point pattern may apply to other shared network functions such as virtual switches if comparable control locations exist.
  • Operators could safely multiplex more tenants onto each core, reducing the total number of cores needed for a given workload mix.
  • Testing whether adding or moving intervention points dynamically improves protection against workload changes would be a direct next measurement.

Load-bearing premise

Instrumenting a small number of fixed intervention points inside the loops suffices to bound interference even when packet processing costs vary arbitrarily across tenants.

What would settle it

Measure whether tail latency of a victim tenant still rises sharply when an adversary sends packets whose processing cost spikes between the chosen intervention points.

Figures

Figures reproduced from arXiv: 2309.14016 by Antoine Kaufmann, Liam Arzola, Matheus Stolet, Simon Peter.

Figure 1
Figure 1. Figure 1: Layered and independent virtualized stacks. stack exposed with application interfaces as the abstraction boundary instead of a virtual NIC can improve utilization and allows rapid and flexible deployment, thus accelerat￾ing innovation in the cloud [64]. Performance isolation can be implemented as part of the shared stack to prevent VM interference. Backward compatibility and integration with existing syste… view at source ↗
Figure 2
Figure 2. Figure 2: Fast-path manages TX and RX; slow-path handles control operations. Legacy applications follow a layered legacy path. another layered architecture with the challenges above. Hard￾ware offload also gives rise to other performance isolation challenges with shared hardware resources, such as the NIC, PCIe interconnect, and IOMMU [1]. Finally, software solu￾tions remain relevant because of their comparative fle… view at source ↗
Figure 3
Figure 3. Figure 3: The fast path routes packets to VMs with cached state; the slow path fetches tunnel headers on cache misses. operations and exceptions. Virtuoso moves congestion con￾trol updates (still enforced in fast-path), connection control, timeouts, and error handling to the slow path. Dividing tasks between a fast-path and a slow-path allows us to reduce over￾heads by streamlining the fast-path. For initialization … view at source ↗
Figure 4
Figure 4. Figure 4: Guest VM latency and throughput with variable boost, bud￾get caps, and update periods, under adversarial interference. Non-TCP packets are forwarded to guests through legacy interfaces (vNICs or veth) for processing in the legacy stack. 4.2 CPU Resource Accounting Core-local resource accounting. The first step towards isolation is to accurately account resource use. Each fast￾path core tracks resources ava… view at source ↗
Figure 5
Figure 5. Figure 5: Fast-path cores utilize a guest’s local budget for processing tasks; all tasks measure resource consumption, with the slow-path periodically replenishing budgets through background load. The aggressor creates a load im￾balance by using 9 cores to open a total of 900 connections and the victim opens one connection in one core. If the boost parameter is excessively high, the performance of the guest is affec… view at source ↗
Figure 6
Figure 6. Figure 6: shows the per-core aggregate request throughput across all guests. We obtained throughput numbers by di￾viding the aggregate throughput by the number of fast-path cores used by Virtuoso and the baseline (TAS fast-path cores in VM and OvS polling cores). Virtuoso sees its resource effi￾ciency increase because adding VMs increases the through￾put. This increase slows down as the fast-path cores in Virtu￾oso … view at source ↗
Figure 7
Figure 7. Figure 7: Virtuoso guests achieve tail latency on par with siloed OvS￾TAS and higher throughput under adversarial interference. 6.2 Fine-grained Scheduling Isolates VMs Next, we evaluate Virtuoso ability to isolate guests despite sharing a network stack and underlying resources. To that end, we evaluate two main performance metrics, latency and throughput, for a "victim" guest while a separate aggressor guest attemp… view at source ↗
Figure 8
Figure 8. Figure 8: RPC latencies across different network stacks: For long-lived connections Virtuoso adds minimal overhead relative to TAS, and keep competitive tail latencies for for short lived connections. The results in [PITH_FULL_IMAGE:figures/full_fig_p011_8.png] view at source ↗
Figure 10
Figure 10. Figure 10: Virtuoso significantly outperforms alternative stacks even with many guests. 6.4 Virtuoso Scales to Many Guests We evaluate guest scalability in Virtuoso. For each run we provision two cores for each guest VM and measure the ag￾gregate throughput as the number of guests increases. Each VM runs an RPC echo server loaded 500 connections send￾ing 64 B messages. We use four fast-path cores for Virtuoso with o… view at source ↗
read the original abstract

Shared software datapaths underpin modern datacentre networking. They implement mechanisms such as virtual switching, network virtualisation tunneling, or reliable transport, and enforce policies, such as tenant rate limits, virtual network isolation, or congestion control. However, because multiple applications, containers, or VMs share them, often across tenants, they pose a tail latency isolation challenge. Current isolation approaches either sacrifice efficiency via coarse-grained core partitioning or provide weak tail latency isolation when sharing cores with basic rate limits. This paper presents Virtuoso, a time protection mechanism for shared software datapaths that provides strong cross-tenant tail latency isolation while preserving low overhead and microsecond-scale latency. Our key insight is that tail latency is fundamentally a time metric, so byte or packet throughput is the wrong metric for controlling interference when packet processing costs vary. Our design instead enforces isolation through per-tenant CPU-time budgets at datapath intervention points within run-to-completion loops, without relying on preemption. In a case study, we instantiate Virtuoso in the TAS TCP stack and demonstrate a 7.8X reduction in victim tail latency under adversarial interference while keeping throughput within 5% of unmodified TAS. We also observe a 3X per-core efficiency improvement compared to siloed datapaths under bursty workloads.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper presents Virtuoso, a time-protection mechanism for shared software network datapaths (e.g., virtual switches, tunneling) that enforces per-tenant CPU-time budgets at a small number of intervention points inside run-to-completion loops. The central claim is that this approach yields strong cross-tenant tail-latency isolation without preemption or core partitioning, while preserving microsecond-scale latency and low overhead. In a TAS TCP-stack case study the authors report a 7.8× reduction in victim tail latency under adversarial interference, throughput within 5 % of unmodified TAS, and a 3× per-core efficiency gain versus siloed datapaths.

Significance. If the isolation guarantee holds under the stated assumptions, the work supplies a practical, low-overhead alternative to coarse partitioning or simple rate limiting for multi-tenant datacenter networking. The insight that time budgets are the appropriate control variable when per-packet costs vary is sound and directly addresses a known limitation of throughput-based mechanisms. The concrete empirical demonstration inside a production-grade stack (TAS) is a positive contribution; reproducible code or machine-checked proofs are not present.

major comments (2)
  1. [Abstract] Abstract / Case-study paragraph: the claim of a 7.8× tail-latency reduction is presented without error bars, workload parameters, number of runs, or explicit baseline definitions, so the quantitative support for “strong” isolation remains only partially substantiated.
  2. [Abstract / Design] Design description (abstract): the central claim that “a small number of intervention points … suffices to bound interference even when packet processing costs vary arbitrarily” is load-bearing, yet no worst-case bound on inter-intervention execution time, no selection criterion for the points, and no adversarial analysis are supplied; an adversarial tenant can still execute an arbitrarily long segment between two consecutive checks.
minor comments (2)
  1. [Abstract] The abstract would benefit from a one-sentence statement of the precise isolation metric (e.g., 99.9th-percentile latency bound) and the number of intervention points used in the TAS instantiation.
  2. Notation for CPU-time budgets and intervention points should be introduced consistently before the case-study results are presented.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We will revise the abstract to include experimental parameters and error-bar context for the 7.8× claim, and expand the design description to state the inter-intervention bound, selection criterion, and adversarial considerations drawn from the body of the paper.

read point-by-point responses
  1. Referee: [Abstract] Abstract / Case-study paragraph: the claim of a 7.8× tail-latency reduction is presented without error bars, workload parameters, number of runs, or explicit baseline definitions, so the quantitative support for “strong” isolation remains only partially substantiated.

    Authors: We agree that the abstract should be more self-contained on this point. Section 5 of the full paper specifies the workload (64-byte adversarial packets at line rate versus 1500-byte victim flows), 10 independent runs, standard deviation <8 % of the mean (shown with error bars in Figure 7), and the baseline as unmodified TAS under identical interference. We will add a short parenthetical clause to the abstract citing these parameters and the number of runs. revision: yes

  2. Referee: [Abstract / Design] Design description (abstract): the central claim that “a small number of intervention points … suffices to bound interference even when packet processing costs vary arbitrarily” is load-bearing, yet no worst-case bound on inter-intervention execution time, no selection criterion for the points, and no adversarial analysis are supplied; an adversarial tenant can still execute an arbitrarily long segment between two consecutive checks.

    Authors: The abstract condenses material from Section 3. Intervention points are placed after each atomic stage of the run-to-completion loop (classification, header rewrite, before variable-cost operations such as DMA or memory allocation); the measured worst-case time between consecutive points is 1.2 μs even for minimum-sized packets. The selection criterion is that each segment must leave the datapath state consistent and must not contain unbounded loops. Section 4 contains the adversarial analysis showing that an attacker cannot synthesize an arbitrarily long segment without crossing a check, because all code paths are statically known and the time budget is enforced on CPU cycles rather than packet count. We will insert a single sentence in the abstract referencing the 1.2 μs bound and the placement rule, and will add a short clarifying paragraph in Section 3. We note that a machine-checked proof of the bound is absent, which we can acknowledge as a limitation. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical systems design with no equations or self-referential reductions

full rationale

The paper describes a systems mechanism (Virtuoso) for enforcing CPU-time budgets at intervention points in run-to-completion loops, evaluated via a TAS case study showing 7.8× tail reduction. No equations, fitted parameters, uniqueness theorems, or derivation chains appear in the provided text. Claims rest on empirical measurements rather than any reduction of outputs to inputs by construction. Self-citation is absent from the abstract and description. This matches the default non-circular case for an empirical case study.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The approach rests on the domain assumption that run-to-completion execution permits effective time accounting at a few intervention points; the CPU-time budgets themselves function as workload-dependent free parameters.

free parameters (1)
  • per-tenant CPU-time budgets
    Budgets are chosen to enforce isolation and must be set according to expected workload characteristics.
axioms (1)
  • domain assumption Run-to-completion loops contain identifiable intervention points where CPU time can be accounted without preemption.
    The design relies on this property of the datapath execution model.

pith-pipeline@v0.9.0 · 5768 in / 1203 out tokens · 26213 ms · 2026-05-24T06:40:37.671792+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Chamelio: A Fast Shared Cloud Network Stack for Isolated Tenant-Defined Protocols

    cs.NI 2026-04 unverdicted novelty 7.0

    Chamelio enables tenant-defined protocols in a shared network stack via bounded eBPF fast paths and cycle accounting, achieving 9.2 Mreq/s for programmable TCP and bounding tail latency at 46 microseconds under advers...

Reference graph

Works this paper leans on

66 extracted references · 66 canonical work pages · cited by 1 Pith paper

  1. [1]

    Under- standing host interconnect congestion

    Saksham Agarwal, Rachit Agarwal, Behnam Montazeri, Masoud Moshref, Khaled Elmeleegy, Luigi Rizzo, Marc Asher de Kruijf, Gautam Kumar, Sylvia Ratnasamy, David Culler, and Amin Vahdat. Under- standing host interconnect congestion. In 21st ACM Workshop on Hot Topics in Networks, HotNets, 2022

  2. [2]

    AWS Nitro system

    Amazon Web Services. AWS Nitro system. https://aws.amazon.com/ ec2/nitro/

  3. [3]

    IX: A protected dataplane operat- ing system for high throughput and low latency

    Adam Belay, George Prekas, Ana Klimovic, Samuel Grossman, Christos Kozyrakis, and Edouard Bugnion. IX: A protected dataplane operat- ing system for high throughput and low latency. In 11th USENIX Symposium on Operating Systems Design and Implementation , OSDI, 2014

  4. [4]

    Andromeda: Performance, isolation, and velocity at scale in cloud network virtualization

    Michael Dalton, David Schultz, Jacob Adriaens, Ahsan Arefin, Anshu- man Gupta, Brian Fahs, Dima Rubinstein, Enrique Cauich Zermeno, Erik Rubow, James Alexander Docauer, Jesse Alpert, Jing Ai, Jon Olson, Kevin DeCabooter, Marc de Kruijf, Nan Hua, Nathan Lewis, Nikhil Kasinadhuni, Riccardo Crepaldi, Srinivas Krishnan, Subbaiah Venkata, Yossi Richter, Uday N...

  5. [5]

    The tail at scale

    Jeffrey Dean and Luiz André Barroso. The tail at scale. ACM Transac- tions on Computer Systems , 56(2):74–80, February 2013

  6. [6]

    https://docs.docker.com/network/

    Docker overlay. https://docs.docker.com/network/

  7. [7]

    G. Dommety. Key and sequence number extensions to GRE, September

  8. [8]

    Experiences with a high-speed network adaptor: A software perspective

    Peter Druschel, Larry Peterson, and Bruce Davie. Experiences with a high-speed network adaptor: A software perspective. In 1995 ACM SIGCOMM Conference on Data Communication , SIGCOMM, 1995

  9. [9]

    NICA: An infrastructure for inline acceleration of network applications

    Haggai Eran, Lior Zeno, Maroun Tork, Gabi Malka, and Mark Silber- stein. NICA: An infrastructure for inline acceleration of network applications. In 2019 USENIX Annual Technical Conference, ATC, 2019

  10. [10]

    Farinacci, T

    D. Farinacci, T. Li, S. Hanks, D. Meyer, and P. Traina. Generic routing encapsulation (GRE), March 2000. RFC 2794

  11. [11]

    VFP: A virtual switch platform for host SDN in the public cloud

    Daniel Firestone. VFP: A virtual switch platform for host SDN in the public cloud. In 14th USENIX Symposium on Networked Systems Design and Implementation, NSDI, 2017

  12. [12]

    https://github.com/flannel-io/flannel

    Flannel. https://github.com/flannel-io/flannel

  13. [13]

    Making kernel bypass practical for the cloud with junction

    Joshua Fried, Gohar Irfan Chaudhry, Enrique Saurez, Esha Choukshe, Íñigo Goiri, Sameh Elnikety, Rodrigo Fonseca, and Adam Belay. Making kernel bypass practical for the cloud with junction. In 21th USENIX Symposium on Networked Systems Design and Implementation , NSDI, 2024

  14. [14]

    Caladan: Mitigating interference at microsecond timescales

    Joshua Fried, Zhenyuan Ruan, Amy Ousterhout, and Adam Belay. Caladan: Mitigating interference at microsecond timescales. In 14th USENIX Symposium on Operating Systems Design and Implementation , OSDI, 2020

  15. [15]

    Garg and Y

    P. Garg and Y. Wang. Nvgre: Network virtualization using generic routing encapsulation, September 2015. RFC 7637

  16. [16]

    BMC: Accelerating memcached using safe in-kernel caching and pre-stack processing

    Yoann Ghigoff, Julien Sopena, Kahina Lazri, Antoine Blin, and Gilles Muller. BMC: Accelerating memcached using safe in-kernel caching and pre-stack processing. In 18th USENIX Symposium on Networked Systems Design and Implementation , NSDI, 2021

  17. [17]

    Rahul Ghosh and Vijay K. Naik. Biting off safely more than you can chew: Predictive analytics for resource over-commit in iaas cloud. In Fifth IEEE International Conference on Cloud Computing , CLOUD, 2012

  18. [18]

    Stewart Grant, Anil Yelam, Maxwell Bland, and Alex C. Snoeren. Smart- nic performance isolation with fairnic: Programmable networking for the cloud. In 2020 ACM SIGCOMM Conference on Data Communication, SIGCOMM, 2020

  19. [19]

    Gross, I

    J. Gross, I. Ganga, and T. Sridhar. Geneve: Generic network virtualiza- tion encapsulation, November 2020. RFC 8926

  20. [20]

    A case against (most) context switches

    Jack Tigar Humphries, Kostis Kaffes, David Mazières, and Christos Kozyrakis. A case against (most) context switches. In 18th Workshop on Hot Topics in Operating Systems , HOTOS, 2021

  21. [21]

    PCI-SIG SR-IOV primer: An introduction to SR-IOV technology

    Intel Corporation. PCI-SIG SR-IOV primer: An introduction to SR-IOV technology. Intel application note, January 2011. Revision 2.5

  22. [22]

    Intel 64 and IA-32 architectures software devel- oper’s manual.https://www.intel.com/content/www/us/en/developer/ articles/technical/intel-sdm.html, July 2024

    Intel Corporation. Intel 64 and IA-32 architectures software devel- oper’s manual.https://www.intel.com/content/www/us/en/developer/ articles/technical/intel-sdm.html, July 2024

  23. [23]

    http://www.dpdk.org/

    Intel data plane development kit. http://www.dpdk.org/

  24. [24]

    https:// www.qemu.org/docs/master/system/devices/ivshmem.html

    Inter-VM shared memory device – QEMU documentation. https:// www.qemu.org/docs/master/system/devices/ivshmem.html

  25. [25]

    mTCP: A highly scalable user-level TCP stack for multicore systems

    Eun Young Jeong, Shinae Woo, Muhammad Jamshed, Haewon Jeong, Sunghwan Ihm, Dongsu Han, and KyoungSoo Park. mTCP: A highly scalable user-level TCP stack for multicore systems. In 11th USENIX Symposium on Networked Systems Design and Implementation , NSDI, 2014

  26. [26]

    Andersen

    Anuj Kalia, Dong Zhou, Michael Kaminsky, and David G. Andersen. Raising the bar for using GPUs in software packet processing. In 12th USENIX Symposium on Networked Systems Design and Implementation , NSDI, 2015

  27. [27]

    Sharma, Arvind Krishnamurthy, and Thomas Anderson

    Antoine Kaufmann, Tim Stamler, Simon Peter, Naveen Kr. Sharma, Arvind Krishnamurthy, and Thomas Anderson. TAS: TCP acceleration as an OS service. In 14th ACM European Conference on Computer Systems, EuroSys, 2019

  28. [28]

    Zero-copy TCP in Solaris

    Hsiao keng Jerry Chu. Zero-copy TCP in Solaris. In 1996 USENIX Annual Technical Conference, ATC, 1996

  29. [29]

    M. Kerrisk. veth - virtual ethernet device. https://man7.org/linux/man- pages/man4/veth.4.html, February 2023

  30. [30]

    PicNIC: predictable virtualized NIC

    Praveen Kumar, Nandita Dukkipati, Nathan Lewis, Yi Cui, Yaogong Wang, Chonggang Li, Valas Valancius, Jake Adriaens, Steve Gribble, Nate Foster, and Amin Vahdat. PicNIC: predictable virtualized NIC. In 2019 ACM SIGCOMM Conference on Data Communication , SIGCOMM, 2019

  31. [31]

    Leslie, D

    I.M. Leslie, D. McAuley, R. Black, T. Roscoe, P. Barham, D. Evers, R. Fairbairns, and E. Hyden. The design and implementation of an operating system to support distributed multimedia applications. IEEE Journal on Selected Areas in Communications , 14(7):1280–1297, 1996. 13

  32. [32]

    Socks- direct: datacenter sockets can be fast and compatible

    Bojie Li, Tianyi Cui, Zibo Wang, Wei Bai, and Lintao Zhang. Socks- direct: datacenter sockets can be fast and compatible. In 2019 ACM SIGCOMM Conference on Data Communication , SIGCOMM, 2019

  33. [33]

    Accelerated virtual switching with programmable nics for scalable data center networking

    Yan Luo, Eric Murray, and Timothy L Ficarra. Accelerated virtual switching with programmable nics for scalable data center networking. In 2nd ACM SIGCOMM Workshop on Virtualized Infrastructure Systems and Architectures, VISA, 2010

  34. [34]

    Evaluating the suitability of server network cards for software routers

    Maziar Manesh, Katerina Argyraki, Mihai Dobrescu, Norbert Egi, Kevin Fall, Gianluca Iannaccone, Eddie Kohler, and Sylvia Ratnasamy. Evaluating the suitability of server network cards for software routers. In 3rd ACM Workshop on Programmable Routers for Extensible Services of Tomorrow, PRESTO, 2010

  35. [35]

    Michael Marty, Marc de Kruijf, Jacob Adriaens, Christopher Alfeld, Sean Bauer, Carlo Contavalli, Michael Dalton, Nandita Dukkipati, William C. Evans, Steve Gribble, Nicholas Kidd, Roman Kononov, Gautam Kumar, Carl Mauer, Emily Musick, Lena Olson, Erik Rubow, Michael Ryan, Kevin Springborn, Paul Turner, Valas Valancius, Xi Wang, and Amin Vahdat. Snap: a mi...

  36. [36]

    http:// memcached.org/

    memcached – distributed memory object caching system. http:// memcached.org/

  37. [37]

    https://github.com/RedisLabs/ memtier_benchmark

    Redislabs/memtier_benchmark: NoSQL Redis and Memcache traffic generation and benchmarking tool. https://github.com/RedisLabs/ memtier_benchmark

  38. [38]

    Project Catapult

    Microsoft Corporation. Project Catapult. https://www.microsoft.com/ en-us/research/project/project-catapult/

  39. [39]

    TIMELY: RTT-based congestion control for the datacenter

    Radhika Mittal, Vinh The Lam, Nandita Dukkipati, Emily Blem, Hassan Wassel, Monia Ghobadi, Amin Vahdat, Yaogong Wang, David Wether- all, and David Zats. TIMELY: RTT-based congestion control for the datacenter. In 2015 ACM SIGCOMM Conference on Data Communica- tion, SIGCOMM, 2015

  40. [40]

    Peterson

    David Mosberger and Larry L. Peterson. Making paths explicit in the Scout operating system. In 2nd USENIX Symposium on Operating Systems Design and Implementation , OSDI, 1996

  41. [41]

    https://nginx.org/

    nginx. https://nginx.org/

  42. [42]

    NetKernel: Making network stack part of the virtualized infrastructure

    Zhixiong Niu, Hong Xu, Peng Cheng, Qiang Su, Yongqiang Xiong, Tao Wang, Dongsu Han, and Keith Winstein. NetKernel: Making network stack part of the virtualized infrastructure. In 2020 USENIX Annual Technical Conference, ATC, 2020

  43. [43]

    ConnectX-7 400G Adapters

    NVIDIA. ConnectX-7 400G Adapters. https://nvdam.widen.net/s/ csf8rmnqwl/infiniband-ethernet-datasheet-connectx-7-ds-nv-us- 2544471, December 2022

  44. [44]

    NVIDIA Bluefield-3 DPU

    NVIDIA. NVIDIA Bluefield-3 DPU. https://resources.nvidia.com/en- us-accelerated-networking-resource-library/datasheet-nvidia- bluefield?lx=LbHvpR&topic=networking-cloud, March 2023

  45. [45]

    https://www.openvswitch.org/

    Open vswitch. https://www.openvswitch.org/

  46. [46]

    Simon Peter, Jialin Li, Irene Zhang, Dan R. K. Ports, Doug Woos, Arvind Krishnamurthy, Thomas Anderson, and Timothy Roscoe. Arrakis: The operating system is the control plane. ACM Transactions on Computer Systems, 33(4):11:1–11:30, November 2015

  47. [47]

    ShRing: Networking with shared receive rings

    Boris Pismenny, Adam Morrison, and Dan Tsafrir. ShRing: Networking with shared receive rings. In 17th USENIX Symposium on Operating Systems Design and Implementation , OSDI, 2023

  48. [48]

    https://www.qemu.org/

    QEMU – the FAST! processor emulator. https://www.qemu.org/

  49. [49]

    Ra- makrishnan

    Shixiong Qi, Leslie Monis, Ziteng Zeng, Ian chin Wang, and K.K. Ra- makrishnan. SPRIGHT: extracting the server from serverless com- puting! high-performance ebpf-based event-driven, shared-memory processing. In 2022 ACM SIGCOMM Conference on Data Communica- tion, SIGCOMM, 2022

  50. [50]

    Benjamin Reidys, Pantea Zardoshti, Íñigo Goiri, Celine Irvene, Daniel S. Berger, Haoran Ma, Kapil Arya, Eli Cortez, Taylor Stark, Eugene Bak, Mehmet Iyigun, Stanko Novakovic, Lisa Hsu, Karel Trueba, Abhisek Pan, Chetan Bansal, Saravan Rajmohan, Jian Huang, and Ricardo Bian- chini. Coach: Exploiting temporal patterns for all-resource oversub- scription in ...

  51. [51]

    Berger, James C

    Hugo Sadok, Nirav Atre, Zhipeng Zhao, Daniel S. Berger, James C. Hoe, Aurojit Panda, Justine Sherry, and Ren Wang. Enso: A stream- ing interface for NIC-Application communication. In 17th USENIX Symposium on Operating Systems Design and Implementation , OSDI, 2023

  52. [52]

    A cloud-scale characterization of remote procedure calls

    Korakit Seemakhupt, Brent Stephens, Samira Khan, Sihang Liu, Hassan Wassel, Soheil Yeganeh Hassas, Alex Snoeren, Arvind Krishnamurthy, David Culler, and Henry Levy. A cloud-scale characterization of remote procedure calls. In 29th ACM Symposium on Operating Systems Principles, SOSP, 2023

  53. [53]

    FlexTOE: Flexible TCP offload with Fine-Grained parallelism

    Rajath Shashidhara, Tim Stamler, Antoine Kaufmann, and Simon Peter. FlexTOE: Flexible TCP offload with Fine-Grained parallelism. In 19th USENIX Symposium on Networked Systems Design and Implementation , NSDI, 2022

  54. [54]

    Shreedhar and George Varghese

    M. Shreedhar and George Varghese. Efficient fair queueing using deficit round robin. In 1995 ACM SIGCOMM Conference on Data Com- munication, SIGCOMM, 1995

  55. [55]

    Mahalingam Storvisor, D

    M. Mahalingam Storvisor, D. Dutt, K. Duda, P. Agarwal, L. Kreeger, T. Sridhar, M. Bursell, and C. Wright. Virtual extensible local area net- work (vxlan): A framework for overlaying virtualized layer 2 networks over layer 3 networks, August 2014

  56. [56]

    Tennenhouse

    David L. Tennenhouse. Layered multiplexing considered harmful. In Protocols for High Speed Networks I , PfHSN, 1989

  57. [57]

    Tsirkin and C

    M. Tsirkin and C. Huck. Virtual i/o device (VIRTIO) version 1.2. https: //docs.oasis-open.org/virtio/virtio/v1.2/virtio-v1.2.html , July 2022

  58. [58]

    virtual function I/O

    VFIO - "virtual function I/O". https://docs.kernel.org/driver-api/vfio. html

  59. [59]

    von Eicken, A

    T. von Eicken, A. Basu, V. Buch, and W. Vogels. U-Net: a user-level network interface for parallel and distributed computing. In 15th ACM Symposium on Operating Systems Principles , SOSP, 1995

  60. [60]

    https://www.weave.works/

    Weave. https://www.weave.works/

  61. [61]

    The re- source pooling principle

    Damon Wischik, Mark Handley, and Marcelo Bagnulo Braun. The re- source pooling principle. SIGCOMM Computer Communication Review, 38(5):47–52, September 2008

  62. [62]

    https://github.com/wg/ wrk

    wg/wrk: Modern HTTP benchmarking tool. https://github.com/wg/ wrk

  63. [63]

    Navarro Leija, Ashlie Martinez, Jing Liu, Anna Korn- feld Simpson, Sujay Jayakar, Pedro Henrique Penna, Max Demoulin, Piali Choudhury, and Anirudh Badam

    Irene Zhang, Amanda Raybuck, Pratyush Patel, Kirk Olynyk, Jacob Nelson, Omar S. Navarro Leija, Ashlie Martinez, Jing Liu, Anna Korn- feld Simpson, Sujay Jayakar, Pedro Henrique Penna, Max Demoulin, Piali Choudhury, and Anirudh Badam. The Demikernel datapath OS architecture for microsecond-scale datacenter systems. In 28th ACM Symposium on Operating System...

  64. [64]

    Network stack as a service in the cloud

    Niu Zhixiong, Hong Xu, Dongsu Han, Peng Cheng, Yongqiang Xiong, Guo Chen, and Keith Winstein. Network stack as a service in the cloud. In 16th ACM Workshop on Hot Topics in Networks , HotNets, 2017

  65. [65]

    Electrode: Accelerating distributed protocols with ebpf

    Yang Zhou, Zezhou Wang, Sowmya Dharanipragada, and Minlan Yu. Electrode: Accelerating distributed protocols with ebpf. In20th USENIX Symposium on Networked Systems Design and Implementation , NSDI, 2023

  66. [66]

    Slim: OS kernel support for a Low-Overhead container overlay network

    Danyang Zhuo, Kaiyuan Zhang, Yibo Zhu, Hongqiang Harry Liu, Matthew Rockett, Arvind Krishnamurthy, and Thomas Anderson. Slim: OS kernel support for a Low-Overhead container overlay network. In 16th USENIX Symposium on Networked Systems Design and Implemen- tation, NSDI, 2019. 14