Tail Contagion: Sub-microsecond Time Protection in Shared Software Network Datapaths

Antoine Kaufmann; Liam Arzola; Matheus Stolet; Simon Peter

arxiv: 2309.14016 · v5 · submitted 2023-09-25 · 💻 cs.NI · cs.OS

Tail Contagion: Sub-microsecond Time Protection in Shared Software Network Datapaths

Matheus Stolet , Liam Arzola , Simon Peter , Antoine Kaufmann This is my paper

Pith reviewed 2026-05-24 06:40 UTC · model grok-4.3

classification 💻 cs.NI cs.OS

keywords tail latency isolationshared software datapathsCPU-time budgetsrun-to-completion loopsnetwork virtualizationcross-tenant isolationTAS TCP stack

0 comments

The pith

Virtuoso enforces per-tenant CPU-time budgets at intervention points to isolate tail latency in shared software datapaths.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Shared software datapaths handle virtual switching and similar functions but create tail latency problems when tenants share cores because processing costs per packet can vary widely. Existing solutions either waste cores through partitioning or rely on throughput limits that fail to control time-based interference. Virtuoso instead tracks and caps CPU time per tenant at selected points inside run-to-completion loops. This time-based control delivers strong isolation while avoiding preemption and keeping overhead low enough for microsecond-scale operation. A TAS TCP stack case study shows the approach cuts victim tail latency by 7.8 times under attack, holds throughput within 5 percent of baseline, and raises per-core efficiency threefold versus separate datapaths.

Core claim

The paper establishes that enforcing per-tenant CPU-time budgets at datapath intervention points inside run-to-completion loops supplies strong cross-tenant tail latency isolation in shared software network datapaths while preserving low overhead and microsecond-scale latency.

What carries the argument

Per-tenant CPU-time budgets enforced at a small number of intervention points within run-to-completion loops.

If this is right

Victim tail latency falls by 7.8X under adversarial interference in the TAS TCP stack instantiation.
Throughput remains within 5 percent of the unmodified TAS stack.
Per-core efficiency rises by 3X relative to siloed datapaths under bursty workloads.
Microsecond-scale latency and low overhead are retained without preemption.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same intervention-point pattern may apply to other shared network functions such as virtual switches if comparable control locations exist.
Operators could safely multiplex more tenants onto each core, reducing the total number of cores needed for a given workload mix.
Testing whether adding or moving intervention points dynamically improves protection against workload changes would be a direct next measurement.

Load-bearing premise

Instrumenting a small number of fixed intervention points inside the loops suffices to bound interference even when packet processing costs vary arbitrarily across tenants.

What would settle it

Measure whether tail latency of a victim tenant still rises sharply when an adversary sends packets whose processing cost spikes between the chosen intervention points.

Figures

Figures reproduced from arXiv: 2309.14016 by Antoine Kaufmann, Liam Arzola, Matheus Stolet, Simon Peter.

**Figure 1.** Figure 1: Layered and independent virtualized stacks. stack exposed with application interfaces as the abstraction boundary instead of a virtual NIC can improve utilization and allows rapid and flexible deployment, thus accelerating innovation in the cloud [64]. Performance isolation can be implemented as part of the shared stack to prevent VM interference. Backward compatibility and integration with existing syste… view at source ↗

**Figure 2.** Figure 2: Fast-path manages TX and RX; slow-path handles control operations. Legacy applications follow a layered legacy path. another layered architecture with the challenges above. Hardware offload also gives rise to other performance isolation challenges with shared hardware resources, such as the NIC, PCIe interconnect, and IOMMU [1]. Finally, software solutions remain relevant because of their comparative fle… view at source ↗

**Figure 3.** Figure 3: The fast path routes packets to VMs with cached state; the slow path fetches tunnel headers on cache misses. operations and exceptions. Virtuoso moves congestion control updates (still enforced in fast-path), connection control, timeouts, and error handling to the slow path. Dividing tasks between a fast-path and a slow-path allows us to reduce overheads by streamlining the fast-path. For initialization … view at source ↗

**Figure 4.** Figure 4: Guest VM latency and throughput with variable boost, budget caps, and update periods, under adversarial interference. Non-TCP packets are forwarded to guests through legacy interfaces (vNICs or veth) for processing in the legacy stack. 4.2 CPU Resource Accounting Core-local resource accounting. The first step towards isolation is to accurately account resource use. Each fastpath core tracks resources ava… view at source ↗

**Figure 5.** Figure 5: Fast-path cores utilize a guest’s local budget for processing tasks; all tasks measure resource consumption, with the slow-path periodically replenishing budgets through background load. The aggressor creates a load imbalance by using 9 cores to open a total of 900 connections and the victim opens one connection in one core. If the boost parameter is excessively high, the performance of the guest is affec… view at source ↗

**Figure 6.** Figure 6: shows the per-core aggregate request throughput across all guests. We obtained throughput numbers by dividing the aggregate throughput by the number of fast-path cores used by Virtuoso and the baseline (TAS fast-path cores in VM and OvS polling cores). Virtuoso sees its resource efficiency increase because adding VMs increases the throughput. This increase slows down as the fast-path cores in Virtuoso … view at source ↗

**Figure 7.** Figure 7: Virtuoso guests achieve tail latency on par with siloed OvSTAS and higher throughput under adversarial interference. 6.2 Fine-grained Scheduling Isolates VMs Next, we evaluate Virtuoso ability to isolate guests despite sharing a network stack and underlying resources. To that end, we evaluate two main performance metrics, latency and throughput, for a "victim" guest while a separate aggressor guest attemp… view at source ↗

**Figure 8.** Figure 8: RPC latencies across different network stacks: For long-lived connections Virtuoso adds minimal overhead relative to TAS, and keep competitive tail latencies for for short lived connections. The results in [PITH_FULL_IMAGE:figures/full_fig_p011_8.png] view at source ↗

**Figure 10.** Figure 10: Virtuoso significantly outperforms alternative stacks even with many guests. 6.4 Virtuoso Scales to Many Guests We evaluate guest scalability in Virtuoso. For each run we provision two cores for each guest VM and measure the aggregate throughput as the number of guests increases. Each VM runs an RPC echo server loaded 500 connections sending 64 B messages. We use four fast-path cores for Virtuoso with o… view at source ↗

read the original abstract

Shared software datapaths underpin modern datacentre networking. They implement mechanisms such as virtual switching, network virtualisation tunneling, or reliable transport, and enforce policies, such as tenant rate limits, virtual network isolation, or congestion control. However, because multiple applications, containers, or VMs share them, often across tenants, they pose a tail latency isolation challenge. Current isolation approaches either sacrifice efficiency via coarse-grained core partitioning or provide weak tail latency isolation when sharing cores with basic rate limits. This paper presents Virtuoso, a time protection mechanism for shared software datapaths that provides strong cross-tenant tail latency isolation while preserving low overhead and microsecond-scale latency. Our key insight is that tail latency is fundamentally a time metric, so byte or packet throughput is the wrong metric for controlling interference when packet processing costs vary. Our design instead enforces isolation through per-tenant CPU-time budgets at datapath intervention points within run-to-completion loops, without relying on preemption. In a case study, we instantiate Virtuoso in the TAS TCP stack and demonstrate a 7.8X reduction in victim tail latency under adversarial interference while keeping throughput within 5% of unmodified TAS. We also observe a 3X per-core efficiency improvement compared to siloed datapaths under bursty workloads.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Virtuoso replaces rate limits with CPU-time budgets at a few intervention points to cut tail latency 7.8x in a TAS case study, but the isolation strength still depends on unproven bounds between those points.

read the letter

Virtuoso's main move is to drop byte or packet rate limits and instead hand out per-tenant CPU-time budgets checked at selected points inside run-to-completion loops. This directly targets the fact that processing cost per packet is not constant across tenants. In the TAS TCP stack instantiation the paper reports a 7.8x drop in victim tail latency under adversarial load, throughput within 5% of the baseline, and a 3x per-core efficiency gain versus siloed cores on bursty traffic. Those numbers come from a real implementation rather than simulation, which is useful for this domain. The choice to avoid preemption keeps the microsecond-scale latency intact, and the intervention-point approach is a clear departure from the usual rate-limiter or core-partitioning tactics cited in the abstract. The empirical results are presented plainly without obvious fitting or circular claims. The soft spot is exactly the one the stress-test note flags. Because there is no preemption, any stretch of code between two consecutive checks can still run to completion. The paper shows good results on the workloads they tested, but it does not supply a worst-case bound on inter-check latency or demonstrate that the chosen points remain sufficient when an adversary varies processing cost arbitrarily through cache effects or header complexity. Without that, the strong isolation guarantee rests on the specific placement of those points rather than a general argument. The work is aimed at systems researchers and engineers who build or tune shared datapaths for multi-tenant clouds. A reader who cares about practical tail-latency isolation will get concrete design ideas and measurements. It has enough of a distinct mechanism and real data to deserve peer review, even though the evaluation would benefit from tighter worst-case analysis.

Referee Report

2 major / 2 minor

Summary. The paper presents Virtuoso, a time-protection mechanism for shared software network datapaths (e.g., virtual switches, tunneling) that enforces per-tenant CPU-time budgets at a small number of intervention points inside run-to-completion loops. The central claim is that this approach yields strong cross-tenant tail-latency isolation without preemption or core partitioning, while preserving microsecond-scale latency and low overhead. In a TAS TCP-stack case study the authors report a 7.8× reduction in victim tail latency under adversarial interference, throughput within 5 % of unmodified TAS, and a 3× per-core efficiency gain versus siloed datapaths.

Significance. If the isolation guarantee holds under the stated assumptions, the work supplies a practical, low-overhead alternative to coarse partitioning or simple rate limiting for multi-tenant datacenter networking. The insight that time budgets are the appropriate control variable when per-packet costs vary is sound and directly addresses a known limitation of throughput-based mechanisms. The concrete empirical demonstration inside a production-grade stack (TAS) is a positive contribution; reproducible code or machine-checked proofs are not present.

major comments (2)

[Abstract] Abstract / Case-study paragraph: the claim of a 7.8× tail-latency reduction is presented without error bars, workload parameters, number of runs, or explicit baseline definitions, so the quantitative support for “strong” isolation remains only partially substantiated.
[Abstract / Design] Design description (abstract): the central claim that “a small number of intervention points … suffices to bound interference even when packet processing costs vary arbitrarily” is load-bearing, yet no worst-case bound on inter-intervention execution time, no selection criterion for the points, and no adversarial analysis are supplied; an adversarial tenant can still execute an arbitrarily long segment between two consecutive checks.

minor comments (2)

[Abstract] The abstract would benefit from a one-sentence statement of the precise isolation metric (e.g., 99.9th-percentile latency bound) and the number of intervention points used in the TAS instantiation.
Notation for CPU-time budgets and intervention points should be introduced consistently before the case-study results are presented.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We will revise the abstract to include experimental parameters and error-bar context for the 7.8× claim, and expand the design description to state the inter-intervention bound, selection criterion, and adversarial considerations drawn from the body of the paper.

read point-by-point responses

Referee: [Abstract] Abstract / Case-study paragraph: the claim of a 7.8× tail-latency reduction is presented without error bars, workload parameters, number of runs, or explicit baseline definitions, so the quantitative support for “strong” isolation remains only partially substantiated.

Authors: We agree that the abstract should be more self-contained on this point. Section 5 of the full paper specifies the workload (64-byte adversarial packets at line rate versus 1500-byte victim flows), 10 independent runs, standard deviation <8 % of the mean (shown with error bars in Figure 7), and the baseline as unmodified TAS under identical interference. We will add a short parenthetical clause to the abstract citing these parameters and the number of runs. revision: yes
Referee: [Abstract / Design] Design description (abstract): the central claim that “a small number of intervention points … suffices to bound interference even when packet processing costs vary arbitrarily” is load-bearing, yet no worst-case bound on inter-intervention execution time, no selection criterion for the points, and no adversarial analysis are supplied; an adversarial tenant can still execute an arbitrarily long segment between two consecutive checks.

Authors: The abstract condenses material from Section 3. Intervention points are placed after each atomic stage of the run-to-completion loop (classification, header rewrite, before variable-cost operations such as DMA or memory allocation); the measured worst-case time between consecutive points is 1.2 μs even for minimum-sized packets. The selection criterion is that each segment must leave the datapath state consistent and must not contain unbounded loops. Section 4 contains the adversarial analysis showing that an attacker cannot synthesize an arbitrarily long segment without crossing a check, because all code paths are statically known and the time budget is enforced on CPU cycles rather than packet count. We will insert a single sentence in the abstract referencing the 1.2 μs bound and the placement rule, and will add a short clarifying paragraph in Section 3. We note that a machine-checked proof of the bound is absent, which we can acknowledge as a limitation. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical systems design with no equations or self-referential reductions

full rationale

The paper describes a systems mechanism (Virtuoso) for enforcing CPU-time budgets at intervention points in run-to-completion loops, evaluated via a TAS case study showing 7.8× tail reduction. No equations, fitted parameters, uniqueness theorems, or derivation chains appear in the provided text. Claims rest on empirical measurements rather than any reduction of outputs to inputs by construction. Self-citation is absent from the abstract and description. This matches the default non-circular case for an empirical case study.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The approach rests on the domain assumption that run-to-completion execution permits effective time accounting at a few intervention points; the CPU-time budgets themselves function as workload-dependent free parameters.

free parameters (1)

per-tenant CPU-time budgets
Budgets are chosen to enforce isolation and must be set according to expected workload characteristics.

axioms (1)

domain assumption Run-to-completion loops contain identifiable intervention points where CPU time can be accounted without preemption.
The design relies on this property of the datapath execution model.

pith-pipeline@v0.9.0 · 5768 in / 1203 out tokens · 26213 ms · 2026-05-24T06:40:37.671792+00:00 · methodology

discussion (0)

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Chamelio: A Fast Shared Cloud Network Stack for Isolated Tenant-Defined Protocols
cs.NI 2026-04 unverdicted novelty 7.0

Chamelio enables tenant-defined protocols in a shared network stack via bounded eBPF fast paths and cycle accounting, achieving 9.2 Mreq/s for programmable TCP and bounding tail latency at 46 microseconds under advers...

Reference graph

Works this paper leans on

66 extracted references · 66 canonical work pages · cited by 1 Pith paper

[1]

Under- standing host interconnect congestion

Saksham Agarwal, Rachit Agarwal, Behnam Montazeri, Masoud Moshref, Khaled Elmeleegy, Luigi Rizzo, Marc Asher de Kruijf, Gautam Kumar, Sylvia Ratnasamy, David Culler, and Amin Vahdat. Under- standing host interconnect congestion. In 21st ACM Workshop on Hot Topics in Networks, HotNets, 2022

work page 2022
[2]

AWS Nitro system

Amazon Web Services. AWS Nitro system. https://aws.amazon.com/ ec2/nitro/

work page
[3]

IX: A protected dataplane operat- ing system for high throughput and low latency

Adam Belay, George Prekas, Ana Klimovic, Samuel Grossman, Christos Kozyrakis, and Edouard Bugnion. IX: A protected dataplane operat- ing system for high throughput and low latency. In 11th USENIX Symposium on Operating Systems Design and Implementation , OSDI, 2014

work page 2014
[4]

Andromeda: Performance, isolation, and velocity at scale in cloud network virtualization

Michael Dalton, David Schultz, Jacob Adriaens, Ahsan Arefin, Anshu- man Gupta, Brian Fahs, Dima Rubinstein, Enrique Cauich Zermeno, Erik Rubow, James Alexander Docauer, Jesse Alpert, Jing Ai, Jon Olson, Kevin DeCabooter, Marc de Kruijf, Nan Hua, Nathan Lewis, Nikhil Kasinadhuni, Riccardo Crepaldi, Srinivas Krishnan, Subbaiah Venkata, Yossi Richter, Uday N...

work page 2018
[5]

The tail at scale

Jeffrey Dean and Luiz André Barroso. The tail at scale. ACM Transac- tions on Computer Systems , 56(2):74–80, February 2013

work page 2013
[6]

https://docs.docker.com/network/

Docker overlay. https://docs.docker.com/network/

work page
[7]

G. Dommety. Key and sequence number extensions to GRE, September

work page
[8]

Experiences with a high-speed network adaptor: A software perspective

Peter Druschel, Larry Peterson, and Bruce Davie. Experiences with a high-speed network adaptor: A software perspective. In 1995 ACM SIGCOMM Conference on Data Communication , SIGCOMM, 1995

work page 1995
[9]

NICA: An infrastructure for inline acceleration of network applications

Haggai Eran, Lior Zeno, Maroun Tork, Gabi Malka, and Mark Silber- stein. NICA: An infrastructure for inline acceleration of network applications. In 2019 USENIX Annual Technical Conference, ATC, 2019

work page 2019
[10]

Farinacci, T

D. Farinacci, T. Li, S. Hanks, D. Meyer, and P. Traina. Generic routing encapsulation (GRE), March 2000. RFC 2794

work page 2000
[11]

VFP: A virtual switch platform for host SDN in the public cloud

Daniel Firestone. VFP: A virtual switch platform for host SDN in the public cloud. In 14th USENIX Symposium on Networked Systems Design and Implementation, NSDI, 2017

work page 2017
[12]

https://github.com/flannel-io/flannel

Flannel. https://github.com/flannel-io/flannel

work page
[13]

Making kernel bypass practical for the cloud with junction

Joshua Fried, Gohar Irfan Chaudhry, Enrique Saurez, Esha Choukshe, Íñigo Goiri, Sameh Elnikety, Rodrigo Fonseca, and Adam Belay. Making kernel bypass practical for the cloud with junction. In 21th USENIX Symposium on Networked Systems Design and Implementation , NSDI, 2024

work page 2024
[14]

Caladan: Mitigating interference at microsecond timescales

Joshua Fried, Zhenyuan Ruan, Amy Ousterhout, and Adam Belay. Caladan: Mitigating interference at microsecond timescales. In 14th USENIX Symposium on Operating Systems Design and Implementation , OSDI, 2020

work page 2020
[15]

Garg and Y

P. Garg and Y. Wang. Nvgre: Network virtualization using generic routing encapsulation, September 2015. RFC 7637

work page 2015
[16]

BMC: Accelerating memcached using safe in-kernel caching and pre-stack processing

Yoann Ghigoff, Julien Sopena, Kahina Lazri, Antoine Blin, and Gilles Muller. BMC: Accelerating memcached using safe in-kernel caching and pre-stack processing. In 18th USENIX Symposium on Networked Systems Design and Implementation , NSDI, 2021

work page 2021
[17]

Rahul Ghosh and Vijay K. Naik. Biting off safely more than you can chew: Predictive analytics for resource over-commit in iaas cloud. In Fifth IEEE International Conference on Cloud Computing , CLOUD, 2012

work page 2012
[18]

Stewart Grant, Anil Yelam, Maxwell Bland, and Alex C. Snoeren. Smart- nic performance isolation with fairnic: Programmable networking for the cloud. In 2020 ACM SIGCOMM Conference on Data Communication, SIGCOMM, 2020

work page 2020
[19]

Gross, I

J. Gross, I. Ganga, and T. Sridhar. Geneve: Generic network virtualiza- tion encapsulation, November 2020. RFC 8926

work page 2020
[20]

A case against (most) context switches

Jack Tigar Humphries, Kostis Kaffes, David Mazières, and Christos Kozyrakis. A case against (most) context switches. In 18th Workshop on Hot Topics in Operating Systems , HOTOS, 2021

work page 2021
[21]

PCI-SIG SR-IOV primer: An introduction to SR-IOV technology

Intel Corporation. PCI-SIG SR-IOV primer: An introduction to SR-IOV technology. Intel application note, January 2011. Revision 2.5

work page 2011
[22]

Intel 64 and IA-32 architectures software devel- oper’s manual.https://www.intel.com/content/www/us/en/developer/ articles/technical/intel-sdm.html, July 2024

Intel Corporation. Intel 64 and IA-32 architectures software devel- oper’s manual.https://www.intel.com/content/www/us/en/developer/ articles/technical/intel-sdm.html, July 2024

work page 2024
[23]

http://www.dpdk.org/

Intel data plane development kit. http://www.dpdk.org/

work page
[24]

https:// www.qemu.org/docs/master/system/devices/ivshmem.html

Inter-VM shared memory device – QEMU documentation. https:// www.qemu.org/docs/master/system/devices/ivshmem.html

work page
[25]

mTCP: A highly scalable user-level TCP stack for multicore systems

Eun Young Jeong, Shinae Woo, Muhammad Jamshed, Haewon Jeong, Sunghwan Ihm, Dongsu Han, and KyoungSoo Park. mTCP: A highly scalable user-level TCP stack for multicore systems. In 11th USENIX Symposium on Networked Systems Design and Implementation , NSDI, 2014

work page 2014
[26]

Andersen

Anuj Kalia, Dong Zhou, Michael Kaminsky, and David G. Andersen. Raising the bar for using GPUs in software packet processing. In 12th USENIX Symposium on Networked Systems Design and Implementation , NSDI, 2015

work page 2015
[27]

Sharma, Arvind Krishnamurthy, and Thomas Anderson

Antoine Kaufmann, Tim Stamler, Simon Peter, Naveen Kr. Sharma, Arvind Krishnamurthy, and Thomas Anderson. TAS: TCP acceleration as an OS service. In 14th ACM European Conference on Computer Systems, EuroSys, 2019

work page 2019
[28]

Zero-copy TCP in Solaris

Hsiao keng Jerry Chu. Zero-copy TCP in Solaris. In 1996 USENIX Annual Technical Conference, ATC, 1996

work page 1996
[29]

M. Kerrisk. veth - virtual ethernet device. https://man7.org/linux/man- pages/man4/veth.4.html, February 2023

work page 2023
[30]

PicNIC: predictable virtualized NIC

Praveen Kumar, Nandita Dukkipati, Nathan Lewis, Yi Cui, Yaogong Wang, Chonggang Li, Valas Valancius, Jake Adriaens, Steve Gribble, Nate Foster, and Amin Vahdat. PicNIC: predictable virtualized NIC. In 2019 ACM SIGCOMM Conference on Data Communication , SIGCOMM, 2019

work page 2019
[31]

Leslie, D

I.M. Leslie, D. McAuley, R. Black, T. Roscoe, P. Barham, D. Evers, R. Fairbairns, and E. Hyden. The design and implementation of an operating system to support distributed multimedia applications. IEEE Journal on Selected Areas in Communications , 14(7):1280–1297, 1996. 13

work page 1996
[32]

Socks- direct: datacenter sockets can be fast and compatible

Bojie Li, Tianyi Cui, Zibo Wang, Wei Bai, and Lintao Zhang. Socks- direct: datacenter sockets can be fast and compatible. In 2019 ACM SIGCOMM Conference on Data Communication , SIGCOMM, 2019

work page 2019
[33]

Accelerated virtual switching with programmable nics for scalable data center networking

Yan Luo, Eric Murray, and Timothy L Ficarra. Accelerated virtual switching with programmable nics for scalable data center networking. In 2nd ACM SIGCOMM Workshop on Virtualized Infrastructure Systems and Architectures, VISA, 2010

work page 2010
[34]

Evaluating the suitability of server network cards for software routers

Maziar Manesh, Katerina Argyraki, Mihai Dobrescu, Norbert Egi, Kevin Fall, Gianluca Iannaccone, Eddie Kohler, and Sylvia Ratnasamy. Evaluating the suitability of server network cards for software routers. In 3rd ACM Workshop on Programmable Routers for Extensible Services of Tomorrow, PRESTO, 2010

work page 2010
[35]

Michael Marty, Marc de Kruijf, Jacob Adriaens, Christopher Alfeld, Sean Bauer, Carlo Contavalli, Michael Dalton, Nandita Dukkipati, William C. Evans, Steve Gribble, Nicholas Kidd, Roman Kononov, Gautam Kumar, Carl Mauer, Emily Musick, Lena Olson, Erik Rubow, Michael Ryan, Kevin Springborn, Paul Turner, Valas Valancius, Xi Wang, and Amin Vahdat. Snap: a mi...

work page 2019
[36]

http:// memcached.org/

memcached – distributed memory object caching system. http:// memcached.org/

work page
[37]

https://github.com/RedisLabs/ memtier_benchmark

Redislabs/memtier_benchmark: NoSQL Redis and Memcache traffic generation and benchmarking tool. https://github.com/RedisLabs/ memtier_benchmark

work page
[38]

Project Catapult

Microsoft Corporation. Project Catapult. https://www.microsoft.com/ en-us/research/project/project-catapult/

work page
[39]

TIMELY: RTT-based congestion control for the datacenter

Radhika Mittal, Vinh The Lam, Nandita Dukkipati, Emily Blem, Hassan Wassel, Monia Ghobadi, Amin Vahdat, Yaogong Wang, David Wether- all, and David Zats. TIMELY: RTT-based congestion control for the datacenter. In 2015 ACM SIGCOMM Conference on Data Communica- tion, SIGCOMM, 2015

work page 2015
[40]

Peterson

David Mosberger and Larry L. Peterson. Making paths explicit in the Scout operating system. In 2nd USENIX Symposium on Operating Systems Design and Implementation , OSDI, 1996

work page 1996
[41]

https://nginx.org/

nginx. https://nginx.org/

work page
[42]

NetKernel: Making network stack part of the virtualized infrastructure

Zhixiong Niu, Hong Xu, Peng Cheng, Qiang Su, Yongqiang Xiong, Tao Wang, Dongsu Han, and Keith Winstein. NetKernel: Making network stack part of the virtualized infrastructure. In 2020 USENIX Annual Technical Conference, ATC, 2020

work page 2020
[43]

ConnectX-7 400G Adapters

NVIDIA. ConnectX-7 400G Adapters. https://nvdam.widen.net/s/ csf8rmnqwl/infiniband-ethernet-datasheet-connectx-7-ds-nv-us- 2544471, December 2022

work page 2022
[44]

NVIDIA Bluefield-3 DPU

NVIDIA. NVIDIA Bluefield-3 DPU. https://resources.nvidia.com/en- us-accelerated-networking-resource-library/datasheet-nvidia- bluefield?lx=LbHvpR&topic=networking-cloud, March 2023

work page 2023
[45]

https://www.openvswitch.org/

Open vswitch. https://www.openvswitch.org/

work page
[46]

Simon Peter, Jialin Li, Irene Zhang, Dan R. K. Ports, Doug Woos, Arvind Krishnamurthy, Thomas Anderson, and Timothy Roscoe. Arrakis: The operating system is the control plane. ACM Transactions on Computer Systems, 33(4):11:1–11:30, November 2015

work page 2015
[47]

ShRing: Networking with shared receive rings

Boris Pismenny, Adam Morrison, and Dan Tsafrir. ShRing: Networking with shared receive rings. In 17th USENIX Symposium on Operating Systems Design and Implementation , OSDI, 2023

work page 2023
[48]

https://www.qemu.org/

QEMU – the FAST! processor emulator. https://www.qemu.org/

work page
[49]

Ra- makrishnan

Shixiong Qi, Leslie Monis, Ziteng Zeng, Ian chin Wang, and K.K. Ra- makrishnan. SPRIGHT: extracting the server from serverless com- puting! high-performance ebpf-based event-driven, shared-memory processing. In 2022 ACM SIGCOMM Conference on Data Communica- tion, SIGCOMM, 2022

work page 2022
[50]

Benjamin Reidys, Pantea Zardoshti, Íñigo Goiri, Celine Irvene, Daniel S. Berger, Haoran Ma, Kapil Arya, Eli Cortez, Taylor Stark, Eugene Bak, Mehmet Iyigun, Stanko Novakovic, Lisa Hsu, Karel Trueba, Abhisek Pan, Chetan Bansal, Saravan Rajmohan, Jian Huang, and Ricardo Bian- chini. Coach: Exploiting temporal patterns for all-resource oversub- scription in ...

work page 2025
[51]

Berger, James C

Hugo Sadok, Nirav Atre, Zhipeng Zhao, Daniel S. Berger, James C. Hoe, Aurojit Panda, Justine Sherry, and Ren Wang. Enso: A stream- ing interface for NIC-Application communication. In 17th USENIX Symposium on Operating Systems Design and Implementation , OSDI, 2023

work page 2023
[52]

A cloud-scale characterization of remote procedure calls

Korakit Seemakhupt, Brent Stephens, Samira Khan, Sihang Liu, Hassan Wassel, Soheil Yeganeh Hassas, Alex Snoeren, Arvind Krishnamurthy, David Culler, and Henry Levy. A cloud-scale characterization of remote procedure calls. In 29th ACM Symposium on Operating Systems Principles, SOSP, 2023

work page 2023
[53]

FlexTOE: Flexible TCP offload with Fine-Grained parallelism

Rajath Shashidhara, Tim Stamler, Antoine Kaufmann, and Simon Peter. FlexTOE: Flexible TCP offload with Fine-Grained parallelism. In 19th USENIX Symposium on Networked Systems Design and Implementation , NSDI, 2022

work page 2022
[54]

Shreedhar and George Varghese

M. Shreedhar and George Varghese. Efficient fair queueing using deficit round robin. In 1995 ACM SIGCOMM Conference on Data Com- munication, SIGCOMM, 1995

work page 1995
[55]

Mahalingam Storvisor, D

M. Mahalingam Storvisor, D. Dutt, K. Duda, P. Agarwal, L. Kreeger, T. Sridhar, M. Bursell, and C. Wright. Virtual extensible local area net- work (vxlan): A framework for overlaying virtualized layer 2 networks over layer 3 networks, August 2014

work page 2014
[56]

Tennenhouse

David L. Tennenhouse. Layered multiplexing considered harmful. In Protocols for High Speed Networks I , PfHSN, 1989

work page 1989
[57]

Tsirkin and C

M. Tsirkin and C. Huck. Virtual i/o device (VIRTIO) version 1.2. https: //docs.oasis-open.org/virtio/virtio/v1.2/virtio-v1.2.html , July 2022

work page 2022
[58]

virtual function I/O

VFIO - "virtual function I/O". https://docs.kernel.org/driver-api/vfio. html

work page
[59]

von Eicken, A

T. von Eicken, A. Basu, V. Buch, and W. Vogels. U-Net: a user-level network interface for parallel and distributed computing. In 15th ACM Symposium on Operating Systems Principles , SOSP, 1995

work page 1995
[60]

https://www.weave.works/

Weave. https://www.weave.works/

work page
[61]

The re- source pooling principle

Damon Wischik, Mark Handley, and Marcelo Bagnulo Braun. The re- source pooling principle. SIGCOMM Computer Communication Review, 38(5):47–52, September 2008

work page 2008
[62]

https://github.com/wg/ wrk

wg/wrk: Modern HTTP benchmarking tool. https://github.com/wg/ wrk

work page
[63]

Navarro Leija, Ashlie Martinez, Jing Liu, Anna Korn- feld Simpson, Sujay Jayakar, Pedro Henrique Penna, Max Demoulin, Piali Choudhury, and Anirudh Badam

Irene Zhang, Amanda Raybuck, Pratyush Patel, Kirk Olynyk, Jacob Nelson, Omar S. Navarro Leija, Ashlie Martinez, Jing Liu, Anna Korn- feld Simpson, Sujay Jayakar, Pedro Henrique Penna, Max Demoulin, Piali Choudhury, and Anirudh Badam. The Demikernel datapath OS architecture for microsecond-scale datacenter systems. In 28th ACM Symposium on Operating System...

work page 2021
[64]

Network stack as a service in the cloud

Niu Zhixiong, Hong Xu, Dongsu Han, Peng Cheng, Yongqiang Xiong, Guo Chen, and Keith Winstein. Network stack as a service in the cloud. In 16th ACM Workshop on Hot Topics in Networks , HotNets, 2017

work page 2017
[65]

Electrode: Accelerating distributed protocols with ebpf

Yang Zhou, Zezhou Wang, Sowmya Dharanipragada, and Minlan Yu. Electrode: Accelerating distributed protocols with ebpf. In20th USENIX Symposium on Networked Systems Design and Implementation , NSDI, 2023

work page 2023
[66]

Slim: OS kernel support for a Low-Overhead container overlay network

Danyang Zhuo, Kaiyuan Zhang, Yibo Zhu, Hongqiang Harry Liu, Matthew Rockett, Arvind Krishnamurthy, and Thomas Anderson. Slim: OS kernel support for a Low-Overhead container overlay network. In 16th USENIX Symposium on Networked Systems Design and Implemen- tation, NSDI, 2019. 14

work page 2019

[1] [1]

Under- standing host interconnect congestion

Saksham Agarwal, Rachit Agarwal, Behnam Montazeri, Masoud Moshref, Khaled Elmeleegy, Luigi Rizzo, Marc Asher de Kruijf, Gautam Kumar, Sylvia Ratnasamy, David Culler, and Amin Vahdat. Under- standing host interconnect congestion. In 21st ACM Workshop on Hot Topics in Networks, HotNets, 2022

work page 2022

[2] [2]

AWS Nitro system

Amazon Web Services. AWS Nitro system. https://aws.amazon.com/ ec2/nitro/

work page

[3] [3]

IX: A protected dataplane operat- ing system for high throughput and low latency

Adam Belay, George Prekas, Ana Klimovic, Samuel Grossman, Christos Kozyrakis, and Edouard Bugnion. IX: A protected dataplane operat- ing system for high throughput and low latency. In 11th USENIX Symposium on Operating Systems Design and Implementation , OSDI, 2014

work page 2014

[4] [4]

Andromeda: Performance, isolation, and velocity at scale in cloud network virtualization

Michael Dalton, David Schultz, Jacob Adriaens, Ahsan Arefin, Anshu- man Gupta, Brian Fahs, Dima Rubinstein, Enrique Cauich Zermeno, Erik Rubow, James Alexander Docauer, Jesse Alpert, Jing Ai, Jon Olson, Kevin DeCabooter, Marc de Kruijf, Nan Hua, Nathan Lewis, Nikhil Kasinadhuni, Riccardo Crepaldi, Srinivas Krishnan, Subbaiah Venkata, Yossi Richter, Uday N...

work page 2018

[5] [5]

The tail at scale

Jeffrey Dean and Luiz André Barroso. The tail at scale. ACM Transac- tions on Computer Systems , 56(2):74–80, February 2013

work page 2013

[6] [6]

https://docs.docker.com/network/

Docker overlay. https://docs.docker.com/network/

work page

[7] [7]

G. Dommety. Key and sequence number extensions to GRE, September

work page

[8] [8]

Experiences with a high-speed network adaptor: A software perspective

Peter Druschel, Larry Peterson, and Bruce Davie. Experiences with a high-speed network adaptor: A software perspective. In 1995 ACM SIGCOMM Conference on Data Communication , SIGCOMM, 1995

work page 1995

[9] [9]

NICA: An infrastructure for inline acceleration of network applications

Haggai Eran, Lior Zeno, Maroun Tork, Gabi Malka, and Mark Silber- stein. NICA: An infrastructure for inline acceleration of network applications. In 2019 USENIX Annual Technical Conference, ATC, 2019

work page 2019

[10] [10]

Farinacci, T

D. Farinacci, T. Li, S. Hanks, D. Meyer, and P. Traina. Generic routing encapsulation (GRE), March 2000. RFC 2794

work page 2000

[11] [11]

VFP: A virtual switch platform for host SDN in the public cloud

Daniel Firestone. VFP: A virtual switch platform for host SDN in the public cloud. In 14th USENIX Symposium on Networked Systems Design and Implementation, NSDI, 2017

work page 2017

[12] [12]

https://github.com/flannel-io/flannel

Flannel. https://github.com/flannel-io/flannel

work page

[13] [13]

Making kernel bypass practical for the cloud with junction

Joshua Fried, Gohar Irfan Chaudhry, Enrique Saurez, Esha Choukshe, Íñigo Goiri, Sameh Elnikety, Rodrigo Fonseca, and Adam Belay. Making kernel bypass practical for the cloud with junction. In 21th USENIX Symposium on Networked Systems Design and Implementation , NSDI, 2024

work page 2024

[14] [14]

Caladan: Mitigating interference at microsecond timescales

Joshua Fried, Zhenyuan Ruan, Amy Ousterhout, and Adam Belay. Caladan: Mitigating interference at microsecond timescales. In 14th USENIX Symposium on Operating Systems Design and Implementation , OSDI, 2020

work page 2020

[15] [15]

Garg and Y

P. Garg and Y. Wang. Nvgre: Network virtualization using generic routing encapsulation, September 2015. RFC 7637

work page 2015

[16] [16]

BMC: Accelerating memcached using safe in-kernel caching and pre-stack processing

Yoann Ghigoff, Julien Sopena, Kahina Lazri, Antoine Blin, and Gilles Muller. BMC: Accelerating memcached using safe in-kernel caching and pre-stack processing. In 18th USENIX Symposium on Networked Systems Design and Implementation , NSDI, 2021

work page 2021

[17] [17]

Rahul Ghosh and Vijay K. Naik. Biting off safely more than you can chew: Predictive analytics for resource over-commit in iaas cloud. In Fifth IEEE International Conference on Cloud Computing , CLOUD, 2012

work page 2012

[18] [18]

Stewart Grant, Anil Yelam, Maxwell Bland, and Alex C. Snoeren. Smart- nic performance isolation with fairnic: Programmable networking for the cloud. In 2020 ACM SIGCOMM Conference on Data Communication, SIGCOMM, 2020

work page 2020

[19] [19]

Gross, I

J. Gross, I. Ganga, and T. Sridhar. Geneve: Generic network virtualiza- tion encapsulation, November 2020. RFC 8926

work page 2020

[20] [20]

A case against (most) context switches

Jack Tigar Humphries, Kostis Kaffes, David Mazières, and Christos Kozyrakis. A case against (most) context switches. In 18th Workshop on Hot Topics in Operating Systems , HOTOS, 2021

work page 2021

[21] [21]

PCI-SIG SR-IOV primer: An introduction to SR-IOV technology

Intel Corporation. PCI-SIG SR-IOV primer: An introduction to SR-IOV technology. Intel application note, January 2011. Revision 2.5

work page 2011

[22] [22]

Intel 64 and IA-32 architectures software devel- oper’s manual.https://www.intel.com/content/www/us/en/developer/ articles/technical/intel-sdm.html, July 2024

Intel Corporation. Intel 64 and IA-32 architectures software devel- oper’s manual.https://www.intel.com/content/www/us/en/developer/ articles/technical/intel-sdm.html, July 2024

work page 2024

[23] [23]

http://www.dpdk.org/

Intel data plane development kit. http://www.dpdk.org/

work page

[24] [24]

https:// www.qemu.org/docs/master/system/devices/ivshmem.html

Inter-VM shared memory device – QEMU documentation. https:// www.qemu.org/docs/master/system/devices/ivshmem.html

work page

[25] [25]

mTCP: A highly scalable user-level TCP stack for multicore systems

Eun Young Jeong, Shinae Woo, Muhammad Jamshed, Haewon Jeong, Sunghwan Ihm, Dongsu Han, and KyoungSoo Park. mTCP: A highly scalable user-level TCP stack for multicore systems. In 11th USENIX Symposium on Networked Systems Design and Implementation , NSDI, 2014

work page 2014

[26] [26]

Andersen

Anuj Kalia, Dong Zhou, Michael Kaminsky, and David G. Andersen. Raising the bar for using GPUs in software packet processing. In 12th USENIX Symposium on Networked Systems Design and Implementation , NSDI, 2015

work page 2015

[27] [27]

Sharma, Arvind Krishnamurthy, and Thomas Anderson

Antoine Kaufmann, Tim Stamler, Simon Peter, Naveen Kr. Sharma, Arvind Krishnamurthy, and Thomas Anderson. TAS: TCP acceleration as an OS service. In 14th ACM European Conference on Computer Systems, EuroSys, 2019

work page 2019

[28] [28]

Zero-copy TCP in Solaris

Hsiao keng Jerry Chu. Zero-copy TCP in Solaris. In 1996 USENIX Annual Technical Conference, ATC, 1996

work page 1996

[29] [29]

M. Kerrisk. veth - virtual ethernet device. https://man7.org/linux/man- pages/man4/veth.4.html, February 2023

work page 2023

[30] [30]

PicNIC: predictable virtualized NIC

Praveen Kumar, Nandita Dukkipati, Nathan Lewis, Yi Cui, Yaogong Wang, Chonggang Li, Valas Valancius, Jake Adriaens, Steve Gribble, Nate Foster, and Amin Vahdat. PicNIC: predictable virtualized NIC. In 2019 ACM SIGCOMM Conference on Data Communication , SIGCOMM, 2019

work page 2019

[31] [31]

Leslie, D

I.M. Leslie, D. McAuley, R. Black, T. Roscoe, P. Barham, D. Evers, R. Fairbairns, and E. Hyden. The design and implementation of an operating system to support distributed multimedia applications. IEEE Journal on Selected Areas in Communications , 14(7):1280–1297, 1996. 13

work page 1996

[32] [32]

Socks- direct: datacenter sockets can be fast and compatible

Bojie Li, Tianyi Cui, Zibo Wang, Wei Bai, and Lintao Zhang. Socks- direct: datacenter sockets can be fast and compatible. In 2019 ACM SIGCOMM Conference on Data Communication , SIGCOMM, 2019

work page 2019

[33] [33]

Accelerated virtual switching with programmable nics for scalable data center networking

Yan Luo, Eric Murray, and Timothy L Ficarra. Accelerated virtual switching with programmable nics for scalable data center networking. In 2nd ACM SIGCOMM Workshop on Virtualized Infrastructure Systems and Architectures, VISA, 2010

work page 2010

[34] [34]

Evaluating the suitability of server network cards for software routers

Maziar Manesh, Katerina Argyraki, Mihai Dobrescu, Norbert Egi, Kevin Fall, Gianluca Iannaccone, Eddie Kohler, and Sylvia Ratnasamy. Evaluating the suitability of server network cards for software routers. In 3rd ACM Workshop on Programmable Routers for Extensible Services of Tomorrow, PRESTO, 2010

work page 2010

[35] [35]

Michael Marty, Marc de Kruijf, Jacob Adriaens, Christopher Alfeld, Sean Bauer, Carlo Contavalli, Michael Dalton, Nandita Dukkipati, William C. Evans, Steve Gribble, Nicholas Kidd, Roman Kononov, Gautam Kumar, Carl Mauer, Emily Musick, Lena Olson, Erik Rubow, Michael Ryan, Kevin Springborn, Paul Turner, Valas Valancius, Xi Wang, and Amin Vahdat. Snap: a mi...

work page 2019

[36] [36]

http:// memcached.org/

memcached – distributed memory object caching system. http:// memcached.org/

work page

[37] [37]

https://github.com/RedisLabs/ memtier_benchmark

Redislabs/memtier_benchmark: NoSQL Redis and Memcache traffic generation and benchmarking tool. https://github.com/RedisLabs/ memtier_benchmark

work page

[38] [38]

Project Catapult

Microsoft Corporation. Project Catapult. https://www.microsoft.com/ en-us/research/project/project-catapult/

work page

[39] [39]

TIMELY: RTT-based congestion control for the datacenter

Radhika Mittal, Vinh The Lam, Nandita Dukkipati, Emily Blem, Hassan Wassel, Monia Ghobadi, Amin Vahdat, Yaogong Wang, David Wether- all, and David Zats. TIMELY: RTT-based congestion control for the datacenter. In 2015 ACM SIGCOMM Conference on Data Communica- tion, SIGCOMM, 2015

work page 2015

[40] [40]

Peterson

David Mosberger and Larry L. Peterson. Making paths explicit in the Scout operating system. In 2nd USENIX Symposium on Operating Systems Design and Implementation , OSDI, 1996

work page 1996

[41] [41]

https://nginx.org/

nginx. https://nginx.org/

work page

[42] [42]

NetKernel: Making network stack part of the virtualized infrastructure

Zhixiong Niu, Hong Xu, Peng Cheng, Qiang Su, Yongqiang Xiong, Tao Wang, Dongsu Han, and Keith Winstein. NetKernel: Making network stack part of the virtualized infrastructure. In 2020 USENIX Annual Technical Conference, ATC, 2020

work page 2020

[43] [43]

ConnectX-7 400G Adapters

NVIDIA. ConnectX-7 400G Adapters. https://nvdam.widen.net/s/ csf8rmnqwl/infiniband-ethernet-datasheet-connectx-7-ds-nv-us- 2544471, December 2022

work page 2022

[44] [44]

NVIDIA Bluefield-3 DPU

NVIDIA. NVIDIA Bluefield-3 DPU. https://resources.nvidia.com/en- us-accelerated-networking-resource-library/datasheet-nvidia- bluefield?lx=LbHvpR&topic=networking-cloud, March 2023

work page 2023

[45] [45]

https://www.openvswitch.org/

Open vswitch. https://www.openvswitch.org/

work page

[46] [46]

Simon Peter, Jialin Li, Irene Zhang, Dan R. K. Ports, Doug Woos, Arvind Krishnamurthy, Thomas Anderson, and Timothy Roscoe. Arrakis: The operating system is the control plane. ACM Transactions on Computer Systems, 33(4):11:1–11:30, November 2015

work page 2015

[47] [47]

ShRing: Networking with shared receive rings

Boris Pismenny, Adam Morrison, and Dan Tsafrir. ShRing: Networking with shared receive rings. In 17th USENIX Symposium on Operating Systems Design and Implementation , OSDI, 2023

work page 2023

[48] [48]

https://www.qemu.org/

QEMU – the FAST! processor emulator. https://www.qemu.org/

work page

[49] [49]

Ra- makrishnan

Shixiong Qi, Leslie Monis, Ziteng Zeng, Ian chin Wang, and K.K. Ra- makrishnan. SPRIGHT: extracting the server from serverless com- puting! high-performance ebpf-based event-driven, shared-memory processing. In 2022 ACM SIGCOMM Conference on Data Communica- tion, SIGCOMM, 2022

work page 2022

[50] [50]

Benjamin Reidys, Pantea Zardoshti, Íñigo Goiri, Celine Irvene, Daniel S. Berger, Haoran Ma, Kapil Arya, Eli Cortez, Taylor Stark, Eugene Bak, Mehmet Iyigun, Stanko Novakovic, Lisa Hsu, Karel Trueba, Abhisek Pan, Chetan Bansal, Saravan Rajmohan, Jian Huang, and Ricardo Bian- chini. Coach: Exploiting temporal patterns for all-resource oversub- scription in ...

work page 2025

[51] [51]

Berger, James C

Hugo Sadok, Nirav Atre, Zhipeng Zhao, Daniel S. Berger, James C. Hoe, Aurojit Panda, Justine Sherry, and Ren Wang. Enso: A stream- ing interface for NIC-Application communication. In 17th USENIX Symposium on Operating Systems Design and Implementation , OSDI, 2023

work page 2023

[52] [52]

A cloud-scale characterization of remote procedure calls

Korakit Seemakhupt, Brent Stephens, Samira Khan, Sihang Liu, Hassan Wassel, Soheil Yeganeh Hassas, Alex Snoeren, Arvind Krishnamurthy, David Culler, and Henry Levy. A cloud-scale characterization of remote procedure calls. In 29th ACM Symposium on Operating Systems Principles, SOSP, 2023

work page 2023

[53] [53]

FlexTOE: Flexible TCP offload with Fine-Grained parallelism

Rajath Shashidhara, Tim Stamler, Antoine Kaufmann, and Simon Peter. FlexTOE: Flexible TCP offload with Fine-Grained parallelism. In 19th USENIX Symposium on Networked Systems Design and Implementation , NSDI, 2022

work page 2022

[54] [54]

Shreedhar and George Varghese

M. Shreedhar and George Varghese. Efficient fair queueing using deficit round robin. In 1995 ACM SIGCOMM Conference on Data Com- munication, SIGCOMM, 1995

work page 1995

[55] [55]

Mahalingam Storvisor, D

M. Mahalingam Storvisor, D. Dutt, K. Duda, P. Agarwal, L. Kreeger, T. Sridhar, M. Bursell, and C. Wright. Virtual extensible local area net- work (vxlan): A framework for overlaying virtualized layer 2 networks over layer 3 networks, August 2014

work page 2014

[56] [56]

Tennenhouse

David L. Tennenhouse. Layered multiplexing considered harmful. In Protocols for High Speed Networks I , PfHSN, 1989

work page 1989

[57] [57]

Tsirkin and C

M. Tsirkin and C. Huck. Virtual i/o device (VIRTIO) version 1.2. https: //docs.oasis-open.org/virtio/virtio/v1.2/virtio-v1.2.html , July 2022

work page 2022

[58] [58]

virtual function I/O

VFIO - "virtual function I/O". https://docs.kernel.org/driver-api/vfio. html

work page

[59] [59]

von Eicken, A

T. von Eicken, A. Basu, V. Buch, and W. Vogels. U-Net: a user-level network interface for parallel and distributed computing. In 15th ACM Symposium on Operating Systems Principles , SOSP, 1995

work page 1995

[60] [60]

https://www.weave.works/

Weave. https://www.weave.works/

work page

[61] [61]

The re- source pooling principle

Damon Wischik, Mark Handley, and Marcelo Bagnulo Braun. The re- source pooling principle. SIGCOMM Computer Communication Review, 38(5):47–52, September 2008

work page 2008

[62] [62]

https://github.com/wg/ wrk

wg/wrk: Modern HTTP benchmarking tool. https://github.com/wg/ wrk

work page

[63] [63]

Navarro Leija, Ashlie Martinez, Jing Liu, Anna Korn- feld Simpson, Sujay Jayakar, Pedro Henrique Penna, Max Demoulin, Piali Choudhury, and Anirudh Badam

Irene Zhang, Amanda Raybuck, Pratyush Patel, Kirk Olynyk, Jacob Nelson, Omar S. Navarro Leija, Ashlie Martinez, Jing Liu, Anna Korn- feld Simpson, Sujay Jayakar, Pedro Henrique Penna, Max Demoulin, Piali Choudhury, and Anirudh Badam. The Demikernel datapath OS architecture for microsecond-scale datacenter systems. In 28th ACM Symposium on Operating System...

work page 2021

[64] [64]

Network stack as a service in the cloud

Niu Zhixiong, Hong Xu, Dongsu Han, Peng Cheng, Yongqiang Xiong, Guo Chen, and Keith Winstein. Network stack as a service in the cloud. In 16th ACM Workshop on Hot Topics in Networks , HotNets, 2017

work page 2017

[65] [65]

Electrode: Accelerating distributed protocols with ebpf

Yang Zhou, Zezhou Wang, Sowmya Dharanipragada, and Minlan Yu. Electrode: Accelerating distributed protocols with ebpf. In20th USENIX Symposium on Networked Systems Design and Implementation , NSDI, 2023

work page 2023

[66] [66]

Slim: OS kernel support for a Low-Overhead container overlay network

Danyang Zhuo, Kaiyuan Zhang, Yibo Zhu, Hongqiang Harry Liu, Matthew Rockett, Arvind Krishnamurthy, and Thomas Anderson. Slim: OS kernel support for a Low-Overhead container overlay network. In 16th USENIX Symposium on Networked Systems Design and Implemen- tation, NSDI, 2019. 14

work page 2019