arxiv: 2604.27686 · v1 · submitted 2026-04-30 · 💻 cs.NI

Libra: Accelerating Socket I/O via Programmable Selective Data Copying

Pith reviewed 2026-05-07 06:14 UTC · model grok-4.3

classification 💻 cs.NI
keywords L7 proxies · socket I/O · kernel-user copy · selective copying · eBPF · network performance · throughput optimization · tail latency

The pith

The paper argues that, under conventional OS abstractions, completely removing kernel-user copies while keeping standard socket semantics for unmodified proxies is impossible, so Libra copies only metadata to user space and reuses bulk payloads in the kernel.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that fully eliminating kernel-user data copies for socket operations while preserving standard POSIX semantics for unmodified applications is fundamentally impossible under conventional OS abstractions. It proposes Libra, a selective-copy framework that copies to user space only the small metadata that proxies inspect for routing decisions, such as HTTP headers, and keeps the bulk payload in the kernel for direct forwarding on the transmit path. eBPF identifies protocol-specific metadata boundaries and coordinates the selective copy and payload reuse across the receive and transmit paths, without any changes to the socket API or to applications. A sympathetic reader cares because L7 proxies such as Nginx and HAProxy sit at the core of cloud-native traffic handling, and the reported gains reach up to 4.2x plaintext throughput and more than 90% lower P99 tail latency.

Core claim

Under conventional OS abstractions, fully eliminating kernel-user copies while preserving standard socket semantics for unmodified proxies is fundamentally impossible. This leads to the practical insight that in common L7 workloads proxies inspect only small metadata for routing while forwarding the bulk payload unchanged. Libra therefore copies only metadata to user space and retains the bulk payload in the kernel for forwarding, using eBPF to identify protocol-specific metadata boundaries and coordinate selective copy and payload reuse across receive and transmit paths, all without modifying the socket API.
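To make the baseline concrete, here is a minimal Python sketch of the recv-then-send loop that an unmodified proxy effectively runs on the standard socket path. It is not Libra's code and does not come from the paper: `relay_once` and the socket-pair harness are illustrative names, and local socket pairs stand in for the client and upstream connections.

```python
import socket


def relay_once(src: socket.socket, dst: socket.socket, buf: bytearray) -> int:
    """Forward one chunk the standard way: recv into a user buffer, then send it.

    Every forwarded byte crosses the kernel-user boundary twice, even though
    a proxy typically only needs to look at the header portion.
    """
    n = src.recv_into(buf)                # kernel -> user copy (up to len(buf) bytes)
    if n:
        dst.sendall(memoryview(buf)[:n])  # user -> kernel copy of the same bytes
    return n


if __name__ == "__main__":
    # Hypothetical harness: two local socket pairs play client and upstream.
    client, proxy_in = socket.socketpair()
    proxy_out, upstream = socket.socketpair()
    request = b"GET /index.html HTTP/1.1\r\nHost: example.test\r\n\r\n"
    client.sendall(request)

    buf = bytearray(4096)  # allocated before recv; length chosen by the application
    moved = relay_once(proxy_in, proxy_out, buf)
    print("bytes relayed:", moved, "intact:", upstream.recv(4096) == request)

    for s in (client, proxy_in, proxy_out, upstream):
        s.close()
```

The buffer is allocated by the application before the call and can be any length, which is the property the standard `recv` interface guarantees and that any transparent optimization has to keep intact.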

What carries the argument

The Libra selective-copy framework, which uses eBPF to identify protocol-specific metadata boundaries and to coordinate selective copying and payload reuse across the receive and transmit paths.
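The selective-copy idea can be illustrated with a toy user-space model. This is only a sketch under stated assumptions, not the kernel mechanism: a Python `bytes` object stands in for kernel-held data, a `memoryview` slice stands in for the payload that stays in the kernel, and `selective_copy` and `SplitResult` are hypothetical names. The HTTP/1.x blank-line terminator is used as the metadata boundary.

```python
from dataclasses import dataclass

HEADER_END = b"\r\n\r\n"  # HTTP/1.x header block terminator


@dataclass
class SplitResult:
    metadata: bytes      # copied out; stands in for the user-space copy
    payload: memoryview  # zero-copy view; stands in for kernel-retained data
    bytes_copied: int


def selective_copy(kernel_buf: bytes) -> SplitResult:
    """Copy only the header block and expose the body as a view."""
    view = memoryview(kernel_buf)
    end = kernel_buf.find(HEADER_END)
    if end < 0:
        # No boundary found: fall back to copying everything (standard path).
        return SplitResult(bytes(view), view[len(view):], len(view))
    split = end + len(HEADER_END)
    metadata = bytes(view[:split])  # the only copy made in this model
    return SplitResult(metadata, view[split:], len(metadata))


if __name__ == "__main__":
    body = b"x" * 65536
    msg = (b"POST /upload HTTP/1.1\r\nHost: example.test\r\n"
           b"Content-Length: 65536\r\n\r\n" + body)
    r = selective_copy(msg)
    print(f"copied {r.bytes_copied} of {len(msg)} bytes "
          f"({r.bytes_copied / len(msg):.2%})")
```

In this model the share of bytes copied is header size divided by total message size, so the benefit grows with payload size and disappears when the proxy has to read the whole body.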

If this is right

  • Unmodified Nginx and HAProxy achieve up to 4.2x higher plaintext throughput.
  • P99 tail latency drops by more than 90% for plaintext traffic.
  • With hardware-offloaded kTLS, encrypted throughput rises by 2.0x and tail latency falls by 65%.
  • The gains require no new APIs, no application changes, and no specialized environments.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same metadata-versus-payload split could apply to other kernel-user interfaces such as file I/O or message queues where data is mostly forwarded unchanged.
  • Kernel developers might add native programmable hooks for payload reuse to reduce reliance on eBPF for this coordination.
  • Widespread adoption would lower CPU cycles spent on memory copies, potentially reducing power draw in data-center proxy fleets.
  • Extending the boundary detection to additional protocols like QUIC or gRPC would test whether the approach generalizes beyond HTTP and TLS.

Load-bearing premise

That common L7 workloads involve proxies that inspect only small metadata for routing while forwarding the bulk payload unchanged, and that eBPF can reliably identify the protocol-specific metadata boundaries.
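The boundary-detection half of this premise can be sketched for HTTP/1.x, where the header block ends at the first CRLF CRLF. The Python below is a user-space illustration only, not an eBPF program and not the paper's implementation; `HeaderBoundaryDetector`, its `max_header_bytes` default, and the three-byte carry-over are assumptions made for the example. It handles a delimiter split across segments and enforces a header size limit, but it does not cover chunked bodies or pipelined requests.

```python
HEADER_END = b"\r\n\r\n"


class HeaderBoundaryDetector:
    """Incrementally locate the end of an HTTP/1.x header block."""

    def __init__(self, max_header_bytes: int = 8192):
        self.max_header_bytes = max_header_bytes
        self.seen = 0    # bytes consumed from earlier segments
        self.tail = b""  # last up-to-3 bytes, to catch a split delimiter

    def feed(self, segment: bytes):
        """Return the stream offset just past CRLFCRLF, or None if not seen yet.

        Raises ValueError once the header block exceeds max_header_bytes.
        """
        window = self.tail + segment
        idx = window.find(HEADER_END)
        if idx >= 0:
            # window[0] sits at stream offset self.seen - len(self.tail).
            boundary = self.seen - len(self.tail) + idx + len(HEADER_END)
            if boundary > self.max_header_bytes:
                raise ValueError("header block exceeds limit")
            return boundary
        self.seen += len(segment)
        if self.seen > self.max_header_bytes:
            raise ValueError("header block exceeds limit")
        self.tail = window[-3:]
        return None


if __name__ == "__main__":
    det = HeaderBoundaryDetector()
    for seg in (b"GET / HTTP/1.1\r\nHost: a\r", b"\n\r", b"\nBODY"):
        print(repr(seg), "->", det.feed(seg))
```

A match found only in the body would be a false boundary, which is why a real parser has to stop scanning at the first terminator and then follow the framing the headers declare.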

What would settle it

A workload measurement showing that typical proxies inspect or modify large portions of the payload rather than small metadata, or an experiment where eBPF cannot correctly locate metadata boundaries for standard protocols, would falsify the selective-copy benefit.

Figures

Figures reproduced from arXiv: 2604.27686 by Kairui Zhou, Shengkai Lin, Shizhen Zhao, Wei Zhang.

Figure 1
Figure 1: Analysis of cross-boundary copying overhead and TLB efficiency. Excerpt from the surrounding text: "… the Tx overhead (10.4%). To understand this, we separately profile TLB miss rates on the receive and transmit paths using a single-stream file download setup (client dominated by Rx, server by Tx), as shown in Figure 1b. On the Rx path, the NIC DMAs packets into discontiguous page fragments, and copy_to_user performs scattered reads across th…"
Figure 2
Figure 2: Virtual memory remapping process. Excerpt from the surrounding text: "• POSIX-Compatible recv (G2): The system supports the standard recv(void *buf, size_t len, ...) interface. Under this paradigm, the buffer buf (virtual buffer 𝑉) follows the properties of Flexible Buffering: it is allocated prior to the recv call, and its length len (𝐿) can be set to any positive value by the application developer. Basic Assumptions (System Constraints): •…"
Figure 3
Figure 3: Data flow comparison between standard and Libra-based L7 proxy forwarding. Excerpt from the surrounding text: "… the proxy issues send, the VPI is included in the outbound stream, allowing Libra to maintain the tracking chain. Secure Mapping. To prevent leaking kernel memory layouts and violating Kernel Address Space Layout Randomization (KASLR) [33], the VPI is never exposed as a raw physical or virtual pointer. Instead, it is generated as a …"
Figure 4
Figure 4: The eBPF-driven ingress state machine. The top half denotes the eBPF control plane’s logical phase, while the bottom half denotes the data plane kernel actions. Excerpt from the surrounding text: "… (§3.4) confirms that the payload has been fully transmitted and resets the state to DEFAULT. 3.4 Egress: Payload Reassembly with L7 State Synchronization. The egress data path is responsible for reassembling newly constructed L7 metadata with the an…"
Figure 6
Figure 6: Performance comparison of Libra vs. standard stack (and F-Stack/kTLS variants). Excerpt from the surrounding text: "… compared to standard Nginx and by over 60% compared to standard HAProxy. F-Stack achieves the lowest tail latency for small payloads, maintaining values near 4 ms. At 32 KB, its P99 latency is less than 20% of that of Libra-Pinned. However, its tail latency spikes with increasing payload size, while Libra-Pinned still sustains …"
Figure 7
Figure 7: Percentage of CPU cycles spent on data copying across different proxies. [Residue of an adjacent plot: panels (a) "Normalized speedup under 8-connection." and (b) "Normalized speedu…"; x-axis: Payload Size (1KB to 512KB); y-axis: Normalized Speedup; series: Self-Baseline, Libra Speedup, Copier Speedup.]
Figure 9
Figure 9: Comparative analysis between Libra and the standard stack across different performance metrics. Meta Sel-Copy: selective copying of new L7 metadata; Meta Alloc: memory allocation for new L7 metadata; Meta eBPF: eBPF program; Meta SKB-Trans: zero-copy reuse of anchored payloads. Excerpt from the surrounding text: "Libra exhibits payload-independent overhead, remaining relatively stable between 19% and 30%. This overhead primarily stems from …"
Figure 10
Figure 10: Analysis of CPU overhead components under software kTLS. [Plot residue: x-axis: Payload Size (1kb to 1024kb); y-axis: CPU Overhead (%), ticks 0 to 60; series: HW recv-copy, SW recv-copy, HW send-copy, SW aes-ni, HW send-alloc, SW send-alloc.]
Figure 11
Figure 11: Analysis of CPU overhead components: kTLS HW vs. SW in the standard stack. Excerpt from the surrounding text: "B.2 Why HW kTLS Gains Little. Furthermore, eliminating the receive-side copy forces the AES-NI engine to process plaintext directly from the fragmented sk_buff structures in the receive socket. In the standard Linux network stack, the payload is copied into a contiguous user-space buffer, enabling the AES-NI pipeline to benefit fro…"
Original abstract

Layer-7 (L7) proxies are critical to modern cloud-native systems, yet their performance is increasingly bottlenecked by copying entire payloads across the kernel-user boundary. Existing approaches reduce this overhead but typically sacrifice compatibility with unmodified POSIX applications, introduce new APIs, or require specialized environments. We show that, under conventional OS abstractions, fully eliminating kernel-user copies while preserving standard socket semantics for unmodified proxies is fundamentally impossible. This leads to a practical insight: in common L7 workloads, proxies inspect only small metadata (e.g., HTTP headers) for routing, while forwarding the bulk payload unchanged. Based on this insight, we present Libra, an OS-level selective-copy framework that copies only metadata to the user space and retains the bulk payload in the kernel for forwarding, reducing data movement without breaking compatibility. Libra uses eBPF to identify protocol-specific metadata boundaries and coordinate selective copy and payload reuse across receive and transmit paths, all without modifying the socket API. Implemented in Linux and evaluated with unmodified Nginx and HAProxy, Libra improves plaintext throughput by up to 4.2x and reduces P99 tail latency by over 90%. With hardware-offloaded kTLS, it boosts encrypted throughput by 2.0x and cuts tail latency by 65%.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 3 minor

Summary. The paper argues that fully eliminating kernel-user copies for L7 proxies is impossible under standard POSIX socket semantics without breaking compatibility for unmodified applications. It introduces Libra, an eBPF-based selective-copy framework that copies only small protocol metadata (e.g., headers) to userspace while retaining bulk payload in the kernel for forwarding. The system is implemented in Linux and evaluated with unmodified Nginx and HAProxy, reporting up to 4.2× plaintext throughput gains, >90% P99 latency reduction, and 2.0× encrypted throughput improvement with hardware-offloaded kTLS.

Significance. If the impossibility argument can be made rigorous and the eBPF boundary detection shown reliable for common L7 protocols, Libra would represent a practical advance for high-performance proxies by reducing data movement without new APIs or application changes. The performance claims, if supported by reproducible methodology, would be significant for cloud networking systems. The work credits standard eBPF mechanisms and focuses on compatibility, which strengthens its potential impact if the core assumptions hold.

major comments (3)
  1. [Introduction, §2] Introduction and §2 (Impossibility Argument): The central claim that no existing mechanism (sendfile, splice, io_uring, etc.) can achieve zero-copy forwarding while allowing unmodified proxies to perform recv() on metadata and send() on the same data is asserted informally without a proof sketch, exhaustive case analysis, or counter-example enumeration. This load-bearing motivation for selective copy requires concrete substantiation to support the 'fundamentally impossible' conclusion.
  2. [§4.2] §4.2 (eBPF Boundary Detection): The correctness of eBPF programs for locating protocol-specific metadata boundaries at RX and TX paths is not demonstrated for edge cases including variable-length headers, chunked transfer encoding, pipelined requests, or delimiter collisions in payloads. A misidentified boundary would either violate preserved socket semantics or reuse incorrect data, directly undermining the selective-copy guarantee.
  3. [§5] §5 (Evaluation): Performance results (4.2× throughput, 90% latency reduction) are presented without a complete methodology section detailing workloads (request sizes, concurrency, protocol mix), exact baseline configurations, number of runs, or error bars. This absence weakens assessment of the reported gains and their reproducibility.
minor comments (3)
  1. [Abstract] Abstract: The phrase 'up to 4.2x' should be qualified with the specific configuration (e.g., request size, proxy type) to avoid overgeneralization.
  2. [Figures] Figure 2 or 3 (Architecture diagrams): Add annotations clarifying the exact eBPF hook points and data flow between kernel payload reuse and userspace metadata.
  3. [§6] §6 (Related Work): Include explicit comparison to prior eBPF-based protocol parsers (e.g., for HTTP header extraction) to better position the novelty of the selective-copy coordination.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. We address each major comment point-by-point below, outlining specific revisions we will make to improve rigor, clarity, and reproducibility while preserving the core contributions of Libra.

Point-by-point responses
  1. Referee: [Introduction, §2] Introduction and §2 (Impossibility Argument): The central claim that no existing mechanism (sendfile, splice, io_uring, etc.) can achieve zero-copy forwarding while allowing unmodified proxies to perform recv() on metadata and send() on the same data is asserted informally without a proof sketch, exhaustive case analysis, or counter-example enumeration. This load-bearing motivation for selective copy requires concrete substantiation to support the 'fundamentally impossible' conclusion.

    Authors: We agree that the impossibility argument is presented at a high level and would benefit from greater substantiation. In the revised manuscript, we will expand §2 with a structured case analysis of existing mechanisms. We will enumerate the POSIX semantics that must be preserved (recv() delivering inspectable metadata to user space, followed by send() on the identical data without recopying) and explain why sendfile and splice cannot be used by unmodified proxies (they bypass user space or restrict operations to kernel-to-kernel paths), while io_uring still requires data movement to user space for L7 inspection or relies on non-POSIX interfaces. A concise proof sketch will formalize the incompatibility: any zero-copy path either denies the proxy access to metadata or violates the requirement that the same buffer be forwarded unchanged. This will strengthen the motivation for selective copying without a full formal proof, which lies outside the paper's scope. revision: yes

  2. Referee: [§4.2] §4.2 (eBPF Boundary Detection): The correctness of eBPF programs for locating protocol-specific metadata boundaries at RX and TX paths is not demonstrated for edge cases including variable-length headers, chunked transfer encoding, pipelined requests, or delimiter collisions in payloads. A misidentified boundary would either violate preserved socket semantics or reuse incorrect data, directly undermining the selective-copy guarantee.

    Authors: We acknowledge that §4.2 would be strengthened by explicit handling of edge cases. In the revision, we will expand this section with a detailed description of the eBPF state machines: variable-length headers are parsed by accumulating data until the CRLFCRLF delimiter while tracking header size limits; chunked encoding is detected via the Transfer-Encoding header and chunk-size parsing; pipelined requests are processed sequentially within the socket buffer using per-connection state; and delimiter collisions are mitigated by protocol context (e.g., validating HTTP methods before treating a byte sequence as a boundary). We will add synthetic test results covering these scenarios for HTTP/1.1 and HTTPS to demonstrate reliability under common L7 workloads, while noting assumptions and limitations for non-standard payloads. revision: yes

  3. Referee: [§5] §5 (Evaluation): Performance results (4.2× throughput, 90% latency reduction) are presented without a complete methodology section detailing workloads (request sizes, concurrency, protocol mix), exact baseline configurations, number of runs, or error bars. This absence weakens assessment of the reported gains and their reproducibility.

    Authors: We agree that the evaluation lacks sufficient methodological detail. We will add a dedicated 'Experimental Methodology' subsection to §5 specifying: workloads (HTTP GET/POST requests with sizes 1 KB–1 MB, concurrency levels of 100–1000 connections, and plaintext/TLS protocol mixes); exact baseline configurations (vanilla Linux kernel sockets, sendfile, splice, and io_uring with unmodified Nginx 1.25 and HAProxy 2.8); number of runs (minimum 10 repetitions per configuration with warm-up periods); and statistical reporting (means with standard deviation error bars). These additions will directly support reproducibility of the throughput and latency results. revision: yes
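The statistical reporting promised in the last response (means with standard-deviation error bars over repeated runs, plus a P99 latency figure) could be computed along the following lines. This is a hedged sketch rather than the authors' tooling: `summarize_runs` and `p99` are hypothetical helpers, the nearest-rank percentile definition is one common choice among several, and the numbers in the demo are placeholders, not measurements from the paper.

```python
import statistics


def summarize_runs(throughputs):
    """Return (mean, sample standard deviation) across repeated runs."""
    if len(throughputs) < 2:
        raise ValueError("need at least two runs for a standard deviation")
    return statistics.mean(throughputs), statistics.stdev(throughputs)


def p99(latencies):
    """Nearest-rank 99th percentile: the value at rank ceil(0.99 * n)."""
    ordered = sorted(latencies)
    if not ordered:
        raise ValueError("no latency samples")
    n = len(ordered)
    rank = (99 * n + 99) // 100  # integer form of ceil(99 * n / 100), 1-based
    return ordered[rank - 1]


if __name__ == "__main__":
    # Placeholder values for illustration only; NOT results from the paper.
    runs = [1.0, 2.0, 3.0]
    mean, sd = summarize_runs(runs)
    print(f"throughput: {mean:.2f} +/- {sd:.2f} (n={len(runs)})")
    print("p99 latency:", p99(range(1, 101)))
```

Reporting the run count next to the mean and deviation lets a reader judge how much of a claimed speedup could be run-to-run noise.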

Circularity Check

0 steps flagged

No circularity detected; claims rest on informal impossibility argument and external workload observations rather than self-referential definitions or fitted inputs.

Full rationale

The paper's central claim is an informal impossibility result under standard POSIX socket semantics, followed by an empirical observation about L7 proxy behavior (inspecting only small metadata) and an implementation using eBPF for boundary detection. No equations, parameters, or derivations are present that reduce to the paper's own inputs by construction. No self-citations are invoked as load-bearing uniqueness theorems or ansatzes. The design is self-contained against external benchmarks (Linux kernel mechanisms, unmodified Nginx/HAProxy) with no renaming of known results or smuggling of prior author work. This is the expected outcome for a systems implementation paper without mathematical modeling or parameter fitting.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central design rests on one domain assumption about proxy behavior and introduces the Libra framework itself; no free parameters or additional invented physical entities are mentioned.

axioms (1)
  • domain assumption: In common L7 workloads, proxies inspect only small metadata (e.g., HTTP headers) for routing, while forwarding the bulk payload unchanged.
    Explicitly stated as the practical insight that enables the selective-copy design.
invented entities (1)
  • Libra selective-copy framework (no independent evidence)
    purpose: To copy only metadata to user space and retain bulk payload in the kernel for forwarding using eBPF coordination
    New system component introduced to realize the selective-copy behavior while preserving socket semantics.

pith-pipeline@v0.9.0 · 5526 in / 1311 out tokens · 93582 ms · 2026-05-07T06:14:43.938216+00:00 · methodology

Reference graph

Works this paper leans on

61 extracted references · 61 canonical work pages

  1. [1] aes-ni 2012. Advanced Encryption Standard Instructions (AES-NI). https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-encryption-standard-instructions-aes-ni.html

  2. [2] aws-alb 2026. What is an Application Load Balancer? https://docs.aws.amazon.com/elasticloadbalancing/latest/application/introduction.html

  3. [3] Adam Belay, George Prekas, Ana Klimovic, Samuel Grossman, Christos Kozyrakis, and Edouard Bugnion. 2014. IX: a protected dataplane operating system for high throughput and low latency. In Proceedings of the 11th USENIX Conference on Operating Systems Design and Implementation (Broomfield, CO) (OSDI’14). USENIX Association, USA, 49–65

  4. [4] M. Belshe, R. Peon, and M. Thomson. 2015. Hypertext Transfer Protocol Version 2 (HTTP/2). RFC 7540, https://www.rfc-editor.org/rfc/rfc7540. doi:10.17487/RFC7540

  5. [5] Tim Berners-Lee, Roy T. Fielding, and Henrik Frystyk Nielsen. 1996. Hypertext Transfer Protocol – HTTP/1.0. RFC 1945, https://www.rfc-editor.org/rfc/rfc1945. doi:10.17487/RFC1945

  6. [6] Tom Callahan, Mark Allman, and Vern Paxson. 2010. A longitudinal view of HTTP traffic. In Proceedings of the 11th International Conference on Passive and Active Measurement (Zurich, Switzerland) (PAM’10). Springer-Verlag, Berlin, Heidelberg, 222–231

  7. [7] Hsiao-keng Jerry Chu. 1996. Zero-copy TCP in Solaris. In Proceedings of the 1996 Annual Conference on USENIX Annual Technical Conference (San Diego, CA) (ATEC ’96). USENIX Association, USA, 21

  8. [8] connectx5 2016. NVIDIA ConnectX-5 Ethernet Adapter. https://www.nvidia.com/en-in/networking/ethernet/connectx-5/

  9. [9] connectx6-dx 2019. NVIDIA ConnectX-6 Dx Ethernet Adapter. https://www.nvidia.com/en-in/networking/ethernet/connectx-6-dx/

  10. [10] Jonathan Corbet. 2017. Zero-copy networking. https://lwn.net/Articles/726917/

  11. [11] Jonathan Corbet. 2018. Zero-copy TCP receive. https://lwn.net/Articles/752188/

  12. [12] dpdk 2026. DPDK. https://www.dpdk.org

  13. [13] Yanlin Du and Ruslan Nikolaev. 2025. Joyride: Rethinking Linux’s network stack design for better performance, security, and reliability. In Proceedings of the 3rd Workshop on Kernel Isolation, Safety and Verification (Seoul, Republic of Korea) (KISV ’25). Association for Computing Machinery, New York, NY, USA, 25–31. doi:10.1145/3765889.3767045

  14. [14] Eric Dumazet and Coco Li. 2021. BIG TCP. Presentation at Netdev Conference 0x15. https://netdevconf.info/0x15/slides/35/BIG%20TCP.pdf

  15. [15] ebpf 2026. eBPF. https://ebpf.io/

  16. [16] envoy 2026. Envoy Proxy. https://envoyproxy.io

  17. [17] R. Fielding, M. Nottingham, and J. Reschke. 2022. HTTP/1.1. RFC 9112, https://www.rfc-editor.org/rfc/rfc9112. doi:10.17487/RFC9112

  18. [18] fstack 2026. F-Stack. https://www.f-stack.org/

  19. [19] Will Glozer. 2012. wrk: a modern HTTP benchmarking tool. https://github.com/wg/wrk

  20. [20] Abraham Gonzalez, Aasheesh Kolli, Samira Khan, Sihang Liu, Vidushi Dadu, Sagar Karandikar, Jichuan Chang, Krste Asanovic, and Parthasarathy Ranganathan. 2023. Profiling Hyperscale Big Data Processing. In Proceedings of the 50th Annual International Symposium on Computer Architecture (Orlando, FL, USA) (ISCA ’23). Association for Computing Machinery, New Yo...

  21. [21] Brendan Gregg. 2011. FlameGraph. https://github.com/brendangregg/FlameGraph

  22. [22] haproxy 2026. HAProxy. https://www.haproxy.org

  23. [23] Yutaro Hayakawa, Michio Honda, Douglas Santry, and Lars Eggert

  24. [24] Prism: Proxies without the Pain. In 18th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2021, April 12-14, 2021, James Mickens and Renata Teixeira (Eds.). USENIX Association, 535–549. https://www.usenix.org/conference/nsdi21/presentation/hayakawa

  25. [25] Jingkai He, Yunpeng Dong, Dong Du, Mo Zou, Zhitai Yu, Yuxin Ren, Ning Jia, Yubin Xia, and Haibo Chen. 2025. How to Copy Memory? Coordinated Asynchronous Copy as a First-Class OS Service. In Proceedings of the ACM SIGOPS 31st Symposium on Operating Systems Principles (Lotte Hotel World, Seoul, Republic of Korea) (SOSP ’25). Association for Computing Machine...

  26. [26] header-data-split 2021. Introduction to Header-Data Split. https://learn.microsoft.com/en-us/windows-hardware/drivers/network/header-data-split

  27. [27] http-archive 2026. Page Weight Report. https://httparchive.org/reports/page-weight

  28. [28] intel-xeon-4110 2017. Intel Xeon Silver 4110 Processor. https://www.intel.com/content/www/us/en/products/sku/123547/intel-xeon-silver-4110-processor-11m-cache-2-10-ghz/specifications.html

  29. [29] isoc 2024. ISO/IEC 9899:2024: Information technology–Programming languages–C. International Organization for Standardization, Geneva, Switzerland

  30. [30] J. Iyengar and M. Thomson. 2021. QUIC: A UDP-Based Multiplexed and Secure Transport. RFC 9000, https://www.rfc-editor.org/rfc/rfc9000. doi:10.17487/RFC9000

  31. [31] Eun Young Jeong, Shinae Woo, Muhammad Jamshed, Haewon Jeong, Sunghwan Ihm, Dongsu Han, and KyoungSoo Park. 2014. mTCP: a highly scalable user-level TCP stack for multicore systems. In Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation (Seattle, WA) (NSDI’14). USENIX Association, USA, 489–502.

  32. [32] Svilen Kanev, Juan Pablo Darago, Kim Hazelwood, Parthasarathy Ranganathan, Tipp Moseley, Gu-Yeon Wei, and David Brooks. 2015. Profiling a Warehouse-Scale Computer. In Proceedings of the 42nd Annual International Symposium on Computer Architecture (Portland, Oregon) (ISCA ’15). Association for Computing Machinery, New York, NY, USA, 158–169. doi:10.1145/...

  33. [33] Svilen Kanev, Sam Likun Xi, Gu-Yeon Wei, and David Brooks. 2017. Mallacc: Accelerating Memory Allocation. In Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems (Xi’an, China) (ASPLOS ’17). Association for Computing Machinery, New York, NY, USA, 33–45. doi:10.1145/3037697.3037736

  34. [34] kaslr 2026. The Linux Kernel Documentation: Kernel Self-Protection Project. https://www.kernel.org/doc/html/latest/security/self-protection.html#kernel-address-space-layout-randomization-kaslr

  35. [35] Yousef A. Khalidi and Moti N. Thadani. 1995. An Efficient Zero-Copy I/O Framework for UNIX. Technical Report. USA

  36. [36] Donald E. Knuth, James H. Morris, Jr., and Vaughan R. Pratt. 1977. Fast Pattern Matching in Strings. SIAM J. Comput. 6, 2 (1977), 323–350. doi:10.1137/0206024

  37. [37] B. Krasic, M. Bishop, and A. Frindell. 2022. QPACK: Header Compression for HTTP/3. RFC 9204, https://www.rfc-editor.org/rfc/rfc9204. doi:10.17487/RFC9204

  38. [38] ktls 2026. Kernel TLS. https://www.kernel.org/doc/html/latest/networking/tls.html

  39. [39] ld.so 2026. ld.so(8) - Linux manual page. https://man7.org/linux/man-pages/man8/ld.so.8.html

  40. [40] Bojie Li, Tianyi Cui, Zibo Wang, Wei Bai, and Lintao Zhang. 2019. Socksdirect: datacenter sockets can be fast and compatible. In Proceedings of the ACM Special Interest Group on Data Communication (Beijing, China) (SIGCOMM ’19). Association for Computing Machinery, New York, NY, USA, 90–103. doi:10.1145/3341302.3342071

  41. [41] David A. Maltz and Pravin Bhagwat. 2000. TCP splice application layer proxy performance. J. High Speed Netw. 8, 3 (Jan. 2000), 225–240

  42. [42] nginx 2026. NGINX. https://nginx.org

  43. [43] openssl 2026. OpenSSL: Cryptography and SSL/TLS Toolkit. https://www.openssl.org/

  44. [44] Tian Pan, Enge Song, Yueshang Zuo, Shaokai Zhang, Yang Song, Jiangu Zhao, Wengang Hou, Jianyuan Lu, Xiaoqing Sun, Shize Zhang, Ye Yang, Jiao Zhang, Tao Huang, Biao Lyu, Xing Li, Rong Wen, Zhigang Zong, and Shunmin Zhu. 2025. Hermes: Enhancing Layer-7 Cloud Load Balancers with Userspace-Directed I/O Event Notification. In Proceedings of the ACM SIGCOMM 20...

  45. [45] Parveen Patel, Deepak Bansal, Lihua Yuan, Ashwin Murthy, Albert Greenberg, David A. Maltz, Randy Kern, Hemant Kumar, Marios Zikos, Hongyu Wu, Changhoon Kim, and Naveen Karri. 2013. Ananta: cloud scale load balancing. In Proceedings of the ACM SIGCOMM 2013 Conference on SIGCOMM (Hong Kong, China) (SIGCOMM ’13). Association for Computing Machinery, New York,...

  46. [46] Dinglan Peng, Congyu Liu, Tapti Palit, Anjo Vahldiek-Oberwagner, Mona Vij, and Pedro Fonseca. 2025. Pegasus: Transparent and Unified Kernel-Bypass Networking for Fast Local and Remote Communication. In Proceedings of the Twentieth European Conference on Computer Systems (Rotterdam, Netherlands) (EuroSys ’25). Association for Computing Machinery, New Yo...

  47. [47] perf 2010. The new linux ’perf’ tools. http://oldvger.kernel.org/~acme/perf/lk2010-perf-paper.pdf

  48. [48] Simon Peter, Jialin Li, Irene Zhang, Dan R. K. Ports, Doug Woos, Arvind Krishnamurthy, Thomas Anderson, and Timothy Roscoe. 2015. Arrakis: The Operating System Is the Control Plane. ACM Trans. Comput. Syst. 33, 4, Article 11 (Nov. 2015), 30 pages. doi:10.1145/2812806

  49. [49] posix 2018. IEEE Standard for Information Technology–Portable Operating System Interface (POSIX) Base Specifications, Issue 7. IEEE Std 1003.1-2017, https://ieeexplore.ieee.org/document/8277153. doi:10.1109/IEEESTD.2018.8277153

  50. [50] rdma-consortium 2009. Architectural Specifications for RDMA over TCP/IP. https://www.rdmaconsortium.org

  51. [51] seastar 2026. Seastar. https://seastar.io

  52. [52] sendfile 2026. sendfile(2) - Linux manual page. https://man7.org/linux/man-pages/man2/sendfile.2.html

  53. [53] Xiaoyi Shi, Lin He, Jiasheng Zhou, Yifan Yang, and Ying Liu. 2025. Miresga: Accelerating Layer-7 Load Balancing with Programmable Switches. In Proceedings of the ACM on Web Conference 2025 (Sydney NSW, Australia) (WWW ’25). Association for Computing Machinery, New York, NY, USA, 2424–2434. doi:10.1145/3696410.3714809

  54. [54] sr-iov 2011. PCI-SIG SR-IOV Primer: An Introduction to SR-IOV Technology. https://www.intel.com/content/www/us/en/content-details/321211/pci-sig-sr-iov-primer-an-introduction-to-sr-iov-technology.html

  55. [55] Timothy Stamler, Deukyeon Hwang, Amanda Raybuck, Wei Zhang, and Simon Peter. 2022. zIO: Accelerating IO-Intensive Applications with Transparent Zero-Copy IO. In 16th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2022, Carlsbad, CA, USA, July 11-13, 2022, Marcos K. Aguilera and Hakim Weatherspoon (Eds.). USENIX Association, 431–445...

  56. [56] Martin Thomson and Chris Benfield. 2022. HTTP/3. RFC 9114, https://www.rfc-editor.org/rfc/rfc9114. doi:10.17487/RFC9114

  57. [57] tinyproxy 2026. Tinyproxy: lightweight http(s) proxy daemon. https://tinyproxy.github.io/

  58. [58] traefik 2026. Traefik Proxy. https://traefik.io

  59. [59] Marcos A. M. Vieira, Matheus S. Castanho, Racyus D. G. Pacífico, Elerson R. S. Santos, Eduardo P. M. Câmara Júnior, and Luiz F. M. Vieira. 2020. Fast Packet Processing with eBPF and XDP: Concepts, Code, Challenges, and Applications. ACM Comput. Surv. 53, 1, Article 16 (Feb. 2020), 36 pages. doi:10.1145/3371038

  60. [60] Ziqi Wei, Zhiqiang Wang, Qing Li, Yuan Yang, Cheng Luo, Fuyu Wang, Yong Jiang, Sijie Yang, and Zhenhui Yuan. 2024. QDSR: accelerating layer-7 load balancing by direct server return with QUIC. In Proceedings of the 2024 USENIX Conference on Usenix Annual Technical Conference (Santa Clara, CA, USA) (USENIX ATC’24). USENIX Association, USA, Article 44, 16 pages

  61. [61] Wensong Zhang. 2000. Linux Virtual Server for Scalable Network Services. Ottawa Linux Symposium (2000).

Appendix excerpt (A Implementation Details, A.1 Receive Socket Memory Management): Payload anchoring retains uncopied data in the kernel, increasing socket memory usage and causing the OS to automatically shrink the TCP receive window. Since the theoretical per-socke...