pith. sign in

arxiv: 2604.21881 · v1 · submitted 2026-04-23 · 💻 cs.NI · cs.AR

SPAC: Automating FPGA-based Network Switches with Protocol Adaptive Customization

Pith reviewed 2026-05-08 13:55 UTC · model grok-4.3

classification 💻 cs.NI cs.AR
keywords FPGA network switchesprotocol adaptive customizationdesign space explorationHLS componentsmulti-fidelity simulationtrace-aware optimizationcustom network hardwarePareto-optimal designs
0
0 comments X

The pith

SPAC automates FPGA network switches co-optimized for specific protocols and workloads, cutting LUT usage by 55% and latency by up to 38% versus fixed designs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces SPAC to automate the creation of custom FPGA-based network switches that match particular protocols and application traffic patterns. It supplies a domain-specific language for joint protocol and architecture design, a library of reusable HLS switch components, and a trace-aware design space exploration engine backed by multi-fidelity simulations. A sympathetic reader would care because network demands now range from minimal-delay sensor and trading traffic to sustained high-throughput bulk transfers, and generic switches waste resources on either end of that spectrum. By letting the tool explore and select efficient designs before any hardware is built, SPAC aims to make specialized switches practical without lengthy manual engineering. The reported experiments show these tailored designs use far less logic and memory while delivering measurable latency gains across real workloads.

Core claim

SPAC introduces a unified workflow with a domain-specific language for protocol-architecture co-design, a library of modular HLS-based adaptive switch components, and a trace-aware Design Space Exploration engine. By providing a multi-fidelity simulation stack, SPAC enables rapid identification of Pareto-optimal designs prior to deployment. Experimental results show that by tailoring the micro-architecture and protocol to the specific workload, SPAC-generated designs reduce LUT and BRAM usage by 55% and 53%, respectively. Compared to fixed-architecture counterparts, SPAC delivers latency reductions ranging from 7.8% to 38.4% across various tasks while maintaining adequate resource and packet

What carries the argument

The trace-aware Design Space Exploration engine together with the multi-fidelity simulation stack, which co-optimizes protocol rules and switch micro-architecture for a given set of traffic traces.

If this is right

  • Latency-critical services such as HFT and sensor networks can receive switches with minimal logic delay instead of oversized generic hardware.
  • Hyperscale datacenter fabrics can obtain sustained line-rate throughput tailored to collective and training traffic patterns.
  • FPGA devices can host more complex switch logic or fit the same function into smaller, cheaper chips.
  • Development cycles for custom network hardware shorten because the DSE engine replaces much of the manual tuning.
  • Packet drop rates remain comparable to fixed designs while resource and latency metrics improve.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the simulation-to-hardware gap stays small on new workloads, the same co-design flow could be reused for custom NICs or routers.
  • Expanding the library of HLS components would let the system handle a wider range of emerging protocols without rewriting the DSE engine.
  • The Pareto front produced by the tool gives designers explicit trade-off curves they can inspect before committing to silicon.
  • Similar automation of protocol-plus-architecture search might apply to other reconfigurable platforms beyond FPGAs.

Load-bearing premise

The multi-fidelity simulation stack and trace-aware DSE engine produce designs whose real FPGA performance and resource usage closely match the simulated Pareto front for the tested workloads.

What would settle it

Synthesize and place one SPAC-generated design on a physical FPGA, then measure its actual LUT count, BRAM usage, end-to-end latency, and packet drop rate and compare those numbers to the values predicted by the simulation for the same workload.

Figures

Figures reproduced from arXiv: 2604.21881 by Ajay Brahmakshatriya, Alexander Charlton, Ce Guo, Guoyu Li, Hongxiang Fan, Lucas H L Ng, Philippos Papaphilippou, Qianzhou Wang, Saman Amarasinghe, Wayne Luk, Will Punter, Yang Cao.

Figure 1
Figure 1. Figure 1: Hardware Sensitivity (left): different scheduler architectures favor view at source ↗
Figure 2
Figure 2. Figure 2: SPAC System Overview. tecture (which often relies on TCAMs or runtime configuration registers) in favor of a template-driven synthesis approach. The core of our parsing subsystem is a generic HLS template, which remains completely agnostic to specific protocol details until compilation. During the synthesis phase, the SPAC com￾piler lowers the high-level user-defined protocol specification into a C++ heade… view at source ↗
Figure 3
Figure 3. Figure 3: Forward Table Architectures Dst Port 1 W1 Dst Port 2 W1 Dst Port 3 Dst Port N ... Src Port 1 Data VOQs W2 Dst Port 1 Dst Port 2 W3 Dst Port 3 Dst Port N ... Src Port 2 Data VOQs Ptr2 Dst Port 1 Ptr1 Dst Port 2 Ptr3 Ptr1 Dst Port 3 Dst Port N ... Pointer-Based Free Space Queue Ptr3 Ptr2 Ptr1 Stored Data Word Bitmap Data Word 1 0b0110 Data Word 2 0b0001 Data Word 3 0b0100 Data Buffer (a) N*N Virtual Output Q… view at source ↗
Figure 4
Figure 4. Figure 4: N*N Virtual Output Queues (VOQs) versus Shared VOQs. view at source ↗
Figure 5
Figure 5. Figure 5: RR / EDRRM / iSLIP Scheduler. packets are copied and stored in all queues associated with the source port. The queues are fully partitioned to support reading and writing in parallel. One drawback of this implementation is that it suffers from limited FIFO depth and high memory/time costs due to data duplication for broadcasting. As a compromise, we also provide Shared VOQs [8], which use a central data bu… view at source ↗
Figure 6
Figure 6. Figure 6: Variance Analysis of Resource and Performance Estimates view at source ↗
Figure 7
Figure 7. Figure 7: DSE Algorithm Search Space Visualization. view at source ↗
Figure 8
Figure 8. Figure 8: Average P2P Performance Under Different #Port and Architectures. view at source ↗
read the original abstract

With network requirements diverging across emerging applications, latency-critical services demand minimal logic delay, while hyperscale training and collectives require sustained line-rate throughput for synchronized bulk transfers. This divergence creates an urgent need for custom network switches tailored to specialized protocols and application-specific traffic patterns. This paper presents SPAC (Switch and Protocol Adaptive Customization), a novel approach that automates the generation of FPGA-based network switches co-optimized for custom protocols and application-specific traffic patterns. SPAC introduces a unified workflow with a domain-specific language (DSL) for protocol-architecture co-design, a library of modular HLS-based adaptive switch components, and a trace-aware Design Space Exploration (DSE) engine. By providing a multi-fidelity simulation stack, SPAC enables rapid identification of Pareto-optimal designs prior to deployment. We demonstrate the efficacy of the domain-specific adaptation of SPAC across a spectrum of real-world scenarios, spanning from latency-sensitive sensor and HFT networks to hyperscale datacenter fabrics. Experimental results show that by tailoring the micro-architecture and protocol to the specific workload, SPAC-generated designs reduce LUT and BRAM usage by 55% and 53%, respectively. Compared to fixed-architecture counterparts, SPAC delivers latency reductions ranging from 7.8% to 38.4% across various tasks while maintaining adequate resource consumption and packet drop rate.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper presents SPAC, a framework that automates the generation of FPGA-based network switches via a DSL for protocol-architecture co-design, a library of modular HLS-based components, and a trace-aware DSE engine supported by a multi-fidelity simulation stack. It claims that workload-specific tailoring yields 55% LUT and 53% BRAM reductions relative to fixed architectures, plus latency improvements of 7.8–38.4% across sensor, HFT, and hyperscale workloads while preserving acceptable resource use and packet drop rates.

Significance. If the results hold after proper validation, the work would be significant for enabling rapid, application-driven customization of network hardware in latency-critical and high-throughput domains. The trace-aware DSE and multi-fidelity simulation approach represent a practical strength for exploring design spaces before hardware deployment.

major comments (2)
  1. [Abstract] Abstract: The central claims of 55% LUT / 53% BRAM reductions and 7.8–38.4% latency gains are stated without any description of the fixed-architecture baselines, exact workload traces, number of runs, or error bars. This information is load-bearing for evaluating whether the reported improvements are statistically meaningful or reproducible.
  2. [Results / Evaluation section] Results / Evaluation section: No quantitative correlation is provided between the multi-fidelity simulation metrics (LUT, BRAM, latency) used for Pareto selection and the actual post-synthesis FPGA measurements for the final designs in the sensor, HFT, and hyperscale cases. Effects such as HLS routing congestion or custom-protocol BRAM contention could invalidate the simulated gains.
minor comments (1)
  1. [Abstract] Abstract: The latency reduction range is given without mapping specific percentages to individual tasks or workloads, which reduces interpretability of the cross-scenario results.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful review and constructive suggestions. We address the major comments point by point below, proposing revisions to enhance clarity and validation.

read point-by-point responses
  1. Referee: [Abstract] Abstract: The central claims of 55% LUT / 53% BRAM reductions and 7.8–38.4% latency gains are stated without any description of the fixed-architecture baselines, exact workload traces, number of runs, or error bars. This information is load-bearing for evaluating whether the reported improvements are statistically meaningful or reproducible.

    Authors: We agree that the abstract would benefit from more context on the baselines and workloads to support the claims. The fixed-architecture baselines refer to standard non-adaptive switch designs implemented in HLS without protocol-architecture co-design. The workloads encompass real-world traces from sensor networks, HFT applications, and hyperscale datacenters, as described in the evaluation. While space constraints in the abstract limit full details, we will revise it to briefly reference the nature of the baselines and workloads, and clarify that comprehensive statistics, including results from multiple runs and variability measures, are available in the results section. This revision will help readers assess the statistical significance and reproducibility of the improvements. revision: partial

  2. Referee: [Results / Evaluation section] Results / Evaluation section: No quantitative correlation is provided between the multi-fidelity simulation metrics (LUT, BRAM, latency) used for Pareto selection and the actual post-synthesis FPGA measurements for the final designs in the sensor, HFT, and hyperscale cases. Effects such as HLS routing congestion or custom-protocol BRAM contention could invalidate the simulated gains.

    Authors: We appreciate this observation regarding the validation of our multi-fidelity simulation. The manuscript presents post-synthesis FPGA measurements for the final SPAC designs in the results section, demonstrating that the selected architectures achieve the reported resource and latency benefits on actual hardware. However, we acknowledge the value of explicitly quantifying the correlation between the simulation predictions used in DSE and the post-synthesis results to rule out effects like routing congestion. In the revised manuscript, we will add a quantitative analysis, including tables or figures showing the percentage differences in LUT, BRAM, and latency between multi-fidelity simulations and actual post-synthesis reports for the Pareto-optimal designs across all evaluated workloads. This will provide stronger evidence for the reliability of our approach. revision: yes

Circularity Check

0 steps flagged

No circularity: claims rest on empirical FPGA measurements, not self-referential derivations

full rationale

The paper reports resource reductions (55% LUT, 53% BRAM) and latency gains (7.8–38.4%) as outcomes of deploying SPAC-generated designs on real hardware and comparing them to fixed-architecture baselines. No equations, fitted parameters, or predictions are presented that reduce by construction to the inputs of the DSE engine or multi-fidelity simulator. The workflow description (DSL, HLS library, trace-aware DSE) is a design methodology whose outputs are externally validated by synthesis and execution, satisfying the self-contained benchmark criterion. No self-citation chains or ansatz smuggling appear in the load-bearing claims.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Review performed on abstract only; no explicit free parameters, axioms, or invented entities are stated in the provided text.

pith-pipeline@v0.9.0 · 5582 in / 1202 out tokens · 33674 ms · 2026-05-08T13:55:42.860522+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

52 extracted references · 18 canonical work pages

  1. [1]

    HPCC: high precision congestion control,

    Y . Li, R. Miao, H. H. Liu, Y . Zhuang, F. Feng, L. Tang, Z. Cao, M. Zhang, F. Kelly, M. Alizadeh, and M. Yu, “HPCC: high precision congestion control,” inProceedings of the ACM Special Interest Group on Data Communication, SIGCOMM 2019, Beijing, China, August 19-23, 2019, J. Wu and W. Hall, Eds. ACM, 2019, pp. 44–58. [Online]. Available: https://doi.org/...

  2. [2]

    The islip scheduling algorithm for input-queued switches,

    N. McKeown, “The islip scheduling algorithm for input-queued switches,”IEEE/ACM Trans. Netw., vol. 7, no. 2, pp. 188–201, 1999. [Online]. Available: https://doi.org/10.1109/90.769767

  3. [3]

    The dual round robin matching switch with exhaustive service,

    Y . Li, S. Panwar, and H. J. Chao, “The dual round robin matching switch with exhaustive service,” inWorkshop on High Performance Switching and Routing, Merging Optical and IP Technologie. IEEE, 2002, pp. 58–63

  4. [4]

    P4: programming protocol-independent packet processors,

    P. Bosshart, D. Daly, G. Gibb, M. Izzard, N. McKeown, J. Rexford, C. Schlesinger, D. Talayco, A. Vahdat, G. Varghese, and D. Walker, “P4: programming protocol-independent packet processors,”Comput. Commun. Rev., vol. 44, no. 3, pp. 87–95, 2014. [Online]. Available: https://doi.org/10.1145/2656877.2656890

  5. [5]

    Available: https://www.nsnam.org/

    nsnam, “ns-3.” [Online]. Available: https://www.nsnam.org/

  6. [6]

    Saturating the transceiver bandwidth: switch fabric design on FPGAs,

    Z. Dai and J. Zhu, “Saturating the transceiver bandwidth: switch fabric design on FPGAs,” inProceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays, ser. FPGA ’12. New York, NY , USA: Association for Computing Machinery, 2012, p. 67–76. [Online]. Available: https://doi.org/10.1145/2145694.2145706

  7. [7]

    Hipernetch: High- performance FPGA network switch,

    P. Papaphilippou, J. Meng, N. Gebara, and W. Luk, “Hipernetch: High- performance FPGA network switch,”ACM Transactions on Reconfig- urable Technology and Systems (TRETS), vol. 15, no. 1, pp. 1–31, 2021

  8. [8]

    Investigating the feasibility of FPGA-based network switches,

    J. Meng, N. Gebara, H.-C. Ng, P. Costa, and W. Luk, “Investigating the feasibility of FPGA-based network switches,” in2019 IEEE 30th International Conference on Application-specific Systems, Architectures and Processors (ASAP), vol. 2160. IEEE, 2019, pp. 218–226

  9. [9]

    Netfpga sume: Toward 100 gbps as research commodity,

    N. Zilberman, Y . Audzevich, G. A. Covington, and A. W. Moore, “Netfpga sume: Toward 100 gbps as research commodity,”IEEE Micro, vol. 34, no. 5, pp. 32–41, 2014

  10. [10]

    CusComNet: A customisable network for reconfigurable heterogeneous clusters,

    S. Denholm, K. H. Tsoi, P. Pietzuch, and W. Luk, “CusComNet: A customisable network for reconfigurable heterogeneous clusters,” in ASAP 2011 - 22nd IEEE International Conference on Application- specific Systems, Architectures and Processors, 2011, pp. 9–16

  11. [11]

    Experimental survey of FPGA-based monolithic switches and a novel queue balancer,

    P. Papaphilippou, K. Sano, B. A. Adhi, and W. Luk, “Experimental survey of FPGA-based monolithic switches and a novel queue balancer,” IEEE Transactions on Parallel and Distributed Systems, vol. 34, no. 5, pp. 1621–1634, 2023

  12. [12]

    Opennic,

    Xilinx, “Opennic,” 2021, accessed 2026-03-30. [Online]. Available: https://github.com/Xilinx/open-nic

  13. [13]

    Corundum: An open-source 100-gbps nic,

    A. Forencich, A. C. Snoeren, G. Porter, and G. Papen, “Corundum: An open-source 100-gbps nic,” in28th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, FCCM 2020, Fayetteville, AR, USA, May 3-6, 2020. IEEE, 2020, pp. 38–46. [Online]. Available: https://doi.org/10.1109/FCCM48280.2020. 00015

  14. [14]

    ebpflow: A hardware/software platform to seamlessly offload network functions leveraging ebpf,

    R. D. G. Pac ´ıfico, L. F. D. S. Duarte, L. F. M. Vieira, B. Raghavan, J. A. M. Nacif, and M. A. M. Vieira, “ebpflow: A hardware/software platform to seamlessly offload network functions leveraging ebpf,” IEEE/ACM Trans. Netw., vol. 32, no. 2, pp. 1319–1332, 2024. [Online]. Available: https://doi.org/10.1109/TNET.2023.3318251

  15. [15]

    nanotube,

    Xilinx, “nanotube,” 2023, accessed 2026-03-30. [Online]. Available: https://github.com/Xilinx/nanotube

  16. [16]

    PANIC: A high-performance programmable NIC for multi-tenant networks,

    J. Lin, K. Patel, B. E. Stephens, A. Sivaraman, and A. Akella, “PANIC: A high-performance programmable NIC for multi-tenant networks,” in14th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2020, Virtual Event, November 4-6,

  17. [17]

    USENIX Association, 2020, pp. 243–259. [Online]. Available: https://www.usenix.org/conference/osdi20/presentation/lin

  18. [18]

    Achieving 100gbps intrusion prevention on a single server,

    Z. Zhao, H. Sadok, N. Atre, J. C. Hoe, V . Sekar, and J. Sherry, “Achieving 100gbps intrusion prevention on a single server,” in14th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2020, Virtual Event, November 4-6, 2020. USENIX Association, 2020, pp. 1083–1100. [Online]. Available: https://www.usenix.org/ conference/osdi20/presentat...

  19. [19]

    Flowblaze: Stateful packet processing in hardware,

    S. Pontarelli, R. Bifulco, M. Bonola, C. Cascone, M. S. Brunella, V . Bruschi, D. Sanvito, G. Siracusano, A. Capone, M. Honda, and F. Huici, “Flowblaze: Stateful packet processing in hardware,” in16th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2019, Boston, MA, February 26-28, 2019, J. R. Lorch and M. Yu, Eds. USENIX Association...

  20. [20]

    Demystifying datapath accelerator enhanced off-path smartnic,

    X. Chen, J. Zhang, T. Fu, Y . Shen, S. Ma, K. Qian, L. Zhu, C. Shi, Y . Zhang, M. Liu, and Z. Wang, “Demystifying datapath accelerator enhanced off-path smartnic,” in32nd IEEE International Conference on Network Protocols, ICNP 2024, Charleroi, Belgium, October 28-31, 2024. IEEE, 2024, pp. 1–12. [Online]. Available: https://doi.org/10.1109/ICNP61940.2024.10858560

  21. [21]

    hxdp: Efficient software packet processing on FPGA nics,

    M. S. Brunella, G. Belocchi, M. Bonola, S. Pontarelli, G. Siracusano, G. Bianchi, A. Cammarano, A. Palumbo, L. Petrucci, and R. Bifulco, “hxdp: Efficient software packet processing on FPGA nics,”Commun. ACM, vol. 65, no. 8, pp. 92–100, 2022. [Online]. Available: https://doi.org/10.1145/3543668

  22. [22]

    Triton: A flexible hardware offloading architecture for accelerating apsara vswitch in alibaba cloud,

    X. Li, X. Jiang, Y . Yang, L. Chen, Y . Wang, C. Wang, C. Xu, Y . Lv, B. Yang, T. Wu, H. Gao, Z. Chen, Y . Qiao, H. Ding, Y . Dong, H. Yang, J. Song, J. Lu, P. Zhang, C. Wei, Z. Zhang, W. Chen, Q. He, and S. Zhu, “Triton: A flexible hardware offloading architecture for accelerating apsara vswitch in alibaba cloud,” inProceedings of the ACM SIGCOMM 2024 Co...

  23. [23]

    FPGAs are the hero in-network computing needs,

    S. A. Fahmy, Z. Yang, Y . Chen, G. Alonso, Z. Istv ´an, and M. Canini, “FPGAs are the hero in-network computing needs,” inProceedings of the 16th ACM SIGOPS Asia-Pacific Workshop on Systems, 2025, pp. 131–139

  24. [24]

    Accel- erating distributed reinforcement learning with in-switch computing,

    Y . Li, I.-J. Liu, Y . Yuan, D. Chen, A. Schwing, and J. Huang, “Accel- erating distributed reinforcement learning with in-switch computing,” inProceedings of the 46th International Symposium on Computer Architecture, 2019, pp. 279–291

  25. [25]

    The case for in-network computing on demand,

    Y . Tokusashi, H. T. Dang, F. Pedone, R. Soul ´e, and N. Zilberman, “The case for in-network computing on demand,” inProceedings of the Fourteenth EuroSys Conference 2019, 2019, pp. 1–16

  26. [26]

    In- network computation is a dumb idea whose time has come,

    A. Sapio, I. Abdelaziz, A. Aldilaijan, M. Canini, and P. Kalnis, “In- network computation is a dumb idea whose time has come,” inPro- ceedings of the 16th ACM Workshop on Hot Topics in Networks, 2017, pp. 150–156

  27. [27]

    A survey on in-network computing: Programmable data plane and technology specific applications,

    S. Kianpisheh and T. Taleb, “A survey on in-network computing: Programmable data plane and technology specific applications,”IEEE Communications Surveys & Tutorials, vol. 25, no. 1, pp. 701–761, 2022

  28. [28]

    A survey on architectures, hardware accel- eration and challenges for in-network computing,

    M. Nickel and D. G ¨ohringer, “A survey on architectures, hardware accel- eration and challenges for in-network computing,”ACM Transactions on Reconfigurable Technology and Systems, vol. 18, no. 1, pp. 1–34, 2024

  29. [29]

    IEEE Standard for Ethernet,

    “IEEE Standard for Ethernet,”IEEE Std 802.3-2022 (Revision of IEEE Std 802.3-2018), pp. 1–7025, Jul. 2022. [Online]. Available: https://ieeexplore.ieee.org/document/9844436

  30. [30]

    ietf.org/rfc/rfc793.txt

    “ietf.org/rfc/rfc793.txt.” [Online]. Available: https://www.ietf.org/rfc/ rfc793.txt

  31. [31]

    Available: https://www.afs.enea.it/asantoro/V2r1 2 1 Release

    [Online]. Available: https://www.afs.enea.it/asantoro/V2r1 2 1 Release. pdf

  32. [32]

    [Online]

    FIX Trading Community,FIX 5.0 Service Pack 2 Specification, FIX Trading Community, 2011, financial Information eXchange Protocol. [Online]. Available: https://www.fixtrading.org/standards/fix-5-0-sp-2/

  33. [33]

    FAST protocol,

    FIX Protocol Ltd, “FAST protocol,” FIX Trading Community, Standard Specification, 2006, fIX Adapted for STreaming. [Online]. Available: https://www.fixtrading.org/standards/fast/

  34. [34]

    Industrial communication networks - fieldbus specifications - part 5-10: Application layer service definition - type 10 elements (profinet),

    “Industrial communication networks - fieldbus specifications - part 5-10: Application layer service definition - type 10 elements (profinet),” IEC, Standard Specification, 2023. [Online]. Available: https://www.profibus.com/download/profinet-specification

  35. [35]

    Industrial communication networks - fieldbus specifications - part 5-10: Application layer service definition - type 12 elements (ethercat),

    “Industrial communication networks - fieldbus specifications - part 5-10: Application layer service definition - type 12 elements (ethercat),” IEC, Standard Specification, 2023. [Online]. Available: https://www.ethercat.org/en/downloads/downloads A02E436C7A97479F9261FDFA8A6D71E5.htm

  36. [36]

    Data center TCP (DCTCP),

    M. Alizadeh, A. G. Greenberg, D. A. Maltz, J. Padhye, P. Patel, B. Prabhakar, S. Sengupta, and M. Sridharan, “Data center TCP (DCTCP),” inProceedings of the ACM SIGCOMM 2010 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, New Delhi, India, August 30 -September 3, 2010, S. Kalyanaraman, V . N. Padmanabhan...

  37. [37]

    Netblocks: Staging layouts for high-performance custom host network stacks,

    A. Brahmakshatriya, C. Rinard, M. Ghobadi, and S. P. Amarasinghe, “Netblocks: Staging layouts for high-performance custom host network stacks,”Proc. ACM Program. Lang., vol. 8, no. PLDI, pp. 467–491,

  38. [38]

    Available: https://doi.org/10.1145/3656396

    [Online]. Available: https://doi.org/10.1145/3656396

  39. [39]

    Vitisnetp4: P4 language support for xilinx devices,

    Xilinx, “Vitisnetp4: P4 language support for xilinx devices,” https: //www.xilinx.com/products/intellectual-property/ef-di-vitisnetp4.html, 2024, [Accessed 13-01-2026]

  40. [40]

    P4thls: A templated hls framework to automate efficient mapping of p4 data-plane applica- tions to fpgas,

    M. Abbasmollaei, T. Ould-Bachir, and Y . Savaria, “P4thls: A templated hls framework to automate efficient mapping of p4 data-plane applica- tions to fpgas,”IEEE Access, 2025

  41. [41]

    P4 to fpga-a fast approach for generating efficient network processors,

    Z. Cao, H. Su, Q. Yang, J. Shen, M. Wen, and C. Zhang, “P4 to fpga-a fast approach for generating efficient network processors,”IEEE Access, vol. 8, pp. 23 440–23 456, 2020

  42. [42]

    P4-to-vhdl: Automatic generation of 100 gbps packet parsers,

    P. Ben ´acek, V . Pus, and H. Kub ´atov´a, “P4-to-vhdl: Automatic generation of 100 gbps packet parsers,” in24th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, FCCM 2016, Washington, DC, USA, May 1-3, 2016. IEEE Computer Society, 2016, pp. 148–155. [Online]. Available: https://doi.org/10.1109/FCCM.2016.46

  43. [43]

    P4FPGA: A rapid prototyping framework for P4,

    H. Wang, R. Soul ´e, H. T. Dang, K. S. Lee, V . Shrivastav, N. Foster, and H. Weatherspoon, “P4FPGA: A rapid prototyping framework for P4,” inProceedings of the Symposium on SDN Research, SOSR 2017, Santa Clara, CA, USA, April 3-4, 2017. ACM, 2017, pp. 122–135. [Online]. Available: https://doi.org/10.1145/3050220.3050234

  44. [44]

    32 Keigo Imai, Nobuko Yoshida, and Shoji Yuen

    S. Ibanez, G. J. Brebner, N. McKeown, and N. Zilberman, “The p4- >netfpga workflow for line-rate packet processing,” inProceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, FPGA 2019, Seaside, CA, USA, February 24-26, 2019, K. Bazargan and S. Neuendorffer, Eds. ACM, 2019, pp. 1–9. [Online]. Available: https://doi.org...

  45. [45]

    Configurable FPGA packet parser for terabit networks with guaranteed wire-speed throughput,

    J. Cabal, P. Ben ´acek, L. Kekely, M. Kekely, V . Pus, and J. Korenek, “Configurable FPGA packet parser for terabit networks with guaranteed wire-speed throughput,” inProceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, FPGA 2018, Monterey, CA, USA, February 25-27, 2018, J. H. Anderson and K. Bazargan, Eds. ACM, 201...

  46. [46]

    Dataset of scada traffic captures from a medical waste incinerator with injected cyberattacks,

    B. Al-Duwairi, A. Shatnawi, A. Al-Hammouri, and M. Ababneh, “Dataset of scada traffic captures from a medical waste incinerator with injected cyberattacks,”Data in Brief, vol. 63, p. 112294, 11 2025

  47. [47]

    Ai-defined flow control in programmable network fabric (ai-fabric): The nanosecond flow intelligence module (nfim) for ultra-low-latency scheduling,

    S. K. Balakrishnan, “Ai-defined flow control in programmable network fabric (ai-fabric): The nanosecond flow intelligence module (nfim) for ultra-low-latency scheduling,” 11 2025

  48. [48]

    Accelerating distributed reinforcement learning with in-switch computing,

    Y . Li, I. Liu, Y . Yuan, D. Chen, A. G. Schwing, and J. Huang, “Accelerating distributed reinforcement learning with in-switch computing,” inProceedings of the 46th International Symposium on Computer Architecture, ISCA 2019, Phoenix, AZ, USA, June 22-26, 2019, S. B. Manne, H. C. Hunter, and E. R. Altman, Eds. ACM, 2019, pp. 279–291. [Online]. Available:...

  49. [49]

    Characterizing microservice dependency and performance: Alibaba trace analysis,

    S. Luo, H. Xu, C. Lu, K. Ye, G. Xu, L. Zhang, Y . Ding, J. He, and C. Xu, “Characterizing microservice dependency and performance: Alibaba trace analysis,” inProceedings of the ACM Symposium on Cloud Computing, 2021, pp. 412–426

  50. [50]

    Desert underwater: An ns-miracle-based framework to design, simulate, emulate and realize test-beds for underwater network protocols,

    R. Masiero, S. Azad, F. Favaro, M. Petrani, G. Toso, F. Guerra, P. Casari, and M. Zorzi, “Desert underwater: An ns-miracle-based framework to design, simulate, emulate and realize test-beds for underwater network protocols,” in2012 Oceans - Yeosu, 2012, pp. 1–10

  51. [51]

    Inside the social network’s (datacenter) network,

    A. Roy, H. Zeng, J. Bagga, G. Porter, and A. C. Snoeren, “Inside the social network’s (datacenter) network,” inProceedings of the 2015 ACM Conference on Special Interest Group on Data Communication, SIGCOMM 2015, London, United Kingdom, August 17-21, 2015, S. Uhlig, O. Maennel, B. Karp, and J. Padhye, Eds. ACM, 2015, pp. 123–137. [Online]. Available: http...

  52. [52]

    Scaling distributed machine learning with in-network aggregation,

    A. Sapio, M. Canini, C. Ho, J. Nelson, P. Kalnis, C. Kim, A. Krishnamurthy, M. Moshref, D. R. K. Ports, and P. Richt ´arik, “Scaling distributed machine learning with in-network aggregation,” in18th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2021, April 12-14, 2021, J. Mickens and R. Teixeira, Eds. USENIX Association, 2021, pp. ...