SPAC: Automating FPGA-based Network Switches with Protocol Adaptive Customization
Pith reviewed 2026-05-08 13:55 UTC · model grok-4.3
The pith
SPAC automates FPGA network switches co-optimized for specific protocols and workloads, cutting LUT usage by 55% and latency by up to 38% versus fixed designs.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
SPAC introduces a unified workflow with a domain-specific language for protocol-architecture co-design, a library of modular HLS-based adaptive switch components, and a trace-aware Design Space Exploration engine. By providing a multi-fidelity simulation stack, SPAC enables rapid identification of Pareto-optimal designs prior to deployment. Experimental results show that by tailoring the micro-architecture and protocol to the specific workload, SPAC-generated designs reduce LUT and BRAM usage by 55% and 53%, respectively. Compared to fixed-architecture counterparts, SPAC delivers latency reductions ranging from 7.8% to 38.4% across various tasks while maintaining adequate resource consumption and packet drop rate.
What carries the argument
The trace-aware Design Space Exploration engine together with the multi-fidelity simulation stack, which co-optimizes protocol rules and switch micro-architecture for a given set of traffic traces.
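To make "Pareto-optimal" concrete, here is a minimal sketch of how a set of candidate switch designs could be filtered down to a Pareto front over LUTs, BRAMs, and latency. It is not SPAC's DSE engine: the `Design` record, the metric names, and the candidate values are all hypothetical placeholders chosen for illustration, and every metric is assumed to be "lower is better".

```python
from typing import List, NamedTuple


class Design(NamedTuple):
    """Hypothetical record for one candidate design (not a SPAC data type)."""
    name: str
    luts: int
    brams: int
    latency_ns: float


def metrics(d: Design):
    # All three metrics are treated as costs to minimize.
    return (d.luts, d.brams, d.latency_ns)


def dominates(a: Design, b: Design) -> bool:
    """True if a is no worse than b on every metric and strictly better on one."""
    pairs = list(zip(metrics(a), metrics(b)))
    return all(x <= y for x, y in pairs) and any(x < y for x, y in pairs)


def pareto_front(designs: List[Design]) -> List[Design]:
    """Keep only designs that no other candidate dominates."""
    return [d for d in designs if not any(dominates(o, d) for o in designs)]


# Made-up candidates, for illustration only.
candidates = [
    Design("A", 1000, 10, 50.0),
    Design("B", 800, 12, 60.0),   # dominated by D (same area, higher latency)
    Design("C", 1200, 12, 70.0),  # dominated by A on every metric
    Design("D", 800, 12, 55.0),
]
front = pareto_front(candidates)
print([d.name for d in front])
```

In this toy set, A and D survive because each is better than the other on at least one metric; a designer would then pick between them according to the workload's priorities.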
If this is right
- Latency-critical services such as HFT and sensor networks can receive switches with minimal logic delay instead of oversized generic hardware.
- Hyperscale datacenter fabrics can obtain sustained line-rate throughput tailored to collective and training traffic patterns.
- FPGA devices can host more complex switch logic or fit the same function into smaller, cheaper chips.
- Development cycles for custom network hardware shorten because the DSE engine replaces much of the manual tuning.
- Packet drop rates remain comparable to fixed designs while resource and latency metrics improve.
Where Pith is reading between the lines
- If the simulation-to-hardware gap stays small on new workloads, the same co-design flow could be reused for custom NICs or routers.
- Expanding the library of HLS components would let the system handle a wider range of emerging protocols without rewriting the DSE engine.
- The Pareto front produced by the tool gives designers explicit trade-off curves they can inspect before committing to silicon.
- Similar automation of protocol-plus-architecture search might apply to other reconfigurable platforms beyond FPGAs.
Load-bearing premise
The multi-fidelity simulation stack and trace-aware DSE engine produce designs whose real FPGA performance and resource usage closely match the simulated Pareto front for the tested workloads.
What would settle it
Synthesize and place one SPAC-generated design on a physical FPGA, then measure its actual LUT count, BRAM usage, end-to-end latency, and packet drop rate and compare those numbers to the values predicted by the simulation for the same workload.
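The comparison described above can be sketched as a per-metric relative gap between simulated predictions and hardware measurements. This is only an illustration under stated assumptions: the metric keys, the numbers, and the tolerance threshold are hypothetical, and no values here come from the paper. Packet drop rate is left out of the example because a predicted value of zero would make a relative gap undefined; an absolute difference would be needed for that metric.

```python
from typing import Dict


def relative_gap(predicted: Dict[str, float], measured: Dict[str, float]) -> Dict[str, float]:
    """Signed gap of each measured metric relative to its prediction, in percent."""
    return {
        key: 100.0 * (measured[key] - predicted[key]) / predicted[key]
        for key in predicted
    }


def within_tolerance(gaps: Dict[str, float], tol_pct: float) -> bool:
    """True if every metric's absolute gap is at most tol_pct percent."""
    return all(abs(g) <= tol_pct for g in gaps.values())


# Placeholder numbers for one hypothetical design; not taken from the paper.
predicted = {"luts": 1000.0, "brams": 20.0, "latency_ns": 100.0}
measured = {"luts": 1100.0, "brams": 20.0, "latency_ns": 95.0}

gaps = relative_gap(predicted, measured)
print(gaps)
print(within_tolerance(gaps, tol_pct=10.0))
```

A positive gap means the hardware used more of that resource (or took longer) than the simulation predicted; what tolerance counts as "closely match" is a judgment the paper would have to state.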
Original abstract
With network requirements diverging across emerging applications, latency-critical services demand minimal logic delay, while hyperscale training and collectives require sustained line-rate throughput for synchronized bulk transfers. This divergence creates an urgent need for custom network switches tailored to specialized protocols and application-specific traffic patterns. This paper presents SPAC (Switch and Protocol Adaptive Customization), a novel approach that automates the generation of FPGA-based network switches co-optimized for custom protocols and application-specific traffic patterns. SPAC introduces a unified workflow with a domain-specific language (DSL) for protocol-architecture co-design, a library of modular HLS-based adaptive switch components, and a trace-aware Design Space Exploration (DSE) engine. By providing a multi-fidelity simulation stack, SPAC enables rapid identification of Pareto-optimal designs prior to deployment. We demonstrate the efficacy of the domain-specific adaptation of SPAC across a spectrum of real-world scenarios, spanning from latency-sensitive sensor and HFT networks to hyperscale datacenter fabrics. Experimental results show that by tailoring the micro-architecture and protocol to the specific workload, SPAC-generated designs reduce LUT and BRAM usage by 55% and 53%, respectively. Compared to fixed-architecture counterparts, SPAC delivers latency reductions ranging from 7.8% to 38.4% across various tasks while maintaining adequate resource consumption and packet drop rate.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents SPAC, a framework that automates the generation of FPGA-based network switches via a DSL for protocol-architecture co-design, a library of modular HLS-based components, and a trace-aware DSE engine supported by a multi-fidelity simulation stack. It claims that workload-specific tailoring yields 55% LUT and 53% BRAM reductions relative to fixed architectures, plus latency improvements of 7.8–38.4% across sensor, HFT, and hyperscale workloads while preserving acceptable resource use and packet drop rates.
Significance. If the results hold after proper validation, the work would be significant for enabling rapid, application-driven customization of network hardware in latency-critical and high-throughput domains. The trace-aware DSE and multi-fidelity simulation approach represent a practical strength for exploring design spaces before hardware deployment.
major comments (2)
- [Abstract] The central claims of 55% LUT / 53% BRAM reductions and 7.8–38.4% latency gains are stated without any description of the fixed-architecture baselines, exact workload traces, number of runs, or error bars. This information is load-bearing for evaluating whether the reported improvements are statistically meaningful or reproducible.
- [Results / Evaluation section] No quantitative correlation is provided between the multi-fidelity simulation metrics (LUT, BRAM, latency) used for Pareto selection and the actual post-synthesis FPGA measurements for the final designs in the sensor, HFT, and hyperscale cases. Effects such as HLS routing congestion or custom-protocol BRAM contention could invalidate the simulated gains.
minor comments (1)
- [Abstract] The latency reduction range is given without mapping specific percentages to individual tasks or workloads, which reduces interpretability of the cross-scenario results.
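The request for multiple runs and error bars can be illustrated with a short sketch that computes a percentage reduction per run and then reports its mean and sample standard deviation. The function name and all latency values below are hypothetical placeholders, not measurements from the paper; the reduction is assumed to be defined as (baseline - custom) / baseline.

```python
from statistics import mean, stdev


def percent_reduction(baseline: float, custom: float) -> float:
    """Reduction of custom relative to baseline, in percent."""
    return 100.0 * (baseline - custom) / baseline


# Made-up latencies for three repeated runs of one workload (illustration only).
baseline_latency_ns = [100.0, 100.0, 100.0]
custom_latency_ns = [80.0, 75.0, 70.0]

reductions = [
    percent_reduction(b, c)
    for b, c in zip(baseline_latency_ns, custom_latency_ns)
]
avg = mean(reductions)      # central estimate across runs
spread = stdev(reductions)  # sample standard deviation, usable as an error bar

print(f"latency reduction: {avg:.1f}% +/- {spread:.1f}% over {len(reductions)} runs")
```

Reporting one such line per workload would also address the minor comment, since each percentage would then be tied to a named task.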
Simulated Author's Rebuttal
We thank the referee for their thoughtful review and constructive suggestions. We address the major comments point by point below, proposing revisions to enhance clarity and validation.
- Referee: [Abstract] The central claims of 55% LUT / 53% BRAM reductions and 7.8–38.4% latency gains are stated without any description of the fixed-architecture baselines, exact workload traces, number of runs, or error bars. This information is load-bearing for evaluating whether the reported improvements are statistically meaningful or reproducible.
Authors: We agree that the abstract would benefit from more context on the baselines and workloads to support the claims. The fixed-architecture baselines refer to standard non-adaptive switch designs implemented in HLS without protocol-architecture co-design. The workloads encompass real-world traces from sensor networks, HFT applications, and hyperscale datacenters, as described in the evaluation. While space constraints in the abstract limit full details, we will revise it to briefly reference the nature of the baselines and workloads, and clarify that comprehensive statistics, including results from multiple runs and variability measures, are available in the results section. This revision will help readers assess the statistical significance and reproducibility of the improvements.
Revision: partial
- Referee: [Results / Evaluation section] No quantitative correlation is provided between the multi-fidelity simulation metrics (LUT, BRAM, latency) used for Pareto selection and the actual post-synthesis FPGA measurements for the final designs in the sensor, HFT, and hyperscale cases. Effects such as HLS routing congestion or custom-protocol BRAM contention could invalidate the simulated gains.
Authors: We appreciate this observation regarding the validation of our multi-fidelity simulation. The manuscript presents post-synthesis FPGA measurements for the final SPAC designs in the results section, demonstrating that the selected architectures achieve the reported resource and latency benefits on actual hardware. However, we acknowledge the value of explicitly quantifying the correlation between the simulation predictions used in DSE and the post-synthesis results to rule out effects like routing congestion. In the revised manuscript, we will add a quantitative analysis, including tables or figures showing the percentage differences in LUT, BRAM, and latency between multi-fidelity simulations and actual post-synthesis reports for the Pareto-optimal designs across all evaluated workloads. This will provide stronger evidence for the reliability of our approach.
Revision: yes
Circularity Check
No circularity: claims rest on empirical FPGA measurements, not self-referential derivations
The paper reports resource reductions (55% LUT, 53% BRAM) and latency gains (7.8–38.4%) as outcomes of deploying SPAC-generated designs on real hardware and comparing them to fixed-architecture baselines. No equations, fitted parameters, or predictions are presented that reduce by construction to the inputs of the DSE engine or multi-fidelity simulator. The workflow description (DSL, HLS library, trace-aware DSE) is a design methodology whose outputs are externally validated by synthesis and execution, satisfying the self-contained benchmark criterion. No self-citation chains or ansatz smuggling appear in the load-bearing claims.