SCENIC delivers a programmable 200G SmartNIC with offloaded protocol stacks, stream compute units, and full OS transparency that matches commercial performance for custom offloads like collective communication and GPU data partitioning.
Jones, Jingtong Hu, Yiyu Shi, and Peipei Zhou
5 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.AR 5
verdicts: UNVERDICTED 5
roles: background 1
polarities: background 1
representative citing papers
DORA is an instruction-based DNN accelerator architecture with a two-stage compilation framework that delivers stable efficiency across varied workloads and up to 5x throughput gains versus prior accelerators on FPGA.
FILCO introduces a real-time reconfigurable composing architecture for DNN acceleration that achieves 1.3x-5x better throughput and hardware efficiency than prior designs on diverse workloads via an analytical model and two-stage design space exploration.
PLENA introduces a co-designed system with three optimization pathways for long-context agentic LLM inference, claiming up to 2.23x throughput over A100 and 4.04x energy efficiency.
Systematic study concludes overlay architectures suit frequent model switching in current autonomous driving setups, while customized ones may become preferable as bitstream reload overhead decreases.
citing papers explorer
- SCENIC: Stream Computation-Enhanced SmartNIC
  SCENIC delivers a programmable 200G SmartNIC with offloaded protocol stacks, stream compute units, and full OS transparency that matches commercial performance for custom offloads like collective communication and GPU data partitioning.
- DORA: Dataflow-Instruction Orchestration Architecture for DNN Acceleration
  DORA is an instruction-based DNN accelerator architecture with a two-stage compilation framework that delivers stable efficiency across varied workloads and up to 5x throughput gains versus prior accelerators on FPGA.
- FILCO: Flexible Composing Architecture with Real-Time Reconfigurability for DNN Acceleration
  FILCO introduces a real-time reconfigurable composing architecture for DNN acceleration that achieves 1.3x-5x better throughput and hardware efficiency than prior designs on diverse workloads via an analytical model and two-stage design space exploration.
- Combating the Memory Walls: Optimization Pathways for Long-Context Agentic LLM Inference
  PLENA introduces a co-designed system with three optimization pathways for long-context agentic LLM inference, claiming up to 2.23x throughput over A100 and 4.04x energy efficiency.
- To Overlay or to Customize? Revisiting Architectural Choices in Heterogeneous Systems
  Systematic study concludes overlay architectures suit frequent model switching in current autonomous driving setups, while customized ones may become preferable as bitstream reload overhead decreases.