Flightllm: Efficient large language model inference with a complete mapping flow on fpgas

Zeng, S · 2024

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

browse 1 citing papers

representative citing papers

Understand and Accelerate Memory Processing Pipeline for Disaggregated LLM Inference

cs.DC · 2026-03-30 · unverdicted · novelty 5.0

Unifying LLM memory optimizations into a Prepare-Compute-Retrieve-Apply pipeline and accelerating it on GPU-FPGA hardware yields up to 2.2x faster inference and 4.7x less energy than GPU-only baselines.

citing papers explorer

Showing 1 of 1 citing paper.

Understand and Accelerate Memory Processing Pipeline for Disaggregated LLM Inference cs.DC · 2026-03-30 · unverdicted · none · ref 30
Unifying LLM memory optimizations into a Prepare-Compute-Retrieve-Apply pipeline and accelerating it on GPU-FPGA hardware yields up to 2.2x faster inference and 4.7x less energy than GPU-only baselines.

Flightllm: Efficient large language model inference with a complete mapping flow on fpgas

fields

years

verdicts

representative citing papers

citing papers explorer