FlightLLM: Efficient Large Language Model Inference with a Complete Mapping Flow on FPGAs
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.DC (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Understand and Accelerate Memory Processing Pipeline for Disaggregated LLM Inference
Unifying LLM memory optimizations into a Prepare-Compute-Retrieve-Apply pipeline and accelerating it on GPU-FPGA hardware yields up to 2.2x faster inference and 4.7x less energy than GPU-only baselines.