vLLM TPU support

vLLM Team · 2025

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

Ragged Paged Attention: A High-Performance and Flexible LLM Inference Kernel for TPU

cs.PF · 2026-04-16 · unverdicted · novelty 7.0

RPA kernel for TPUs achieves 86% MBU in decode and 73% MFU in prefill on Llama 3 8B via tiling for ragged memory, fused pipelines, and specialized compilation for prefill/decode workloads.

citing papers explorer

Showing 1 of 1 citing paper.

Ragged Paged Attention: A High-Performance and Flexible LLM Inference Kernel for TPU

cs.PF · 2026-04-16 · unverdicted · none · ref 11

RPA kernel for TPUs achieves 86% MBU in decode and 73% MFU in prefill on Llama 3 8B via tiling for ragged memory, fused pipelines, and specialized compilation for prefill/decode workloads.
