VitaLLM demonstrates a 16nm silicon prototype accelerator achieving 72.46 tokens/s decode for 3B ternary LLMs in 0.214 mm² area with reduced KV cache traffic via predictive sparse attention.
Slim-Llama: A 4.69mw large-language- model processor with binary/ternary weights for billion-parameter llama model,
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.AR 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
VitaLLM: A Versatile and Tiny Accelerator for Mixed-Precision LLM Inference on Edge Devices
VitaLLM demonstrates a 16nm silicon prototype accelerator achieving 72.46 tokens/s decode for 3B ternary LLMs in 0.214 mm² area with reduced KV cache traffic via predictive sparse attention.