MobileLLM: Optimizing sub-billion parameter language models for on-device use cases

· 2024

1 Pith paper cites this work. Polarity classification is still being indexed.

representative citing papers

VitaLLM: A Versatile, Ultra-Compact Ternary LLM Accelerator with Dependency-Aware Scheduling

cs.AR · 2026-04-30 · conditional · novelty 6.0

VitaLLM delivers 70.7 tokens/s decoding in a 0.223 mm² TSMC 16 nm chip at 66 mW with a figure-of-merit of 17.4 TOPS/mm²/W by combining TINT cores, BoothFlex attention, leading-one prediction, and dependency-aware scheduling.
