BaseRT achieves up to 1.56x higher LLM decode throughput than llama.cpp on Apple Silicon through native Metal kernel fusion and unified memory optimizations.
Profiling Large Language Model Inference on Apple Silicon: A Quantization Perspective. arXiv preprint arXiv:2508.08531, 2025.
1 Pith paper cites this work. Polarity classification is still being indexed.
Fields: cs.CL (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)

Representative citing papers:
BaseRT: Best-in-Class LLM Inference on Apple Silicon via Native Metal