Profiling Large Language Model Inference on Apple Silicon: A Quantization Perspective. arXiv preprint arXiv:2508.08531, 2025

Afsara Benazir, Felix Xiaozhu Lin · 2025 · arXiv 2508.08531

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

BaseRT: Best-in-Class LLM Inference on Apple Silicon via Native Metal

cs.CL · 2026-07-01 · unverdicted · novelty 4.0

BaseRT achieves up to 1.56x higher LLM decode throughput than llama.cpp on Apple Silicon through native Metal kernel fusion and unified memory optimizations.

citing papers explorer

Showing 1 of 1 citing paper.

BaseRT: Best-in-Class LLM Inference on Apple Silicon via Native Metal

cs.CL · 2026-07-01 · unverdicted · none · ref 17

BaseRT achieves up to 1.56x higher LLM decode throughput than llama.cpp on Apple Silicon through native Metal kernel fusion and unified memory optimizations.
