Native LLM and MLLM Inference at Scale on Apple Silicon
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CL (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
- Cross-Family Speculative Decoding for Polish Language Models on Apple Silicon: An Empirical Evaluation of Bielik 11B with UAG-Extended MLX-LM
Context-aware cross-family speculative decoding reaches 1.7x speedup on structured Polish text but fails to deliver gains on varied instructions because both models are memory-bandwidth bound on unified memory.