Merino: Entropy-driven design for generative language models on IoT devices

Youpeng Zhao, Ming Lin, Huadong Tang, Qiang Wu, Jun Wang · arXiv 2403.17312

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

GhostServe: A Lightweight Checkpointing System in the Shadow for Fault-Tolerant LLM Serving

cs.DC · 2026-03-26 · unverdicted · novelty 7.0

GhostServe applies erasure coding to KV cache in host memory for fast recovery from failures in LLM serving, cutting checkpointing latency up to 2.7x and recovery latency 2.1x versus prior methods.

TIDE: Efficient and Lossless MoE Diffusion LLM Inference with I/O-aware Expert Offload

cs.CL · 2026-05-19 · unverdicted · novelty 5.0

TIDE schedules I/O-aware expert offloading for MoE diffusion LLMs by solving for an optimal refresh interval that exploits temporal stability of activations, yielding up to 1.5x throughput gain losslessly.

citing papers explorer

GhostServe: A Lightweight Checkpointing System in the Shadow for Fault-Tolerant LLM Serving · cs.DC · 2026-03-26 · unverdicted · none · ref 23
TIDE: Efficient and Lossless MoE Diffusion LLM Inference with I/O-aware Expert Offload · cs.CL · 2026-05-19 · unverdicted · none · ref 12
