Fast On-device LLM Inference with NPUs

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

fields: cs.AR (1) · cs.NI (1)

years: 2026 (2)

verdicts: unverdicted (2)

representative citing papers

NPU Design for Diffusion Language Model Inference

cs.AR · 2026-01-28 · unverdicted · novelty 8.0

Introduces the first NPU accelerator for diffusion language models with dLLM-specific ISA, hardware execution model, BAOS KV quantization, and 7nm RTL synthesis.

citing papers explorer

Showing 2 of 2 citing papers.

  • NPU Design for Diffusion Language Model Inference cs.AR · 2026-01-28 · unverdicted · none · ref 8

    Introduces the first NPU accelerator for diffusion language models with dLLM-specific ISA, hardware execution model, BAOS KV quantization, and 7nm RTL synthesis.

  • SparKV: Overhead-Aware KV Cache Loading for Efficient On-Device LLM Inference cs.NI · 2026-04-23 · unverdicted · none · ref 6

    SparKV reduces time-to-first-token by 1.3x-5.1x and energy use by 1.5x-3.3x for on-device LLM inference by adaptively choosing between cloud KV streaming and local computation while overlapping execution and adjusting for runtime conditions.