Pith Number

pith:A7V2254Y

pith:2026:A7V2254YD5WHMLNE5DFW5Z5X37
not attested · not anchored · not stored · refs resolved

Lever: Speculative LLM Inference on Smartphones

Fengzu Li, Ju Ren, Tuowei Wang, Wei Gao, Yanfan Sun

Lever reduces smartphone LLM inference latency by an average of 2.93x over baseline flash-offloaded inference by optimizing speculative decoding.

arxiv:2605.16786 v1 · 2026-05-16 · cs.LG

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{A7V2254YD5WHMLNE5DFW5Z5X37}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle: live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
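The merge algorithm itself is not specified on this page. As a minimal sketch of what "deterministic" can mean here, the following assumes a hypothetical event shape (`field`, `value`) and orders events by a content-derived key, so any mirror that holds the same set of events folds them into the same state regardless of arrival order. The names `event_key` and `merge` are illustrative, not part of any Pith API.

```python
import hashlib
import json


def event_key(event: dict) -> str:
    """Content-derived sort key: SHA-256 of the event's canonical JSON."""
    blob = json.dumps(event, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(blob).hexdigest()


def merge(events: list[dict]) -> dict:
    """Fold events into a state in an order that depends only on their content."""
    state: dict = {}
    seen: set[str] = set()
    for event in sorted(events, key=event_key):
        key = event_key(event)
        if key in seen:  # duplicate deliveries are ignored
            continue
        seen.add(key)
        # Hypothetical event shape: each event sets one field of the state.
        state[event["field"]] = event["value"]
    return state


events = [
    {"field": "author_claim", "value": "open"},
    {"field": "citations", "value": "open"},
    {"field": "citations", "value": "open"},  # duplicate delivery
]
# The same set of events yields the same state in any arrival order.
print(merge(events) == merge(list(reversed(events))))
```

When two distinct events set the same field, the one that sorts last by hash wins in this sketch. That rule is arbitrary but reproducible; a real scheme would more likely use signed timestamps or explicit causal links.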

Claims

C1: strongest claim

Lever reduces inference latency by an average of 2.93x over baseline flash-offloaded inference and 1.50x over conventional speculative decoding, narrowing the latency gap between flash-backed and memory-resident LLM inference.

C2: weakest assumption

The assumption that jointly optimizing the three stages described in the abstract (token-tree construction with an I/O- and compute-aware gain-cost objective, early-exit pruning, and CPU-NPU mapping) will deliver the claimed speedups under real smartphone I/O latency and parallelism constraints.

C3: one-line summary

Lever optimizes the drafting, verification, and execution stages of speculative decoding for flash-backed LLM inference on smartphones, reporting 2.93x average latency reduction over baseline flash-offloaded inference.
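To put the two reported averages side by side, the arithmetic below converts each speedup into a relative latency and derives the speedup that conventional speculative decoding would have over the flash-offloaded baseline if the two averages were taken over the same runs. That last condition is an assumption: averages of ratios do not compose exactly, so the derived figure is only a rough consistency check, not a number reported by the paper.

```python
# Reported average speedups from the claims above.
SPEEDUP_VS_FLASH_BASELINE = 2.93   # Lever vs. baseline flash-offloaded inference
SPEEDUP_VS_CONVENTIONAL_SD = 1.50  # Lever vs. conventional speculative decoding


def relative_latency(speedup: float) -> float:
    """Latency of Lever as a fraction of the comparison system's latency."""
    return 1.0 / speedup


# Fraction of each comparison system's latency that Lever needs.
vs_baseline = relative_latency(SPEEDUP_VS_FLASH_BASELINE)
vs_conventional = relative_latency(SPEEDUP_VS_CONVENTIONAL_SD)

# Assumption: both averages cover the same workloads, so the ratio approximates
# how much conventional speculative decoding alone improves on the baseline.
implied_conventional_vs_baseline = (
    SPEEDUP_VS_FLASH_BASELINE / SPEEDUP_VS_CONVENTIONAL_SD
)

print(f"Lever latency vs. flash baseline:   {vs_baseline:.2f}x")
print(f"Lever latency vs. conventional SD:  {vs_conventional:.2f}x")
print(f"Implied conventional SD vs. baseline speedup: "
      f"{implied_conventional_vs_baseline:.2f}x")
```

Under that assumption, Lever needs roughly a third of the baseline's latency and two thirds of conventional speculative decoding's latency.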

References

40 extracted · 40 resolved · 8 Pith anchors

[1] LLM in a Flash: Efficient Large Language Model Inference with Limited Memory 2024
[2] Hydra: Sequentially-Dependent Draft Heads for Medusa Decoding 2024
[3] Program Synthesis with Large Language Models 2021 · arXiv:2108.07732
[4] Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads 2024 · arXiv:2401.10774
[5] Accelerating Large Language Model Decoding with Speculative Sampling 2023 · arXiv:2302.01318

Formal links

2 machine-checked theorem links

Receipt and verification
First computed 2026-05-20T00:03:21.936503Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

07ebad77981f6c762da4e8cb6ee7b7dfeb5aa04bbd6b90890cba5937f281cb64

Aliases

arxiv: 2605.16786 · arxiv_version: 2605.16786v1 · doi: 10.48550/arxiv.2605.16786 · pith_short_12: A7V2254YD5WH · pith_short_16: A7V2254YD5WHMLNE · pith_short_8: A7V2254Y
Agent API
Verify this Pith Number yourself
# Fetch the record as JSON-LD, extract the canonical record, re-serialize it with
# sorted keys and compact separators, then print the SHA-256 of the UTF-8 bytes.
curl -sH 'Accept: application/ld+json' https://pith.science/pith/A7V2254YD5WHMLNE5DFW5Z5X37 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 07ebad77981f6c762da4e8cb6ee7b7dfeb5aa04bbd6b90890cba5937f281cb64
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "3f929fedd8949b62128e5a9887c5e2203e6e36e004f53e0ef44725a255407b01",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-05-16T03:43:10Z",
    "title_canon_sha256": "e169d93e2760b75e10d1dd3c072fb954d0b34eeb2ef0500afd4663bbc98af489"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.16786",
    "kind": "arxiv",
    "version": 1
  }
}
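
The same check can be run offline against the canonical record printed above. This sketch assumes the canonicalization is exactly what the one-liner does (sorted keys, compact separators, no ASCII escaping, UTF-8 bytes); the name `canonical_sha256` is illustrative. The digest it prints should be compared with the canonical hash listed on this page.

```python
import hashlib
import json

# Canonical record copied verbatim from this page.
CANONICAL_RECORD = {
    "metadata": {
        "abstract_canon_sha256": "3f929fedd8949b62128e5a9887c5e2203e6e36e004f53e0ef44725a255407b01",
        "cross_cats_sorted": [],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2026-05-16T03:43:10Z",
        "title_canon_sha256": "e169d93e2760b75e10d1dd3c072fb954d0b34eeb2ef0500afd4663bbc98af489",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.16786", "kind": "arxiv", "version": 1},
}


def canonical_sha256(record: dict) -> str:
    """SHA-256 over the record serialized with sorted keys and compact separators."""
    blob = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


digest = canonical_sha256(CANONICAL_RECORD)
print(digest)
```

Because keys are sorted before hashing, the digest does not depend on the order in which the JSON object's fields happen to be written.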