Pith Number

pith:7YXM6JRY

pith:2024:7YXM6JRYOH4O5NBMCHKEPDPSW2

not attested · not anchored · not stored · refs resolved

A Survey on Efficient Inference for Large Language Models

Guohao Dai, Jiaming Xu, Ke Hong, Luning Wang, Shengen Yan, Shiyao Li, Tianyu Fu, Xiao-Ping Zhang, Xiuhong Li, Xuefei Ning, Yuhan Dong, Yuming Lou, Yu Wang, Zhihang Yuan, Zixuan Zhou

A survey organizes methods for efficient large language model inference into data-level, model-level, and system-level categories and benchmarks representative techniques.

arxiv:2404.14294 v3 · 2024-04-22 · cs.CL · cs.AI

Add to your LaTeX paper

\usepackage{pith}
\pithnumber{7YXM6JRYOH4O5NBMCHKEPDPSW2}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

This paper presents a comprehensive survey of the existing literature on efficient LLM inference... organized into data-level, model-level, and system-level optimization... with comparative experiments on representative methods.

C2 · weakest assumption

That the chosen representative methods and experimental comparisons fairly represent the broader literature and yield generalizable quantitative insights without significant selection bias.

C3 · one-line summary

The paper surveys techniques to speed up and reduce the resource needs of LLM inference, organized by data-level, model-level, and system-level changes, with comparative experiments on representative methods.

References

298 extracted · 298 resolved · 41 Pith anchors

[1] Improving language understanding by generative pre-training 2018

[2] Language models are unsupervised multitask learners 2019

[3] Language models are few-shot learners 2020

[4] OPT: Open Pre-trained Transformer Language Models 2022 · arXiv:2205.01068

[6] Baichuan 2: Open large-scale language models 2023 · arXiv:2309.10305

Formal links

2 machine-checked theorem links

Cited by

36 papers in Pith

Growing a Multi-head Twig via Distillation and Reinforcement Learning to Accelerate Large Vision-Language Models

EntroLLM: Entropy Encoded Weight Compression for Efficient Large Language Model Inference on Edge Devices

Policy Contrastive Decoding for Robotic Foundation Models

FASTER: Rethinking Real-Time Flow VLAs

Unified Deployment-Aware Evaluation of Open Reasoning Language Models

Receipt and verification

First computed	2026-05-17T23:38:53.798407Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

fe2ecf263871f8eeb42c11d4478df2b69b77748d33f8d92acab2b44d81666059

Aliases

arxiv: 2404.14294 · arxiv_version: 2404.14294v3 · doi: 10.48550/arxiv.2404.14294 · pith_short_12: 7YXM6JRYOH4O · pith_short_16: 7YXM6JRYOH4O5NBM · pith_short_8: 7YXM6JRY
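
The aliases above appear to be related by simple string operations. The sketch below is an observation from this one record, not a documented rule of the Pith schema: the three `pith_short_*` aliases are prefixes of the full Pith Number, and the Pith Number itself matches the first 26 characters of the Base32 encoding of the canonical hash listed above. The function names `pith_from_hash` and `short_aliases` are hypothetical and chosen only for illustration.

```python
import base64

# Values copied from this record.
CANONICAL_HASH = "fe2ecf263871f8eeb42c11d4478df2b69b77748d33f8d92acab2b44d81666059"
PITH_NUMBER = "7YXM6JRYOH4O5NBMCHKEPDPSW2"


def pith_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the digest bytes and keep the first `length` characters.

    Assumption: inferred from this record only; other records may differ.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


def short_aliases(pith_number: str) -> dict[str, str]:
    """Build the 8-, 12-, and 16-character prefix aliases."""
    return {f"pith_short_{n}": pith_number[:n] for n in (8, 12, 16)}


derived = pith_from_hash(CANONICAL_HASH)
aliases = short_aliases(derived)

print(derived == PITH_NUMBER)
print(aliases)
```

If the relationship holds in general, any of the short aliases can be checked against a canonical hash without contacting the resolver.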

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/7YXM6JRYOH4O5NBMCHKEPDPSW2 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: fe2ecf263871f8eeb42c11d4478df2b69b77748d33f8d92acab2b44d81666059

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "7e45755716429abd0dc0e09cd3eff786a25f8857d55ccb1a7e23f2fa7d08b786",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2024-04-22T15:53:08Z",
    "title_canon_sha256": "0158e010d7858a65e7781dd03ec62b813bbae982fd020a8150281fd273403c03"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2404.14294",
    "kind": "arxiv",
    "version": 3
  }
}
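
The verification command above canonicalizes the record (sorted keys, compact separators, UTF-8) before hashing it. The following is a minimal offline sketch of that same canonicalization in plain Python, applied to the record as printed on this page. The helper names `canonical_bytes` and `canonical_hash` are hypothetical. Whether the digest equals the canonical hash depends on the printed record matching the served `canonical_record` exactly, so the script reports the comparison rather than assuming it.

```python
import hashlib
import json

# Canonical hash as listed in this record.
EXPECTED = "fe2ecf263871f8eeb42c11d4478df2b69b77748d33f8d92acab2b44d81666059"

# Canonical record copied from this page.
RECORD_TEXT = """
{
  "metadata": {
    "abstract_canon_sha256": "7e45755716429abd0dc0e09cd3eff786a25f8857d55ccb1a7e23f2fa7d08b786",
    "cross_cats_sorted": ["cs.AI"],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2024-04-22T15:53:08Z",
    "title_canon_sha256": "0158e010d7858a65e7781dd03ec62b813bbae982fd020a8150281fd273403c03"
  },
  "schema_version": "1.0",
  "source": {"id": "2404.14294", "kind": "arxiv", "version": 3}
}
"""


def canonical_bytes(record: dict) -> bytes:
    """Serialize with sorted keys and no insignificant whitespace, as UTF-8."""
    text = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return text.encode("utf-8")


def canonical_hash(record: dict) -> str:
    """SHA-256 hex digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


record = json.loads(RECORD_TEXT)
digest = canonical_hash(record)

print(digest)
print("matches canonical hash:", digest == EXPECTED)
```

Because keys are sorted and whitespace is fixed, the digest does not depend on how the JSON was indented or in what order its keys were written, which is what lets independent mirrors arrive at the same value.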