Pith Number

pith:7YXM6JRY

pith:2024:7YXM6JRYOH4O5NBMCHKEPDPSW2
not attested · not anchored · not stored · refs resolved

A Survey on Efficient Inference for Large Language Models

Guohao Dai, Jiaming Xu, Ke Hong, Luning Wang, Shengen Yan, Shiyao Li, Tianyu Fu, Xiao-Ping Zhang, Xiuhong Li, Xuefei Ning, Yuhan Dong, Yuming Lou, Yu Wang, Zhihang Yuan, Zixuan Zhou

A survey organizes methods for efficient large language model inference into data-level, model-level, and system-level categories and benchmarks representative techniques.

arxiv:2404.14294 v3 · 2024-04-22 · cs.CL · cs.AI

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{7YXM6JRYOH4O5NBMCHKEPDPSW2}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

This paper presents a comprehensive survey of the existing literature on efficient LLM inference... organized into data-level, model-level, and system-level optimization... with comparative experiments on representative methods.

C2 weakest assumption

That the chosen representative methods and experimental comparisons fairly represent the broader literature and yield generalizable quantitative insights without significant selection bias.

C3 one line summary

The paper surveys techniques to speed up and reduce the resource needs of LLM inference, organized by data-level, model-level, and system-level changes, with comparative experiments on representative methods.

References

298 extracted · 298 resolved · 41 Pith anchors

[1] Improving language understanding by generative pre-training 2018
[2] Language models are unsupervised multitask learners 2019
[3] Language models are few-shot learners 2020
[4] OPT: Open Pre-trained Transformer Language Models 2022 · arXiv:2205.01068
[6] Baichuan 2: Open large-scale language models 2023 · arXiv:2309.10305

Formal links

2 machine-checked theorem links

Cited by

36 papers in Pith

Receipt and verification
First computed 2026-05-17T23:38:53.798407Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

fe2ecf263871f8eeb42c11d4478df2b69b77748d33f8d92acab2b44d81666059

Aliases

arxiv: 2404.14294 · arxiv_version: 2404.14294v3 · doi: 10.48550/arxiv.2404.14294 · pith_short_12: 7YXM6JRYOH4O · pith_short_16: 7YXM6JRYOH4O5NBM · pith_short_8: 7YXM6JRY
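
The aliases and the canonical hash on this page appear to be related: the short aliases are prefixes of the full Pith Number, and the Pith Number itself matches the first 26 characters of the RFC 4648 base32 encoding of the canonical SHA-256 digest. This is an observation from the values shown here, not a rule the page states, so treat the sketch below as an illustration under that assumption. The helper names `base32_prefix` and `short_aliases` are hypothetical.

```python
import base64

# Values copied from this record page.
CANONICAL_HASH = "fe2ecf263871f8eeb42c11d4478df2b69b77748d33f8d92acab2b44d81666059"
PITH_NUMBER = "7YXM6JRYOH4O5NBMCHKEPDPSW2"


def base32_prefix(hex_digest: str, length: int) -> str:
    """Return the first `length` base32 (RFC 4648) characters of a hex digest.

    Hypothetical helper: the 26-character length is inferred from this page,
    not taken from a published specification.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


def short_aliases(pith_number: str) -> dict:
    """Derive the pith_short_N aliases as plain prefixes (assumed rule)."""
    return {f"pith_short_{n}": pith_number[:n] for n in (8, 12, 16)}


# Compare the derived identifier with the one printed on the page.
print(base32_prefix(CANONICAL_HASH, 26) == PITH_NUMBER)
print(short_aliases(PITH_NUMBER))
```

For the values on this page the comparison holds, and the derived prefixes agree with the `pith_short_8`, `pith_short_12`, and `pith_short_16` entries listed under Aliases.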

Agent API

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/7YXM6JRYOH4O5NBMCHKEPDPSW2 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: fe2ecf263871f8eeb42c11d4478df2b69b77748d33f8d92acab2b44d81666059

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "7e45755716429abd0dc0e09cd3eff786a25f8857d55ccb1a7e23f2fa7d08b786",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2024-04-22T15:53:08Z",
    "title_canon_sha256": "0158e010d7858a65e7781dd03ec62b813bbae982fd020a8150281fd273403c03"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2404.14294",
    "kind": "arxiv",
    "version": 3
  }
}
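
The verification pipeline can also be run offline against a saved copy of the canonical record. The sketch below applies the same serialization as the Python one-liner in the verification command (sorted keys, compact separators, `ensure_ascii=False`, UTF-8) and compares the digest with the canonical hash. It assumes the record rendered on this page is exactly what the API returns under `.canonical_record`; that is not confirmed here, so no expected result is stated. The function name `canonical_sha256` is hypothetical.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Serialize a record deterministically and return its SHA-256 hex digest.

    Mirrors the one-liner on this page: sorted keys, no extra whitespace,
    non-ASCII characters left unescaped, UTF-8 bytes.
    """
    blob = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


# Canonical record as displayed on this page (assumed to match the API output).
record = {
    "metadata": {
        "abstract_canon_sha256": "7e45755716429abd0dc0e09cd3eff786a25f8857d55ccb1a7e23f2fa7d08b786",
        "cross_cats_sorted": ["cs.AI"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.CL",
        "submitted_at": "2024-04-22T15:53:08Z",
        "title_canon_sha256": "0158e010d7858a65e7781dd03ec62b813bbae982fd020a8150281fd273403c03",
    },
    "schema_version": "1.0",
    "source": {"id": "2404.14294", "kind": "arxiv", "version": 3},
}

EXPECTED = "fe2ecf263871f8eeb42c11d4478df2b69b77748d33f8d92acab2b44d81666059"

digest = canonical_sha256(record)
print(digest)
print("matches page hash:", digest == EXPECTED)
```

Because keys are sorted before hashing, the digest does not depend on the order in which a mirror stores or emits the record's fields.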