pith.
Pith Number

pith:PFQ4IG73

pith:2026:PFQ4IG73A4U5ZLJPCG76ZK7MIQ
not attested · not anchored · not stored · refs resolved

PreFT: Prefill-only finetuning for efficient inference

Andrew Lanpouthakoun, Aryaman Arora, Ben Keigwin, Christopher Potts, Dan Jurafsky, Dhruv Pai, Zhengxuan Wu

Applying adapters only during prefill and discarding them for decoding raises serving throughput nearly twofold while keeping task performance close to that of standard parameter-efficient finetuning (PEFT).

arxiv:2605.14217 v1 · 2026-05-14 · cs.LG · cs.AI · cs.CL · cs.SY · eess.SY

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{PFQ4IG73A4U5ZLJPCG76ZK7MIQ}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
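The deterministic merge can be illustrated with a toy fold over events. Everything here is an assumption for illustration: the `Event` fields, the (timestamp, event_id) ordering and the last-write-wins rule are hypothetical and are not Pith's documented merge algorithm, and signature verification is omitted. What it shows is the property stated above: mirrors holding the same events, in any arrival order, compute the same state and the same digest.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    # Hypothetical event shape; the real signed-event schema is not shown on this page.
    timestamp: str  # fixed-width ISO-8601 UTC string, so lexicographic order is time order
    event_id: str   # tie-breaker giving a total order when timestamps are equal
    field: str      # record field the event sets (hypothetical)
    value: str


def merge(events):
    """Fold events into a state dict in a fixed total order (last write wins)."""
    state = {}
    # Deduplicate, then sort, so the result does not depend on arrival order.
    for ev in sorted(set(events), key=lambda e: (e.timestamp, e.event_id)):
        state[ev.field] = ev.value
    return state


def state_digest(state):
    """SHA-256 of canonical JSON (sorted keys, compact separators, UTF-8)."""
    blob = json.dumps(state, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


# Two mirrors receive the same events in different orders, one with a duplicate.
e1 = Event("2026-05-17T23:39:10Z", "e1", "author_claim", "open")
e2 = Event("2026-05-18T08:00:00Z", "e2", "author_claim", "claimed")
assert state_digest(merge([e1, e2])) == state_digest(merge([e2, e1, e1]))
```

Sorting on a total order before folding is what makes the result independent of delivery order; a real implementation would also reject events whose signatures do not verify.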

Claims

C1 strongest claim

Multi-user serving of PreFTs is more efficient than that of traditional PEFTs (1.9× the throughput when serving 512 adapters on Llama 3.1 70B). On RL tasks, PreFTs approach parity with standard PEFTs.
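A toy sketch of the prefill-only idea, assuming a single linear projection stands in for a layer's key/value projection (real models have attention and many layers, and this is not the authors' implementation). A LoRA-style update `B @ A` is applied while the prompt is encoded into the cache; each generated token is then projected with the base weight alone, so requests prefilled with different adapters share one decode path. All function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def lora_delta(b: Matrix, a: Matrix, scale: float = 1.0) -> Matrix:
    """Dense low-rank update scale * (B @ A); B is d_out x r, A is r x d_in."""
    rank, d_in = len(a), len(a[0])
    return [[scale * sum(b[i][k] * a[k][j] for k in range(rank)) for j in range(d_in)]
            for i in range(len(b))]


def add(m1: Matrix, m2: Matrix) -> Matrix:
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(m1, m2)]


def prefill(w: Matrix, delta: Matrix, prompt: List[Vector]) -> List[Vector]:
    """Encode the prompt with the adapted weight W + delta and return the cache."""
    w_adapted = add(w, delta)
    return [matvec(w_adapted, tok) for tok in prompt]


def decode_step(w: Matrix, cache: List[Vector], new_token: Vector) -> List[Vector]:
    """Append a generated token projected with the base weight W only (adapter discarded)."""
    return cache + [matvec(w, new_token)]


def decode_batch(w: Matrix, caches: List[List[Vector]],
                 tokens: List[Vector]) -> List[List[Vector]]:
    """One decode step for many users: every request uses the same W, whatever its adapter."""
    return [decode_step(w, c, t) for c, t in zip(caches, tokens)]


# Two users with different rank-1 adapters over the same 2x2 base weight.
W = [[1.0, 0.0], [0.0, 1.0]]
user_a = lora_delta([[1.0], [0.0]], [[0.0, 2.0]])
user_b = lora_delta([[0.0], [1.0]], [[3.0, 0.0]])
caches = [prefill(W, user_a, [[1.0, 1.0]]), prefill(W, user_b, [[1.0, 1.0]])]
caches = decode_batch(W, caches, [[1.0, 1.0], [1.0, 1.0]])
```

Each prompt entry in the cache still reflects its user's adapter, while the decode step never touches an adapter; in this simple model, that shared decode path is plausibly where the serving gain comes from.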

C2 weakest assumption

That discarding the adapter after prefill does not materially degrade the quality of the generated tokens on downstream tasks, and that any loss can be offset by increasing adapter rank without throughput cost.
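The rank part of this assumption can be made concrete with standard LoRA arithmetic: an adapter pair on a `d_in`×`d_out` matrix has `rank * (d_in + d_out)` parameters and adds about `2 * rank * (d_in + d_out)` FLOPs per token. The sketch below is illustrative; the function names are hypothetical, and 8192 (the hidden size of Llama 3.1 70B) is used as an example square projection. In this simple model, prefill-only adaptation makes decode-time adapter cost zero at any rank, while prefill cost still grows linearly with rank.

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters of one LoRA pair: A is rank x d_in, B is d_out x rank."""
    return rank * (d_in + d_out)


def adapter_flops_per_token(d_in: int, d_out: int, rank: int,
                            phase: str, prefill_only: bool) -> int:
    """Extra FLOPs (2 per multiply-add) the adapter adds to one matrix for one token.

    Computing (x @ A.T) @ B.T costs 2*rank*d_in + 2*rank*d_out.
    With prefill-only adaptation the adapter is skipped entirely during decode.
    """
    if phase not in ("prefill", "decode"):
        raise ValueError(f"unknown phase: {phase!r}")
    if prefill_only and phase == "decode":
        return 0
    return 2 * rank * (d_in + d_out)


def base_flops_per_token(d_in: int, d_out: int) -> int:
    """FLOPs of the frozen base matmul for one token."""
    return 2 * d_in * d_out


HIDDEN = 8192  # hidden size of Llama 3.1 70B, used as an example square projection
for rank in (16, 64):
    print(rank,
          lora_params(HIDDEN, HIDDEN, rank),
          adapter_flops_per_token(HIDDEN, HIDDEN, rank, "prefill", prefill_only=True),
          adapter_flops_per_token(HIDDEN, HIDDEN, rank, "decode", prefill_only=True))
```

For example, rank 16 on an 8192×8192 projection adds 262,144 parameters and 524,288 prefill FLOPs per token, against 134,217,728 FLOPs for the base matmul.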

C3 one-line summary

Prefill-only adaptation of LLMs yields 1.9× the throughput of standard PEFT when serving 512 adapters on Llama 3.1 70B, with near-parity performance on RL tasks and a recoverable loss on SFT.

References

46 extracted · 46 resolved · 10 Pith anchors

[1] On-policy distillation of language models: Learning from self-generated mistakes 2024
[2] Program Synthesis with Large Language Models 2021 · arXiv:2108.07732
[3] How to Scale Your Model 2025
[4] 2024 · arXiv:2408.07055
[5] LoRA-XS: Low-rank adaptation with extremely small number of parameters 2025 · doi:10.3233/faia251185

Receipt and verification
First computed 2026-05-17T23:39:10.867464Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

7961c41bfb0729dcad2f11bfecabec44385f09e18554f02d884e88f96772842a

Aliases

arxiv: 2605.14217 · arxiv_version: 2605.14217v1 · doi: 10.48550/arxiv.2605.14217 · pith_short_12: PFQ4IG73A4U5 · pith_short_16: PFQ4IG73A4U5ZLJP · pith_short_8: PFQ4IG73
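On this record, the Pith Number can be reproduced from the canonical hash: it equals the RFC 4648 base32 encoding (padding stripped) of the first 16 bytes of the SHA-256 digest, and each `pith_short_N` alias is its first N characters. This rule is inferred from the values on this page, not taken from published Pith documentation; the helper names are hypothetical.

```python
import base64

CANONICAL_HASH = "7961c41bfb0729dcad2f11bfecabec44385f09e18554f02d884e88f96772842a"


def pith_number_from_hash(hex_digest: str) -> str:
    """Base32 (RFC 4648 alphabet, '=' padding stripped) of the digest's first 16 bytes.

    Inferred from the values on this page; not a documented Pith rule.
    """
    return base64.b32encode(bytes.fromhex(hex_digest)[:16]).decode("ascii").rstrip("=")


def short_aliases(pith_number: str) -> dict:
    """Each pith_short_N alias is the first N characters of the full number."""
    return {f"pith_short_{n}": pith_number[:n] for n in (8, 12, 16)}


number = pith_number_from_hash(CANONICAL_HASH)
print(number)                                           # → PFQ4IG73A4U5ZLJPCG76ZK7MIQ
print("pith:" + short_aliases(number)["pith_short_8"])  # → pith:PFQ4IG73
```

Sixteen bytes are 128 bits, which base32 encodes in 26 characters, matching the length of the full number.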
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/PFQ4IG73A4U5ZLJPCG76ZK7MIQ \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 7961c41bfb0729dcad2f11bfecabec44385f09e18554f02d884e88f96772842a
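The same check can be written in standard-library Python without `curl` or `jq`. This is a sketch, assuming the endpoint returns the JSON-LD document with a top-level `canonical_record` key, as the `jq` filter above implies; the function names are hypothetical. The canonicalization (sorted keys, compact separators, UTF-8 without ASCII escaping) matches the `python3 -c` step.

```python
import hashlib
import json
import urllib.request

PITH_URL = "https://pith.science/pith/PFQ4IG73A4U5ZLJPCG76ZK7MIQ"
EXPECTED = "7961c41bfb0729dcad2f11bfecabec44385f09e18554f02d884e88f96772842a"


def canonical_sha256(record) -> str:
    """SHA-256 over sorted-key, compact, UTF-8 JSON (mirrors the python3 -c step)."""
    blob = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


def fetch_canonical_record(url: str = PITH_URL) -> dict:
    """Fetch the JSON-LD document and return its canonical_record (the jq step)."""
    req = urllib.request.Request(url, headers={"Accept": "application/ld+json"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["canonical_record"]
```

With network access, `canonical_sha256(fetch_canonical_record()) == EXPECTED` performs the verification. Because keys are sorted before hashing, the digest does not depend on the key order the server happens to emit.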
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "ae94a8a8261dba87cb2891b1f9d169d168d2be8c1ae311ed59ec1cceb1113bef",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.CL",
      "cs.SY",
      "eess.SY"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-05-14T00:19:41Z",
    "title_canon_sha256": "f6fbe28f555b9ae084ed42d96daebe9c88f492ec8dabe5a81d3ffc062b157bb5"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.14217",
    "kind": "arxiv",
    "version": 1
  }
}