Pith Number

pith:JO2OSSKH

pith:2026:JO2OSSKHNSUX3EPGK72F2A7S2B
not attested · not anchored · not stored · refs resolved

Large Language Model as Token Compressor and Decompressor

Jielei Zhang, Junkai Lin, Tianhao Zhao, Wei Yang, Wenbing Li, Yiran Wang, Zikai Song

An off-the-shelf LLM can be fine-tuned with LoRA to compress long texts into adaptive sequences of Z-tokens while preserving reconstruction and task performance.

arxiv:2603.25340 v2 · 2026-03-26 · cs.CL

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{JO2OSSKHNSUX3EPGK72F2A7S2B}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
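What "deterministic merge" means in practice can be illustrated with a small sketch. The page does not describe Pith's actual merge algorithm, so everything here is an assumption made for illustration: the event fields (`at`, `field`, `value`), the content-derived event id, and the last-writer-wins rule are all hypothetical, and signature checking is left out. The only point shown is that sorting events by a total order before folding them makes the result independent of arrival order and of duplicates.

```python
import hashlib
import json


def event_id(event: dict) -> str:
    """Hypothetical content-derived id: SHA-256 of the event's sorted-key JSON."""
    blob = json.dumps(event, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(blob).hexdigest()


def merge_state(record: dict, events: list) -> dict:
    """Fold events into a state in an order that ignores how they arrived."""
    # Exact duplicates collapse because they share the same content id.
    unique = {event_id(e): e for e in events}
    # Total order: timestamp first, content id as the tie-breaker.
    ordered = sorted(unique.items(), key=lambda kv: (kv[1]["at"], kv[0]))
    state = {"record": record, "fields": {}}
    for _, ev in ordered:
        state["fields"][ev["field"]] = ev["value"]  # last writer wins (assumed rule)
    return state


# Made-up sample data, only to exercise the function.
record = {"source": {"id": "2603.25340", "kind": "arxiv", "version": 2}}
events = [
    {"at": "2026-01-01T00:00:00Z", "field": "example_a", "value": "first"},
    {"at": "2026-01-02T00:00:00Z", "field": "example_a", "value": "second"},
    {"at": "2026-01-02T00:00:00Z", "field": "example_b", "value": "other"},
]

state_one = merge_state(record, events)
state_two = merge_state(record, list(reversed(events)) + events[:1])
print(state_one == state_two)
```

Two mirrors that hold the same set of events in different orders, or with some events duplicated, would compute equal states under this scheme.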

Claims

C1 strongest claim

An off-the-shelf LLM can be adapted into a discrete, variable-length token compressor and decompressor for long-context processing.

C2 weakest assumption

That fine-tuning with LoRA on the self-expressive autoencoding objective will produce Z-tokens that preserve enough information for both faithful reconstruction and downstream task performance, without requiring extensive post-hoc adjustments.

C3 one line summary

A pretrained LLM is adapted via LoRA fine-tuning into a content-adaptive compressor that maps long texts to compact variable-length Z-token sequences while preserving reconstruction quality and downstream performance.

References

47 extracted · 47 resolved · 0 Pith anchors

[1] Peters, and Arman Cohan 2020
[2] Token merging: Your ViT but faster, 2023
[3] Hudson, Ehsan Adeli, et al 2022
[4] Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, 2020
[5] Adapting language models to compress contexts, 2023

Cited by

4 papers in Pith

Receipt and verification
First computed: 2026-05-18T03:09:22.531461Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05) · public key
Schema: pith-number/v1.0

Canonical hash

4bb4e949476ca97d91e657f45d03f2d06a74156ef7ae405b0b812d0b1b2d5b9c

Aliases

arxiv: 2603.25340 · arxiv_version: 2603.25340v2 · doi: 10.48550/arxiv.2603.25340 · pith_short_12: JO2OSSKHNSUX · pith_short_16: JO2OSSKHNSUX3EPG · pith_short_8: JO2OSSKH
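The aliases above are related in ways that can be checked locally. The short forms are prefixes of the full 26-character number. For this record, the full number also equals the first 26 characters of the RFC 4648 Base32 encoding of the canonical hash listed on this page. That second relationship is an observation from the values shown here, not a rule the page documents, and the helper names below are hypothetical.

```python
import base64

FULL_NUMBER = "JO2OSSKHNSUX3EPGK72F2A7S2B"
CANONICAL_HASH = "4bb4e949476ca97d91e657f45d03f2d06a74156ef7ae405b0b812d0b1b2d5b9c"


def short_alias(number: str, length: int) -> str:
    """Return the prefix of the full number used as a short alias."""
    return number[:length]


def number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the digest bytes and keep the leading characters.

    Assumption: inferred from this one record, not from a published spec.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


aliases = {n: short_alias(FULL_NUMBER, n) for n in (8, 12, 16)}
print(aliases)
print(number_from_hash(CANONICAL_HASH) == FULL_NUMBER)
```

If the relationship holds in general, a mirror could recompute the number from the canonical hash alone, but that should be confirmed against the schema before relying on it.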
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/JO2OSSKHNSUX3EPGK72F2A7S2B \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 4bb4e949476ca97d91e657f45d03f2d06a74156ef7ae405b0b812d0b1b2d5b9c
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "0cf7210c30b7c99e5301fe6099b8f4977c9a456a51c925172c937b42dfe89f26",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2026-03-26T11:30:44Z",
    "title_canon_sha256": "b81dec45054f97d889dd3e9fc8e6a6c4b10f0b70d449fb36e98d42bc40260256"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2603.25340",
    "kind": "arxiv",
    "version": 2
  }
}
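The shell one-liner under "Verify this Pith Number yourself" can also be written as a small offline Python function. This is a sketch that mirrors the same serialization settings (sorted keys, compact separators, `ensure_ascii=False`, UTF-8) and applies them to the record as printed on this page. The function name `canonical_sha256` is hypothetical. Whether the digest equals the expected value depends on the printed record being byte-for-byte identical to what the API returns, so the script reports the comparison rather than assuming it.

```python
import hashlib
import json

EXPECTED = "4bb4e949476ca97d91e657f45d03f2d06a74156ef7ae405b0b812d0b1b2d5b9c"


def canonical_sha256(record: dict) -> str:
    """Hash a record using the same JSON settings as the page's one-liner."""
    blob = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters as UTF-8
    ).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


# The canonical record as shown on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "0cf7210c30b7c99e5301fe6099b8f4977c9a456a51c925172c937b42dfe89f26",
        "cross_cats_sorted": [],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.CL",
        "submitted_at": "2026-03-26T11:30:44Z",
        "title_canon_sha256": "b81dec45054f97d889dd3e9fc8e6a6c4b10f0b70d449fb36e98d42bc40260256",
    },
    "schema_version": "1.0",
    "source": {"id": "2603.25340", "kind": "arxiv", "version": 2},
}

digest = canonical_sha256(record)
print(digest)
print("matches expected:", digest == EXPECTED)
```

Because keys are sorted before hashing, the order in which the fields are typed or stored does not change the digest, while any change to a value does.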