SemanticZip: A Pilot Framework for Lossy Text Compression with LLMs as Semantic Decompressors

Natalia Trukhina; Vadim Vashkelis

arXiv: 2605.24541 · v1 · pith:3HNIKVINnew · submitted 2026-05-23 · 💻 cs.LG · cs.AI · cs.CL · cs.IR

Pith reviewed 2026-06-30 14:41 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · cs.CL · cs.IR

keywords: semantic compression, lossy compression, LLM decompression, text codecs, semantic atoms, protected packets, recoverability, token gain

The pith

SemanticZip introduces lossy text compression where LLMs decompress compact codes to recover task-relevant semantic commitments while protecting exact parts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents SemanticZip as a pilot framework for compressing text into compact codes that an LLM can later expand back into meaningful content for a given task. Unlike standard compression or summarization, it evaluates recovery of semantic atoms using an independent decoder LLM and distinguishes between protected exact text and lossy semantic packets. Six representation regimes are tested on five diagnostic cases, showing varying levels of recall and token savings, with structured prose performing best on recovery and SemanticZip ASCII on compression. The authors emphasize that this is an experimental interface rather than a performance benchmark, highlighting a principle for deciding what to compress.

Core claim

The central claim is that LLM-mediated decompression can be formalized using a protected/lossy packet architecture, allowing evaluation of recoverability across representation regimes, and that the viable approach is to keep safety-critical commitments protected while semantically zipping predictable low-risk context.

What carries the argument

The protected/lossy packet architecture that separates exact safety-critical text from compressible semantic codes decompressed by an LLM.
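
The split described above can be pictured with a small data structure. The following is only a minimal sketch, not the authors' implementation: the class names `ProtectedPacket` and `LossyPacket`, the `decompress` function, the `expand` callback that stands in for the decoder LLM, and the toy codebook are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Union


@dataclass(frozen=True)
class ProtectedPacket:
    """Exact text that must survive the codec unchanged (hypothetical type)."""
    text: str


@dataclass(frozen=True)
class LossyPacket:
    """Compact code that a decoder model expands into meaning (hypothetical type)."""
    code: str


Packet = Union[ProtectedPacket, LossyPacket]


def decompress(packets: List[Packet], expand: Callable[[str], str]) -> str:
    """Rebuild text: copy protected packets verbatim, expand lossy codes.

    `expand` is a placeholder for the decoder LLM call (assumed interface).
    """
    parts = []
    for packet in packets:
        if isinstance(packet, ProtectedPacket):
            parts.append(packet.text)          # never routed through the model
        else:
            parts.append(expand(packet.code))  # lossy, model-mediated expansion
    return " ".join(parts)


# Toy stand-in for the decoder LLM: a fixed lookup table (assumption).
toy_codebook = {"CTX:mtg/wk": "The team meets weekly to review progress."}

packets = [
    LossyPacket("CTX:mtg/wk"),
    ProtectedPacket("Do not exceed 40 mg per dose."),
]
restored = decompress(packets, lambda code: toy_codebook.get(code, code))
print(restored)
```

The point of the sketch is the routing: protected text bypasses the model entirely, so a decoder error can only affect the lossy packets.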

If this is right

Task-relevant semantic commitments can be recovered with weighted atom recall above 0.8 in several tested regimes.
Token reductions of 19% to 46% are achievable depending on the representation chosen.
Safety-critical text remains unchanged while low-risk context is replaced by compact codes.
The framework provides metrics like Critical Atom Recall and tokenizer gain for comparing different codes.
Different formats such as prose, JSON, and emoji yield distinct trade-offs between compression and fidelity.
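
The metrics named above are not defined on this page, so the following is a sketch under stated assumptions rather than the paper's formulas: recall is taken over a reference set of atoms, Weighted Atom Recall weights each atom by an importance score, Critical Atom Recall restricts recall to atoms flagged critical, and token gain is the fractional reduction in token count. The `Atom` schema, the weights, and the example atoms are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Atom:
    """A typed semantic atom in the reference set (hypothetical schema)."""
    key: str
    weight: float = 1.0
    critical: bool = False


def weighted_atom_recall(reference: List[Atom], recovered: Set[str]) -> float:
    """Share of total reference weight that the decoder recovered (assumed definition)."""
    total = sum(a.weight for a in reference)
    hit = sum(a.weight for a in reference if a.key in recovered)
    return hit / total if total else 0.0


def critical_atom_recall(reference: List[Atom], recovered: Set[str]) -> float:
    """Unweighted recall restricted to atoms flagged critical (assumed definition)."""
    critical = [a for a in reference if a.critical]
    if not critical:
        return 1.0  # assumption: nothing critical to lose
    return sum(a.key in recovered for a in critical) / len(critical)


def precision(reference: List[Atom], recovered: Set[str]) -> float:
    """Share of recovered atoms that appear in the reference set."""
    if not recovered:
        return 0.0
    ref_keys = {a.key for a in reference}
    return len(recovered & ref_keys) / len(recovered)


def token_gain(original_tokens: int, compressed_tokens: int) -> float:
    """Fractional token reduction, e.g. 0.4 means 40% fewer tokens (assumed definition)."""
    return 1.0 - compressed_tokens / original_tokens


# Hypothetical reference atoms and decoder output for one diagnostic case.
reference = [
    Atom("dose_limit", weight=3.0, critical=True),
    Atom("patient_id", weight=2.0, critical=True),
    Atom("meeting_cadence", weight=1.0),
]
recovered = {"dose_limit", "meeting_cadence", "invented_detail"}

war = weighted_atom_recall(reference, recovered)
car = critical_atom_recall(reference, recovered)
prec = precision(reference, recovered)
gain = token_gain(original_tokens=200, compressed_tokens=120)
print(f"WAR={war:.3f} CAR={car:.3f} precision={prec:.3f} gain={gain:.1%}")
```

Separating the weighted and critical variants makes it visible when a code scores well on average while still dropping a safety-critical atom.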

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Applying the protected/lossy split to real documents like reports or logs could reduce storage while maintaining key facts.
Future tests on diverse, non-author-constructed texts would check if the observed recoverability holds more broadly.
The method might integrate with retrieval systems where only semantic summaries are stored for context.
Risk assessment models could automatically decide which parts of text to protect versus zip.

Load-bearing premise

The five author-constructed diagnostic cases are representative enough to evaluate recoverability of task-relevant semantic commitments across the six representation regimes.

What would settle it

If evaluations on a broader set of independently created documents show that no representation regime achieves weighted atom recall above 0.7 while providing meaningful token savings, the practical utility of the framework would be called into question.

Figures

Figures reproduced from arXiv: 2605.24541 by Natalia Trukhina, Vadim Vashkelis.

Original abstract

Text compression for large language model (LLM) systems is usually framed as token deletion, retrieval, summarization, or exact reconstruction. We study a more aggressive but explicitly lossy setting: compress text into compact codes that an LLM can expand into task-relevant meaning. We call this setting SemanticZip. Unlike lossless compression, SemanticZip does not require byte-identical reconstruction; unlike ordinary summarization, it treats model-based decompression as part of the codec and evaluates whether task-relevant semantic commitments are recovered. This paper is a pilot framework, not a benchmark claim. We formalize LLM-mediated decompression, define a protected/lossy packet architecture, and evaluate six representation regimes over five author-constructed diagnostic cases: structured prose, JSON, CCL-Core, CCL-Min, SemanticZip ASCII, and SemanticZip emoji. An independent decoder LLM reconstructs typed semantic atoms from each compressed representation, and we score Critical Atom Recall, Weighted Atom Recall, precision, and tokenizer gain. In this pilot, structured prose has the highest recoverability, with WAR = 0.956 and 19.1% o200k_base token gain. CCL-Min is the strongest balanced point, with 39.4% token gain and WAR = 0.874. SemanticZip ASCII provides the largest useful compression, with 46.5% token gain and WAR = 0.802, while emoji-heavy SemanticZip performs worse on both compression and recovery. The main contribution is not the claim that these numbers establish a universal frontier. Rather, we introduce a reproducible experimental interface for studying lossy, LLM-decompressible text codes and a design principle: safety-critical and exact commitments should remain protected, while predictable low-risk context may be semantically zipped.
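
The trade-off reported in the abstract can be read as a simple selection problem: fix a minimum acceptable WAR, then take the regime with the largest token gain that still meets it. The sketch below uses only the three (WAR, token gain) pairs stated in the abstract; the emoji, JSON, and CCL-Core figures are not given there and are left out. The function name `best_regime` and the idea of a WAR floor are illustrative assumptions, not part of the paper.

```python
from typing import Dict, Optional

# Pilot numbers as stated in the abstract (o200k_base token gain, WAR).
PILOT_RESULTS: Dict[str, Dict[str, float]] = {
    "structured prose": {"war": 0.956, "token_gain": 0.191},
    "CCL-Min": {"war": 0.874, "token_gain": 0.394},
    "SemanticZip ASCII": {"war": 0.802, "token_gain": 0.465},
}


def best_regime(results: Dict[str, Dict[str, float]], min_war: float) -> Optional[str]:
    """Return the regime with the largest token gain whose WAR meets the floor.

    Returns None when no regime satisfies the floor (hypothetical helper).
    """
    eligible = {name: r for name, r in results.items() if r["war"] >= min_war}
    if not eligible:
        return None
    return max(eligible, key=lambda name: eligible[name]["token_gain"])


for floor in (0.95, 0.85, 0.80, 0.99):
    print(f"WAR floor {floor:.2f}: {best_regime(PILOT_RESULTS, floor)}")
```

With only five author-constructed cases behind these numbers, the selection is illustrative; it shows how the reported points order themselves, not which regime to deploy.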

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

SemanticZip defines a lossy LLM-decompression setting with a protected/lossy packet split and runs it on five diagnostic cases as an explicit pilot.

The paper's main move is to treat LLM-based expansion as part of the codec itself, scoring recovery on typed semantic atoms rather than exact text or summary quality. They formalize a protected/lossy packet split so critical commitments stay untouched while low-risk context can be compressed, then compare six regimes (structured prose, JSON, CCL variants, SemanticZip ASCII and emoji) on five author-built cases using Critical Atom Recall, Weighted Atom Recall, precision, and token gain.

The interface looks reproducible from the description, and the results are reported plainly: structured prose recovers best (WAR 0.956), CCL-Min balances gain and recall, and ASCII SemanticZip reaches 46.5% token gain at WAR 0.802. The design principle about keeping safety-critical parts protected is stated directly and matches the evaluation setup.

The limitation is exactly the one they name: everything rests on five hand-constructed diagnostic cases with no external baselines, error bars, or statistical tests. That makes the reported tradeoffs illustrative rather than general. The pilot framing prevents this from becoming a load-bearing flaw, but it does mean the numbers cannot yet support broader claims about real distributions.

This is for people already working on LLM input efficiency or semantic compression who want a starting experimental scaffold. It deserves peer review because the framing is distinct and the authors have scoped the claims to match the evidence they actually provide.

Referee Report

0 major / 2 minor

Summary. The paper claims to introduce SemanticZip, a pilot framework for lossy text compression where LLMs serve as semantic decompressors. It formalizes the decompression process, defines a protected/lossy packet architecture, and evaluates six representation regimes (structured prose, JSON, CCL-Core, CCL-Min, SemanticZip ASCII, SemanticZip emoji) across five author-constructed diagnostic cases. Metrics reported include Critical Atom Recall, Weighted Atom Recall (WAR), precision, and tokenizer gain. Structured prose achieves the highest WAR of 0.956 with 19.1% token gain, while SemanticZip ASCII offers 46.5% gain with WAR 0.802. The primary contribution is positioned as a reproducible experimental interface and the design principle that safety-critical commitments should be protected while low-risk context can be semantically zipped.

Significance. If the framework holds, it offers a novel approach to studying lossy compression tailored for LLM systems by treating decompression as part of the codec. This could be significant for developing efficient context management strategies in LLM applications. The work explicitly frames itself as a pilot without claiming generalizability, which strengthens its position. Credit is given for providing a reproducible experimental interface and a clear design principle separating protected and lossy packets. The evaluation on diagnostic cases demonstrates the concept without overclaiming.

minor comments (2)

[Abstract] The notation '19.1% o200k_base token gain' is unclear on first read; state explicitly that o200k_base is a tokenizer encoding (the tiktoken encoding used by GPT-4o-family models) and that the gain is measured in its tokens.
[Main body] Expand acronyms such as CCL-Core and CCL-Min on first use, as they are not defined in the abstract.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment of the work as a pilot framework, for crediting the reproducible experimental interface and the protected/lossy packet design principle, and for recommending minor revision. No specific major comments were listed in the report.

Circularity Check

0 steps flagged

No significant circularity

The paper is a pilot framework that introduces a reproducible experimental interface and a protected/lossy packet design principle. It evaluates six representation regimes on five author-constructed diagnostic cases using explicitly defined metrics (Critical Atom Recall, WAR, etc.) without equations, derivations, fitted-parameter predictions, or self-citations. The contribution is scoped to the interface and principle rather than any generalization claim, so no load-bearing step reduces to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

The paper is a pilot framework that introduces new terminology and an experimental setup without mathematical derivations, fitted parameters, or standard axioms beyond the basic experimental design.

invented entities (1)

protected/lossy packet architecture · no independent evidence
purpose: To separate safety-critical exact commitments from compressible low-risk context in the compression scheme
Newly defined as part of the framework with no independent evidence or external validation provided in the abstract.

pith-pipeline@v0.9.1-grok · 5860 in / 1223 out tokens · 66427 ms · 2026-06-30T14:41:50.914290+00:00 · methodology

Reference graph

Works this paper leans on

15 extracted references · 6 canonical work pages · 4 internal anchors

[1] Henry Gilbert, Michael Sandborn, Douglas C. Schmidt, Jesse Spencer-Smith, and Jules White. Semantic compression with large language models. arXiv preprint arXiv:2304.12512, 2023. https://arxiv.org/abs/2304.12512

[2] Grégoire Delétang, Anian Ruoss, Paul-Ambroise Duquenne, Elliot Catt, Tim Genewein, Christopher Mattern, Jordi Grau-Moya, Li Kevin Wenliang, Matthew Aitchison, Laurent Orseau, Marcus Hutter, and Joel Veness. Language modeling is compression. arXiv preprint arXiv:2309.10668, 2023. https://arxiv.org/abs/2309.10668

[3] Huiqiang Jiang, Qianhui Wu, Chin-Yew Lin, Yuqing Yang, and Lili Qiu. LLMLingua: Compressing prompts for accelerated inference of large language models. In Proceedings of EMNLP, 2023. https://aclanthology.org/2023.emnlp-main.825/

[4] Huiqiang Jiang, Qianhui Wu, Xufang Luo, Dongsheng Li, Chin-Yew Lin, Yuqing Yang, and Lili Qiu. LongLLMLingua: Accelerating and enhancing LLMs in long context scenarios via prompt compression. In Proceedings of ACL, 2024. https://arxiv.org/abs/2310.06839

[5] Yucheng Li. Selective Context: Compress your input to ChatGPT or other LLMs. 2023. https://github.com/liyucheng09/Selective_Context

[6] Nelson F. Liu, Kevin Lin, John Hewitt, Ashwin Paranjape, Michele Bevilacqua, Fabio Petroni, and Percy Liang. Lost in the middle: How language models use long contexts. Transactions of the Association for Computational Linguistics, 12:157–173, 2024. https://aclanthology.org/2024.tacl-1.9/

[7] Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, and Douwe Kiela. Retrieval-augmented generation for knowledge-intensive NLP tasks. In Advances in Neural Information Processing Systems, 2020. https://arxiv.org/abs/2005.11401

[8] Charles Packer, Vivian Fang, Shishir G. Patil, Kevin Lin, Sarah Wooders, and Joseph E. Gonzalez. MemGPT: Towards LLMs as operating systems. arXiv preprint arXiv:2310.08560, 2023. https://arxiv.org/abs/2310.08560

[9] Natalia Trukhina and Vadim Vashkelis. Compress the context, keep the commitments: A formal framework for verifiable LLM context compression. arXiv preprint arXiv:2605.17304, 2026. https://arxiv.org/abs/2605.17304

[10] JSON Schema Organization. JSON Schema validation: A vocabulary for structural validation of JSON. Draft 2020-12, 2020. https://json-schema.org/draft/2020-12/json-schema-validation

[11] Paul Bryan and Mark Nottingham. JavaScript Object Notation (JSON) Patch. RFC 6902, Internet Engineering Task Force, 2013. https://datatracker.ietf.org/doc/html/rfc6902

[12] Scikit-learn developers. Tuning the decision threshold for class prediction. scikit-learn User Guide. https://scikit-learn.org/stable/modules/classification_threshold.html

[13] Jesse Dodge, Suchin Gururangan, Dallas Card, Roy Schwartz, and Noah A. Smith. Show your work: Improved reporting of experimental results. EMNLP-IJCNLP, 2019. See also NLP Reproducibility Checklist. https://www.jessedodge.ai/NLP_Reproducibility_Checklist_V1.2.pdf

[14] Ian Magnusson, Akshita Bhagia, Valentin Hofmann, Luca Soldaini, Oyvind Tafjord, Peter West, Kyle Lo, Dirk Groeneveld, Kyle Richardson, Ashish Sabharwal, Iz Beltagy, and Jesse Dodge. Reproducibility in NLP: What have we learned from the checklist? Findings of ACL, 2023. https://aclanthology.org/2023.findings-acl.809/

[15] Marco Tulio Ribeiro, Tongshuang Wu, Carlos Guestrin, and Sameer Singh. Beyond accuracy: Behavioral testing of NLP models with CheckList. ACL, 2020. https://aclanthology.org/2020.acl-main.442/
