Compress the Context, Keep the Commitments: A Formal Framework for Verifiable LLM Context Compression
1 Pith paper cites this work.
Abstract
LLM context is not just tokens; it is a set of commitments. Long-running conversations accumulate goals, constraints, decisions, preferences, tool results, retrieved evidence, artifacts, and safety boundaries that future responses must preserve. Existing context-management methods reduce length through truncation, retrieval, summarization, memory systems, or token-level prompt compression, but they rarely specify which semantic commitments must survive compression or how their preservation should be measured. We propose Context Codec, a commitment-level framework for compressing prompts and chat histories. Context Codec represents dialogue state as typed, source-grounded semantic atoms annotated with canonical identity, equivalence, conflict, confidence, risk, and evidence spans. It separates five concerns (extraction, normalization, representation, rendering, and verification) and introduces four metrics: Critical Atom Recall, Weighted Atom Recall, Commitment Density, and round-trip recoverability. It also defines a taxonomy of semantic compression errors, a concrete normalization procedure, conservative fallback rules for low-confidence and safety-critical atoms, and Context Compression Language (CCL), an ASCII-first compact rendering of canonical JSON atoms. In a small diagnostic study, CCL-Core occupies a useful middle ground between structured prose and JSON: it is more explicit and auditable than prose, usually more compact than JSON, and less risky than heavily minified notation. The result is not a claim that shorthand solves compression, but a framework for making context compression verifiable: compress the conversation, keep the commitments.
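The abstract names the atom attributes, the recall and density metrics, and the conservative fallback rule, but does not state their formulas here. The sketch below is one plausible reading, not the paper's definitions. Every name in it (`Atom`, `critical_atom_recall`, `weighted_atom_recall`, `commitment_density`, `needs_verbatim_fallback`, the `critical` flag, the scalar `weight`, and the 0.8 confidence threshold) is a hypothetical assumption. It treats recall as matching canonical atom ids recovered from the compressed context and reads Commitment Density as recovered atoms per compressed token.

```python
from dataclasses import dataclass


# Hypothetical atom schema: field names are illustrative assumptions,
# not the paper's canonical JSON schema.
@dataclass(frozen=True)
class Atom:
    atom_id: str             # canonical identity after normalization
    kind: str                # e.g. "goal", "constraint", "decision"
    text: str                # normalized content
    weight: float = 1.0      # assumed scalar importance weight
    critical: bool = False   # e.g. safety boundary or hard constraint
    confidence: float = 1.0  # extractor confidence in [0, 1]


def critical_atom_recall(reference, recovered_ids):
    """Fraction of critical reference atoms whose canonical id was recovered."""
    critical = [a for a in reference if a.critical]
    if not critical:
        return 1.0  # assumption: vacuously perfect when nothing is critical
    hits = sum(1 for a in critical if a.atom_id in recovered_ids)
    return hits / len(critical)


def weighted_atom_recall(reference, recovered_ids):
    """Share of total reference weight carried by recovered atoms."""
    total = sum(a.weight for a in reference)
    if total == 0:
        return 1.0
    kept = sum(a.weight for a in reference if a.atom_id in recovered_ids)
    return kept / total


def commitment_density(reference, recovered_ids, compressed_tokens):
    """Assumed reading: recovered reference atoms per compressed-context token."""
    if compressed_tokens <= 0:
        raise ValueError("compressed_tokens must be positive")
    hits = sum(1 for a in reference if a.atom_id in recovered_ids)
    return hits / compressed_tokens


def needs_verbatim_fallback(atom, min_confidence=0.8):
    """Conservative rule (assumed): keep critical or low-confidence atoms verbatim."""
    return atom.critical or atom.confidence < min_confidence


# Usage with made-up atoms for illustration only.
reference = [
    Atom("c1", "constraint", "never delete prod data", weight=3.0, critical=True),
    Atom("g1", "goal", "migrate db to postgres", weight=2.0),
    Atom("p1", "preference", "prefers short answers", confidence=0.6),
]
recovered = {"c1", "g1"}  # ids recovered from the compressed context

print(critical_atom_recall(reference, recovered))                      # 1.0
print(round(weighted_atom_recall(reference, recovered), 3))            # 0.833
print(commitment_density(reference, recovered, compressed_tokens=40))  # 0.05
print([a.atom_id for a in reference if needs_verbatim_fallback(a)])    # ['c1', 'p1']
```

In this toy case, dropping only the low-weight preference leaves Critical Atom Recall at 1.0 while Weighted Atom Recall falls to 5/6. This is why a weighted metric and a critical-only metric can disagree and are worth reporting separately.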
Citing papers (1): field cs.LG; year 2026; verdict: unverdicted.
SemanticZip: A Pilot Framework for Lossy Text Compression with LLMs as Semantic Decompressors
SemanticZip is a pilot framework for LLM-mediated lossy text compression. Its experimental interface evaluates six representation regimes on five diagnostic cases, measuring semantic atom recovery and token efficiency.