Probabilistic Language Tries: A Unified Framework for Compression, Decision Policies, and Execution Reuse

· 2026 · cs.LG · arXiv 2604.06228

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

open full Pith review browse 2 citing papers arXiv PDF

abstract

We introduce probabilistic language tries (PLTs), a unified representation that makes explicit the prefix structure implicitly defined by any generative model over sequences. By assigning to each outgoing edge the conditional probability of the corresponding token or action, a PLT simultaneously serves as: (i) an optimal lossless compressor via frequency-weighted interval encoding, generalizing arithmetic coding to model-conditioned distributions; (ii) a policy representation for sequential decision problems including games, search, and robotic control; and (iii) a memoization index that lets repeated inference queries be answered by structured retrieval rather than full model execution. The central technical result is a prior-guided caching theorem: under a stationary generative distribution, a PLT-guided cache achieves strictly lower expected inference cost than any empirical-frequency cache for all query counts below a threshold that grows with the concentration of the prior. This converts O(n^2) transformer attention cost into an expected cost of p_r * O(log N) + (1 - p_r) * O(n^2), where p_r is the prior-estimated reuse probability and N is the artifact store size. We further introduce a hybrid compression architecture decomposing any dataset into a PLT-covered majority and a sparse residual store, connecting arithmetic coding with Kolmogorov-style program representations and rate-distortion theory. We instantiate the framework across chess, web search, robotics, organizational workflows, and LLM inference, demonstrating that compression, decision making, and computational reuse are all derived from a single probability measure on sequence space.

representative citing papers

Sequential KV Cache Compression via Probabilistic Language Tries: Beyond the Per-Vector Shannon Limit

cs.LG · 2026-04-10 · unverdicted · novelty 7.0

Sequential KV compression via probabilistic language tries and predictive delta coding achieves 3.3-4.3 bits per token entropy, yielding up to 914x better ratios than TurboQuant even with large overhead.

LAWS: Learning from Actual Workloads Symbolically -- A Self-Certifying Parametrized Cache Architecture for Neural Inference, Robotics, and Edge Deployment

cs.LG · 2026-04-12 · unverdicted · novelty 5.0

LAWS is a self-certifying parametrized cache that generalizes mixture-of-experts and KV caching with uniform error bounds based on Lipschitz constants and embedding diameters.

citing papers explorer

Showing 2 of 2 citing papers.

Sequential KV Cache Compression via Probabilistic Language Tries: Beyond the Per-Vector Shannon Limit cs.LG · 2026-04-10 · unverdicted · none · ref 9 · internal anchor
Sequential KV compression via probabilistic language tries and predictive delta coding achieves 3.3-4.3 bits per token entropy, yielding up to 914x better ratios than TurboQuant even with large overhead.
LAWS: Learning from Actual Workloads Symbolically -- A Self-Certifying Parametrized Cache Architecture for Neural Inference, Robotics, and Edge Deployment cs.LG · 2026-04-12 · unverdicted · none · ref 15 · internal anchor
LAWS is a self-certifying parametrized cache that generalizes mixture-of-experts and KV caching with uniform error bounds based on Lipschitz constants and embedding diameters.

Probabilistic Language Tries: A Unified Framework for Compression, Decision Policies, and Execution Reuse

fields

years

verdicts

representative citing papers

citing papers explorer