Language Modeling Is Compression

18 Pith papers cite this work. Polarity classification is still indexing.

abstract

It has long been established that predictive models can be transformed into lossless compressors and vice versa. Incidentally, in recent years, the machine learning community has focused on training increasingly large and powerful self-supervised (language) models. Since these large language models exhibit impressive predictive capabilities, they are well-positioned to be strong compressors. In this work, we advocate for viewing the prediction problem through the lens of compression and evaluate the compression capabilities of large (foundation) models. We show that large language models are powerful general-purpose predictors and that the compression viewpoint provides novel insights into scaling laws, tokenization, and in-context learning. For example, Chinchilla 70B, while trained primarily on text, compresses ImageNet patches to 43.4% and LibriSpeech samples to 16.4% of their raw size, beating domain-specific compressors like PNG (58.5%) or FLAC (30.3%), respectively. Finally, we show that the prediction-compression equivalence allows us to use any compressor (like gzip) to build a conditional generative model.
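The prediction-compression equivalence mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the paper's method: the adaptive byte-frequency predictor, the function names `ideal_code_length_bits` and `next_byte_by_compressor`, and the use of `zlib` (DEFLATE, the algorithm underlying gzip) are all assumptions chosen for illustration. The first function sums -log2 p over the bytes of a sequence, which is the code length an ideal arithmetic coder would approach when driven by that predictor. The second goes the other way and treats a compressor as a predictor by picking the continuation that compresses best.

```python
import math
import zlib
from collections import Counter


def ideal_code_length_bits(data: bytes) -> float:
    """Code length, in bits, implied by a simple adaptive byte predictor.

    Hypothetical toy model: each byte is predicted from the counts of the
    bytes seen so far, with add-one smoothing over 256 symbols. An arithmetic
    coder driven by this predictor would approach this total length.
    """
    counts = Counter()
    total_bits = 0.0
    for seen, byte in enumerate(data):
        prob = (counts[byte] + 1) / (seen + 256)
        total_bits += -math.log2(prob)
        counts[byte] += 1
    return total_bits


def next_byte_by_compressor(context: bytes, alphabet: bytes) -> int:
    """Use a compressor as a predictor (illustrative assumption).

    Returns the candidate byte that yields the shortest compressed
    output when appended to the context. Ties go to the first candidate.
    """
    return min(
        alphabet,
        key=lambda cand: len(zlib.compress(context + bytes([cand]), 9)),
    )


if __name__ == "__main__":
    sample = b"abcabcabcabcabcabcabcabc"
    bits = ideal_code_length_bits(sample)
    print(f"raw size: {8 * len(sample)} bits, model code length: {bits:.1f} bits")
    print("greedy next byte:", chr(next_byte_by_compressor(sample, b"abc")))
```

A stronger predictor assigns higher probability to the bytes that actually occur, which lowers the summed code length. Under this view, a better language model is a better compressor.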

citation-role summary

background: 3 · other: 1

citation-polarity summary

background: 3 · unclear: 1

representative citing papers

Are Flat Minima an Illusion?

cs.LG · 2026-03-24 · unverdicted · novelty 8.0

Flat minima are illusory; generalization is driven by weakness, a reparameterization-invariant measure of compatible completions that predicts performance better than sharpness on MNIST and Fashion-MNIST.

Efficient compression of neural networks and datasets

cs.LG · 2025-05-23 · unverdicted · novelty 5.0

Refined probabilistic and smooth l0 pruning techniques approximate minimum description length for neural networks, achieving high compression with minimal accuracy loss and empirically verifying better sample efficiency and generalization on image and text tasks.

The Rhetoric of Machine Learning

cs.LG · 2026-04-08 · unverdicted · novelty 4.0

Machine learning is inherently rhetorical and is often deployed as 'manipulation as a service' in business models.
