Large language models serve as strong general-purpose lossless compressors for text, images, and audio, outperforming domain-specific methods and revealing insights into scaling, tokenization, and in-context learning.
Syntactically Informed Text Compression with Recurrent Neural Networks
1 Pith paper cite this work. Polarity classification is still indexing.
abstract
We present a self-contained system for constructing natural language models for use in text compression. Our system improves upon previous neural network based models by utilizing recent advances in syntactic parsing -- Google's SyntaxNet -- to augment character-level recurrent neural networks. RNNs have proven exceptional in modeling sequence data such as text, as their architecture allows for modeling of long-term contextual information.
fields
cs.LG 1years
2023 1verdicts
ACCEPT 1representative citing papers
citing papers explorer
-
Language Modeling Is Compression
Large language models serve as strong general-purpose lossless compressors for text, images, and audio, outperforming domain-specific methods and revealing insights into scaling, tokenization, and in-context learning.