Syntactically Informed Text Compression with Recurrent Neural Networks
We present a self-contained system for constructing natural language models for use in text compression. Our system improves upon previous neural-network-based models by using a recent advance in syntactic parsing, Google's SyntaxNet, to augment character-level recurrent neural networks (RNNs). RNNs have proven exceptional at modeling sequence data such as text, as their architecture allows for modeling of long-term contextual information.
Forward citations
Cited by 1 Pith paper
- Language Modeling Is Compression
Large language models serve as strong general-purpose lossless compressors for text, images, and audio, outperforming domain-specific methods and revealing insights into scaling, tokenization, and in-context learning.