Syntactically Informed Text Compression with Recurrent Neural Networks

arxiv: 1608.02893 · v2 · pith:NGCH2RH5 · submitted 2016-08-08 · cs.LG · cs.CL · cs.IT · math.IT

David Cox

classification cs.LG cs.CL cs.IT math.IT

keywords neural text compression modeling models networks recurrent system


Abstract

We present a self-contained system for constructing natural language models for use in text compression. Our system improves upon previous neural network based models by utilizing recent advances in syntactic parsing -- Google's SyntaxNet -- to augment character-level recurrent neural networks. RNNs have proven exceptional in modeling sequence data such as text, as their architecture allows for modeling of long-term contextual information.
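
The connection between a language model and a compressor can be made concrete: an arithmetic coder driven by a predictive model spends close to -log2 p bits on a symbol the model assigned probability p, so better predictions mean shorter output. The sketch below only illustrates that accounting. It is not the paper's system: it swaps the recurrent network for a simple adaptive character-count model with add-one smoothing, and it uses no syntactic features. The function name `ideal_code_length_bits` and its parameters `order` and `alphabet_size` are assumptions made for this example.

```python
import math
from collections import defaultdict


def ideal_code_length_bits(text, order=2, alphabet_size=256):
    """Total of -log2 p(char | previous `order` chars) under an adaptive count model.

    This is the length an ideal entropy coder would approach when driven by
    the model; no actual bitstream is produced here.
    """
    counts = defaultdict(lambda: defaultdict(int))  # context -> char -> count
    totals = defaultdict(int)                       # context -> total count
    bits = 0.0
    for i, ch in enumerate(text):
        context = text[max(0, i - order):i]
        # Add-one (Laplace) smoothing keeps every probability nonzero.
        p = (counts[context][ch] + 1) / (totals[context] + alphabet_size)
        bits += -math.log2(p)
        # Update only after "coding" the symbol, so a decoder could
        # maintain exactly the same statistics.
        counts[context][ch] += 1
        totals[context] += 1
    return bits


sample = "the cat sat on the mat. " * 20
bits_per_char = ideal_code_length_bits(sample) / len(sample)
print(f"{bits_per_char:.2f} bits per character (8.00 would be no compression)")
```

A stronger predictor, such as a character-level RNN conditioned on extra context, would take the place of the count table while the bit accounting stays the same.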

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Language Modeling Is Compression
cs.LG 2023-09 accept novelty 6.0

Large language models serve as strong general-purpose lossless compressors for text, images, and audio, outperforming domain-specific methods and revealing insights into scaling, tokenization, and in-context learning.