
arXiv: 1811.08162 · v1 · pith:6KBSGSKAnew · submitted 2018-11-20 · 💻 cs.CL · eess.SP · q-bio.GN

DeepZip: Lossless Data Compression using Recurrent Neural Networks

classification: 💻 cs.CL · eess.SP · q-bio.GN
keywords: data, neural, compression, datasets, networks, compress, genomic, learn

Sequential data is being generated at an unprecedented pace in various forms, including text and genomic data. This creates the need for efficient compression mechanisms to enable better storage, transmission and processing of such data. To solve this problem, many existing compressors learn models for the data and perform prediction-based compression. Since neural networks are known to be universal function approximators capable of learning arbitrarily complex mappings, and in practice perform excellently on prediction tasks, we explore and devise methods to compress sequential data using neural network predictors. We combine recurrent neural network predictors with an arithmetic coder and losslessly compress a variety of synthetic, text and genomic datasets. The proposed compressor outperforms Gzip on the real datasets and achieves near-optimal compression for the synthetic datasets. The results also help explain why, and in which settings, neural networks are good alternatives to traditional finite-context models.
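The abstract's core mechanism is that a predictor assigns a probability to each possible next symbol and an arithmetic coder turns those probabilities into bits, spending roughly -log2 p bits on a symbol of probability p. The sketch below illustrates that pipeline under clearly stated assumptions: it swaps the RNN for a simple adaptive order-k counting model (the kind of finite-context model the abstract compares against), uses exact `Fraction` arithmetic instead of the finite-precision integer coder a practical implementation needs, and every name in it (`AdaptiveContextModel`, `encode`, `decode`) is hypothetical, not DeepZip's code. Decoding is lossless because encoder and decoder update identical models from identical histories.

```python
from collections import defaultdict
from fractions import Fraction
from math import ceil


class AdaptiveContextModel:
    """Hypothetical order-k counting model with add-one smoothing.

    Stands in for the RNN predictor: maps the last k symbols to a
    probability distribution over the next symbol.
    """

    def __init__(self, alphabet, k=2):
        self.alphabet = list(alphabet)
        self.k = k
        # Every context starts with a count of 1 per symbol (add-one smoothing).
        self.counts = defaultdict(lambda: {s: 1 for s in self.alphabet})

    def _context(self, history):
        return tuple(history[-self.k:]) if self.k else ()

    def probs(self, history):
        c = self.counts[self._context(history)]
        total = sum(c.values())
        return {s: Fraction(c[s], total) for s in self.alphabet}

    def update(self, history, symbol):
        self.counts[self._context(history)][symbol] += 1


def encode(seq, alphabet, k=2):
    """Narrow [low, low + width) once per symbol, then emit a short binary fraction."""
    model = AdaptiveContextModel(alphabet, k)
    low, width = Fraction(0), Fraction(1)
    history = []
    for sym in seq:
        p = model.probs(history)
        cum = Fraction(0)  # probability mass of symbols ordered before sym
        for s in model.alphabet:
            if s == sym:
                break
            cum += p[s]
        low += width * cum
        width *= p[sym]
        model.update(history, sym)
        history.append(sym)
    # Find the fewest bits whose dyadic value code / 2**nbits lies in the final interval.
    nbits = 1
    while True:
        scale = 2 ** nbits
        code = ceil(low * scale)
        if Fraction(code, scale) < low + width:
            return code, nbits
        nbits += 1


def decode(code, nbits, n, alphabet, k=2):
    """Replay the same model and pick the sub-interval containing the code value."""
    model = AdaptiveContextModel(alphabet, k)
    value = Fraction(code, 2 ** nbits)
    low, width = Fraction(0), Fraction(1)
    history = []
    for _ in range(n):
        p = model.probs(history)
        cum = Fraction(0)
        for s in model.alphabet:
            if low + width * (cum + p[s]) > value:
                sym = s
                break
            cum += p[s]
        low += width * cum
        width *= p[sym]
        model.update(history, sym)
        history.append(sym)
    return history


if __name__ == "__main__":
    text = "abracadabra" * 20
    alphabet = sorted(set(text))
    code, nbits = encode(text, alphabet, k=2)
    restored = "".join(decode(code, nbits, len(text), alphabet, k=2))
    assert restored == text
    print(f"{len(text) * 8} raw bits -> {nbits} coded bits")
```

In principle, replacing `probs` with the softmax output of a recurrent network conditioned on the same history would give the neural variant the abstract describes; the coder itself would not change. The better the predictor, the wider the final interval and the fewer bits the coder needs.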



Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. The 2026 Algorithmic Information Theory Data Compression Challenge

    cs.IT 2026-06 unverdicted novelty 6.0

    The paper introduces and reports results from a new benchmark challenge for general-purpose lossless data compressors using public training and hidden test sets.

  2. TextEconomizer: Enhancing Lossy Text Compression with Denoising Transformers and Entropy Coding

    cs.CL 2026-06 unverdicted novelty 3.0

    Presents TextEconomizer, a transformer-based encoder-decoder for lossy text compression, claiming a 5.39x compression ratio, near-perfect semantic quality on standard metrics, and 153x fewer parameters than comparable models.