pith. sign in

arxiv: 1611.01479 · v1 · pith:5WSDFDVCnew · submitted 2016-11-04 · 💻 cs.DS

Space-Efficient Re-Pair Compression

classification 💻 cs.DS
keywords spacetextwordscompressionepsilonexpectedre-pairsqrt
0
0 comments X
read the original abstract

Re-Pair is an effective grammar-based compression scheme achieving strong compression rates in practice. Let $n$, $\sigma$, and $d$ be the text length, alphabet size, and dictionary size of the final grammar, respectively. In their original paper, the authors show how to compute the Re-Pair grammar in expected linear time and $5n + 4\sigma^2 + 4d + \sqrt{n}$ words of working space on top of the text. In this work, we propose two algorithms improving on the space of their original solution. Our model assumes a memory word of $\lceil\log_2 n\rceil$ bits and a re-writable input text composed by $n$ such words. Our first algorithm runs in expected $\mathcal O(n/\epsilon)$ time and uses $(1+\epsilon)n +\sqrt n$ words of space on top of the text for any parameter $0<\epsilon \leq 1$ chosen in advance. Our second algorithm runs in expected $\mathcal O(n\log n)$ time and improves the space to $n +\sqrt n$ words.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.