On Lossless Universal Compression of Distributed Identical Sources

Ahmad Beirami; Faramarz Fekri

read the original abstract

Slepian-Wolf theorem is a well-known framework that targets almost lossless compression of (two) data streams with symbol-by-symbol correlation between the outputs of (two) distributed sources. However, this paper considers a different scenario which does not fit in the Slepian-Wolf framework. We consider two identical but spatially separated sources. We wish to study the universal compression of a sequence of length $n$ from one of the sources provided that the decoder has access to (i.e., memorized) a sequence of length $m$ from the other source. Such a scenario occurs, for example, in the universal compression of data from multiple mirrors of the same server. In this setup, the correlation does not arise from symbol-by-symbol dependency of two outputs from the two sources. Instead, the sequences are correlated through the information that they contain about the unknown source parameter. We show that the finite-length nature of the compression problem at hand requires considering a notion of almost lossless source coding, where coding incurs an error probability $p_e(n)$ that vanishes with sequence length $n$. We obtain a lower bound on the average minimax redundancy of almost lossless codes as a function of the sequence length $n$ and the permissible error probability $p_e$ when the decoder has a memory of length $m$ and the encoders do not communicate. Our results demonstrate that a strict performance loss is incurred when the two encoders do not communicate even when the decoder knows the unknown parameter vector (i.e., $m \to \infty$).

On Lossless Universal Compression of Distributed Identical Sources

discussion (0)