pith. sign in

arxiv: 1206.4245 · v1 · pith:NID2LNJ2new · submitted 2012-06-19 · 💻 cs.IT · math.IT

On Lossless Universal Compression of Distributed Identical Sources

classification 💻 cs.IT math.IT
keywords compressionlengthsourceslosslesssequencealmostdecodersource
0
0 comments X
read the original abstract

Slepian-Wolf theorem is a well-known framework that targets almost lossless compression of (two) data streams with symbol-by-symbol correlation between the outputs of (two) distributed sources. However, this paper considers a different scenario which does not fit in the Slepian-Wolf framework. We consider two identical but spatially separated sources. We wish to study the universal compression of a sequence of length $n$ from one of the sources provided that the decoder has access to (i.e., memorized) a sequence of length $m$ from the other source. Such a scenario occurs, for example, in the universal compression of data from multiple mirrors of the same server. In this setup, the correlation does not arise from symbol-by-symbol dependency of two outputs from the two sources. Instead, the sequences are correlated through the information that they contain about the unknown source parameter. We show that the finite-length nature of the compression problem at hand requires considering a notion of almost lossless source coding, where coding incurs an error probability $p_e(n)$ that vanishes with sequence length $n$. We obtain a lower bound on the average minimax redundancy of almost lossless codes as a function of the sequence length $n$ and the permissible error probability $p_e$ when the decoder has a memory of length $m$ and the encoders do not communicate. Our results demonstrate that a strict performance loss is incurred when the two encoders do not communicate even when the decoder knows the unknown parameter vector (i.e., $m \to \infty$).

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.