
arxiv: cs/0205029 · v1 · pith:MMNG3B5Vnew · submitted 2002-05-17 · 💻 cs.DS

A Codebook Generation Algorithm for Document Image Compression

classification 💻 cs.DS
keywords algorithm, compression, problem, approach, codebook, document, finding, heuristics

Pattern-matching-based document-compression systems (e.g. for faxing) rely on finding a small set of patterns that can be used to represent all of the ink in the document. Finding an optimal set of patterns is NP-hard; previous compression schemes have resorted to heuristics. This paper describes an extension of the cross-entropy approach, used previously for measuring pattern similarity, to this problem. This approach reduces the problem to a k-medians problem, for which the paper gives a new algorithm with a provably good performance guarantee. In comparison to previous heuristics (First Fit, with and without generalized Lloyd's/k-means postprocessing steps), the new algorithm generates a better codebook, resulting in an overall improvement in compression performance of almost 17%.
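To make the comparison in the abstract concrete, here is a minimal sketch of a First Fit style codebook builder together with a k-medians style cost for scoring a codebook. It is an illustration under stated assumptions, not the paper's algorithm: patterns are represented as flattened binary tuples, Hamming distance stands in for the cross-entropy measure the paper uses, and the names `hamming`, `first_fit`, `k_medians_cost`, and the `threshold` parameter are all hypothetical.

```python
from typing import List, Sequence, Tuple

# Assumption: each pattern is a flattened binary bitmap of equal length.
Pattern = Tuple[int, ...]


def hamming(a: Pattern, b: Pattern) -> int:
    """Count differing pixels; a stand-in for the paper's cross-entropy measure."""
    return sum(x != y for x, y in zip(a, b))


def first_fit(patterns: Sequence[Pattern], threshold: int) -> List[Pattern]:
    """Scan patterns in order; add a pattern to the codebook only when
    no existing codebook entry is within `threshold` of it."""
    codebook: List[Pattern] = []
    for p in patterns:
        if not any(hamming(p, c) <= threshold for c in codebook):
            codebook.append(p)
    return codebook


def k_medians_cost(patterns: Sequence[Pattern], codebook: Sequence[Pattern]) -> int:
    """Sum, over all patterns, of the distance to the nearest codebook entry.
    A k-medians formulation seeks the k entries that minimize this sum."""
    return sum(min(hamming(p, c) for c in codebook) for p in patterns)


patterns = [(0, 0, 0, 0), (0, 0, 0, 1), (1, 1, 1, 1), (1, 1, 1, 0)]
codebook = first_fit(patterns, threshold=1)
cost = k_medians_cost(patterns, codebook)
```

First Fit depends on the order in which patterns arrive and never revisits an entry once it is added, which is one reason an approach that optimizes the summed cost directly can yield a better codebook.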
