neal young / Zhang97Codebook
-
Pattern-matching-based document-compression systems (e.g. for
faxing) rely on finding a small set of patterns that can be
used to represent all of the ink in the document. Finding an
optimal set of patterns is NP-hard; previous compression
schemes have resorted to heuristics. This paper describes an
extension of the cross-entropy approach, used previously for
measuring pattern similarity, to this problem. This approach
reduces the problem to a k-medians problem, for which the
paper gives a new algorithm with a provably good performance
guarantee. In comparison to previous heuristics (First Fit,
with and without generalized Lloyd's/k-means post-processing
steps), the new algorithm generates a better codebook,
resulting in an overall improvement in compression performance
of almost 17%.
© Copyrights are reserved by the publishers.
Download for personal and limited academic use only.