pith. machine review for the scientific record.

arxiv: 0809.2553 · v1 · submitted 2008-09-15 · 💻 cs.IR · cs.AI

Recognition: unknown

Normalized Information Distance

Authors on Pith: no claims yet
classification 💻 cs.IR cs.AI
keywords: distance, information, normalized, clustering, complexity, Kolmogorov, machine, objects
read the original abstract

The normalized information distance is a universal distance measure for objects of all kinds. It is based on Kolmogorov complexity and thus uncomputable, but there are ways to utilize it. First, compression algorithms can be used to approximate the Kolmogorov complexity if the objects have a string representation. Second, for names and abstract concepts, page count statistics from the World Wide Web can be used. These practical realizations of the normalized information distance can then be applied to machine learning tasks, especially clustering, to perform feature-free and parameter-free data mining. This chapter discusses the theoretical foundations of the normalized information distance and both practical realizations. It presents numerous examples of successful real-world applications based on these distance measures, ranging from bioinformatics to music clustering to machine translation.
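For orientation, the normalized information distance is usually written NID(x, y) = max{K(x|y), K(y|x)} / max{K(x), K(y)}, where K is Kolmogorov complexity. The two practical realizations mentioned above replace K with something computable: a real compressor's output length C (the normalized compression distance, NCD) or web page counts (the formula known as the normalized Google distance, NGD). The sketch below is illustrative only. It assumes zlib as the compressor, and the page counts and index size passed to `ngd` are hypothetical inputs supplied by the caller; no search API is queried, and the chapter's own experiments may use different compressors and data sources.

```python
import math
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of zlib-compressed data; a computable stand-in for K(x).

    Assumption: zlib at level 9 is the compressor. Any real compressor
    gives an upper-bound approximation of Kolmogorov complexity.
    """
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance:
    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)).
    Values near 0 mean "similar"; values near 1 mean "unrelated".
    """
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)  # compress the concatenation xy
    return (cxy - min(cx, cy)) / max(cx, cy)


def ngd(fx: int, fy: int, fxy: int, n: int) -> float:
    """Normalized Google distance from page counts:
    NGD(x, y) = (max(log f(x), log f(y)) - log f(x, y))
                / (log N - min(log f(x), log f(y))).

    fx, fy: number of pages containing term x / term y (hypothetical counts)
    fxy:    number of pages containing both terms
    n:      assumed total number of indexed pages
    """
    if fxy == 0:
        return math.inf  # terms never co-occur: log f(x, y) is undefined
    lx, ly, lxy, ln = math.log(fx), math.log(fy), math.log(fxy), math.log(n)
    return (max(lx, ly) - lxy) / (ln - min(lx, ly))
```

As a usage example, `ncd(b"abc" * 100, b"abc" * 100)` is close to 0, while comparing that string against an unrelated byte sequence gives a value close to 1. Feeding a matrix of such pairwise distances into a standard hierarchical clustering routine is one way to obtain the feature-free, parameter-free clustering the abstract describes.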

This paper has not been read by Pith yet.


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Visual Text Compression as Measure Transport

    cs.CV 2026-05 unverdicted novelty 7.0

    Framing visual text compression as measure transport decomposes encoding loss into precision and coverage costs, enabling a label-free routing rule that matches oracle performance on 17 of 24 NLP datasets while using ...