Dictionary based methods for information extraction

A. Baronchelli; E. Caglioti; E. Pizzi; V. Loreto

arxiv: cond-mat/0402581 · v2 · submitted 2004-02-24 · cond-mat.stat-mech · cond-mat.other · cs.IR · q-bio.GN · q-bio.OT

keywords: dictionary, extraction, features, information, results, sequences, artificial, attention

In this paper we present a general method for information extraction that exploits the features of data compression techniques. We first define the so-called "dictionary" of a sequence and focus our attention on it. Dictionaries are intrinsically interesting, and studying their features can be very useful for investigating the properties of the sequences from which they were extracted (e.g. DNA strings). We then describe a procedure for string comparison between dictionary-created sequences (or "artificial texts") that gives very good results in several contexts. Finally, we present some results on self-consistent classification problems.
