On building minimal automaton for subset matching queries

Kimmo Fredriksson

arXiv: 1004.0902 · v2 · submitted 2010-04-06 · cs.FL · cs.DS · cs.IR

We address the problem of building an index for a set $D$ of $n$ strings, where each string location is a subset of some finite integer alphabet of size $\sigma$, so that we can efficiently answer whether a given simple query string $p$ (where each string location is a single symbol) occurs in the set. That is, we need to efficiently find a string $d \in D$ such that $p[i] \in d[i]$ for every $i$. We show how to build such an index in $O(n^{\log_{\sigma/\Delta}(\sigma)}\log(n))$ average time, where $\Delta$ is the average size of the subsets. Our methods have applications, e.g., in computational biology (haplotype inference) and music information retrieval.
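To make the query semantics concrete, here is a minimal sketch, assuming every string in $D$ has the same length $m$ and every subset $d[i]$ is non-empty. It uses plain subset construction (a state is the set of strings still compatible with the query prefix) followed by standard signature-based merging of equivalent states in an acyclic automaton. This is not the paper's construction and makes no claim about its average-time bound; the names `build_dfa`, `query` and `minimal_size` are hypothetical.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

# Assumed input: every indexed string d has the same length m, and d[i] is a
# non-empty set of symbols drawn from range(sigma).
SubsetString = List[Set[int]]
State = Tuple[int, FrozenSet[int]]  # (depth, ids of strings still compatible)
Delta = Dict[State, Dict[int, State]]


def build_dfa(D: List[SubsetString], sigma: int) -> Tuple[State, Delta, int]:
    """Subset construction over single-symbol queries. A state records which
    strings d still satisfy p[i] in d[i] for the prefix read so far; the empty
    set (the dead state) is never materialized."""
    m = len(D[0]) if D else 0
    start: State = (0, frozenset(range(len(D))))
    delta: Delta = {}
    stack = [start]
    while stack:
        state = stack.pop()
        if state in delta:
            continue
        depth, alive = state
        delta[state] = {}
        if depth == m:
            continue  # last layer: accepting, no outgoing edges
        for c in range(sigma):
            nxt = frozenset(j for j in alive if c in D[j][depth])
            if nxt:
                delta[state][c] = (depth + 1, nxt)
                stack.append((depth + 1, nxt))
    return start, delta, m


def query(p: List[int], start: State, delta: Delta, m: int) -> Optional[int]:
    """Return the id of some d in D with p[i] in d[i] for all i, else None."""
    if len(p) != m:
        return None
    state = start
    for c in p:
        nxt = delta[state].get(c)
        if nxt is None:
            return None
        state = nxt
    return min(state[1]) if state[1] else None


def minimal_size(delta: Delta) -> int:
    """Count states after merging those with identical right languages.
    The automaton is acyclic and layered, so a single pass from the deepest
    layer upward suffices."""
    cls: Dict[State, int] = {}
    sig_to_cls: Dict[tuple, int] = {}
    for state in sorted(delta, key=lambda s: -s[0]):
        edges = frozenset((c, cls[t]) for c, t in delta[state].items())
        cls[state] = sig_to_cls.setdefault((state[0], edges), len(sig_to_cls))
    return len(sig_to_cls)


# Toy set over sigma = 4: d0 = {0,1}{2}, d1 = {0}{2,3}, d2 = {1}{3}
D = [[{0, 1}, {2}], [{0}, {2, 3}], [{1}, {3}]]
start, delta, m = build_dfa(D, sigma=4)
print(query([0, 3], start, delta, m))  # 1
print(query([2, 2], start, delta, m))  # None
print(len(delta), minimal_size(delta))  # 7 3
```

On this toy set the subset construction yields 7 states, while the minimal automaton needs only 3, since both states after the first symbol accept the same suffixes $\{2, 3\}$. For reference, the exponent $\log_{\sigma/\Delta}(\sigma)$ in the stated bound equals $\log_2 4 = 2$ when $\sigma = 4$ and $\Delta = 2$, and it grows as $\Delta$ approaches $\sigma$.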