pith. sign in

arxiv: 1502.06370 · v1 · pith:G3SHXP2Wnew · submitted 2015-02-23 · 💻 cs.DS

A framework for space-efficient string kernels

classification 💻 cs.DS
keywords kernelsdatanumberstringinputkernelliketime
0
0 comments X
read the original abstract

String kernels are typically used to compare genome-scale sequences whose length makes alignment impractical, yet their computation is based on data structures that are either space-inefficient, or incur large slowdowns. We show that a number of exact string kernels, like the $k$-mer kernel, the substrings kernels, a number of length-weighted kernels, the minimal absent words kernel, and kernels with Markovian corrections, can all be computed in $O(nd)$ time and in $o(n)$ bits of space in addition to the input, using just a $\mathtt{rangeDistinct}$ data structure on the Burrows-Wheeler transform of the input strings, which takes $O(d)$ time per element in its output. The same bounds hold for a number of measures of compositional complexity based on multiple value of $k$, like the $k$-mer profile and the $k$-th order empirical entropy, and for calibrating the value of $k$ using the data.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.