A Fast Generic Sequence Matching Algorithm

David R. Musser; Gor V. Nishanov

arxiv: 0810.0264 · v1 · submitted 2008-10-01 · 💻 cs.DS

A Fast Generic Sequence Matching Algorithm

David R. Musser , Gor V. Nishanov This is my paper

classification 💻 cs.DS

keywords algorithmmatchinggenericproblemssequenceboundboyer-mooreknuth-morris-pratt

0 comments

read the original abstract

A string matching -- and more generally, sequence matching -- algorithm is presented that has a linear worst-case computing time bound, a low worst-case bound on the number of comparisons (2n), and sublinear average-case behavior that is better than that of the fastest versions of the Boyer-Moore algorithm. The algorithm retains its efficiency advantages in a wide variety of sequence matching problems of practical interest, including traditional string matching; large-alphabet problems (as in Unicode strings); and small-alphabet, long-pattern problems (as in DNA searches). Since it is expressed as a generic algorithm for searching in sequences over an arbitrary type T, it is well suited for use in generic software libraries such as the C++ Standard Template Library. The algorithm was obtained by adding to the Knuth-Morris-Pratt algorithm one of the pattern-shifting techniques from the Boyer-Moore algorithm, with provision for use of hashing in this technique. In situations in which a hash function or random access to the sequences is not available, the algorithm falls back to an optimized version of the Knuth-Morris-Pratt algorithm.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Honest Lying: Understanding Memory Confabulation in Reflexive Agents
cs.LG 2026-05 unverdicted novelty 7.0

Reflexive agents confabulate incorrect task interpretations in memory, detected via Reflection Repetition Rate metric, with a programmatic mitigation raising correct object mentions from 0% to 86% in frozen ALFWorld cases.