Faster Cache-Efficient Pattern Matching for Deterministic Wheeler Pangenome Graphs

Carlo Tosoni; Nicola Prezza; Riccardo Maso

arxiv: 2607.02113 · v1 · pith:H3M3DPQLnew · submitted 2026-07-02 · 💻 cs.DS

Faster Cache-Efficient Pattern Matching for Deterministic Wheeler Pangenome Graphs

Riccardo Maso , Nicola Prezza , Carlo Tosoni This is my paper

Pith reviewed 2026-07-03 03:58 UTC · model grok-4.3

classification 💻 cs.DS

keywords Wheeler graphspangenome graphspattern matchingcache efficiencyBurrows-Wheeler transformdeterministic automatabioinformatics

0 comments

The pith

A cache-friendly index for Wheeler DFAs reduces I/O in pattern matching by interleaving binary search with sequential automaton scans.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper introduces a new indexing method for deterministic Wheeler automata that addresses the high I/O cost of backward search in pattern matching. By adopting a strategy similar to suffix arrays, the method performs binary searches interleaved with fast sequential scans. On real pangenome graphs, this yields up to 500 times faster queries at the cost of up to 15 times more space, allowing single-character processing in less than 3 nanoseconds. Researchers in bioinformatics would value this because it enables efficient querying of large genomic graphs without I/O bottlenecks limiting performance.

Core claim

The paper claims that its new data structure for Wheeler DFAs, by interleaving a binary search with fast sequential scans of the automaton, reduces the number of I/O operations below the pattern length bound of backward search, resulting in up to 500 times faster pattern matching while using at most 15 times the space.

What carries the argument

The interleaving of binary search and sequential scans on the Wheeler DFA, analogous to suffix array search.

If this is right

The new structure can process each pattern character in under 3 ns on tested graphs.
It achieves speedups of up to 500 times over backward search.
Space usage increases by at most a factor of 15 compared to the BWT-based index.
This approach maintains correctness for pattern matching on Wheeler DFAs while improving cache efficiency.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Such a method might extend to other classes of automata beyond Wheeler DFAs if similar structural properties hold.
Deployments with memory constraints may need to balance the space-time tradeoff based on available hardware.
Future work could test the method on synthetic graphs to see if speedups persist without real-world structure.

Load-bearing premise

The observed performance gains on the specific real-world Wheeler pangenome graphs will generalize to other graphs and workloads.

What would settle it

Running the algorithm on a Wheeler DFA where the number of I/O operations per pattern character exceeds that of backward search would falsify the efficiency claim.

Figures

Figures reproduced from arXiv: 2607.02113 by Carlo Tosoni, Nicola Prezza, Riccardo Maso.

**Figure 2.** Figure 2: Mean 𝑑 (number of traversed unary paths) measured during locate(𝛼) versus length 𝑚 (log-log scale). The mean is computed across 104 random queries, where each string 𝛼 of length 𝑚 is sampled from the graph. Dataset WDFA Properties Index Size Queries Execution Time 𝑛 |𝛿 | Fw. Gr-SA Blowup 𝑚 𝑚/|𝛽| Fw. Gr-SA Speedup Chr 21 3.85 × 107 3.88 × 107 38.1 MB 528.9 MB ×13.9 103 0.58 968.5 ns 9.4 ns ×103 104 0.09 976… view at source ↗

read the original abstract

Pattern matching on strings is regarded as one of the core operations in computer science. Although researchers proposed several solutions to this problem, some of the most elegant and widely used approaches are based on the renowned Burrows-Wheeler transform (BWT). The success of the BWT lies in its pattern matching algorithm known as backward search, which is not only near-optimal in the RAM model, but also runs directly on a compressed representation of the input string. More recently, the backward search has been generalized to Wheeler deterministic finite automata (DFAs), a subclass of standard DFAs, without losing its near-optimal time efficiency. Similarly to the case of strings, this pattern matching algorithm for Wheeler DFAs has found applications in bioinformatics, where researchers have shown that specific pangenome graphs of human chromosomes can be transformed into Wheeler DFAs and consequently indexed using this strategy. However, this BWT-based index on Wheeler DFAs inherited a significant drawback from the original backward search, namely the high number of I/O operations triggered during the algorithm execution, which are in the worst-case lower-bounded by the length of the pattern. In this paper, we address this limitation by proposing the first cache-friendly algorithm specifically designed for Wheeler DFAs. Our new data structure reduces the number of I/O operations by employing a strategy analogous to the suffix array: it interleaves a binary search with fast sequential scans of the automaton. We empirically validate this new indexing strategy by running our algorithm on real-world Wheeler pangenome graphs. We show that while our data structure can use up to 15 times the space required by the backward search, it can also be 500 times faster and able to process a single character of the pattern in less than 3 ns.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper introduces a hybrid binary-search-plus-sequential-scan index for Wheeler DFAs that trades space for fewer cache misses, claiming large speedups on pangenome graphs, but the numbers come from experiments whose setup is not described.

read the letter

The main takeaway is that the authors replace pure backward search on Wheeler DFAs with an interleaving strategy that does binary search steps mixed with fast sequential scans of the automaton, modeled on how suffix arrays are searched. They report this can cut query time dramatically on real human-chromosome pangenome graphs while accepting up to 15 times more space.

The construction itself is a reasonable next step from existing BWT-style indexes for these automata. The motivation is clear: backward search triggers many random accesses that hurt cache performance, and the new structure tries to improve locality by design. Testing the idea on actual pangenome data rather than synthetic cases is the right direction for the target application in bioinformatics.

The weak part is the validation. The abstract gives the headline numbers—500 times faster and under 3 ns per character—but supplies no hardware description, no graph sizes or degree distributions, no query workload details, and no statistical summary of the runs. Without those, it is impossible to judge whether the gains are general or tied to the locality properties of the tested instances. There is also no worst-case I/O analysis or external-memory bound, so the improvement stays an empirical observation rather than a guaranteed property.

This paper is aimed at people who build and tune indexes for large genomic graphs and care about practical query latency. A reader working on string algorithms or automata might pick up the interleaving idea if the full version spells out the data structure and the exact trade-offs.

I would bring the full paper to a reading group once the experimental section is available. It deserves peer review because the direction addresses a documented bottleneck in an active applied area, even though the current write-up needs more on methodology and scope of the speedups.

Referee Report

2 major / 0 minor

Summary. The paper presents a cache-efficient data structure for pattern matching on deterministic Wheeler DFAs arising from pangenome graphs. It replaces the standard backward search with an approach that interleaves binary search over the automaton with fast sequential scans, drawing an analogy to the suffix array. The construction is evaluated empirically on real-world Wheeler pangenome graphs derived from human chromosomes, reporting up to 15 imes space overhead relative to backward search but up to 500 imes speedup and sub-3 ns per character query time.

Significance. If the reported speedups prove robust, the work would offer a practical improvement for I/O-bound pattern matching in pangenomics. The empirical validation on concrete real-world graphs is a positive feature, and the suffix-array analogy is a clean conceptual contribution. However, the absence of any worst-case I/O analysis or external-memory bounds limits the result's theoretical significance and makes generalization beyond the tested instances uncertain.

major comments (2)

[Abstract] Abstract and experimental evaluation section: the headline claims of 500 imes speedup and <3 ns per character are stated without any description of baselines, hardware platform, graph sizes, number of queries, statistical aggregation, or reproducibility details. This directly affects verifiability of the central performance claim.
[Theoretical analysis] Theoretical analysis (throughout, especially any section presenting the data structure): no worst-case bound on I/O operations, cache-miss count, or external-memory complexity is derived. The reduction in I/O is asserted only via the binary-search-plus-scan strategy and demonstrated solely on specific human-chromosome pangenome graphs; without a general argument, the speedup may be instance-specific rather than a property of the construction.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive feedback and for recognizing the practical value of the empirical results and the suffix-array analogy. We respond to each major comment below.

read point-by-point responses

Referee: [Abstract] Abstract and experimental evaluation section: the headline claims of 500 times speedup and <3 ns per character are stated without any description of baselines, hardware platform, graph sizes, number of queries, statistical aggregation, or reproducibility details. This directly affects verifiability of the central performance claim.

Authors: We agree that additional experimental context is required for verifiability. In the revised version we will expand both the abstract and the experimental section to specify the baseline (standard backward search), the hardware platform, the sizes of the human-chromosome-derived Wheeler pangenome graphs, the number of queries, the statistical aggregation method, and reproducibility instructions. revision: yes
Referee: [Theoretical analysis] Theoretical analysis (throughout, especially any section presenting the data structure): no worst-case bound on I/O operations, cache-miss count, or external-memory complexity is derived. The reduction in I/O is asserted only via the binary-search-plus-scan strategy and demonstrated solely on specific human-chromosome pangenome graphs; without a general argument, the speedup may be instance-specific rather than a property of the construction.

Authors: The manuscript presents a practical algorithm whose I/O reduction is demonstrated empirically on real pangenome graphs; no worst-case I/O or external-memory analysis is provided. Cache-miss counts depend on concrete hardware parameters, which precludes simple general bounds. The binary-search-plus-scan strategy is motivated by the suffix-array analogy, but we make no claim of instance-independent worst-case guarantees. We will add a limitations paragraph discussing this point and suggesting avenues for future theoretical work. revision: partial

standing simulated objections not resolved

Deriving a general worst-case bound on I/O operations or cache misses

Circularity Check

0 steps flagged

No circularity: novel algorithmic construction with independent empirical validation

full rationale

The paper presents a constructive algorithm that interleaves binary search with sequential automaton scans, modeled on the suffix array but defined directly via the new data structure. No equations or claims reduce by construction to fitted inputs, self-definitions, or self-citation chains. Performance numbers (500x speedup, <3 ns/char) are reported as experimental outcomes on specific graphs rather than predictions derived from the algorithm itself. The derivation chain is self-contained and does not rely on load-bearing self-citations or renamed known results.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are described in the abstract; the contribution is a new algorithmic construction without mathematical fitting or new postulates.

pith-pipeline@v0.9.1-grok · 5849 in / 1162 out tokens · 35469 ms · 2026-07-03T03:58:48.439236+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

30 extracted references · 22 canonical work pages

[1]

Regular languages meet prefix sorting

Jarno Alanko, Giovanna D’Agostino, Alberto Policriti, and Nicola Prezza. Regular languages meet prefix sorting. In Shuchi Chawla, editor,Proceedings of the 2020 ACM-SIAM Symposium 13 103 104 105 m 101 102 103 Mean d value Mean d vs m Chr 14 Chr 21 Figure 2: Mean 𝑑 (number of traversed unary paths) measured during locate( 𝛼) versus length 𝑚 (log-log scale)...

work page doi:10.1137/1.9781611975994.55doi:10.1137/1.9781611975994.55 2020
[2]

Encoding co-lex orders of finite-state automata in linear space

Ruben Becker, Nicola Cotumaccio, Sung-Hwan Kim, Nicola Prezza, and Carlo Tosoni. Encoding co-lex orders of finite-state automata in linear space. In Paola Bonizzoni and Veli Mäkinen, editors,36th Annual Symposium on Combinatorial Pattern Matching, CPM 2025, Milan, Italy, June 17-19, 2025, LIPIcs, pages 15:1–15:17. Schloss Dagstuhl - Leibniz-Zentrum für In...

work page doi:10.4230/lipics.cpm.2025.15doi:10.4230/lipics.cpm.2025.15 2025
[3]

Optimal lower and upper bounds for representing sequences.ACM Trans

Djamal Belazzougui and Gonzalo Navarro. Optimal lower and upper bounds for representing sequences.ACM Trans. Algorithms, 11(4):31:1–31:21, 2015. doi:10.1145/2629339doi:10.1145/2629339

work page doi:10.1145/2629339doi:10.1145/2629339 2015
[4]

Michael Burrows and David J. Wheeler. A block-sorting lossless data compres- sion algorithm. Technical Report 124, Digital Equipment Corporation, 1994. URL: https://www.hp.com/hpinfo/ex-labs/research/src/digital_src_research_report_1994_124.pdfhttps://www.hp.com/hpinfo/ex-labs/research/src/digital_src_research_report_1994_124.pdf

1994
[5]

Phd thesis, University of Waterloo, 1996

David Clark.Compact Pat Trees. Phd thesis, University of Waterloo, 1996. Section 2.2.2. URL: http://www.nlc-bnc.ca/obj/s4/f2/dsk3/ftp04/nq21335.pdfhttp://www.nlc-bnc.ca/obj/s4/f2/dsk3/ftp04/nq21335.pdf

1996
[6]

Computational pan-genomics: status, promises and challenges.Briefings in Bioinformatics, 19(1):118–135, 01 2018

The Computational Pan-Genomics Consortium. Computational pan-genomics: status, promises and challenges.Briefings in Bioinformatics, 19(1):118–135, 01 2018. doi:10.1093/bib/bbw089doi:10.1093/bib/bbw089

work page doi:10.1093/bib/bbw089doi:10.1093/bib/bbw089 2018
[7]

Computing matching statistics on wheeler dfas

Alessio Conte, Nicola Cotumaccio, Travis Gagie, Giovanni Manzini, Nicola Prezza, and Marinella Sciortino. Computing matching statistics on wheeler dfas. In2023 Data Compression Conference (DCC), pages 150–159, 2023. doi:10.1109/DCC55655.2023.00023doi:10.1109/DCC55655.2023.00023

work page doi:10.1109/dcc55655.2023.00023doi:10.1109/dcc55655.2023.00023 2023
[8]

Massimo Equi, Veli Mäkinen, and Alexandru I. Tomescu. Graphs cannot be indexed in polynomial time for sub-quadratic time string matching, unless SETH fails. In Tomás Bures, Riccardo Dondi, Johann Gamper, Giovanna Guerrini, Tomasz Jurdzinski, Claus Pahl, Florian Sikora, and Prudence W. H. Wong, editors,SOFSEM 2021: Theory and Practice of Computer Science -...

work page doi:10.1007/978-3-030-67731-2 2021
[9]

Tomescu, and Roberto Grossi

Massimo Equi, Veli Mäkinen, Alexandru I. Tomescu, and Roberto Grossi. On the com- plexity of string matching for graphs.ACM Trans. Algorithms, 19(3):21:1–21:25, 2023. doi:10.1145/3588334doi:10.1145/3588334

work page doi:10.1145/3588334doi:10.1145/3588334 2023
[10]

Cambridge University Press, 2023

Paolo Ferragina.Pearls of Algorithm Engineering. Cambridge University Press, 2023. doi:10.1017/9781009128933doi:10.1017/9781009128933

work page doi:10.1017/9781009128933doi:10.1017/9781009128933 2023
[11]

Indexing compressed text.J

Paolo Ferragina and Giovanni Manzini. Indexing compressed text.J. ACM, 52(4):552–581, 2005. doi:10.1145/1082036.1082039doi:10.1145/1082036.1082039

work page doi:10.1145/1082036.1082039doi:10.1145/1082036.1082039 2005
[12]

Wheeler graphs: A framework for bwt-based data structures.Theor

Travis Gagie, Giovanni Manzini, and Jouni Sirén. Wheeler graphs: A framework for bwt-based data structures.Theor. Comput. Sci., 698:67–78, 2017. doi:10.1016/J.TCS.2017.06.016doi:10.1016/J.TCS.2017.06.016

work page doi:10.1016/j.tcs.2017.06.016doi:10.1016/j.tcs.2017.06.016 2017
[13]

Thankachan

Daniel Gibney and Sharma V. Thankachan. On the hardness and inapproximability of recognizing wheeler graphs. In Michael A. Bender, Ola Svensson, and Grzegorz Herman, editors,27th Annual European Symposium on Algorithms, ESA 2019, Munich/Garching, Germany, September 9-11, 2019, LIPIcs, pages 51:1–51:16. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 20...

work page doi:10.4230/lipics.esa.2019.51doi:10.4230/lipics.esa.2019.51 2019
[14]

Compressed suffix arrays and suffix trees with applica- tions to text indexing and string matching (extended abstract)

Roberto Grossi and Jeffrey Scott Vitter. Compressed suffix arrays and suffix trees with applica- tions to text indexing and string matching (extended abstract). InProceedings of the Thirty-Second Annual ACM Symposium on Theory of Computing, STOC ’00, page 397–406, New York, NY, USA,
[15]

doi:10.1145/335305.335351doi:10.1145/335305.335351

Association for Computing Machinery. doi:10.1145/335305.335351doi:10.1145/335305.335351

work page doi:10.1145/335305.335351doi:10.1145/335305.335351
[16]

Space-efficient static trees and graphs

Guy Jacobson. Space-efficient static trees and graphs. In30th Annual Symposium on Foundations of Computer Science, Research Triangle Park, North Carolina, USA, 30 October - 1 November 1989, pages 549–554. IEEE Computer Society, 1989. doi:10.1109/SFCS.1989.63533doi:10.1109/SFCS.1989.63533

work page doi:10.1109/sfcs.1989.63533doi:10.1109/sfcs.1989.63533 1989
[17]

Faster Prefix-Sorting Algorithms for Deterministic Finite Automata

Sung-Hwan Kim, Francisco Olivares, and Nicola Prezza. Faster Prefix-Sorting Algorithms for Deterministic Finite Automata. In Laurent Bulteau and Zsuzsanna Lipták, editors,34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023), volume 259 ofLeibniz International Proceedings in Informatics (LIPIcs), pages 16:1–16:16, Dagstuhl, Germany, 2023. Sch...

work page doi:10.4230/lipics.cpm.2023.16doi:10.4230/lipics.cpm.2023.16 2023
[18]

Fast pattern matching in strings

Donald E Knuth, James H Morris, Jr, and Vaughan R Pratt. Fast pattern matching in strings. SIAM journal on computing, 6(2):323–350, 1977

1977
[19]

Salzberg

Ben Langmead, Cole Trapnell, Mihai Pop, and Steven L. Salzberg. Ultrafast and memory- efficient alignment of short dna sequences to the human genome.Genome Biology, 10(3):R25,
[20]

doi:10.1186/gb-2009-10-3-r25doi:10.1186/gb-2009-10-3-r25

work page doi:10.1186/gb-2009-10-3-r25doi:10.1186/gb-2009-10-3-r25 2009
[21]

A survey of sequence alignment algorithms for next-generation sequencing.Briefings Bioinform., 11(5):473–483, 2010

Heng Li and Nils Homer. A survey of sequence alignment algorithms for next-generation sequencing.Briefings Bioinform., 11(5):473–483, 2010. doi:10.1093/BIB/BBQ015doi:10.1093/BIB/BBQ015

work page doi:10.1093/bib/bbq015doi:10.1093/bib/bbq015 2010
[22]

Suffix arrays: a new method for on-line string searches

Udi Manber and Gene Myers. Suffix arrays: a new method for on-line string searches. In Proceedings of the First Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’90, page 319–327, USA, 1990. Society for Industrial and Applied Mathematics

1990
[23]

McCreight

Edward M. McCreight. A space-economical suffix tree construction algorithm.J. ACM, 23(2):262– 272, 1976. doi:10.1145/321941.321946doi:10.1145/321941.321946

work page doi:10.1145/321941.321946doi:10.1145/321941.321946 1976
[24]

Indexing graphs for path queries with appli- cations in genome research.IEEE ACM Trans

Jouni Sirén, Niko Välimäki, and Veli Mäkinen. Indexing graphs for path queries with appli- cations in genome research.IEEE ACM Trans. Comput. Biol. Bioinform., 11(2):375–388, 2014. doi:10.1109/TCBB.2013.2297101doi:10.1109/TCBB.2013.2297101

work page doi:10.1109/tcbb.2013.2297101doi:10.1109/tcbb.2013.2297101 2014
[25]

Haplotype-aware graph indexes.Bioinformatics, 36(2):400–407, 01 2020

Jouni Sirén, Erik Garrison, Adam M Novak, Benedict Paten, and Richard Durbin. Haplotype-aware graph indexes.Bioinformatics, 36(2):400–407, 01 2020. doi:10.1093/bioinformatics/btz575doi:10.1093/bioinformatics/btz575

work page doi:10.1093/bioinformatics/btz575doi:10.1093/bioinformatics/btz575 2020
[26]

A data structure for dynamic trees.J

Daniel Dominic Sleator and Robert Endre Tarjan. A data structure for dynamic trees.J. Comput. Syst. Sci., 26(3):362–391, 1983. doi:10.1016/0022-0000(83)90006-5doi:10.1016/0022-0000(83)90006-5

work page doi:10.1016/0022-0000(83)90006-5doi:10.1016/0022-0000(83)90006-5 1983
[27]

DFAs encoding the finnish subset of frequent mutations from dbSNP (GCSA’s experimental data)

University of Helsinki. DFAs encoding the finnish subset of frequent mutations from dbSNP (GCSA’s experimental data). https://www.cs.helsinki.fi/group/gsa/1000gen-FIN-WholeGenome/https://www.cs.helsinki.fi/group/gsa/1000gen-FIN-WholeGenome/, 2014. Accessed: 2026-02-04

2014
[28]

Linear pattern matching algorithms

Peter Weiner. Linear pattern matching algorithms. In14th Annual Symposium on Switching and Automata Theory, Iowa City, Iowa, USA, October 15-17, 1973, pages 1–11. IEEE Computer Society,

1973
[29]

doi:10.1109/SWAT.1973.13doi:10.1109/SWAT.1973.13. 16

work page doi:10.1109/swat.1973.13doi:10.1109/swat.1973.13 1973
[30]

A new algorithm for optimal 2-constraint satisfaction and its implications.Theor

Ryan Williams. A new algorithm for optimal 2-constraint satisfaction and its implications.Theor. Comput. Sci., 348(2-3):357–365, 2005. doi:10.1016/J.TCS.2005.09.023doi:10.1016/J.TCS.2005.09.023. 17

work page doi:10.1016/j.tcs.2005.09.023doi:10.1016/j.tcs.2005.09.023 2005

[1] [1]

Regular languages meet prefix sorting

Jarno Alanko, Giovanna D’Agostino, Alberto Policriti, and Nicola Prezza. Regular languages meet prefix sorting. In Shuchi Chawla, editor,Proceedings of the 2020 ACM-SIAM Symposium 13 103 104 105 m 101 102 103 Mean d value Mean d vs m Chr 14 Chr 21 Figure 2: Mean 𝑑 (number of traversed unary paths) measured during locate( 𝛼) versus length 𝑚 (log-log scale)...

work page doi:10.1137/1.9781611975994.55doi:10.1137/1.9781611975994.55 2020

[2] [2]

Encoding co-lex orders of finite-state automata in linear space

Ruben Becker, Nicola Cotumaccio, Sung-Hwan Kim, Nicola Prezza, and Carlo Tosoni. Encoding co-lex orders of finite-state automata in linear space. In Paola Bonizzoni and Veli Mäkinen, editors,36th Annual Symposium on Combinatorial Pattern Matching, CPM 2025, Milan, Italy, June 17-19, 2025, LIPIcs, pages 15:1–15:17. Schloss Dagstuhl - Leibniz-Zentrum für In...

work page doi:10.4230/lipics.cpm.2025.15doi:10.4230/lipics.cpm.2025.15 2025

[3] [3]

Optimal lower and upper bounds for representing sequences.ACM Trans

Djamal Belazzougui and Gonzalo Navarro. Optimal lower and upper bounds for representing sequences.ACM Trans. Algorithms, 11(4):31:1–31:21, 2015. doi:10.1145/2629339doi:10.1145/2629339

work page doi:10.1145/2629339doi:10.1145/2629339 2015

[4] [4]

Michael Burrows and David J. Wheeler. A block-sorting lossless data compres- sion algorithm. Technical Report 124, Digital Equipment Corporation, 1994. URL: https://www.hp.com/hpinfo/ex-labs/research/src/digital_src_research_report_1994_124.pdfhttps://www.hp.com/hpinfo/ex-labs/research/src/digital_src_research_report_1994_124.pdf

1994

[5] [5]

Phd thesis, University of Waterloo, 1996

David Clark.Compact Pat Trees. Phd thesis, University of Waterloo, 1996. Section 2.2.2. URL: http://www.nlc-bnc.ca/obj/s4/f2/dsk3/ftp04/nq21335.pdfhttp://www.nlc-bnc.ca/obj/s4/f2/dsk3/ftp04/nq21335.pdf

1996

[6] [6]

Computational pan-genomics: status, promises and challenges.Briefings in Bioinformatics, 19(1):118–135, 01 2018

The Computational Pan-Genomics Consortium. Computational pan-genomics: status, promises and challenges.Briefings in Bioinformatics, 19(1):118–135, 01 2018. doi:10.1093/bib/bbw089doi:10.1093/bib/bbw089

work page doi:10.1093/bib/bbw089doi:10.1093/bib/bbw089 2018

[7] [7]

Computing matching statistics on wheeler dfas

Alessio Conte, Nicola Cotumaccio, Travis Gagie, Giovanni Manzini, Nicola Prezza, and Marinella Sciortino. Computing matching statistics on wheeler dfas. In2023 Data Compression Conference (DCC), pages 150–159, 2023. doi:10.1109/DCC55655.2023.00023doi:10.1109/DCC55655.2023.00023

work page doi:10.1109/dcc55655.2023.00023doi:10.1109/dcc55655.2023.00023 2023

[8] [8]

Massimo Equi, Veli Mäkinen, and Alexandru I. Tomescu. Graphs cannot be indexed in polynomial time for sub-quadratic time string matching, unless SETH fails. In Tomás Bures, Riccardo Dondi, Johann Gamper, Giovanna Guerrini, Tomasz Jurdzinski, Claus Pahl, Florian Sikora, and Prudence W. H. Wong, editors,SOFSEM 2021: Theory and Practice of Computer Science -...

work page doi:10.1007/978-3-030-67731-2 2021

[9] [9]

Tomescu, and Roberto Grossi

Massimo Equi, Veli Mäkinen, Alexandru I. Tomescu, and Roberto Grossi. On the com- plexity of string matching for graphs.ACM Trans. Algorithms, 19(3):21:1–21:25, 2023. doi:10.1145/3588334doi:10.1145/3588334

work page doi:10.1145/3588334doi:10.1145/3588334 2023

[10] [10]

Cambridge University Press, 2023

Paolo Ferragina.Pearls of Algorithm Engineering. Cambridge University Press, 2023. doi:10.1017/9781009128933doi:10.1017/9781009128933

work page doi:10.1017/9781009128933doi:10.1017/9781009128933 2023

[11] [11]

Indexing compressed text.J

Paolo Ferragina and Giovanni Manzini. Indexing compressed text.J. ACM, 52(4):552–581, 2005. doi:10.1145/1082036.1082039doi:10.1145/1082036.1082039

work page doi:10.1145/1082036.1082039doi:10.1145/1082036.1082039 2005

[12] [12]

Wheeler graphs: A framework for bwt-based data structures.Theor

Travis Gagie, Giovanni Manzini, and Jouni Sirén. Wheeler graphs: A framework for bwt-based data structures.Theor. Comput. Sci., 698:67–78, 2017. doi:10.1016/J.TCS.2017.06.016doi:10.1016/J.TCS.2017.06.016

work page doi:10.1016/j.tcs.2017.06.016doi:10.1016/j.tcs.2017.06.016 2017

[13] [13]

Thankachan

Daniel Gibney and Sharma V. Thankachan. On the hardness and inapproximability of recognizing wheeler graphs. In Michael A. Bender, Ola Svensson, and Grzegorz Herman, editors,27th Annual European Symposium on Algorithms, ESA 2019, Munich/Garching, Germany, September 9-11, 2019, LIPIcs, pages 51:1–51:16. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 20...

work page doi:10.4230/lipics.esa.2019.51doi:10.4230/lipics.esa.2019.51 2019

[14] [14]

Compressed suffix arrays and suffix trees with applica- tions to text indexing and string matching (extended abstract)

Roberto Grossi and Jeffrey Scott Vitter. Compressed suffix arrays and suffix trees with applica- tions to text indexing and string matching (extended abstract). InProceedings of the Thirty-Second Annual ACM Symposium on Theory of Computing, STOC ’00, page 397–406, New York, NY, USA,

[15] [15]

doi:10.1145/335305.335351doi:10.1145/335305.335351

Association for Computing Machinery. doi:10.1145/335305.335351doi:10.1145/335305.335351

work page doi:10.1145/335305.335351doi:10.1145/335305.335351

[16] [16]

Space-efficient static trees and graphs

Guy Jacobson. Space-efficient static trees and graphs. In30th Annual Symposium on Foundations of Computer Science, Research Triangle Park, North Carolina, USA, 30 October - 1 November 1989, pages 549–554. IEEE Computer Society, 1989. doi:10.1109/SFCS.1989.63533doi:10.1109/SFCS.1989.63533

work page doi:10.1109/sfcs.1989.63533doi:10.1109/sfcs.1989.63533 1989

[17] [17]

Faster Prefix-Sorting Algorithms for Deterministic Finite Automata

Sung-Hwan Kim, Francisco Olivares, and Nicola Prezza. Faster Prefix-Sorting Algorithms for Deterministic Finite Automata. In Laurent Bulteau and Zsuzsanna Lipták, editors,34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023), volume 259 ofLeibniz International Proceedings in Informatics (LIPIcs), pages 16:1–16:16, Dagstuhl, Germany, 2023. Sch...

work page doi:10.4230/lipics.cpm.2023.16doi:10.4230/lipics.cpm.2023.16 2023

[18] [18]

Fast pattern matching in strings

Donald E Knuth, James H Morris, Jr, and Vaughan R Pratt. Fast pattern matching in strings. SIAM journal on computing, 6(2):323–350, 1977

1977

[19] [19]

Salzberg

Ben Langmead, Cole Trapnell, Mihai Pop, and Steven L. Salzberg. Ultrafast and memory- efficient alignment of short dna sequences to the human genome.Genome Biology, 10(3):R25,

[20] [20]

doi:10.1186/gb-2009-10-3-r25doi:10.1186/gb-2009-10-3-r25

work page doi:10.1186/gb-2009-10-3-r25doi:10.1186/gb-2009-10-3-r25 2009

[21] [21]

A survey of sequence alignment algorithms for next-generation sequencing.Briefings Bioinform., 11(5):473–483, 2010

Heng Li and Nils Homer. A survey of sequence alignment algorithms for next-generation sequencing.Briefings Bioinform., 11(5):473–483, 2010. doi:10.1093/BIB/BBQ015doi:10.1093/BIB/BBQ015

work page doi:10.1093/bib/bbq015doi:10.1093/bib/bbq015 2010

[22] [22]

Suffix arrays: a new method for on-line string searches

Udi Manber and Gene Myers. Suffix arrays: a new method for on-line string searches. In Proceedings of the First Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’90, page 319–327, USA, 1990. Society for Industrial and Applied Mathematics

1990

[23] [23]

McCreight

Edward M. McCreight. A space-economical suffix tree construction algorithm.J. ACM, 23(2):262– 272, 1976. doi:10.1145/321941.321946doi:10.1145/321941.321946

work page doi:10.1145/321941.321946doi:10.1145/321941.321946 1976

[24] [24]

Indexing graphs for path queries with appli- cations in genome research.IEEE ACM Trans

Jouni Sirén, Niko Välimäki, and Veli Mäkinen. Indexing graphs for path queries with appli- cations in genome research.IEEE ACM Trans. Comput. Biol. Bioinform., 11(2):375–388, 2014. doi:10.1109/TCBB.2013.2297101doi:10.1109/TCBB.2013.2297101

work page doi:10.1109/tcbb.2013.2297101doi:10.1109/tcbb.2013.2297101 2014

[25] [25]

Haplotype-aware graph indexes.Bioinformatics, 36(2):400–407, 01 2020

Jouni Sirén, Erik Garrison, Adam M Novak, Benedict Paten, and Richard Durbin. Haplotype-aware graph indexes.Bioinformatics, 36(2):400–407, 01 2020. doi:10.1093/bioinformatics/btz575doi:10.1093/bioinformatics/btz575

work page doi:10.1093/bioinformatics/btz575doi:10.1093/bioinformatics/btz575 2020

[26] [26]

A data structure for dynamic trees.J

Daniel Dominic Sleator and Robert Endre Tarjan. A data structure for dynamic trees.J. Comput. Syst. Sci., 26(3):362–391, 1983. doi:10.1016/0022-0000(83)90006-5doi:10.1016/0022-0000(83)90006-5

work page doi:10.1016/0022-0000(83)90006-5doi:10.1016/0022-0000(83)90006-5 1983

[27] [27]

DFAs encoding the finnish subset of frequent mutations from dbSNP (GCSA’s experimental data)

University of Helsinki. DFAs encoding the finnish subset of frequent mutations from dbSNP (GCSA’s experimental data). https://www.cs.helsinki.fi/group/gsa/1000gen-FIN-WholeGenome/https://www.cs.helsinki.fi/group/gsa/1000gen-FIN-WholeGenome/, 2014. Accessed: 2026-02-04

2014

[28] [28]

Linear pattern matching algorithms

Peter Weiner. Linear pattern matching algorithms. In14th Annual Symposium on Switching and Automata Theory, Iowa City, Iowa, USA, October 15-17, 1973, pages 1–11. IEEE Computer Society,

1973

[29] [29]

doi:10.1109/SWAT.1973.13doi:10.1109/SWAT.1973.13. 16

work page doi:10.1109/swat.1973.13doi:10.1109/swat.1973.13 1973

[30] [30]

A new algorithm for optimal 2-constraint satisfaction and its implications.Theor

Ryan Williams. A new algorithm for optimal 2-constraint satisfaction and its implications.Theor. Comput. Sci., 348(2-3):357–365, 2005. doi:10.1016/J.TCS.2005.09.023doi:10.1016/J.TCS.2005.09.023. 17

work page doi:10.1016/j.tcs.2005.09.023doi:10.1016/j.tcs.2005.09.023 2005