Computing Smallest Suffixient Arrays in Sublinear Time

Cristian Urbina; Francisco Olivares; Giuseppe Romana; Gonzalo Navarro; Hiroto Fujimaru; Jakub Radoszewski

arxiv: 2607.00204 · v1 · pith:MAVYBYL3new · submitted 2026-06-30 · 💻 cs.DS

Computing Smallest Suffixient Arrays in Sublinear Time

Hiroto Fujimaru , Gonzalo Navarro , Francisco Olivares , Jakub Radoszewski , Giuseppe Romana , Cristian Urbina This is my paper

Pith reviewed 2026-07-02 16:46 UTC · model grok-4.3

classification 💻 cs.DS

keywords suffixient arrayBurrows-Wheeler transformsublinear timetext indexingpattern matchingrun-length compression

0 comments

The pith

A smallest suffixient array for text T of length n can be computed in O(n log σ / √log n + min(r, r-bar) log^ε n) time for any ε > 0.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a new algorithm for building the smallest suffixient array on a string T. When paired with an index that gives random access to T, this array supports multiple pattern-matching queries. The running time improves to sublinear whenever the alphabet size σ stays modest and the number of equal-letter runs in the Burrows-Wheeler transforms of T and its reverse is o(n / log^ε n). The result gives an asymptotic improvement over earlier linear-time constructions. The authors also derive several related algorithmic results that follow from the same techniques.

Core claim

We show how to compute a smallest suffixient array for T[1…n] in O(n log σ / √log n + min(r, r-bar) log^ε n) time for any ε > 0, where σ is the alphabet size of T and r and r-bar are the numbers of equal-letter runs of the BWTs of T and its reverse.

What carries the argument

The smallest suffixient array, a compact structure whose size is governed by the run-length properties of the forward and reverse Burrows-Wheeler transforms and that replaces a full suffix array for the supported queries.

If this is right

Pattern-matching queries become feasible on texts whose size makes prior linear-time construction impractical.
Preprocessing cost drops below linear whenever the input is repetitive enough that min(r, r-bar) is sublinear in n.
The same run-length information used for construction can be reused for other compressed indexes.
The method yields a family of connected algorithms whose complexity is also expressed in terms of BWT runs.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The technique may be adaptable to other run-length compressed structures such as the run-length FM-index.
On genomic or versioned data, where BWT runs are naturally small, the construction could become the dominant practical improvement.
It is open whether the √log n factor in the first term can be removed while retaining the same dependence on r.

Load-bearing premise

An auxiliary index already exists that supplies direct random access to every position of the input text T.

What would settle it

Construct the array on a concrete string whose BWT run counts are known, measure wall-clock time against the stated bound, and check whether the observed complexity deviates from O(n log σ / √log n + min(r, r-bar) log^ε n).

Figures

Figures reproduced from arXiv: 2607.00204 by Cristian Urbina, Francisco Olivares, Giuseppe Romana, Gonzalo Navarro, Hiroto Fujimaru, Jakub Radoszewski.

**Figure 1.** Figure 1: Processing the BWT run-break at position i of the set S. Substrings T[SA[i] . . SA[i] + |x|] and T[SA[i − 1] . . SA[i − 1] + |x|] are candidates to be supermaximal extensions of T. We add the pairs (n − (SA[i] + |x|) + 1, |x| + 1) and (n − (SA[i − 1] + |x|) + 1, |x| + 1) to the multiset S, which represents the starting positions and length of the occurrences of those reversed candidates in T. assume that … view at source ↗

read the original abstract

A suffixient array is a novel data structure that, when combined with an index providing direct access on a text $T$, allows us to answer a variety of pattern matching queries. In this work, we show how to compute a smallest suffixient array for $T[1\dots n]$ in $O(\frac{n\log \sigma}{\sqrt{\log n}}+\min(r,\bar{r})\log^\epsilon n)$ time for any $\epsilon > 0$, where $\sigma$ is the alphabet size of $T$ and $r$ and $\bar{r}$ are the numbers of equal-letter runs of the Burrows-Wheeler transforms of $T$ and its reverse $\overline{T}$, respectively. This time complexity becomes sublinear when $\sigma$ is small enough and $\min(r,\bar{r})=o(\frac{n}{\log^\epsilon n})$, yielding an asymptotic improvement over state-of-the-art algorithms. We also present a series of connected algorithmic results.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives a conditional sublinear bound for smallest suffixient arrays that assumes a direct-access index on T is already available.

read the letter

The main takeaway is that this work improves the construction time for smallest suffixient arrays to O(n log σ / √log n + min(r, r-bar) log^ε n) for any ε > 0. The bound goes sublinear when the alphabet is small and the BWTs of T and its reverse have few runs.

What is new is the specific combination of the alphabet-dependent term with the run-length term to beat the previous linear-time constructions. The abstract positions the result as an asymptotic improvement for compressible texts, and the connected algorithmic results are noted as additional contributions.

The paper does a clean job of stating the model: the suffixient array is meant to be used together with an index that supplies direct access to T. That framing is explicit from the first sentence.

The soft spot is exactly the one in the stress-test note. The claimed time is o(n) only if random accesses to T cost unit time and the index is already built. In the plain RAM model starting from the raw text, building that index costs at least linear time, so the overall construction is not sublinear. The abstract does not overclaim once the assumption is read carefully, but the paper needs to keep the conditional nature of the result front and center.

No other issues stand out from the given material. The result is presented as a direct algorithmic construction with no circular definitions or free parameters.

This is for researchers in string algorithms and compressed indexing who care about repetitive texts. A reader who works on BWT-based structures or sublinear string algorithms would get concrete value from the time bound.

It deserves peer review to verify the algorithm details and the precise model assumptions.

Referee Report

2 major / 1 minor

Summary. The paper claims an algorithm to compute a smallest suffixient array for a text T[1..n] in O(n log σ / √log n + min(r, r-bar) log^ε n) time (any ε>0), where σ is the alphabet size and r, r-bar are the equal-letter run counts in the BWTs of T and its reverse. The bound is sublinear when σ is small and min(r, r-bar)=o(n/log^ε n), improving prior work; the structure supports pattern-matching queries when paired with a direct-access index on T. Additional connected algorithmic results are presented.

Significance. If the claimed time bound holds under a clearly stated model, the result would give the first sublinear construction for this data structure, with practical relevance for compressible texts (via BWT runs) and small alphabets. The connected results add value, but the headline improvement rests on the sublinear claim.

major comments (2)

[Abstract] Abstract: the O(n log σ / √log n + min(r, r-bar) log^ε n) bound is presented as sublinear, yet the algorithm explicitly requires an auxiliary index that supplies direct (random) access to T[i] as a unit-cost primitive. No construction time or space bound for this index is supplied, and standard RAM-model constructions require Ω(n) time; this makes the sublinear claim dependent on an unaccounted-for assumption that is load-bearing for the central result.
[Abstract, §1] Abstract and opening of §1: the model statement qualifies the result as holding “when combined with an index providing direct access,” but the manuscript supplies neither a construction for the index nor a reduction showing that its cost can be absorbed into the stated bound without losing sublinearity. This leaves the complexity claim incomplete in the standard word-RAM model.

minor comments (1)

[Abstract] Notation for r-bar is introduced without an explicit definition in the abstract; a parenthetical reminder would improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and for pointing out the need for greater precision in the model assumptions underlying our sublinear time bound. We respond to each major comment below.

read point-by-point responses

Referee: [Abstract] Abstract: the O(n log σ / √log n + min(r, r-bar) log^ε n) bound is presented as sublinear, yet the algorithm explicitly requires an auxiliary index that supplies direct (random) access to T[i] as a unit-cost primitive. No construction time or space bound for this index is supplied, and standard RAM-model constructions require Ω(n) time; this makes the sublinear claim dependent on an unaccounted-for assumption that is load-bearing for the central result.

Authors: We agree that the stated time bound applies to the construction of the smallest suffixient array under the assumption that a direct-access index on T is already available as a unit-cost primitive. The abstract and §1 explicitly qualify the result with the phrase “when combined with an index providing direct access.” The sublinearity therefore holds relative to this model; the algorithm itself does not construct the index. In application settings where T is already stored in a structure supporting random access (for example, as part of an existing FM-index or external-memory layout), the bound is as claimed. We acknowledge, however, that the manuscript does not supply a construction for the index or absorb its cost, so the claim is incomplete if interpreted as a fully self-contained word-RAM algorithm starting from the raw text. In revision we will add an explicit paragraph clarifying the model and noting that index construction would contribute an additional Ω(n) term outside the stated bound. revision: yes
Referee: [Abstract, §1] Abstract and opening of §1: the model statement qualifies the result as holding “when combined with an index providing direct access,” but the manuscript supplies neither a construction for the index nor a reduction showing that its cost can be absorbed into the stated bound without losing sublinearity. This leaves the complexity claim incomplete in the standard word-RAM model.

Authors: The manuscript intentionally presents the result in the model where the direct-access index is supplied; no construction or absorption argument is given because the contribution concerns the suffixient-array algorithm that exploits the index. We did not claim a reduction that preserves sublinearity from the plain text. The referee is correct that this leaves the overall complexity statement incomplete for the standard word-RAM model without the index. We will revise §1 to include a short discussion of representative index constructions (e.g., a plain array or a succinct structure) and their costs, making the conditional nature of the sublinear bound fully explicit. revision: yes

Circularity Check

0 steps flagged

No circularity; algorithmic construction is self-contained under stated model.

full rationale

The paper claims an O(n log σ / √log n + min(r, r-bar) log^ε n) algorithm for constructing a smallest suffixient array, explicitly conditioned on the existence of a separate direct-access index for T. No equations, parameters, or predictions appear that reduce by construction to their own inputs; the time bound is derived from standard string algorithms on run-length compressed BWTs rather than any self-definition or fitted renaming. No load-bearing self-citations or uniqueness theorems imported from prior author work are invoked to justify the central result. The derivation therefore stands as an independent algorithmic achievement within the RAM model that treats random access as a unit-cost primitive.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The paper relies on standard algorithmic assumptions without introducing new free parameters, invented entities, or ad-hoc axioms beyond the usual word-RAM model implicit in string algorithm analysis.

axioms (1)

standard math Standard word-RAM computational model with word size Θ(log n) and constant-time access to array elements.
Implicit in all time-complexity statements for string algorithms on texts of length n.

pith-pipeline@v0.9.1-grok · 5708 in / 1211 out tokens · 23751 ms · 2026-07-02T16:46:00.450831+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

31 extracted references · 25 canonical work pages · 4 internal anchors

[1]

In: Indyk, P

Babenko, M.A., Gawrychowski, P., Kociumaka, T., Starikovskaya, T.: Wavelet trees meet suffix trees. In: Indyk, P. (ed.) Proceedings of the Twenty-Sixth An- nual ACM-SIAM Symposium on Discrete Algorithms, SODA 2015, San Diego, CA, USA, January 4-6, 2015. pp. 572–591. SIAM (2015). https://doi.org/10.1137/ 1.9781611973730.39

2015
[2]

In: Gawrychowski, P., Starikovskaya, T

Belazzougui, D., Kosolobov, D., Puglisi, S.J., Raman, R.: Weighted Ancestors in Suffix Trees Revisited. In: Gawrychowski, P., Starikovskaya, T. (eds.) 32nd An- nual Symposium on Combinatorial Pattern Matching (CPM 2021). Leibniz In- ternational Proceedings in Informatics (LIPIcs), vol. 191, pp. 8:1–8:15. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, ...

work page doi:10.4230/lipics.cpm.2021.8 2021
[3]

In: Gonnet, G.H., Viola, A

Bender, M.A., Farach-Colton, M.: The LCA problem revisited. In: Gonnet, G.H., Viola, A. (eds.) LATIN 2000: Theoretical Informatics. pp. 88–94. Springer Berlin Heidelberg, Berlin, Heidelberg (2000). https://doi.org/10.1007/10719839_9

work page doi:10.1007/10719839_9 2000
[4]

Blumer, A., Blumer, J., Haussler, D., McConnell, R.M., Ehrenfeucht, A.: Complete inverted files for efficient text retrieval and analysis. J. ACM34(3), 578–595 (1987). https://doi.org/10.1145/28869.28873

work page doi:10.1145/28869.28873 1987
[5]

In: Bille, P., Prezza, N

Bonizzoni, P., Gao, Y., Riccardi, B.: Constructing suffixient arrays revisited. In: Bille, P., Prezza, N. (eds.) 37th Annual Symposium on Combinatorial Pattern Matching, CPM 2026. LIPIcs, vol. 369, pp. 30:1–30:18. Schloss Dagstuhl - Leibniz- Zentrum für Informatik (2026). https://doi.org/10.4230/LIPICS.CPM.2026.30

work page doi:10.4230/lipics.cpm.2026.30 2026
[6]

Burrows, M., Wheeler, D.: A block sorting lossless data compression algorithm. Tech. Rep. 124, Digital Equipment Corporation (1994)

1994
[7]

Cenzato, D., Depuydt, L., Gagie, T., Kim, S.H., Manzini, G., Olivares, F., Prezza, N.: Suffixient arrays: a new efficient suffix array compression technique (2025), https://arxiv.org/abs/2407.18753

work page arXiv 2025
[8]

In: Lipták, Z., de Moura, E.S., Figueroa, K., Baeza-Yates, R

Cenzato, D., Olivares, F., Prezza, N.: On computing the smallest suffixient set. In: Lipták, Z., de Moura, E.S., Figueroa, K., Baeza-Yates, R. (eds.) Proceedings of 31st International Symposium on String Processing and Information Retrieval, SPIRE 2024. Lecture Notes in Computer Science, vol. 14899, pp. 73–87 (2024). https://doi.org/10.1007/978-3-031-72200-4_6

work page doi:10.1007/978-3-031-72200-4_6 2024
[9]

Testing Suffixient Sets

Cenzato, D., Olivares, F., Prezza, N.: Testing suffixient sets (2025), https://arxiv. org/abs/2506.08225

work page internal anchor Pith review Pith/arXiv arXiv 2025
[10]

MIT Press (2009), http://mitpress.mit.edu/books/ introduction-algorithms

Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms, 3rd Edition. MIT Press (2009), http://mitpress.mit.edu/books/ introduction-algorithms

2009
[11]

CoRRabs/2312.01359(2023)

Depuydt, L., Gagie, T., Langmead, B., Manzini, G., Prezza, N.: Suffixient sets. CoRRabs/2312.01359(2023). https://doi.org/10.48550/ARXIV.2312.01359

work page doi:10.48550/arxiv.2312.01359 2023
[12]

Fujimaru, H., Navarro, G., Romana, G., Urbina, C.: Smallest suffixient sets: Effec- tiveness, resilience, and calculation (2025), https://arxiv.org/abs/2506.05638

work page internal anchor Pith review Pith/arXiv arXiv 2025
[13]

In: Soto, J.A., Wiese, A

Gagie, T., Goga, A., Jeż, A., Navarro, G.: Space-efficient conversions from SLPs. In: Soto, J.A., Wiese, A. (eds.) LATIN 2024: Theoretical Informatics - 16th Latin 14 H. Fujimaru et al. American Symposium, Part I. Lecture Notes in Computer Science, vol. 14578, pp. 146–161. Springer (2024). https://doi.org/10.1007/978-3-031-55598-5_10

work page doi:10.1007/978-3-031-55598-5_10 2024
[14]

Gagie, T., Navarro, G., Prezza, N.: Fully functional suffix trees and optimal text searching in BWT-runs bounded space. J. ACM67(1) (jan 2020). https://doi.org/ 10.1145/3375890

work page doi:10.1145/3375890 2020
[15]

In: SOFSEM

Giuliani, S., Inenaga, S., Lipták, Z., Prezza, N., Sciortino, M., Toffanello, A.: Novel results on the number of runs of the Burrows-Wheeler-transform. In: SOFSEM. LectureNotesinComputerScience,vol.12607,pp.249–262.Springer(2021).https: //doi.org/10.1007/978-3-030-67731-2_18

work page doi:10.1007/978-3-030-67731-2_18 2021
[16]

In: Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algo- rithms, January 12-14, 2003, Baltimore, Maryland, USA

Grossi, R., Gupta, A., Vitter, J.S.: High-order entropy-compressed text indexes. In: Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algo- rithms, January 12-14, 2003, Baltimore, Maryland, USA. pp. 841–850. ACM/SIAM (2003), http://dl.acm.org/citation.cfm?id=644108.644250

work page arXiv 2003
[17]

Kärkkäinen, J., Sanders, P., Burkhardt, S.: Linear work suffix array construction. J. ACM53(6), 918–936 (Nov 2006). https://doi.org/10.1145/1217856.1217858

work page doi:10.1145/1217856.1217858 2006
[18]

Kempa, D., Kociumaka, T.: String synchronizing sets: sublinear-time BWT con- structionandoptimalLCEdatastructure.In:Proceedingsofthe51stAnnualACM SIGACT Symposium on Theory of Computing. p. 756–767. STOC ’19, ACM (Jun 2019). https://doi.org/10.1145/3313276.3316368

work page doi:10.1145/3313276.3316368 2019
[19]

Approximation Algorithms for Stochastic Minimum-Norm Combinatorial Optimization , booktitle =

Kempa, D., Kociumaka, T.: Resolution of the Burrows-Wheeler Transform conjec- ture. In: Proc. 61st Annual IEEE Symposium on Foundations of Computer Science (FOCS). pp. 1014–1025 (2020). https://doi.org/10.1109/FOCS46700.2020.00097

work page doi:10.1109/focs46700.2020.00097 2020
[20]

In: Bansal, N., Nagarajan, V

Kempa, D., Kociumaka, T.: Breaking the O(n)-barrier in the construction of com- pressed suffix arrays and suffix trees. In: Bansal, N., Nagarajan, V. (eds.) Proceed- ings of the 2023 ACM-SIAM Symposium on Discrete Algorithms, SODA 2023. pp. 5122–5202. SIAM (2023). https://doi.org/10.1137/1.9781611977554.ch187

work page doi:10.1137/1.9781611977554.ch187 2023
[21]

Köppl, D., Kucherov, G.: Smallest suffixient set maintenance in near-real-time (2026), https://arxiv.org/abs/2604.27548, accepted to MFCS 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026
[22]

Nature526(7571), 68–74 (2015)

The 1000 Genomes Project Consortium: A global reference for human genetic vari- ation. Nature526(7571), 68–74 (2015). https://doi.org/10.1038/nature15393

work page doi:10.1038/nature15393 2015
[23]

Current Opinion in Genetics & Development15(6), 589–594 (2005)

Medini, D., Donati, C., Tettelin, H., Masignani, V., Rappuoli, R.: The microbial pan-genome. Current Opinion in Genetics & Development15(6), 589–594 (2005). https://doi.org/10.1016/j.gde.2005.09.006

work page doi:10.1016/j.gde.2005.09.006 2005
[24]

Navarro, G.: Wavelet trees for all. J. Discrete Algorithms25, 2–20 (2014). https: //doi.org/10.1016/J.JDA.2013.07.004

work page doi:10.1016/j.jda.2013.07.004 2014
[25]

ACM Comput

Navarro, G.: Indexing Highly Repetitive String Collections, Part I: Repetitiveness Measures. ACM Comput. Surv.54(2), 29:1–29:31 (2022). https://doi.org/10.1145/ 3434399

2022
[26]

ACM Comput

Navarro, G.: Indexing Highly Repetitive String Collections, Part II: Compressed Indexes. ACM Comput. Surv.54(2), 26:1–26:31 (2022). https://doi.org/10.1145/ 3432999

2022
[27]

In: Badkobeh, G., Radoszewski, J., Tonellotto, N., Baeza-Yates, R

Navarro, G., Romana, G., Urbina, C.: Smallest suffixient sets as a repetitiveness measure. In: Badkobeh, G., Radoszewski, J., Tonellotto, N., Baeza-Yates, R. (eds.) Proc. 32nd International Symposium on String Processing and Information Re- trieval (SPIRE 2025). Lecture Notes in Computer Science, vol. 16073, pp. 217–232. Springer (2025). https://doi.org/1...

work page doi:10.1007/978-3-032-05228-5_18 2025
[28]

Prezza, N.: Compressed Computation for Text Indexing. Ph.D. thesis, University of Udine (2016)

2016
[29]

Algorithmica65(3), 685–709 (2013)

Raskhodnikova, S., Ron, D., Rubinfeld, R., Smith, A.D.: Sublinear algorithms for approximating string compressibility. Algorithmica65(3), 685–709 (2013). https: //doi.org/10.1007/S00453-012-9618-6 Computing Smallest Suffixient Arrays in Sublinear Time 15

work page doi:10.1007/s00453-012-9618-6 2013
[30]

Shibata, H., Bannai, H.: String representation in suffixient set size space (2026), https://arxiv.org/abs/2604.04377

work page internal anchor Pith review Pith/arXiv arXiv 2026
[31]

Nature604(7906), 437–446 (2022)

Wang, T., Antonacci-Fulton, L., Howe, K., Lawson, H.A., Lucas, J.K., Phillippy, A.M., Popejoy, A.B., Asri, M., Carson, C., Chaisson, M.J., et al.: The human pangenome project: a global resource to map genomic diversity. Nature604(7906), 437–446 (2022). https://doi.org/10.1038/s41586-022-04601-8 A Sublinear time adaptation of Cenzato et al.’s algorithm Cen...

work page doi:10.1038/s41586-022-04601-8 2022

[1] [1]

In: Indyk, P

Babenko, M.A., Gawrychowski, P., Kociumaka, T., Starikovskaya, T.: Wavelet trees meet suffix trees. In: Indyk, P. (ed.) Proceedings of the Twenty-Sixth An- nual ACM-SIAM Symposium on Discrete Algorithms, SODA 2015, San Diego, CA, USA, January 4-6, 2015. pp. 572–591. SIAM (2015). https://doi.org/10.1137/ 1.9781611973730.39

2015

[2] [2]

In: Gawrychowski, P., Starikovskaya, T

Belazzougui, D., Kosolobov, D., Puglisi, S.J., Raman, R.: Weighted Ancestors in Suffix Trees Revisited. In: Gawrychowski, P., Starikovskaya, T. (eds.) 32nd An- nual Symposium on Combinatorial Pattern Matching (CPM 2021). Leibniz In- ternational Proceedings in Informatics (LIPIcs), vol. 191, pp. 8:1–8:15. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, ...

work page doi:10.4230/lipics.cpm.2021.8 2021

[3] [3]

In: Gonnet, G.H., Viola, A

Bender, M.A., Farach-Colton, M.: The LCA problem revisited. In: Gonnet, G.H., Viola, A. (eds.) LATIN 2000: Theoretical Informatics. pp. 88–94. Springer Berlin Heidelberg, Berlin, Heidelberg (2000). https://doi.org/10.1007/10719839_9

work page doi:10.1007/10719839_9 2000

[4] [4]

Blumer, A., Blumer, J., Haussler, D., McConnell, R.M., Ehrenfeucht, A.: Complete inverted files for efficient text retrieval and analysis. J. ACM34(3), 578–595 (1987). https://doi.org/10.1145/28869.28873

work page doi:10.1145/28869.28873 1987

[5] [5]

In: Bille, P., Prezza, N

Bonizzoni, P., Gao, Y., Riccardi, B.: Constructing suffixient arrays revisited. In: Bille, P., Prezza, N. (eds.) 37th Annual Symposium on Combinatorial Pattern Matching, CPM 2026. LIPIcs, vol. 369, pp. 30:1–30:18. Schloss Dagstuhl - Leibniz- Zentrum für Informatik (2026). https://doi.org/10.4230/LIPICS.CPM.2026.30

work page doi:10.4230/lipics.cpm.2026.30 2026

[6] [6]

Burrows, M., Wheeler, D.: A block sorting lossless data compression algorithm. Tech. Rep. 124, Digital Equipment Corporation (1994)

1994

[7] [7]

Cenzato, D., Depuydt, L., Gagie, T., Kim, S.H., Manzini, G., Olivares, F., Prezza, N.: Suffixient arrays: a new efficient suffix array compression technique (2025), https://arxiv.org/abs/2407.18753

work page arXiv 2025

[8] [8]

In: Lipták, Z., de Moura, E.S., Figueroa, K., Baeza-Yates, R

Cenzato, D., Olivares, F., Prezza, N.: On computing the smallest suffixient set. In: Lipták, Z., de Moura, E.S., Figueroa, K., Baeza-Yates, R. (eds.) Proceedings of 31st International Symposium on String Processing and Information Retrieval, SPIRE 2024. Lecture Notes in Computer Science, vol. 14899, pp. 73–87 (2024). https://doi.org/10.1007/978-3-031-72200-4_6

work page doi:10.1007/978-3-031-72200-4_6 2024

[9] [9]

Testing Suffixient Sets

Cenzato, D., Olivares, F., Prezza, N.: Testing suffixient sets (2025), https://arxiv. org/abs/2506.08225

work page internal anchor Pith review Pith/arXiv arXiv 2025

[10] [10]

MIT Press (2009), http://mitpress.mit.edu/books/ introduction-algorithms

Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms, 3rd Edition. MIT Press (2009), http://mitpress.mit.edu/books/ introduction-algorithms

2009

[11] [11]

CoRRabs/2312.01359(2023)

Depuydt, L., Gagie, T., Langmead, B., Manzini, G., Prezza, N.: Suffixient sets. CoRRabs/2312.01359(2023). https://doi.org/10.48550/ARXIV.2312.01359

work page doi:10.48550/arxiv.2312.01359 2023

[12] [12]

Fujimaru, H., Navarro, G., Romana, G., Urbina, C.: Smallest suffixient sets: Effec- tiveness, resilience, and calculation (2025), https://arxiv.org/abs/2506.05638

work page internal anchor Pith review Pith/arXiv arXiv 2025

[13] [13]

In: Soto, J.A., Wiese, A

Gagie, T., Goga, A., Jeż, A., Navarro, G.: Space-efficient conversions from SLPs. In: Soto, J.A., Wiese, A. (eds.) LATIN 2024: Theoretical Informatics - 16th Latin 14 H. Fujimaru et al. American Symposium, Part I. Lecture Notes in Computer Science, vol. 14578, pp. 146–161. Springer (2024). https://doi.org/10.1007/978-3-031-55598-5_10

work page doi:10.1007/978-3-031-55598-5_10 2024

[14] [14]

Gagie, T., Navarro, G., Prezza, N.: Fully functional suffix trees and optimal text searching in BWT-runs bounded space. J. ACM67(1) (jan 2020). https://doi.org/ 10.1145/3375890

work page doi:10.1145/3375890 2020

[15] [15]

In: SOFSEM

Giuliani, S., Inenaga, S., Lipták, Z., Prezza, N., Sciortino, M., Toffanello, A.: Novel results on the number of runs of the Burrows-Wheeler-transform. In: SOFSEM. LectureNotesinComputerScience,vol.12607,pp.249–262.Springer(2021).https: //doi.org/10.1007/978-3-030-67731-2_18

work page doi:10.1007/978-3-030-67731-2_18 2021

[16] [16]

In: Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algo- rithms, January 12-14, 2003, Baltimore, Maryland, USA

Grossi, R., Gupta, A., Vitter, J.S.: High-order entropy-compressed text indexes. In: Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algo- rithms, January 12-14, 2003, Baltimore, Maryland, USA. pp. 841–850. ACM/SIAM (2003), http://dl.acm.org/citation.cfm?id=644108.644250

work page arXiv 2003

[17] [17]

Kärkkäinen, J., Sanders, P., Burkhardt, S.: Linear work suffix array construction. J. ACM53(6), 918–936 (Nov 2006). https://doi.org/10.1145/1217856.1217858

work page doi:10.1145/1217856.1217858 2006

[18] [18]

Kempa, D., Kociumaka, T.: String synchronizing sets: sublinear-time BWT con- structionandoptimalLCEdatastructure.In:Proceedingsofthe51stAnnualACM SIGACT Symposium on Theory of Computing. p. 756–767. STOC ’19, ACM (Jun 2019). https://doi.org/10.1145/3313276.3316368

work page doi:10.1145/3313276.3316368 2019

[19] [19]

Approximation Algorithms for Stochastic Minimum-Norm Combinatorial Optimization , booktitle =

Kempa, D., Kociumaka, T.: Resolution of the Burrows-Wheeler Transform conjec- ture. In: Proc. 61st Annual IEEE Symposium on Foundations of Computer Science (FOCS). pp. 1014–1025 (2020). https://doi.org/10.1109/FOCS46700.2020.00097

work page doi:10.1109/focs46700.2020.00097 2020

[20] [20]

In: Bansal, N., Nagarajan, V

Kempa, D., Kociumaka, T.: Breaking the O(n)-barrier in the construction of com- pressed suffix arrays and suffix trees. In: Bansal, N., Nagarajan, V. (eds.) Proceed- ings of the 2023 ACM-SIAM Symposium on Discrete Algorithms, SODA 2023. pp. 5122–5202. SIAM (2023). https://doi.org/10.1137/1.9781611977554.ch187

work page doi:10.1137/1.9781611977554.ch187 2023

[21] [21]

Köppl, D., Kucherov, G.: Smallest suffixient set maintenance in near-real-time (2026), https://arxiv.org/abs/2604.27548, accepted to MFCS 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026

[22] [22]

Nature526(7571), 68–74 (2015)

The 1000 Genomes Project Consortium: A global reference for human genetic vari- ation. Nature526(7571), 68–74 (2015). https://doi.org/10.1038/nature15393

work page doi:10.1038/nature15393 2015

[23] [23]

Current Opinion in Genetics & Development15(6), 589–594 (2005)

Medini, D., Donati, C., Tettelin, H., Masignani, V., Rappuoli, R.: The microbial pan-genome. Current Opinion in Genetics & Development15(6), 589–594 (2005). https://doi.org/10.1016/j.gde.2005.09.006

work page doi:10.1016/j.gde.2005.09.006 2005

[24] [24]

Navarro, G.: Wavelet trees for all. J. Discrete Algorithms25, 2–20 (2014). https: //doi.org/10.1016/J.JDA.2013.07.004

work page doi:10.1016/j.jda.2013.07.004 2014

[25] [25]

ACM Comput

Navarro, G.: Indexing Highly Repetitive String Collections, Part I: Repetitiveness Measures. ACM Comput. Surv.54(2), 29:1–29:31 (2022). https://doi.org/10.1145/ 3434399

2022

[26] [26]

ACM Comput

Navarro, G.: Indexing Highly Repetitive String Collections, Part II: Compressed Indexes. ACM Comput. Surv.54(2), 26:1–26:31 (2022). https://doi.org/10.1145/ 3432999

2022

[27] [27]

In: Badkobeh, G., Radoszewski, J., Tonellotto, N., Baeza-Yates, R

Navarro, G., Romana, G., Urbina, C.: Smallest suffixient sets as a repetitiveness measure. In: Badkobeh, G., Radoszewski, J., Tonellotto, N., Baeza-Yates, R. (eds.) Proc. 32nd International Symposium on String Processing and Information Re- trieval (SPIRE 2025). Lecture Notes in Computer Science, vol. 16073, pp. 217–232. Springer (2025). https://doi.org/1...

work page doi:10.1007/978-3-032-05228-5_18 2025

[28] [28]

Prezza, N.: Compressed Computation for Text Indexing. Ph.D. thesis, University of Udine (2016)

2016

[29] [29]

Algorithmica65(3), 685–709 (2013)

Raskhodnikova, S., Ron, D., Rubinfeld, R., Smith, A.D.: Sublinear algorithms for approximating string compressibility. Algorithmica65(3), 685–709 (2013). https: //doi.org/10.1007/S00453-012-9618-6 Computing Smallest Suffixient Arrays in Sublinear Time 15

work page doi:10.1007/s00453-012-9618-6 2013

[30] [30]

Shibata, H., Bannai, H.: String representation in suffixient set size space (2026), https://arxiv.org/abs/2604.04377

work page internal anchor Pith review Pith/arXiv arXiv 2026

[31] [31]

Nature604(7906), 437–446 (2022)

Wang, T., Antonacci-Fulton, L., Howe, K., Lawson, H.A., Lucas, J.K., Phillippy, A.M., Popejoy, A.B., Asri, M., Carson, C., Chaisson, M.J., et al.: The human pangenome project: a global resource to map genomic diversity. Nature604(7906), 437–446 (2022). https://doi.org/10.1038/s41586-022-04601-8 A Sublinear time adaptation of Cenzato et al.’s algorithm Cen...

work page doi:10.1038/s41586-022-04601-8 2022