Smallest suffixient set maintenance in near-real-time

Dominik K\"oppl; Gregory Kucherov

arxiv: 2604.27548 · v1 · submitted 2026-04-30 · 💻 cs.DS

Smallest suffixient set maintenance in near-real-time

Dominik K\"oppl , Gregory Kucherov This is my paper

Pith reviewed 2026-05-07 08:38 UTC · model grok-4.3

classification 💻 cs.DS

keywords smallest suffixient setonline maintenancestring repetitivenessWeiner suffix treenear-real-timedata structuresstreaming textincremental algorithms

0 comments

The pith

The smallest suffixient set of a string can be maintained online with polyloglog time per letter using Weiner's suffix tree.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper shows how to keep the smallest suffixient set up to date as a string grows letter by letter. The set measures the repetitiveness of the string by identifying the fewest positions needed to capture its suffix structure. Maintaining it efficiently matters for real-time applications like compression of streaming data or analyzing repetitive sequences without delay. The method works for text added from either end and relies on efficient updates to a suffix tree.

Core claim

We study how to maintain the smallest suffixient set online in near-real-time, that is with small (in our case, polyloglog) worst-case time on processing each letter. Two frameworks are considered: when the text is given letter-by-letter in either a right-to-left or left-to-right direction. Our central algorithmic tool is Weiner's suffix tree algorithm and associated algorithmic primitives for its efficient implementation.

What carries the argument

Weiner's suffix tree algorithm and its associated algorithmic primitives, which support incremental updates to track the smallest suffixient set.

If this is right

The repetitiveness measure becomes available for streaming strings without waiting for the full text.
Applications in compression and pattern matching can operate in near real time on growing inputs.
Both right-to-left and left-to-right arrival directions receive the same polyloglog bound.
The approach yields a dynamic structure that reports the size of the smallest suffixient set after each update.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same incremental primitives might extend to maintaining other repetitiveness measures such as the smallest grammar size online.
Integration with fully dynamic suffix trees could enable queries on the evolving set without restarting construction.
Practical implementations on repetitive data like genomes could test whether the polyloglog bound yields observable speedups over rebuilding from scratch.
The technique might combine with online compression schemes to produce adaptive compressors that track repetitiveness continuously.

Load-bearing premise

Weiner's suffix tree algorithm and its primitives can be adapted to support online maintenance with only polyloglog time per letter without additional factors depending on alphabet size or other hidden costs.

What would settle it

A concrete counterexample string where any suffix-tree-based maintenance of the smallest suffixient set requires more than polyloglog time per letter in the worst case would disprove the efficiency result.

read the original abstract

The size of the \textit{smallest suffixient set} of positions of a string recently emerged as a new measure of string \textit{repetitiveness} -- a measure reflecting how much of repetitive content the string contains. We study how to maintain the smallest suffixient set online in near-real-time, that is with small (in our case, polyloglog) worst-case time on processing each letter. Two frameworks are considered: when the text is given letter-by-letter in either a right-to-left or left-to-right direction. Our central algorithmic tool is Weiner's suffix tree algorithm and associated algorithmic primitives for its efficient implementation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives the first online polyloglog-time maintenance for the smallest suffixient set using adapted Weiner suffix trees, though the time analysis may need tighter bounds on alphabet-dependent costs.

read the letter

This paper presents algorithms for maintaining the smallest suffixient set of a string as it is built letter by letter, with a claimed polyloglog worst-case time per letter. They handle both right-to-left and left-to-right online settings. The new part is applying this to the recently introduced smallest suffixient set measure. Prior work on the measure was static, so this online version is a natural next step. They use Weiner's suffix tree algorithm as the central tool, which makes sense because it builds the tree incrementally and can be extended to track the set of positions that form the smallest suffixient set. What works well is the use of established algorithmic primitives. Weiner's method is well-understood, and adapting it for two directions shows some flexibility. The absence of circular reasoning is good; everything rests on suffix tree operations rather than assumptions about the measure itself. The soft spot is in the time bound. The stress test raises a fair point about potential hidden factors from suffix link traversals or dynamic tree operations. If the paper uses hashing or assumes constant alphabet, that should be stated explicitly. Standard implementations can pick up log factors or |Σ| dependencies, so the polyloglog guarantee needs a clear breakdown of each operation's cost, like how they handle suffix link walks without linear total time. If the full paper has those details and lemmas, it strengthens the result; otherwise, it leaves room for doubt in general alphabets. This work is mainly for researchers in stringology and data structures who care about repetitiveness measures. A reader working on online string algorithms or compressed data structures would get value from the techniques. It deserves a serious referee because the problem is well-defined, the approach is grounded in prior art, and the result, if the analysis holds, adds to the toolkit for dynamic string problems. I recommend sending it to peer review. A referee can verify the implementation of the primitives and confirm the time bounds under the stated model.

Referee Report

2 major / 1 minor

Summary. The paper studies online maintenance of the smallest suffixient set (a repetitiveness measure) for a string processed letter-by-letter, either right-to-left or left-to-right, targeting polyloglog worst-case time per letter. The central tool is an adaptation of Weiner's suffix tree algorithm together with its associated algorithmic primitives.

Significance. If the polyloglog bound can be established rigorously, the result would advance streaming algorithms for string repetitiveness measures, with potential uses in real-time compression and data processing. The grounding in Weiner primitives is a methodological strength when the implementation details are supplied.

major comments (2)

[Abstract] Abstract: the polyloglog time bound is asserted without any derivation, proof sketch, or identification of the specific primitives (e.g., constant-time suffix-link jumps or dynamic-tree height) that would eliminate the linear total cost of suffix-link traversals present in standard Weiner implementations.
[Central algorithmic tool] Central algorithmic tool paragraph: the claim that Weiner primitives support online maintenance with only polyloglog cost per letter does not address potential |Σ|-dependent or word-RAM log-log factors in suffix-link navigation or dynamic tree operations; no lemma isolates the exact per-operation cost (e.g., “suffix-link traversal in O((log log n)^c)”).

minor comments (1)

The term 'suffixient' should be defined on first use and its relation to existing repetitiveness measures (e.g., run-length or LZ) clarified for readers outside the immediate sub-area.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and for highlighting the need for greater clarity on the time-complexity claims. We agree that the abstract and central-tool paragraph would be strengthened by explicit identification of the primitives and a high-level cost analysis. We will revise the manuscript to address both points while preserving the core technical contribution.

read point-by-point responses

Referee: [Abstract] Abstract: the polyloglog time bound is asserted without any derivation, proof sketch, or identification of the specific primitives (e.g., constant-time suffix-link jumps or dynamic-tree height) that would eliminate the linear total cost of suffix-link traversals present in standard Weiner implementations.

Authors: We acknowledge that the abstract currently states the polyloglog bound without a supporting sketch. The full manuscript derives the bound from Weiner primitives that replace naïve suffix-link traversals with constant-time jumps (via precomputed or dynamically maintained link arrays) and dynamic-tree height operations whose cost is bounded by the inverse Ackermann function or polyloglog factors in the word-RAM model. In the revision we will add a concise proof outline to the abstract and a dedicated paragraph in the introduction that names these primitives and explains why the total per-letter cost remains polyloglog rather than linear. revision: yes
Referee: [Central algorithmic tool] Central algorithmic tool paragraph: the claim that Weiner primitives support online maintenance with only polyloglog cost per letter does not address potential |Σ|-dependent or word-RAM log-log factors in suffix-link navigation or dynamic tree operations; no lemma isolates the exact per-operation cost (e.g., “suffix-link traversal in O((log log n)^c)”).

Authors: The paragraph introduces the primitives but does not isolate their costs. We will insert a new lemma (placed immediately after the description of the adapted Weiner algorithm) that states: each letter insertion performs a constant number of suffix-link jumps and dynamic-tree height queries, each executable in O((log log n)^O(1)) time under the word-RAM model with |Σ| treated as a constant or handled via standard alphabet-reduction techniques. The lemma will cite the known polyloglog implementations of Weiner’s algorithm and explicitly bound the navigation cost, thereby eliminating any hidden linear factors. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation relies on independent algorithmic primitives

full rationale

The paper constructs an online maintenance algorithm for the smallest suffixient set by adapting Weiner's suffix tree construction (a 1973 result external to the authors) and its associated primitives for right-to-left or left-to-right letter-by-letter processing. The polyloglog time bound is presented as a consequence of this adaptation rather than being presupposed or fitted from the target output. No equations or steps reduce the claimed result to a self-definition, a renamed input, or a load-bearing self-citation chain; the central tool is cited as established rather than derived within the paper. The skeptic concern addresses potential hidden implementation costs (correctness risk) but does not identify any reduction of the derivation to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The approach relies on standard suffix tree algorithms as the central tool but extends them to the online setting; no free parameters or new entities are introduced based on the abstract.

axioms (1)

domain assumption Weiner's suffix tree algorithm and its associated algorithmic primitives can be efficiently adapted for online, incremental maintenance in polyloglog time
This is explicitly stated as the central algorithmic tool in the abstract.

pith-pipeline@v0.9.0 · 5397 in / 1243 out tokens · 41806 ms · 2026-05-07T08:38:44.037593+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

23 extracted references · 8 canonical work pages · 2 internal anchors

[1]

Italiano

Dany Breslauer and Giuseppe F. Italiano. Near real-time suffix tree construction via the fringe marked ancestor problem. J. Discrete Algorithms , 18:32--48, 2013

2013
[2]

Demaine, J

Andrej Brodnik, Svante Carlsson, Erik D. Demaine, J. Ian Munro, and Robert Sedgewick. Resizable arrays in optimal time and space. In Proc.\ WADS , volume 1663 of LNCS , pages 37--48, 1999

1999
[3]

Suffixient arrays: a new efficient suffix array compression technique.arXiv preprint 2407.18753v2, 2025

Davide Cenzato, Lore Depuydt, Travis Gagie, Sung-Hwan Kim, Giovanni Manzini, Francisco Olivares, and Nicola Prezza. Suffixient arrays: a new efficient suffix array compression technique. CoRR , abs/2407.18753, 2025

work page arXiv 2025
[4]

On computing the smallest suffixient set

Davide Cenzato, Francisco Olivares, and Nicola Prezza. On computing the smallest suffixient set. In Zsuzsanna Lipt \' a k, Edleno Silva de Moura, Karina Figueroa, and Ricardo Baeza - Yates, editors, String Processing and Information Retrieval - 31st International Symposium, SPIRE 2024, Puerto Vallarta, Mexico, September 23-25, 2024, Proceedings , volume 1...

2024
[5]

Testing suffixient sets

Davide Cenzato, Francisco Olivares, and Nicola Prezza. Testing suffixient sets. CoRR , abs/2506.08225, 2025

work page arXiv 2025
[6]

Optimal-time dictionary-compressed indexes

Anders Roy Christiansen, Mikko Berggren Ettienne, Tomasz Kociumaka, Gonzalo Navarro, and Nicola Prezza. Optimal-time dictionary-compressed indexes. ACM Trans. Algorithms , 17(1), December 2021

2021
[7]

Dynamic LCA queries on trees

Richard Cole and Ramesh Hariharan. Dynamic LCA queries on trees. SIAM J. Comput. , 34(4):894--923, 2005

2005
[8]

Suf- fixient sets.arXiv preprint 2312.01359v3, 2024

Lore Depuydt, Travis Gagie, Ben Langmead, Giovanni Manzini, and Nicola Prezza. Suffixient sets. CoRR , abs/2312.01359, 2023

work page arXiv 2023
[9]

Alphabet-dependent string searching with wexponential search trees

Johannes Fischer and Pawel Gawrychowski. Alphabet-dependent string searching with wexponential search trees. CoRR , abs/1302.3347, 2013

work page arXiv 2013
[10]

Optimal dynamic vertical ray shooting in rectilinear planar subdivisions

Yoav Giyora and Haim Kaplan. Optimal dynamic vertical ray shooting in rectilinear planar subdivisions. ACM Trans. Algorithms , 5(3):28:1--28:51, 2009

2009
[11]

Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology

Dan Gusfield. Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology . Cambridge University Press, 1997

1997
[12]

At the roots of dictionary compression: string attractors

Dominik Kempa and Nicola Prezza. At the roots of dictionary compression: string attractors. In Proc.\ STOC , pages 827--840, 2018

2018
[13]

Near-real-time solutions for online string problems

Dominik K\"oppl and Gregory Kucherov. Near-real-time solutions for online string problems. CoRR , abs/2602.15311, 2026. to appear in CPM 2026

work page arXiv 2026
[14]

Full-fledged real-time indexing for constant size alphabets

Gregory Kucherov and Yakov Nekrich. Full-fledged real-time indexing for constant size alphabets. Algorithmica , 79(2):387--400, 2017

2017
[15]

Online computation of normalized substring complexity

Gregory Kucherov and Yakov Nekrich. Online computation of normalized substring complexity. CoRR , abs/2510.16454, 2025. appeared in LATIN 2026

work page arXiv 2025
[16]

Fully dynamic orthogonal range reporting on RAM

Christian Worm Mortensen. Fully dynamic orthogonal range reporting on RAM . SIAM J. Comput. , 35(6):1494--1525, 2006

2006
[17]

Indexing highly repetitive string collections, part I: repetitiveness measures

Gonzalo Navarro. Indexing highly repetitive string collections, part I: repetitiveness measures. ACM Comput. Surv. , 54(2):29:1--29:31, 2021

2021
[18]

Indexing highly repetitive string collections, part II: compressed indexes

Gonzalo Navarro. Indexing highly repetitive string collections, part II: compressed indexes. ACM Comput. Surv. , 54(2):26:1--26:32, 2021

2021
[19]

Smallest suffixient sets as a repetitiveness measure

Gonzalo Navarro, Giuseppe Romana, and Cristian Urbina. Smallest suffixient sets as a repetitiveness measure. In Golnaz Badkobeh, Jakub Radoszewski, Nicola Tonellotto, and Ricardo Baeza - Yates, editors, String Processing and Information Retrieval - 32nd International Symposium, SPIRE 2025, London, UK, September 8-11, 2025, Proceedings , volume 16073 of Le...

2025
[20]

Smallest Suffixient Sets: Effectiveness, Resilience, and Calculation

Gonzalo Navarro, Giuseppe Romana, and Cristian Urbina. Smallest suffixient sets: Effectiveness, resilience, and calculation. CoRR , abs/2506.05638v4, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[21]

Sofya Raskhodnikova, Dana Ron, Ronitt Rubinfeld, and Adam D. Smith. Sublinear algorithms for approximating string compressibility. Algorithmica , 65(3):685--709, 2013

2013
[22]

String Representation in Suffixient Set Size Space

Hiroki Shibata and Hideo Bannai. String representation in suffixient set size space. CoRR , abs/2604.04377, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026
[23]

Linear pattern matching algorithms

Peter Weiner. Linear pattern matching algorithms. In Proc. 14th Symposium on Switching and Automata Theory (SWAT) , pages 1--11, 1973

1973

[1] [1]

Italiano

Dany Breslauer and Giuseppe F. Italiano. Near real-time suffix tree construction via the fringe marked ancestor problem. J. Discrete Algorithms , 18:32--48, 2013

2013

[2] [2]

Demaine, J

Andrej Brodnik, Svante Carlsson, Erik D. Demaine, J. Ian Munro, and Robert Sedgewick. Resizable arrays in optimal time and space. In Proc.\ WADS , volume 1663 of LNCS , pages 37--48, 1999

1999

[3] [3]

Suffixient arrays: a new efficient suffix array compression technique.arXiv preprint 2407.18753v2, 2025

Davide Cenzato, Lore Depuydt, Travis Gagie, Sung-Hwan Kim, Giovanni Manzini, Francisco Olivares, and Nicola Prezza. Suffixient arrays: a new efficient suffix array compression technique. CoRR , abs/2407.18753, 2025

work page arXiv 2025

[4] [4]

On computing the smallest suffixient set

Davide Cenzato, Francisco Olivares, and Nicola Prezza. On computing the smallest suffixient set. In Zsuzsanna Lipt \' a k, Edleno Silva de Moura, Karina Figueroa, and Ricardo Baeza - Yates, editors, String Processing and Information Retrieval - 31st International Symposium, SPIRE 2024, Puerto Vallarta, Mexico, September 23-25, 2024, Proceedings , volume 1...

2024

[5] [5]

Testing suffixient sets

Davide Cenzato, Francisco Olivares, and Nicola Prezza. Testing suffixient sets. CoRR , abs/2506.08225, 2025

work page arXiv 2025

[6] [6]

Optimal-time dictionary-compressed indexes

Anders Roy Christiansen, Mikko Berggren Ettienne, Tomasz Kociumaka, Gonzalo Navarro, and Nicola Prezza. Optimal-time dictionary-compressed indexes. ACM Trans. Algorithms , 17(1), December 2021

2021

[7] [7]

Dynamic LCA queries on trees

Richard Cole and Ramesh Hariharan. Dynamic LCA queries on trees. SIAM J. Comput. , 34(4):894--923, 2005

2005

[8] [8]

Suf- fixient sets.arXiv preprint 2312.01359v3, 2024

Lore Depuydt, Travis Gagie, Ben Langmead, Giovanni Manzini, and Nicola Prezza. Suffixient sets. CoRR , abs/2312.01359, 2023

work page arXiv 2023

[9] [9]

Alphabet-dependent string searching with wexponential search trees

Johannes Fischer and Pawel Gawrychowski. Alphabet-dependent string searching with wexponential search trees. CoRR , abs/1302.3347, 2013

work page arXiv 2013

[10] [10]

Optimal dynamic vertical ray shooting in rectilinear planar subdivisions

Yoav Giyora and Haim Kaplan. Optimal dynamic vertical ray shooting in rectilinear planar subdivisions. ACM Trans. Algorithms , 5(3):28:1--28:51, 2009

2009

[11] [11]

Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology

Dan Gusfield. Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology . Cambridge University Press, 1997

1997

[12] [12]

At the roots of dictionary compression: string attractors

Dominik Kempa and Nicola Prezza. At the roots of dictionary compression: string attractors. In Proc.\ STOC , pages 827--840, 2018

2018

[13] [13]

Near-real-time solutions for online string problems

Dominik K\"oppl and Gregory Kucherov. Near-real-time solutions for online string problems. CoRR , abs/2602.15311, 2026. to appear in CPM 2026

work page arXiv 2026

[14] [14]

Full-fledged real-time indexing for constant size alphabets

Gregory Kucherov and Yakov Nekrich. Full-fledged real-time indexing for constant size alphabets. Algorithmica , 79(2):387--400, 2017

2017

[15] [15]

Online computation of normalized substring complexity

Gregory Kucherov and Yakov Nekrich. Online computation of normalized substring complexity. CoRR , abs/2510.16454, 2025. appeared in LATIN 2026

work page arXiv 2025

[16] [16]

Fully dynamic orthogonal range reporting on RAM

Christian Worm Mortensen. Fully dynamic orthogonal range reporting on RAM . SIAM J. Comput. , 35(6):1494--1525, 2006

2006

[17] [17]

Indexing highly repetitive string collections, part I: repetitiveness measures

Gonzalo Navarro. Indexing highly repetitive string collections, part I: repetitiveness measures. ACM Comput. Surv. , 54(2):29:1--29:31, 2021

2021

[18] [18]

Indexing highly repetitive string collections, part II: compressed indexes

Gonzalo Navarro. Indexing highly repetitive string collections, part II: compressed indexes. ACM Comput. Surv. , 54(2):26:1--26:32, 2021

2021

[19] [19]

Smallest suffixient sets as a repetitiveness measure

Gonzalo Navarro, Giuseppe Romana, and Cristian Urbina. Smallest suffixient sets as a repetitiveness measure. In Golnaz Badkobeh, Jakub Radoszewski, Nicola Tonellotto, and Ricardo Baeza - Yates, editors, String Processing and Information Retrieval - 32nd International Symposium, SPIRE 2025, London, UK, September 8-11, 2025, Proceedings , volume 16073 of Le...

2025

[20] [20]

Smallest Suffixient Sets: Effectiveness, Resilience, and Calculation

Gonzalo Navarro, Giuseppe Romana, and Cristian Urbina. Smallest suffixient sets: Effectiveness, resilience, and calculation. CoRR , abs/2506.05638v4, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[21] [21]

Sofya Raskhodnikova, Dana Ron, Ronitt Rubinfeld, and Adam D. Smith. Sublinear algorithms for approximating string compressibility. Algorithmica , 65(3):685--709, 2013

2013

[22] [22]

String Representation in Suffixient Set Size Space

Hiroki Shibata and Hideo Bannai. String representation in suffixient set size space. CoRR , abs/2604.04377, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026

[23] [23]

Linear pattern matching algorithms

Peter Weiner. Linear pattern matching algorithms. In Proc. 14th Symposium on Switching and Automata Theory (SWAT) , pages 1--11, 1973

1973