Smallest suffixient set maintenance in near-real-time
Pith reviewed 2026-05-07 08:38 UTC · model grok-4.3
The pith
The smallest suffixient set of a string can be maintained online with polyloglog time per letter using Weiner's suffix tree.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We study how to maintain the smallest suffixient set online in near-real-time, that is with small (in our case, polyloglog) worst-case time on processing each letter. Two frameworks are considered: when the text is given letter-by-letter in either a right-to-left or left-to-right direction. Our central algorithmic tool is Weiner's suffix tree algorithm and associated algorithmic primitives for its efficient implementation.
What carries the argument
Weiner's suffix tree algorithm and its associated algorithmic primitives, which support incremental updates to track the smallest suffixient set.
If this is right
- The repetitiveness measure becomes available for streaming strings without waiting for the full text.
- Applications in compression and pattern matching can operate in near real time on growing inputs.
- Both right-to-left and left-to-right arrival directions receive the same polyloglog bound.
- The approach yields a dynamic structure that reports the size of the smallest suffixient set after each update.
Where Pith is reading between the lines
- The same incremental primitives might extend to maintaining other repetitiveness measures such as the smallest grammar size online.
- Integration with fully dynamic suffix trees could enable queries on the evolving set without restarting construction.
- Practical implementations on repetitive data like genomes could test whether the polyloglog bound yields observable speedups over rebuilding from scratch.
- The technique might combine with online compression schemes to produce adaptive compressors that track repetitiveness continuously.
Load-bearing premise
Weiner's suffix tree algorithm and its primitives can be adapted to support online maintenance with only polyloglog time per letter without additional factors depending on alphabet size or other hidden costs.
What would settle it
A concrete counterexample string where any suffix-tree-based maintenance of the smallest suffixient set requires more than polyloglog time per letter in the worst case would disprove the efficiency result.
read the original abstract
The size of the \textit{smallest suffixient set} of positions of a string recently emerged as a new measure of string \textit{repetitiveness} -- a measure reflecting how much of repetitive content the string contains. We study how to maintain the smallest suffixient set online in near-real-time, that is with small (in our case, polyloglog) worst-case time on processing each letter. Two frameworks are considered: when the text is given letter-by-letter in either a right-to-left or left-to-right direction. Our central algorithmic tool is Weiner's suffix tree algorithm and associated algorithmic primitives for its efficient implementation.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper studies online maintenance of the smallest suffixient set (a repetitiveness measure) for a string processed letter-by-letter, either right-to-left or left-to-right, targeting polyloglog worst-case time per letter. The central tool is an adaptation of Weiner's suffix tree algorithm together with its associated algorithmic primitives.
Significance. If the polyloglog bound can be established rigorously, the result would advance streaming algorithms for string repetitiveness measures, with potential uses in real-time compression and data processing. The grounding in Weiner primitives is a methodological strength when the implementation details are supplied.
major comments (2)
- [Abstract] Abstract: the polyloglog time bound is asserted without any derivation, proof sketch, or identification of the specific primitives (e.g., constant-time suffix-link jumps or dynamic-tree height) that would eliminate the linear total cost of suffix-link traversals present in standard Weiner implementations.
- [Central algorithmic tool] Central algorithmic tool paragraph: the claim that Weiner primitives support online maintenance with only polyloglog cost per letter does not address potential |Σ|-dependent or word-RAM log-log factors in suffix-link navigation or dynamic tree operations; no lemma isolates the exact per-operation cost (e.g., “suffix-link traversal in O((log log n)^c)”).
minor comments (1)
- The term 'suffixient' should be defined on first use and its relation to existing repetitiveness measures (e.g., run-length or LZ) clarified for readers outside the immediate sub-area.
Simulated Author's Rebuttal
We thank the referee for the careful reading and for highlighting the need for greater clarity on the time-complexity claims. We agree that the abstract and central-tool paragraph would be strengthened by explicit identification of the primitives and a high-level cost analysis. We will revise the manuscript to address both points while preserving the core technical contribution.
read point-by-point responses
-
Referee: [Abstract] Abstract: the polyloglog time bound is asserted without any derivation, proof sketch, or identification of the specific primitives (e.g., constant-time suffix-link jumps or dynamic-tree height) that would eliminate the linear total cost of suffix-link traversals present in standard Weiner implementations.
Authors: We acknowledge that the abstract currently states the polyloglog bound without a supporting sketch. The full manuscript derives the bound from Weiner primitives that replace naïve suffix-link traversals with constant-time jumps (via precomputed or dynamically maintained link arrays) and dynamic-tree height operations whose cost is bounded by the inverse Ackermann function or polyloglog factors in the word-RAM model. In the revision we will add a concise proof outline to the abstract and a dedicated paragraph in the introduction that names these primitives and explains why the total per-letter cost remains polyloglog rather than linear. revision: yes
-
Referee: [Central algorithmic tool] Central algorithmic tool paragraph: the claim that Weiner primitives support online maintenance with only polyloglog cost per letter does not address potential |Σ|-dependent or word-RAM log-log factors in suffix-link navigation or dynamic tree operations; no lemma isolates the exact per-operation cost (e.g., “suffix-link traversal in O((log log n)^c)”).
Authors: The paragraph introduces the primitives but does not isolate their costs. We will insert a new lemma (placed immediately after the description of the adapted Weiner algorithm) that states: each letter insertion performs a constant number of suffix-link jumps and dynamic-tree height queries, each executable in O((log log n)^O(1)) time under the word-RAM model with |Σ| treated as a constant or handled via standard alphabet-reduction techniques. The lemma will cite the known polyloglog implementations of Weiner’s algorithm and explicitly bound the navigation cost, thereby eliminating any hidden linear factors. revision: yes
Circularity Check
No significant circularity; derivation relies on independent algorithmic primitives
full rationale
The paper constructs an online maintenance algorithm for the smallest suffixient set by adapting Weiner's suffix tree construction (a 1973 result external to the authors) and its associated primitives for right-to-left or left-to-right letter-by-letter processing. The polyloglog time bound is presented as a consequence of this adaptation rather than being presupposed or fitted from the target output. No equations or steps reduce the claimed result to a self-definition, a renamed input, or a load-bearing self-citation chain; the central tool is cited as established rather than derived within the paper. The skeptic concern addresses potential hidden implementation costs (correctness risk) but does not identify any reduction of the derivation to its own inputs by construction.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Weiner's suffix tree algorithm and its associated algorithmic primitives can be efficiently adapted for online, incremental maintenance in polyloglog time
Reference graph
Works this paper leans on
-
[1]
Italiano
Dany Breslauer and Giuseppe F. Italiano. Near real-time suffix tree construction via the fringe marked ancestor problem. J. Discrete Algorithms , 18:32--48, 2013
2013
-
[2]
Demaine, J
Andrej Brodnik, Svante Carlsson, Erik D. Demaine, J. Ian Munro, and Robert Sedgewick. Resizable arrays in optimal time and space. In Proc.\ WADS , volume 1663 of LNCS , pages 37--48, 1999
1999
-
[3]
Davide Cenzato, Lore Depuydt, Travis Gagie, Sung-Hwan Kim, Giovanni Manzini, Francisco Olivares, and Nicola Prezza. Suffixient arrays: a new efficient suffix array compression technique. CoRR , abs/2407.18753, 2025
-
[4]
On computing the smallest suffixient set
Davide Cenzato, Francisco Olivares, and Nicola Prezza. On computing the smallest suffixient set. In Zsuzsanna Lipt \' a k, Edleno Silva de Moura, Karina Figueroa, and Ricardo Baeza - Yates, editors, String Processing and Information Retrieval - 31st International Symposium, SPIRE 2024, Puerto Vallarta, Mexico, September 23-25, 2024, Proceedings , volume 1...
2024
-
[5]
Davide Cenzato, Francisco Olivares, and Nicola Prezza. Testing suffixient sets. CoRR , abs/2506.08225, 2025
-
[6]
Optimal-time dictionary-compressed indexes
Anders Roy Christiansen, Mikko Berggren Ettienne, Tomasz Kociumaka, Gonzalo Navarro, and Nicola Prezza. Optimal-time dictionary-compressed indexes. ACM Trans. Algorithms , 17(1), December 2021
2021
-
[7]
Dynamic LCA queries on trees
Richard Cole and Ramesh Hariharan. Dynamic LCA queries on trees. SIAM J. Comput. , 34(4):894--923, 2005
2005
-
[8]
Suf- fixient sets.arXiv preprint 2312.01359v3, 2024
Lore Depuydt, Travis Gagie, Ben Langmead, Giovanni Manzini, and Nicola Prezza. Suffixient sets. CoRR , abs/2312.01359, 2023
-
[9]
Alphabet-dependent string searching with wexponential search trees
Johannes Fischer and Pawel Gawrychowski. Alphabet-dependent string searching with wexponential search trees. CoRR , abs/1302.3347, 2013
-
[10]
Optimal dynamic vertical ray shooting in rectilinear planar subdivisions
Yoav Giyora and Haim Kaplan. Optimal dynamic vertical ray shooting in rectilinear planar subdivisions. ACM Trans. Algorithms , 5(3):28:1--28:51, 2009
2009
-
[11]
Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology
Dan Gusfield. Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology . Cambridge University Press, 1997
1997
-
[12]
At the roots of dictionary compression: string attractors
Dominik Kempa and Nicola Prezza. At the roots of dictionary compression: string attractors. In Proc.\ STOC , pages 827--840, 2018
2018
-
[13]
Near-real-time solutions for online string problems
Dominik K\"oppl and Gregory Kucherov. Near-real-time solutions for online string problems. CoRR , abs/2602.15311, 2026. to appear in CPM 2026
-
[14]
Full-fledged real-time indexing for constant size alphabets
Gregory Kucherov and Yakov Nekrich. Full-fledged real-time indexing for constant size alphabets. Algorithmica , 79(2):387--400, 2017
2017
-
[15]
Online computation of normalized substring complexity
Gregory Kucherov and Yakov Nekrich. Online computation of normalized substring complexity. CoRR , abs/2510.16454, 2025. appeared in LATIN 2026
-
[16]
Fully dynamic orthogonal range reporting on RAM
Christian Worm Mortensen. Fully dynamic orthogonal range reporting on RAM . SIAM J. Comput. , 35(6):1494--1525, 2006
2006
-
[17]
Indexing highly repetitive string collections, part I: repetitiveness measures
Gonzalo Navarro. Indexing highly repetitive string collections, part I: repetitiveness measures. ACM Comput. Surv. , 54(2):29:1--29:31, 2021
2021
-
[18]
Indexing highly repetitive string collections, part II: compressed indexes
Gonzalo Navarro. Indexing highly repetitive string collections, part II: compressed indexes. ACM Comput. Surv. , 54(2):26:1--26:32, 2021
2021
-
[19]
Smallest suffixient sets as a repetitiveness measure
Gonzalo Navarro, Giuseppe Romana, and Cristian Urbina. Smallest suffixient sets as a repetitiveness measure. In Golnaz Badkobeh, Jakub Radoszewski, Nicola Tonellotto, and Ricardo Baeza - Yates, editors, String Processing and Information Retrieval - 32nd International Symposium, SPIRE 2025, London, UK, September 8-11, 2025, Proceedings , volume 16073 of Le...
2025
-
[20]
Smallest Suffixient Sets: Effectiveness, Resilience, and Calculation
Gonzalo Navarro, Giuseppe Romana, and Cristian Urbina. Smallest suffixient sets: Effectiveness, resilience, and calculation. CoRR , abs/2506.05638v4, 2025
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[21]
Sofya Raskhodnikova, Dana Ron, Ronitt Rubinfeld, and Adam D. Smith. Sublinear algorithms for approximating string compressibility. Algorithmica , 65(3):685--709, 2013
2013
-
[22]
String Representation in Suffixient Set Size Space
Hiroki Shibata and Hideo Bannai. String representation in suffixient set size space. CoRR , abs/2604.04377, 2026
work page internal anchor Pith review Pith/arXiv arXiv 2026
-
[23]
Linear pattern matching algorithms
Peter Weiner. Linear pattern matching algorithms. In Proc. 14th Symposium on Switching and Automata Theory (SWAT) , pages 1--11, 1973
1973
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.