String Attractors and Combinatorics on Words
Pith reviewed 2026-05-24 23:30 UTC · model grok-4.3
The pith
String attractors can serve as new tools to investigate combinatorial properties of words.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By focusing on several families of finite words, the paper establishes that the notion of string attractor yields new combinatorial tools, as the size and placement of attractors reflect properties of the words under study.
What carries the argument
A string attractor, defined as a subset of positions in a word such that every distinct factor has an occurrence crossing at least one position in the subset.
If this is right
- Dictionary-based compressors can be reinterpreted as approximations to minimal string attractors.
- Attractor size provides a measure that connects compressibility to combinatorial repetition patterns.
- Different word families admit attractors whose structure distinguishes them combinatorially.
- The NP-completeness of finding a smallest attractor remains compatible with its use as an analytical device.
Where Pith is reading between the lines
- Attractor-based analysis might extend to infinite words and yield characterizations of classes such as Sturmian or episturmian words.
- One could define new complexity functions that count the minimal attractor size over all factors of given length.
- The approach may unify existing repetition measures like runs or squares with compression-derived quantities.
Load-bearing premise
Results obtained on several families of finite words are enough to support the claim that string attractors form new investigative tools for combinatorial properties.
What would settle it
A family of words in which the minimal string attractor size or configuration shows no relation to any established combinatorial measure such as factor complexity or periodicity would undermine the suggestion.
Figures
read the original abstract
The notion of \emph{string attractor} has recently been introduced in [Prezza, 2017] and studied in [Kempa and Prezza, 2018] to provide a unifying framework for known dictionary-based compressors. A string attractor for a word $w=w[1]w[2]\cdots w[n]$ is a subset $\Gamma$ of the positions $\{1,\ldots,n\}$, such that all distinct factors of $w$ have an occurrence crossing at least one of the elements of $\Gamma$. While finding the smallest string attractor for a word is a NP-complete problem, it has been proved in [Kempa and Prezza, 2018] that dictionary compressors can be interpreted as algorithms approximating the smallest string attractor for a given word. In this paper we explore the notion of string attractor from a combinatorial point of view, by focusing on several families of finite words. The results presented in the paper suggest that the notion of string attractor can be used to define new tools to investigate combinatorial properties of the words.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper studies string attractors from a combinatorial viewpoint on several families of finite words (Sturmian, episturmian, Thue-Morse and others). It computes or bounds the size and positions of (smallest) attractors for these families and concludes that the notion can be used to define new tools for investigating combinatorial properties of words.
Significance. If the family-specific results hold, they supply concrete attractor characterizations that may correlate with known measures such as factor complexity. However, without a general construction, new invariant, or decision procedure that extends beyond the studied families, the suggestion that attractors constitute reusable investigative tools remains suggestive rather than demonstrated.
major comments (2)
- [Abstract] Abstract and concluding section: the central claim that results on specific finite-word families 'suggest' string attractors define new investigative tools is not supported by any meta-theorem, reusable method, or explicit link to a classical property (e.g., factor complexity or abelian complexity) that existing combinatorics-on-words machinery could not already provide.
- [Conclusion] The manuscript supplies attractor sizes/positions or bounds for particular families but does not exhibit a general construction or falsifiable prediction that would allow the case studies to generalize; this makes the broader suggestion rest on the hope of future extension rather than a demonstrated tool.
minor comments (2)
- Notation for attractor size and positions should be introduced once in a dedicated preliminary section rather than repeated per family.
- Missing references to related work on attractor approximations via dictionary compressors beyond the two cited papers.
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We address each major point below and indicate where revisions will be made to clarify the scope of our contributions.
read point-by-point responses
-
Referee: [Abstract] Abstract and concluding section: the central claim that results on specific finite-word families 'suggest' string attractors define new investigative tools is not supported by any meta-theorem, reusable method, or explicit link to a classical property (e.g., factor complexity or abelian complexity) that existing combinatorics-on-words machinery could not already provide.
Authors: We agree that the manuscript presents case studies on specific families rather than a general meta-theorem or reusable method applicable beyond them. The concrete results include explicit attractor sizes, positions, and bounds for Sturmian, episturmian, Thue-Morse, and related words. These characterizations are new and may correlate with classical measures, but the abstract's phrasing that they 'suggest' string attractors define new tools is indeed suggestive rather than demonstrated. We will revise the abstract to state that the family-specific results provide concrete characterizations indicating the potential utility of attractors for investigating word properties, without claiming they already constitute developed investigative tools. revision: yes
-
Referee: [Conclusion] The manuscript supplies attractor sizes/positions or bounds for particular families but does not exhibit a general construction or falsifiable prediction that would allow the case studies to generalize; this makes the broader suggestion rest on the hope of future extension rather than a demonstrated tool.
Authors: The manuscript's contribution consists of the family-specific computations and bounds; no general construction or falsifiable prediction for arbitrary words is provided or claimed. The suggestion in the conclusion is drawn from the patterns observed in these cases. We will revise the concluding section to focus on summarizing the obtained attractor characterizations for the studied families and to present the idea of new tools as a possible direction for future work rather than an established outcome of the current results. revision: yes
Circularity Check
No circularity; results on word families are independent case studies
full rationale
The paper cites external prior work (Prezza 2017; Kempa and Prezza 2018) solely for the definition of string attractors and then presents combinatorial computations and bounds for specific families (Sturmian, episturmian, Thue-Morse, etc.). These results do not reduce by the paper's own equations to quantities fitted or defined in the citations; the central suggestion that attractors supply new investigative tools rests on the exhibited case studies rather than on any self-referential derivation, fitted-input prediction, or load-bearing self-citation chain. No self-definitional, ansatz-smuggling, or renaming patterns appear.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
We compute the size of a smallest string attractor for standard Sturmian words (size 2, consecutive positions), Thue-Morse (size n, conjecture minimal) and de Bruijn words (asymptotically n/log n).
-
IndisputableMonolith/Foundation/ArithmeticFromLogic.leanLogicNat recovery unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
String attractor defined as positions covering all distinct factors; links to BWT runs (Theorem 1) and LZ phrases (Theorem 2).
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Berstel, J., de Luca, A.: Sturmian words, Lyndon words and trees. Theoret. Com- put. Sci. 178(1), 171 – 203 (1997)
work page 1997
-
[2]
Discrete Applied Math- ematics 24(1-3), 83–96 (1989)
Brlek, S.: Enumeration of factors in the Thue-Morse word. Discrete Applied Math- ematics 24(1-3), 83–96 (1989)
work page 1989
-
[3]
Theoretical Computer Science 410(43), 4372–4381 (2009)
Castiglione, G., Restivo, A., Sciortino, M.: Circular sturmian words and Hopcroft’s algorithm. Theoretical Computer Science 410(43), 4372–4381 (2009)
work page 2009
-
[4]
Jour- nal of Combinatorial Theory, Series A 14(1), 8 – 20 (1973)
Fraenkel, A.S.: Complementing and exactly covering sequences. Jour- nal of Combinatorial Theory, Series A 14(1), 8 – 20 (1973). https://doi.org/https://doi.org/10.1016/0097-3165(73)90059-9
-
[5]
Justin, Jacques: Episturmian morphisms and a galois theorem on con- tinued fractions. RAIRO-Theor. Inf. Appl. 39(1), 207–215 (2005). https://doi.org/10.1051/ita:2005012, https://doi.org/10.1051/ita:2005012
-
[6]
CoRR abs/1710.10964 (2017), http://arxiv.org/abs/1710.10964
Kempa, D., Prezza, N.: At the roots of dictionary compression: String attractors. CoRR abs/1710.10964 (2017), http://arxiv.org/abs/1710.10964
-
[7]
Kempa, D., Prezza, N.: At the roots of dictionary compression: string attractors. In: Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Com- puting, STOC 2018, Los Angeles, CA, USA, June 25-29, 2018. pp. 827–840. ACM (2018)
work page 2018
-
[8]
Kida, T., Matsumoto, T., Shibata, Y., Takeda, M., Shinohara, A., Arikawa, S.: Collage system: a unifying framework for compressed pattern matching. Theoret. Comput. Sci. 298(1), 253 – 272 (2003)
work page 2003
-
[9]
SIAM Journal on Computing 6(2), 323–350 (1977)
Knuth, D., Morris, Jr., J., Pratt, V.: Fast pattern matching in strings. SIAM Journal on Computing 6(2), 323–350 (1977)
work page 1977
-
[10]
IEEE Transactions on In- formation Theory 22(1), 75–81 (1976)
Lempel, A., Ziv, J.: On the complexity of finite sequences. IEEE Transactions on In- formation Theory 22(1), 75–81 (1976). https://doi.org/10.1109/TIT.1976.1055501
-
[11]
Cambridge University Press, New York, NY, USA (2005)
Lothaire, M.: Applied Combinatorics on Words (Encyclopedia of Mathematics and its Applications). Cambridge University Press, New York, NY, USA (2005)
work page 2005
-
[12]
Louza, F.A., Telles, G.P., Gog, S., Zhao, L.: Computing Burrows-Wheeler Similar- ity Distributions for String Collections. In: SPIRE 2018. Lecture Notes in Com- puter Science, vol. 11147, pp. 285–296. Springer (2018)
work page 2018
-
[13]
The- oretical Computer Science 136(2), 361–385 (1994)
de Luca, A., Mignosi, F.: Some combinatorial properties of sturmian words. The- oretical Computer Science 136(2), 361–385 (1994)
work page 1994
- [14]
-
[15]
Mantaci, S., Restivo, A., Rosone, G., Sciortino, M., Versari, L.: Measuring the clustering effect of BWT via RLE. Theor. Comput. Sci. 698, 79–87 (2017)
work page 2017
-
[16]
Information Processing Letters 86, 241–246 (2003)
Mantaci, S., Restivo, A., Sciortino, M.: Burrows-Wheeler transform and Sturmian words. Information Processing Letters 86, 241–246 (2003)
work page 2003
-
[17]
Theoretical Computer Science 410(38), 3782 – 3791 (2009)
Paquin, G.: On a generalization of christoffel words: epichristoffel words. Theoretical Computer Science 410(38), 3782 – 3791 (2009). https://doi.org/https://doi.org/10.1016/j.tcs.2009.05.014 XIV S. Mantaci, A. Restivo, G. Romana, G. Rosone and M. Sciortino
-
[18]
Paquin, G., Vuillon, L.: A characterization of balanced episturmian sequences. The Electronic Journal of Combinatorics [electronic only] 14(1), Research paper R33, 12 p.–Research paper R33, 12 p. (2007)
work page 2007
-
[19]
Prezza, N.: String attractors. CoRR abs/1709.05314 (2017), http://arxiv.org/ abs/1709.05314
work page internal anchor Pith review Pith/arXiv arXiv 2017
-
[20]
Algorithms for Molecular Biology 14(1), 3:1–3:13 (2019)
Prezza, N., Pisanti, N., Sciortino, M., Rosone, G.: SNPs detection by eBWT posi- tional clustering. Algorithms for Molecular Biology 14(1), 3:1–3:13 (2019)
work page 2019
-
[21]
Theoretical Computer Science 410(30-32), 3018 – 3026 (2009)
Restivo, A., Rosone, G.: Burrows-Wheeler transform and palindromic richness. Theoretical Computer Science 410(30-32), 3018 – 3026 (2009)
work page 2009
-
[22]
Theoretical Computer Science 412(27), 3019 – 3032 (2011)
Restivo, A., Rosone, G.: Balancing and clustering of words in the Burrows-Wheeler transform. Theoretical Computer Science 412(27), 3019 – 3032 (2011)
work page 2011
-
[23]
Elec- tronic Journal of Combinatorics 15 (article R83, 2008)
Simpson, J., Puglisi, S.J.: Words with simple Burrows-Wheeler transforms. Elec- tronic Journal of Combinatorics 15 (article R83, 2008)
work page 2008
-
[24]
Ziv, J., Lempel, A.: A universal algorithm for sequential data compression. IEEE Trans. Inf. Theor. 23(3), 337–343 (1977)
work page 1977
-
[25]
Ziv, J., Lempel, A.: Compression of individual sequences via variable-length coding. IEEE Trans. Inf. Theor. 24, 530–536 (1978)
work page 1978
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.