pith. sign in

arxiv: 1907.04660 · v1 · pith:6J4U23T4new · submitted 2019-07-10 · 💻 cs.DS · cs.FL

String Attractors and Combinatorics on Words

Pith reviewed 2026-05-24 23:30 UTC · model grok-4.3

classification 💻 cs.DS cs.FL
keywords string attractorscombinatorics on wordsfinite wordsdictionary compressorsword factorsrepetitions in words
0
0 comments X

The pith

String attractors can serve as new tools to investigate combinatorial properties of words.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper takes the string attractor notion, a set of positions that every distinct factor of a word must cross, and studies it on multiple families of finite words. Prior work had shown that dictionary compressors approximate the smallest such attractor. The authors conclude from their combinatorial examinations that attractors supply a fresh way to probe word structure. A reader would care because the approach potentially bridges compression algorithms with the study of repetitions and factors in words.

Core claim

By focusing on several families of finite words, the paper establishes that the notion of string attractor yields new combinatorial tools, as the size and placement of attractors reflect properties of the words under study.

What carries the argument

A string attractor, defined as a subset of positions in a word such that every distinct factor has an occurrence crossing at least one position in the subset.

If this is right

  • Dictionary-based compressors can be reinterpreted as approximations to minimal string attractors.
  • Attractor size provides a measure that connects compressibility to combinatorial repetition patterns.
  • Different word families admit attractors whose structure distinguishes them combinatorially.
  • The NP-completeness of finding a smallest attractor remains compatible with its use as an analytical device.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Attractor-based analysis might extend to infinite words and yield characterizations of classes such as Sturmian or episturmian words.
  • One could define new complexity functions that count the minimal attractor size over all factors of given length.
  • The approach may unify existing repetition measures like runs or squares with compression-derived quantities.

Load-bearing premise

Results obtained on several families of finite words are enough to support the claim that string attractors form new investigative tools for combinatorial properties.

What would settle it

A family of words in which the minimal string attractor size or configuration shows no relation to any established combinatorial measure such as factor complexity or periodicity would undermine the suggestion.

Figures

Figures reproduced from arXiv: 1907.04660 by Antonio Restivo, Giovanna Rosone, Giuseppe Romana, Marinella Sciortino, Sabrina Mantaci.

Figure 1
Figure 1. Figure 1: String attractors Γn for the word tn = ϕ n (a), with n = 3, 4, 5 (the positions in Γn are in bold), i.e. Γ3 = {3, 5, 6}, Γ4 = {3, 6, 9, 12}, Γ5 = {3, 6, 12, 17, 24}. Note that in Γ4 the position 9 is obtained by ADD(9) operation, the position 12 is obtained by the operation MOVE(5, 3·4). In Γ5 the position 17 is obtained by ADD(17) operation, the position 24 is obtained by the operation MOVE(9, 3 · 8). The… view at source ↗
read the original abstract

The notion of \emph{string attractor} has recently been introduced in [Prezza, 2017] and studied in [Kempa and Prezza, 2018] to provide a unifying framework for known dictionary-based compressors. A string attractor for a word $w=w[1]w[2]\cdots w[n]$ is a subset $\Gamma$ of the positions $\{1,\ldots,n\}$, such that all distinct factors of $w$ have an occurrence crossing at least one of the elements of $\Gamma$. While finding the smallest string attractor for a word is a NP-complete problem, it has been proved in [Kempa and Prezza, 2018] that dictionary compressors can be interpreted as algorithms approximating the smallest string attractor for a given word. In this paper we explore the notion of string attractor from a combinatorial point of view, by focusing on several families of finite words. The results presented in the paper suggest that the notion of string attractor can be used to define new tools to investigate combinatorial properties of the words.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper studies string attractors from a combinatorial viewpoint on several families of finite words (Sturmian, episturmian, Thue-Morse and others). It computes or bounds the size and positions of (smallest) attractors for these families and concludes that the notion can be used to define new tools for investigating combinatorial properties of words.

Significance. If the family-specific results hold, they supply concrete attractor characterizations that may correlate with known measures such as factor complexity. However, without a general construction, new invariant, or decision procedure that extends beyond the studied families, the suggestion that attractors constitute reusable investigative tools remains suggestive rather than demonstrated.

major comments (2)
  1. [Abstract] Abstract and concluding section: the central claim that results on specific finite-word families 'suggest' string attractors define new investigative tools is not supported by any meta-theorem, reusable method, or explicit link to a classical property (e.g., factor complexity or abelian complexity) that existing combinatorics-on-words machinery could not already provide.
  2. [Conclusion] The manuscript supplies attractor sizes/positions or bounds for particular families but does not exhibit a general construction or falsifiable prediction that would allow the case studies to generalize; this makes the broader suggestion rest on the hope of future extension rather than a demonstrated tool.
minor comments (2)
  1. Notation for attractor size and positions should be introduced once in a dedicated preliminary section rather than repeated per family.
  2. Missing references to related work on attractor approximations via dictionary compressors beyond the two cited papers.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below and indicate where revisions will be made to clarify the scope of our contributions.

read point-by-point responses
  1. Referee: [Abstract] Abstract and concluding section: the central claim that results on specific finite-word families 'suggest' string attractors define new investigative tools is not supported by any meta-theorem, reusable method, or explicit link to a classical property (e.g., factor complexity or abelian complexity) that existing combinatorics-on-words machinery could not already provide.

    Authors: We agree that the manuscript presents case studies on specific families rather than a general meta-theorem or reusable method applicable beyond them. The concrete results include explicit attractor sizes, positions, and bounds for Sturmian, episturmian, Thue-Morse, and related words. These characterizations are new and may correlate with classical measures, but the abstract's phrasing that they 'suggest' string attractors define new tools is indeed suggestive rather than demonstrated. We will revise the abstract to state that the family-specific results provide concrete characterizations indicating the potential utility of attractors for investigating word properties, without claiming they already constitute developed investigative tools. revision: yes

  2. Referee: [Conclusion] The manuscript supplies attractor sizes/positions or bounds for particular families but does not exhibit a general construction or falsifiable prediction that would allow the case studies to generalize; this makes the broader suggestion rest on the hope of future extension rather than a demonstrated tool.

    Authors: The manuscript's contribution consists of the family-specific computations and bounds; no general construction or falsifiable prediction for arbitrary words is provided or claimed. The suggestion in the conclusion is drawn from the patterns observed in these cases. We will revise the concluding section to focus on summarizing the obtained attractor characterizations for the studied families and to present the idea of new tools as a possible direction for future work rather than an established outcome of the current results. revision: yes

Circularity Check

0 steps flagged

No circularity; results on word families are independent case studies

full rationale

The paper cites external prior work (Prezza 2017; Kempa and Prezza 2018) solely for the definition of string attractors and then presents combinatorial computations and bounds for specific families (Sturmian, episturmian, Thue-Morse, etc.). These results do not reduce by the paper's own equations to quantities fitted or defined in the citations; the central suggestion that attractors supply new investigative tools rests on the exhibited case studies rather than on any self-referential derivation, fitted-input prediction, or load-bearing self-citation chain. No self-definitional, ansatz-smuggling, or renaming patterns appear.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract supplies no information on free parameters, background axioms, or newly postulated entities; the work appears to be a purely theoretical exploration without numerical fitting.

pith-pipeline@v0.9.0 · 5721 in / 1028 out tokens · 22096 ms · 2026-05-24T23:30:10.155356+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

25 extracted references · 25 canonical work pages · 1 internal anchor

  1. [1]

    Berstel, J., de Luca, A.: Sturmian words, Lyndon words and trees. Theoret. Com- put. Sci. 178(1), 171 – 203 (1997)

  2. [2]

    Discrete Applied Math- ematics 24(1-3), 83–96 (1989)

    Brlek, S.: Enumeration of factors in the Thue-Morse word. Discrete Applied Math- ematics 24(1-3), 83–96 (1989)

  3. [3]

    Theoretical Computer Science 410(43), 4372–4381 (2009)

    Castiglione, G., Restivo, A., Sciortino, M.: Circular sturmian words and Hopcroft’s algorithm. Theoretical Computer Science 410(43), 4372–4381 (2009)

  4. [4]

    Jour- nal of Combinatorial Theory, Series A 14(1), 8 – 20 (1973)

    Fraenkel, A.S.: Complementing and exactly covering sequences. Jour- nal of Combinatorial Theory, Series A 14(1), 8 – 20 (1973). https://doi.org/https://doi.org/10.1016/0097-3165(73)90059-9

  5. [5]

    RAIRO-Theor

    Justin, Jacques: Episturmian morphisms and a galois theorem on con- tinued fractions. RAIRO-Theor. Inf. Appl. 39(1), 207–215 (2005). https://doi.org/10.1051/ita:2005012, https://doi.org/10.1051/ita:2005012

  6. [6]

    CoRR abs/1710.10964 (2017), http://arxiv.org/abs/1710.10964

    Kempa, D., Prezza, N.: At the roots of dictionary compression: String attractors. CoRR abs/1710.10964 (2017), http://arxiv.org/abs/1710.10964

  7. [7]

    In: Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Com- puting, STOC 2018, Los Angeles, CA, USA, June 25-29, 2018

    Kempa, D., Prezza, N.: At the roots of dictionary compression: string attractors. In: Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Com- puting, STOC 2018, Los Angeles, CA, USA, June 25-29, 2018. pp. 827–840. ACM (2018)

  8. [8]

    Kida, T., Matsumoto, T., Shibata, Y., Takeda, M., Shinohara, A., Arikawa, S.: Collage system: a unifying framework for compressed pattern matching. Theoret. Comput. Sci. 298(1), 253 – 272 (2003)

  9. [9]

    SIAM Journal on Computing 6(2), 323–350 (1977)

    Knuth, D., Morris, Jr., J., Pratt, V.: Fast pattern matching in strings. SIAM Journal on Computing 6(2), 323–350 (1977)

  10. [10]

    IEEE Transactions on In- formation Theory 22(1), 75–81 (1976)

    Lempel, A., Ziv, J.: On the complexity of finite sequences. IEEE Transactions on In- formation Theory 22(1), 75–81 (1976). https://doi.org/10.1109/TIT.1976.1055501

  11. [11]

    Cambridge University Press, New York, NY, USA (2005)

    Lothaire, M.: Applied Combinatorics on Words (Encyclopedia of Mathematics and its Applications). Cambridge University Press, New York, NY, USA (2005)

  12. [12]

    In: SPIRE 2018

    Louza, F.A., Telles, G.P., Gog, S., Zhao, L.: Computing Burrows-Wheeler Similar- ity Distributions for String Collections. In: SPIRE 2018. Lecture Notes in Com- puter Science, vol. 11147, pp. 285–296. Springer (2018)

  13. [13]

    The- oretical Computer Science 136(2), 361–385 (1994)

    de Luca, A., Mignosi, F.: Some combinatorial properties of sturmian words. The- oretical Computer Science 136(2), 361–385 (1994)

  14. [14]

    LNCS, vol

    de Luca, A.: Combinatorics of standard sturmian words. LNCS, vol. 1261, pp. 249–267. Springer (1997)

  15. [15]

    Mantaci, S., Restivo, A., Rosone, G., Sciortino, M., Versari, L.: Measuring the clustering effect of BWT via RLE. Theor. Comput. Sci. 698, 79–87 (2017)

  16. [16]

    Information Processing Letters 86, 241–246 (2003)

    Mantaci, S., Restivo, A., Sciortino, M.: Burrows-Wheeler transform and Sturmian words. Information Processing Letters 86, 241–246 (2003)

  17. [17]

    Theoretical Computer Science 410(38), 3782 – 3791 (2009)

    Paquin, G.: On a generalization of christoffel words: epichristoffel words. Theoretical Computer Science 410(38), 3782 – 3791 (2009). https://doi.org/https://doi.org/10.1016/j.tcs.2009.05.014 XIV S. Mantaci, A. Restivo, G. Romana, G. Rosone and M. Sciortino

  18. [18]

    The Electronic Journal of Combinatorics [electronic only] 14(1), Research paper R33, 12 p.–Research paper R33, 12 p

    Paquin, G., Vuillon, L.: A characterization of balanced episturmian sequences. The Electronic Journal of Combinatorics [electronic only] 14(1), Research paper R33, 12 p.–Research paper R33, 12 p. (2007)

  19. [19]

    String Attractors

    Prezza, N.: String attractors. CoRR abs/1709.05314 (2017), http://arxiv.org/ abs/1709.05314

  20. [20]

    Algorithms for Molecular Biology 14(1), 3:1–3:13 (2019)

    Prezza, N., Pisanti, N., Sciortino, M., Rosone, G.: SNPs detection by eBWT posi- tional clustering. Algorithms for Molecular Biology 14(1), 3:1–3:13 (2019)

  21. [21]

    Theoretical Computer Science 410(30-32), 3018 – 3026 (2009)

    Restivo, A., Rosone, G.: Burrows-Wheeler transform and palindromic richness. Theoretical Computer Science 410(30-32), 3018 – 3026 (2009)

  22. [22]

    Theoretical Computer Science 412(27), 3019 – 3032 (2011)

    Restivo, A., Rosone, G.: Balancing and clustering of words in the Burrows-Wheeler transform. Theoretical Computer Science 412(27), 3019 – 3032 (2011)

  23. [23]

    Elec- tronic Journal of Combinatorics 15 (article R83, 2008)

    Simpson, J., Puglisi, S.J.: Words with simple Burrows-Wheeler transforms. Elec- tronic Journal of Combinatorics 15 (article R83, 2008)

  24. [24]

    IEEE Trans

    Ziv, J., Lempel, A.: A universal algorithm for sequential data compression. IEEE Trans. Inf. Theor. 23(3), 337–343 (1977)

  25. [25]

    IEEE Trans

    Ziv, J., Lempel, A.: Compression of individual sequences via variable-length coding. IEEE Trans. Inf. Theor. 24, 530–536 (1978)