pith. sign in

arxiv: 2108.13968 · v6 · submitted 2021-08-31 · 💻 cs.FL · cs.DS

Absent Subsequences in Words

Pith reviewed 2026-05-24 13:16 UTC · model grok-4.3

classification 💻 cs.FL cs.DS
keywords absent subsequencesminimal absent subsequencesshortest absent subsequencesstring algorithmscombinatorics on wordsscattered factors
0
0 comments X

The pith

Absent subsequences in a word admit combinatorial characterizations that support efficient algorithms for the minimal and shortest ones.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper extends the notion of absent factors, which are strings that never appear as contiguous substrings, to absent subsequences, which never appear even when letters can be picked non-contiguously. It focuses on two special kinds: shortest absent subsequences, those of smallest possible length, and minimal absent subsequences, those whose every proper subsequence does appear. The authors supply explicit descriptions of the sets of all such objects for any given word, show compact ways to store them, and supply algorithms that decide membership and produce the lexicographically smallest examples of each kind. They also build a data structure that answers shortest-absent-subsequence queries for every factor of the input word.

Core claim

We give combinatorial characterisations of the sets of minimal and, respectively, shortest absent subsequences in a word, as well as compact representations of these sets; we show how we can test efficiently if a string is a shortest or minimal absent subsequence in a word, and we give efficient algorithms computing the lexicographically smallest absent subsequence of each kind; also, we show how a data structure for answering shortest absent subsequence-queries for the factors of a given string can be efficiently computed.

What carries the argument

Absent subsequence: a string u that does not occur as a (possibly non-contiguous) subsequence inside w; the minimal and shortest variants of this object carry the characterizations and algorithms.

If this is right

  • Membership of any candidate string in the set of shortest absent subsequences can be decided in linear time.
  • The lexicographically smallest shortest absent subsequence of any word can be produced by a polynomial-time procedure.
  • A data structure of size linear in the input word answers shortest-absent-subsequence queries for every one of its factors.
  • The same combinatorial rules also yield the lexicographically smallest minimal absent subsequence.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same characterizations may let one decide in polynomial time whether a given string avoids a prescribed set of subsequences.
  • The compact representations could be used to enumerate all minimal forbidden patterns in subsequence-avoiding languages.
  • Extending the data structure to report minimal absent subsequences would immediately give a way to compute the subsequence-avoidance index of every factor.

Load-bearing premise

The non-contiguous skipping allowed by subsequences still permits compact combinatorial descriptions and efficient algorithms, without creating intractable complexity.

What would settle it

A concrete word w together with an explicit enumeration of its shortest absent subsequences whose size or structure cannot be captured by the claimed compact representation.

Figures

Figures reproduced from arXiv: 2108.13968 by Florin Manea, Maria Kosche, Stefan Siemer, Tore Ko{\ss}.

Figure 1
Figure 1. Figure 1: Illustration of positions and intervals inside word [PITH_FULL_IMAGE:figures/full_fig_p005_1.png] view at source ↗
read the original abstract

An absent factor of a string $w$ is a string $u$ which does not occur as a contiguous substring (a.k.a. factor) inside $w$. We extend this well-studied notion and define absent subsequences: a string $u$ is an absent subsequence of a string $w$ if $u$ does not occur as subsequence (a.k.a. scattered factor) inside $w$. Of particular interest to us are minimal absent subsequences, i.e., absent subsequences whose every subsequence is not absent, and shortest absent subsequences, i.e., absent subsequences of minimal length. We show a series of combinatorial and algorithmic results regarding these two notions. For instance: we give combinatorial characterisations of the sets of minimal and, respectively, shortest absent subsequences in a word, as well as compact representations of these sets; we show how we can test efficiently if a string is a shortest or minimal absent subsequence in a word, and we give efficient algorithms computing the lexicographically smallest absent subsequence of each kind; also, we show how a data structure for answering shortest absent subsequence-queries for the factors of a given string can be efficiently computed.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The paper extends the notion of absent factors (contiguous substrings) of a word w to absent subsequences (non-contiguous scattered factors). It provides combinatorial characterizations of the sets of minimal absent subsequences and shortest absent subsequences, along with compact representations of these sets; it also gives efficient algorithms to test whether a given string is a shortest or minimal absent subsequence, to compute the lexicographically smallest absent subsequence of each kind, and to build a data structure supporting shortest absent subsequence queries on all factors of a given input string.

Significance. If the characterizations and algorithms are correct, the work meaningfully extends stringology and combinatorics on words from the contiguous case to subsequences. The approach of tracking last-occurrence positions appears to allow polynomial-time solutions using standard automata or array constructions, without introducing intractability from the non-contiguous relation. This could support applications in pattern avoidance and string processing.

minor comments (3)
  1. [Abstract] The abstract lists many results in a single dense paragraph; separating the combinatorial results from the algorithmic ones would improve readability.
  2. A short concrete example (e.g., a small word w together with its shortest and minimal absent subsequences) should be given immediately after the definitions to illustrate the new notions.
  3. The paper should explicitly compare its subsequence results to the corresponding known results for absent factors, highlighting where the non-contiguous case requires genuinely new arguments.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment of our work extending absent factors to absent subsequences, including the combinatorial characterizations, compact representations, and efficient algorithms. The recommendation for minor revision is noted.

Circularity Check

0 steps flagged

No significant circularity; derivations are self-contained combinatorial proofs

full rationale

The paper defines absent subsequences, minimal absent subsequences, and shortest absent subsequences from first principles as extensions of the well-known absent factors concept. It then derives combinatorial characterizations, testing procedures, lex-smallest computation algorithms, and a query data structure via direct arguments on last-occurrence positions and subsequence relations. These steps rely on standard stringology constructions (adapted automata or position-tracking) rather than any self-referential fit, imported uniqueness theorem, or ansatz smuggled via citation. No equation reduces to its own input by construction, and no load-bearing claim collapses to a self-citation chain. The work is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The paper introduces new definitions but relies entirely on standard mathematical properties of strings and subsequences; no free parameters, invented entities, or ad-hoc axioms are indicated in the abstract.

axioms (1)
  • standard math Standard definitions and properties of strings, factors, and subsequences over finite alphabets from formal language theory
    The entire development builds on these background notions without stating additional assumptions.

pith-pipeline@v0.9.0 · 5737 in / 1221 out tokens · 30834 ms · 2026-05-24T13:16:21.884265+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

60 extracted references · 60 canonical work pages · 2 internal anchors

  1. [1]

    Subwords

    Sakarovitch J, Simon I. Subwords. In: Lothaire M (ed.), Combinatorics on Words, chapter 6, pp. 105–142. Cambridge University Press, 1997

  2. [2]

    Decidability, complexity, and expressiveness of first-order logic over the subword ordering

    Halfon S, Schnoebelen P, Zetzsche G. Decidability, complexity, and expressiveness of first-order logic over the subword ordering. In: Proc. LICS 2017. 2017 pp. 1–12

  3. [3]

    On the index of Simon’s congruence for piecewise testability

    Karandikar P, Kufleitner M, Schnoebelen P. On the index of Simon’s congruence for piecewise testability. Inf. Process. Lett., 2015. 115(4):515–519

  4. [4]

    The Height of Piecewise-Testable Languages with Applications in Logical Complexity

    Karandikar P, Schnoebelen P. The Height of Piecewise-Testable Languages with Applications in Logical Complexity. In: Proc. CSL 2016, volume 62 of LIPIcs. 2016 pp. 37:1–37:22

  5. [5]

    The height of piecewise-testable languages and the complexity of the logic of subwords

    Karandikar P, Schnoebelen P. The height of piecewise-testable languages and the complexity of the logic of subwords. Log. Methods Comput. Sci., 2019. 15(2)

  6. [6]

    The Subtrace Order and Counting First-Order Logic

    Kuske D. The Subtrace Order and Counting First-Order Logic. In: Proc. CSR 2020, volume 12159 of Lecture Notes in Computer Science. 2020 pp. 289–302

  7. [7]

    Languages Ordered by the Subword Order

    Kuske D, Zetzsche G. Languages Ordered by the Subword Order. In: Proc. FOSSACS 2019, volume 11425 of Lecture Notes in Computer Science. 2019 pp. 348–364

  8. [8]

    Hierarchies of events with dot-depth one - Ph.D

    Simon I. Hierarchies of events with dot-depth one - Ph.D. thesis. University of Waterloo, 1972

  9. [9]

    Piecewise testable events

    Simon I. Piecewise testable events. In: Autom. Theor. Form. Lang., 2nd GI Conf., volume 33 of LNCS. 1975 pp. 214–222

  10. [10]

    The Complexity of Downward Closure Comparisons

    Zetzsche G. The Complexity of Downward Closure Comparisons. In: Proc. ICALP 2016, volume 55 of LIPIcs. 2016 pp. 123:1–123:14

  11. [11]

    Testing k-binomial equivalence

    Freydenberger DD, Gawrychowski P, Karhum ¨aki J, Manea F, Rytter W. Testing k-binomial equivalence. In: Multidisciplinary Creativity, a collection of papers dedicated to G. P˘aun 65th birthday. 2015 pp. 239–

  12. [12]

    Available in CoRR abs/1509.00622. M. Kosche, T. Koß, F . Manea, S. Siemer/ Absent Subsequences in Words 33

  13. [13]

    Computing the k-binomial Complexity of the Thue-Morse Word

    Lejeune M, Leroy J, Rigo M. Computing the k-binomial Complexity of the Thue-Morse Word. In: Proc. DLT 2019, volume 11647 of Lecture Notes in Computer Science. 2019 pp. 278–291

  14. [14]

    Generalized Pascal triangle for binomial coefficients of words

    Leroy J, Rigo M, Stipulanti M. Generalized Pascal triangle for binomial coefficients of words. Electron. J. Combin., 2017. 24(1.44):36 pp

  15. [15]

    Subword Histories and Parikh Matrices

    Mateescu A, Salomaa A, Yu S. Subword Histories and Parikh Matrices. J. Comput. Syst. Sci. , 2004. 68(1):1–21

  16. [16]

    Another generalization of abelian equivalence: Binomial complexity of infinite words

    Rigo M, Salimov P. Another generalization of abelian equivalence: Binomial complexity of infinite words. Theor. Comput. Sci., 2015. 601:47–57

  17. [17]

    Connections Between Subwords and Certain Matrix Mappings

    Salomaa A. Connections Between Subwords and Certain Matrix Mappings. Theoret. Comput. Sci., 2005. 340(2):188–203

  18. [18]

    Absoluteness of subword inequality is undecidable

    Seki S. Absoluteness of subword inequality is undecidable. Theor. Comput. Sci., 2012. 418:116–120

  19. [19]

    Searching Subsequences

    Baeza-Yates RA. Searching Subsequences. Theor. Comput. Sci., 1991. 78(2):363–376

  20. [20]

    Sketching, Streaming, and Fine-Grained Complexity of (Weighted) LCS

    Bringmann K, Chaudhury BR. Sketching, Streaming, and Fine-Grained Complexity of (Weighted) LCS. In: Proc. FSTTCS 2018, volume 122 of LIPIcs. 2018 pp. 40:1–40:16

  21. [21]

    Multivariate Fine-Grained Complexity of Longest Common Subsequence

    Bringmann K, K ¨unnemann M. Multivariate Fine-Grained Complexity of Longest Common Subsequence. In: Proc. SODA 2018. 2018 pp. 1216–1235

  22. [22]

    The Complexity of Some Problems on Subsequences and Supersequences

    Maier D. The Complexity of Some Problems on Subsequences and Supersequences. J. ACM, 1978. 25(2):322–336

  23. [23]

    The String-to-String Correction Problem

    Wagner RA, Fischer MJ. The String-to-String Correction Problem. J. ACM, 1974. 21(1):168–173

  24. [24]

    Time Warps, String Edits, and Macromolecules The Theory and Practice of Se- quence Comparison

    Sankoff D, Kruskal J. Time Warps, String Edits, and Macromolecules The Theory and Practice of Se- quence Comparison. Cambridge University Press, 2000 (reprinted). Originally published in 1983

  25. [25]

    The Consequences of Imre Simon’s Work in the Theory of Automata, Languages, and Semigroups

    Pin J. The Consequences of Imre Simon’s Work in the Theory of Automata, Languages, and Semigroups. In: Proc. LATIN 2004, volume 2976 of Lecture Notes in Computer Science. 2004 p. 5

  26. [26]

    The influence of Imre Simon’s work in the theory of automata, languages and semigroups

    Pin J. The influence of Imre Simon’s work in the theory of automata, languages and semigroups. Semi- group Forum, 2019. 98:1–8

  27. [27]

    Directed acyclic subsequence graph - Overview

    Crochemore M, Melichar B, Tron ´ıcek Z. Directed acyclic subsequence graph - Overview. J. Discrete Algorithms, 2003. 1(3-4):255–280

  28. [28]

    Testing Simon’s congruence

    Fleischer L, Kufleitner M. Testing Simon’s congruence. In: Proc. MFCS 2018, volume 117 of LIPIcs. 2018 pp. 62:1–62:13

  29. [29]

    An algorithm for distinguishing efficiently bit-strings by their subsequences

    Hebrard JJ. An algorithm for distinguishing efficiently bit-strings by their subsequences. Theoretical computer science, 1991. 82(1):35–49

  30. [30]

    Minimal Separators of Two Words

    Garel E. Minimal Separators of Two Words. In: Proc. CPM 1993, volume 684 of Lecture Notes in Computer Science. 1993 pp. 35–53

  31. [31]

    Words distinguished by their subwords (extended Abstract)

    Simon I. Words distinguished by their subwords (extended Abstract). In: Proc. WORDS 2003, volume 27 of TUCS General Publication. 2003 pp. 6–13

  32. [32]

    Common Subsequence Automaton

    Tron ´ıcek Z. Common Subsequence Automaton. In: Proc. CIAA 2002 (Revised Papers), volume 2608 of Lecture Notes in Computer Science. 2002 pp. 270–275

  33. [33]

    Scattered Factor-Universality of Words

    Barker L, Fleischmann P, Harwardt K, Manea F, Nowotka D. Scattered Factor-Universality of Words. In: Jonoska N, Savchuk D (eds.), Proc. DLT 2020, volume 12086 ofLecture Notes in Computer Science. 2020 pp. 14–28. 34 M. Kosche, T. Koß, F . Manea, S. Siemer/ Absent Subsequences in Words

  34. [34]

    Efficiently Testing Simon’s Congruence

    Gawrychowski P, Kosche M, Koß T, Manea F, Siemer S. Efficiently Testing Simon’s Congruence. In: Bl¨aser M, Monmege B (eds.), 38th International Symposium on Theoretical Aspects of Computer Science, STACS 2021, March 16-19, 2021, Saarbr ¨ucken, Germany (Virtual Conference), volume 187 of LIPIcs. Schloss Dagstuhl - Leibniz-Zentrum f ¨ur Informatik, 2021 pp....

  35. [35]

    The Edit Distance to k-Subsequence Universality

    Day JD, Fleischmann P, Kosche M, Koß T, Manea F, Siemer S. The Edit Distance to k-Subsequence Universality. In: Bl ¨aser M, Monmege B (eds.), 38th International Symposium on Theoretical Aspects of Computer Science, STACS 2021, March 16-19, 2021, Saarbr ¨ucken, Germany (Virtual Conference), volume 187 of LIPIcs. Schloss Dagstuhl - Leibniz-Zentrum f ¨ur Inf...

  36. [36]

    Linear-time computation of minimal absent words using suffix array

    Barton C, Heliou A, Mouchard L, Pissis SP. Linear-time computation of minimal absent words using suffix array. BMC bioinformatics, 2014. 15(1):1–10

  37. [37]

    Using minimal absent words to build phylogeny

    Chairungsee S, Crochemore M. Using minimal absent words to build phylogeny. Theoretical Computer Science, 2012. 450:109–116

  38. [38]

    Alignment-free sequence comparison using absent words

    Charalampopoulos P, Crochemore M, Fici G, Mercas ¸ R, Pissis SP. Alignment-free sequence comparison using absent words. Information and Computation, 2018. 262:57–68

  39. [39]

    Data compression using antidictionaries

    Crochemore M, Mignosi F, Restivo A, Salemi S. Data compression using antidictionaries. Proceedings of the IEEE, 2000. 88(11):1756–1768

  40. [40]

    Persistent minimal sequences of SARS-CoV-2

    Pratas D, Silva JM. Persistent minimal sequences of SARS-CoV-2. Bioinformatics, 2020

  41. [41]

    Three minimal sequences found in Ebola virus genomes and absent from human DNA

    Silva RM, Pratas D, Castro L, Pinho AJ, Ferreira PJ. Three minimal sequences found in Ebola virus genomes and absent from human DNA. Bioinformatics, 2015. 31(15):2421–2425

  42. [42]

    Constructing Strings Avoiding Forbidden Substrings

    Bernardini G, Marchetti-Spaccamela A, Pissis S, Stougie L, Sweering M. Constructing Strings Avoiding Forbidden Substrings. to appear, Proc. CPM 2021, 2021

  43. [43]

    Automata and forbidden words

    Crochemore M, Mignosi F, Restivo A. Automata and forbidden words. Information Processing Letters,

  44. [44]

    Minimal absent words in rooted and unrooted trees

    Fici G, Gawrychowski P. Minimal absent words in rooted and unrooted trees. In: International Symposium on String Processing and Information Retrieval. Springer, 2019 pp. 152–161

  45. [45]

    Word assembly through minimal forbidden words.Theoretical Computer Science, 2006

    Fici G, Mignosi F, Restivo A, Sciortino M. Word assembly through minimal forbidden words.Theoretical Computer Science, 2006. 359(1-3):214–230

  46. [46]

    Minimal forbidden factors of circular words

    Fici G, Restivo A, Rizzo L. Minimal forbidden factors of circular words. Theoretical Computer Science,

  47. [47]

    Minimal Unique Substrings and Minimal Absent Words in a Sliding Window

    Nakashima Y , Inenaga S, Bannai H, Takeda M. Minimal Unique Substrings and Minimal Absent Words in a Sliding Window. In: SOFSEM 2020: Theory and Practice of Computer Science: 46th International Conference on Current Trends in Theory and Practice of Informatics, SOFSEM 2020, Limassol, Cyprus, January 20–24, 2020, Proceedings, volume 12011. Springer Nature,...

  48. [48]

    Words and forbidden factors

    Mignosi F, Restivo A, Sciortino M. Words and forbidden factors. Theoretical Computer Science, 2002. 273(1-2):99–117

  49. [49]

    Constructing antidictionaries in output-sensitive space

    Ayad LA, Badkobeh G, Fici G, H ´eliou A, Pissis SP. Constructing antidictionaries in output-sensitive space. In: 2019 Data Compression Conference (DCC). IEEE, 2019 pp. 538–547

  50. [50]

    Internal Shortest Absent Word Queries

    Badkobeh G, Charalampopoulos P, Pissis S. Internal Shortest Absent Word Queries. to appear, Proc. CPM 2021, 2021

  51. [51]

    Parallelising the computation of minimal absent words

    Barton C, H ´eliou A, Mouchard L, Pissis SP. Parallelising the computation of minimal absent words. In: Parallel Processing and Applied Mathematics, pp. 243–253. Springer, 2016. M. Kosche, T. Koß, F . Manea, S. Siemer/ Absent Subsequences in Words 35

  52. [52]

    On extended special factors of a word

    Charalampopoulos P, Crochemore M, Pissis SP. On extended special factors of a word. In: International Symposium on String Processing and Information Retrieval. Springer, 2018 pp. 131–138

  53. [53]

    Absent words in a sliding window with applications

    Crochemore M, H ´eliou A, Kucherov G, Mouchard L, Pissis SP, Ramusat Y . Absent words in a sliding window with applications. Information and Computation, 2020. 270:104461

  54. [54]

    Computing DAWGs and minimal absent words in linear time for integer alphabets

    Fujishige Y , Tsujimaru Y , Inenaga S, Bannai H, Takeda M. Computing DAWGs and minimal absent words in linear time for integer alphabets. In: 41st International Symposium on Mathematical Foundations of Computer Science (MFCS 2016). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2016

  55. [55]

    Patterns in Permutations and Words

    Kitaev S. Patterns in Permutations and Words. Monographs in Theoretical Computer Science. An EATCS Series. Springer, 2011. ISBN 978-3-642-17332-5. doi:10.1007/978-3-642-17333-2

  56. [56]

    Algorithms on strings

    Crochemore M, Hancart C, Lecroq T. Algorithms on strings. Cambridge University Press, 2007. ISBN 978-0-521-84899-2

  57. [57]

    The Level Ancestor Problem simplified

    Bender MA, Farach-Colton M. The Level Ancestor Problem simplified. Theor. Comput. Sci. , 2004. 321(1):5–12. doi:10.1016/j.tcs.2003.05.002

  58. [58]

    The Euler Path to Static Level-Ancestors

    Ben-Amram AM. The Euler Path to Static Level-Ancestors. CoRR, 2009. abs/0909.1030. 0909.1030, URL http://arxiv.org/abs/0909.1030

  59. [59]

    Algorithms, 4th Edition

    Sedgewick R, Wayne K. Algorithms, 4th Edition. Addison-Wesley, 2011. ISBN 978-0-321-57351-3

  60. [60]

    The LCA Problem Revisited

    Bender MA, Farach-Colton M. The LCA Problem Revisited. In: Gonnet GH, Panario D, Viola A (eds.), LATIN 2000: Theoretical Informatics, 4th Latin American Symposium, Punta del Este, Uruguay, April 10-14, 2000, Proceedings, volume 1776 of Lecture Notes in Computer Science. Springer, 2000 pp. 88–94. doi:10.1007/10719839\ 9