Absent Subsequences in Words
Pith reviewed 2026-05-24 13:16 UTC · model grok-4.3
The pith
Absent subsequences in a word admit combinatorial characterizations that support efficient algorithms for the minimal and shortest ones.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We give combinatorial characterisations of the sets of minimal and, respectively, shortest absent subsequences in a word, as well as compact representations of these sets; we show how we can test efficiently if a string is a shortest or minimal absent subsequence in a word, and we give efficient algorithms computing the lexicographically smallest absent subsequence of each kind; also, we show how a data structure for answering shortest absent subsequence-queries for the factors of a given string can be efficiently computed.
What carries the argument
Absent subsequence: a string u that does not occur as a (possibly non-contiguous) subsequence inside w; the minimal and shortest variants of this object carry the characterizations and algorithms.
If this is right
- Membership of any candidate string in the set of shortest absent subsequences can be decided in linear time.
- The lexicographically smallest shortest absent subsequence of any word can be produced by a polynomial-time procedure.
- A data structure of size linear in the input word answers shortest-absent-subsequence queries for every one of its factors.
- The same combinatorial rules also yield the lexicographically smallest minimal absent subsequence.
Where Pith is reading between the lines
- The same characterizations may let one decide in polynomial time whether a given string avoids a prescribed set of subsequences.
- The compact representations could be used to enumerate all minimal forbidden patterns in subsequence-avoiding languages.
- Extending the data structure to report minimal absent subsequences would immediately give a way to compute the subsequence-avoidance index of every factor.
Load-bearing premise
The non-contiguous skipping allowed by subsequences still permits compact combinatorial descriptions and efficient algorithms, without creating intractable complexity.
What would settle it
A concrete word w together with an explicit enumeration of its shortest absent subsequences whose size or structure cannot be captured by the claimed compact representation.
Figures
read the original abstract
An absent factor of a string $w$ is a string $u$ which does not occur as a contiguous substring (a.k.a. factor) inside $w$. We extend this well-studied notion and define absent subsequences: a string $u$ is an absent subsequence of a string $w$ if $u$ does not occur as subsequence (a.k.a. scattered factor) inside $w$. Of particular interest to us are minimal absent subsequences, i.e., absent subsequences whose every subsequence is not absent, and shortest absent subsequences, i.e., absent subsequences of minimal length. We show a series of combinatorial and algorithmic results regarding these two notions. For instance: we give combinatorial characterisations of the sets of minimal and, respectively, shortest absent subsequences in a word, as well as compact representations of these sets; we show how we can test efficiently if a string is a shortest or minimal absent subsequence in a word, and we give efficient algorithms computing the lexicographically smallest absent subsequence of each kind; also, we show how a data structure for answering shortest absent subsequence-queries for the factors of a given string can be efficiently computed.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper extends the notion of absent factors (contiguous substrings) of a word w to absent subsequences (non-contiguous scattered factors). It provides combinatorial characterizations of the sets of minimal absent subsequences and shortest absent subsequences, along with compact representations of these sets; it also gives efficient algorithms to test whether a given string is a shortest or minimal absent subsequence, to compute the lexicographically smallest absent subsequence of each kind, and to build a data structure supporting shortest absent subsequence queries on all factors of a given input string.
Significance. If the characterizations and algorithms are correct, the work meaningfully extends stringology and combinatorics on words from the contiguous case to subsequences. The approach of tracking last-occurrence positions appears to allow polynomial-time solutions using standard automata or array constructions, without introducing intractability from the non-contiguous relation. This could support applications in pattern avoidance and string processing.
minor comments (3)
- [Abstract] The abstract lists many results in a single dense paragraph; separating the combinatorial results from the algorithmic ones would improve readability.
- A short concrete example (e.g., a small word w together with its shortest and minimal absent subsequences) should be given immediately after the definitions to illustrate the new notions.
- The paper should explicitly compare its subsequence results to the corresponding known results for absent factors, highlighting where the non-contiguous case requires genuinely new arguments.
Simulated Author's Rebuttal
We thank the referee for the positive assessment of our work extending absent factors to absent subsequences, including the combinatorial characterizations, compact representations, and efficient algorithms. The recommendation for minor revision is noted.
Circularity Check
No significant circularity; derivations are self-contained combinatorial proofs
full rationale
The paper defines absent subsequences, minimal absent subsequences, and shortest absent subsequences from first principles as extensions of the well-known absent factors concept. It then derives combinatorial characterizations, testing procedures, lex-smallest computation algorithms, and a query data structure via direct arguments on last-occurrence positions and subsequence relations. These steps rely on standard stringology constructions (adapted automata or position-tracking) rather than any self-referential fit, imported uniqueness theorem, or ansatz smuggled via citation. No equation reduces to its own input by construction, and no load-bearing claim collapses to a self-citation chain. The work is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- standard math Standard definitions and properties of strings, factors, and subsequences over finite alphabets from formal language theory
Reference graph
Works this paper leans on
- [1]
-
[2]
Decidability, complexity, and expressiveness of first-order logic over the subword ordering
Halfon S, Schnoebelen P, Zetzsche G. Decidability, complexity, and expressiveness of first-order logic over the subword ordering. In: Proc. LICS 2017. 2017 pp. 1–12
work page 2017
-
[3]
On the index of Simon’s congruence for piecewise testability
Karandikar P, Kufleitner M, Schnoebelen P. On the index of Simon’s congruence for piecewise testability. Inf. Process. Lett., 2015. 115(4):515–519
work page 2015
-
[4]
The Height of Piecewise-Testable Languages with Applications in Logical Complexity
Karandikar P, Schnoebelen P. The Height of Piecewise-Testable Languages with Applications in Logical Complexity. In: Proc. CSL 2016, volume 62 of LIPIcs. 2016 pp. 37:1–37:22
work page 2016
-
[5]
The height of piecewise-testable languages and the complexity of the logic of subwords
Karandikar P, Schnoebelen P. The height of piecewise-testable languages and the complexity of the logic of subwords. Log. Methods Comput. Sci., 2019. 15(2)
work page 2019
-
[6]
The Subtrace Order and Counting First-Order Logic
Kuske D. The Subtrace Order and Counting First-Order Logic. In: Proc. CSR 2020, volume 12159 of Lecture Notes in Computer Science. 2020 pp. 289–302
work page 2020
-
[7]
Languages Ordered by the Subword Order
Kuske D, Zetzsche G. Languages Ordered by the Subword Order. In: Proc. FOSSACS 2019, volume 11425 of Lecture Notes in Computer Science. 2019 pp. 348–364
work page 2019
-
[8]
Hierarchies of events with dot-depth one - Ph.D
Simon I. Hierarchies of events with dot-depth one - Ph.D. thesis. University of Waterloo, 1972
work page 1972
-
[9]
Simon I. Piecewise testable events. In: Autom. Theor. Form. Lang., 2nd GI Conf., volume 33 of LNCS. 1975 pp. 214–222
work page 1975
-
[10]
The Complexity of Downward Closure Comparisons
Zetzsche G. The Complexity of Downward Closure Comparisons. In: Proc. ICALP 2016, volume 55 of LIPIcs. 2016 pp. 123:1–123:14
work page 2016
-
[11]
Testing k-binomial equivalence
Freydenberger DD, Gawrychowski P, Karhum ¨aki J, Manea F, Rytter W. Testing k-binomial equivalence. In: Multidisciplinary Creativity, a collection of papers dedicated to G. P˘aun 65th birthday. 2015 pp. 239–
work page 2015
-
[12]
Available in CoRR abs/1509.00622. M. Kosche, T. Koß, F . Manea, S. Siemer/ Absent Subsequences in Words 33
work page internal anchor Pith review Pith/arXiv arXiv
-
[13]
Computing the k-binomial Complexity of the Thue-Morse Word
Lejeune M, Leroy J, Rigo M. Computing the k-binomial Complexity of the Thue-Morse Word. In: Proc. DLT 2019, volume 11647 of Lecture Notes in Computer Science. 2019 pp. 278–291
work page 2019
-
[14]
Generalized Pascal triangle for binomial coefficients of words
Leroy J, Rigo M, Stipulanti M. Generalized Pascal triangle for binomial coefficients of words. Electron. J. Combin., 2017. 24(1.44):36 pp
work page 2017
-
[15]
Subword Histories and Parikh Matrices
Mateescu A, Salomaa A, Yu S. Subword Histories and Parikh Matrices. J. Comput. Syst. Sci. , 2004. 68(1):1–21
work page 2004
-
[16]
Another generalization of abelian equivalence: Binomial complexity of infinite words
Rigo M, Salimov P. Another generalization of abelian equivalence: Binomial complexity of infinite words. Theor. Comput. Sci., 2015. 601:47–57
work page 2015
-
[17]
Connections Between Subwords and Certain Matrix Mappings
Salomaa A. Connections Between Subwords and Certain Matrix Mappings. Theoret. Comput. Sci., 2005. 340(2):188–203
work page 2005
-
[18]
Absoluteness of subword inequality is undecidable
Seki S. Absoluteness of subword inequality is undecidable. Theor. Comput. Sci., 2012. 418:116–120
work page 2012
-
[19]
Baeza-Yates RA. Searching Subsequences. Theor. Comput. Sci., 1991. 78(2):363–376
work page 1991
-
[20]
Sketching, Streaming, and Fine-Grained Complexity of (Weighted) LCS
Bringmann K, Chaudhury BR. Sketching, Streaming, and Fine-Grained Complexity of (Weighted) LCS. In: Proc. FSTTCS 2018, volume 122 of LIPIcs. 2018 pp. 40:1–40:16
work page 2018
-
[21]
Multivariate Fine-Grained Complexity of Longest Common Subsequence
Bringmann K, K ¨unnemann M. Multivariate Fine-Grained Complexity of Longest Common Subsequence. In: Proc. SODA 2018. 2018 pp. 1216–1235
work page 2018
-
[22]
The Complexity of Some Problems on Subsequences and Supersequences
Maier D. The Complexity of Some Problems on Subsequences and Supersequences. J. ACM, 1978. 25(2):322–336
work page 1978
-
[23]
The String-to-String Correction Problem
Wagner RA, Fischer MJ. The String-to-String Correction Problem. J. ACM, 1974. 21(1):168–173
work page 1974
-
[24]
Time Warps, String Edits, and Macromolecules The Theory and Practice of Se- quence Comparison
Sankoff D, Kruskal J. Time Warps, String Edits, and Macromolecules The Theory and Practice of Se- quence Comparison. Cambridge University Press, 2000 (reprinted). Originally published in 1983
work page 2000
-
[25]
The Consequences of Imre Simon’s Work in the Theory of Automata, Languages, and Semigroups
Pin J. The Consequences of Imre Simon’s Work in the Theory of Automata, Languages, and Semigroups. In: Proc. LATIN 2004, volume 2976 of Lecture Notes in Computer Science. 2004 p. 5
work page 2004
-
[26]
The influence of Imre Simon’s work in the theory of automata, languages and semigroups
Pin J. The influence of Imre Simon’s work in the theory of automata, languages and semigroups. Semi- group Forum, 2019. 98:1–8
work page 2019
-
[27]
Directed acyclic subsequence graph - Overview
Crochemore M, Melichar B, Tron ´ıcek Z. Directed acyclic subsequence graph - Overview. J. Discrete Algorithms, 2003. 1(3-4):255–280
work page 2003
-
[28]
Fleischer L, Kufleitner M. Testing Simon’s congruence. In: Proc. MFCS 2018, volume 117 of LIPIcs. 2018 pp. 62:1–62:13
work page 2018
-
[29]
An algorithm for distinguishing efficiently bit-strings by their subsequences
Hebrard JJ. An algorithm for distinguishing efficiently bit-strings by their subsequences. Theoretical computer science, 1991. 82(1):35–49
work page 1991
-
[30]
Minimal Separators of Two Words
Garel E. Minimal Separators of Two Words. In: Proc. CPM 1993, volume 684 of Lecture Notes in Computer Science. 1993 pp. 35–53
work page 1993
-
[31]
Words distinguished by their subwords (extended Abstract)
Simon I. Words distinguished by their subwords (extended Abstract). In: Proc. WORDS 2003, volume 27 of TUCS General Publication. 2003 pp. 6–13
work page 2003
-
[32]
Tron ´ıcek Z. Common Subsequence Automaton. In: Proc. CIAA 2002 (Revised Papers), volume 2608 of Lecture Notes in Computer Science. 2002 pp. 270–275
work page 2002
-
[33]
Scattered Factor-Universality of Words
Barker L, Fleischmann P, Harwardt K, Manea F, Nowotka D. Scattered Factor-Universality of Words. In: Jonoska N, Savchuk D (eds.), Proc. DLT 2020, volume 12086 ofLecture Notes in Computer Science. 2020 pp. 14–28. 34 M. Kosche, T. Koß, F . Manea, S. Siemer/ Absent Subsequences in Words
work page 2020
-
[34]
Efficiently Testing Simon’s Congruence
Gawrychowski P, Kosche M, Koß T, Manea F, Siemer S. Efficiently Testing Simon’s Congruence. In: Bl¨aser M, Monmege B (eds.), 38th International Symposium on Theoretical Aspects of Computer Science, STACS 2021, March 16-19, 2021, Saarbr ¨ucken, Germany (Virtual Conference), volume 187 of LIPIcs. Schloss Dagstuhl - Leibniz-Zentrum f ¨ur Informatik, 2021 pp....
-
[35]
The Edit Distance to k-Subsequence Universality
Day JD, Fleischmann P, Kosche M, Koß T, Manea F, Siemer S. The Edit Distance to k-Subsequence Universality. In: Bl ¨aser M, Monmege B (eds.), 38th International Symposium on Theoretical Aspects of Computer Science, STACS 2021, March 16-19, 2021, Saarbr ¨ucken, Germany (Virtual Conference), volume 187 of LIPIcs. Schloss Dagstuhl - Leibniz-Zentrum f ¨ur Inf...
-
[36]
Linear-time computation of minimal absent words using suffix array
Barton C, Heliou A, Mouchard L, Pissis SP. Linear-time computation of minimal absent words using suffix array. BMC bioinformatics, 2014. 15(1):1–10
work page 2014
-
[37]
Using minimal absent words to build phylogeny
Chairungsee S, Crochemore M. Using minimal absent words to build phylogeny. Theoretical Computer Science, 2012. 450:109–116
work page 2012
-
[38]
Alignment-free sequence comparison using absent words
Charalampopoulos P, Crochemore M, Fici G, Mercas ¸ R, Pissis SP. Alignment-free sequence comparison using absent words. Information and Computation, 2018. 262:57–68
work page 2018
-
[39]
Data compression using antidictionaries
Crochemore M, Mignosi F, Restivo A, Salemi S. Data compression using antidictionaries. Proceedings of the IEEE, 2000. 88(11):1756–1768
work page 2000
-
[40]
Persistent minimal sequences of SARS-CoV-2
Pratas D, Silva JM. Persistent minimal sequences of SARS-CoV-2. Bioinformatics, 2020
work page 2020
-
[41]
Three minimal sequences found in Ebola virus genomes and absent from human DNA
Silva RM, Pratas D, Castro L, Pinho AJ, Ferreira PJ. Three minimal sequences found in Ebola virus genomes and absent from human DNA. Bioinformatics, 2015. 31(15):2421–2425
work page 2015
-
[42]
Constructing Strings Avoiding Forbidden Substrings
Bernardini G, Marchetti-Spaccamela A, Pissis S, Stougie L, Sweering M. Constructing Strings Avoiding Forbidden Substrings. to appear, Proc. CPM 2021, 2021
work page 2021
-
[43]
Crochemore M, Mignosi F, Restivo A. Automata and forbidden words. Information Processing Letters,
-
[44]
Minimal absent words in rooted and unrooted trees
Fici G, Gawrychowski P. Minimal absent words in rooted and unrooted trees. In: International Symposium on String Processing and Information Retrieval. Springer, 2019 pp. 152–161
work page 2019
-
[45]
Word assembly through minimal forbidden words.Theoretical Computer Science, 2006
Fici G, Mignosi F, Restivo A, Sciortino M. Word assembly through minimal forbidden words.Theoretical Computer Science, 2006. 359(1-3):214–230
work page 2006
-
[46]
Minimal forbidden factors of circular words
Fici G, Restivo A, Rizzo L. Minimal forbidden factors of circular words. Theoretical Computer Science,
-
[47]
Minimal Unique Substrings and Minimal Absent Words in a Sliding Window
Nakashima Y , Inenaga S, Bannai H, Takeda M. Minimal Unique Substrings and Minimal Absent Words in a Sliding Window. In: SOFSEM 2020: Theory and Practice of Computer Science: 46th International Conference on Current Trends in Theory and Practice of Informatics, SOFSEM 2020, Limassol, Cyprus, January 20–24, 2020, Proceedings, volume 12011. Springer Nature,...
work page 2020
-
[48]
Mignosi F, Restivo A, Sciortino M. Words and forbidden factors. Theoretical Computer Science, 2002. 273(1-2):99–117
work page 2002
-
[49]
Constructing antidictionaries in output-sensitive space
Ayad LA, Badkobeh G, Fici G, H ´eliou A, Pissis SP. Constructing antidictionaries in output-sensitive space. In: 2019 Data Compression Conference (DCC). IEEE, 2019 pp. 538–547
work page 2019
-
[50]
Internal Shortest Absent Word Queries
Badkobeh G, Charalampopoulos P, Pissis S. Internal Shortest Absent Word Queries. to appear, Proc. CPM 2021, 2021
work page 2021
-
[51]
Parallelising the computation of minimal absent words
Barton C, H ´eliou A, Mouchard L, Pissis SP. Parallelising the computation of minimal absent words. In: Parallel Processing and Applied Mathematics, pp. 243–253. Springer, 2016. M. Kosche, T. Koß, F . Manea, S. Siemer/ Absent Subsequences in Words 35
work page 2016
-
[52]
On extended special factors of a word
Charalampopoulos P, Crochemore M, Pissis SP. On extended special factors of a word. In: International Symposium on String Processing and Information Retrieval. Springer, 2018 pp. 131–138
work page 2018
-
[53]
Absent words in a sliding window with applications
Crochemore M, H ´eliou A, Kucherov G, Mouchard L, Pissis SP, Ramusat Y . Absent words in a sliding window with applications. Information and Computation, 2020. 270:104461
work page 2020
-
[54]
Computing DAWGs and minimal absent words in linear time for integer alphabets
Fujishige Y , Tsujimaru Y , Inenaga S, Bannai H, Takeda M. Computing DAWGs and minimal absent words in linear time for integer alphabets. In: 41st International Symposium on Mathematical Foundations of Computer Science (MFCS 2016). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2016
work page 2016
-
[55]
Patterns in Permutations and Words
Kitaev S. Patterns in Permutations and Words. Monographs in Theoretical Computer Science. An EATCS Series. Springer, 2011. ISBN 978-3-642-17332-5. doi:10.1007/978-3-642-17333-2
-
[56]
Crochemore M, Hancart C, Lecroq T. Algorithms on strings. Cambridge University Press, 2007. ISBN 978-0-521-84899-2
work page 2007
-
[57]
The Level Ancestor Problem simplified
Bender MA, Farach-Colton M. The Level Ancestor Problem simplified. Theor. Comput. Sci. , 2004. 321(1):5–12. doi:10.1016/j.tcs.2003.05.002
-
[58]
The Euler Path to Static Level-Ancestors
Ben-Amram AM. The Euler Path to Static Level-Ancestors. CoRR, 2009. abs/0909.1030. 0909.1030, URL http://arxiv.org/abs/0909.1030
work page internal anchor Pith review Pith/arXiv arXiv 2009
-
[59]
Sedgewick R, Wayne K. Algorithms, 4th Edition. Addison-Wesley, 2011. ISBN 978-0-321-57351-3
work page 2011
-
[60]
Bender MA, Farach-Colton M. The LCA Problem Revisited. In: Gonnet GH, Panario D, Viola A (eds.), LATIN 2000: Theoretical Informatics, 4th Latin American Symposium, Punta del Este, Uruguay, April 10-14, 2000, Proceedings, volume 1776 of Lecture Notes in Computer Science. Springer, 2000 pp. 88–94. doi:10.1007/10719839\ 9
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.