Tractable Gap-Constraint Languages for Complex Event Recognition
Pith reviewed 2026-06-26 18:58 UTC · model grok-4.3
The pith
Subsequence matching with gap-constraints is solvable in linear time if the constraint languages are left-convex.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Subsequence matching with gap-constraints over arbitrary interval structures can be solved in O(|D|(|u| + |C|)) time whenever the gap-constraint languages are left-convex. Left-convex languages remain expressive enough to model length constraints and similar real-world rules from CER. The same algorithm supports efficient enumeration of satisfying embeddings. Adding even one non-left-convex language such as {aa, ε} alongside length constraints makes the problem NP-complete again.
What carries the argument
The left-convexity property on each gap-constraint language L: if uvw ∈ L and v ∈ L then uv ∈ L. This property enables a single left-to-right scan of D that maintains a compact state for each position in u while respecting all constraints.
If this is right
- Matching can be decided in time linear in the data length regardless of how the constraint intervals nest or overlap.
- All satisfying embeddings can be enumerated without increasing the asymptotic time bound beyond a polynomial factor per output.
- The same linear-time bound holds when the only constraints are length bounds of the form a ≤ |w| ≤ b.
- The hardness result shows that left-convexity is essentially tight: replacing any one language by a non-left-convex alternative restores NP-completeness even when all other languages remain length constraints.
Where Pith is reading between the lines
- The linear-time algorithm may extend to streaming or online variants of complex event recognition where D arrives incrementally.
- Similar convexity-style restrictions could separate tractable from intractable cases in other string problems that combine subsequence matching with regular constraints.
- The enumeration procedure may be useful for downstream tasks that require ranking or sampling among valid embeddings rather than deciding existence.
Load-bearing premise
Left-convex languages are expressive enough to capture the gap rules needed in real-world complex event recognition applications.
What would settle it
An explicit left-convex language together with a family of inputs where every correct algorithm requires superlinear time in |D|, or a concrete CER scenario whose gap rules cannot be expressed by any collection of left-convex languages.
Figures
read the original abstract
For strings $u, D \in \Sigma^*$, a subsequence embedding of $u$ in $D$ is a function $e \colon \{1, 2, \ldots, |u|\} \to \{1, 2, \ldots, |D|\}$ with $e(i) < e(i+1)$ for every $i \in \{1, 2, \ldots, |u|-1\}$ and the $i$-th symbol of $u$ equals the $e(i)$-th symbol of $D$. A gap-constraint for $u$ is a triple $(i, j, L)$ with $1 \leq i < j \leq |u|$ and $L$ is a regular language over $\Sigma$. An embedding $e$ satisfies a gap-constraint $(i, j, L)$ if the factor of $D$ strictly between positions $e(i)$ and $e(j)$ is a word from $L$. We investigate the subsequence matching problem with gap-constraints, which is relevant in the context of complex event recognition (CER): given $u, D \in \Sigma^*$ and a set $C$ of gap-constraints, find an embedding of $u$ in $D$ that satisfies all gap-constraints from $C$. In general, subsequence matching is NP-complete and the only known tractable variants restrict the interval structure of the gap-constraints. In this work, we show that we can solve subsequence matching with gap-constraints with an arbitrary interval structure rather efficiently (in fact, optimally under SETH) in time $O(|D| (|u| + |C|))$ if the gap-constraint languages satisfy a property which we dub left-convexity: whenever $u v w \in L$ and $v \in L$, then also $uv \in L$. Left-convex languages are sufficiently expressive to model interesting real-world scenarios considered in CER, e.g., length constraints $L = \{w \mid a \leq |w| \leq b\}$ for $a, b \in \mathbb{N}$. We also show how our algorithm can be used in order to efficiently enumerate all satisfying embeddings, which is particularly relevant for possible applications in CER. Finally, we show how non-left-convex languages can lead to intractability, i.e., if in addition to length constraints we allow $\{aa, \epsilon\}$ as the only non-left-convex constraint language, then the problem is NP-complete again.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper studies the subsequence matching problem with gap-constraints, where a pattern string u must be embedded into a text D such that each gap-constraint (i,j,L) is satisfied by the intervening factor belonging to a regular language L. It defines left-convexity (if uvw ∈ L and v ∈ L then uv ∈ L) and claims that when all languages in the constraint set C are left-convex, the problem can be solved in O(|D|(|u| + |C|)) time for arbitrary interval structures; this bound is shown to be tight under SETH. The paper also gives an enumeration algorithm for all satisfying embeddings and proves NP-completeness when non-left-convex languages (such as {aa, ε} together with length constraints) are allowed. Left-convex languages are argued to capture useful CER constraints such as bounded-length gaps.
Significance. If the algorithmic result and matching lower bound hold, the work identifies a natural, expressive restriction on gap-constraint languages that permits efficient matching with completely general interval structures, improving on prior tractable fragments that required restricted interval graphs. The SETH-based optimality and the enumeration procedure are concrete strengths. The result directly addresses a practical gap in complex event recognition.
major comments (1)
- [abstract / main theorem] Abstract and the statement of the main algorithmic result: the claimed running time O(|D|(|u| + |C|)) contains no dependence on the sizes of the minimal DFAs (or NFAs) representing the languages in C. Left-convexity does not bound DFA size; arbitrarily large DFAs satisfy the property. The algorithm description must therefore either (a) include the total automaton size in the bound or (b) state an assumption on constant-size automata or succinct representation. Without this clarification the central complexity claim is not yet justified.
minor comments (2)
- [preliminaries] The definition of left-convexity is given only in the abstract; a formal definition with an example in the preliminaries section would improve readability.
- [hardness section] The hardness construction for non-left-convex languages should explicitly reference the interval structure used, to allow direct comparison with the tractable case.
Simulated Author's Rebuttal
We thank the referee for their careful reading and for highlighting this important point regarding the complexity bound. We address the major comment below.
read point-by-point responses
-
Referee: [abstract / main theorem] Abstract and the statement of the main algorithmic result: the claimed running time O(|D|(|u| + |C|)) contains no dependence on the sizes of the minimal DFAs (or NFAs) representing the languages in C. Left-convexity does not bound DFA size; arbitrarily large DFAs satisfy the property. The algorithm description must therefore either (a) include the total automaton size in the bound or (b) state an assumption on constant-size automata or succinct representation. Without this clarification the central complexity claim is not yet justified.
Authors: We agree with the referee. Left-convexity does not bound automaton size, and our algorithm (which processes each constraint via its DFA) incurs time linear in the total automaton size. The stated bound O(|D|(|u| + |C|)) is therefore incomplete as written. We will revise the abstract, Theorem 1, and the algorithm analysis to state the running time as O(|D|(|u| + |C| + Σ_L |A_L|)), where |A_L| is the size of the automaton for each L ∈ C. The linear dependence on |D| and the SETH-tightness (with the automaton-size term made explicit) remain unchanged. revision: yes
Circularity Check
No circularity: new algorithmic construction with independent complexity classification
full rationale
The paper defines left-convexity as a new property on regular languages and derives a matching algorithm whose time bound is stated directly from the construction. No quoted step reduces a claimed prediction or result to a fitted parameter, self-citation, or definitional renaming inside the paper. The derivation is self-contained against external benchmarks (standard automata algorithms plus the new convexity property) and does not invoke any of the enumerated circular patterns.
Axiom & Free-Parameter Ledger
axioms (2)
- standard math Standard definitions of subsequence embeddings and regular languages over finite alphabets
- domain assumption Strong Exponential Time Hypothesis (SETH)
Reference graph
Works this paper leans on
-
[1]
Amir Abboud, Arturs Backurs, and Virginia Vassilevska Williams. 2015. Tight hardness results for LCS and other sequence similarity measures. InFOCS. doi:10.1109/FOCS.2015.14
-
[2]
Amir Abboud, Virginia Vassilevska Williams, and Oren Weimann. 2014. Consequences of faster alignment of sequences. InICALP. doi:10.1007/978-3-662-43948-7_4
-
[3]
Duncan Adamson, Paul Sarnighausen-Cahn, Marius Dumitran, Maria Kosche, Tore Koß, Florin Manea, and Stefan Siemer. 2025. Longest common subsequence with gap constraints.Theory Comput. Syst.69, 2 (2025). doi:10.1007/S00224- 025-10223-0
-
[4]
Jagrati Agrawal, Yanlei Diao, Daniel Gyllstrom, and Neil Immerman. 2008. Efficient pattern matching over event streams. InSIGMOD. ACM. doi:10.1145/1376616.1376634
-
[5]
Brzozowski
Thomas Ang and Janusz A. Brzozowski. 2008. Continuous languages. InAFL
2008
-
[6]
Brzozowski
Thomas Ang and Janusz A. Brzozowski. 2009. Languages convex with respect to binary relations, and their closure properties.Acta Cybern.19, 2 (2009). https://cyber.bibl.u-szeged.hu/index.php/actcybern/article/view/3776
2009
-
[7]
Alexander Artikis, Alessandro Margara, Martín Ugarte, Stijn Vansummeren, and Matthias Weidlich. 2017. Complex event recognition languages: Tutorial. InDEBS. doi:10.1145/3093742.3095106
-
[8]
Arturs Backurs and Piotr Indyk. 2016. Which regular expression patterns are hard to match?. InFOCS
2016
-
[9]
Ricardo A Baeza-Yates. 1991. Searching subsequences.Theoretical Computer Science78, 2 (1991)
1991
-
[11]
Christian Bessière, Jean-Charles Régin, Roland H. C. Yap, and Yuanlin Zhang. 2005. An optimal coarse-grained arc consistency algorithm.Artif. Intell.165, 2 (2005). doi:10.1016/J.ARTINT.2005.02.004
-
[12]
Karl Bringmann and Bhaskar Ray Chaudhury. 2018. Sketching, streaming, and fine-grained complexity of (weighted) LCS. InFSTTCS
2018
-
[13]
Karl Bringmann, Allan Grønlund, Marvin Künnemann, and Kasper Green Larsen. 2024. The NFA acceptance hypothesis: Non-combinatorial and dynamic lower bounds.TheoretiCS3 (2024). doi:10.46298/THEORETICS.24.22
-
[14]
Karl Bringmann and Marvin Künnemann. 2018. Multivariate fine-grained complexity of longest common subsequence. InSODA
2018
-
[15]
Janusz A. Brzozowski, Jeffrey O. Shallit, and Zhi Xu. 2011. Decision problems for convex languages.Inf. Comput.209, 3 (2011). doi:10.1016/J.IC.2010.11.009
-
[16]
Andrei A. Bulatov. 2017. A dichotomy theorem for nonuniform CSPs. InFOCS. doi:10.1109/FOCS.2017.37
-
[17]
Sam Buss and Michael Soltys. 2014. Unshuffling a square is NP-hard.J. Comput. Syst. Sci.80, 4 (2014). doi:10.1016/j. jcss.2013.11.002
work page doi:10.1016/j 2014
-
[18]
David A. Cohen and Peter Jeavons. 2006. The complexity of constraint languages. InHandbook of Constraint Programming. Elsevier. doi:10.1016/S1574-6526(06)80012-X
-
[19]
Day, Maria Kosche, Florin Manea, and Markus L
Joel D. Day, Maria Kosche, Florin Manea, and Markus L. Schmid. 2022. Subsequences with gap constraints: Complexity bounds for matching and analysis problems. InISAAC. doi:10.4230/LIPICS.ISAAC.2022.64
-
[20]
Johannes Fischer and Volker Heun. 2006. Theoretical and practical improvements on the RMQ-problem, with applications to LCA and LCE. InCPM. doi:10.1007/11780441_5 , Vol. 1, No. 1, Article . Publication date: June 2026. Tractable Gap-Constraint Languages for Complex Event Recognition 17
-
[21]
Pamela Fleischmann, Sungmin Kim, Tore Koß, Florin Manea, Dirk Nowotka, Stefan Siemer, and Max Wiedenhöft. 2023. Matching patterns with variables under Simon’s congruence. InRP. https://doi.org/10.1007/978-3-031-45286-4_12
-
[22]
Eugene C. Freuder and Alan K. Mackworth. 2006. Constraint Satisfaction: An Emerging Paradigm. InHandbook of Constraint Programming, Francesca Rossi, Peter van Beek, and Toby Walsh (Eds.). Elsevier, 13–27. doi:10.1016/S1574- 6526(06)80006-4
-
[23]
Freydenberger, Pawel Gawrychowski, Juhani Karhumäki, Florin Manea, and Wojciech Rytter
Dominik D. Freydenberger, Pawel Gawrychowski, Juhani Karhumäki, Florin Manea, and Wojciech Rytter. 2015. Testing 𝑘-binomial equivalence. In Multidisciplinary Creativity, a collection of papers dedicated to G. Păun 65th birthday. available in CoRR abs/1509.00622
Pith/arXiv arXiv 2015
-
[24]
André Frochaux and Sarah Kleest-Meißner. 2023. Puzzling over subsequence-query extensions: Disjunction and generalised gaps. InAMW. https://ceur-ws.org/Vol-3409/paper3.pdf
2023
-
[25]
André Frochaux, Sarah Kleest-Meißner, and Benjamin Scheidt. 2025. Reaching new limits: Discovery of multi- dimensional disjunctive subsequence-queries with intervals. InDatenbanksysteme für Business, Technologie und Web (BTW 2025), 21. Fachtagung des GI-Fachbereichs „Datenbanken und Informationssysteme" (DBIS). doi:10.18420/BTW2025- 02
-
[26]
Younan Gao, Meng He, and Yakov Nekrich. 2020. Fast preprocessing for optimal orthogonal range reporting and range successor with applications to text indexing. InESA. doi:10.4230/LIPIcs.ESA.2020.54
-
[27]
Julián García and Cristian Riveros. 2025. Complex event recognition under time constraints: Towards a formal framework for efficient query evaluation.Proc. ACM Manag. Data3, 2 (2025). doi:10.1145/3725231
-
[28]
Nikos Giatrakos, Elias Alevizos, Alexander Artikis, Antonios Deligiannakis, and Minos N. Garofalakis. 2020. Complex event recognition in the Big Data era: A survey.VLDB J.29, 1 (2020). doi:10.1007/s00778-019-00557-w
-
[29]
Inge Li Gørtz, Sebastian Maneth, Gonzalo Navarro, and Nicola Prezza. 2024. Regular expressions: Matching and indexing (Dagstuhl Seminar 24472).Dagstuhl Reports14, 11 (2024). doi:10.4230/DAGREP.14.11.108
-
[30]
Alejandro Grez, Cristian Riveros, Martín Ugarte, and Stijn Vansummeren. 2021. A formal framework for complex event recognition.ACM Trans. Database Syst.46, 4 (2021). doi:10.1145/3485463
-
[31]
Simon Halfon, Philippe Schnoebelen, and Georg Zetzsche. 2017. Decidability, complexity, and expressiveness of first-order logic over the subword ordering. InLICS
2017
-
[32]
Pascal Van Hentenryck, Yves Deville, and Choh Man Teng. 1992. A generic arc-consistency algorithm and its specializations.Artif. Intell.57, 2-3 (1992). doi:10.1016/0004-3702(92)90020-X
-
[33]
Hopcroft, Rajeev Motwani, and Jeffrey D
John E. Hopcroft, Rajeev Motwani, and Jeffrey D. Ullman. 2001.Introduction to automata theory, languages, and computation. Addison-Wesley
2001
-
[34]
Lucian Ilie, Gonzalo Navarro, and Liviu Tinta. 2010. The longest common extension problem revisited and applications to approximate string searching.J. Discrete Algorithms8, 4 (2010). doi:10.1016/J.JDA.2010.08.004
-
[35]
Peter Jeavons. 1998. On the algebraic structure of combinatorial problems.Theor. Comput. Sci.200, 1-2 (1998). doi:10.1016/S0304-3975(97)00230-2
-
[36]
Peter Jeavons, David A. Cohen, and Marc Gyssens. 1997. Closure properties of constraints.J. ACM44, 4 (1997). doi:10.1145/263867.263489
-
[37]
Peter Jeavons and Martin C. Cooper. 1995. Tractable constraints on ordered domains.Artif. Intell.79, 2 (1995). doi:10.1016/0004-3702(95)00107-7
-
[38]
Prateek Karandikar, Manfred Kufleitner, and Philippe Schnoebelen. 2015. On the index of Simon’s congruence for piecewise testability.Inf. Process. Lett.115, 4 (2015)
2015
-
[39]
Prateek Karandikar and Philippe Schnoebelen. 2016. The height of piecewise-testable languages with applications in logical complexity. InCSL (LIPIcs, Vol. 62)
2016
-
[40]
Prateek Karandikar and Philippe Schnoebelen. 2019. The height of piecewise-testable languages and the complexity of the logic of subwords.Log. Methods Comput. Sci.15, 2 (2019)
2019
-
[41]
Schmid, Nicole Schweikardt, and Matthias Weidlich
Sarah Kleest-Meißner, Rebecca Sattler, Markus L. Schmid, Nicole Schweikardt, and Matthias Weidlich. 2022. Discovering event queries from traces: Laying foundations for subsequence-queries with wildcards and gap-size constraints. In ICDT
2022
-
[42]
Schmid, Nicole Schweikardt, and Matthias Weidlich
Sarah Kleest-Meißner, Rebecca Sattler, Markus L. Schmid, Nicole Schweikardt, and Matthias Weidlich. 2023. Discovering multi-dimensional subsequence queries from traces - From theory to practice. InDatenbanksysteme für Business, Technologie und Web (BTW 2023), 20. Fachtagung des GI-Fachbereichs „Datenbanken und Informationssysteme" (DBIS). doi:10.18420/BTW2023-24
-
[43]
Maria Kosche, Tore Koß, Florin Manea, and Stefan Siemer. 2022. Combinatorial algorithms for subsequence matching: A survey. InNCMA (EPTCS, Vol. 367). doi:10.4204/EPTCS.367.2
-
[44]
Dietrich Kuske. 2020. The subtrace order and counting first-order logic. InCSR
2020
-
[45]
Dietrich Kuske and Georg Zetzsche. 2019. Languages ordered by the subword order. InFOSSACS (LNCS, Vol. 11425)
2019
-
[46]
Marie Lejeune, Julien Leroy, and Michel Rigo. 2019. Computing the 𝑘-binomial complexity of the Thue-Morse word. InDLT. , Vol. 1, No. 1, Article . Publication date: June 2026. 18 Antoine Amarilli, Florin Manea, Tina Ringleb, and Markus L. Schmid
2019
-
[47]
Julien Leroy, Michel Rigo, and Manon Stipulanti. 2017. Generalized Pascal triangle for binomial coefficients of words. Electron. J. Combin.24, 1.44 (2017)
2017
-
[48]
Alan K. Mackworth. 1977. Consistency in networks of relations.Artif. Intell.8, 1 (1977). doi:10.1016/0004-3702(77)90007- 8
-
[49]
David Maier. 1978. The complexity of some problems on subsequences and supersequences.J. ACM25, 2 (1978), 15 pages
1978
-
[50]
Florin Manea, Jonas Richardsen, and Markus L. Schmid. 2024. Subsequences with generalised gap constraints: Upper and lower complexity bounds. InCPM. doi:10.4230/LIPIcs.CPM.2024.22
-
[51]
Alexandru Mateescu, Arto Salomaa, and Sheng Yu. 2004. Subword histories and Parikh matrices.J. Comput. Syst. Sci. 68, 1 (2004). doi:10.1016/J.JCSS.2003.04.001
-
[52]
Praveen, Philippe Schnoebelen, Julien Veron, and Isa Vialard
M. Praveen, Philippe Schnoebelen, Julien Veron, and Isa Vialard. 2024. On the piecewise complexity of words and periodic words. InSOFSEM. doi:10.1007/978-3-031-52113-3_32
-
[53]
William E. Riddle. 1979. An approach to software system modelling and analysis.Comput. Lang.4, 1 (1979). doi:10. 1016/0096-0551(79)90009-2
1979
-
[54]
Michel Rigo and Pavel Salimov. 2015. Another generalization of abelian equivalence: Binomial complexity of infinite words.Theor. Comput. Sci.601 (2015)
2015
-
[55]
Arto Salomaa. 2005. Connections between subwords and certain matrix mappings.Theor. Comput. Sci.340, 1 (2005). doi:10.1016/J.TCS.2005.03.024
-
[57]
DISCES: Systematic discovery of event stream queries.Proc. ACM Manag. Data3, 1 (2025). doi:10.1145/3709682
-
[58]
Schmid, Nicole Schweikardt, and Matthias Weidlich
Rebecca Sattler, Sarah Kleest-Meißner, Steven Lange, Markus L. Schmid, Nicole Schweikardt, and Matthias Weidlich
-
[59]
InDatenbanksysteme für Business, Technologie und Web (BTW 2025), 21
Embracing change: Incremental updates of discovered event queries. InDatenbanksysteme für Business, Technologie und Web (BTW 2025), 21. Fachtagung des GI-Fachbereichs „Datenbanken und Informationssysteme" (DBIS), 03.-07, März 2025, Bamberg, Germany, Proceedings. doi:10.18420/BTW2025-19
-
[60]
Philippe Schnoebelen and Julien Veron. 2023. On arch factorization and subword universality for words and compressed words. InWORDS. doi:10.1007/978-3-031-33180-0_21
-
[61]
Shinnosuke Seki. 2012. Absoluteness of subword inequality is undecidable.Theor. Comput. Sci.418 (2012)
2012
-
[62]
Alan C. Shaw. 1978. Software descriptions with flow expressions.IEEE Trans. Software Eng.4, 3 (1978). doi:10.1109/ TSE.1978.231501
arXiv 1978
-
[63]
1972.Hierarchies of events with dot-depth one
Imre Simon. 1972.Hierarchies of events with dot-depth one. Ph. D. Dissertation. University of Waterloo
1972
-
[64]
Imre Simon. 1975. Piecewise testable events. InAutomata Theory and Formal Languages, 2nd GI Conference
1975
-
[65]
Gabriel Thierrin. 1972. Convex languages. InICALP
1972
-
[66]
Virginia Vassilevska Williams. 2018. On some fine-grained questions in algorithms and complexity. InICM. doi:10. 1142/9789813272880_0188
2018
-
[67]
Eugene Wu, Yanlei Diao, and Shariq Rizvi. 2006. High-performance complex event processing over streams. InSIGMOD. doi:10.1145/1142473.1142520
-
[68]
Georg Zetzsche. 2016. The complexity of downward closure comparisons. InICALP, Vol. 55
2016
-
[69]
Haopeng Zhang, Yanlei Diao, and Neil Immerman. 2014. On complexity and optimization of expensive queries in complex event processing. InSIGMOD. doi:10.1145/2588555.2593671
-
[70]
Yuanlin Zhang and Roland H. C. Yap. 2001. Making AC-3 an optimal algorithm. InIJCAI
2001
-
[71]
Dmitriy Zhuk. 2017. A proof of CSP dichotomy conjecture. InFOCS. doi:10.1109/FOCS.2017.38
-
[72]
Dmitriy Zhuk. 2020. A proof of the CSP dichotomy conjecture.J. ACM67, 5 (2020). doi:10.1145/3402029 A More Information About the Left-Convex Property In this section, we explore in more detail the class of left-convex languages for which we have presented an efficient algorithm. We show that left-convex languages serve as a unifying explanation of the tra...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.