Selective Memoization for Efficient Backtracking Regular Expression Matching
Pith reviewed 2026-06-26 02:11 UTC · model grok-4.3
The pith
Computing a minimum feedback vertex set on a regex automaton lets selective memoization guarantee linear backtracking time.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The Minimum Feedback Node memoization scheme computes a minimum feedback vertex set of the automaton and memos only those nodes; this eliminates all exponential paths in backtracking matching, yields linear time, and maintains explicit relationships to prior memoization approaches under both Thompson and Glushkov automata.
What carries the argument
The Minimum Feedback Node (MFN) memoization scheme, which selects memoization points by computing a minimum feedback vertex set of the automaton to cut all cycles.
If this is right
- Backtracking matchers using MFN memoization run in linear time on all inputs.
- Memory usage stays below that of full memoization while still preventing catastrophic cases.
- The same MFN construction works for both Thompson and Glushkov automata with the claimed relationships to earlier schemes preserved.
- Selective memoization can be computed once per expression rather than per match.
Where Pith is reading between the lines
- The approach may extend to other nondeterministic automata used in string matching or parsing if their cycle structure is similar.
- Practical implementations would need an efficient approximation algorithm for the minimum feedback vertex set, since the exact problem is NP-hard.
- If the MFN set proves small in real-world regexes, the method could be integrated into existing engines without large memory overhead.
Load-bearing premise
That a minimum feedback vertex set on the automaton always produces a memoization set sufficient to remove every exponential recomputation path while preserving the linear-time bound.
What would settle it
A concrete regular expression whose Thompson or Glushkov automaton has an MFN set that still permits an exponential number of matching paths on some input string.
Figures
read the original abstract
Backtracking regular expression matchers are widely used due to their expressive power but may exhibit exponential worst-case matching time. Memoization provides a principled method for eliminating redundant computation and ensuring linear matching time, but full memoization is memory-intensive and impractical. We introduce the Minimum Feedback Node (MFN) memoization scheme, a selective memoization strategy based on computing a minimum feedback vertex set of an automaton. We establish relationships with existing memoization schemes and analyze their behaviour under both Thompson and Glushkov automaton constructions.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces the Minimum Feedback Node (MFN) memoization scheme, a selective memoization strategy for backtracking regular expression matchers. MFN is obtained by computing a minimum feedback vertex set of the underlying automaton; the paper claims this eliminates all exponential backtracking paths while guaranteeing linear matching time. It further establishes relationships between MFN and prior memoization schemes and analyzes behavior under both Thompson and Glushkov NFA constructions.
Significance. If the MFN construction can be performed in polynomial time and the linear-time guarantee is formally established, the selective-memoization approach would supply a memory-efficient alternative to full memoization, with potential practical value for regex engines that must handle pathological inputs.
major comments (1)
- [Abstract] Abstract: the central claim that MFN memoization 'eliminates exponential backtracking paths while delivering linear matching time' depends on an efficient computation of the minimum feedback vertex set. Minimum feedback vertex set is NP-hard on general graphs, and neither Thompson nor Glushkov NFAs are known to admit polynomial-time exact solutions in the worst case. The manuscript supplies neither a structure-exploiting polynomial algorithm nor a proven-correct approximation that preserves the end-to-end linearity; this is load-bearing for the claimed complexity result.
Simulated Author's Rebuttal
We thank the referee for the careful review and for identifying this critical point regarding the computational complexity underlying our claims. We address the comment below and outline the revisions we will undertake.
read point-by-point responses
-
Referee: [Abstract] Abstract: the central claim that MFN memoization 'eliminates exponential backtracking paths while delivering linear matching time' depends on an efficient computation of the minimum feedback vertex set. Minimum feedback vertex set is NP-hard on general graphs, and neither Thompson nor Glushkov NFAs are known to admit polynomial-time exact solutions in the worst case. The manuscript supplies neither a structure-exploiting polynomial algorithm nor a proven-correct approximation that preserves the end-to-end linearity; this is load-bearing for the claimed complexity result.
Authors: We agree that the end-to-end complexity claim is load-bearing and that the manuscript does not supply a polynomial-time algorithm (or approximation preserving linearity) for computing the minimum feedback vertex set on Thompson or Glushkov NFAs. The paper focuses on the correctness and linear-time properties of the matching phase once an MFN has been obtained, establishing that selective memoization on the MFN eliminates all exponential backtracking paths. We do not claim or prove that MFN computation itself is efficient in the worst case for these automata constructions. We will revise the abstract to qualify the linear-time guarantee as applying to the matching process given a precomputed MFN, and we will add a dedicated discussion section addressing the complexity of MFN computation, its NP-hardness in general, and whether the restricted structure of regex-derived NFAs admits efficient exact or approximate solutions. If no such efficient method can be established, the claims will be adjusted to separate the precomputation cost from the per-input matching time. revision: yes
Circularity Check
No significant circularity; MFN scheme defined via external min-FVS concept
full rationale
The paper defines the MFN memoization scheme directly in terms of an external graph-theoretic primitive (minimum feedback vertex set of an automaton) and states that it establishes relationships to existing memoization schemes. No equations, fitted parameters, self-citations, or ansatzes are present in the abstract that would reduce any claimed derivation or prediction to the paper's own inputs by construction. The central construction therefore remains independent of the target result and draws on standard external mathematics.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
Theoretical Computer Science 679, 69–82 (2017)
Berglund, M., van der Merwe, B.: On the semantics of regular expression parsing in the wild. Theoretical Computer Science 679, 69–82 (2017). https://doi.org/10.1016/j.tcs.2016.09.006
-
[2]
Berry, G., Sethi, R.: From regular expressions to deterministic automata. Theor. Comput. Sci.48(3), 117–126 (1986). https://doi.org/10.1016/0304-3975(86)90088-5
-
[3]
In: Dumas, M., Pfahl, D., Apel, S., Russo, A
Davis, J.C., IV , L.G.M., Coghlan, C.A., Servant, F., Lee, D.: Why aren’t regular expressions a lingua franca? An empirical study on the re-use and portability of regular expressions. In: Dumas, M., Pfahl, D., Apel, S., Russo, A. (eds.) Proceedings of the ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Sof...
-
[4]
Choquette - Choo, Hengrui Jia, Adelin Travers, Baiwu Zhang, David Lie, and Nicolas Papernot
Davis, J.C., Servant, F., Lee, D.: Using Selective Memoization to Defeat Regular Expression Denial of Service (ReDoS). In: 42nd IEEE Symposium on Security and Privacy, SP 2021, San Francisco, CA, USA, 24-27 May 2021. pp. 1–17. IEEE (2021). https://doi.org/10.1109/SP40001.2021.00032
-
[5]
Technical Report 98-17, Università Ca’ Foscari di Venezia (1998)
Giammarresi, D., Ponty, J.L., Wood, D.: Glushkov and Thompson Constructions: A Synthesis. Technical Report 98-17, Università Ca’ Foscari di Venezia (1998)
1998
-
[6]
Giammarresi, D., Ponty, J., Wood, D., Ziadi, D.: A characterization of Thompson digraphs. Discret. Appl. Math. 134(1-3), 317–337 (2004). https://doi.org/10.1016/S0166-218X(03)00299-3
-
[7]
Russian Mathematical Surveys 16(5), 1 (oct 1961)
Glushkov, V .M.: The abstract theory of automata. Russian Mathematical Surveys 16(5), 1 (oct 1961). https://doi.org/10.1070/RM1961v016n05ABEH004112
-
[8]
Reducibility among Combinatorial Problems
Karp, R.M.: Reducibility among combinatorial problems. In: Miller, R.E., Thatcher, J.W. (eds.) Proceedings of a symposium on the Complexity of Computer Computations, held March 20-22, 1972, at the IBM Thomas J. Watson Research Center, Yorktown Heights, New York, USA. pp. 85–103. The IBM Research Symposia Series, Plenum Press, New York (1972). https://doi....
-
[9]
van der Merwe, B., Mouton, J., van Litsenborgh, S., Berglund, M.: Memoized regular expressions. In: Maneth, S. (ed.) Implementation and Application of Automata - 25th International Conference, CIAA 2021, Virtual Event, July 19-22, 2021, Proceedings. Lecture Notes in Computer Science, vol. 12803, pp. 39–52. Springer (2021). https://doi.org/10.1007/978-3-03...
-
[10]
Roodt, A., Watling, B.K.M., Bester, W., van der Merwe, B., Sung, S., Han, Y .S.: Benchmarking regular expression matching. In: Fazekas, S.Z. (ed.) Implementation and Application of Automata - 28th Interna- tional Conference, CIAA 2024, Akita, Japan, September 3-6, 2024, Proceedings. Lecture Notes in Computer Science, vol. 15015, pp. 316–331. Springer (202...
-
[11]
Shamir, A.: A linear time algorithm for finding minimum cutsets in reducible graphs. SIAM J. Comput. 8(4), 645–655 (1979). https://doi.org/10.1137/0208051
-
[12]
Thompson, K.: Regular expression search algorithm. Commun. ACM 11(6), 419–422 (1968). https://doi.org/10.1145/363347.363387
-
[13]
Theoretical Computer Science 88(2), 325–349 (1991)
Weber, A., Seidl, H.: On the degree of ambiguity of finite automata. Theoretical Computer Science 88(2), 325–349 (1991). https://doi.org/10.1016/0304-3975(91)90381-B
-
[14]
Weideman, N., van der Merwe, B., Berglund, M., Watson, B.W.: Analyzing matching time behavior of backtracking regular expression matchers by using ambiguity of NFA. In: Han, Y ., Salomaa, K. (eds.) Im- plementation and Application of Automata - 21st International Conference, CIAA 2016, Seoul, South Korea, July 19-22, 2016, Proceedings. Lecture Notes in Co...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.