Formal Verification for JavaScript Regular Expressions: a Proven Semantics and its Applications (Extended Version)
Pith reviewed 2026-05-19 04:31 UTC · model grok-4.3
The pith
A mechanized semantics for JavaScript regular expressions is proven equivalent to the ECMAScript specification.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper defines a succinct semantics for JavaScript regular expressions in the Rocq proof assistant. This semantics records the full backtracking tree of possible matches rather than only the highest-priority one. It is proven equivalent to a preexisting line-by-line formalization of the ECMAScript specification. The authors then use the semantics to establish a notion of contextual equivalence for regex and to give the first formal proof of the PikeVM algorithm.
What carries the argument
A mechanized semantics in Rocq that captures the full backtracking tree of regex matches and is proven equivalent to the ECMAScript embedding.
If this is right
- Regex rewrites proposed in earlier work can now be formally proven correct or shown incorrect using the new contextual equivalence.
- The PikeVM algorithm receives its first machine-checked correctness proof.
- Further verification of regex engines, optimizers, or alternative matching algorithms becomes possible on a shared formal basis.
Where Pith is reading between the lines
- The same semantics could serve as a reference for checking consistency across different regex engine implementations.
- It opens a path to verified regex libraries that guarantee the same matching behavior as the specification.
- Extensions of the semantics to additional regex features or to other languages with similar backtracking rules could build directly on this foundation.
Load-bearing premise
The preexisting line-by-line embedding of the ECMAScript specification faithfully represents the informal prose standard.
What would settle it
A concrete JavaScript regex pattern on which the semantics predicts a different match result or priority ordering than an actual JavaScript engine produces.
Figures
read the original abstract
We present the first mechanized, succinct, practical, complete, and proven-faithful semantics for a modern regular expression language with backtracking semantics. We ensure its faithfulness by proving it equivalent to a preexisting line-by-line embedding of the official ECMAScript specification of JavaScript regular expressions. We demonstrate its practicality by presenting two real-world applications. First, a new notion of contextual equivalence for modern regular expressions, which we use to prove or disprove rewrites drawn from previous work. Second, the first formal proof of the PikeVM algorithm used in many real-world engines. In contrast with the specification and other formalization work, our semantics captures not only the top-priority match, but a full backtracking tree recording all possible matches and their respective priority. All our definitions and results have been mechanized in the Rocq proof assistant.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents the first mechanized semantics for JavaScript regular expressions supporting backtracking, defined succinctly in Rocq and proven equivalent to a preexisting line-by-line embedding of the ECMAScript specification. It captures the full backtracking tree (not only the top-priority match), and applies the semantics to define contextual equivalence for regex rewrites and to give the first formal proof of the PikeVM algorithm.
Significance. If the central claims hold, the work is significant for providing a machine-checked, practical semantics usable for verifying regex optimizations and real-world matching algorithms. The mechanization in Rocq, the explicit equivalence proof to the spec embedding, and the PikeVM verification are explicit strengths that support reproducibility and reliability beyond informal descriptions.
major comments (1)
- The central 'proven-faithful' claim is established via equivalence to a preexisting embedding rather than direct correspondence to the informal ECMAScript prose. While this is a standard and reasonable strategy, the paper should explicitly address (with references or discussion) the embedding's own validation against the prose specification for key aspects such as backtracking priority, assertions, and capture semantics, as any mismatch would propagate to the new succinct semantics.
minor comments (1)
- Abstract: the phrasing 'proven-faithful' could briefly clarify that faithfulness is shown relative to the embedding of the spec.
Simulated Author's Rebuttal
We thank the referee for their constructive review and recommendation of minor revision. The comment highlights an opportunity to strengthen the presentation of our faithfulness argument, which we address below.
read point-by-point responses
-
Referee: The central 'proven-faithful' claim is established via equivalence to a preexisting embedding rather than direct correspondence to the informal ECMAScript prose. While this is a standard and reasonable strategy, the paper should explicitly address (with references or discussion) the embedding's own validation against the prose specification for key aspects such as backtracking priority, assertions, and capture semantics, as any mismatch would propagate to the new succinct semantics.
Authors: We agree that an explicit discussion of the preexisting embedding's validation would improve clarity. In the revised manuscript we will add a short paragraph (likely in Section 2 or the introduction) that (1) cites the original embedding work, (2) summarizes its construction as a direct, line-by-line formalization of the ECMAScript prose, and (3) notes the validation steps reported for backtracking priority, assertions, and capture semantics (including correspondence to the official test suite and informal checks on priority ordering). This addition will make the inheritance of faithfulness explicit without altering any technical claims. revision: yes
Circularity Check
No circularity: semantics defined independently and proven equivalent to external preexisting embedding
full rationale
The paper defines a new succinct semantics for JavaScript regular expressions with backtracking and proves its equivalence in Rocq to a preexisting line-by-line embedding of the ECMAScript specification. This equivalence serves as an external benchmark rather than a self-referential construction. No steps in the derivation reduce by construction to the paper's own fitted parameters, self-citations, or ansatzes; the central faithfulness claim rests on the independent embedding. The applications (contextual equivalence and PikeVM proof) build on this without circular reduction. The derivation is self-contained against the external reference.
Axiom & Free-Parameter Ledger
axioms (1)
- standard math Standard Rocq axioms for inductive types, equality, and proof irrelevance
Reference graph
Works this paper leans on
-
[1]
A Compact Proof of Decidability for Regular Expression Equivalence. In Interactive Theorem Proving - Third International Conference, ITP 2012, Princeton, NJ, USA, August 13-15,
work page 2012
-
[2]
7406), Lennart Beringer and Amy P
Proceedings (Lecture Notes in Computer Science, Vol. 7406), Lennart Beringer and Amy P. Felty (Eds.). Springer, 283–298. doi:10.1007/978-3-642-32347-8_19 Fahad Ausaf, Roy Dyckhoff, and Christian Urban
-
[3]
POSIX Lexing with Derivatives of Regular Expressions (Proof Pearl). In Interactive Theorem Proving - 7th International Conference, ITP 2016, Nancy, France, August 22-25, 2016, Proceedings (Lecture Notes in Computer Science, Vol
work page 2016
-
[4]
, Jasmin Christian Blanchette and Stephan Merz (Eds.). Springer, 69–86. doi:10.1007/978-3-319-43144-4_5 Aurèle Barrière and Clément Pit-Claudel
-
[5]
Linear Matching of JavaScript Regular Expressions. Proc. ACM Program. Lang. 8, PLDI (2024), 1336–1360. doi:10.1145/3656431 Martin Berglund and Brink van der Merwe
-
[6]
On the semantics of regular expression parsing in the wild. Theor. Comput. Sci. 679 (2017), 69–82. doi:10.1016/J.TCS.2016.09.006 Martin Berglund, Brink van der Merwe, and Steyn van Litsenborgh
-
[7]
Regular Expressions with Lookahead. J. Univers. Comput. Sci. 27, 4 (2021), 324–340. doi:10.3897/JUCS.66330 Thomas Braibant and Damien Pous
-
[8]
In Interactive Theorem Proving, First International Conference, ITP 2010, Edinburgh, UK, July 11-14,
An Efficient Coq Tactic for Deciding Kleene Algebras. In Interactive Theorem Proving, First International Conference, ITP 2010, Edinburgh, UK, July 11-14,
work page 2010
-
[9]
6172), Matt Kaufmann and Lawrence C
Proceedings (Lecture Notes in Computer Science, Vol. 6172), Matt Kaufmann and Lawrence C. Paulson (Eds.). Springer, 163–178. doi:10.1007/978-3-642-14052-5_13 Elton Maximo Cardoso, Leonardo Vieira dos Santos Reis, and Rodrigo Geraldo Ribeiro
-
[10]
A Verified Operational Semantics for Regular Expression Parsing. In Proceedings of the XXVII Brazilian Symposium on Programming Languages, SBLP 2023, Campo Grande, MS, Brazil, September 25-29, 2023 . ACM, 82–90. doi:10.1145/3624309.3624317 Samuel Chassot and Viktor Kunčak
-
[11]
Verified invertible lexer using regular expressions and DFAs. arXiv:2412.13581 [cs.PL] https://arxiv.org/abs/2412.13581 Agnishom Chattopadhyay, Wu Angela Li, and Konstantinos Mamouras
-
[12]
Verified and Efficient Matching of Regular Expressions with Lookaround. In Proceedings of the 14th ACM SIGPLAN International Conference on Certified Programs and Proofs, CPP 2025, Denver, CO, USA, January 20-21, 2025 , Kathrin Stark, Amin Timany, Sandrine Blazy, and Nicolas Tabareau (Eds.). ACM, 198–213. doi:10.1145/3703595.3705884 Taolue Chen, Alejandro ...
-
[13]
Proceedings of the ACM on Programming Languages 6, POPL (2022), 1–31
Solving string constraints with Regex-dependent functions through transducers with priorities and variables. Proceedings of the ACM on Programming Languages 6, POPL (2022), 1–31. doi:10.1145/3498707 Nariyoshi Chida and Tachio Terauchi
-
[14]
Proceedings of the ACM on Programming Languages 7, PLDI (2023), 1633–1656
Repairing Regular Expressions for Extraction. Proceedings of the ACM on Programming Languages 7, PLDI (2023), 1633–1656. doi:10.1145/3591287 Nariyoshi Chida and Tachio Terauchi
-
[15]
Repairing Regex-Dependent String Functions. In Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, ASE 2024, Sacramento, CA, USA, October 27 - November 1, 2024, Vladimir Filkov, Baishakhi Ray, and Minghui Zhou (Eds.). ACM, 294–305. doi:10.1145/3691620.3695005 Thierry Coquand and Vincent Siles
-
[16]
In Certified Programs and Proofs - First International Conference, CPP 2011
A Decision Procedure for Regular Expression Equivalence in Type Theory. In Certified Programs and Proofs - First International Conference, CPP 2011 . doi:10.1007/978-3-642-25379-9_11 Russ Cox
-
[17]
The impact of regular expression denial of service (ReDoS) in practice: an empirical study at the ecosystem scale. In Proceedings of the 2018 ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/SIGSOFT FSE
work page 2018
-
[18]
doi:10.1145/3236024.3236027 Noé De Santo, Aurèle Barrière, and Clément Pit-Claudel
ACM, 246–256. doi:10.1145/3236024.3236027 Noé De Santo, Aurèle Barrière, and Clément Pit-Claudel
-
[19]
A Coq Mechanization of JavaScript Regular Expression Semantics. Proc. ACM Program. Lang. 8, ICFP (2024), 1003–1031. doi:10.1145/3674666 Christian Doczkal, Jan-Oliver Kaiser, and Gert Smolka
-
[20]
A Constructive Theory of Regular Languages in Coq. In Certified Programs and Proofs - Third International Conference, CPP 2013, Melbourne, VIC, Australia, December 11-13, 2013, Proceedings (Lecture Notes in Computer Science, Vol
work page 2013
-
[21]
, Georges Gonthier and Michael Norrish (Eds.). Springer, 82–97. doi:10.1007/978-3-319-03545-1_6 Christian Doczkal and Gert Smolka
-
[22]
Regular Language Representations in the Constructive Type Theory of Coq. J. Autom. Reason. 61, 1-4 (2018), 521–553. doi:10.1007/S10817-018-9460-X Mark Jason Dominus
-
[23]
Acta Informatica 37, 2 (2000), 121–144
Efficiently building a parse tree from a regular expression. Acta Informatica 37, 2 (2000), 121–144. doi:10.1007/S002360000037 ECMA
-
[24]
https://262.ecma-international.org/14.0/ ECMA
ECMA-262, 14th edition: ECMAScript® 2023 Language Specification. https://262.ecma-international.org/14.0/ ECMA
work page 2023
-
[25]
https://262.ecma-international.org/15.0/ ECMA
ECMA-262, 15th edition: ECMAScript® 2024 Language Specification. https://262.ecma-international.org/15.0/ ECMA
work page 2024
-
[26]
https://262.ecma-international.org/16.0/ Derek Egolf, Sam Lasser, and Kathleen Fisher
ECMA-262, 16th edition: ECMAScript® 2025 Language Specification. https://262.ecma-international.org/16.0/ Derek Egolf, Sam Lasser, and Kathleen Fisher
work page 2025
-
[27]
In IEEE Security and Privacy Workshops, SP Workshops 2021, San Francisco, CA, USA, May 27, 2021
Verbatim: A Verified Lexer Generator. In IEEE Security and Privacy Workshops, SP Workshops 2021, San Francisco, CA, USA, May 27, 2021 . IEEE, 92–100. doi:10.1109/SPW53761.2021.00022 Publication date: September
-
[28]
Verbatim++: verified, optimized, and semantically rich lexing with derivatives. In CPP ’22: 11th ACM SIGPLAN International Conference on Certified Programs and Proofs, Philadelphia, PA, USA, January 17 - 18, 2022 , Andrei Popescu and Steve Zdancewic (Eds.). ACM, 27–39. doi:10.1145/3497775.3503694 Denis Firsov and Tarmo Uustalu
-
[29]
Certified Parsing of Regular Languages. In Certified Programs and Proofs - Third International Conference, CPP 2013, Melbourne, VIC, Australia, December 11-13, 2013, Proceedings (Lecture Notes in Computer Science, Vol. 8307), Georges Gonthier and Michael Norrish (Eds.). Springer, 98–113. doi:10.1007/978-3-319-03545-1_7 Hiroya Fujinami and Ichiro Hasuo
-
[30]
Efficient Matching with Memoization for Regexes with Look-around and Atomic Grouping. In Programming Languages and Systems - 33rd European Symposium on Programming, ESOP 2024, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2024, Luxembourg City, Luxembourg, April 6-11, 2024, Proceedings, Part II (Lecture Notes in ...
-
[31]
https://hacks.mozilla.org/2020/06/a-new-regexp-engine-in- spidermonkey/
A New RegExp Engine in SpiderMonkey. https://hacks.mozilla.org/2020/06/a-new-regexp-engine-in- spidermonkey/. Jacques-Henri Jourdan, François Pottier, and Xavier Leroy
work page 2020
-
[32]
Validating LR(1) Parsers. In Programming Languages and Systems - 21st European Symposium on Programming, ESOP 2012, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2012, Tallinn, Estonia, March 24 - April 1,
work page 2012
-
[33]
Proceedings (Lecture Notes in Computer Science, Vol. 7211), Helmut Seidl (Ed.). Springer, 397–416. doi:10.1007/978-3-642-28869-2_20 Stefan Kahrs and Colin Runciman
-
[34]
Simplifying regular expressions further. J. Symb. Comput. 109 (2022), 124–143. doi:10.1016/J.JSC.2021.08.003 Ohad Kammar and Katarzyna Marek
-
[35]
Idris TyRE: a dependently typed regex parser. CoRR abs/2305.04480 (2023). doi:10.48550/ARXIV.2305.04480 arXiv:2305.04480 Alexander Krauss and Tobias Nipkow
-
[36]
Archive of Formal Proofs (May 2010)
Regular Sets and Expressions. Archive of Formal Proofs (May 2010). https://isa- afp.org/entries/Regular-Sets.html, Formal proof development. Alexander Krauss and Tobias Nipkow
work page 2010
-
[37]
Proof Pearl: Regular Expression Equivalence and Relation Algebra. J. Autom. Reason. 49, 1 (2012), 95–106. doi:10.1007/S10817-011-9223-4 Blake Loring, Duncan Mitchell, and Johannes Kinder
-
[38]
Sound Regular Expression Semantics for Dynamic Symbolic Execution of JavaScript. In Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2019, Phoenix, AZ, USA, June 22-26, 2019 , Kathryn S. McKinley and Kathleen Fisher (Eds.). ACM, 425–438. doi:10.1145/3314221.3314645 Konstantinos Mamouras and Agnishom Ch...
-
[39]
Efficient Matching of Regular Expressions with Lookaround Assertions. Proc. ACM Program. Lang. 8, POPL (2024), 2761–2791. doi:10.1145/3632934 Nelma Moreira, David Pereira, and Simão Melo de Sousa
-
[40]
Deciding Regular Expressions (In-)Equivalence in Coq. In Relational and Algebraic Methods in Computer Science - 13th International Conference, RAMiCS 2012, Cambridge, UK, September 17-20,
work page 2012
-
[41]
, Wolfram Kahl and Timothy G. Griffin (Eds.). Springer, 98–113. doi:10.1007/978-3-642-33314-9_7 Dan Moseley, Mario Nishio, Jose Perez Rodriguez, Olli Saarikivi, Stephen Toub, Margus Veanes, Tiki Wan, and Eric Xu
-
[42]
Derivative Based Nonbacktracking Real-World Regex Matching with Backtracking Semantics. Proc. ACM Program. Lang. 7, PLDI (2023), 1026–1049. doi:10.1145/3591262 Lasse Nielsen and Fritz Henglein
-
[43]
Bit-coded Regular Expression Parsing. In Language and Automata Theory and Applications - 5th International Conference, LATA 2011, Tarragona, Spain, May 26-31,
work page 2011
-
[44]
6638), Adrian-Horia Dediu, Shunsuke Inenaga, and Carlos Martín-Vide (Eds.)
Proceedings (Lecture Notes in Computer Science, Vol. 6638), Adrian-Horia Dediu, Shunsuke Inenaga, and Carlos Martín-Vide (Eds.). Springer, 402–413. doi:10.1007/978-3-642-21254-3_32 Tobias Nipkow and Dmitriy Traytel
-
[45]
Unified Decision Procedures for Regular Expression Equivalence. In Interactive Theorem Proving - 5th International Conference, ITP 2014, Held as Part of the Vienna Summer of Logic, VSL 2014, Vienna, Austria, July 14-17,
work page 2014
-
[46]
, Gerwin Klein and Ruben Gamboa (Eds.). Springer, 450–466. doi:10.1007/978-3-319-08970-6_29 Taisei Nogami and Tachio Terauchi
-
[47]
On the Expressive Power of Regular Expressions with Backreferences. In 48th International Symposium on Mathematical Foundations of Computer Science, MFCS 2023, August 28 to September 1, 2023, Bordeaux, France (LIPIcs, Vol
work page 2023
-
[48]
Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 71:1–71:15
, Jérôme Leroux, Sylvain Lombardy, and David Peleg (Eds.). Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 71:1–71:15. doi:10.4230/LIPICS.MFCS.2023.71 Wendlasida Ouedraogo, Gabriel Scherer, and Lutz Straßburger
-
[49]
Coqlex: Generating Formally Verified Lexers. Art Sci. Eng. Program. 8, 1 (2024). doi:10.22152/PROGRAMMING-JOURNAL.ORG/2024/8/3 Publication date: September
work page doi:10.22152/programming-journal.org/2024/8/3 2024
-
[50]
Journal of Functional Programming 19, 2 (2009), 173–190
Regular-expression derivatives re-examined. Journal of Functional Programming 19, 2 (2009), 173–190. Scott Owens and Konrad Slind
work page 2009
-
[51]
Adapting functional programs to higher order logic. High. Order Symb. Comput. 21, 4 (2008), 377–409. doi:10.1007/S10990-008-9038-0 Rob Pike
-
[52]
The Text Editor sam. Softw. Pract. Exp. 17, 11 (1987), 813–845. doi:10.1002/SPE.4380171105 Gabriel Radanne
-
[53]
Typed parsing and unparsing for untyped regular expression engines. In Proceedings of the 2019 ACM SIGPLAN Workshop on Partial Evaluation and Program Manipulation, PEPM@POPL 2019, Cascais, Portugal, January 14-15, 2019, Manuel V. Hermenegildo and Atsushi Igarashi (Eds.). ACM, 35–46. doi:10.1145/3294032.3294082 Rodrigo Geraldo Ribeiro and André Rauber Du Bois
-
[54]
Certified Bit-Coded Regular Expression Parsing. In Proceedings of the 21st Brazilian Symposium on Programming Languages, SBLP 2017, Fortaleza, CE, Brazil, September 21-22, 2017 , Fabio Mascarenhas (Ed.). ACM, 4:1–4:8. doi:10.1145/3125374.3125381 Dmitry Soshnikov
-
[55]
In 27th USENIX Security Symposium, USENIX Security 2018
Freezing the Web: A Study of ReDoS Vulnerabilities in JavaScript- based Web Servers. In 27th USENIX Security Symposium, USENIX Security 2018 . USENIX Association, 361–376. https: //www.usenix.org/conference/usenixsecurity18/presentation/staicu Peter Thiemann
work page 2018
-
[56]
Derivatives for Enhanced Regular Expressions. In Implementation and Application of Automata - 21st International Conference, CIAA 2016, Seoul, South Korea, July 19-22, 2016, Proceedings (Lecture Notes in Computer Science, Vol. 9705), Yo-Sub Han and Kai Salomaa (Eds.). Springer, 285–297. doi:10.1007/978-3-319-40946-7_24 Ken Thompson
-
[57]
Regular Expression Search Algorithm.Commun. ACM 11, 6 (1968), 419–422. doi:10.1145/363347.363387 Christian Urban
-
[58]
POSIX Lexing with Derivatives of Regular Expressions. J. Autom. Reason. 67, 3 (2023),
work page 2023
-
[59]
Mismatch between the Experimental engine and the backtracking engine on empty repetitions. https://issues. chromium.org/issues/42204037. V8
-
[60]
https://issues.chromium.org/issues/388290816
RegExp: Inconsistent branch priority in look-behind assertions. https://issues.chromium.org/issues/388290816. Ian Erik Varatalu, Margus Veanes, and Juhan-Peep Ernits
-
[61]
Derivative Based Extended Regular Expression Matching Supporting Intersection, Complement and Lookarounds. CoRR abs/2309.14401 (2023). doi:10.48550/ARXIV.2309.14401 arXiv:2309.14401 Ian Erik Varatalu, Margus Veanes, and Juhan P. Ernits
-
[62]
RE#: High Performance Derivative-Based Regex Matching with Intersection, Complement, and Restricted Lookarounds. Proc. ACM Program. Lang. 9, POPL (2025), 1–32. doi:10. 1145/3704837 Xiang Wang, Yang Hong, Harry Chang, KyoungSoo Park, Geoff Langdale, Jiayu Hu, and Heqing Zhu
work page 2025
-
[63]
Hyperscan: A Fast Multi-pattern Regex Matcher for Modern CPUs. In 16th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2019, Boston, MA, February 26-28, 2019 , Jay R. Lorch and Minlan Yu (Eds.). USENIX Association, 631–648. https://www.usenix.org/conference/nsdi19/presentation/wang-xiang Ekaterina Zhuchko, Margus Veanes, and Gabriel Ebner
work page 2019
-
[64]
Lean Formalization of Extended Regular Expression Matching with Lookarounds. In Proceedings of the 13th ACM SIGPLAN International Conference on Certified Programs and Proofs, CPP 2024, London, UK, January 15-16, 2024 , Amin Timany, Dmitriy Traytel, Brigitte Pientka, and Sandrine Blazy (Eds.). ACM, 118–131. doi:10.1145/3636501.3636959 Publication date: September
-
[65]
PikeTree small-step semantics Publication date: September 2025
work page 2025
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.