Deconstructing Subset Construction -- Reducing While Determinizing
Pith reviewed 2026-05-22 15:08 UTC · model grok-4.3
The pith
Equivalence registries insert minimization steps into subset construction to shrink the state space while determinizing NFAs.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By maintaining equivalence registries that track and unify language-equivalent states, intermediate minimization steps can be performed during subset construction, thereby reducing the exploration space on the fly; the same mechanism embeds into Brzozowski's algorithm and permits further optimizations such as convexity closures and simulation.
What carries the argument
Equivalence registries that track and unify language-equivalent states during construction.
If this is right
- The intermediate state space during determinization stays smaller, especially on worst-case inputs from automatic sequences.
- The same registries allow simulation and convexity closures to be applied without a separate pass.
- The approach can be dropped into either classic subset construction or Brzozowski's algorithm without changing their outer structure.
- Real-world examples show measurable reduction in total states visited compared with standard methods.
Where Pith is reading between the lines
- The same registry idea could be tested inside other automata algorithms that currently separate minimization from construction.
- If the registries remain sound, the technique might extend to partial determinization tasks that stop early once a target property is met.
- Combining the method with existing DFA minimization tools after construction could produce still smaller final automata.
Load-bearing premise
Equivalence registries correctly and efficiently identify and unify all language-equivalent states during construction without missing equivalences or introducing incorrect merges that would alter the accepted language.
What would settle it
An NFA on which the registry method either produces an automaton accepting a different language or explores strictly more states than ordinary subset construction.
Figures
read the original abstract
We present a novel perspective on the NFA canonization problem, which introduces intermediate minimization steps to reduce the exploration space on-the-fly. Central to our approach are equivalence registries which track and unify language-equivalent states, and allow for additional optimizations such as convexity closures and simulation. Due to the generality of our approach, these concepts can be embedded in classic subset construction or Brzozowski's approach. We evaluate our approach on a set of synthetic and real-world examples from automatic sequences and observe that we are able to improve especially worst-case scenarios. We provide an open-source library implementing our approach.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents a novel perspective on the NFA canonization problem by introducing intermediate minimization steps during determinization to reduce the exploration space on-the-fly. Central to the approach are equivalence registries that track and unify language-equivalent states, enabling additional optimizations such as convexity closures and simulation. The method is designed to be embeddable in classic subset construction or Brzozowski's algorithm. The authors evaluate the approach on synthetic and real-world examples from automatic sequences, reporting improvements especially in worst-case scenarios, and provide an open-source library implementing the method.
Significance. If the central claim holds, the work could offer practical efficiency gains for automata constructions in worst-case scenarios by reducing state-space exploration during determinization. The open-source library is a clear strength that supports reproducibility and allows direct testing of the claimed improvements on automatic sequences.
major comments (2)
- [Abstract] Abstract: the claim that equivalence registries 'track and unify language-equivalent states' and enable 'convexity closures and simulation' is load-bearing for the central claim of on-the-fly reduction, yet the manuscript provides no algorithm, invariant, or proof sketch showing that partial information available mid-construction suffices for sound equivalence decisions without missing equivalences or performing merges that alter the accepted language.
- [Abstract] Abstract: no complexity analysis, derivation details, or proof sketches are supplied to substantiate the reported reductions in exploration space or the empirical gains, preventing verification that the intermediate minimization steps preserve the accepted language while achieving the claimed improvements.
minor comments (1)
- [Abstract] The abstract introduces terms such as 'convexity closures' and 'simulation' without definitions or explanations of their integration with equivalence registries or the base algorithms.
Simulated Author's Rebuttal
We thank the referee for the constructive review and positive assessment of the work's potential practical impact and the value of the open-source library. We address the two major comments point by point below, clarifying the manuscript content and committing to targeted revisions that strengthen the presentation without altering the core claims.
read point-by-point responses
-
Referee: [Abstract] Abstract: the claim that equivalence registries 'track and unify language-equivalent states' and enable 'convexity closures and simulation' is load-bearing for the central claim of on-the-fly reduction, yet the manuscript provides no algorithm, invariant, or proof sketch showing that partial information available mid-construction suffices for sound equivalence decisions without missing equivalences or performing merges that alter the accepted language.
Authors: We agree that the abstract is too concise to convey the supporting details. The full manuscript defines the equivalence registry in Section 3, including its data structure for tracking partial language information and the unification operation. The soundness argument rests on an invariant (stated in Section 3.2) that merges occur only between states whose accepted languages coincide on the transitions explored so far; this is maintained by construction because the registry consults only the current determinization frontier. We will add an explicit statement of the invariant together with a short proof sketch in a new subsection of the revised manuscript, and we will include pseudocode for the registry update to make the mid-construction decision procedure fully transparent. revision: yes
-
Referee: [Abstract] Abstract: no complexity analysis, derivation details, or proof sketches are supplied to substantiate the reported reductions in exploration space or the empirical gains, preventing verification that the intermediate minimization steps preserve the accepted language while achieving the claimed improvements.
Authors: The manuscript prioritizes empirical demonstration over asymptotic analysis because the magnitude of the reduction depends on the structure of the input NFA. Section 5 quantifies the observed savings on automatic-sequence benchmarks and includes a brief argument that early merging cannot change the accepted language. In the revision we will insert a short derivation (in Section 4) that bounds the number of subsets avoided by each registry merge, together with a proof sketch confirming language preservation. These additions will allow readers to verify the claimed improvements while preserving the paper's focus on practical gains. revision: yes
Circularity Check
No significant circularity; algorithmic technique presented as independent contribution
full rationale
The paper introduces an algorithmic perspective on NFA canonization via equivalence registries for on-the-fly minimization during subset construction or Brzozowski's method. Claims rest on the proposed technique's ability to track and unify language-equivalent states, with evaluation on external synthetic and real-world examples from automatic sequences. No derivation reduces by construction to fitted inputs, self-definitions, or load-bearing self-citations; the method is framed as a general embedding of optimizations rather than a closed self-referential loop. The central correctness of equivalence detection is an independent algorithmic claim, not shown to be equivalent to its own inputs via the paper's equations or prior self-referential results.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Language equivalence between states is decidable and can be tracked incrementally during subset construction.
invented entities (1)
-
equivalence registries
no independent evidence
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
Central to our approach are equivalence registries which track and unify language-equivalent states, and allow for additional optimizations such as convexity closures and simulation.
-
IndisputableMonolith/Foundation/RealityFromDistinction.leanreality_from_one_distinction unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
We present OTF – a framework for on-the-fly NFA canonization.
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Parosh Aziz Abdulla et al. “When Simulation Meets Antichains”. In:Tools and Algorithms for the Construction and Analysis of Systems, 16th Inter- national Conference, TACAS 2010, Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2010, Paphos, Cyprus, March 20-28, 2010. Proceedings. Ed. by Javier Esparza and Ru- pak Ma...
-
[2]
Efficient String Matching: An Aid to Bibliographic Search
Alfred V. Aho and Margaret J. Corasick. “Efficient String Matching: An Aid to Bibliographic Search”. In:Commun. ACM 18.6 (1975), pp. 333–
work page 1975
-
[3]
doi: 10.1145/360825.360855
-
[4]
AutomataLib GitHub page. Online. Accessed: Jan 12, 2026.url: https: //github.com/LearnLib/automatalib
work page 2026
-
[5]
Additive Num- ber Theory via Approximation by Regular Languages
Jason P. Bell, Thomas F. Lidbetter, and Jeffrey O. Shallit. “Additive Num- ber Theory via Approximation by Regular Languages”. In:Int. J. Found. Comput. Sci.31.6(2020),pp.667–687. doi: 10.1142/S0129054120410014
-
[6]
Lattices, closures systems and implication bases: A survey of structural aspects and algorithms
Karell Bertet et al. “Lattices, closures systems and implication bases: A survey of structural aspects and algorithms”. In:Theor. Comput. Sci.743 (2018), pp. 93–109.doi: 10.1016/J.TCS.2016.11.021
-
[7]
Checking NFA equivalence with bisim- ulations up to congruence
Filippo Bonchi and Damien Pous. “Checking NFA equivalence with bisim- ulations up to congruence”. In:The 40th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL ’13, Rome, Italy - January 23 - 25, 2013. Ed. by Roberto Giacobazzi and Radhia Cousot. ACM, 2013, pp. 457–468.doi: 10.1145/2429069.2429124
-
[8]
Using Walnut to solve problems from the OEIS
Wieb Bosma et al. “Using Walnut to solve problems from the OEIS”. In: J. Integer Seq.28.1 (2025), p. 25.3.8.url: https://cs.uwaterloo.ca/ journals/JIS/VOL28/Fokkink/fokkink9.html
work page 2025
-
[9]
Model-Based Testing of Reactive Systems
Manfred Broy et al. Model-Based Testing of Reactive Systems. Vol. 3472. Lecture Notes in Computer Science. Secaucus, NJ, USA: Springer-Verlag New York, Inc., 2005.doi: 10.1007/b137241
-
[10]
Canonical regular expressions and minimal state graphs for definite events
Janusz A. Brzozowski. “Canonical regular expressions and minimal state graphs for definite events”. In:Proc. Symposium of Mathematical Theory of Automata. 1962, pp. 529–561
work page 1962
-
[11]
Data Structures for Finite Downsets of Natural Vectors: Theory and Practice
Michaël Cadilhac et al. “Data Structures for Finite Downsets of Natural Vectors: Theory and Practice”. In:Automated Technology for Verification and Analysis - 23rd International Symposium, ATVA 2025, Bengaluru, In- dia, October 27-31, 2025, Proceedings. Ed. by Meenakshi D’Souza, Ragha- vanKomondoor,andB.Srivathsan.Vol.16145.LectureNotesinComputer Science....
-
[13]
Efficient reduction of nondeter- ministic automata with application to language inclusion testing
Lorenzo Clemente and Richard Mayr. “Efficient reduction of nondeter- ministic automata with application to language inclusion testing”. In:Log. Methods Comput. Sci.15.1 (2019). doi: 10.23638/LMCS-15(1:12)2019
-
[14]
Properties of a ternary infinite word
James D. Currie et al. “Properties of a ternary infinite word”. In:RAIRO Theor. Informatics Appl.57 (2023), p. 1.doi: 10.1051/ITA/2022010
-
[15]
AntichainAlgorithmsforFinite Automata
LaurentDoyenandJean-FrançoisRaskin.“AntichainAlgorithmsforFinite Automata”. In:Tools and Algorithms for the Construction and Analysis of Systems, 16th International Conference, TACAS 2010, Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2010, Paphos, Cyprus, March 20-28, 2010. Proceedings. Ed. by Javier Es- parza a...
-
[16]
Properties of a class of Toeplitz words
Gabriele Fici and Jeffrey O. Shallit. “Properties of a class of Toeplitz words”. In:Theor. Comput. Sci. 922 (2022), pp. 1–12.doi: 10 . 1016 / J.TCS.2022.04.006
work page 2022
-
[17]
On Equivalence Checking of Nondeterministic Finite Au- tomata
Chen Fu et al. “On Equivalence Checking of Nondeterministic Finite Au- tomata”. In:Dependable Software Engineering. Theories, Tools, and Appli- cations - Third International Symposium, SETTA 2017, Changsha, China, October 23-25, 2017, Proceedings. Ed. by Kim Guldstrand Larsen, Oleg Sokolsky, and Ji Wang. Vol. 10606. Lecture Notes in Computer Science. Spri...
-
[18]
The simplest binary word with only three squares
Daniel Gabric and Jeffrey O. Shallit. “The simplest binary word with only three squares”. In:RAIRO Theor. Informatics Appl.55 (2021), pp. 1–7. doi: 10.1051/ITA/2021001
-
[19]
Equivalence Checking 40 Years After: A Review of Bisimulation Tools
Hubert Garavel and Frédéric Lang. “Equivalence Checking 40 Years After: A Review of Bisimulation Tools”. In: Lecture Notes in Computer Science 13560 (2022). Ed. by Nils Jansen, Mariëlle Stoelinga, and Petra van den Bos, pp. 213–265.doi: 10.1007/978-3-031-15629-8_13
-
[20]
Symbolic Model Checking for Real-Time Systems
Rob J. van Glabbeek. “The Linear Time-Branching Time Spectrum (Ex- tended Abstract)”. In:CONCUR ’90, Theories of Concurrency: Unifica- tion and Extension, Amsterdam, The Netherlands, August 27-30, 1990, Proceedings. Ed. by Jos C. M. Baeten and Jan Willem Klop. Vol. 458. Lecture Notes in Computer Science. Springer, 1990, pp. 278–297.doi: 10.1007/BFB0039066
-
[21]
Five Determinisation Algorithms
Rob J. van Glabbeek and Bas Ploeger. “Five Determinisation Algorithms”. In:Implementation and Applications of Automata, 13th International Con- ference, CIAA 2008, San Francisco, California, USA, July 21-24, 2008. Proceedings. Ed. by Oscar H. Ibarra and Bala Ravikumar. Vol. 5148. Lec- ture Notes in Computer Science. Springer, 2008, pp. 161–170.doi: 10 . 1...
work page 2008
-
[22]
Descriptional and computational co mplexity of finite automata—A survey
Markus Holzer and Martin Kutrib. “Descriptional and computational com- plexity of finite automata - A survey”. In:Inf. Comput. 209.3 (2011), pp. 456–470.doi: 10.1016/J.IC.2010.11.013
-
[23]
An n log n algorithm for minimizing states in a finite automaton
John Hopcroft. “An n log n algorithm for minimizing states in a finite automaton”. In:Theory of machines and computations. Elsevier, 1971, pp. 189–196
work page 1971
-
[24]
Hopcroft, Rajeev Motwani, and Jeffrey D
John E. Hopcroft, Rajeev Motwani, and Jeffrey D. Ullman.Introduction to automata theory, languages, and computation, 3rd Edition. 2007
work page 2007
-
[25]
Principles and Methods of Testing Finite State Machines - a Survey
David Lee and Mihalis Yannakakis. “Principles and Methods of Testing Finite State Machines - a Survey”. In:Proc. IEEE84.8 (1996), pp. 1090–
work page 1996
-
[26]
doi: 10.1109/5.533956
-
[27]
Designing algorithms for big graph datasets : a study of computing bisimulation and joins
Yongming Luo. “Designing algorithms for big graph datasets : a study of computing bisimulation and joins”. PhD thesis. Technische Universiteit Eindhoven, Netherlands, 2015.isbn: 978-90-386-3801-0
work page 2015
-
[28]
PseudoperiodicWordsandaQuestionofShevelev
JosephMeleshkoetal.“PseudoperiodicWordsandaQuestionofShevelev”. In:Discret. Math. Theor. Comput. Sci.25.2(2023). doi: 10.46298/DMTCS. 9919
-
[29]
Automatic proofs in combinatorial game theory
Bastien Mignoty et al. “Automatic proofs in combinatorial game theory”
-
[30]
url: https://hdl.handle.net/2268/323845
-
[31]
Robin Milner. Communication and concurrency. PHI Series in computer science. Prentice Hall, 1989.isbn: 978-0-13-115007-2
work page 1989
-
[32]
Scalable Funding of Bitcoin Micropayment Channel Networks
Hamoon Mousavi and Jeffrey O. Shallit. “Mechanical Proofs of Properties of the Tribonacci Word”. In:Combinatorics on Words - 10th International Conference, WORDS 2015, Kiel, Germany, September 14-17, 2015, Pro- ceedings. Ed. by Florin Manea and Dirk Nowotka. Vol. 9304. Lecture Notes in Computer Science. Springer, 2015, pp. 170–190.doi: 10.1007/978-3- 319-...
-
[33]
OTF GitHub page. Online. Accessed: Jan 12, 2026.url: https://github. com/jn1z/OTF
work page 2026
-
[34]
Finite Automata and Their Decision Problems
Michael O. Rabin and Dana S. Scott. “Finite Automata and Their Decision Problems”. In:IBM J. Res. Dev.3.2 (1959), pp. 114–125.doi: 10.1147/ RD.32.0114
work page 1959
-
[35]
A New Efficient Simulation Equivalence Algorithm
Francesco Ranzato and Francesco Tapparo. “A New Efficient Simulation Equivalence Algorithm”. In:22nd IEEE Symposium on Logic in Computer Science (LICS 2007), 10-12 July 2007, Wroclaw, Poland, Proceedings. IEEE Computer Society, 2007, pp. 171–180.doi: 10.1109/LICS.2007.8
-
[36]
Thefirst-ordertheoryofbinaryoverlap- free words is decidable
LukeSchaefferandJeffreyShallit.“Thefirst-ordertheoryofbinaryoverlap- free words is decidable”. In:Canadian Journal of Mathematics76.4 (2024), pp. 1144–1162.doi: 10.4153/S0008414X23000342
-
[37]
Adventures with an Automatic Prover
Jeffrey Shallit. Adventures with an Automatic Prover. Online. Accessed: April 23, 2025. 2025.url: https://cs.uwaterloo.ca/~shallit/Talks/ simons3.pdf
work page 2025
-
[38]
Developing Walnut Commands for Sequence Properties.On- line
JeffreyShallit. Developing Walnut Commands for Sequence Properties.On- line. Accessed: March 23, 2025. 2024.url: https://cs.uwaterloo.ca/ ~shallit/Talks/walnut-properties.pdf
work page 2025
-
[39]
The Logical Approach to Automatic Sequences
Jeffrey Shallit. The Logical Approach to Automatic Sequences. Vol. 482. Cambridge University Press, 2022
work page 2022
-
[40]
The Walnut Tutorial: Using A Tool for Doing Combina- torics on Words
Jeffrey Shallit. The Walnut Tutorial: Using A Tool for Doing Combina- torics on Words. Online. Accessed: March 23, 2025. 2021.url: https : //cs.uwaterloo.ca/~shallit/Talks/walnut-unpaused.pdf
work page 2025
-
[41]
Walnut Papers and Books (plus addenda).Online.Accessed: April 23, 2025
JeffreyShallit. Walnut Papers and Books (plus addenda).Online.Accessed: April 23, 2025. 2025. url: https : / / cs . uwaterloo . ca / ~shallit / walnut-papers.html
work page 2025
-
[42]
The Narayana Morphism and Related Words
Jeffrey O. Shallit. “The Narayana Morphism and Related Words”. In: CoRR abs/2503.01026 (2025). doi: 10.48550/ARXIV.2503.01026. arXiv: 2503.01026
-
[43]
Automatic sequences in negative bases and proofs of some conjectures of Shevelev
Jeffrey O. Shallit, Sonja Linghui Shan, and Kai-Hsiang Yang. “Automatic sequences in negative bases and proofs of some conjectures of Shevelev”. In: RAIRO Theor. Informatics Appl.57 (2023), p. 4.doi: 10.1051/ITA/ 2022011
-
[44]
Power-free com- plementary binary morphisms
Jeffrey O. Shallit, Arseny M. Shur, and Stefan Zorcic. “Power-free com- plementary binary morphisms”. In:J. Comb. Theory, Ser. A207 (2024), p. 105910. doi: 10.1016/J.JCTA.2024.105910
-
[45]
Circular critical exponents for Thue- Morse factors
Jeffrey O. Shallit and Ramin Zarifi. “Circular critical exponents for Thue- Morse factors”. In:RAIRO Theor. Informatics Appl.53.1-2 (2019), pp. 37–
work page 2019
-
[46]
doi: 10.1051/ITA/2018008
-
[47]
Introduction to the theory of computation
Michael Sipser. Introduction to the theory of computation. PWS Publishing Company, 1997. isbn: 978-0-534-94728-6
work page 1997
-
[48]
Vlad Slavici et al. “Finding the Minimal DFA of Very Large Finite State Automata with an Application to Token Passing Networks”. In:CoRR abs/1103.5736 (2011). arXiv:1103.5736
work page internal anchor Pith review Pith/arXiv arXiv 2011
-
[49]
Experimentalevaluationofexplicitandsymbolicautomata- theoretic algorithms
DeianTabakov.“Experimentalevaluationofexplicitandsymbolicautomata- theoretic algorithms”. Master’s thesis. Rice University, USA, 2006.url: https://hdl.handle.net/1911/17923
work page 2006
-
[50]
Experimental Evaluation of Classi- cal Automata Constructions
Deian Tabakov and Moshe Y. Vardi. “Experimental Evaluation of Classi- cal Automata Constructions”. In:Logic for Programming, Artificial Intel- ligence, and Reasoning, 12th International Conference, LPAR 2005, Mon- tego Bay, Jamaica, December 2-6, 2005, Proceedings. Ed. by Geoff Sut- cliffe and Andrei Voronkov. Vol. 3835. Lecture Notes in Computer Science....
-
[51]
Fast and exact analysis for LRU caches
Valentin Touzeau et al. “Fast and exact analysis for LRU caches”. In:Proc. ACM Program. Lang.3.POPL (2019), 54:1–54:29.doi: 10.1145/3290367
-
[52]
A taxonomy of model-based testing approaches
Mark Utting, Alexander Pretschner, and Bruno Legeard. “A taxonomy of model-based testing approaches”. In:Softw. Test. Verification Reliab.22.5 (2012), pp. 297–312.doi: 10.1002/STVR.456
-
[53]
Simple Bisimilarity Minimization inO(m log n) Time
Antti Valmari. “Simple Bisimilarity Minimization inO(m log n) Time”. In: Fundam. Informaticae105.3 (2010), pp. 319–339.doi: 10.3233/FI-2010- 369
-
[54]
Walnut GitHub page. Online. Accessed: Jan 12, 2026. url: https : / / github.com/Walnut-Theorem-Prover
work page 2026
-
[55]
Antichains: A New Algorithm for Checking Univer- sality of Finite Automata
Martin De Wulf et al. “Antichains: A New Algorithm for Checking Univer- sality of Finite Automata”. In:Computer Aided Verification, 18th Inter- national Conference, CAV 2006, Seattle, WA, USA, August 17-20, 2006, Proceedings. Ed. by Thomas Ball and Robert B. Jones. Vol. 4144. Lecture Notes in Computer Science. Springer, 2006, pp. 17–30.doi: 10 . 1007 / 11817963_5
work page 2006
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.