pith. sign in

arxiv: 2602.21073 · v2 · submitted 2026-02-24 · 💻 cs.FL

Automata Learning with an Incomplete but Inductive Teacher (Technical Report)

Pith reviewed 2026-05-15 19:36 UTC · model grok-4.3

classification 💻 cs.FL
keywords active automata learningMAT teacherinductive teacherSAT encodingregular language separationregular model checkinginvariant synthesiscounterexample handling
0
0 comments X

The pith

IdMAT teacher formalism allows active automata learning to find any regular language in a target class using incomplete inductive answers.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces IdMAT, a teacher that answers membership and equivalence queries consistently with respect to any language in a given class of regular languages, supplying either incomplete don't-know responses or inductive facts such as if a word is accepted then another must be. It pairs this with LIndA, an algorithm that encodes every uncertainty into one SAT instance instead of branching on possibilities and solves it incrementally while handling both ordinary and inductive counterexamples. The approach targets applications like regular language separation and inductive invariant synthesis in regular model checking where many correct automata may exist. A sympathetic reader would care because the single-instance encoding removes the exponential cost of forking on incomplete information while still producing a correct output automaton for the class.

Core claim

The authors introduce IdMAT as a teacher formalism that can provide incomplete answers or inductive facts consistent with any language in the target class, and pair it with the LIndA algorithm that reduces the learning problem to solving a unique SAT instance using incremental solving and UNSAT core analysis, handling both simple and inductive counterexamples efficiently.

What carries the argument

IdMAT, a teacher formalism supplying incomplete or inductive answers consistent with any target-class language, and LIndA, the algorithm that encodes all uncertainties into a single SAT instance without forking.

If this is right

  • Any regular language in the target class can be found without fully characterizing the language.
  • All uncertainties are encoded in one SAT instance, avoiding exponential forking.
  • Incremental SAT solving and UNSAT core analysis improve efficiency over repeated independent solves.
  • Simple and inductive counterexamples are processed frugally using a Rivest-Schapire-inspired refinement.
  • The method applies directly to regular language separation and to inductive invariant synthesis in regular model checking.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If inductive facts turn out to be dense in practice, the number of membership queries needed may drop below that of a standard MAT teacher.
  • The single-SAT encoding could be reused for other learning settings that admit multiple consistent models.
  • Tractability of the SAT instances will determine whether the approach scales to larger alphabets or longer words in real verification tasks.
  • Pairing the method with existing query-minimization heuristics could further reduce total learning cost.

Load-bearing premise

The inductive facts supplied by the teacher are sufficiently dense and the resulting SAT instances remain tractable for regular-language separation and regular model checking.

What would settle it

An experiment on a regular language separation instance in which LIndA either fails to return a correct separator when one exists or requires more SAT solving time than a forking baseline on the same input.

Figures

Figures reproduced from arXiv: 2602.21073 by Adrien Pommellet, Daniel Stan, Juliette Jacquot.

Figure 1
Figure 1. Figure 1: b compares the runtime of both RS variants with a RS-free execution of L IndA and shows an overall improvement [PITH_FULL_IMAGE:figures/full_fig_p015_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: The encoding CP,S of the observation table. These PFs can be trivially converted to CNF without exponential blow-up, resulting in O(|P| 3 |Σ|+|P| 2 |Σ| 2 |S| 2 ) clauses and O(|P| 2 |Σ|+|P||S|) variables. They are to be understood as follows: (basisε) Word ε always belongs to the basis. (basispa) The basis must be prefix-closed. (reachpa) If p · a belongs to the basis, it is the successor of state p by let… view at source ↗
read the original abstract

Active automata learning (AAL) under a Minimally Adequate Teacher (MAT) has been successfully used to infer a regular language through membership and equivalence queries. This language might not be fully characterized: we are then interested in finding any solution in a target class of possibly many regular languages. Some problems such as regular language separation or inductive invariant synthesis in the context of regular model checking (RMC) may indeed admit more than one answer. We therefore introduce IdMAT: a new teacher formalism answering queries with respect to any language in the target class, all at once. Such a teacher tailored towards invariant synthesis might provide incomplete "don't know" answers, but also inductive facts of the form "if w1 is accepted, so is w2". We pair IdMAT with a novel AAL algorithm LIndA that 1. encodes all uncertainties as a unique SAT instance and does not fork, 2. leverages incremental SAT solving and UNSAT core analysis, and 3. handles counterexamples of the simple or inductive type in a frugal manner inspired by the Rivest-Schapire refinement technique. We finally evaluate a prototype implementation in the context of regular language separation and RMC.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces IdMAT, a new teacher formalism for active automata learning that answers queries with respect to any language in a target class (possibly many regular languages), providing incomplete 'don't know' answers along with inductive facts of the form 'if w1 is accepted then w2 is'. It pairs this with the LIndA algorithm, which encodes all uncertainties (including inductive implications) into a single SAT instance without forking, leverages incremental SAT solving and UNSAT-core analysis, and handles both simple and inductive counterexamples in a frugal way inspired by Rivest-Schapire. The approach is evaluated via a prototype on regular language separation and regular model checking (RMC) problems.

Significance. If the central claims on correctness and tractability hold, the work could enable more scalable automata learning for under-specified problems such as invariant synthesis in RMC and language separation, where multiple solutions exist. The single-SAT encoding plus incremental solving is a technically interesting departure from standard forking-based AAL, and the inductive-teacher model directly addresses a practical gap in incomplete teachers. Strengths include the explicit use of standard SAT primitives and the frugal counterexample handling; these could be reusable if the encoding remains compact.

major comments (3)
  1. [§3] §3 (LIndA algorithm description): The headline claim that all uncertainties and inductive facts are captured in one SAT instance without forking is load-bearing for the efficiency argument, yet the manuscript supplies neither a formal definition of the encoding nor a size bound on the resulting CNF (or on the number of inductive facts needed per query). Without this, it is impossible to verify that the approach avoids the exponential cost it claims to sidestep when inductive facts are sparse.
  2. [Evaluation] Evaluation section: The abstract states that a prototype was evaluated on regular language separation and RMC, but no quantitative results (query counts, SAT instance sizes, runtimes, or comparison against forking baselines) are reported. This absence makes it impossible to assess whether the incremental-SAT + UNSAT-core strategy remains practical for the target problems.
  3. [§4] §4 (correctness argument): No proof or even sketch is supplied that LIndA correctly learns a language consistent with the IdMAT answers (including inductive implications). The absence of a correctness argument is a load-bearing gap for any claim that the single-instance encoding preserves soundness.
minor comments (2)
  1. [Abstract] The abstract introduces the acronyms IdMAT and LIndA without first spelling them out; this should be corrected for readability.
  2. [Introduction] Notation for inductive facts (e.g., the 'if w1 accepted then w2' implication) is used without a compact formal syntax; a small definition box would help.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments on our manuscript. We address each major comment below and indicate the revisions planned for the next version.

read point-by-point responses
  1. Referee: [§3] §3 (LIndA algorithm description): The headline claim that all uncertainties and inductive facts are captured in one SAT instance without forking is load-bearing for the efficiency argument, yet the manuscript supplies neither a formal definition of the encoding nor a size bound on the resulting CNF (or on the number of inductive facts needed per query). Without this, it is impossible to verify that the approach avoids the exponential cost it claims to sidestep when inductive facts are sparse.

    Authors: We agree that a formal definition of the encoding and an accompanying size analysis are required to substantiate the efficiency claims. In the revised manuscript we will supply a precise definition of the CNF encoding that captures both membership uncertainties and inductive implications in a single instance. We will also prove a polynomial bound on the number of clauses and variables in terms of the number of queries and inductive facts, confirming that the encoding remains compact when inductive facts are sparse. revision: yes

  2. Referee: [Evaluation] Evaluation section: The abstract states that a prototype was evaluated on regular language separation and RMC, but no quantitative results (query counts, SAT instance sizes, runtimes, or comparison against forking baselines) are reported. This absence makes it impossible to assess whether the incremental-SAT + UNSAT-core strategy remains practical for the target problems.

    Authors: The prototype was run on the cited benchmarks, but the detailed metrics were not included in the submitted draft. We will expand the evaluation section with tables reporting query counts, SAT instance sizes, runtimes, and direct comparisons against forking-based baselines for both language-separation and RMC instances, thereby allowing assessment of practicality. revision: yes

  3. Referee: [§4] §4 (correctness argument): No proof or even sketch is supplied that LIndA correctly learns a language consistent with the IdMAT answers (including inductive implications). The absence of a correctness argument is a load-bearing gap for any claim that the single-instance encoding preserves soundness.

    Authors: We will add a correctness section containing a proof sketch. The argument will show that the single SAT encoding is sound with respect to the IdMAT answers and inductive facts, that every hypothesis produced is consistent with the accumulated information, and that the algorithm terminates with a regular language satisfying all constraints. revision: yes

Circularity Check

0 steps flagged

No circularity: algorithmic construction rests on standard SAT and automata primitives

full rationale

The paper introduces IdMAT (a teacher formalism) and LIndA (an AAL algorithm) as explicit constructions that encode uncertainties into a single SAT instance, leverage incremental solving, and handle counterexamples via a Rivest-Schapire-inspired refinement. No derivation step reduces by definition to its own inputs, no fitted parameter is relabeled as a prediction, and no load-bearing claim depends on a self-citation chain whose validity is presupposed. The single-SAT encoding is presented as a direct algorithmic choice whose tractability is an empirical question for the target domains rather than a self-referential assumption. The derivation chain is therefore self-contained against external benchmarks of SAT solvers and automata learning.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The approach rests on standard automata-theory and SAT-solving assumptions; no free parameters, invented entities, or ad-hoc axioms are introduced beyond the new teacher interface itself.

axioms (1)
  • standard math Membership and equivalence queries can be encoded as Boolean constraints whose satisfiability corresponds to existence of a consistent regular language.
    The SAT encoding step presupposes that the language-learning problem is faithfully captured by propositional constraints.

pith-pipeline@v0.9.0 · 5509 in / 1348 out tokens · 51026 ms · 2026-05-15T19:36:54.557243+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

34 extracted references · 34 canonical work pages · 1 internal anchor

  1. [1]

    Abdulla, P.A.: Regular model checking. Int. J. Softw. Tools Technol. Transf.14(2), 109–118 (2012). https://doi.org/10.1007/S10009-011-0216-8, https://doi.org/10. 1007/s10009-011-0216-8

  2. [2]

    Abdulla, P.A., Haziza, F., Holík, L.: Parameterized verification through view ab- straction. Int. J. Softw. Tools Technol. Transf.18(5), 495–516 (2016). https://doi. org/10.1007/S10009-015-0406-X, https://doi.org/10.1007/s10009-015-0406-x

  3. [3]

    Angluin, D.: Learning regular sets from queries and counterexamples. Inf. Comput. 75(2), 87–106 (1987). https://doi.org/10.1016/0890-5401(87)90052-6, https://doi. org/10.1016/0890-5401(87)90052-6 Automata Learning with an Incomplete but Inductive Teacher 17

  4. [4]

    (eds.): Proceedings of SAT Competition 2023: Solver, Benchmark and Proof Checker Descriptions

    Balyo, T., Heule, M., Iser, M., Järvisalo, M., Suda, M. (eds.): Proceedings of SAT Competition 2023: Solver, Benchmark and Proof Checker Descriptions. Depart- ment of Computer Science Series of Publications B, Department of Computer Sci- ence, University of Helsinki, Finland (2023)

  5. [5]

    In: Gurfinkel, A., Ganesh, V

    Biere, A., Faller, T., Fazekas, K., Fleury, M., Froleyks, N., Pollitt, F.: Cadical 2.0. In: Gurfinkel, A., Ganesh, V. (eds.) Computer Aided Verification. pp. 133–152. Springer Nature Switzerland, Cham (2024)

  6. [6]

    In: Kowalewski, S., Philippou, A

    Chen, Y.F., Farzan, A., Clarke, E.M., Tsay, Y.K., Wang, B.Y.: Learning minimal separating dfa’s for compositional verification. In: Kowalewski, S., Philippou, A. (eds.) Tools and Algorithms for the Construction and Analysis of Systems. pp. 31–45. Springer Berlin Heidelberg, Berlin, Heidelberg (2009)

  7. [7]

    In: Stewart, D., Weissenbacher, G

    Chen, Y., Hong, C., Lin, A.W., Rümmer, P.: Learning to prove safety over param- eterised concurrent systems. In: Stewart, D., Weissenbacher, G. (eds.) 2017 Formal Methods in Computer Aided Design, FMCAD 2017, Vienna, Austria, October 2- 6, 2017. pp. 76–83. IEEE (2017). https://doi.org/10.23919/FMCAD.2017.8102244, https://doi.org/10.23919/FMCAD.2017.8102244

  8. [8]

    In: Majumdar, R., Silva, A

    Czerner, P., Esparza, J., Krasotin, V., Welzel-Mohr, C.: Computing Inductive In- variants of Regular Abstraction Frameworks. In: Majumdar, R., Silva, A. (eds.) 35th International Conference on Concurrency Theory (CONCUR 2024). Leibniz International Proceedings in Informatics (LIPIcs), vol. 311, pp. 19:1–19:18. Schloss Dagstuhl – Leibniz-Zentrum für Inform...

  9. [9]

    Esparza, J., Raskin, M., Welzel-Mohr, C.: Regular model checking upside-down: An invariant-based approach. Log. Methods Comput. Sci.21(1), 4 (2025). https:// doi.org/10.46298/LMCS-21(1:4)2025, https://doi.org/10.46298/lmcs-21(1:4)2025

  10. [10]

    In: Klin, B., Lasota, S., Muscholl, A

    Esparza, J., Raskin, M.A., Welzel, C.: Regular model checking upside-down: An invariant-based approach. In: Klin, B., Lasota, S., Muscholl, A. (eds.) 33rd In- ternational Conference on Concurrency Theory, CONCUR 2022, September 12-16, 2022, Warsaw, Poland. LIPIcs, vol. 243, pp. 23:1–23:19. Schloss Dagstuhl - Leibniz- Zentrum für Informatik (2022). https:/...

  11. [11]

    Journal of the ACM39(3), 675–735 (Jul 1992)

    German, S.M., Sistla, A.P.: Reasoning about systems with many processes. Journal of the ACM39(3), 675–735 (Jul 1992)

  12. [12]

    In: Furbach, U., Shankar, N

    Grinchtein, O., Leucker, M., Piterman, N.: Inferring network invariants automat- ically. In: Furbach, U., Shankar, N. (eds.) Automated Reasoning. pp. 483–497. Springer Berlin Heidelberg, Berlin, Heidelberg (2006)

  13. [13]

    In: Jansen, N., Stoelinga, M., van den Bos, P

    Howar, F., Steffen, B.: Active automata learning as black-box search and lazy par- tition refinement. In: Jansen, N., Stoelinga, M., van den Bos, P. (eds.) A Journey from Process Algebra via Timed Automata to Model Learning - Essays Dedicated to Frits Vaandrager on the Occasion of His 60th Birthday. Lecture Notes in Com- puter Science, vol. 13560, pp. 321...

  14. [14]

    Ignatiev, A., Morgado, A., Marques-Silva, J.: PySAT: A Python toolkit for pro- totyping with SAT oracles. In: SAT. pp. 428–437 (2018). https://doi.org/10.1007/ 978-3-319-94144-8_26, https://doi.org/10.1007/978-3-319-94144-8_26

  15. [15]

    Ignatiev, A., Tan, Z.L., Karamanos, C.: Towards universally accessible SAT tech- nology. In: SAT. pp. 4:1–4:11 (2024). https://doi.org/10.4230/LIPICS.SAT.2024. 16, https://doi.org/10.4230/LIPIcs.SAT.2024.16 18 D. Stan, A. Pommellet, J. Jacquot

  16. [16]

    In: Graf, S., Schwartzbach, M

    Jonsson, B., Nilsson, M.: Transitive closures of regular relations for verifying infinite-state systems. In: Graf, S., Schwartzbach, M. (eds.) Tools and Algorithms for the Construction and Analysis of Systems. pp. 220–235. Springer Berlin Hei- delberg, Berlin, Heidelberg (2000)

  17. [17]

    In: Margaria, T., Steffen, B

    Leucker, M., Neider, D.: Learning minimal deterministic automata from inexpe- rienced teachers. In: Margaria, T., Steffen, B. (eds.) Leveraging Applications of Formal Methods, Verification and Validation. Technologies for Mastering Change. pp. 524–538. Springer Berlin Heidelberg, Berlin, Heidelberg (2012)

  18. [18]

    Liffiton, M.H., Sakallah, K.A.: Algorithms for computing minimal unsatisfiable subsets of constraints. J. Autom. Reason.40(1), 1–33 (2008). https://doi.org/10. 1007/S10817-007-9084-Z, https://doi.org/10.1007/s10817-007-9084-z

  19. [19]

    In: ECOOP

    Moeller, M., Wiener, T., Solko-Breslin, A., Koch, C., Foster, N., Silva, A.: Au- tomata learning with an incomplete teacher. In: Ali, K., Salvaneschi, G. (eds.) 37th European Conference on Object-Oriented Programming, ECOOP 2023, July 17-21, 2023, Seattle, Washington, United States. LIPIcs, vol. 263, pp. 21:1–21:30. Schloss Dagstuhl - Leibniz-Zentrum für ...

  20. [20]

    Dagstuhl Artifacts Se- ries9(2), 21:1–21:3 (2023)

    Moeller, M., Wiener, T., Solko-Breslin, A., Koch, C., Foster, N., Silva, A.: Au- tomata Learning with an Incomplete Teacher (Artifact). Dagstuhl Artifacts Se- ries9(2), 21:1–21:3 (2023). https://doi.org/10.4230/DARTS.9.2.21, https://drops. dagstuhl.de/entities/document/10.4230/DARTS.9.2.21

  21. [21]

    Neider, D.: Applications of automata learning in verification and synthesis. Ph.D. thesis, RWTH Aachen University (2014), http://darwin.bth.rwth-aachen. de/opus3/volltexte/2014/5169

  22. [22]

    In: Brat, G., Rungta, N., Venet, A

    Neider, D., Jansen, N.: Regular model checking using solver technologies and automata learning. In: Brat, G., Rungta, N., Venet, A. (eds.) NASA Formal Methods, 5th International Symposium, NFM 2013, Moffett Field, CA, USA, May 14-16, 2013. Proceedings. Lecture Notes in Computer Science, vol. 7871, pp. 16–31. Springer (2013). https://doi.org/10.1007/978-3-...

  23. [23]

    Oliveira, A.L., Silva, J.P.M.: Efficient algorithms for the inference of mini- mum size dfas. Mach. Learn.44(1/2), 93–119 (2001). https://doi.org/10.1023/A: 1010828029885, https://doi.org/10.1023/A:1010828029885

  24. [24]

    Rivest, R.L., Schapire, R.E.: Inference of finite automata using homing sequences. Inf. Comput.103(2), 299–347 (1993). https://doi.org/10.1006/INCO.1993.1021, https://doi.org/10.1006/inco.1993.1021

  25. [25]

    Automata Learning with an Incomplete but Inductive Teacher (Technical Report)

    Stan, D., Pommellet, A., Jacquot, J.: Automata learning with an incomplete but inductive teacher. CoRRabs/2602.21073(2026). https://doi.org/10.48550/ ARXIV.2602.21073, https://doi.org/10.48550/arXiv.2602.21073

  26. [26]

    Steffen, B., Howar, F., Merten, M.: Introduction to Active Automata Learning from a Practical Perspective, pp. 256–296. Springer Berlin Heidelberg, Berlin, Hei- delberg (2011). https://doi.org/10.1007/978-3-642-21455-4_8, https://doi.org/10. 1007/978-3-642-21455-4_8

  27. [27]

    In: Mayr, R

    Touili, T.: Regular model checking using widening techniques. In: Mayr, R. (ed.) Verification of Parameterized Systems, VEPAS 2001, Satellite Workshop of ICALP 2001, Crete, Greece, July 13, 2001. Electronic Notes in Theoretical Computer Science, vol. 50, pp. 342–356. Elsevier (2001). https://doi.org/10.1016/ S1571-0661(04)00187-2, https://doi.org/10.1016/...

  28. [28]

    In: Fisman, D., Rosu, G

    Vaandrager, F., Garhewal, B., Rot, J., Wißmann, T.: A new approach for active automata learning based on apartness. In: Fisman, D., Rosu, G. (eds.) Tools and Automata Learning with an Incomplete but Inductive Teacher 19 Algorithms for the Construction and Analysis of Systems. pp. 223–243. Springer International Publishing, Cham (2022)

  29. [29]

    In: Davies, J., Schulte, W., Barnett, M

    Vardhan, A., Sen, K., Viswanathan, M., Agha, G.: Learning to verify safety prop- erties. In: Davies, J., Schulte, W., Barnett, M. (eds.) Formal Methods and Software Engineering. pp. 274–289. Springer Berlin Heidelberg, Berlin, Heidelberg (2004)

  30. [30]

    yes” or “no

    Yaacov, T., Weiss, G., Amram, G., Hayoun, A.: Automata models for effective bug pattern description. In: 28th ACM/IEEE International Conference on Model Driven Engineering Languages and Systems, MODELS 2025, Grand Rapids, MI, USA, October 5-10, 2025. pp. 119–129. IEEE (2025). https://doi.org/10.1109/ MODELS67397.2025.00017, https://doi.org/10.1109/MODELS6...

  31. [31]

    ifMEM(c w1,i),MEM(c w2,j)∈ {0,1}, thenα w(i, j) =MEM(c w1,i)→MEM(c w2,j),

  32. [32]

    else if(c w1,i, cw2,j)∈MEM(c w1,i,{c w2,j}), thenα w(i, j) = 1,

  33. [33]

    else ifm[xcw1 ,i ],m[x cw2 ,j ]are defined, thenαw(i, j) =m[x cw1 ,i ]→m[x cw2 ,j ],

  34. [34]

    Property 2.α w(0,0) = 1andα w(|w1|,|w 2|) = 0

    elseα w(i, j) =□. Property 2.α w(0,0) = 1andα w(|w1|,|w 2|) = 0. Let⪯be the partial order relation on[0..|w 1|]×[0..|w 2|]such that (i, j)⪯(i ′, j′)↔(i≤i ′)∧(j≤j ′). We considerrectanglesinstead of intervals: Definition 12.ABreaking Rectangle(BR) is a rectangle[i .. i ′]×[j .. j ′]such that(i, j),(i ′, j′)∈[0 :|w 1|]×[0 :|w 2|],(i, j)≺(i ′, j′),{α w(i, j)...