pith. machine review for the scientific record.

arxiv: 2605.07710 · v1 · submitted 2026-05-08 · 💻 cs.FL

Recognition: 2 theorem links

· Lean Theorem

Learning Tree Automata with Term Rewriting

Jakub Kopystiański, Jan Otop

Authors on Pith: no claims yet

Pith reviewed 2026-05-11 02:28 UTC · model grok-4.3

classification 💻 cs.FL
keywords tree automata learning · term rewriting · deductive inference · query complexity · Angluin algorithm · tree languages · automata learning · rewriting systems

The pith

Supplying a term rewriting system allows the learning algorithm to deduce answers to some queries and thereby reduce the total number required to identify a tree automaton.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper extends the standard query-based algorithm for learning tree automata by letting the user supply a term rewriting system that encodes properties of the target language. The system is used during learning to infer the answers to certain membership or equivalence queries deductively rather than by direct oracle calls. This yields a measurable drop in query count for languages whose properties, such as commutativity of subtrees, can be expressed by rewriting rules. A reader would care because high query counts have long limited the practical size of automata that can be learned from examples.

Core claim

The central claim is that an Angluin-style learning procedure for tree automata can be augmented with deductive inference: a user-provided term rewriting system that is sound and consistent with the target language is used to answer some queries without oracle interaction, producing a correct automaton while lowering overall query complexity.

What carries the argument

A term rewriting system that encodes algebraic properties of the target tree language and is consulted to deduce correct query answers during the learning loop.
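As a concrete illustration of this mechanism, here is a minimal Python sketch (not the paper's implementation; all names are illustrative). A commutativity rule for a symbol `f` is oriented toward a canonical normal form, and membership answers are cached per normal form, so rewrite-equivalent terms cost only one oracle call:

```python
def normalize(term):
    """Rewrite a term to a canonical form: sort the children of every 'f' node.

    A term is a (symbol, children) pair; leaves have an empty tuple of
    children. Sorting the children of 'f' orients the commutativity rule
    f(x, y) -> f(y, x) toward a unique normal form.
    """
    sym, children = term
    children = tuple(normalize(c) for c in children)
    if sym == "f":
        children = tuple(sorted(children))
    return (sym, children)

class DeductiveOracle:
    """Wraps a membership oracle; answers for terms whose normal form was
    already queried are deduced, so only one oracle call is spent per
    equivalence class of the rewrite relation."""

    def __init__(self, oracle):
        self.oracle = oracle
        self.cache = {}       # normal form -> membership answer
        self.oracle_calls = 0

    def member(self, term):
        nf = normalize(term)
        if nf not in self.cache:      # genuinely new class: ask the oracle
            self.oracle_calls += 1
            self.cache[nf] = self.oracle(term)
        return self.cache[nf]         # otherwise deduced, no oracle call
```

For instance, querying `f(a, b)` and then `f(b, a)` costs a single oracle call, since both terms share the normal form `f(a, b)`.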

If this is right

  • Languages whose tree structure obeys rewrite rules such as commutativity or associativity become learnable with far fewer direct queries.
  • The learned automaton remains correct whenever the supplied rewrite system is sound and matches the language.
  • Natural data properties expressible by short rewrite systems produce the largest practical savings in query volume.
  • The same deductive mechanism can be applied to any property that can be stated as a finite set of rewrite rules over the alphabet.
  • Equivalence queries can also be answered in part by rewriting, further trimming the interaction with the oracle.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The technique could be paired with active sampling strategies to further cut queries when the rewrite system is only partially known.
  • Domains that already maintain rewrite systems, such as term databases or algebraic data types, could feed those systems directly into the learner.
  • If the rewrite system is allowed to be approximate, the algorithm might still converge faster while producing an automaton that is correct up to the stated properties.
  • The same idea of deductive query answering might transfer to learning other structures, such as string automata or graph grammars, whenever a suitable rewriting relation exists.

Load-bearing premise

The supplied term rewriting system must be sound, complete, and consistent with the unknown target language, so that every deduction it produces is correct and no relevant equivalence is missed.

What would settle it

Run the algorithm on a small tree language whose properties are captured by a simple rewrite system and check whether the number of oracle queries actually drops below that of the unaugmented algorithm; if it does not, or if the returned automaton is incorrect, the claim fails.
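This experiment can be mocked up in a few lines. In the sketch below (illustrative only, not the paper's benchmark), deduction collapses each rewrite-equivalence class to a single membership query, so the saving on a batch of queries is exactly the number of distinct terms minus the number of distinct normal forms:

```python
import itertools

def normalize(term):
    """Canonical form under commutativity of 'f': sort its children."""
    sym, children = term
    children = tuple(normalize(c) for c in children)
    return (sym, tuple(sorted(children)) if sym == "f" else children)

def query_counts(terms):
    """Return (plain, deduced): oracle calls without and with deduction."""
    plain = len(set(terms))                       # one call per distinct term
    deduced = len({normalize(t) for t in terms})  # one call per rewrite class
    return plain, deduced

leaves = [("a", ()), ("b", ()), ("c", ())]
batch = [("f", (x, y)) for x, y in itertools.product(leaves, repeat=2)]
plain, deduced = query_counts(batch)  # 9 distinct terms, 6 rewrite classes
```

Here the 9 ordered pairs under `f` fall into 6 unordered classes, so a third of the oracle calls are deduced away; a negative result would be a batch on which `deduced` fails to drop below `plain`.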

read the original abstract

We present an extension of the Angluin-style learning algorithm for tree automata that incorporates deductive inference. The learning algorithm is provided with a term rewriting system that specifies properties of the target tree language (e.g., the order of subtrees under a symbol f is irrelevant). This term rewriting system is used to infer answers to some queries, which reduces the query complexity of the learning algorithm. We present examples of rewrite systems that express natural properties of tree-structured data, which yield a significant reduction in the number of queries.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper extends the Angluin-style learning algorithm for tree automata by supplying a user-provided term rewriting system (TRS) that encodes properties of the target language (e.g., commutativity of subtrees). The TRS is used to deduce answers to selected membership queries, with the goal of lowering oracle query complexity. Examples of natural rewrite systems are presented that are claimed to yield significant query savings.

Significance. If the extension preserves correctness and the supplied TRS is sound and consistent with the target language, the technique offers a principled way to inject domain knowledge into tree-automaton learning and could reduce query cost in applications involving structured data. The approach builds directly on prior Angluin-style procedures without introducing new free parameters.

major comments (2)
  1. Abstract: the central claim of query reduction is stated at a high level but is unsupported by any correctness proof for the extended observation-table construction, any complexity analysis, or concrete query-count measurements; these are load-bearing for the contribution.
  2. Algorithm integration (high-level description): it is not shown that deductive answers from the TRS preserve the termination and equivalence invariants of the base Angluin procedure when the TRS is only assumed sound and consistent; a formal argument or invariant is required.
minor comments (1)
  1. Abstract: the weakest assumption on the TRS (soundness, completeness, consistency) should be stated explicitly rather than left implicit.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed review and constructive suggestions. We address each major comment below and will incorporate revisions to strengthen the formal aspects of the work.

read point-by-point responses
  1. Referee: Abstract: the central claim of query reduction is stated at a high level but is unsupported by any correctness proof for the extended observation-table construction, any complexity analysis, or concrete query-count measurements; these are load-bearing for the contribution.

    Authors: We agree that the abstract states the query-reduction claim at a high level. The manuscript presents the extended algorithm, shows how the TRS is used to answer selected membership queries, and includes concrete examples of natural rewrite systems (e.g., commutativity) together with illustrative query savings on small instances. However, we acknowledge that a self-contained correctness argument for the modified observation table, a complexity analysis, and tabulated query-count measurements are not provided. In the revision we will add a dedicated subsection proving that the extended table construction preserves the invariants of the base Angluin procedure (under the stated soundness and consistency assumptions on the TRS), include asymptotic bounds on the number of remaining oracle queries, and report explicit query counts for the running examples. revision: yes

  2. Referee: Algorithm integration (high-level description): it is not shown that deductive answers from the TRS preserve the termination and equivalence invariants of the base Angluin procedure when the TRS is only assumed sound and consistent; a formal argument or invariant is required.

    Authors: We appreciate this observation. The current text assumes that any answer deduced from a sound and consistent TRS is identical to the answer the target language would give, and therefore that the standard termination and equivalence arguments carry over. We will make this reasoning explicit by adding an invariant stating that every entry in the observation table (whether obtained from the oracle or deduced) remains consistent with the target language. Under this invariant the original proofs of termination and correctness of the learned automaton apply unchanged. The revised manuscript will contain a short lemma formalizing the invariant together with a proof sketch. revision: yes
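The invariant the rebuttal proposes can be stated as a one-line check. In this hypothetical sketch (names are illustrative, not from the paper), each observation-table entry records its answer together with its provenance, and the invariant demands that oracle-sourced and deduced entries alike agree with the target language:

```python
def table_invariant_holds(table, target):
    """Check that every observation-table entry agrees with the target
    language, whatever its provenance.

    table:  maps a term to (answer, source), source in {"oracle", "deduced"}
    target: the (normally hidden) target language, as a predicate on terms
    """
    return all(answer == target(term)
               for term, (answer, _source) in table.items())
```

Under the soundness and consistency assumptions on the TRS, deduced entries satisfy this by construction, which is what lets the base procedure's termination and correctness proofs carry over unchanged.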

Circularity Check

0 steps flagged

No significant circularity; the extension relies on external TRS input and the prior Angluin algorithm.

full rationale

The claimed contribution is an algorithmic extension of the established Angluin-style tree-automaton learning procedure. A user-supplied term rewriting system (TRS) is provided as an external artifact that is assumed sound, complete, and consistent with the target language; this TRS is used only to answer a subset of membership queries by deduction rather than oracle calls. The base learning invariants, termination, and correctness are inherited from the prior Angluin algorithm, which is treated as independent prior work. No step in the described derivation reduces a result to a fitted parameter, renames an input as a prediction, or rests on a self-citation chain whose cited result itself depends on the present paper. The derivation chain therefore contains independent content and does not collapse by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The approach rests on the external rewriting system being a faithful model of the target language properties; no numerical parameters or new entities are introduced.

axioms (1)
  • domain assumption The term rewriting system correctly and completely captures the relevant equivalences of the target tree language.
    Deductive inference of query answers depends on this assumption.
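This axiom is refutable but not certifiable by sampling. As a hedged sketch (illustrative names, not from the paper): apply one rewrite step to sampled terms and check that membership never changes; any violation witnesses an unsound rule, while passing is only evidence of soundness, not a proof:

```python
def rewrite_commutes_with_membership(member, rewrite, samples):
    """True iff no sampled term changes membership under one rewrite step.

    member:  the membership oracle for the target language
    rewrite: a function applying one rewrite step (or the identity)
    samples: terms drawn from the domain; a single failure refutes
             soundness of the supplied rule outright
    """
    return all(member(t) == member(rewrite(t)) for t in samples)
```

For example, swapping the children of a commutative symbol passes this check against any language that ignores child order, while a rule that rewrites one leaf symbol into another fails it the moment membership depends on that symbol.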

pith-pipeline@v0.9.0 · 5370 in / 1081 out tokens · 48935 ms · 2026-05-11T02:28:20.928349+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Reference graph

Works this paper leans on

39 extracted references · 39 canonical work pages

  1. [1] A. V. Aho, M. R. Garey, and J. D. Ullman. The transitive reduction of a directed graph. SIAM Journal on Computing, 1(2):131–137, 1972.

  2. [2] Siva Anantharaman, Paliath Narendran, and Michaël Rusinowitch. Closure properties and decision problems of dag automata. Inf. Process. Lett., 94(5):231–240, 2005.

  3. [3] Dana Angluin. Learning regular sets from queries and counterexamples. Information and Computation, 75(2):87–106, 1987.

  4. [4] Franz Baader and Paliath Narendran. Unification of concept terms in description logics. In Henri Prade, editor, ECAI 1998, pages 331–335. John Wiley and Sons, 1998.

  5. [5] Franz Baader and Tobias Nipkow. Term Rewriting and All That. Cambridge University Press, 1998.

  6. [6] Benoît Barbot, Benedikt Bollig, Alain Finkel, Serge Haddad, Igor Khmelnitsky, Martin Leucker, Daniel Neider, Rajarshi Roy, and Lina Ye. Extracting context-free grammars from recurrent neural networks using tree-automata learning and A* search. In ICGI 2021, pages 113–129. PMLR, 2021.

  7. [7] Mikhail V. Berlinkov. Approximating the minimum length of synchronizing words is hard. Theory Comput. Syst., 54(2):211–223, 2014.

  8. [8] Satwik Bhattamishra, Michael Hahn, and Varun Kanade. Automata learning and identification of the support of language models.

  9. [9] Henrik Björklund, Johanna Björklund, and Petter Ericson. On the regularity and learnability of ordered DAG languages. In Arnaud Carayol and Cyril Nicaud, editors, CIAA 2017, volume 10329 of Lecture Notes in Computer Science, pages 27–39. Springer, 2017.

  10. [10] Iovka Boneva and Jean-Marc Talbot. Automata and logics for unranked and unordered trees. In Jürgen Giesl, editor, RTA 2005, volume 3467 of Lecture Notes in Computer Science, pages 500–515. Springer, 2005.

  11. [11] Ján Černý. Poznámka k homogénnym experimentom s konečnými automatmi. Matematicko-fyzikálny časopis, 14(3):208–216, 1964.

  12. [12] Witold Charatonik. Automata on dag representations of finite trees. 1999.

  13. [13] Hubert Comon, Max Dauchet, Rémi Gilleron, Florent Jacquemard, Denis Lugiez, Christof Löding, Sophie Tison, and Marc Tommasi. Tree Automata Techniques and Applications. 2008.

  14. [14] Simon Dierl, Paul Fiterau-Brostean, Falk Howar, Bengt Jonsson, Konstantinos Sagonas, and Fredrik Tåquist. Scalable tree-based register automata learning. In TACAS 2024, pages 87–108, 2024.

  15. [15] Frank Drewes and Johanna Högberg. Learning a regular tree language from a teacher. In Zoltán Ésik and Zoltán Fülöp, editors, DLT 2003, volume 2710 of Lecture Notes in Computer Science, pages 279–291. Springer, 2003.

  16. [16] Frank Drewes and Johanna Högberg. Query learning of regular tree languages: How to avoid dead states. Theory Comput. Syst., 40(2):163–185, 2007.

  17. [17] David Eppstein. Reset sequences for monotonic automata. SIAM J. Comput., 19(3):500–510, 1990.

  18. [18] Michal Fica and Jan Otop. Active automata learning with advice. In ECAI 2025, volume 413 of Frontiers in Artificial Intelligence and Applications, pages 1655–1662. IOS Press, 2025.

  19. [19] Markus Theo Frohme. Active automata learning with adaptive distinguishing sequences. CoRR, abs/1902.01139, 2019.

  20. [20] Émilie Grienenberger and Martin Ritzert. Learning definable hypotheses on trees. In Pablo Barceló and Marco Calautti, editors, ICDT 2019, volume 127 of LIPIcs, pages 24:1–24:18. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2019.

  21. [21] Amaury Habrard and José Oncina. Learning multiplicity tree automata. In Yasubumi Sakakibara, Satoshi Kobayashi, Kengo Sato, Tetsuro Nishino, and Etsuji Tomita, editors, ICGI 2006, volume 4201 of Lecture Notes in Computer Science, pages 268–280. Springer, 2006.

  22. [22] Malte Isberner, Falk Howar, and Bernhard Steffen. The TTT algorithm: A redundancy-free approach to active automata learning. In RV 2014, pages 307–322, 2014.

  23. [23] Anna Kasprzik. Four one-shot learners for regular tree languages and their polynomial characterizability. Theor. Comput. Sci., 485:85–106, 2013.

  24. [24] Jakub Kopystiański and Jan Otop. Learning tree automata with term rewriting. To appear in IJCAI 2026. ijcai.org, 2026.

  25. [25] Jakub Kopystiański and Jan Otop. Learning tree automata with term rewriting: code and data. https://github.com/jotop/LearnDFTAwithTRS, 2026.

  26. [26] Raymond Kosala, Maurice Bruynooghe, Jan Van den Bussche, and Hendrik Blockeel. Information extraction from web documents based on local unranked tree automaton inference. In IJCAI 2003, pages 403–408. Morgan Kaufmann, 2003.

  27. [27] Loes Kruger, Sebastian Junges, and Jurriaan Rot. Small test suites for active automata learning. In TACAS 2024, pages 109–129, 2024.

  28. [28] Matteo Magnini, Riccardo Squarcialupi, Martin Tunge Sterri, Ana Ozaki, and Rivera Castillo. Actively learning EL terminologies from large language models. In ECAI 2025, volume 413 of Frontiers in Artificial Intelligence and Applications, pages 1792–1799. IOS Press, 2025.

  29. [29] Ines Marusic and James Worrell. Complexity of equivalence and learning for multiplicity tree automata. Journal of Machine Learning Research, 16:2465–2500, 2015.

  30. [30] Daniele Nardi, Ronald J. Brachman, et al. An introduction to description logics. Description Logic Handbook, 1:40, 2003.

  31. [31] Dolav Nitay, Dana Fisman, and Michal Ziv-Ukelson. Learning of structurally unambiguous probabilistic grammars. In AAAI 2021, pages 9170–9178. AAAI Press, 2021.

  32. [32] Moeketsi Raselimo and Bernd Fischer. Automatic grammar repair. In Eelco Visser, Dimitris S. Kolovos, and Emma Söderberg, editors, SLE 2021, pages 126–142. ACM, 2021.

  33. [33] Catalin Ionut Tirnăucă and Cristina Tirnăucă. Learning regular tree languages from correction and equivalence queries. J. Autom. Lang. Comb., 12(4):501–524, 2007.

  34. [34] Craig A. Tovey. A simplified NP-complete satisfiability problem. Discret. Appl. Math., 8(1):85–89, 1984.

  35. [35] Frits Vaandrager. Model learning. Communications of the ACM, 60(2):86–95, 2017.

  36. [36] Frits W. Vaandrager, Bharat Garhewal, Jurriaan Rot, and Thorsten Wißmann. A new approach for active automata learning based on apartness. In TACAS 2022, pages 223–243, 2022.

  37. [37] Gerco van Heerdt, Tobias Kappé, Jurriaan Rot, and Alexandra Silva. Learning pomset automata. In Stefan Kiefer and Christine Tasson, editors, ETAPS 2021, volume 12650, pages 510–530. Springer, 2021.

  38. [38] Marcell Vazquez-Chanlatte, Karim Elmaaroufi, Stefan J. Witwicki, Matei Zaharia, and Sanjit A. Seshia. L*LM: Learning automata from examples using natural language oracles, 2025.

  39. [39] Navid Yaghmazadeh, Christian Klinger, Isil Dillig, and Swarat Chaudhuri. Synthesizing transformations on hierarchically structured data. In Chandra Krintz and Emery D. Berger, editors, PLDI 2016, pages 508–521. ACM, 2016.