pith. machine review for the scientific record.

arxiv: 2605.07710 · v1 · submitted 2026-05-08 · 💻 cs.FL

Recognition: 2 theorem links

· Lean Theorem

Learning Tree Automata with Term Rewriting

Jakub Kopystiański, Jan Otop

Authors on Pith: no claims yet

Pith reviewed 2026-05-11 02:28 UTC · model grok-4.3

classification 💻 cs.FL
keywords tree automata learning · term rewriting · deductive inference · query complexity · Angluin algorithm · tree languages · automata learning · rewriting systems

The pith

Supplying a term rewriting system allows the learning algorithm to deduce answers to some queries and thereby reduce the total number required to identify a tree automaton.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper extends the standard query-based algorithm for learning tree automata by letting the user supply a term rewriting system that encodes properties of the target language. The system is used during learning to infer the answers to certain membership or equivalence queries deductively rather than by direct oracle calls. This yields a measurable drop in query count for languages whose properties, such as commutativity of subtrees, can be expressed by rewriting rules. A reader would care because high query counts have long limited the practical size of automata that can be learned from examples.

Core claim

The central claim is that an Angluin-style learning procedure for tree automata can be augmented with deductive inference: a user-provided term rewriting system that is sound and consistent with the target language is used to answer some queries without oracle interaction, producing a correct automaton while lowering overall query complexity.

What carries the argument

A term rewriting system that encodes algebraic properties of the target tree language and is consulted to deduce correct query answers during the learning loop.
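As a concrete illustration of this mechanism, here is a minimal Python sketch (not the paper's implementation; all names are illustrative). A commutativity rule for a symbol `f` is oriented toward a canonical normal form, and membership answers are cached per normal form, so rewrite-equivalent terms cost only one oracle call:

```python
def normalize(term):
    """Rewrite a term to a canonical form: sort the children of every 'f' node.

    A term is a (symbol, children) pair; leaves have an empty tuple of
    children. Sorting the children of 'f' orients the commutativity rule
    f(x, y) -> f(y, x) toward a unique normal form.
    """
    sym, children = term
    children = tuple(normalize(c) for c in children)
    if sym == "f":
        children = tuple(sorted(children))
    return (sym, children)

class DeductiveOracle:
    """Wraps a membership oracle; answers for terms whose normal form was
    already queried are deduced, so only one oracle call is spent per
    equivalence class of the rewrite relation."""

    def __init__(self, oracle):
        self.oracle = oracle
        self.cache = {}       # normal form -> membership answer
        self.oracle_calls = 0

    def member(self, term):
        nf = normalize(term)
        if nf not in self.cache:      # genuinely new class: ask the oracle
            self.oracle_calls += 1
            self.cache[nf] = self.oracle(term)
        return self.cache[nf]         # otherwise deduced, no oracle call
```

For instance, querying `f(a, b)` and then `f(b, a)` costs a single oracle call, since both terms share the normal form `f(a, b)`.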

If this is right

  • Languages whose tree structure obeys rewrite rules such as commutativity or associativity become learnable with far fewer direct queries.
  • The learned automaton remains correct whenever the supplied rewrite system is sound and matches the language.
  • Natural data properties expressible by short rewrite systems produce the largest practical savings in query volume.
  • The same deductive mechanism can be applied to any property that can be stated as a finite set of rewrite rules over the alphabet.
  • Equivalence queries can also be answered in part by rewriting, further trimming the interaction with the oracle.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The technique could be paired with active sampling strategies to further cut queries when the rewrite system is only partially known.
  • Domains that already maintain rewrite systems, such as term databases or algebraic data types, could feed those systems directly into the learner.
  • If the rewrite system is allowed to be approximate, the algorithm might still converge faster while producing an automaton that is correct up to the stated properties.
  • The same idea of deductive query answering might transfer to learning other structures, such as string automata or graph grammars, whenever a suitable rewriting relation exists.

Load-bearing premise

The supplied term rewriting system must be sound, complete, and consistent with the unknown target language, so that every deduction it produces is correct and no relevant equivalence is missed.

What would settle it

Run the algorithm on a small tree language whose properties are captured by a simple rewrite system and check whether the number of oracle queries actually drops below that of the unaugmented algorithm; if it does not, or if the returned automaton is incorrect, the claim fails.
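This experiment can be mocked up in a few lines. In the sketch below (illustrative only, not the paper's benchmark), deduction collapses each rewrite-equivalence class to a single membership query, so the saving on a batch of queries is exactly the number of distinct terms minus the number of distinct normal forms:

```python
import itertools

def normalize(term):
    """Canonical form under commutativity of 'f': sort its children."""
    sym, children = term
    children = tuple(normalize(c) for c in children)
    return (sym, tuple(sorted(children)) if sym == "f" else children)

def query_counts(terms):
    """Return (plain, deduced): oracle calls without and with deduction."""
    plain = len(set(terms))                       # one call per distinct term
    deduced = len({normalize(t) for t in terms})  # one call per rewrite class
    return plain, deduced

leaves = [("a", ()), ("b", ()), ("c", ())]
batch = [("f", (x, y)) for x, y in itertools.product(leaves, repeat=2)]
plain, deduced = query_counts(batch)  # 9 distinct terms, 6 rewrite classes
```

Here the 9 ordered pairs under `f` fall into 6 unordered classes, so a third of the oracle calls are deduced away; a negative result would be a batch on which `deduced` fails to drop below `plain`.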

read the original abstract

We present an extension of the Angluin-style learning algorithm for tree automata that incorporates deductive inference. The learning algorithm is provided with a term rewriting system that specifies properties of the target tree language (e.g., the order of subtrees under a symbol f is irrelevant). This term rewriting system is used to infer answers to some queries, which reduces the query complexity of the learning algorithm. We present examples of rewrite systems that express natural properties of tree-structured data, which yield a significant reduction in the number of queries.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper extends the Angluin-style learning algorithm for tree automata by supplying a user-provided term rewriting system (TRS) that encodes properties of the target language (e.g., commutativity of subtrees). The TRS is used to deduce answers to selected membership queries, with the goal of lowering oracle query complexity. Examples of natural rewrite systems are presented that are claimed to yield significant query savings.

Significance. If the extension preserves correctness and the supplied TRS is sound and consistent with the target language, the technique offers a principled way to inject domain knowledge into tree-automaton learning and could reduce query cost in applications involving structured data. The approach builds directly on prior Angluin-style procedures without introducing new free parameters.

major comments (2)
  1. Abstract: the central claim of query reduction is stated at a high level but is unsupported by any correctness proof for the extended observation-table construction, any complexity analysis, or concrete query-count measurements; these are load-bearing for the contribution.
  2. Algorithm integration (high-level description): it is not shown that deductive answers from the TRS preserve the termination and equivalence invariants of the base Angluin procedure when the TRS is only assumed sound and consistent; a formal argument or invariant is required.
minor comments (1)
  1. Abstract: the weakest assumption on the TRS (soundness, completeness, consistency) should be stated explicitly rather than left implicit.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed review and constructive suggestions. We address each major comment below and will incorporate revisions to strengthen the formal aspects of the work.

read point-by-point responses
  1. Referee: Abstract: the central claim of query reduction is stated at a high level but is unsupported by any correctness proof for the extended observation-table construction, any complexity analysis, or concrete query-count measurements; these are load-bearing for the contribution.

    Authors: We agree that the abstract states the query-reduction claim at a high level. The manuscript presents the extended algorithm, shows how the TRS is used to answer selected membership queries, and includes concrete examples of natural rewrite systems (e.g., commutativity) together with illustrative query savings on small instances. However, we acknowledge that a self-contained correctness argument for the modified observation table, a complexity analysis, and tabulated query-count measurements are not provided. In the revision we will add a dedicated subsection proving that the extended table construction preserves the invariants of the base Angluin procedure (under the stated soundness and consistency assumptions on the TRS), include asymptotic bounds on the number of remaining oracle queries, and report explicit query counts for the running examples. revision: yes

  2. Referee: Algorithm integration (high-level description): it is not shown that deductive answers from the TRS preserve the termination and equivalence invariants of the base Angluin procedure when the TRS is only assumed sound and consistent; a formal argument or invariant is required.

    Authors: We appreciate this observation. The current text assumes that any answer deduced from a sound and consistent TRS is identical to the answer the target language would give, and therefore that the standard termination and equivalence arguments carry over. We will make this reasoning explicit by adding an invariant stating that every entry in the observation table (whether obtained from the oracle or deduced) remains consistent with the target language. Under this invariant the original proofs of termination and correctness of the learned automaton apply unchanged. The revised manuscript will contain a short lemma formalizing the invariant together with a proof sketch. revision: yes
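The invariant the rebuttal proposes can be stated as a one-line check. In this hypothetical sketch (names are illustrative, not from the paper), each observation-table entry records its answer together with its provenance, and the invariant demands that oracle-sourced and deduced entries alike agree with the target language:

```python
def table_invariant_holds(table, target):
    """Check that every observation-table entry agrees with the target
    language, whatever its provenance.

    table:  maps a term to (answer, source), source in {"oracle", "deduced"}
    target: the (normally hidden) target language, as a predicate on terms
    """
    return all(answer == target(term)
               for term, (answer, _source) in table.items())
```

Under the soundness and consistency assumptions on the TRS, deduced entries satisfy this by construction, which is what lets the base procedure's termination and correctness proofs carry over unchanged.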

Circularity Check

0 steps flagged

No significant circularity; the extension relies on external TRS input and the prior Angluin algorithm.

full rationale

The claimed contribution is an algorithmic extension of the established Angluin-style tree-automaton learning procedure. A user-supplied term rewriting system (TRS) is provided as an external artifact that is assumed sound, complete, and consistent with the target language; this TRS is used only to answer a subset of membership queries by deduction rather than oracle calls. The base learning invariants, termination, and correctness are inherited from the prior Angluin algorithm, which is treated as independent prior work. No step in the described derivation reduces a result to a fitted parameter, renames an input as a prediction, or rests on a self-citation chain whose cited result itself depends on the present paper. The derivation chain therefore contains independent content and does not collapse by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The approach rests on the external rewriting system being a faithful model of the target language properties; no numerical parameters or new entities are introduced.

axioms (1)
  • domain assumption The term rewriting system correctly and completely captures the relevant equivalences of the target tree language.
    Deductive inference of query answers depends on this assumption.
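This axiom is refutable but not certifiable by sampling. As a hedged sketch (illustrative names, not from the paper): apply one rewrite step to sampled terms and check that membership never changes; any violation witnesses an unsound rule, while passing is only evidence of soundness, not a proof:

```python
def rewrite_commutes_with_membership(member, rewrite, samples):
    """True iff no sampled term changes membership under one rewrite step.

    member:  the membership oracle for the target language
    rewrite: a function applying one rewrite step (or the identity)
    samples: terms drawn from the domain; a single failure refutes
             soundness of the supplied rule outright
    """
    return all(member(t) == member(rewrite(t)) for t in samples)
```

For example, swapping the children of a commutative symbol passes this check against any language that ignores child order, while a rule that rewrites one leaf symbol into another fails it the moment membership depends on that symbol.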

pith-pipeline@v0.9.0 · 5370 in / 1081 out tokens · 48935 ms · 2026-05-11T02:28:20.928349+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Reference graph

Works this paper leans on

39 extracted references · 39 canonical work pages

  1. [1] A. V. Aho, M. R. Garey, and J. D. Ullman. The transitive reduction of a directed graph. SIAM Journal on Computing, 1(2):131–137, 1972.

  2. [2] Siva Anantharaman, Paliath Narendran, and Michaël Rusinowitch. Closure properties and decision problems of dag automata. Inf. Process. Lett., 94(5):231–240, 2005.

  3. [3] Dana Angluin. Learning regular sets from queries and counterexamples. Information and Computation, 75(2):87–106, 1987.

  4. [4] Franz Baader and Paliath Narendran. Unification of concept terms in description logics. In Henri Prade, editor, ECAI 1998, pages 331–335. John Wiley and Sons, 1998.

  5. [5] Franz Baader and Tobias Nipkow. Term Rewriting and All That. Cambridge University Press, 1998.

  6. [6] Benoît Barbot, Benedikt Bollig, Alain Finkel, Serge Haddad, Igor Khmelnitsky, Martin Leucker, Daniel Neider, Rajarshi Roy, and Lina Ye. Extracting context-free grammars from recurrent neural networks using tree-automata learning and A* search. In ICGI 2021, pages 113–129. PMLR, 2021.

  7. [7] Mikhail V. Berlinkov. Approximating the minimum length of synchronizing words is hard. Theory Comput. Syst., 54(2):211–223, 2014.

  8. [8] Satwik Bhattamishra, Michael Hahn, and Varun Kanade. Automata learning and identification of the support of language models.

  9. [9] Henrik Björklund, Johanna Björklund, and Petter Ericson. On the regularity and learnability of ordered DAG languages. In Arnaud Carayol and Cyril Nicaud, editors, CIAA 2017, volume 10329 of Lecture Notes in Computer Science, pages 27–39. Springer, 2017.

  10. [10] Iovka Boneva and Jean-Marc Talbot. Automata and logics for unranked and unordered trees. In Jürgen Giesl, editor, RTA 2005, volume 3467 of Lecture Notes in Computer Science, pages 500–515. Springer, 2005.

  11. [11] Ján Černý. Poznámka k homogénnym experimentom s konečnými automatmi. Matematicko-fyzikálny časopis, 14(3):208–216, 1964.

  12. [12] Witold Charatonik. Automata on dag representations of finite trees. 1999.

  13. [13] Hubert Comon, Max Dauchet, Rémi Gilleron, Florent Jacquemard, Denis Lugiez, Christof Löding, Sophie Tison, and Marc Tommasi. Tree Automata Techniques and Applications. 2008.

  14. [14] Simon Dierl, Paul Fiterau-Brostean, Falk Howar, Bengt Jonsson, Konstantinos Sagonas, and Fredrik Tåquist. Scalable tree-based register automata learning. In TACAS 2024, pages 87–108, 2024.

  15. [15] Frank Drewes and Johanna Högberg. Learning a regular tree language from a teacher. In Zoltán Ésik and Zoltán Fülöp, editors, DLT 2003, volume 2710 of Lecture Notes in Computer Science, pages 279–291. Springer, 2003.

  16. [16] Frank Drewes and Johanna Högberg. Query learning of regular tree languages: How to avoid dead states. Theory Comput. Syst., 40(2):163–185, 2007.

  17. [17] David Eppstein. Reset sequences for monotonic automata. SIAM J. Comput., 19(3):500–510, 1990.

  18. [18] Michal Fica and Jan Otop. Active automata learning with advice. In ECAI 2025, volume 413 of Frontiers in Artificial Intelligence and Applications, pages 1655–1662. IOS Press, 2025.

  19. [19] Markus Theo Frohme. Active automata learning with adaptive distinguishing sequences. CoRR, abs/1902.01139, 2019.

  20. [20] Émilie Grienenberger and Martin Ritzert. Learning definable hypotheses on trees. In Pablo Barceló and Marco Calautti, editors, ICDT 2019, volume 127 of LIPIcs, pages 24:1–24:18. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2019.

  21. [21] Amaury Habrard and José Oncina. Learning multiplicity tree automata. In Yasubumi Sakakibara, Satoshi Kobayashi, Kengo Sato, Tetsuro Nishino, and Etsuji Tomita, editors, ICGI 2006, volume 4201 of Lecture Notes in Computer Science, pages 268–280. Springer, 2006.

  22. [22] Malte Isberner, Falk Howar, and Bernhard Steffen. The TTT algorithm: A redundancy-free approach to active automata learning. In RV 2014, pages 307–322, 2014.

  23. [23] Anna Kasprzik. Four one-shot learners for regular tree languages and their polynomial characterizability. Theor. Comput. Sci., 485:85–106, 2013.

  24. [24] Jakub Kopystiański and Jan Otop. Learning tree automata with term rewriting. To appear in IJCAI 2026. ijcai.org, 2026.

  25. [25] Jakub Kopystiański and Jan Otop. Learning tree automata with term rewriting: code and data. https://github.com/jotop/LearnDFTAwithTRS, 2026.

  26. [26] Raymond Kosala, Maurice Bruynooghe, Jan Van den Bussche, and Hendrik Blockeel. Information extraction from web documents based on local unranked tree automaton inference. In IJCAI 2003, pages 403–408. Morgan Kaufmann, 2003.

  27. [27] Loes Kruger, Sebastian Junges, and Jurriaan Rot. Small test suites for active automata learning. In TACAS 2024, pages 109–129, 2024.

  28. [28] Matteo Magnini, Riccardo Squarcialupi, Martin Tunge Sterri, Ana Ozaki, and Rivera Castillo. Actively learning EL terminologies from large language models. In ECAI 2025, volume 413 of Frontiers in Artificial Intelligence and Applications, pages 1792–1799. IOS Press, 2025.

  29. [29] Ines Marusic and James Worrell. Complexity of equivalence and learning for multiplicity tree automata. Journal of Machine Learning Research, 16:2465–2500, 2015.

  30. [30] Daniele Nardi, Ronald J. Brachman, et al. An introduction to description logics. Description Logic Handbook, 1:40, 2003.

  31. [31] Dolav Nitay, Dana Fisman, and Michal Ziv-Ukelson. Learning of structurally unambiguous probabilistic grammars. In AAAI 2021, pages 9170–9178. AAAI Press, 2021.

  32. [32] Moeketsi Raselimo and Bernd Fischer. Automatic grammar repair. In Eelco Visser, Dimitris S. Kolovos, and Emma Söderberg, editors, SLE 2021, pages 126–142. ACM, 2021.

  33. [33] Catalin Ionut Tirnăucă and Cristina Tirnăucă. Learning regular tree languages from correction and equivalence queries. J. Autom. Lang. Comb., 12(4):501–524, 2007.

  34. [34] Craig A. Tovey. A simplified NP-complete satisfiability problem. Discret. Appl. Math., 8(1):85–89, 1984.

  35. [35] Frits Vaandrager. Model learning. Communications of the ACM, 60(2):86–95, 2017.

  36. [36] Frits W. Vaandrager, Bharat Garhewal, Jurriaan Rot, and Thorsten Wißmann. A new approach for active automata learning based on apartness. In TACAS 2022, pages 223–243, 2022.

  37. [37] Gerco van Heerdt, Tobias Kappé, Jurriaan Rot, and Alexandra Silva. Learning pomset automata. In Stefan Kiefer and Christine Tasson, editors, ETAPS 2021, volume 12650, pages 510–530. Springer, 2021.

  38. [38] Marcell Vazquez-Chanlatte, Karim Elmaaroufi, Stefan J. Witwicki, Matei Zaharia, and Sanjit A. Seshia. L*LM: Learning automata from examples using natural language oracles, 2025.

  39. [39] Navid Yaghmazadeh, Christian Klinger, Isil Dillig, and Swarat Chaudhuri. Synthesizing transformations on hierarchically structured data. In Chandra Krintz and Emery D. Berger, editors, PLDI 2016, pages 508–521. ACM, 2016.