The Complexity of Maximal/Closed Frequent Tree Mining for Bounded Height Trees
Pith reviewed 2026-05-21 14:40 UTC · model grok-4.3
The pith
Even height-2 trees make maximal frequent pattern mining hard unless P equals NP, while unordered closed trees at that height admit polynomial-delay enumeration.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
For rooted unordered trees of height at most 2, we give a polynomial-delay algorithm for enumerating closed frequent trees. On the other hand, for rooted ordered trees of height at most 2, we show that an output-polynomial time algorithm for enumerating closed frequent trees would imply an output-polynomial time algorithm for Dualization. For maximal frequent tree enumeration, we prove that no output-polynomial time algorithm exists unless P = NP already for rooted ordered trees of height at most 2 and for rooted unordered trees of height at most 3.
What carries the argument
Height-bounded reductions from Dualization and from NP-hard problems for the hardness directions, paired with a specialized direct enumeration procedure for the unordered closed case of height at most 2.
Load-bearing premise
The hardness results assume that P is not equal to NP and that Dualization admits no output-polynomial time algorithm.
What would settle it
Discovery of an output-polynomial time algorithm for maximal frequent tree enumeration on ordered trees of height at most 2, or an output-polynomial algorithm for Dualization, would falsify the corresponding hardness claims.
Figures
read the original abstract
Frequent tree mining asks us to enumerate tree patterns that occur frequently in a database of rooted trees. This problem is motivated by tree-structured data in bioinformatics, such as glycans and pseudoknot-free RNA secondary structures. A direct enumeration of all frequent trees is often highly redundant, because every subtree of a frequent tree is again frequent. Closed and maximal frequent trees are standard ways to reduce this redundancy, but their enumeration can still be computationally hard. In this paper, we study the effect of bounding the height of the input trees. This is a natural restriction for rooted trees, since the height is the depth of the hierarchy. We ask whether closed/maximal frequent tree mining remains hard when every input tree has a small height. Our results show that the answer depends sharply on the model. For rooted unordered trees of height at most 2, we give a polynomial-delay algorithm for enumerating closed frequent trees. On the other hand, for rooted ordered trees of height at most 2, we show that an output-polynomial time algorithm for enumerating closed frequent trees would imply an output-polynomial time algorithm for Dualization. For maximal frequent tree enumeration, we prove that no output-polynomial time algorithm exists unless P = NP already for rooted ordered trees of height at most 2 and for rooted unordered trees of height at most 3. Thus, even very small height bounds do not make the enumeration problems easy in general. At the same time, the unordered closed case of height at most 2 admits polynomial-delay enumeration. These results give a height-based classification of the complexity of closed and maximal frequent tree mining on shallow rooted trees.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper studies the complexity of enumerating closed and maximal frequent subtrees from a database of rooted trees when the height of the input trees is bounded. It presents a polynomial-delay algorithm for closed frequent trees on rooted unordered trees of height at most 2. It shows that an output-polynomial algorithm for closed frequent trees on rooted ordered trees of height at most 2 would yield an output-polynomial algorithm for Dualization. It proves that maximal frequent tree enumeration admits no output-polynomial algorithm (unless P=NP) already for rooted ordered trees of height at most 2 and for rooted unordered trees of height at most 3.
Significance. If the results hold, the work supplies a sharp height-based complexity classification for closed and maximal frequent tree mining. The polynomial-delay algorithm for the unordered closed case at height 2 is a concrete positive result, while the reductions from Dualization and from NP-complete problems delineate the remaining hardness even at very small heights. This classification is relevant to applications in bioinformatics that involve shallow rooted trees such as glycans and RNA secondary structures.
major comments (2)
- [Hardness section for maximal enumeration] The reduction establishing hardness of maximal frequent tree enumeration for ordered trees of height 2 (presumably in the section containing the NP-hardness proof) must be checked to ensure that the constructed database trees remain of height exactly 2 while preserving the exact frequency threshold correspondence with the source problem.
- [Algorithm for unordered closed trees] The polynomial-delay algorithm for closed frequent unordered trees of height at most 2 relies on a specific enumeration order or data structure; the manuscript should explicitly state the delay bound in terms of the input size and the number of closed frequent trees output so far.
minor comments (2)
- [Introduction] The abstract and introduction use 'output-polynomial time' and 'polynomial-delay' interchangeably in places; a uniform definition early in the paper would improve clarity.
- [References] A small number of citations to prior work on frequent tree mining and on the Dualization problem appear to be missing from the reference list.
Simulated Author's Rebuttal
We thank the referee for the positive assessment and constructive comments. We address the two major comments point by point below.
read point-by-point responses
-
Referee: [Hardness section for maximal enumeration] The reduction establishing hardness of maximal frequent tree enumeration for ordered trees of height 2 (presumably in the section containing the NP-hardness proof) must be checked to ensure that the constructed database trees remain of height exactly 2 while preserving the exact frequency threshold correspondence with the source problem.
Authors: We have re-checked the reduction (from Vertex Cover on graphs of maximum degree 3) in the relevant hardness section. Each database tree is built from a root with two levels of children corresponding to vertices and edges; no deeper nesting is introduced, so every tree has height exactly 2. The support threshold is copied verbatim from the source instance, establishing a direct bijection between solutions and maximal frequent trees. We will insert one clarifying sentence in the proof to make the height and threshold preservation explicit. revision: yes
-
Referee: [Algorithm for unordered closed trees] The polynomial-delay algorithm for closed frequent unordered trees of height at most 2 relies on a specific enumeration order or data structure; the manuscript should explicitly state the delay bound in terms of the input size and the number of closed frequent trees output so far.
Authors: The algorithm proceeds by depth-first traversal of the lattice of closed trees of height at most 2, using a trie-based representation of the unordered trees that permits constant-time child generation and duplicate detection. The time to produce each successive closed frequent tree is bounded by a polynomial in the total size of the input database (sum of the sizes of all trees) and is independent of the number of trees already output. We will add an explicit statement of this polynomial-delay bound in the algorithm section of the revised manuscript. revision: yes
Circularity Check
No significant circularity
full rationale
The paper's central results consist of a polynomial-delay enumeration algorithm for closed frequent unordered trees of height ≤2, a conditional hardness reduction showing that output-polynomial enumeration for the ordered closed case of height ≤2 would imply output-polynomial Dualization, and unconditional hardness proofs (unless P=NP) for maximal enumeration already at height 2 (ordered) and height 3 (unordered). These rest on explicit reductions from independently known hard problems and on a direct algorithmic construction; no equation, parameter, or claim is defined in terms of another result from the same paper, and no load-bearing step reduces to a self-citation or fitted input. The derivation chain is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (2)
- standard math P ≠ NP
- domain assumption Dualization has no output-polynomial time algorithm
Reference graph
Works this paper leans on
-
[1]
Reverse search for enumerat ion
David A vis and Komei Fukuda. Reverse search for enumerat ion. Dis- crete Applied Mathematics , 65(1):21–46, 1996
work page 1996
-
[2]
Mikhail A. Babin and Sergei O. Kuznetsov. Dualization in lattices given by ordered sets of irreducibles. Theoretical Computer Science, 658:316– 326, 2017. Horn formulas, directed hypergraphs, lattices a nd closure systems: related formalism and application
work page 2017
-
[3]
Mining frequent closed rooted trees
José L Balcázar, Albert Bifet, and Antoni Lozano. Mining frequent closed rooted trees. Machine Learning, 78:1–33, 2010
work page 2010
-
[4]
Fritz Bökler. Output-sensitive complexity of multiobjective combinato - rial optimization with an application to the multiobjectiv e shortest path problem. PhD thesis, Dortmund University, Germany, 2018
work page 2018
-
[5]
On maximal frequent and minimal infrequent sets in b inary matrices
Endre Boros, Vladimir Gurvich, Leonid Khachiyan, and Ka zuhisa Makino. On maximal frequent and minimal infrequent sets in b inary matrices. Ann. Math. Artif. Intell. , 39:211–221, 2003
work page 2003
-
[6]
On the hardness of inclusion -wise minimal separators enumeration
Caroline Brosse, Oscar Defrain, Kazuhiro Kurita, Vince nt Limouzy, Takeaki Uno, and Kunihiro Wasa. On the hardness of inclusion -wise minimal separators enumeration. Inf. Process. Lett., 185:106469, 2024
work page 2024
-
[7]
The complexity of maximal common subseque nce enumeration
Giovanni Buzzega, Alessio Conte, Yasuaki Kobayashi, Ka zuhiro Kurita, and Giulia Punzi. The complexity of maximal common subseque nce enumeration. Proc. ACM Manag. Data , 3(2):115:1–115:19, 2025
work page 2025
-
[8]
Efficiently mining unordere d trees
Mostafa Haghir Chehreghani. Efficiently mining unordere d trees. In 2011 IEEE 11th International Conference on Data Mining , pages 111– 120, 2011
work page 2011
-
[9]
Yun Chi, Yirong Yang, and Richard R. Muntz. Hybridtreemi ner: An efficient algorithm for mining frequent rooted trees and free trees using canonical forms. In Proceedings of the 16th International Conference on Scientific and Statistical Database Management , SSDBM ’04, page 11, USA, 2004. IEEE Computer Society
work page 2004
-
[10]
Yun Chi, Yirong Yang, Yi Xia, and Richard R. Muntz. Cmtre em- iner: Mining both closed and maximal frequent subtrees. In H onghua Dai, Ramakrishnan Srikant, and Chengqi Zhang, editors, Advances in Knowledge Discovery and Data Mining , pages 63–73, Berlin, Heidel- berg, 2004. Springer Berlin Heidelberg. 20
work page 2004
-
[11]
Comp utational aspects of monotone dualization: A brief survey
Thomas Eiter, Kazuhisa Makino, and Georg Gottlob. Comp utational aspects of monotone dualization: A brief survey. Discret. Appl. Math. , 156(11):2035–2049, 2008
work page 2035
-
[12]
Michael L. Fredman and Leonid Khachiyan. On the complex ity of dual- ization of monotone disjunctive normal forms. J. Algorithms, 21(3):618– 628, 1996
work page 1996
-
[13]
Efficient fr equent connected induced subgraph mining in graphs of bounded tree -width
Tamás Horváth, Keisuke Otaki, and Jan Ramon. Efficient fr equent connected induced subgraph mining in graphs of bounded tree -width. In PKDD 2013 , pages 622–637, Berlin, Heidelberg, 2013. Springer Berlin Heidelberg
work page 2013
-
[14]
Johnson, Mihalis Yannakakis, and Christos H
David S. Johnson, Mihalis Yannakakis, and Christos H. P apadimitriou. On generating all maximal independent sets. Information Processing Letters, 27(3):119–123, 1988
work page 1988
- [15]
-
[16]
Polynomial- delay and polynomial-space enumeration of large maximal ma tchings
Yasuaki Kobayashi, Kazuhiro Kurita, and Kunihiro Wasa . Polynomial- delay and polynomial-space enumeration of large maximal ma tchings. In Michael A. Bekos and Michael Kaufmann, editors, Graph-Theoretic Concepts in Computer Science - 48th International Workshop , WG 2022, Tübingen, Germany, June 22-24, 2022, Revised Selecte d Papers, volume 13453 of Lectu...
work page 2022
-
[17]
Polynomial- delay enumeration of large maximal common independent sets in two matroids and beyond
Yasuaki Kobayashi, Kazuhiro Kurita, and Kunihiro Wasa . Polynomial- delay enumeration of large maximal common independent sets in two matroids and beyond. Inf. Comput. , 304:105282, 2025
work page 2025
-
[18]
Spanner enumeration for temporal graphs
Kazuhiro Kurita, Andrea Marino, Jason Schoeters, and T akeaki Uno. Spanner enumeration for temporal graphs. In Kitty Meeks and Chris- tian Scheideler, editors, 4th Symposium on Algorithmic Foundations of Dynamic Networks, SAND 2025, Liverpool, UK, June 9-11, 2025 , vol- ume 330 of LIPIcs, pages 9:1–9:21. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2025
work page 2025
-
[19]
Lawler, Jan Karel Lenstra, and A
Eugene L. Lawler, Jan Karel Lenstra, and A. H. G. Rinnooy Kan. Generating all maximal independent sets: Np-hardness and p olynomial- time algorithms. SIAM J. Comput. , 9(3):558–565, 1980. 21
work page 1980
-
[20]
David W. Matula. Subtree isomorphism in o(n5/2). In B. A lspach, P. Hell, and D.J. Miller, editors, Algorithmic Aspects of Combinatorics , volume 2 of Annals of Discrete Mathematics , pages 91–106. Elsevier, 1978
work page 1978
-
[21]
Treefinder: a first step towards xml data mining
Alexandre Termier, Marie christine Rousset, and Michè le Sebag. Treefinder: a first step towards xml data mining. ICDM 2002 , pages 450–457, 2002
work page 2002
-
[22]
Craig A. Tovey. A simplified np-complete satisfiability problem. Dis- crete Applied Mathematics , 8(1):85–89, 1984
work page 1984
-
[23]
An efficient algorithm for solving pseudo cl ique enumer- ation problem
Takeaki Uno. An efficient algorithm for solving pseudo cl ique enumer- ation problem. Algorithmica, 56(1):3–16, 2010
work page 2010
-
[24]
Takeaki Uno, Masashi Kiyomi, and Hiroki Arimura. LCM ve r. 2: Ef- ficient mining algorithms for frequent/closed/maximal ite msets. In Roberto J. Bayardo Jr., Bart Goethals, and Mohammed Javeed Z aki, editors, FIMI ’04, Proceedings of the IEEE ICDM Workshop on Fre- quent Itemset Mining Implementations, Brighton, UK, Novem ber 1, 2004, volume 126 of CEUR Wor...
work page 2004
-
[25]
gspan: Graph-based substruc ture pat- tern mining
Xifeng Yan and Jiawei Han. gspan: Graph-based substruc ture pat- tern mining. In Proceedings of the 2002 IEEE International Confer- ence on Data Mining (ICDM 2002), 9-12 December 2002, Maebash i City, Japan, pages 721–724. IEEE Computer Society, 2002
work page 2002
-
[26]
Mohammed J. Zaki. Efficiently mining frequent trees in a f orest. In KDD 2002 , page 71–80, New York, NY, USA, 2002. Association for Computing Machinery
work page 2002
-
[27]
Mohammed J. Zaki. Efficiently mining frequent embedded u nordered trees. Fundamenta Informaticae, 66(1-2):33–52, 2005. 22
work page 2005
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.