pith. sign in

arxiv: 2602.03436 · v3 · pith:7DKZJQQCnew · submitted 2026-02-03 · 💻 cs.DS

The Complexity of Maximal/Closed Frequent Tree Mining for Bounded Height Trees

Pith reviewed 2026-05-21 14:40 UTC · model grok-4.3

classification 💻 cs.DS
keywords frequent tree miningclosed frequent treesmaximal frequent treesheight bounded treesenumeration complexityordered treesunordered treespolynomial delay
0
0 comments X

The pith

Even height-2 trees make maximal frequent pattern mining hard unless P equals NP, while unordered closed trees at that height admit polynomial-delay enumeration.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines whether bounding the height of rooted trees in a database simplifies the task of enumerating closed or maximal frequent tree patterns. It establishes that the complexity depends sharply on whether trees are ordered or unordered and on the choice between closed and maximal variants. A direct polynomial-delay algorithm is presented for closed frequent trees when the trees are unordered and have height at most 2. For ordered trees of height at most 2 the closed enumeration problem is shown to be equivalent in output-polynomial difficulty to the Dualization problem. Hardness proofs establish that maximal frequent tree enumeration admits no output-polynomial algorithm unless P equals NP, already for ordered trees of height at most 2 and for unordered trees of height at most 3.

Core claim

For rooted unordered trees of height at most 2, we give a polynomial-delay algorithm for enumerating closed frequent trees. On the other hand, for rooted ordered trees of height at most 2, we show that an output-polynomial time algorithm for enumerating closed frequent trees would imply an output-polynomial time algorithm for Dualization. For maximal frequent tree enumeration, we prove that no output-polynomial time algorithm exists unless P = NP already for rooted ordered trees of height at most 2 and for rooted unordered trees of height at most 3.

What carries the argument

Height-bounded reductions from Dualization and from NP-hard problems for the hardness directions, paired with a specialized direct enumeration procedure for the unordered closed case of height at most 2.

Load-bearing premise

The hardness results assume that P is not equal to NP and that Dualization admits no output-polynomial time algorithm.

What would settle it

Discovery of an output-polynomial time algorithm for maximal frequent tree enumeration on ordered trees of height at most 2, or an output-polynomial algorithm for Dualization, would falsify the corresponding hardness claims.

Figures

Figures reproduced from arXiv: 2602.03436 by Hirotaka Ono, Kazuhiro Kurita, Kenta Komoto.

Figure 1
Figure 1. Figure 1: Our reduction from (3,4)-SAT to Another Maximal Frequent Tree. from the paper [15] (Theorem 5.2) and provides a reduction from (3,4)-SAT. In our reduction, we assume that the number of variables and clauses are greater than 10. We introduce notations. Let ϕ be a CNF formula C1 ∧ . . . ∧ Cm and V (ϕ) = {x1, . . . , xn} be the set of variables in ϕ. A literal ai is an element in {xi ,¬xi} for a variable xi ∈… view at source ↗
read the original abstract

Frequent tree mining asks us to enumerate tree patterns that occur frequently in a database of rooted trees. This problem is motivated by tree-structured data in bioinformatics, such as glycans and pseudoknot-free RNA secondary structures. A direct enumeration of all frequent trees is often highly redundant, because every subtree of a frequent tree is again frequent. Closed and maximal frequent trees are standard ways to reduce this redundancy, but their enumeration can still be computationally hard. In this paper, we study the effect of bounding the height of the input trees. This is a natural restriction for rooted trees, since the height is the depth of the hierarchy. We ask whether closed/maximal frequent tree mining remains hard when every input tree has a small height. Our results show that the answer depends sharply on the model. For rooted unordered trees of height at most 2, we give a polynomial-delay algorithm for enumerating closed frequent trees. On the other hand, for rooted ordered trees of height at most 2, we show that an output-polynomial time algorithm for enumerating closed frequent trees would imply an output-polynomial time algorithm for Dualization. For maximal frequent tree enumeration, we prove that no output-polynomial time algorithm exists unless P = NP already for rooted ordered trees of height at most 2 and for rooted unordered trees of height at most 3. Thus, even very small height bounds do not make the enumeration problems easy in general. At the same time, the unordered closed case of height at most 2 admits polynomial-delay enumeration. These results give a height-based classification of the complexity of closed and maximal frequent tree mining on shallow rooted trees.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper studies the complexity of enumerating closed and maximal frequent subtrees from a database of rooted trees when the height of the input trees is bounded. It presents a polynomial-delay algorithm for closed frequent trees on rooted unordered trees of height at most 2. It shows that an output-polynomial algorithm for closed frequent trees on rooted ordered trees of height at most 2 would yield an output-polynomial algorithm for Dualization. It proves that maximal frequent tree enumeration admits no output-polynomial algorithm (unless P=NP) already for rooted ordered trees of height at most 2 and for rooted unordered trees of height at most 3.

Significance. If the results hold, the work supplies a sharp height-based complexity classification for closed and maximal frequent tree mining. The polynomial-delay algorithm for the unordered closed case at height 2 is a concrete positive result, while the reductions from Dualization and from NP-complete problems delineate the remaining hardness even at very small heights. This classification is relevant to applications in bioinformatics that involve shallow rooted trees such as glycans and RNA secondary structures.

major comments (2)
  1. [Hardness section for maximal enumeration] The reduction establishing hardness of maximal frequent tree enumeration for ordered trees of height 2 (presumably in the section containing the NP-hardness proof) must be checked to ensure that the constructed database trees remain of height exactly 2 while preserving the exact frequency threshold correspondence with the source problem.
  2. [Algorithm for unordered closed trees] The polynomial-delay algorithm for closed frequent unordered trees of height at most 2 relies on a specific enumeration order or data structure; the manuscript should explicitly state the delay bound in terms of the input size and the number of closed frequent trees output so far.
minor comments (2)
  1. [Introduction] The abstract and introduction use 'output-polynomial time' and 'polynomial-delay' interchangeably in places; a uniform definition early in the paper would improve clarity.
  2. [References] A small number of citations to prior work on frequent tree mining and on the Dualization problem appear to be missing from the reference list.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive assessment and constructive comments. We address the two major comments point by point below.

read point-by-point responses
  1. Referee: [Hardness section for maximal enumeration] The reduction establishing hardness of maximal frequent tree enumeration for ordered trees of height 2 (presumably in the section containing the NP-hardness proof) must be checked to ensure that the constructed database trees remain of height exactly 2 while preserving the exact frequency threshold correspondence with the source problem.

    Authors: We have re-checked the reduction (from Vertex Cover on graphs of maximum degree 3) in the relevant hardness section. Each database tree is built from a root with two levels of children corresponding to vertices and edges; no deeper nesting is introduced, so every tree has height exactly 2. The support threshold is copied verbatim from the source instance, establishing a direct bijection between solutions and maximal frequent trees. We will insert one clarifying sentence in the proof to make the height and threshold preservation explicit. revision: yes

  2. Referee: [Algorithm for unordered closed trees] The polynomial-delay algorithm for closed frequent unordered trees of height at most 2 relies on a specific enumeration order or data structure; the manuscript should explicitly state the delay bound in terms of the input size and the number of closed frequent trees output so far.

    Authors: The algorithm proceeds by depth-first traversal of the lattice of closed trees of height at most 2, using a trie-based representation of the unordered trees that permits constant-time child generation and duplicate detection. The time to produce each successive closed frequent tree is bounded by a polynomial in the total size of the input database (sum of the sizes of all trees) and is independent of the number of trees already output. We will add an explicit statement of this polynomial-delay bound in the algorithm section of the revised manuscript. revision: yes

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper's central results consist of a polynomial-delay enumeration algorithm for closed frequent unordered trees of height ≤2, a conditional hardness reduction showing that output-polynomial enumeration for the ordered closed case of height ≤2 would imply output-polynomial Dualization, and unconditional hardness proofs (unless P=NP) for maximal enumeration already at height 2 (ordered) and height 3 (unordered). These rest on explicit reductions from independently known hard problems and on a direct algorithmic construction; no equation, parameter, or claim is defined in terms of another result from the same paper, and no load-bearing step reduces to a self-citation or fitted input. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The paper depends on standard complexity-theoretic assumptions rather than new parameters or entities; no free parameters or invented objects are introduced in the abstract.

axioms (2)
  • standard math P ≠ NP
    Invoked to establish unconditional hardness for maximal frequent tree enumeration at small heights.
  • domain assumption Dualization has no output-polynomial time algorithm
    Used as the basis for the conditional hardness result linking ordered closed frequent tree enumeration at height 2 to Dualization.

pith-pipeline@v0.9.0 · 5837 in / 1486 out tokens · 61651 ms · 2026-05-21T14:40:20.510094+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

27 extracted references · 27 canonical work pages

  1. [1]

    Reverse search for enumerat ion

    David A vis and Komei Fukuda. Reverse search for enumerat ion. Dis- crete Applied Mathematics , 65(1):21–46, 1996

  2. [2]

    Babin and Sergei O

    Mikhail A. Babin and Sergei O. Kuznetsov. Dualization in lattices given by ordered sets of irreducibles. Theoretical Computer Science, 658:316– 326, 2017. Horn formulas, directed hypergraphs, lattices a nd closure systems: related formalism and application

  3. [3]

    Mining frequent closed rooted trees

    José L Balcázar, Albert Bifet, and Antoni Lozano. Mining frequent closed rooted trees. Machine Learning, 78:1–33, 2010

  4. [4]

    Output-sensitive complexity of multiobjective combinato - rial optimization with an application to the multiobjectiv e shortest path problem

    Fritz Bökler. Output-sensitive complexity of multiobjective combinato - rial optimization with an application to the multiobjectiv e shortest path problem. PhD thesis, Dortmund University, Germany, 2018

  5. [5]

    On maximal frequent and minimal infrequent sets in b inary matrices

    Endre Boros, Vladimir Gurvich, Leonid Khachiyan, and Ka zuhisa Makino. On maximal frequent and minimal infrequent sets in b inary matrices. Ann. Math. Artif. Intell. , 39:211–221, 2003

  6. [6]

    On the hardness of inclusion -wise minimal separators enumeration

    Caroline Brosse, Oscar Defrain, Kazuhiro Kurita, Vince nt Limouzy, Takeaki Uno, and Kunihiro Wasa. On the hardness of inclusion -wise minimal separators enumeration. Inf. Process. Lett., 185:106469, 2024

  7. [7]

    The complexity of maximal common subseque nce enumeration

    Giovanni Buzzega, Alessio Conte, Yasuaki Kobayashi, Ka zuhiro Kurita, and Giulia Punzi. The complexity of maximal common subseque nce enumeration. Proc. ACM Manag. Data , 3(2):115:1–115:19, 2025

  8. [8]

    Efficiently mining unordere d trees

    Mostafa Haghir Chehreghani. Efficiently mining unordere d trees. In 2011 IEEE 11th International Conference on Data Mining , pages 111– 120, 2011

  9. [9]

    Yun Chi, Yirong Yang, and Richard R. Muntz. Hybridtreemi ner: An efficient algorithm for mining frequent rooted trees and free trees using canonical forms. In Proceedings of the 16th International Conference on Scientific and Statistical Database Management , SSDBM ’04, page 11, USA, 2004. IEEE Computer Society

  10. [10]

    Yun Chi, Yirong Yang, Yi Xia, and Richard R. Muntz. Cmtre em- iner: Mining both closed and maximal frequent subtrees. In H onghua Dai, Ramakrishnan Srikant, and Chengqi Zhang, editors, Advances in Knowledge Discovery and Data Mining , pages 63–73, Berlin, Heidel- berg, 2004. Springer Berlin Heidelberg. 20

  11. [11]

    Comp utational aspects of monotone dualization: A brief survey

    Thomas Eiter, Kazuhisa Makino, and Georg Gottlob. Comp utational aspects of monotone dualization: A brief survey. Discret. Appl. Math. , 156(11):2035–2049, 2008

  12. [12]

    Fredman and Leonid Khachiyan

    Michael L. Fredman and Leonid Khachiyan. On the complex ity of dual- ization of monotone disjunctive normal forms. J. Algorithms, 21(3):618– 628, 1996

  13. [13]

    Efficient fr equent connected induced subgraph mining in graphs of bounded tree -width

    Tamás Horváth, Keisuke Otaki, and Jan Ramon. Efficient fr equent connected induced subgraph mining in graphs of bounded tree -width. In PKDD 2013 , pages 622–637, Berlin, Heidelberg, 2013. Springer Berlin Heidelberg

  14. [14]

    Johnson, Mihalis Yannakakis, and Christos H

    David S. Johnson, Mihalis Yannakakis, and Christos H. P apadimitriou. On generating all maximal independent sets. Information Processing Letters, 27(3):119–123, 1988

  15. [15]

    Kolaitis

    Benny Kimelfeld and Phokion G. Kolaitis. The complexit y of mining maximal frequent subgraphs. In PODS 2013 , page 13–24, New York, NY, USA, 2013. Association for Computing Machinery

  16. [16]

    Polynomial- delay and polynomial-space enumeration of large maximal ma tchings

    Yasuaki Kobayashi, Kazuhiro Kurita, and Kunihiro Wasa . Polynomial- delay and polynomial-space enumeration of large maximal ma tchings. In Michael A. Bekos and Michael Kaufmann, editors, Graph-Theoretic Concepts in Computer Science - 48th International Workshop , WG 2022, Tübingen, Germany, June 22-24, 2022, Revised Selecte d Papers, volume 13453 of Lectu...

  17. [17]

    Polynomial- delay enumeration of large maximal common independent sets in two matroids and beyond

    Yasuaki Kobayashi, Kazuhiro Kurita, and Kunihiro Wasa . Polynomial- delay enumeration of large maximal common independent sets in two matroids and beyond. Inf. Comput. , 304:105282, 2025

  18. [18]

    Spanner enumeration for temporal graphs

    Kazuhiro Kurita, Andrea Marino, Jason Schoeters, and T akeaki Uno. Spanner enumeration for temporal graphs. In Kitty Meeks and Chris- tian Scheideler, editors, 4th Symposium on Algorithmic Foundations of Dynamic Networks, SAND 2025, Liverpool, UK, June 9-11, 2025 , vol- ume 330 of LIPIcs, pages 9:1–9:21. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2025

  19. [19]

    Lawler, Jan Karel Lenstra, and A

    Eugene L. Lawler, Jan Karel Lenstra, and A. H. G. Rinnooy Kan. Generating all maximal independent sets: Np-hardness and p olynomial- time algorithms. SIAM J. Comput. , 9(3):558–565, 1980. 21

  20. [20]

    David W. Matula. Subtree isomorphism in o(n5/2). In B. A lspach, P. Hell, and D.J. Miller, editors, Algorithmic Aspects of Combinatorics , volume 2 of Annals of Discrete Mathematics , pages 91–106. Elsevier, 1978

  21. [21]

    Treefinder: a first step towards xml data mining

    Alexandre Termier, Marie christine Rousset, and Michè le Sebag. Treefinder: a first step towards xml data mining. ICDM 2002 , pages 450–457, 2002

  22. [22]

    Craig A. Tovey. A simplified np-complete satisfiability problem. Dis- crete Applied Mathematics , 8(1):85–89, 1984

  23. [23]

    An efficient algorithm for solving pseudo cl ique enumer- ation problem

    Takeaki Uno. An efficient algorithm for solving pseudo cl ique enumer- ation problem. Algorithmica, 56(1):3–16, 2010

  24. [24]

    LCM ve r

    Takeaki Uno, Masashi Kiyomi, and Hiroki Arimura. LCM ve r. 2: Ef- ficient mining algorithms for frequent/closed/maximal ite msets. In Roberto J. Bayardo Jr., Bart Goethals, and Mohammed Javeed Z aki, editors, FIMI ’04, Proceedings of the IEEE ICDM Workshop on Fre- quent Itemset Mining Implementations, Brighton, UK, Novem ber 1, 2004, volume 126 of CEUR Wor...

  25. [25]

    gspan: Graph-based substruc ture pat- tern mining

    Xifeng Yan and Jiawei Han. gspan: Graph-based substruc ture pat- tern mining. In Proceedings of the 2002 IEEE International Confer- ence on Data Mining (ICDM 2002), 9-12 December 2002, Maebash i City, Japan, pages 721–724. IEEE Computer Society, 2002

  26. [26]

    Mohammed J. Zaki. Efficiently mining frequent trees in a f orest. In KDD 2002 , page 71–80, New York, NY, USA, 2002. Association for Computing Machinery

  27. [27]

    Mohammed J. Zaki. Efficiently mining frequent embedded u nordered trees. Fundamenta Informaticae, 66(1-2):33–52, 2005. 22