KG-HTC: Integrating Knowledge Graphs into LLMs for Effective Zero-shot Hierarchical Text Classification
Pith reviewed 2026-05-22 15:32 UTC · model grok-4.3
The pith
Subgraphs retrieved from knowledge graphs via RAG supply structured context that lets LLMs classify documents into deep taxonomies in a strict zero-shot setting.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
KG-HTC integrates knowledge graphs into LLMs by retrieving subgraphs related to the input text using a Retrieval-Augmented Generation approach. This provides structured semantic context that enhances the LLM's ability to understand label semantics at various hierarchy levels. The method improves classification accuracy in strict zero-shot settings on the WoS, DBpedia, and Amazon datasets, particularly at deeper levels of the hierarchy.
What carries the argument
RAG-based retrieval of relevant subgraphs from knowledge graphs to augment LLM prompts with structured semantic context for label understanding.
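The retrieve-then-prompt idea can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy taxonomy, the bag-of-words `embed` stand-in (a real system would use a learned text encoder), and the prompt wording are hypothetical and are not taken from the paper or its repository.

```python
import math
from collections import Counter

# Hypothetical toy taxonomy stored as child -> parent edges (not from the paper).
PARENT = {
    "Machine Learning": "Computer Science",
    "Computer Vision": "Computer Science",
    "Genetics": "Biology",
    "Computer Science": None,
    "Biology": None,
}

def embed(text):
    """Stand-in embedding: lowercase bag-of-words counts (assumption, not a real encoder)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(text, k=2):
    """Return the k label nodes most similar to the input text."""
    query = embed(text)
    ranked = sorted(PARENT, key=lambda label: cosine(query, embed(label)), reverse=True)
    return ranked[:k]

def path_to_root(label):
    """Walk child -> parent edges upward and return the root-to-label path."""
    path = []
    while label is not None:
        path.append(label)
        label = PARENT[label]
    return list(reversed(path))

def build_prompt(text, k=2):
    """Concatenate the retrieved label paths into a single classification prompt."""
    paths = [" -> ".join(path_to_root(label)) for label in retrieve(text, k)]
    context = "\n".join(paths)
    return f"Candidate label paths:\n{context}\n\nDocument: {text}\nAnswer with one path."
```

Calling `build_prompt("a study of machine learning for vision")` produces a prompt whose context lists the root-to-label paths of the retrieved nodes. Narrowing the candidate set before prompting is one way such a pipeline could keep large label spaces within a prompt budget.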
If this is right
- LLMs can handle larger label spaces in HTC without supervision or fine-tuning.
- Performance improves especially for labels at deeper levels of the taxonomy.
- The approach mitigates challenges from long-tail label distributions.
- Structured knowledge integration addresses real-world HTC applications lacking annotated data.
Where Pith is reading between the lines
- Similar retrieval augmentation could extend to other zero-shot structured prediction tasks such as relation extraction.
- Domain-specific knowledge graphs might yield further gains when the general-purpose graph coverage is sparse.
- The method suggests a broader pattern where external structured data reduces the need for task-specific labeled examples in NLP.
Load-bearing premise
Subgraphs retrieved from general-purpose knowledge graphs via RAG supply sufficiently relevant and structured semantic context to improve an LLM's understanding of label semantics across hierarchy levels without any task-specific fine-tuning or labeled examples.
What would settle it
Running KG-HTC on a dataset where the knowledge graph has no or minimal coverage for the taxonomy labels and measuring whether it still outperforms plain LLM baselines.
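A first step toward such a test would be to quantify coverage before comparing systems. The sketch below is a hypothetical helper, not part of KG-HTC: it assumes coverage can be approximated by normalized exact-name matching between taxonomy labels and KG entity names, which is a crude proxy that a real study would likely replace with entity linking.

```python
def normalize(name):
    """Lowercase, turn underscores into spaces, and collapse whitespace."""
    return " ".join(name.lower().replace("_", " ").split())

def kg_coverage(taxonomy_labels, kg_entity_names):
    """Fraction of taxonomy labels that have an exact (normalized) match in the KG."""
    entities = {normalize(n) for n in kg_entity_names}
    labels = [normalize(l) for l in taxonomy_labels]
    if not labels:
        return 0.0
    return sum(l in entities for l in labels) / len(labels)
```

With a coverage score per dataset, the comparison against a plain LLM baseline could be reported separately for low-coverage and high-coverage taxonomies.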
Original abstract
Hierarchical Text Classification (HTC) involves assigning documents to labels organized within a taxonomy. Most previous research on HTC has focused on supervised methods. However, in real-world scenarios, employing supervised HTC can be challenging due to a lack of annotated data. Moreover, HTC often faces issues with large label spaces and long-tail distributions. In this work, we present Knowledge Graphs for zero-shot Hierarchical Text Classification (KG-HTC), which aims to address these challenges of HTC in applications by integrating knowledge graphs with Large Language Models (LLMs) to provide structured semantic context during classification. Our method retrieves relevant subgraphs from knowledge graphs related to the input text using a Retrieval-Augmented Generation (RAG) approach. Our KG-HTC can enhance LLMs to understand label semantics at various hierarchy levels. We evaluate KG-HTC on three open-source HTC datasets: WoS, DBpedia, and Amazon. Our experimental results show that KG-HTC significantly outperforms three baselines in the strict zero-shot setting, particularly achieving substantial improvements at deeper levels of the hierarchy. This evaluation demonstrates the effectiveness of incorporating structured knowledge into LLMs to address HTC's challenges in large label spaces and long-tailed label distributions. Our code is available at: https://github.com/QianboZang/KG-HTC.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes KG-HTC, a zero-shot hierarchical text classification method that retrieves subgraphs from general-purpose knowledge graphs via RAG and injects them as structured semantic context into LLMs to improve label understanding across hierarchy levels. It reports significant outperformance over three baselines on the WoS, DBpedia, and Amazon datasets, with larger gains at deeper hierarchy levels, and releases code for reproducibility.
Significance. If the reported gains prove robust under strict zero-shot conditions with no information leakage, the approach could meaningfully advance zero-shot HTC by addressing large label spaces and long-tail distributions through external structured knowledge. The public code release at https://github.com/QianboZang/KG-HTC is a clear strength for reproducibility and follow-up work.
major comments (1)
- [Abstract and Experiments] Abstract and Experiments section: The headline claim of 'strict zero-shot setting' and substantial gains at deeper hierarchy levels rests on the assumption that RAG-retrieved subgraphs supply only external semantic context. For the DBpedia dataset, if the chosen KG (e.g., a DBpedia-style graph) contains the same category hierarchy or instance-level relations used as ground-truth labels, retrieval can surface direct label definitions or parent-child links, converting the method into implicit supervised lookup rather than zero-shot reasoning. No exclusion filters or disjoint-KG protocol is described to rule out this overlap.
minor comments (2)
- [Abstract] The abstract states outperformance on three datasets but provides no details on exact baselines, metrics, statistical significance tests, or error analysis; these should be summarized in the abstract or early in the experiments section for clarity.
- [Method] Notation for hierarchy levels and subgraph retrieval should be defined more explicitly (e.g., how depth is measured and how relevance scoring in RAG is performed) to aid replication.
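To make the request about depth concrete, here is one possible convention, sketched under stated assumptions: the level of a label is taken to be 1 for a root and one more than its parent's level otherwise, and accuracy is reported per level of the gold label. The function names and the child-to-parent dictionary format are hypothetical; the paper may define depth differently.

```python
def level_of(label, parent):
    """Level of a label: 1 for a root, parent's level + 1 otherwise (assumed convention)."""
    level = 1
    while parent[label] is not None:
        label = parent[label]
        level += 1
    return level

def accuracy_by_level(gold, pred, parent):
    """Exact-match accuracy grouped by the hierarchy level of the gold label."""
    totals, hits = {}, {}
    for g, p in zip(gold, pred):
        lvl = level_of(g, parent)
        totals[lvl] = totals.get(lvl, 0) + 1
        hits[lvl] = hits.get(lvl, 0) + (g == p)
    return {lvl: hits[lvl] / totals[lvl] for lvl in sorted(totals)}
```

Reporting results in this per-level form would let readers check the claim of larger gains at deeper levels directly.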
Simulated Author's Rebuttal
We thank the referee for the constructive comment on the zero-shot integrity of our evaluation. We address the concern point by point below.
-
Referee: [Abstract and Experiments] Abstract and Experiments section: The headline claim of 'strict zero-shot setting' and substantial gains at deeper hierarchy levels rests on the assumption that RAG-retrieved subgraphs supply only external semantic context. For the DBpedia dataset, if the chosen KG (e.g., a DBpedia-style graph) contains the same category hierarchy or instance-level relations used as ground-truth labels, retrieval can surface direct label definitions or parent-child links, converting the method into implicit supervised lookup rather than zero-shot reasoning. No exclusion filters or disjoint-KG protocol is described to rule out this overlap.
Authors: We appreciate this observation, which correctly identifies a point that requires greater transparency. Our experiments used a general-purpose knowledge graph (Wikidata) whose entity and relation structure is not identical to the Wikipedia-derived category taxonomies in the DBpedia dataset or the label hierarchies in WoS and Amazon. Retrieval operates via embedding similarity between the input document and KG entities rather than direct lookup of label strings or parent-child edges. Nevertheless, the manuscript does not explicitly document exclusion filters or a formal disjoint-KG protocol. We will therefore revise the Experiments section to specify the exact KG source, the embedding-based retrieval procedure, and any post-retrieval checks confirming that retrieved subgraphs do not contain the ground-truth label definitions or hierarchy edges. This addition will be made in the next version.
Revision: yes
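A post-retrieval check of the kind discussed in this exchange might look like the following sketch. It is purely illustrative: the triple format `(subject, relation, object)`, the function name, and the case-insensitive string matching are assumptions, and nothing here describes the filter the authors will actually implement.

```python
def leaked_triples(triples, gold_labels, taxonomy_edges):
    """Return retrieved triples that name a gold label or mirror a taxonomy edge.

    triples: iterable of (subject, relation, object) strings.
    gold_labels: label strings used as ground truth.
    taxonomy_edges: iterable of (parent, child) label pairs from the dataset.
    """
    gold = {g.lower() for g in gold_labels}
    edges = {(p.lower(), c.lower()) for p, c in taxonomy_edges}
    flagged = []
    for s, r, o in triples:
        s_l, o_l = s.lower(), o.lower()
        names_gold = s_l in gold or o_l in gold
        mirrors_edge = (s_l, o_l) in edges or (o_l, s_l) in edges
        if names_gold or mirrors_edge:
            flagged.append((s, r, o))
    return flagged
```

An evaluation could then report accuracy with flagged triples removed from the prompt, which would show how much of the gain survives once direct label or edge overlap is excluded.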
Circularity Check
No circularity; method uses external KGs and public benchmarks without self-referential reduction
The paper describes a RAG-based retrieval of subgraphs from general-purpose knowledge graphs to augment LLMs for zero-shot HTC. No equations, parameters, or derivations are presented that reduce by construction to fitted inputs or self-citations. The central claim relies on external structured knowledge and standard evaluation on WoS, DBpedia, and Amazon datasets, remaining self-contained against external benchmarks with no load-bearing self-referential steps.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Knowledge graphs contain structured semantic information relevant to taxonomy labels that can be retrieved to augment LLM understanding in zero-shot settings.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: Our method retrieves relevant subgraphs from knowledge graphs related to the input text using a Retrieval-Augmented Generation (RAG) approach... upwards propagation algorithm... structured sequences are subsequently concatenated into a prompt
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · reality_from_one_distinction · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: We evaluate KG-HTC on three open-source HTC datasets: WoS, DBpedia, and Amazon... significant improvements at deeper levels of the hierarchy
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 2 Pith papers
- Taxon: Hierarchical Tax Code Prediction with Semantically Aligned LLM Expert Guidance
  Taxon uses a mixture-of-experts architecture and LLM-derived semantic verification to achieve state-of-the-art performance on hierarchical tax code prediction and has been deployed in production at Alibaba.
- SC-Taxo: Hierarchical Taxonomy Generation under Semantic Consistency Constraints using Large Language Models
  SC-Taxo adds bidirectional heading generation and peer semantic dependency modeling to LLMs to produce taxonomies with improved hierarchy alignment and heading quality on scientific literature benchmarks.