KG-HTC: Integrating Knowledge Graphs into LLMs for Effective Zero-shot Hierarchical Text Classification

Afshin Khadangi; Christophe Zgrzendek; Igor Tchappi; Johannes Sedlmeir; Qianbo Zang

arXiv: 2505.05583 · v1 · pith:E4JOD2HC · submitted 2025-05-08 · 💻 cs.CL

Pith reviewed 2026-05-22 15:32 UTC · model grok-4.3

classification 💻 cs.CL

keywords: hierarchical text classification · knowledge graphs · zero-shot learning · large language models · retrieval-augmented generation · taxonomy

The pith

Knowledge graphs retrieved via RAG supply structured context that lets LLMs classify documents into deep taxonomies in a strict zero-shot setting.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces KG-HTC to perform hierarchical text classification using LLMs without any training examples. It retrieves relevant subgraphs from knowledge graphs to give the model semantic information about labels at different levels. Experiments on three datasets show it beats baselines, with bigger gains at deeper hierarchy levels. This matters because real-world classification often lacks labels and has large, imbalanced label sets.

Core claim

KG-HTC integrates knowledge graphs into LLMs by retrieving subgraphs related to the input text using a Retrieval-Augmented Generation approach. This provides structured semantic context that enhances the LLM's ability to understand label semantics at various hierarchy levels. The method improves classification accuracy in strict zero-shot settings on the WoS, DBpedia, and Amazon datasets, particularly at deeper levels of the hierarchy.

What carries the argument

RAG-based retrieval of relevant subgraphs from knowledge graphs to augment LLM prompts with structured semantic context for label understanding.
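To make the mechanism concrete, here is a minimal sketch of retrieve-then-prompt, written under stated assumptions rather than taken from the KG-HTC repository. The knowledge graph is assumed to be a label tree stored as a child-to-parent map, a toy bag-of-words cosine similarity stands in for a real embedding model, and every helper name (`bow`, `retrieve_subgraph`, `build_prompt`) as well as the prompt wording is hypothetical. The sketch ranks leaf labels against the input text, walks each top-k leaf up to its root (the paper's text, as quoted further down this page, refers to an upwards propagation algorithm), and concatenates the resulting paths into a single prompt.

```python
import math
import re
from collections import Counter


def bow(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in (assumption) for a real sentence-embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def path_to_root(label: str, parent: dict[str, str]) -> list[str]:
    """Upward propagation: follow parent links from a label up to its root."""
    path = [label]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path[::-1]  # root first, leaf last


def retrieve_subgraph(text: str, parent: dict[str, str], k: int = 2) -> list[list[str]]:
    """Rank leaf labels by similarity of their root-to-leaf path to the text; keep the top k."""
    leaves = sorted(set(parent) - set(parent.values()))  # nodes that are no one's parent
    query = bow(text)
    paths = [path_to_root(leaf, parent) for leaf in leaves]
    # list.sort is stable, so tied scores keep alphabetical leaf order
    paths.sort(key=lambda p: cosine(query, bow(" ".join(p))), reverse=True)
    return paths[:k]


def build_prompt(text: str, paths: list[list[str]]) -> str:
    """Serialize each retrieved path as one structured line and concatenate them into a prompt."""
    context = "\n".join(" -> ".join(p) for p in paths)
    return (
        "Candidate label paths from the knowledge graph:\n"
        f"{context}\n\n"
        f"Text: {text}\n"
        "Answer with the single best label path."
    )


# Hypothetical toy taxonomy (child -> parent); not the Amazon dataset's real labels.
parent = {
    "audio": "electronics", "cameras": "electronics",
    "headphones": "audio", "speakers": "audio",
    "dog food": "pets", "cat toys": "pets",
}
review = "These headphones have great bass but the audio cuts out."
paths = retrieve_subgraph(review, parent, k=2)
print(build_prompt(review, paths))
```

Scoring the whole root-to-leaf path, rather than the leaf name alone, is a design choice here so that parent labels also contribute matching terms; swapping `bow` and `cosine` for a dense encoder would be the natural next step.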

If this is right

LLMs can handle larger label spaces in HTC without supervision or fine-tuning.
Performance improves especially for labels at deeper levels of the taxonomy.
The approach mitigates challenges from long-tail label distributions.
Structured knowledge integration addresses real-world HTC applications lacking annotated data.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar retrieval augmentation could extend to other zero-shot structured prediction tasks such as relation extraction.
Domain-specific knowledge graphs might yield further gains when the general-purpose graph coverage is sparse.
The method suggests a broader pattern where external structured data reduces the need for task-specific labeled examples in NLP.

Load-bearing premise

Subgraphs retrieved from general-purpose knowledge graphs via RAG supply sufficiently relevant and structured semantic context to improve an LLM's understanding of label semantics across hierarchy levels without any task-specific fine-tuning or labeled examples.

What would settle it

Running KG-HTC on a dataset where the knowledge graph has no or minimal coverage for the taxonomy labels and measuring whether it still outperforms plain LLM baselines.
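A hypothetical protocol for that test, sketched in Python: measure what fraction of taxonomy labels have a matching knowledge-graph node, then remove a seeded random subset of the matching nodes to reach a target coverage and rerun the classifier on each ablated graph. The helper names and the exact, case-insensitive string matching are assumptions; nothing here is described in the paper.

```python
import random


def label_coverage(labels, kg_nodes):
    """Fraction of distinct taxonomy labels that match a KG node (exact, case-insensitive)."""
    label_set = {label.lower() for label in labels}
    nodes = {node.lower() for node in kg_nodes}
    return len(label_set & nodes) / len(label_set)


def ablate_to_coverage(labels, kg_nodes, target, seed=0):
    """Drop a seeded random subset of label-matching KG nodes so coverage is at most `target`."""
    label_set = {label.lower() for label in labels}
    covered = sorted(label_set & {node.lower() for node in kg_nodes})
    n_keep = int(target * len(label_set))  # rounds down, so coverage never exceeds target
    n_drop = max(len(covered) - n_keep, 0)
    dropped = set(random.Random(seed).sample(covered, n_drop))
    return [node for node in kg_nodes if node.lower() not in dropped]


# Hypothetical labels and KG nodes for illustration only.
labels = ["audio", "cameras", "dogs", "cats"]
kg_nodes = ["Audio", "Cameras", "Dogs", "Cats", "Rosario"]
for target in (1.0, 0.5, 0.0):
    ablated = ablate_to_coverage(labels, kg_nodes, target)
    print(target, label_coverage(labels, ablated))
```

Sweeping `target` from 1.0 down to 0.0 and plotting accuracy against it would show whether the gains shrink toward the plain-LLM baseline as coverage disappears.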

Figures

Figures reproduced from arXiv: 2505.05583 by Afshin Khadangi, Christophe Zgrzendek, Igor Tchappi, Johannes Sedlmeir, Qianbo Zang.

**Figure 1.** An example of HTC from the Amazon Product Review dataset.

**Figure 2.** Overview of the KG-HTC pipeline.

**Figure 3.** Visualization of the knowledge graph (tree) constructed from the multi-level taxonomy in the Amazon Product Review dataset. Red nodes represent first-level labels, green nodes denote second-level subcategories connected through parent-child relationships, and yellow nodes correspond to the fine-grained leaf categories at the third level.
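Figure 3 describes the graph as a tree whose edges are the taxonomy's parent-child links, with node colour encoding level. A minimal sketch of that construction, assuming each label is available as a root-to-leaf tuple and that label names are unique across the taxonomy; `build_taxonomy_tree` and the toy labels are illustrative, not the dataset's actual categories or the authors' code.

```python
from collections import defaultdict


def build_taxonomy_tree(label_paths):
    """Build parent->children edges and node depths (root level = 1) from root-to-leaf paths.

    Assumes label names are unique across the whole taxonomy, so a name identifies one node.
    """
    children = defaultdict(set)
    depth = {}
    for path in label_paths:
        for level, node in enumerate(path, start=1):
            depth.setdefault(node, level)
            if level > 1:
                children[path[level - 2]].add(node)  # edge from the node one level up
    return dict(children), depth


# Hypothetical three-level paths in the style of a product taxonomy.
paths = [
    ("pet supplies", "dogs", "food"),
    ("pet supplies", "cats", "toys"),
    ("beauty", "skin care", "face"),
]
children, depth = build_taxonomy_tree(paths)
print(sorted(children["pet supplies"]))  # ['cats', 'dogs']
print(depth["face"])                     # 3
```

Depths 1, 2 and 3 here play the role of the red, green and yellow nodes in the figure.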

**Figure 4.** As the taxonomy deepens, KG-HTC exhibits a slower performance degradation on the WoS and Amazon datasets.
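Curves like Figure 4 come from scoring each hierarchy level separately. A minimal sketch, assuming gold labels and predictions are equal-length root-to-leaf tuples and using plain per-level accuracy for simplicity; the metric actually reported in the paper is not given on this page, and the data below is made up.

```python
def per_level_accuracy(gold, pred):
    """Accuracy at each hierarchy level; gold and pred are equal-length root-to-leaf label paths."""
    if not gold or len(gold) != len(pred):
        raise ValueError("need one prediction per gold example")
    n_levels = len(gold[0])
    return [
        sum(g[i] == p[i] for g, p in zip(gold, pred)) / len(gold)
        for i in range(n_levels)
    ]


# Hypothetical gold paths and model predictions.
gold = [
    ("electronics", "audio", "headphones"),
    ("pets", "dogs", "food"),
    ("pets", "cats", "toys"),
    ("beauty", "skin care", "face"),
]
pred = [
    ("electronics", "audio", "speakers"),
    ("pets", "dogs", "food"),
    ("pets", "dogs", "chews"),
    ("beauty", "hair care", "shampoo"),
]
print(per_level_accuracy(gold, pred))  # [1.0, 0.5, 0.25]
```

Accuracy falls as depth increases because errors accumulate over a larger, more fine-grained label set at each level; the figure's claim is that this decline is flatter for KG-HTC than for the baselines.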

Original abstract

Hierarchical Text Classification (HTC) involves assigning documents to labels organized within a taxonomy. Most previous research on HTC has focused on supervised methods. However, in real-world scenarios, employing supervised HTC can be challenging due to a lack of annotated data. Moreover, HTC often faces issues with large label spaces and long-tail distributions. In this work, we present Knowledge Graphs for zero-shot Hierarchical Text Classification (KG-HTC), which aims to address these challenges of HTC in applications by integrating knowledge graphs with Large Language Models (LLMs) to provide structured semantic context during classification. Our method retrieves relevant subgraphs from knowledge graphs related to the input text using a Retrieval-Augmented Generation (RAG) approach. Our KG-HTC can enhance LLMs to understand label semantics at various hierarchy levels. We evaluate KG-HTC on three open-source HTC datasets: WoS, DBpedia, and Amazon. Our experimental results show that KG-HTC significantly outperforms three baselines in the strict zero-shot setting, particularly achieving substantial improvements at deeper levels of the hierarchy. This evaluation demonstrates the effectiveness of incorporating structured knowledge into LLMs to address HTC's challenges in large label spaces and long-tailed label distributions. Our code is available at: https://github.com/QianboZang/KG-HTC.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper proposes KG-HTC, a zero-shot hierarchical text classification method that retrieves subgraphs from general-purpose knowledge graphs via RAG and injects them as structured semantic context into LLMs to improve label understanding across hierarchy levels. It reports significant outperformance over three baselines on the WoS, DBpedia, and Amazon datasets, with larger gains at deeper hierarchy levels, and releases code for reproducibility.

Significance. If the reported gains prove robust under strict zero-shot conditions with no information leakage, the approach could meaningfully advance zero-shot HTC by addressing large label spaces and long-tail distributions through external structured knowledge. The public code release at https://github.com/QianboZang/KG-HTC is a clear strength for reproducibility and follow-up work.

major comments (1)

[Abstract and Experiments] The headline claim of a 'strict zero-shot setting' with substantial gains at deeper hierarchy levels rests on the assumption that RAG-retrieved subgraphs supply only external semantic context. For the DBpedia dataset, if the chosen KG (e.g., a DBpedia-style graph) contains the same category hierarchy or instance-level relations used as ground-truth labels, retrieval can surface direct label definitions or parent-child links, turning the method into an implicit supervised lookup rather than zero-shot reasoning. The manuscript describes no exclusion filters or disjoint-KG protocol that would rule out this overlap.
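The overlap the referee describes could be checked mechanically after retrieval. A minimal sketch of such an audit, assuming the retrieved subgraph is a list of (head, relation, tail) triples and that each document maps to one KG entity; `leakage_report`, its two buckets, and the DBpedia-style example are hypothetical. Instance-level links are the sharper concern: in zero-shot HTC the label taxonomy itself is normally available, so gold-path edges are arguably admissible, whereas a triple tying the document's own entity to its gold class amounts to a label lookup.

```python
def leakage_report(triples, doc_entity, gold_path):
    """Sort retrieved (head, relation, tail) triples into two hypothetical leakage buckets.

    instance_leaks: the document's own entity is linked directly to a gold label.
    gold_edges:     a parent-child edge lying on the document's gold label path.
    Matching is exact and case-insensitive; a real audit would also need alias matching.
    """
    gold = [label.lower() for label in gold_path]
    gold_edges = set(zip(gold, gold[1:]))  # (parent, child) pairs along the gold path
    entity = doc_entity.lower()
    report = {"instance_leaks": [], "gold_edges": []}
    for head, rel, tail in triples:
        h, t = head.lower(), tail.lower()
        if entity in (h, t) and (h in gold or t in gold):
            report["instance_leaks"].append((head, rel, tail))
        elif (h, t) in gold_edges or (t, h) in gold_edges:
            report["gold_edges"].append((head, rel, tail))
    return report


# Illustrative DBpedia-style triples and gold path.
triples = [
    ("Lionel Messi", "type", "SoccerPlayer"),
    ("Athlete", "subClassOf", "Agent"),
    ("Lionel Messi", "birthPlace", "Rosario"),
]
report = leakage_report(triples, "Lionel Messi", ("Agent", "Athlete", "SoccerPlayer"))
print(report["instance_leaks"])  # [('Lionel Messi', 'type', 'SoccerPlayer')]
print(report["gold_edges"])      # [('Athlete', 'subClassOf', 'Agent')]
```

Reporting the share of test documents with a non-empty `instance_leaks` bucket, per dataset, would be one way to document the disjoint-KG protocol the referee asks for.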

minor comments (2)

[Abstract] The abstract states outperformance on three datasets but provides no details on exact baselines, metrics, statistical significance tests, or error analysis; these should be summarized in the abstract or early in the experiments section for clarity.
[Method] Notation for hierarchy levels and subgraph retrieval should be defined more explicitly (e.g., how depth is measured and how relevance scoring in RAG is performed) to aid replication.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive comment on the zero-shot integrity of our evaluation. We address the concern point by point below.

Point-by-point responses

Referee: [Abstract and Experiments] The headline claim of a 'strict zero-shot setting' with substantial gains at deeper hierarchy levels rests on the assumption that RAG-retrieved subgraphs supply only external semantic context. For the DBpedia dataset, if the chosen KG (e.g., a DBpedia-style graph) contains the same category hierarchy or instance-level relations used as ground-truth labels, retrieval can surface direct label definitions or parent-child links, turning the method into an implicit supervised lookup rather than zero-shot reasoning. The manuscript describes no exclusion filters or disjoint-KG protocol that would rule out this overlap.

Authors: We appreciate this observation, which correctly identifies a point that requires greater transparency. Our experiments used a general-purpose knowledge graph (Wikidata) whose entity and relation structure is not identical to the Wikipedia-derived category taxonomies in the DBpedia dataset or the label hierarchies in WoS and Amazon. Retrieval operates via embedding similarity between the input document and KG entities rather than direct lookup of label strings or parent-child edges. Nevertheless, the manuscript does not explicitly document exclusion filters or a formal disjoint-KG protocol. We will therefore revise the Experiments section to specify the exact KG source, the embedding-based retrieval procedure, and any post-retrieval checks confirming that retrieved subgraphs do not contain the ground-truth label definitions or hierarchy edges. This addition will be made in the next version. revision: yes

Circularity Check

0 steps flagged

No circularity; method uses external KGs and public benchmarks without self-referential reduction

full rationale

The paper describes a RAG-based retrieval of subgraphs from general-purpose knowledge graphs to augment LLMs for zero-shot HTC. No equations, parameters, or derivations are presented that reduce by construction to fitted inputs or self-citations. The central claim relies on external structured knowledge and standard evaluation on WoS, DBpedia, and Amazon datasets, remaining self-contained against external benchmarks with no load-bearing self-referential steps.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption that knowledge graphs contain label-relevant facts that RAG can surface usefully for LLMs; no free parameters or invented entities are mentioned in the abstract.

axioms (1)

domain assumption Knowledge graphs contain structured semantic information relevant to taxonomy labels that can be retrieved to augment LLM understanding in zero-shot settings.
Invoked as the core mechanism enabling performance gains without labeled data.

pith-pipeline@v0.9.0 · 5776 in / 1050 out tokens · 35238 ms · 2026-05-22T15:32:07.017329+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Our method retrieves relevant subgraphs from knowledge graphs related to the input text using a Retrieval-Augmented Generation (RAG) approach... upwards propagation algorithm... structured sequences are subsequently concatenated into a prompt

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · reality_from_one_distinction · unclear

We evaluate KG-HTC on three open-source HTC datasets: WoS, DBpedia, and Amazon... significant improvements at deeper levels of the hierarchy

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Taxon: Hierarchical Tax Code Prediction with Semantically Aligned LLM Expert Guidance
cs.LG 2026-01 unverdicted novelty 6.0

Taxon uses a mixture-of-experts architecture and LLM-derived semantic verification to achieve state-of-the-art performance on hierarchical tax code prediction and has been deployed in production at Alibaba.
SC-Taxo: Hierarchical Taxonomy Generation under Semantic Consistency Constraints using Large Language Models
cs.CL 2026-05 unverdicted novelty 4.0

SC-Taxo adds bidirectional heading generation and peer semantic dependency modeling to LLMs to produce taxonomies with improved hierarchy alignment and heading quality on scientific literature benchmarks.

Reference graph

Works this paper leans on

35 extracted references · 35 canonical work pages · cited by 2 Pith papers · 6 internal anchors

[1] S. Auer, C. Bizer, G. Kobilarov, J. Lehmann, R. Cyganiak, and Z. Ives. DBpedia: A nucleus for a web of open data. In International Semantic Web Conference, pages 722–735. Springer, 2007.

[2] L. Bongiovanni, L. Bruno, F. Dominici, and G. Rizzo. Zero-shot taxonomy mapping for document classification. In Proceedings of the 38th ACM/SIGAPP Symposium on Applied Computing, pages 911–918, 2023.

[3] T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, et al. Language models are few-shot learners. Advances in Neural Information Processing Systems, 33:1877–1901, 2020.

[4] S. Chatterjee, A. Maheshwari, G. Ramakrishnan, and S. N. Jagarlapudi. Joint learning of hyperbolic label embeddings for hierarchical multi-label classification. In P. Merlo, J. Tiedemann, and R. Tsarfaty, editors, Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pages 2829–2841, On... doi: 10.18653/v1/2021.eacl-main.247.

[5] D. Chen, Z. Yu, and S. R. Bowman. Clean or annotate: How to spend a limited data collection budget. In C. Cherry, A. Fan, G. Foster, G. R. Haffari, S. Khadivi, N. V. Peng, X. Ren, E. Shareghi, and S. Swayamdipta, editors, Proceedings of the Third Workshop on Deep Learning for Low-Resource Natural Language Processing, pages 152–168, Hybrid, July 2022. A... doi: 10.18653/v1/2022.deeplo-1.17.

[6] H. Chen, Q. Ma, Z. Lin, and J. Yan. Hierarchy-aware label semantics matching network for hierarchical text classification. In C. Zong, F. Xia, W. Li, and R. Navigli, editors, Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long P... doi: 10.18653/v1/2021.acl-long.337.

[7] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, 2019.

[8] D. Edge, H. Trinh, N. Cheng, J. Bradley, A. Chao, A. Mody, S. Truitt, D. Metropolitansky, R. O. Ness, and J. Larson. From local to global: A graph RAG approach to query-focused summarization. arXiv preprint arXiv:2404.16130, 2024.

[9] W. Fan, Y. Ding, L. Ning, S. Wang, H. Li, D. Yin, T.-S. Chua, and Q. Li. A survey on RAG meeting LLMs: Towards retrieval-augmented large language models. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 6491–6501, 2024.

[10] Y. Gao, Y. Xiong, X. Gao, K. Jia, J. Pan, Y. Bi, Y. Dai, J. Sun, H. Wang, and H. Wang. Retrieval-augmented generation for large language models: A survey. arXiv preprint arXiv:2312.10997, 2, 2023.

[11] K. Halder, A. Akbik, J. Krapac, and R. Vollgraf. Task-aware representation of sentences for generic text classification. In D. Scott, N. Bel, and C. Zong, editors, Proceedings of the 28th International Conference on Computational Linguistics, pages 3202–3213, Barcelona, Spain (Online), Dec. 2020. International Committee on Computational Linguistics. ... doi: 10.18653/v1/2020.coling-main.285.

[12] D. Hendrycks, C. Burns, S. Basart, A. Zou, M. Mazeika, D. Song, and J. Steinhardt. Measuring massive multitask language understanding. In International Conference on Learning Representations, 2021.

[13] W. Huang, E. Chen, Q. Liu, Y. Chen, Z. Huang, Y. Liu, Z. Zhao, D. Zhang, and S. Wang. Hierarchical multi-label text classification: An attention-based recurrent network approach. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, pages 1051–1060, 2019.

[14] V. Karpukhin, B. Oguz, S. Min, P. Lewis, L. Wu, S. Edunov, D. Chen, and W.-t. Yih. Dense passage retrieval for open-domain question answering. In B. Webber, T. Cohn, Y. He, and Y. Liu, editors, Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 6769–6781, Online, Nov. 2020. Association for Compu...

[15] Y. Kashnitsky. Hierarchical text classification, 2020. URL https://www.kaggle.com/dsv/1054619.

[16] K. Kowsari, D. E. Brown, M. Heidarysafa, K. Jafari Meimandi, M. S. Gerber, and L. E. Barnes. HDLTex: Hierarchical deep learning for text classification. In 2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA), pages 364–371, 2017. doi: 10.1109/ICMLA.2017.0-134.

[17] P. Lewis, E. Perez, A. Piktus, F. Petroni, V. Karpukhin, N. Goyal, H. Küttler, M. Lewis, W.-t. Yih, T. Rocktäschel, et al. Retrieval-augmented generation for knowledge-intensive NLP tasks. Advances in Neural Information Processing Systems, 33:9459–9474, 2020.

[18] T. Li, G. Zhang, Q. D. Do, X. Yue, and W. Chen. Long-context LLMs struggle with long in-context learning. arXiv preprint arXiv:2404.02060, 2024.

[19] Z. Li, Q. Zang, D. Ma, J. Guo, T. Zheng, M. Liu, X. Niu, Y. Wang, J. Yang, J. Liu, et al. AutoKaggle: A multi-agent framework for autonomous data science competitions. arXiv preprint arXiv:2410.20424, 2024.

[20] A. Liu, B. Feng, B. Xue, B. Wang, B. Wu, C. Lu, C. Zhao, C. Deng, C. Zhang, C. Ruan, et al. DeepSeek-V3 technical report. arXiv preprint arXiv:2412.19437, 2024.

[21] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692, 2019.

[22] Y. Liu, K. Zhang, Z. Huang, K. Wang, Y. Zhang, Q. Liu, and E. Chen. Enhancing hierarchical text classification through knowledge graph integration. In A. Rogers, J. Boyd-Graber, and N. Okazaki, editors, Findings of the Association for Computational Linguistics: ACL 2023, pages 5797–5810, Toronto, Canada, July 2023. Association for Computational Lin... doi: 10.18653/v1/2023.findings-acl.358.

[23] Z. Luo, X. Song, H. Huang, J. Lian, C. Zhang, J. Jiang, and X. Xie. GraphInstruct: Empowering large language models with graph understanding and reasoning capability. arXiv preprint arXiv:2403.04483, 2024.

[24] Y. Meng, J. Shen, C. Zhang, and J. Han. Weakly-supervised hierarchical text classification. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 6826–6833, 2019.

[25] L. Paletto, V. Basile, and R. Esposito. Label augmentation for zero-shot hierarchical text classification. In L.-W. Ku, A. Martins, and V. Srikumar, editors, Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 7697–7706, Bangkok, Thailand, Aug. 2024. Association for Computational ... doi: 10.18653/v1/2024.acl-long.416.

[26] D. Patel, P. Dangati, J.-Y. Lee, M. Boratko, and A. McCallum. Modeling label space interactions in multi-label classification using box embeddings. ICLR 2022 Poster, 2022.

[27] B. Peng, Y. Zhu, Y. Liu, X. Bo, H. Shi, C. Hong, Y. Zhang, and S. Tang. Graph retrieval-augmented generation: A survey. arXiv preprint arXiv:2408.08921, 2024.

[28] A. Radford, K. Narasimhan, T. Salimans, I. Sutskever, et al. Improving language understanding by generative pre-training. OpenAI, 2018.

[29] A. Simonofski, J. Fink, and C. Burnay. Supporting policy-making with social media and e-participation platforms data: A policy analytics framework. Government Information Quarterly, 38(3):101590, 2021.

[30] A. Sun and E.-P. Lim. Hierarchical text classification and evaluation. In Proceedings 2001 IEEE International Conference on Data Mining, pages 521–528, 2001. doi: 10.1109/ICDM.2001.989560.

[31] H. Touvron, T. Lavril, G. Izacard, X. Martinet, M.-A. Lachaux, T. Lacroix, B. Rozière, N. Goyal, E. Hambro, F. Azhar, et al. LLaMA: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971, 2023.

[32] A. Yang, B. Yang, B. Zhang, B. Hui, B. Zheng, B. Yu, C. Li, D. Liu, F. Huang, H. Wei, et al. Qwen2.5 technical report. arXiv preprint arXiv:2412.15115, 2024.

[33] Q. Zhang, J. Dong, H. Chen, D. Zha, Z. Yu, and X. Huang. KnowGPT: Knowledge graph based prompting for large language models. Advances in Neural Information Processing Systems, 37:6052–6080, 2024.

[34] Y. Zhang, R. Yang, X. Xu, R. Li, J. Xiao, J. Shen, and J. Han. TELEClass: Taxonomy enrichment and LLM-enhanced hierarchical text classification with minimal supervision. arXiv preprint arXiv:2403.00165, 2024.

[35] K. Zhu, Q. Zang, S. Jia, S. Wu, F. Fang, Y. Li, S. Gavin, T. Zheng, J. Guo, B. Li, et al. LIME: Less is more for MLLM evaluation. arXiv preprint arXiv:2409.06851, 2024.