pith. sign in

arxiv: 2601.08418 · v2 · submitted 2026-01-13 · 💻 cs.LG · cs.AI

Taxon: Hierarchical Tax Code Prediction with Semantically Aligned LLM Expert Guidance

Pith reviewed 2026-05-16 14:39 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords tax code predictionhierarchical classificationmixture of expertssemantic consistencye-commerce automationcompliance managementlarge language modelsmulti-source training
0
0 comments X

The pith

A mixture-of-experts model guided by distilled LLM semantics maps products to hierarchical tax codes with high accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents Taxon as a framework that routes multi-modal product features through a feature-gating mixture-of-experts architecture while using a semantic consistency model distilled from large language models to check alignment with official tax definitions. It trains on a combination of curated tax databases, invoice logs, and merchant data to handle noisy supervision in real business records. The central goal is accurate placement of each product at the correct node in a multi-level national tax hierarchy, where mistakes create financial and regulatory problems. If the approach holds, e-commerce platforms gain reliable automation for invoicing and compliance without constant manual correction. An added step that reconstructs full hierarchical paths further boosts structural consistency and overall scores.

Core claim

Taxon integrates a feature-gating mixture-of-experts architecture that adaptively routes multi-modal features across taxonomy levels with a semantic consistency model distilled from large language models that verifies alignment between product titles and official tax definitions, trained via a multi-source pipeline of tax databases, invoice validation logs, and merchant registration data to deliver state-of-the-art performance on hierarchical tax code prediction and full-path reconstruction.

What carries the argument

Feature-gating mixture-of-experts architecture paired with an LLM-distilled semantic consistency model that routes features across levels and checks title-to-definition alignment.

If this is right

  • The model outperforms strong baselines on both the proprietary TaxCode dataset and public benchmarks.
  • Full hierarchical path reconstruction produces the highest overall F1 scores by improving structural consistency.
  • The system supports production deployment at volumes above 500,000 queries per day.
  • Interpretability improves because the semantic consistency checks link predictions directly to official definitions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same routing-plus-alignment pattern could transfer to other multi-level regulatory classification tasks such as customs codes or accounting categories.
  • Periodic retraining on fresh invoice logs might keep the model aligned after tax code updates without full retraining from scratch.
  • The semantic verification step could be applied independently to flag low-confidence predictions for human review in high-stakes compliance flows.
  • Extending the multi-source pipeline to include user-generated product descriptions might further reduce reliance on merchant registration data.

Load-bearing premise

The combination of curated tax databases, invoice logs, and merchant registration data supplies clean enough and representative enough supervision for the model to generalize to unseen products and tax updates.

What would settle it

A large drop in F1 scores when the model is tested on products from merchant categories or tax-rule revisions absent from the training sources.

Figures

Figures reproduced from arXiv: 2601.08418 by Chuanfei Xu, Jihang Li, Jing Wang, Qing Liu, Wei Wang, Zeyi Wen, Zulong Chen.

Figure 1
Figure 1. Figure 1: An illustration of tax code prediction in e-commerce. [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Overview of the proposed framework. The system [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Overview of the training workflow, integrating hi [PITH_FULL_IMAGE:figures/full_fig_p004_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Four-stage data processing and training pipeline. [PITH_FULL_IMAGE:figures/full_fig_p005_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Log-scale Cumulative distribution of prediction confi [PITH_FULL_IMAGE:figures/full_fig_p005_5.png] view at source ↗
Figure 7
Figure 7. Figure 7: Structured requirements specifying input/output [PITH_FULL_IMAGE:figures/full_fig_p006_7.png] view at source ↗
Figure 6
Figure 6. Figure 6: Role hinting the model to act as a tax expert and [PITH_FULL_IMAGE:figures/full_fig_p006_6.png] view at source ↗
Figure 9
Figure 9. Figure 9: Integration of the proposed framework into Alibaba’s [PITH_FULL_IMAGE:figures/full_fig_p007_9.png] view at source ↗
Figure 11
Figure 11. Figure 11: Model performance at different levels of path com [PITH_FULL_IMAGE:figures/full_fig_p009_11.png] view at source ↗
Figure 10
Figure 10. Figure 10: Model performance at different levels of path com [PITH_FULL_IMAGE:figures/full_fig_p009_10.png] view at source ↗
read the original abstract

Tax code prediction is a crucial yet underexplored task in automating invoicing and compliance management for large-scale e-commerce platforms. Each product must be accurately mapped to a node within a multi-level taxonomic hierarchy defined by national standards, where errors lead to financial inconsistencies and regulatory risks. This paper presents Taxon, a semantically aligned and expert-guided framework for hierarchical tax code prediction. Taxon integrates (i) a feature-gating mixture-of-experts architecture that adaptively routes multi-modal features across taxonomy levels, and (ii) a semantic consistency model distilled from large language models acting as domain experts to verify alignment between product titles and official tax definitions. To address noisy supervision in real business records, we design a multi-source training pipeline that combines curated tax databases, invoice validation logs, and merchant registration data to provide both structural and semantic supervision. Extensive experiments on the proprietary TaxCode dataset and public benchmarks demonstrate that Taxon achieves state-of-the-art performance, outperforming strong baselines. Further, an additional full hierarchical paths reconstruction procedure significantly improves structural consistency, yielding the highest overall F1 scores. Taxon has been deployed in production within Alibaba's tax service system, handling an average of over 500,000 tax code queries per day and reaching peak volumes above five million requests during business event with improved accuracy, interpretability, and robustness.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces Taxon, a framework for hierarchical tax code prediction that integrates a feature-gating mixture-of-experts architecture for routing multi-modal features across taxonomy levels with a semantic consistency model distilled from LLMs to align product titles with official tax definitions. It employs a multi-source training pipeline combining curated tax databases, invoice validation logs, and merchant registration data to handle noisy supervision, followed by a full hierarchical paths reconstruction procedure. Experiments on the proprietary TaxCode dataset and public benchmarks claim state-of-the-art performance with highest overall F1 scores, and the system is reported as deployed in production at Alibaba handling over 500,000 queries per day.

Significance. If the empirical claims hold, the work provides practical value for automating tax compliance in large-scale e-commerce, where accurate hierarchical mapping reduces financial and regulatory risks. The combination of MoE routing with LLM-distilled semantic guidance and the post-processing reconstruction step represents a targeted engineering contribution for noisy real-world data. Production deployment with high query volume offers concrete evidence of robustness and interpretability beyond benchmark results.

major comments (2)
  1. [§4.2] §4.2 (Experiments): The SOTA claim on the proprietary TaxCode dataset and public benchmarks is stated without reporting specific F1 scores, baseline details, ablation studies on the MoE gating or LLM distillation components, or error analysis by taxonomy depth; this prevents verification of the improvement margins and the contribution of the full-path reconstruction procedure.
  2. [§3.1] §3.1 (Multi-source training pipeline): The assumption that combining curated tax databases, invoice logs, and merchant data yields sufficiently clean supervision for generalization is central to the claims but lacks quantitative characterization of label noise rates or distribution shift metrics between sources.
minor comments (2)
  1. [Abstract] The abstract and §5 (Deployment) mention improved accuracy and robustness but do not define the exact evaluation protocol (e.g., micro/macro F1, hierarchical distance metrics) used for the reported results.
  2. [§3.2] Notation for the semantic consistency model (e.g., how LLM outputs are distilled into the consistency loss) should be formalized with an equation in §3.2 to improve reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the major comments point by point below and commit to revisions that will strengthen the empirical support and transparency of the results.

read point-by-point responses
  1. Referee: [§4.2] §4.2 (Experiments): The SOTA claim on the proprietary TaxCode dataset and public benchmarks is stated without reporting specific F1 scores, baseline details, ablation studies on the MoE gating or LLM distillation components, or error analysis by taxonomy depth; this prevents verification of the improvement margins and the contribution of the full-path reconstruction procedure.

    Authors: We agree that explicit numerical results and component-wise analysis are necessary to substantiate the SOTA claims. In the revised manuscript we will report the precise overall and per-level F1 scores for Taxon against all baselines on both the TaxCode dataset and the public benchmarks. We will add ablation tables that isolate the feature-gating MoE routing and the LLM-distilled semantic consistency model, and we will include an error analysis stratified by taxonomy depth that quantifies the contribution of the full hierarchical paths reconstruction step. revision: yes

  2. Referee: [§3.1] §3.1 (Multi-source training pipeline): The assumption that combining curated tax databases, invoice logs, and merchant data yields sufficiently clean supervision for generalization is central to the claims but lacks quantitative characterization of label noise rates or distribution shift metrics between sources.

    Authors: We acknowledge that quantitative characterization of label noise and distribution shifts would make the multi-source pipeline more convincing. In the revision we will add estimates of label noise rates obtained via cross-source consistency checks and will report distribution-shift metrics (e.g., feature-space divergence and semantic similarity scores) between the curated tax databases, invoice logs, and merchant registration data. These additions will directly support the claim that the combined supervision is sufficiently clean for generalization. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper describes an empirical ML architecture (feature-gating MoE plus LLM-distilled semantic consistency) trained on multi-source business data and evaluated via standard F1 metrics on proprietary and public benchmarks. No equations, derivations, or parameter-fitting steps are presented that could reduce a claimed prediction to its own inputs by construction. All performance claims rest on external experimental outcomes rather than self-referential definitions or self-citation chains, rendering the work self-contained against the listed circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no explicit free parameters, axioms, or invented entities; the framework implicitly assumes reliable multi-source supervision and accurate LLM distillation.

pith-pipeline@v0.9.0 · 5554 in / 1176 out tokens · 32159 ms · 2026-05-16T14:39:47.954242+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

51 extracted references · 51 canonical work pages · 3 internal anchors

  1. [1]

    Goods and Services Tax Classification Catalogue,

    S. T. A. of the People’s Republic of China, “Goods and Services Tax Classification Catalogue,” 2017. [Online]. Available: https: //fgk.chinatax.gov.cn/zcfgk/c100012/c5194763/content.html

  2. [2]

    Classifying Short Text for the Harmonized System with Convolutional Neural Networks,

    J. Luppes, A. P. de Vries, and F. Hasibi, “Classifying Short Text for the Harmonized System with Convolutional Neural Networks,”Radboud University, 2019

  3. [3]

    An Ensemble-Based Approach for Assigning Text to Correct Harmonized System Code,

    Shubham, A. Arya, S. Roy, and S. Jonnala, “An Ensemble-Based Approach for Assigning Text to Correct Harmonized System Code,” in2023 International Conference on Artificial Intelligence and Smart Communication (AISC), Jan 2023, pp. 35–41

  4. [4]

    Enhanced HS Code Classification for Import and Export Goods via Multiscale Attention and ERNIE-BiLSTM,

    M. Liao, L. Huang, J. Zhang, L. Song, and B. Li, “Enhanced HS Code Classification for Import and Export Goods via Multiscale Attention and ERNIE-BiLSTM,”Applied Sciences, vol. 14, no. 22, 2024. [Online]. Available: https://www.mdpi.com/2076-3417/14/22/10267

  5. [5]

    BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

    J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding,”arXiv preprint arXiv:1810.04805, 2018

  6. [6]

    Convolutional Neural Networks for Sentence Classification,

    Y . Kim, “Convolutional Neural Networks for Sentence Classification,” inProceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Oct 2014

  7. [7]

    XLNet: Generalized Autoregressive Pretraining for Language Understanding,

    Z. Yang, Z. Dai, Y . Yang, J. Carbonell, R. R. Salakhutdinov, and Q. V . Le, “XLNet: Generalized Autoregressive Pretraining for Language Understanding,” inAdvances in Neural Information Processing Systems, H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alch ´e-Buc, E. Fox, and R. Garnett, Eds., vol. 32. Curran Associates, Inc., 2019. [Online]. Available:...

  8. [8]

    R. Y . Rubinstein and D. P. Kroese,The Cross Entropy Method: A Unified Approach To Combinatorial Optimization, Monte-carlo Simulation (In- formation Science and Statistics). Berlin, Heidelberg: Springer-Verlag, 2004

  9. [9]

    Qwen2.5 Technical Report

    Q. Team, “Qwen2.5 Technical Report,”arXiv preprint arXiv:2412.15115, 2024

  10. [10]

    HDLTex: Hierarchical Deep Learning for Text Classification,

    K. Kowsari, D. E. Brown, M. Heidarysafa, K. Jafari Meimandi, M. S. Gerber, and L. E. Barnes, “HDLTex: Hierarchical Deep Learning for Text Classification,” in2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA), 2017, pp. 364–371

  11. [11]

    RCV1: A New Benchmark Collection for Text Categorization Research,

    D. D. Lewis, Y . Yang, T. G. Rose, and F. Li, “RCV1: A New Benchmark Collection for Text Categorization Research,”J. Mach. Learn. Res., vol. 5, pp. 361–397, Dec 2004

  12. [12]

    The New York Times Annotated Corpus,

    E. Sandhaus, “The New York Times Annotated Corpus,”Linguistic Data Consortium, Philadelphia, vol. 6, no. 12, p. e26752, 2008

  13. [13]

    Incorporating Hierarchy into Text Encoder: a Contrastive Learning Approach for Hierarchical Text Classification,

    Z. Wang, P. Wang, L. Huang, X. Sun, and H. Wang, “Incorporating Hierarchy into Text Encoder: a Contrastive Learning Approach for Hierarchical Text Classification,” inProceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), S. Muresan, P. Nakov, and A. Villavicencio, Eds. Dublin, Ireland: Association ...

  14. [14]

    HILL: Hierarchy-aware Information Lossless Contrastive Learning for Hierarchical Text Classification,

    H. Zhu, J. Wu, R. Liu, Y . Hou, Z. Yuan, S. Li, Y . Pan, and K. Xu, “HILL: Hierarchy-aware Information Lossless Contrastive Learning for Hierarchical Text Classification,” inProceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), K. Duh, H. Gome...

  15. [15]

    LH-Mix: Local Hierarchy Correlation Guided Mixup over Hierarchical Prompt Tuning,

    F. Kong, R. Zhang, and Z. Wang, “LH-Mix: Local Hierarchy Correlation Guided Mixup over Hierarchical Prompt Tuning,” inProceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V .1, ser. KDD ’25. New York, NY , USA: Association for Computing Machinery, 2025, pp. 636–646. [Online]. Available: https://doi.org/10.1145/3690624.3709326

  16. [16]

    HPT: Hierarchy-aware Prompt Tuning for Hierarchical Text Classification,

    Z. Wang, P. Wang, T. Liu, B. Lin, Y . Cao, Z. Sui, and H. Wang, “HPT: Hierarchy-aware Prompt Tuning for Hierarchical Text Classification,” inProceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Y . Goldberg, Z. Kozareva, and Y . Zhang, Eds. Abu Dhabi, United Arab Emirates: Association for Computational Linguistics, Dec 2...

  17. [17]

    HyILR: Hyperbolic Instance-Specific Local Relationships for Hierarchical Text Classification,

    A. Kumar and D. Toshniwal, “HyILR: Hyperbolic Instance-Specific Local Relationships for Hierarchical Text Classification,” inProceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop), J. Zhao, M. Wang, and Z. Liu, Eds. Vienna, Austria: Association for Computational Linguistics, Jul 2025, ...

  18. [18]

    Large-Scale Hierarchical Text Classification with Recursively Regularized Deep Graph-CNN,

    H. Peng, J. Li, Y . He, Y . Liu, M. Bao, L. Wang, Y . Song, and Q. Yang, “Large-Scale Hierarchical Text Classification with Recursively Regularized Deep Graph-CNN,” inProceedings of the 2018 World Wide Web Conference, ser. WWW ’18. Republic and Canton of Geneva, CHE: International World Wide Web Conferences Steering Committee, 2018, pp. 1063–1072. [Online...

  19. [19]

    HFT-CNN: Learning Hierarchical Category Structure for Multi-label Short Text Categorization,

    K. Shimura, J. Li, and F. Fukumoto, “HFT-CNN: Learning Hierarchical Category Structure for Multi-label Short Text Categorization,” in Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, E. Riloff, D. Chiang, J. Hockenmaier, and J. Tsujii, Eds. Brussels, Belgium: Association for Computational Linguistics, Oct 2018, pp. 8...

  20. [20]

    Hierarchical Transfer Learning for Multi-label Text Classification,

    S. Banerjee, C. Akkaya, F. Perez-Sorrosal, and K. Tsioutsiouliklis, “Hierarchical Transfer Learning for Multi-label Text Classification,” inProceedings of the 57th Annual Meeting of the Association for Computational Linguistics, A. Korhonen, D. Traum, and L. M `arquez, Eds. Florence, Italy: Association for Computational Linguistics, Jul 2019, pp. 6295–630...

  21. [21]

    Hierarchical Multi-Label Classification Networks,

    J. Wehrmann, R. Cerri, and R. Barros, “Hierarchical Multi-Label Classification Networks,” inProceedings of the 35th International Conference on Machine Learning, ser. Proceedings of Machine Learning Research, J. Dy and A. Krause, Eds., vol. 80. PMLR, 10–15 Jul 2018, pp. 5075–5084. [Online]. Available: https://proceedings.mlr. press/v80/wehrmann18a.html

  22. [22]

    Hierarchy-Aware Global Model for Hierarchical Text Classification,

    J. Zhou, C. Ma, D. Long, G. Xu, N. Ding, H. Zhang, P. Xie, and G. Liu, “Hierarchy-Aware Global Model for Hierarchical Text Classification,” inProceedings of the 58th Annual Meeting of the Association for Computational Linguistics, D. Jurafsky, J. Chai, N. Schluter, and J. Tetreault, Eds. Association for Computational Linguistics, Jul 2020, pp. 1106–1117. ...

  23. [23]

    Hierarchy-Aware Label Semantics Matching Network for Hierarchical Text Classification,

    H. Chen, Q. Ma, Z. Lin, and J. Yan, “Hierarchy-Aware Label Semantics Matching Network for Hierarchical Text Classification,” inProceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), C. Zong, F. Xia, W. Li, and R. Navigli, Ed...

  24. [24]

    Constrained Sequence-to-Tree Generation for Hierarchical Text Classification,

    C. Yu, Y . Shen, and Y . Mao, “Constrained Sequence-to-Tree Generation for Hierarchical Text Classification,” inProceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, ser. SIGIR ’22. New York, NY , USA: Association for Computing Machinery, 2022, pp. 1865–1869. [Online]. Available: https://doi.org/1...

  25. [25]

    SGM: Sequence Generation Model for Multi-label Classification,

    P. Yang, X. Sun, W. Li, S. Ma, W. Wu, and H. Wang, “SGM: Sequence Generation Model for Multi-label Classification,” inProceedings of the 27th International Conference on Computational Linguistics, E. M. Bender, L. Derczynski, and P. Isabelle, Eds. Santa Fe, New Mexico, USA: Association for Computational Linguistics, Aug 2018, pp. 3915–3926. [Online]. Avai...

  26. [26]

    Exploring Label Hierarchy in a Generative Way for Hierarchical Text Classification,

    W. Huang, C. Liu, B. Xiao, Y . Zhao, Z. Pan, Z. Zhang, X. Yang, and G. Liu, “Exploring Label Hierarchy in a Generative Way for Hierarchical Text Classification,” inProceedings of the 29th International Conference on Computational Linguistics, N. Calzolari, C.-R. Huang, H. Kim, J. Pustejovsky, L. Wanner, K.-S. Choi, P.-M. Ryu, H.-H. Chen, L. Do- natelli, H...

  27. [27]

    Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer,

    C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y . Zhou, W. Li, and P. J. Liu, “Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer,”J. Mach. Learn. Res., vol. 21, no. 1, Jan 2020

  28. [28]

    UMP-MG: A Uni- directed Message-Passing Multi-label Generation Model for Hierarchical Text Classification,

    B. Ning, D. Zhao, X. Zhang, C. Wang, and S. Song, “UMP-MG: A Uni- directed Message-Passing Multi-label Generation Model for Hierarchical Text Classification,”Data Science and Engineering, vol. 8, no. 2, pp. 112–123, Jun 2023

  29. [29]

    Hierarchical text classification as sub-hierarchy sequence generation,

    S. Im, G. Kim, H.-S. Oh, S. Jo, and D. H. Kim, “Hierarchical text classification as sub-hierarchy sequence generation,” inProceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence and Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence and Thirteenth Symposium on Educational Advances in Artificial Intelligence,...

  30. [30]

    Available: https://doi.org/10.1609/aaai.v37i11.26520

    [Online]. Available: https://doi.org/10.1609/aaai.v37i11.26520

  31. [31]

    HiGen: Hierarchy-Aware Sequence Generation for Hierarchical Text Classification,

    V . Jain, M. Rungta, Y . Zhuang, Y . Yu, Z. Wang, M. Gao, J. Skolnick, and C. Zhang, “HiGen: Hierarchy-Aware Sequence Generation for Hierarchical Text Classification,” inProceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), Y . Graham and M. Purver, Eds. St. Julian’s, Malta: As...

  32. [32]

    AttentionXML: Label Tree-based Attention-Aware Deep Model for High-Performance Extreme Multi-Label Text Classification,

    R. You, Z. Zhang, Z. Wang, S. Dai, H. Mamitsuka, and S. Zhu, “AttentionXML: Label Tree-based Attention-Aware Deep Model for High-Performance Extreme Multi-Label Text Classification,” in Advances in Neural Information Processing Systems, H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alch ´e-Buc, E. Fox, and R. Garnett, Eds., vol. 32. Curran Associates, I...

  33. [33]

    HTCInfoMax: A Global Model for Hierarchical Text Classification via Information Maximization,

    Z. Deng, H. Peng, D. He, J. Li, and P. Yu, “HTCInfoMax: A Global Model for Hierarchical Text Classification via Information Maximization,” inProceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, K. Toutanova, A. Rumshisky, L. Zettlemoyer, D. Hakkani-Tur, I. Beltagy, ...

  34. [34]

    Exploiting Global and Local Hierarchies for Hierarchical Text Classification,

    T. Jiang, D. Wang, L. Sun, Z. Chen, F. Zhuang, and Q. Yang, “Exploiting Global and Local Hierarchies for Hierarchical Text Classification,” inProceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Y . Goldberg, Z. Kozareva, and Y . Zhang, Eds. Abu Dhabi, United Arab Emirates: Association for Computational Linguistics, Dec ...

  35. [35]

    Enhancing Hierarchical Text Classification through Knowledge Graph Integration,

    Y . Liu, K. Zhang, Z. Huang, K. Wang, Y . Zhang, Q. Liu, and E. Chen, “Enhancing Hierarchical Text Classification through Knowledge Graph Integration,” inFindings of the Association for Computational Linguistics: ACL 2023, A. Rogers, J. Boyd- Graber, and N. Okazaki, Eds. Toronto, Canada: Association for Computational Linguistics, Jul 2023, pp. 5797–5810. ...

  36. [36]

    HiTIN: Hierarchy- aware Tree Isomorphism Network for Hierarchical Text Classification,

    H. Zhu, C. Zhang, J. Huang, J. Wu, and K. Xu, “HiTIN: Hierarchy- aware Tree Isomorphism Network for Hierarchical Text Classification,” inProceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), A. Rogers, J. Boyd-Graber, and N. Okazaki, Eds. Toronto, Canada: Association for Computational Linguistics,...

  37. [37]

    Instances and Labels: Hierarchy-aware Joint Supervised Contrastive Learning for Hierarchical Multi-Label Text Classification,

    S. C. L. Yu, J. He, V . G. Basulto, and J. Z. Pan, “Instances and Labels: Hierarchy-aware Joint Supervised Contrastive Learning for Hierarchical Multi-Label Text Classification,” inThe 2023 Conference on Empirical Methods in Natural Language Processing, 2023. [Online]. Available: https://openreview.net/forum?id=S0eqbM16k2

  38. [38]

    Hierarchy-Aware and Label Balanced Model for Hierarchical Text Classification,

    J. Zhang, Y . Li, F. Shen, C. Xia, H. Tan, and Y . He, “Hierarchy-Aware and Label Balanced Model for Hierarchical Text Classification,”Knowledge-Based Systems, vol. 300, p. 112153, 2024. [Online]. Available: https://www.sciencedirect.com/science/article/pii/ S0950705124007871

  39. [39]

    Hierarchy-aware Biased Bound Margin Loss Function for Hierarchical Text Classification,

    G. Kim, S. Im, and H.-S. Oh, “Hierarchy-aware Biased Bound Margin Loss Function for Hierarchical Text Classification,” inFindings of the Association for Computational Linguistics: ACL 2024, L.-W. Ku, A. Martins, and V . Srikumar, Eds. Bangkok, Thailand: Association for Computational Linguistics, Aug 2024, pp. 7672–7682. [Online]. Available: https://aclant...

  40. [40]

    Utilizing Local Hierarchy with Adver- sarial Training for Hierarchical Text Classification,

    Z. Wang, P. Wang, and H. Wang, “Utilizing Local Hierarchy with Adver- sarial Training for Hierarchical Text Classification,” inProceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), N. Calzo- lari, M.-Y . Kan, V . Hoste, A. Lenci, S. Sakti, and N. Xue, Eds. Torino, Italia:...

  41. [41]

    A Novel Negative Sample Generation Method for Contrastive Learning in Hierarchical Text Classification,

    J. Zhou, L. Zhang, Y . He, R. Fan, L. Zhang, and J. Wan, “A Novel Negative Sample Generation Method for Contrastive Learning in Hierarchical Text Classification,” inProceedings of the 31st International Conference on Computational Linguistics, O. Rambow, L. Wanner, M. Apidianaki, H. Al-Khalifa, B. D. Eugenio, and S. Schockaert, Eds. Abu Dhabi, UAE: Associ...

  42. [42]

    Hierarchical Verbalizer for Few-Shot Hierarchical Text Classification,

    K. Ji, Y . Lian, J. Gao, and B. Wang, “Hierarchical Verbalizer for Few-Shot Hierarchical Text Classification,” inProceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), A. Rogers, J. Boyd-Graber, and N. Okazaki, Eds. Toronto, Canada: Association for Computational Linguistics, Jul 2023, pp. 2918–2933...

  43. [43]

    NER-guided Comprehensive Hierarchy-aware Prompt Tuning for Hierarchical Text Classification,

    F. Cai, D. Liu, Z. Zhang, G. Liu, X. Yang, and X. Fang, “NER-guided Comprehensive Hierarchy-aware Prompt Tuning for Hierarchical Text Classification,” inProceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), N. Calzolari, M.-Y . Kan, V . Hoste, A. Lenci, S. Sakti, and N. X...

  44. [44]

    Retrieval-style In-context Learning for Few-shot Hierarchical Text Classification,

    H. Chen, Y . Zhao, Z. Chen, M. Wang, L. Li, M. Zhang, and M. Zhang, “Retrieval-style In-context Learning for Few-shot Hierarchical Text Classification,”Transactions of the Association for Computational Linguistics, vol. 12, pp. 1214–1231, 2024. [Online]. Available: https://aclanthology.org/2024.tacl-1.67/

  45. [45]

    Dual prompt tuning based contrastive learning for hierarchical text classification,

    S. Xiong, Y . Zhao, J. Zhang, L. Mengxiang, Z. He, X. Li, and S. Song, “Dual prompt tuning based contrastive learning for hierarchical text classification,” inFindings of the Association for Computational Linguistics: ACL 2024, L.-W. Ku, A. Martins, and V . Srikumar, Eds. Bangkok, Thailand: Association for Computational Linguistics, Aug 2024, pp. 12 146–1...

  46. [46]

    TELEClass: Taxonomy Enrichment and LLM-Enhanced Hierarchical Text Classification with Minimal Supervision,

    Y . Zhang, R. Yang, X. Xu, R. Li, J. Xiao, J. Shen, and J. Han, “TELEClass: Taxonomy Enrichment and LLM-Enhanced Hierarchical Text Classification with Minimal Supervision,” inProceedings of the ACM on Web Conference 2025, ser. WWW ’25. New York, NY , USA: Association for Computing Machinery, 2025, pp. 2032–2042. [Online]. Available: https://doi.org/10.114...

  47. [47]

    Leveraging Taxonomy and LLMs for Improved Multimodal Hierarchical Classification,

    S. Chen, M. R. Bouadjenek, U. Naseem, B. Suleiman, S. Jameel, F. Salim, H. Hacid, and I. Razzak, “Leveraging Taxonomy and LLMs for Improved Multimodal Hierarchical Classification,” inProceedings of the 31st International Conference on Computational Linguistics, O. Rambow, L. Wanner, M. Apidianaki, H. Al-Khalifa, B. D. Eugenio, and S. Schockaert, Eds. Abu ...

  48. [48]

    KG- HTC: Integrating Knowledge Graphs into LLMs for Effective Zero- shot Hierarchical Text Classification,

    Q. Zang, C. Zgrzendek, I. Tchappi, A. Khadangi, and J. Sedlmeir, “KG- HTC: Integrating Knowledge Graphs into LLMs for Effective Zero- shot Hierarchical Text Classification,”arXiv preprint arXiv:2505.05583, 2025

  49. [49]

    Tax Classification of Invoice Details Based on Directed Heterogeneous Graph,

    P. Zhao, Q. Zheng, B. Dong, J. Ruan, and M. Luo, “Tax Classification of Invoice Details Based on Directed Heterogeneous Graph,” in Proceedings of the 19th Chinese National Conference on Computational Linguistics, M. Sun, S. Li, Y . Zhang, and Y . Liu, Eds. Haikou, China: Chinese Information Processing Society of China, Oct 2020, pp. 771–782. [Online]. Ava...

  50. [50]

    Multimodal Approach for Harmonized System Code Prediction,

    O. Amel, S. Stassin, S. A. Mahmoudi, and X. Siebert, “Multimodal Approach for Harmonized System Code Prediction,” inESANN 2023 Proceesdings. Louvain-la-Neuve (Belgium): Ciaco - i6doc.com, 2023

  51. [51]

    Harmonized system code classification using supervised contrastive learning with sentence bert and multiple negative ranking loss,

    A. W. Anggoro, P. Corcoran, D. De Widt, and Y . Li, “Harmonized system code classification using supervised contrastive learning with sentence bert and multiple negative ranking loss,”Data Technologies and Applications, vol. 59, no. 2, pp. 276–301, 12 2024. [Online]. Available: https://doi.org/10.1108/DTA-01-2024-0052