pith. machine review for the scientific record.

arxiv: 2605.12520 · v1 · submitted 2026-04-03 · 💻 cs.CL · cs.AI

Recognition: 1 Lean theorem link

BoostTaxo: Zero-Shot Taxonomy Induction via Boosting-Style Agentic Reasoning and Constraint-Aware Calibration

Authors on Pith: no claims yet

Pith reviewed 2026-05-14 21:21 UTC · model grok-4.3

classification: 💻 cs.CL · cs.AI
keywords: zero-shot taxonomy induction · LLM boosting · parent candidate selection · structure-aware calibration · WordNet benchmark · DBLP · SemEval-Sci

The pith

BoostTaxo induces taxonomies from domain terms using a boosting-style LLM framework in zero-shot settings.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces BoostTaxo, a boosting-style LLM framework for zero-shot taxonomy induction from a set of domain terms. It refines definitions with retrieval, selects parents in a coarse-to-fine hybrid manner using lightweight and large LLMs, and calibrates scores with structural features to build reliable hierarchies. This matters because existing methods struggle with generalization and structural reliability in new domains. The evaluation on WordNet, DBLP, and SemEval-Sci shows it achieves superior or comparable performance to state-of-the-art methods, with ablations confirming the key components.

Core claim

BoostTaxo performs parent identification in a coarse-to-fine manner, combining retrieval-augmented definition refinement, hybrid parent candidate selection, candidate rating, and structure-aware score calibration: a lightweight LLM filters the candidate parents, a large-scale LLM ranks and scores them, and structural features calibrate the resulting edge weights to improve taxonomy construction.
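The coarse-to-fine split is easy to picture in code. The sketch below is ours, not the paper's: `cheap_score` and `strong_rank` are hypothetical stand-ins for the lightweight-LLM filter and the large-scale-LLM ranker, with a trivial lexical-overlap proxy in place of real model calls.

```python
def cheap_score(term: str, candidate: str) -> float:
    """Stand-in for the lightweight LLM filter: a crude lexical-overlap proxy."""
    overlap = set(term.split()) & set(candidate.split())
    return len(overlap) / max(len(candidate.split()), 1)

def strong_rank(term: str, candidates: list[str]) -> list[tuple[str, float]]:
    """Stand-in for the large-scale LLM ranker; returns (candidate, score) pairs."""
    return sorted(((c, cheap_score(term, c)) for c in candidates),
                  key=lambda pair: pair[1], reverse=True)

def select_parent(term: str, vocabulary: list[str], k: int = 5) -> str:
    # Coarse stage: the cheap filter keeps only the top-k candidate parents.
    coarse = sorted(vocabulary, key=lambda c: cheap_score(term, c), reverse=True)[:k]
    # Fine stage: the strong ranker orders the survivors; take the best.
    return strong_rank(term, coarse)[0][0]

print(select_parent("fractional distillation",
                    ["distillation", "carbonization", "painting"]))
```

The design point the claim rests on is that the expensive model only ever sees k candidates per term, so cost grows with k rather than with vocabulary size.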

What carries the argument

Boosting-style agentic reasoning with hybrid LLM selection and structure-aware score calibration for parent identification.

Load-bearing premise

The combination of retrieval-augmented refinement, hybrid selection, and structure calibration will consistently produce reliable taxonomies without inheriting biases from the underlying LLMs.
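As one concrete, hypothetical reading of "structural features calibrate edge weights": blend the raw LLM score with penalties on structural features such as candidate depth and fan-out. The functional form and coefficients below are illustrative assumptions, not the paper's.

```python
def calibrate(raw: float, depth: int, fan_out: int,
              alpha: float = 0.1, beta: float = 0.05) -> float:
    """Blend an LLM edge score with structural features (hypothetical form).

    Deeper and more crowded candidate parents are mildly penalized;
    the calibrated score is clamped back into [0, 1].
    """
    score = raw - alpha * depth - beta * fan_out
    return max(0.0, min(1.0, score))

print(calibrate(0.9, depth=2, fan_out=4))  # 0.9 - 0.2 - 0.2 = 0.5
```

Any such calibration encodes a prior about well-formed hierarchies; if the prior is wrong for a domain (say, legitimately deep taxonomies), it would suppress correct edges rather than LLM bias.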

What would settle it

Evaluating BoostTaxo on a new benchmark dataset where its performance falls below that of existing zero-shot methods would falsify the superior performance claim.

Figures

Figures reproduced from arXiv: 2605.12520 by Leizhen Wang, Yancheng Ling, Zhenliang Ma, Zhenlin Qin.

Figure 1. Boosting-Style Zero-Shot Taxonomy Induction with Large Language …
Figure 2. The boosting-style zero-shot framework for taxonomy induction. Starting from a root concept and a term set, the framework first enriches term …
Figure 3. Prompt design for LLM-based term definition refinement.
Figure 4. Prompt design for LLM-based candidate parent ranking and scoring.
Figure 5. Prompt design for LLM-based calibration of candidate parent scores.
Figure 6. Comparison of taxonomy construction results across di…
Figure 7. Representative failure cases in taxonomy construction. In each subfigure, the left side shows the ground-truth taxonomy and the right side shows the …
read the original abstract

Taxonomy induction is crucial for organizing concepts into explicit and interpretable semantic hierarchies. While existing methods have achieved promising results, their generalization, structural reliability, and efficiency remain limited, hindering their performance in zero-shot and large-scale scenarios. To overcome these limitations, we introduce BoostTaxo, a boosting-style LLM framework for zero-shot taxonomy induction. It takes a set of domain terms as inputs and performs parent identification in a coarse-to-fine manner, employing retrieval-augmented definition refinement, hybrid parent candidate selection, candidate rating, and structure-aware score calibration to improve taxonomy construction. Specifically, a lightweight LLM is used to efficiently filter candidate parents, while a large-scale LLM is employed to rank and score candidate parents for fine-grained parent selection. Structural features are further incorporated to calibrate candidate edge weights and enhance the reliability of the induced taxonomy. The unified BoostTaxo is evaluated on three public benchmark datasets, namely WordNet, DBLP, and SemEval-Sci, and achieves superior or comparable performance to state-of-the-art methods in zero-shot taxonomy induction. The ablation study validates the contribution of the hybrid parent candidate selection and the structure-aware score calibration to the overall performance. Further analysis investigates the impact of candidate selection size on taxonomy quality and presents representative case and failure studies, providing deeper insights into the effectiveness and limitations of the proposed framework.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces BoostTaxo, a boosting-style LLM framework for zero-shot taxonomy induction that takes domain terms as input and performs coarse-to-fine parent identification via retrieval-augmented definition refinement, hybrid parent candidate selection (lightweight LLM for filtering, large-scale LLM for ranking), candidate rating, and structure-aware score calibration. It evaluates the unified approach on WordNet, DBLP, and SemEval-Sci benchmarks, claiming superior or comparable performance to SOTA methods, with ablations validating the hybrid selection and calibration components plus analysis of candidate size and case studies.

Significance. If the performance gains are shown to stem from the proposed pipeline rather than LLM pretraining leakage, the work could meaningfully advance reliable zero-shot taxonomy induction by demonstrating practical ways to combine lightweight and large LLMs with structural constraints for better generalization and reduced error propagation in hierarchy construction.

major comments (2)
  1. [Evaluation] Evaluation section (and abstract): the claim of superior/comparable zero-shot performance on WordNet, DBLP, and SemEval-Sci is load-bearing but vulnerable to pretraining contamination, as these long-standing public hierarchies are likely present in LLM training corpora; without decontamination experiments, post-cutoff held-out domains, or explicit checks that parent-ranking steps do not exploit memorized fragments, the results cannot isolate framework efficacy from leakage.
  2. [Ablation study] Ablation study: while it validates hybrid candidate selection and structure-aware calibration, the reported metrics lack error bars, statistical significance tests, or full baseline details (e.g., exact SOTA implementations and hyperparameter settings), making it difficult to assess whether the gains are robust or merely incremental.
minor comments (2)
  1. [Abstract] Abstract and method description: the term 'boosting-style' is used without a precise mapping to classical boosting mechanics (e.g., sequential error weighting), which could be clarified to avoid confusion with standard ensemble boosting.
  2. [Results] Figure and table captions: ensure all quantitative results (precision, recall, F1, or hierarchy metrics) are explicitly labeled with dataset splits and LLM versions used.
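For reference, the hierarchy metrics the referee wants labeled are typically computed over (child, parent) edges; a minimal sketch of edge-level precision, recall, and F1, on toy data rather than the paper's results:

```python
def edge_f1(pred: set[tuple[str, str]],
            gold: set[tuple[str, str]]) -> tuple[float, float, float]:
    """Edge-level precision/recall/F1 over (child, parent) pairs."""
    tp = len(pred & gold)                       # correctly predicted edges
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

gold = {("dog", "mammal"), ("cat", "mammal"), ("mammal", "animal")}
pred = {("dog", "mammal"), ("cat", "animal"), ("mammal", "animal")}
print(edge_f1(pred, gold))  # one wrong parent for "cat" costs both p and r
```

Because every term gets exactly one parent in a tree, precision and recall coincide when the predicted and gold term sets match, which is why dataset splits and term coverage must be stated alongside the numbers.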

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and will incorporate revisions to strengthen the evaluation and ablation sections.

read point-by-point responses
  1. Referee: [Evaluation] Evaluation section (and abstract): the claim of superior/comparable zero-shot performance on WordNet, DBLP, and SemEval-Sci is load-bearing but vulnerable to pretraining contamination, as these long-standing public hierarchies are likely present in LLM training corpora; without decontamination experiments, post-cutoff held-out domains, or explicit checks that parent-ranking steps do not exploit memorized fragments, the results cannot isolate framework efficacy from leakage.

    Authors: We agree this is a valid and important concern for any LLM-based method evaluated on long-standing public benchmarks. Our framework is designed to reduce reliance on memorization through retrieval-augmented definition refinement, hybrid candidate selection (lightweight filtering followed by large-model ranking), and structure-aware calibration that enforces hierarchical constraints. These components encourage step-by-step reasoning over direct recall. Nevertheless, without explicit decontamination or post-cutoff experiments, full isolation of framework efficacy from leakage remains difficult. In the revised manuscript we will add a dedicated limitations subsection discussing pretraining contamination risks for the chosen benchmarks and, where feasible, report preliminary checks (e.g., membership inference on a small set of held-out terms or comparison against a synthetic domain). revision: yes

  2. Referee: [Ablation study] Ablation study: while it validates hybrid candidate selection and structure-aware calibration, the reported metrics lack error bars, statistical significance tests, or full baseline details (e.g., exact SOTA implementations and hyperparameter settings), making it difficult to assess whether the gains are robust or merely incremental.

    Authors: We thank the referee for highlighting this presentation gap. The ablation results were intended to isolate the contributions of hybrid selection and calibration, yet we acknowledge the absence of error bars, significance testing, and exhaustive baseline documentation reduces interpretability. In the revision we will (1) rerun ablations with multiple seeds where stochasticity exists and report means with standard deviations, (2) add statistical significance tests (paired t-tests on F1 and precision@1), and (3) expand the appendix with exact model versions, prompt templates, temperature settings, and reproduction details for all baselines. revision: yes
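The promised paired t-test is straightforward to sketch. The per-seed F1 values below are invented for illustration; in practice the statistic would be compared against the t-distribution with n−1 degrees of freedom (or computed directly with `scipy.stats.ttest_rel`).

```python
import math

def paired_t(xs: list[float], ys: list[float]) -> float:
    """Paired t-statistic on matched per-run scores (e.g., F1 across seeds)."""
    diffs = [x - y for x, y in zip(xs, ys)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    return mean / math.sqrt(var / n)

full = [0.71, 0.69, 0.73, 0.70, 0.72]  # full pipeline F1 per seed (invented)
ablt = [0.66, 0.65, 0.68, 0.66, 0.67]  # ablated variant F1 per seed (invented)
print(round(paired_t(full, ablt), 2))  # large |t| → gains unlikely to be seed noise
```

Pairing by seed matters here: it cancels the shared run-to-run variance, so the test asks only whether the per-seed gap is consistently positive.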

Circularity Check

0 steps flagged

No circularity in derivation; framework uses external LLMs and benchmarks

full rationale

The paper describes a boosting-style LLM pipeline (retrieval-augmented refinement, hybrid candidate selection, structure-aware calibration) for zero-shot taxonomy induction. No equations, fitted parameters, or self-definitions reduce the claimed outputs to the inputs by construction. Performance is assessed via external public benchmarks (WordNet, DBLP, SemEval-Sci) and ablations that isolate component contributions without renaming or self-referential closure. No load-bearing self-citations or uniqueness theorems imported from prior author work appear in the provided text. The derivation chain remains independent of the target results.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The framework rests on the assumption that LLMs guided by retrieval and structural calibration can perform accurate parent identification in a zero-shot manner; no free parameters or invented entities are explicitly introduced in the abstract.

axioms (1)
  • domain assumption: LLMs can reliably identify semantic parent concepts when provided with retrieval-augmented definitions and structure-aware score calibration
    This assumption underpins the coarse-to-fine parent selection and overall taxonomy quality claims.
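One way to see why this axiom is load-bearing: once calibrated edge scores exist, hierarchy assembly can be as simple as greedy max-weight parent attachment with a cycle check, so taxonomy quality rises and falls almost entirely with the scores. The sketch below is a simplified stand-in, not the paper's construction algorithm (which may use optimum branchings instead).

```python
def build_taxonomy(scores: dict[tuple[str, str], float], root: str) -> dict[str, str]:
    """Greedy assembly: attach each term to its highest-scoring parent,
    skipping any choice that would create a cycle."""
    parent: dict[str, str] = {}

    def reaches(a: str, b: str) -> bool:
        # Does a's ancestor chain (under current assignments) hit b?
        while a in parent:
            a = parent[a]
            if a == b:
                return True
        return False

    for (child, cand), _ in sorted(scores.items(), key=lambda kv: -kv[1]):
        if child != root and child not in parent and not reaches(cand, child):
            parent[child] = cand
    return parent

scores = {("dog", "mammal"): 0.9, ("cat", "mammal"): 0.85,
          ("mammal", "animal"): 0.8, ("animal", "dog"): 0.7}
print(build_taxonomy(scores, root="animal"))
```

Note that the cycle-inducing edge ("animal", "dog") is discarded without any model call: this is the sense in which structural constraints, not the LLM alone, carry reliability.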

pith-pipeline@v0.9.0 · 5553 in / 1163 out tokens · 39086 ms · 2026-05-14T21:21:22.078518+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.


Reference graph

Works this paper leans on

49 extracted references · 49 canonical work pages · 2 internal anchors
