pith.

arxiv: 2605.21029 · v1 · pith:NKABGPL6new · submitted 2026-05-20 · 💻 cs.CL

Building a Custom Taxonomy of AI Skills and Tasks from the Ground Up with Job Postings

Pith reviewed 2026-05-21 05:20 UTC · model grok-4.3

classification: 💻 cs.CL
keywords: taxonomy construction · AI skills · job postings · LLM applications · data filtering · hierarchical taxonomy · workplace skills analysis

The pith

Filtering job postings data creates clearer AI skills taxonomies than using the full unfiltered corpus.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper examines design choices for building taxonomies of AI skills from job postings with LLMs. It introduces TaxonomyBuilder to test different data-inclusion configurations for building custom, data-informed, and hierarchical taxonomies. The central result is that filtering the input data yields taxonomies with better domain-specific coverage than applying clustering and LLM-enhanced hierarchical labeling directly to large unfiltered datasets. A sympathetic reader would care because it shows how to map complex, fast-growing domains such as AI workplace skills efficiently, without being overwhelmed by data volume.

Core claim

Applying LLMs to automated taxonomy construction offers a way to map complex domains efficiently. Using two large-scale job postings corpora, the authors investigate how best to leverage data for taxonomy construction in the case of AI skills. They propose TaxonomyBuilder as a blueprint for systematic study and evaluate configurations of custom, data-informed, and hierarchical taxonomies, showing that filtering the inputs yields better domain-specific coverage than feeding unfiltered inputs to clustering and LLM-enhanced hierarchical labeling tools.
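To illustrate what "filtering the inputs" can mean in practice: Figure 1's caption names two setup steps that come before taxonomy construction, keyword-based context mining and class-based scoring. The sketch below is a minimal stand-in for both and is not the authors' implementation. The seed keywords in `AI_KEYWORDS`, the score (the share of a posting's sentences that mention a seed keyword), and the `threshold` value are all hypothetical.

```python
import re

# Hypothetical seed keywords for the "AI" class (illustrative only).
AI_KEYWORDS = ["machine learning", "deep learning", "llm", "nlp", "neural network"]


def compile_patterns(keywords: list[str]) -> list[re.Pattern]:
    """Build case-insensitive whole-word patterns, one per seed keyword."""
    return [re.compile(r"\b" + re.escape(k) + r"\b", re.IGNORECASE) for k in keywords]


def split_sentences(text: str) -> list[str]:
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def mine_contexts(posting: str, patterns: list[re.Pattern]) -> list[str]:
    """Context mining: keep only the sentences that mention a seed keyword."""
    return [s for s in split_sentences(posting) if any(p.search(s) for p in patterns)]


def class_score(posting: str, patterns: list[re.Pattern]) -> float:
    """Stand-in class score: share of a posting's sentences that mention a seed keyword."""
    sentences = split_sentences(posting)
    if not sentences:
        return 0.0
    return len(mine_contexts(posting, patterns)) / len(sentences)


def filter_postings(postings: list[str], patterns: list[re.Pattern],
                    threshold: float = 0.5) -> list[list[str]]:
    """Return the mined contexts of postings whose class score meets the threshold."""
    return [mine_contexts(p, patterns) for p in postings
            if class_score(p, patterns) >= threshold]
```

Only the mined sentences of retained postings would then go on to taxonomy construction; an unfiltered baseline would instead pass every posting in full.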

What carries the argument

TaxonomyBuilder, a proposed blueprint for systematically evaluating configurations of custom, data-informed, and hierarchical taxonomies built from job postings data.

If this is right

  • Taxonomies for AI skills can achieve better coverage by selectively filtering job postings rather than using all available data.
  • Filtered, data-informed construction outperforms standard clustering and LLM hierarchical labeling applied to unfiltered inputs.
  • Systematic evaluation of data inclusion decisions improves the quality of automated taxonomies in high-volume domains.
  • The method can extend to systematizing skills in other rapidly growing fields using similar corpora.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Organizations building internal skill databases might reduce noise by pre-filtering sources before taxonomy generation.
  • Similar filtering principles could apply to other LLM uses on large text corpora to improve output specificity.
  • Future work could test TaxonomyBuilder on other domains to confirm whether less data generally provides more clarity.
  • Job platforms could integrate such filtered taxonomies for better matching of AI roles and candidates.

Load-bearing premise

The job postings corpora used are representative of actual AI skills and tasks in the workplace and have minimal bias or noise that would affect the taxonomy.

What would settle it

An independent evaluation where unfiltered data produces taxonomies that match or exceed the coverage of filtered ones when compared to a gold-standard set of AI skills derived from expert review or additional sources.
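One simple way to operationalize "coverage against a gold-standard set" is the share of gold skills that appear among a taxonomy's node labels. The function below is a hypothetical and deliberately crude version that uses exact matching after light normalization. A real evaluation would more likely rely on semantic matching or expert judgment, and this page does not state which metric the paper uses.

```python
def normalize(label: str) -> str:
    """Lowercase, treat hyphens as spaces, and collapse runs of whitespace."""
    return " ".join(label.lower().replace("-", " ").split())


def coverage(taxonomy_labels: list[str], gold_skills: list[str]) -> float:
    """Share of gold-standard skills found (after normalization) among the taxonomy's labels."""
    gold = {normalize(s) for s in gold_skills}
    if not gold:
        raise ValueError("gold-standard skill set is empty")
    found = {normalize(label) for label in taxonomy_labels}
    return len(gold & found) / len(gold)
```

Scoring a filtered and an unfiltered taxonomy against the same gold set produces the two numbers such a test would compare.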

Figures

Figures reproduced from arXiv: 2605.21029 by Peter Norlander, Stephen Meisenbacher.

Figure 1. The TaxonomyBuilder method. In the top lane, we detail the setup method we follow as a precursor to taxonomy construction, which consists of keyword-based context mining and class-based scoring. The TaxonomyBuilder method, in turn, consists of two primary stages (depicted in the center and bottom lanes): (1) the construction of the foundation (leaf) level, followed by iterative vertical construction of fu…
Figure 2. Abridged example of the taxonomy structure produced by …
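The two stages in Figure 1's caption (a foundation or leaf level, then iterative vertical construction of higher levels) amount to a bottom-up loop. Here is a minimal sketch, assuming a caller-supplied `group_fn` that assigns each label at the current level to a parent label. In a real pipeline that role might be filled by clustering plus LLM labeling; here a hypothetical lookup table (`PARENTS`) stands in.

```python
from collections import defaultdict
from typing import Callable

# Hypothetical child -> parent table standing in for clustering + LLM labeling.
PARENTS = {
    "prompt engineering": "Generative AI",
    "LLM fine-tuning": "Generative AI",
    "image classification": "Computer vision",
    "object detection": "Computer vision",
    "Generative AI": "AI skills",
    "Computer vision": "AI skills",
}


def lookup_group(labels: list[str]) -> dict[str, str]:
    """Stand-in grouping step: map each label to its parent (or to itself if unknown)."""
    return {label: PARENTS.get(label, label) for label in labels}


def build_levels(leaves: list[str],
                 group_fn: Callable[[list[str]], dict[str, str]],
                 max_levels: int = 5) -> list[dict[str, list[str]]]:
    """Bottom-up construction: starting from the leaf labels, repeatedly group the
    current level into parents. Returns one {parent: [children]} map per new level."""
    levels = []
    current = sorted(set(leaves))
    for _ in range(max_levels):
        if len(current) <= 1:
            break  # a single root has been reached
        assignment = group_fn(current)
        parents = defaultdict(list)
        for child in current:
            parents[assignment.get(child, child)].append(child)
        if len(parents) >= len(current):
            break  # this pass merged nothing, so stop
        levels.append(dict(parents))
        current = sorted(parents)
    return levels
```

The loop stops in three cases: when a single root remains, when a grouping pass merges nothing, or after `max_levels` passes. The middle condition guards against a `group_fn` that returns every label unchanged.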
Original abstract

Utilizing LLMs for automated taxonomy construction presents a clear opportunity for the comprehensive, yet efficient mapping of potentially complex domains. When contending with high volumes of rapidly growing corpora, however, it becomes unclear how to best leverage such data for optimal taxonomy construction. Taking the case of systematizing AI skills in the workplace, we use two large-scale job postings corpora to investigate key design decisions for the inclusion (or exclusion) of data points for taxonomy construction. We propose TaxonomyBuilder as a blueprint for our systematic study, with which we evaluate various configurations of custom, data-informed, and hierarchical taxonomies. We demonstrate that less data can provide more clarity: filtering inputs to TaxonomyBuilder provides better domain-specific coverage than offering unfiltered inputs to clustering and LLM-enhanced hierarchical taxonomy labeling tools.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes TaxonomyBuilder as a blueprint for constructing custom, data-informed, hierarchical taxonomies of AI skills and tasks from job postings. Using two large-scale job postings corpora, the authors investigate design decisions around data inclusion/exclusion and claim that filtering inputs to TaxonomyBuilder provides better domain-specific coverage than offering unfiltered inputs to clustering and LLM-enhanced hierarchical taxonomy labeling tools.

Significance. If the central empirical comparison holds under independent validation, the result would be significant for practical taxonomy construction in high-volume, rapidly evolving domains such as AI workplace skills, by demonstrating that targeted data filtering can outperform unfiltered LLM-assisted clustering pipelines.

major comments (2)
  1. [Abstract] Abstract: the main finding is stated without any quantitative metrics, validation procedures, or details on how 'domain-specific coverage' was measured or compared, preventing assessment of the claim.
  2. [Evaluation] Evaluation section (or equivalent): the demonstration that filtered inputs outperform unfiltered clustering + LLM labeling requires a reproducible, pre-registered metric for coverage (e.g., held-out test set of postings, inter-annotator agreement on expert ratings, or disjoint validation corpus). Without this, the result risks being driven by alignment between the filtering heuristic and the chosen assessment rather than genuine improvement.
minor comments (1)
  1. [Methods] Clarify the precise operational definition of 'domain-specific coverage' and the two corpora used (size, source, preprocessing) in the methods section for reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback on our manuscript. We have carefully reviewed the major comments and provide point-by-point responses below, indicating where revisions will be made to improve clarity and rigor.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the main finding is stated without any quantitative metrics, validation procedures, or details on how 'domain-specific coverage' was measured or compared, preventing assessment of the claim.

    Authors: We agree that the abstract would benefit from greater specificity. In the revised version, we will update the abstract to include key quantitative metrics (such as coverage percentages on validation postings for filtered versus unfiltered pipelines) and a concise description of how domain-specific coverage was operationalized and compared. This will enable readers to assess the central claim more directly.
    Revision: yes.

  2. Referee: [Evaluation] Evaluation section (or equivalent): the demonstration that filtered inputs outperform unfiltered clustering + LLM labeling requires a reproducible, pre-registered metric for coverage (e.g., held-out test set of postings, inter-annotator agreement on expert ratings, or disjoint validation corpus). Without this, the result risks being driven by alignment between the filtering heuristic and the chosen assessment rather than genuine improvement.

    Authors: We acknowledge the value of an explicitly reproducible metric. Our evaluation already relies on a disjoint validation corpus of job postings excluded from taxonomy construction, measuring the taxonomy's coverage of AI skills in these held-out postings. We will revise the Evaluation section to describe this procedure in greater detail, including any quantitative thresholds or agreement measures employed, to support independent reproduction and mitigate concerns about heuristic alignment. While the study was not pre-registered, the expanded description will address the core reproducibility issue.
    Revision: partial.
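A "disjoint validation corpus" requires that no posting used to build the taxonomy is also scored during evaluation. Here is a minimal sketch of one way to guarantee that, assuming each posting has a stable key (an ID, or its normalized text if reposts of the same ad should land on the same side). The key is hashed and assigned to validation when the hash falls below the chosen share. The `salt` and `validation_share` defaults are illustrative assumptions.

```python
import hashlib


def split_for_validation(keys: list[str], validation_share: float = 0.2,
                         salt: str = "split-v1") -> tuple[list[str], list[str]]:
    """Deterministically partition posting keys into (construction, validation) lists."""
    if not 0.0 <= validation_share <= 1.0:
        raise ValueError("validation_share must be between 0 and 1")
    construction, validation = [], []
    for key in keys:
        digest = hashlib.sha256((salt + key).encode("utf-8")).digest()
        # First 4 bytes of the digest, scaled to a float in [0, 1).
        bucket = int.from_bytes(digest[:4], "big") / 2**32
        (validation if bucket < validation_share else construction).append(key)
    return construction, validation
```

The assignment depends only on the key and the salt, so rerunning the split reproduces it exactly. This supports the independent reproduction the referee asks for.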

Circularity Check

0 steps flagged

No circularity: empirical comparison of taxonomy construction methods

Full rationale

The paper conducts an empirical study comparing TaxonomyBuilder configurations on two job postings corpora, evaluating filtered versus unfiltered inputs for domain-specific coverage in AI skills taxonomies. No equations, derivations, or self-definitional reductions are present. The central claim rests on direct comparison of outputs from data-driven processes rather than any fitted parameter or self-citation chain that collapses back to the inputs by construction. The work is self-contained as a standard empirical evaluation of design choices in automated taxonomy building.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Review based solely on abstract; full methods, metrics, and parameter choices are not available. The work rests on the representativeness of job postings and the reliability of LLM labeling.

axioms (1)
  • domain assumption: job postings corpora accurately reflect current AI skills and tasks in the workplace.
    The study uses these corpora as the primary data source for taxonomy construction.

pith-pipeline@v0.9.0 · 5657 in / 1185 out tokens · 32499 ms · 2026-05-21T05:20:15.601307+00:00 · methodology


