
arxiv: 2604.12025 · v1 · submitted 2026-04-13 · 💻 cs.AI

WiseOWL: A Methodology for Evaluating Ontological Descriptiveness and Semantic Correctness for Ontology Reuse and Ontology Recommendations

Pith reviewed 2026-05-10 15:48 UTC · model grok-4.3

classification 💻 cs.AI
keywords ontology reuse · ontology evaluation · semantic web · OWL · embeddings · documentation coverage · hierarchical structure

The pith

WiseOWL scores ontologies on documentation coverage, label alignment via embeddings, structural connections, and hierarchical balance to support reuse decisions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces WiseOWL as a methodology to address the lack of systematic criteria for choosing ontologies, where developers often rely on hard-to-justify intuition. It defines four metrics that produce normalized scores from 0 to 10 along with guidance: documentation coverage, alignment between labels and definitions using embeddings, degree of interconnectedness, and balance in the hierarchy. The approach is implemented as a Streamlit application that accepts OWL files, converts them to RDF Turtle, and generates interactive visualizations. Evaluation on six ontologies from biology, food, and general domains shows the scores can differentiate quality levels in practice. If the method holds, it would allow more defensible and consistent choices that improve interoperability in semantic web applications.

Core claim

WiseOWL is a methodology for evaluating ontological descriptiveness and semantic correctness. It computes four metrics: Well-Described (documentation coverage), Well-Defined (label-definition alignment, measured with state-of-the-art embeddings), Connection (structural interconnectedness), and Hierarchical Breadth (hierarchical balance). The system delivers normalized scores between 0 and 10 with actionable feedback, is realized as a Streamlit app that ingests OWL, converts it to RDF Turtle, and supplies visualizations, and demonstrates promising effectiveness when applied to the Plant Ontology, Gene Ontology, Semanticscience Integrated Ontology, Food Ontology, Dublin Core, and GoodRelations.
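
The paper does not publish a formula for Well-Described, so the following is only a minimal sketch of one plausible reading of "documentation coverage": the fraction of terms that carry every expected annotation. The annotation keys (`label`, `definition`), the function name, and the toy terms are all assumptions made here for illustration, not the authors' implementation.

```python
from typing import Mapping, Sequence

# Hypothetical annotation keys treated as "documentation". The paper does not
# say which properties the Well-Described metric actually counts.
DOC_KEYS: Sequence[str] = ("label", "definition")


def documentation_coverage(terms: Mapping[str, Mapping[str, str]],
                           doc_keys: Sequence[str] = DOC_KEYS) -> float:
    """Return the fraction of terms with a non-empty value for every doc key."""
    if not terms:
        return 0.0
    documented = sum(
        1 for annotations in terms.values()
        if all(annotations.get(key, "").strip() for key in doc_keys)
    )
    return documented / len(terms)


# Toy input invented for illustration; not taken from any real ontology.
toy_terms = {
    "ex:Leaf": {"label": "leaf", "definition": "A flat plant organ."},
    "ex:Root": {"label": "root", "definition": ""},   # empty definition
    "ex:Stem": {"label": "stem"},                      # definition missing
    "ex:Seed": {"label": "seed", "definition": "A plant embryo in a coat."},
}
coverage = documentation_coverage(toy_terms)
print(f"coverage = {coverage:.2f}")
```

Counting a term only when all keys are present is a strict choice; a per-annotation average would be an equally defensible reading until the paper specifies the metric.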

What carries the argument

The WiseOWL four-metric scoring system that quantifies descriptiveness and correctness to guide ontology selection and reuse.

If this is right

  • Ontology selection moves from intuition to a reproducible scoring process that can be defended in project decisions.
  • Developers receive specific feedback on which aspects of an ontology need improvement before reuse.
  • The Streamlit implementation enables quick, interactive assessment and visualization without custom tooling.
  • Normalized 0-10 scores allow direct comparison across ontologies from different domains.
  • Consistent reuse reduces duplication and supports more reliable machine-operable semantic content.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The embedding-based check for label-definition alignment could be updated as new language models appear, potentially improving the Well-Defined metric over time.
  • Extending the evaluation to larger sets of ontologies or additional domains might show whether metric weights need domain-specific tuning.
  • The methodology could be combined with usage statistics from ontology repositories to create hybrid recommendation systems.
  • If the scores predict real-world reuse success, they might serve as a lightweight quality filter before deeper manual review.

Load-bearing premise

That the four metrics together capture what makes an ontology suitable for reuse and semantically correct, and that results from six test cases are sufficient to establish the method's effectiveness.

What would settle it

An independent ranking of the same six ontologies by domain experts for reuse suitability that shows no correlation with the WiseOWL scores, or a controlled reuse experiment where high-scoring ontologies produce more inconsistencies in integrated applications than low-scoring ones.
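
The first test above amounts to a rank correlation between expert judgments and tool scores. As a rough sketch of how that check could be run, the block below computes Spearman's rank correlation with the standard library only. The function names are chosen here, and the two score lists are placeholders invented to exercise the code; they are not results from the paper or from any expert study.

```python
from typing import Sequence


def ranks(values: Sequence[float]) -> list[float]:
    """Assign 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (var_x * var_y) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


# Placeholder numbers for six unnamed ontologies, invented for illustration.
placeholder_tool_scores = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
placeholder_expert_scores = [6.0, 5.0, 4.0, 3.0, 2.0, 1.0]
rho = spearman(placeholder_tool_scores, placeholder_expert_scores)
print(f"rho = {rho:.2f}")
```

With only six ontologies, any such coefficient would carry wide uncertainty, which is part of why the sample size is flagged as a load-bearing premise.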

Figures

Figures reproduced from arXiv: 2604.12025 by Anna Maria Masci, Aryan Singh Dalal, Asiyah Yu Lin, Hande Kucuk McGinty, Kathleen M. Jagodnik, Maria Baloch.

Figure 1: WiseOWL UI application displaying evaluation results for the …
Figure 2: WiseOWL UI application displaying evaluation results for the Gene …

Original abstract

The Semantic Web standardizes concept meaning for humans and machines, enabling machine-operable content and consistent interpretation that improves advanced analytics. Reusing ontologies speeds development and enforces consistency, yet selecting the optimal choice is challenging because authors lack systematic selection criteria and often rely on intuition that is difficult to justify, limiting reuse. To solve this, WiseOWL is proposed, a methodology with scoring and guidance to select ontologies for reuse. It scores four metrics: (i) Well-Described, measuring documentation coverage; (ii) Well-Defined, using state-of-the-art embeddings to assess label-definition alignment; (iii) Connection, capturing structural interconnectedness; and (iv) Hierarchical Breadth, reflecting hierarchical balance. WiseOWL outputs normalized 0-10 scores with actionable feedback. Implemented as a Streamlit app, it ingests OWL format, converts to RDF Turtle, and provides interactive visualizations. Evaluation across six ontologies, including the Plant Ontology (PO), Gene Ontology (GO), Semanticscience Integrated Ontology (SIO), Food Ontology (FoodON), Dublin Core (DC), and GoodRelations, demonstrates promising effectiveness.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes WiseOWL, a methodology to support ontology reuse and recommendation by scoring four metrics: Well-Described (documentation coverage), Well-Defined (embedding-based assessment of label-definition alignment), Connection (structural interconnectedness), and Hierarchical Breadth (hierarchical balance). Scores are normalized to a 0-10 scale with actionable feedback; the approach is implemented as a Streamlit app that ingests OWL files and produces visualizations. Evaluation on six ontologies (PO, GO, SIO, FoodON, DC, GoodRelations) is reported to demonstrate promising effectiveness.

Significance. If the metrics receive precise, reproducible definitions and the evaluation is expanded with quantitative results, baselines, and external validation, WiseOWL could address a genuine practical need in the Semantic Web by offering a systematic alternative to intuition-based ontology selection. The interactive tool implementation is a concrete strength that could facilitate adoption and further testing.

major comments (3)
  1. [§3] §3 (Metric definitions): The four metrics are introduced at a high level without equations, algorithms, or parameter specifications. For instance, the Well-Defined metric invokes 'state-of-the-art embeddings' to assess label-definition alignment but supplies no model, similarity function, aggregation method, or threshold; this is load-bearing because the central claim that WiseOWL measures semantic correctness cannot be evaluated or reproduced without these details.
  2. [§5] §5 (Evaluation): The evaluation states that the six ontologies yield 'promising effectiveness' yet reports no tabulated metric scores, no statistical tests, no error analysis, and no comparison to any baseline selection heuristic or prior method. This directly undermines the claim that the methodology has been shown to be effective.
  3. [§3.2] §3.2 (Well-Defined metric): No external validation (e.g., correlation with expert ratings of semantic correctness or observed reuse frequency) is provided for any metric, including the embedding-based one. Without such grounding, it remains unclear whether high scores actually predict better ontology reuse, which is required for the methodology's stated purpose.
minor comments (2)
  1. [Abstract and §4] The normalization procedure that maps raw metric values to the 0-10 scale is mentioned in the abstract but not specified in the main text; adding the exact transformation (including any free parameters) would improve reproducibility.
  2. [§4] The Streamlit app description could usefully include screenshots or explicit details on the interactive visualizations and the form of the actionable feedback generated for each metric.
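
To make major comment 1 concrete, the sketch below shows the pieces a reproducible Well-Defined computation would have to pin down: an embedding function, a similarity function, and an aggregation rule. Everything here is an assumption for illustration. In particular, `toy_embed` is a bag-of-words stand-in, not a real embedding model, and the mean over terms is just one possible aggregation.

```python
import math
import re
from collections import Counter
from typing import Mapping


def toy_embed(text: str) -> Counter:
    """Stand-in for a real embedding model: lowercase word counts (assumption)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(u: Mapping[str, float], v: Mapping[str, float]) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * v.get(token, 0.0) for token, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_alignment(definitions_by_label: Mapping[str, str]) -> float:
    """Average label-definition similarity across all terms (one possible rule)."""
    if not definitions_by_label:
        return 0.0
    total = sum(
        cosine(toy_embed(label), toy_embed(definition))
        for label, definition in definitions_by_label.items()
    )
    return total / len(definitions_by_label)


# Toy label -> definition pairs invented for illustration.
toy_pairs = {
    "leaf": "a leaf is a flat organ of a plant",
    "root": "an agreement between two parties",  # deliberately mismatched
}
print(f"mean alignment = {mean_alignment(toy_pairs):.3f}")
```

Swapping `toy_embed` for a dense sentence-embedding model would change the vector type but not the structure, which is why the model name, similarity function, and aggregation each need to be stated for the metric to be reproducible.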

Simulated Author's Rebuttal

3 responses · 1 unresolved

We thank the referee for the insightful comments on our manuscript describing the WiseOWL methodology. We address each of the major comments below and indicate the revisions we plan to make to strengthen the paper.

Point-by-point responses
  1. Referee: [§3] §3 (Metric definitions): The four metrics are introduced at a high level without equations, algorithms, or parameter specifications. For instance, the Well-Defined metric invokes 'state-of-the-art embeddings' to assess label-definition alignment but supplies no model, similarity function, aggregation method, or threshold; this is load-bearing because the central claim that WiseOWL measures semantic correctness cannot be evaluated or reproduced without these details.

    Authors: We agree with the referee that the metric definitions in section 3 are currently described at a conceptual level without sufficient formal details. To address this, we will revise the manuscript to include explicit equations for each metric, detailed algorithms (including pseudocode), and specific parameter values. For the Well-Defined metric, we will specify the embedding model used (such as a particular Sentence-BERT variant), the similarity function (cosine similarity), the method for aggregating alignment scores across labels and definitions, and any decision thresholds. These additions will make the methodology fully reproducible and allow readers to evaluate the semantic correctness claims. revision: yes

  2. Referee: [§5] §5 (Evaluation): The evaluation states that the six ontologies yield 'promising effectiveness' yet reports no tabulated metric scores, no statistical tests, no error analysis, and no comparison to any baseline selection heuristic or prior method. This directly undermines the claim that the methodology has been shown to be effective.

    Authors: We acknowledge that the evaluation section is limited and does not provide the quantitative details necessary to robustly support the effectiveness claim. In the revised manuscript, we will include a table presenting the normalized scores for all four metrics across the six ontologies. We will also add comparisons to baseline approaches, such as selecting ontologies based on their size or the presence of documentation alone, and include basic statistical analysis of the scores. An error analysis discussing any discrepancies or limitations observed will be incorporated. These changes will provide a more solid foundation for the 'promising effectiveness' statement. revision: yes

  3. Referee: [§3.2] §3.2 (Well-Defined metric): No external validation (e.g., correlation with expert ratings of semantic correctness or observed reuse frequency) is provided for any metric, including the embedding-based one. Without such grounding, it remains unclear whether high scores actually predict better ontology reuse, which is required for the methodology's stated purpose.

    Authors: We recognize that external validation is essential to confirm that the metrics, particularly Well-Defined, correlate with actual ontology quality and reuse success. The current work applies the metrics to well-known ontologies and interprets the results qualitatively. For the revision, we will expand the discussion to include the importance of external validation and outline potential methods for future work, such as correlating scores with expert judgments or reuse statistics from ontology repositories. However, conducting a full empirical validation study is outside the scope of this methodology paper and would require substantial additional effort and data access. We will explicitly note this limitation in the revised text. revision: partial

standing simulated objections not resolved
  • A comprehensive external validation study correlating metric scores with expert ratings or observed reuse frequencies remains outstanding, because it requires new data collection and analysis beyond the current manuscript's scope.

Circularity Check

0 steps flagged

No circularity: WiseOWL metrics rely on independent external techniques without self-referential reduction or load-bearing self-citations.

full rationale

The paper defines WiseOWL as a scoring methodology using four metrics computed from standard ontology properties and external components: documentation coverage for Well-Described, state-of-the-art embeddings for label-definition alignment in Well-Defined, graph-based structural measures for Connection, and hierarchical balance for Hierarchical Breadth. These are presented as direct computations with normalized 0-10 outputs and no equations or definitions that reduce outputs to inputs by construction. The evaluation on six ontologies is described as a demonstration of effectiveness rather than a fitted prediction or self-referential validation. No self-citations are invoked to justify uniqueness or core premises, and the methodology draws on independent techniques (embeddings, graph analysis) without smuggling ansatzes or renaming known results. The derivation chain is therefore self-contained against external benchmarks.
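
Since no formula for Hierarchical Breadth is given, the block below is only a guess at what a "direct computation" of hierarchical balance might involve: a breadth-first traversal of the subclass tree that records how many classes sit at each depth, plus an average branching factor. The function names, the choice of these two quantities, and the toy hierarchy are assumptions made here, not the paper's definition.

```python
from collections import deque
from typing import Mapping, Sequence


def level_widths(children: Mapping[str, Sequence[str]], root: str) -> list[int]:
    """Breadth-first traversal returning the number of classes at each depth."""
    widths: list[int] = []
    seen = {root}
    frontier = deque([root])
    while frontier:
        widths.append(len(frontier))
        next_frontier: deque = deque()
        for node in frontier:
            for child in children.get(node, ()):
                if child not in seen:  # guard against multiple inheritance
                    seen.add(child)
                    next_frontier.append(child)
        frontier = next_frontier
    return widths


def mean_branching(children: Mapping[str, Sequence[str]]) -> float:
    """Average number of direct subclasses over classes that have any."""
    counts = [len(kids) for kids in children.values() if kids]
    return sum(counts) / len(counts) if counts else 0.0


# Toy subclass hierarchy invented for illustration (parent -> direct subclasses).
toy_hierarchy = {
    "Thing": ["A", "B"],
    "A": ["A1", "A2", "A3"],
    "B": [],
}
widths = level_widths(toy_hierarchy, "Thing")
branching = mean_branching(toy_hierarchy)
print(widths, branching)
```

How such raw quantities would be turned into a single balance score is exactly the kind of detail the referee report asks the authors to specify.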

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The central claim rests on the domain assumption that embeddings reliably measure label-definition alignment and that the four metrics together capture the target properties; no explicit free parameters beyond normalization to 0-10 are stated, and no new entities are postulated.

free parameters (1)
  • Normalization to 0-10 scale
    Scores are normalized to 0-10 range but the exact scaling procedure, weights across metrics, or reference points are not specified in the abstract.
axioms (2)
  • domain assumption: State-of-the-art embeddings can accurately assess semantic alignment between labels and definitions
    Invoked in the Well-Defined metric description.
  • domain assumption: Structural graph measures and hierarchy balance metrics are meaningful indicators of ontology quality
    Used for Connection and Hierarchical Breadth metrics.
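
The free parameter in the ledger can be illustrated with the simplest candidate transformation: a clamped linear rescaling. This is a sketch under the assumption that each raw metric has known lower and upper reference points; the bounds `low` and `high` and the function name are hypothetical, and the paper may use a different mapping.

```python
def to_ten_point_scale(raw: float, low: float, high: float) -> float:
    """Linearly map raw from [low, high] onto [0, 10], clamping outliers."""
    if high <= low:
        raise ValueError("high must be greater than low")
    scaled = 10.0 * (raw - low) / (high - low)
    return max(0.0, min(10.0, scaled))


# A raw coverage fraction of 0.5 with assumed bounds 0 and 1.
print(to_ten_point_scale(0.5, 0.0, 1.0))
```

The choice of `low` and `high` determines every reported score, so leaving them unstated makes the 0-10 values hard to compare across metrics or across papers.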


