Intelligent Knowledge Mining Framework: Bridging AI Analysis and Trustworthy Preservation

Binh Vu

arXiv: 2512.17795 · v2 · submitted 2025-12-19 · 💻 cs.DL · cs.AI · cs.IR

Pith reviewed 2026-05-16 20:44 UTC · model grok-4.3

classification 💻 cs.DL · cs.AI · cs.IR

keywords knowledge mining · AI analysis · data preservation · digital repositories · framework design · actionable intelligence

The pith

A dual-stream architecture bridges AI knowledge mining with trustworthy archiving to create living data ecosystems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces the Intelligent Knowledge Mining Framework as a conceptual model to solve the problem of data trapped in silos across digital systems. It describes a dual-stream setup where one process uses AI to turn raw data into semantically rich and actionable knowledge, while a parallel stream handles archiving to keep integrity, provenance, and reproducibility intact. If true, this would let organizations move beyond static storage to ecosystems that continuously supply usable intelligence to users. The work outlines the motivation, research questions, methodology, and design details for building such a system.

Core claim

The paper claims that a dual-stream architecture, with one stream that systematically transforms raw data into machine-actionable knowledge via AI mining and a parallel stream for trustworthy archiving, makes the Intelligent Knowledge Mining Framework a foundational model for converting static repositories into living ecosystems that carry actionable intelligence from producers to consumers.

What carries the argument

The dual-stream architecture of the Intelligent Knowledge Mining Framework, consisting of a horizontal Mining Process that transforms raw data into semantically rich knowledge and a parallel Trustworthy Archiving Stream that maintains integrity, provenance, and reproducibility.
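As a rough illustration of the dual-stream idea, the sketch below runs a toy mining step and a parallel archiving step over the same raw record. Everything in it is an assumption made for illustration: the names `mine`, `archive`, `KnowledgeRecord`, and `ArchiveEntry` are hypothetical, the keyword extraction merely stands in for real AI analysis, and the paper itself specifies no such interfaces.

```python
import hashlib
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class KnowledgeRecord:
    """Output of the (toy) Mining Process: terms extracted from raw text."""
    source_digest: str
    keywords: tuple


@dataclass(frozen=True)
class ArchiveEntry:
    """Output of the (toy) Trustworthy Archiving Stream."""
    source_digest: str   # integrity: fingerprint of the raw input
    derived_digest: str  # reproducibility: fingerprint of the mined output
    producer: str        # provenance: who supplied the raw data


def digest(payload: str) -> str:
    """SHA-256 fingerprint of a string."""
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def mine(raw_text: str, min_length: int = 6) -> KnowledgeRecord:
    # Hypothetical stand-in for AI analysis: keep distinct long words,
    # sorted so that the result is deterministic.
    words = {w.strip(".,;:").lower() for w in raw_text.split()}
    keywords = tuple(sorted(w for w in words if len(w) >= min_length))
    return KnowledgeRecord(source_digest=digest(raw_text), keywords=keywords)


def archive(raw_text: str, record: KnowledgeRecord, producer: str) -> ArchiveEntry:
    # Serialize the mined record canonically so its digest is stable.
    canonical = json.dumps(asdict(record), sort_keys=True)
    return ArchiveEntry(digest(raw_text), digest(canonical), producer)


raw = "Static repositories trap valuable knowledge in silos."
record = mine(raw)                                  # mining stream
entry = archive(raw, record, producer="demo-lab")   # archiving stream
print(record.keywords)
print(entry.derived_digest[:12])
```

Because the toy mining step is deterministic, re-running `mine` and `archive` on the same input yields the same `ArchiveEntry`, which is the property the archiving stream is meant to protect. A real mining step built on learned models would need extra care to stay reproducible.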

If this is right

Static repositories gain the ability to deliver ongoing actionable intelligence rather than remaining passive stores.
AI-driven analysis and preservation processes operate in parallel without one undermining the other.
Data producers and consumers interact through a shared flow of machine-actionable knowledge.
Computational reproducibility becomes a built-in property of all archived knowledge assets.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The model could guide updates to existing digital libraries by adding parallel AI processing layers.
Real-world use would require defining exact interfaces and standards between the mining and archiving streams.
Fields handling large unstructured datasets, such as scientific publishing, could test the framework for improved knowledge reuse.

Load-bearing premise

That defining a dual-stream architecture alone will successfully bridge dynamic AI analysis and long-term preservation without further technical specifications or empirical demonstration.

What would settle it

A concrete implementation of the framework on heterogeneous data sources would settle the central claim. Showing both higher rates of actionable knowledge extraction and maintained long-term reproducibility would confirm it; failing on either count would refute it.

Figures

Figures reproduced from arXiv: 2512.17795 by Binh Vu.

**Figure 2.** Decomposition of the Research Program into Targeted R&D Projects. The overall …

**Figure 3.** A Conceptual Schema for Planning and Synthesizing R&D Project Contributions.

**Figure 4.** The SECI Model of Knowledge Creation [33], illustrating the spiral process through which tacit and explicit knowledge are converted and amplified within an organization.

**Figure 5.** Layered Architecture of a Knowledge Management System. This model provides a …

**Figure 6.** The DIKW Pyramid [40]. This model illustrates the hierarchical process of transforming raw data into actionable wisdom. The IKMF aims to facilitate the transitions between each level.

**Figure 7.** A conceptual NLP processing pipeline [43], illustrating how raw text is transformed into a richly annotated document (`Doc`) object through a series of modular components.

**Figure 8.** A conceptual illustration of the Latent Dirichlet Allocation (LDA) model. It models …

**Figure 9.** The Semantic Web Stack, illustrating the hierarchy of technologies from foundational …

**Figure 10.** An example of the SKOS (Simple Knowledge Organization System) data model [ …

**Figure 11.** The parts hierarchy of the OWL 2 RDF-Based Semantics. Each node represents …

**Figure 12.** The Neuro-Symbolic AI Cycle. Sub-symbolic models (e.g., LLMs) learn from data …

**Figure 13.** … OAIS provides a comprehensive conceptual model for a digital archive, defining its key functional entities and information packages. It establishes a common vocabulary and a set of mandatory responsibilities for any organization claiming to be a trustworthy digital repository. This model is often implemented using production-ready institutional repository software, with DSpace being a prominent open-sou…

**Figure 14.** The CERIF Data Model, illustrating how base entities (like Project and Person) are …

**Figure 15.** The IKMF Reference Model, illustrating the progression from Producer to Consumer …

Original abstract

The unprecedented proliferation of digital data presents significant challenges in access, integration, and value creation across all data-intensive sectors. Valuable information is frequently encapsulated within disparate systems, unstructured documents, and heterogeneous formats, creating silos that impede efficient utilization and collaborative decision-making. This paper introduces the Intelligent Knowledge Mining Framework (IKMF), a comprehensive conceptual model designed to bridge the critical gap between dynamic AI-driven analysis and trustworthy long-term preservation. The framework proposes a dual-stream architecture: a horizontal Mining Process that systematically transforms raw data into semantically rich, machine-actionable knowledge, and a parallel Trustworthy Archiving Stream that ensures the integrity, provenance, and computational reproducibility of these assets. By defining a blueprint for this symbiotic relationship, the paper provides a foundational model for transforming static repositories into living ecosystems that facilitate the flow of actionable intelligence from producers to consumers. This paper outlines the motivation, problem statement, and key research questions guiding the research and development of the framework, presents the underlying scientific methodology, and details its conceptual design and modeling.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

This is a high-level conceptual sketch of a dual-stream framework with no mechanisms, tests, or evidence to back its claims about turning repositories into living ecosystems.

The paper's main point is a proposal for the Intelligent Knowledge Mining Framework, which pairs a horizontal AI mining stream with a parallel trustworthy archiving stream to handle data silos. It identifies the problem of static repositories and unstructured data but stays at the level of a blueprint without delivering results or details.

What it does reasonably well is lay out the motivation and describe the two streams at a conceptual level, making the idea of complementary dynamic analysis and preservation easy to follow on paper. The architecture is presented cleanly enough to serve as a starting point for discussion in knowledge management.

The soft spot is the lack of substance behind the core claim. The paper asserts that the streams form a symbiotic relationship enabling real-time enrichment and long-term reproducibility, yet it supplies no protocols for provenance during updates, no way to handle conflicts between mutable knowledge graphs and immutable archives, and no invariants or examples to show how AI transformations stay reproducible after archiving. There are no equations, data, worked examples, or even pseudocode to evaluate whether the design holds up. This leaves the transformation into living ecosystems as an untested assertion rather than a demonstrated property, and the framework draws on existing ideas in digital archiving without adding first-principles advances.

This is for readers who want broad conceptual overviews in AI and preservation; it offers little to practitioners or researchers needing implementable methods or validation. The thinking is internally consistent and engages honestly with the stated problem, but the absence of depth means it does not rise to the level of a contribution that deserves referee time. I would not send it for peer review in this form.

Referee Report

2 major / 1 minor

Summary. The paper introduces the Intelligent Knowledge Mining Framework (IKMF) as a conceptual dual-stream architecture: a horizontal Mining Process that transforms raw data into semantically rich, machine-actionable knowledge, paired with a parallel Trustworthy Archiving Stream that preserves integrity, provenance, and reproducibility. It positions this as a blueprint for converting static repositories into living ecosystems that enable flow of actionable intelligence, outlining motivation, research questions, methodology, and high-level design.

Significance. If the unspecified integration mechanisms between dynamic mining and static archiving can be rigorously defined and validated, the framework could supply a useful high-level blueprint for digital libraries and data-intensive domains seeking to combine AI-driven enrichment with long-term trustworthiness. As presented, however, the contribution remains a high-level proposal without derivations, protocols, or tests, limiting its immediate significance to stimulating discussion rather than providing an actionable model.

major comments (2)

[Conceptual design and modeling] Conceptual design and modeling sections: the central claim that the Mining Process and Trustworthy Archiving Stream form a symbiotic relationship enabling simultaneous real-time semantic enrichment and long-term reproducibility is asserted without any protocol for provenance tracking during live updates, conflict resolution between mutable knowledge graphs and immutable archives, or formal invariants for computational reproducibility. This integration mechanism is load-bearing for the transformation of static repositories into living ecosystems but is left unspecified.
[Scientific methodology] Scientific methodology and key research questions sections: no mathematical derivations, formal invariants, data, or empirical tests are supplied to evaluate whether the dual-stream architecture actually achieves its stated goals. The soundness assessment rests entirely on architectural description, which is insufficient to substantiate the foundational-model claim.
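To make the request for formal invariants concrete, one possible shape for a reproducibility invariant is sketched below. The notation is hypothetical and does not come from the paper: $\mathcal{A}$ is the set of archived assets, $x$ an archived raw input, $f_\theta$ a mining transformation pinned to a code version and parameters $\theta$, $y$ the archived derived output, and $H$ a cryptographic hash.

```latex
\forall\, (x, f_\theta, y) \in \mathcal{A}: \quad H\bigl(f_\theta(x)\bigr) = H(y)
```

Read informally: re-running any archived transformation on its archived input must reproduce the archived output exactly. Non-deterministic models would presumably need a weaker notion of equivalence than hash equality.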

minor comments (1)

[Abstract and introduction] The abstract and introduction repeat the high-level motivation without distinguishing the novel aspects of IKMF from prior work on knowledge graphs, digital preservation, or AI pipelines; adding targeted comparisons would improve clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. We appreciate the acknowledgment of the framework's potential as a high-level blueprint. Below we respond point-by-point to the major comments, clarifying the manuscript's conceptual scope while outlining targeted revisions to address the concerns about integration mechanisms and validation.

Referee: [Conceptual design and modeling] Conceptual design and modeling sections: the central claim that the Mining Process and Trustworthy Archiving Stream form a symbiotic relationship enabling simultaneous real-time semantic enrichment and long-term reproducibility is asserted without any protocol for provenance tracking during live updates, conflict resolution between mutable knowledge graphs and immutable archives, or formal invariants for computational reproducibility. This integration mechanism is load-bearing for the transformation of static repositories into living ecosystems but is left unspecified.

Authors: We agree that the integration mechanisms between the dynamic Mining Process and the static Trustworthy Archiving Stream are described at a high conceptual level without detailed protocols. The manuscript positions IKMF as a foundational blueprint rather than a fully specified system, which is why concrete mechanisms for provenance tracking during live updates, conflict resolution (e.g., between mutable knowledge graphs and immutable archives), and formal invariants were not elaborated. In the revised manuscript we will expand the conceptual design section to include high-level architectural outlines for these aspects, such as using append-only ledgers for archiving provenance and version-control strategies for knowledge-graph updates, while explicitly noting that full protocol definitions and invariants remain topics for follow-on implementation work. This will better bound the current contribution without overclaiming. revision: partial
Referee: [Scientific methodology] Scientific methodology and key research questions sections: no mathematical derivations, formal invariants, data, or empirical tests are supplied to evaluate whether the dual-stream architecture actually achieves its stated goals. The soundness assessment rests entirely on architectural description, which is insufficient to substantiate the foundational-model claim.

Authors: The paper is framed throughout as a conceptual model that outlines motivation, research questions, and high-level design; it does not present empirical evaluation or formal proofs. We accept that the absence of mathematical derivations, invariants, data, or tests means the soundness argument rests on architectural coherence and alignment with existing principles in AI knowledge extraction and digital preservation. In revision we will augment the methodology section with an explicit discussion of the conceptual nature of the work and a roadmap for future validation (e.g., simulation-based case studies or prototype implementations). We do not claim the current manuscript provides a fully substantiated operational model, only a blueprint intended to guide subsequent rigorous development. revision: partial
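The first response mentions append-only ledgers for archiving provenance without saying what form they would take. One common way to build such a ledger is a hash chain, sketched minimally below. The class `ProvenanceLedger`, the payload fields, and the `GENESIS` constant are all hypothetical; nothing here is taken from the manuscript or its planned revision.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash an entry together with the hash of its predecessor."""
    body = json.dumps(payload, sort_keys=True)
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


@dataclass
class ProvenanceLedger:
    """Append-only log in which each entry commits to the one before it."""
    entries: list = field(default_factory=list)

    def append(self, payload: dict) -> str:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        h = entry_hash(prev, payload)
        self.entries.append({"prev": prev, "payload": payload, "hash": h})
        return h

    def verify(self) -> bool:
        # Recompute every hash from the start; any edit breaks the chain.
        prev = GENESIS
        for e in self.entries:
            if e["prev"] != prev or e["hash"] != entry_hash(prev, e["payload"]):
                return False
            prev = e["hash"]
        return True


ledger = ProvenanceLedger()
ledger.append({"graph_version": 1, "action": "ingest", "asset": "doc-001"})
ledger.append({"graph_version": 2, "action": "enrich", "asset": "doc-001"})
ok_before = ledger.verify()

# Simulate tampering with an earlier entry.
ledger.entries[0]["payload"]["action"] = "delete"
ok_after = ledger.verify()
print(ok_before, ok_after)
```

Recording a `graph_version` in each payload is one simple way to tie a mutable knowledge graph to an immutable log: the graph may change, but each change leaves an entry that cannot be silently rewritten. How conflicts between versions are resolved is left open here, as it is in the response.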

Circularity Check

0 steps flagged

No circularity: conceptual proposal without derivations or self-referential reductions

The paper presents a high-level conceptual framework (IKMF) defined by a dual-stream architecture whose purpose is stated as bridging analysis and preservation. No equations, fitted parameters, predictions, or load-bearing self-citations appear in the abstract or described structure. The central claim is an architectural assertion by definition rather than a result derived from prior inputs that reduces tautologically. No steps match the enumerated circularity patterns; the derivation chain is self-contained as a modeling exercise.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The paper rests on standard domain assumptions about data silos and introduces the IKMF as a new organizing concept without independent evidence or fitted parameters.

axioms (2)

domain assumption: Valuable information is frequently encapsulated within disparate systems, unstructured documents, and heterogeneous formats, creating silos that impede efficient utilization.
Stated directly in the abstract as the core problem motivating the framework.
ad hoc to paper: A dual-stream architecture can bridge AI-driven analysis and trustworthy long-term preservation.
This is the central modeling choice of the IKMF.

invented entities (1)

Intelligent Knowledge Mining Framework (IKMF): no independent evidence
purpose: To provide a blueprint for transforming static repositories into living knowledge ecosystems.
New conceptual model introduced by the paper.

pith-pipeline@v0.9.0 · 5469 in / 1221 out tokens · 29918 ms · 2026-05-16T20:44:38.036867+00:00 · methodology

Reference graph

Works this paper leans on

80 extracted references · 80 canonical work pages

[1] T. Hey, S. Tansley, and K. Tolle, The fourth paradigm: data-intensive scientific discovery, vol. 1. Microsoft research, 2009
[2] D. Laney, “3D data management: Controlling data volume, velocity and variety,” META group research note, vol. 6, 2001
[3] S. Abraham, D. S. Ewen, and B. Burnett, “Bridging data silos using big data integration,” International Journal of Database Management Systems, vol. 11, no. 2/3, pp. 1–17, 2019
[4] M. Baker, “1,500 scientists lift the lid on reproducibility,” Nature News, vol. 533, no. 7604, pp. 452–454, 2016
[5] W. H. Inmon, C. Imhoff, and R. Sousa, Corporate information factory. John Wiley & Sons, 2002
[6] C. C. Aggarwal and C. Zhai, Mining text data. Springer Science & Business Media, 2012
[7] M. Alavi and D. E. Leidner, “Review: Knowledge management and knowledge management systems: Conceptual foundations and research issues,” MIS Quarterly, pp. 107–136, 2001
[8] B. Mašić, S. Nešić, D. Nikolić, and M. Dželetović, “Evolution of knowledge management,” Industrija, vol. 45, no. 2, pp. 127–147, 2017
[9] G. D. Bhatt, “Knowledge management in organizations: examining the interaction between technologies, techniques, and people,” Journal of knowledge management, vol. 5, no. 1, pp. 68–75, 2001
[10] J. Sweller, “Cognitive load during problem solving: Effects on learning,” Cognitive science, vol. 12, no. 2, pp. 257–285, 1988
[11] G. Briscoe and P. De Wilde, “Digital ecosystems: Evolving service-oriented architectures,” in 2008 Congress on Evolutionary Computation (IEEE World Congress on Computational Intelligence), pp. 2997–3004, IEEE, 2008
[12] Z. Ghahramani, “Probabilistic machine learning and artificial intelligence,” Nature, vol. 521, no. 7553, pp. 452–459, 2015
[13] A. Hogan, E. Blomqvist, M. Cochez, C. d’Amato, G. de Melo, C. Gutierrez, J. E. L. Gayo, S. Kirrane, S. Neumaier, A. Polleres, et al., “Knowledge graphs,” ACM Computing Surveys (CSUR), vol. 54, no. 4, pp. 1–37, 2021
[14] B. Vu, M. de Velasco, P. Mc Kevitt, R. Bond, R. Turkington, F. Booth, M. Mulvenna, M. Fuchs, and M. Hemmje, “A content and knowledge management system supporting emotion detection from speech,” in Conversational Dialogue Systems for the Next Decade (L. F. D’Haro, Z. Callejas, and S. Nakamura, eds.), vol. 704 of Lecture Notes in Electrical Engineering, Sprin...
[15] B. Vu, A Taxonomy Management System Supporting Crowd-based Taxonomy Generation, Evolution, and Management. PhD thesis, Hagen, 2020
[16] J. M. Martinez, “MPEG-7: The generic multimedia content description standard, part 1,” IEEE multimedia, vol. 9, no. 2, pp. 78–87, 2002
[17] B. Vu, S. Bruchhaus, A. Moorhead, H. Zheng, L. D’Arco, L. Lynch, L. S. Sica, M. Ponticorvo, F. Diano, H. Afli, P. Joshi, A. Molinari, and M. Hemmje, “Towards continuous professional monitoring of health status based on energetic balancing,” in 2022 IEEE International Workshop on Sport, Technology and Research (STAR), (Trento - Cavalese, Italy), pp. 72–77, 2022
[18] European Commission, CORDIS, “Supporting Mental Health in Young People: Integrated Methodology for cLinical dEcisions (SMILE).” EU CORDIS Project Page, 2023. Grant agreement ID: 101080923
[19] T. Krause, L. Glau, P. Newels, T. Reis, M. X. Bornschlegl, M. Kramer, and M. L. Hemmje, “Using Large Language Models for Microbiome Findings Reports in Laboratory Diagnostics,” BioMedInformatics, vol. 4, no. 3, pp. 1979–2001, 2024
[20] M. D. Wilkinson, M. Dumontier, I. J. Aalbersberg, G. Appleton, M. Axton, A. Baak, N. Blomberg, J.-W. Boiten, L. B. da Silva Santos, P. E. Bourne, et al., “The FAIR guiding principles for scientific data management and stewardship,” Scientific data, vol. 3, no. 1, pp. 1–9, 2016
[21] P. Jain and N. Mnjama, “Preservation of digital records: issues and challenges in the digital era,” Journal of the South African Society of Archivists, vol. 49, pp. 157–172, 2016
[22] PREMIS Editorial Committee, “PREMIS data dictionary for preservation metadata, version 3.0,” 2015
[23] F. d. J. Lavorato and R. d. C. Sant’Ana, “The OAIS reference model: a study from a technical point of view,” in 2014 IEEE 8th International Conference on Application of Information and Communication Technologies (AICT), pp. 1–6, IEEE, 2014
[24] T. Lebo, S. Sahoo, and D. McGuinness, “PROV-O: The PROV ontology,” 2013
[25] P. De Castro, M. Casado, and M. Legido, “CRIS/OAR portals: a roadmap for making research information publicly visible,” Program, vol. 45, no. 4, pp. 415–433, 2011
[26] A. Rajasekar, R. Moore, M. Wan, and W. Schroeder, “The iRODS data grid,” in Data Grids-The Next Generation of Data-Centric Collaborations, pp. 101–141, Springer, 2010
[27] J. F. Nunamaker, M. Chen, and T. D. M. Purdin, “Systems development in information systems research,” Journal of Management Information Systems, vol. 7, no. 3, pp. 89–106, 1991
[28] A. R. Hevner, S. T. March, J. Park, and S. Ram, “Design science in information systems research,” MIS quarterly, pp. 75–105, 2004
[29] J. F. Nunamaker, R. O. Briggs, N. W. Twyman, and J. S. Giboney, “Creating impact through systematic programs of research,” Journal of Management Information Systems, vol. 31, no. 3, pp. 13–41, 2014
[30] K. Peffers, T. Tuunanen, M. A. Rothenberger, and S. Chatterjee, “A design science research methodology for information systems research,” Journal of management information systems, vol. 24, no. 3, pp. 45–77, 2007
[31] M. X. Bornschlegl, “Towards trustworthiness in AI-based big data analysis,” 2024
[32] R. Lindgren and O. Henfridsson, “The evolution of knowledge management systems needs to be managed,” Knowledge Management Research & Practice, vol. 2, no. 1, pp. 56–64, 2004
[33] I. Nonaka and H. Takeuchi, The knowledge-creating company: How Japanese companies create the dynamics of innovation. Oxford university press, 1995
[34] E. Wenger, R. A. McDermott, and W. Snyder, Cultivating communities of practice: A guide to managing knowledge. Harvard Business School Press, 2002
[35] N. Gronau, E. Weber, and A. Kienle, “The knowledge café - a knowledge management system and its application to hospitality and tourism,” in Information and Communication Technologies in Tourism 2009, pp. 309–320, Springer, 2009
[36] A. Y. Chua and W.-Y. Lam, “KMS failure: a study of the contributing factors and cures,” Industrial Management & Data Systems, vol. 109, no. 1, pp. 64–79, 2009
[37] P. Akhavan and A. Pezeshkan, “Knowledge management critical failure factors: a multi-case study,” VINE, vol. 44, pp. 22–41, 02 2014
[38] M. H. Jarrahi and S. Sawyer, “Social media, social acts, and knowledge sharing in organizations: A case of a professional services firm,” Journal of the American Society for Information Science and Technology, vol. 63, no. 10, pp. 2028–2040, 2012
[39] W. M. Cohen and D. A. Levinthal, “Absorptive capacity: A new perspective on learning and innovation,” Administrative science quarterly, pp. 128–152, 1990
[40] J. Winter, “DIKW pyramid.” Jeff Winter Insights, 2023. Accessed: 2025-07-10
[41] S. Robertson and H. Zaragoza, “The probabilistic relevance framework: BM25 and beyond,” Foundations and Trends in Information Retrieval, vol. 3, no. 4, pp. 333–389, 2009
[42] V. Karpukhin, B. Oguz, S. Min, P. Lewis, W.-t. Yih, N. Goyal, and D. Chen, “Dense passage retrieval for open-domain question answering,” in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 6769–6781, 2020
[43] spaCy, “Processing pipelines.” spaCy Usage Documentation, 2024. Accessed: 2025-07-10
[44] G. Lample, M. Ballesteros, S. Subramanian, K. Kawakami, and C. Dyer, “Neural architectures for named entity recognition,” in Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 260–270, 2016
[45] M. Mintz, S. Bills, R. Snow, and D. Jurafsky, “Distant supervision for relation extraction without labeled data,” in Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, pp. 1003–1011, 2009
[46] M. Lewis, Y. Liu, N. Goyal, M. Ghazvininejad, A. Mohamed, O. Levy, V. Stoyanov, and L. Zettlemoyer, “BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension,” in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 7871–7880, 2020
[47] J. Zhang, Y. Zhao, M. Saleh, and P. J. Liu, “PEGASUS: Pre-training with extracted gap-sentences for abstractive summarization,” in International Conference on Machine Learning, pp. 11328–11339, PMLR, 2020
[48] D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent Dirichlet allocation,” Journal of machine Learning research, vol. 3, no. Jan, pp. 993–1022, 2003
[49] AIML.com, “What is topic modeling? Discuss key algorithms, working, applications, and the pros and cons.” Web page, 2024. Accessed: 2025-07-11
[50] D. Angelov, “Top2vec: Distributed representations of topics,” arXiv preprint arXiv:2008.09470, 2020
[51]

Reading wikipedia to answer open-domain questions,

D. Chen, A. Fisch, J. Weston, and A. Bordes, “Reading wikipedia to answer open-domain questions,” inProceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1870–1879, 2017

work page 2017
[52]

Data fusion,

J. Bleiholder and F. Naumann, “Data fusion,”ACM Computing Surveys (CSUR), vol. 41, no. 1, pp. 1–41, 2008

work page 2008
[53]

The semantic web,

T. Berners-Lee, J. Hendler, and O. Lassila, “The semantic web,”Scientific american, vol. 284, no. 5, pp. 34–43, 2001

work page 2001
[54]

Rdf 1.1 concepts and abstract syntax,

R. Cyganiak, D. Wood, and M. Lanthaler, “Rdf 1.1 concepts and abstract syntax,”W3C Recommendation, 2014

work page 2014
[55]

Owl web ontology language overview,

D. L. McGuinness, F. Van Harmelen,et al., “Owl web ontology language overview,”W3C recommendation, vol. 10, no. 2004-02-10, p. 2004, 2004

[56] C. Hoyland, K. Adams, A. Tolk, and L. Xu, “The RQ-Tech methodology: A new paradigm for conceptualizing strategic enterprise architectures,” Journal of Management Analytics, vol. 1, pp. 55–77, May 2014

[57] P. Lambe, Organizing knowledge: taxonomies, knowledge and organizational effectiveness. Chandos Publishing, 2007

[58] H. Hedden, The accidental taxonomist. Information Today, Inc., 2016

[59] J. Busse, “SKOS (simple knowledge organization system).” Web page for Deutscher Terminologie-Tag e.V., 2023. Accessed: 2025-07-10

[60] A. Gilchrist, “Thesauri, taxonomies, and ontologies: An etymological note,” Journal of Documentation, vol. 59, no. 1, pp. 7–18, 2003

[61] T. R. Gruber, “Toward principles for the design of ontologies used for knowledge sharing?,” International Journal of Human-Computer Studies, vol. 43, no. 5-6, pp. 907–928, 1995

[62] B. Motik, P. F. Patel-Schneider, and B. Cuenca Grau, “OWL 2 web ontology language RDF-based semantics (second edition).” W3C Recommendation, December 2012. Accessed: 2025-07-10

[63] R. Studer, V. R. Benjamins, and D. Fensel, “Knowledge engineering: principles and methods,” Data & Knowledge Engineering, vol. 25, no. 1-2, pp. 161–197, 1998

[64] R. Agrawal and R. Srikant, “Fast algorithms for mining association rules,” in Proc. 20th Int. Conf. Very Large Data Bases, VLDB, vol. 1215, pp. 487–499, 1994

[65] I. Horrocks, P. F. Patel-Schneider, H. Boley, S. Tabet, B. Grosof, and M. Dean, “SWRL: A semantic web rule language combining OWL and RuleML,” W3C Member Submission, 21 May 2004

[66] Bosch Global, “How neuro-symbolic AI helps understand scenes.” Bosch Stories, April 2022. Accessed: 2025-07-10

[67] A. S. d. Garcez and L. C. Lamb, “Neurosymbolic AI: The 3rd wave,” arXiv preprint arXiv:1908.06627, 2019

[68] Z. Ji, N. Lee, R. Frieske, T. Yu, D. Su, Y. Xu, E. Ishii, Y. Bang, A. Madotto, and P. Fung, “Survey of hallucination in natural language generation,” ACM Computing Surveys, vol. 55, no. 12, pp. 1–38, 2023

[69] R. Manhaeve, S. Dumancic, A. Kimmig, T. Demeester, and L. De Raedt, “DeepProbLog: Neural probabilistic logic programming,” in Advances in Neural Information Processing Systems 31 (NeurIPS 2018) (S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, eds.), pp. 3749–3759, 2018

[70] J. Huang, Z. Li, B. Chen, K. Samel, M. Naik, L. Song, and X. Si, “Scallop: From probabilistic deductive databases to scalable differentiable reasoning,” in Advances in Neural Information Processing Systems 34 (NeurIPS 2021), pp. 25134–25145, 2021

[71] Consultative Committee for Space Data Systems, “Reference Model for an Open Archival Information System (OAIS),” Tech. Rep. CCSDS 650.0-M-3, CCSDS, 2024. Magenta Book

[72] M. Smith, M. Barton, M. Bass, M. Branschofsky, G. McClellan, D. Stuve, R. Tansley, and J. H. Walker, “DSpace: An open source dynamic digital repository,” D-Lib Magazine, vol. 9, no. 1, 2003

[73] B. Jörg, K. G. Jeffery, and A. Asserson, “CERIF: The common European research information format model,” in The 6th International Conference on Theory and Practice of Electronic Governance, pp. 381–384, 2012

[74] G. Grootel, P. Spyns, S. Christiaens, and B. Jörg, “Business semantics management supports government innovation information portal,” vol. 5872, pp. 757–766, Nov. 2009

[75] S. Granger, “Emulation as a digital preservation strategy,” D-Lib Magazine, vol. 6, no. 10, pp. 1010–45, 2000

[76] K. De Smedt, D. Koureas, and P. Wittenburg, “FAIR digital objects for science: From data pieces to actionable knowledge units,” Publications, vol. 8, no. 2, p. 21, 2020

[77] P. Sefton, S. Soiland-Reyes, L. J. Castro, C. Goble, and RO-Crate Community, “RO-Crate 1.1: A lightweight approach to research data packaging.” Zenodo, Aug. 2021

[78] J. Kunze, J. Littman, L. Madden, J. Scancella, and C. Adams, “The BagIt file packaging format (V1.0).” RFC 8493, Oct. 2018

[79] P. Amstutz, M. R. Crusoe, N. Tijanić, B. Chapman, J. Chilton, M. Heuer, A. Kartashov, D. Leehr, H. Ménager, M. Nedeljkovich, M. Scales, S. Soiland-Reyes, and L. Stojanovic, “Common workflow language, v1.0.” figshare, July 2016

[80] P. Di Tommaso, M. Chatzou, E. W. Floden, P. P. Barja, E. Palumbo, and C. Notredame, “Nextflow enables reproducible computational workflows,” Nature Biotechnology, vol. 35, no. 4, pp. 316–319, 2017

