Tacit Knowledge Extraction via Logic Augmented Generation and Active Inference

Aldo Gangemi; Alessio Giberti; Andrea Giovanni Nuzzolese; Francesco Poggi; Lorenzo Lamazzi; Mattia Torta; Vittorio Andrea Rocca

arxiv: 2605.07639 · v1 · submitted 2026-05-08 · 💻 cs.AI

Lorenzo Lamazzi, Aldo Gangemi, Alessio Giberti, Andrea Giovanni Nuzzolese, Vittorio Andrea Rocca, Mattia Torta, Francesco Poggi

Pith reviewed 2026-05-11 02:05 UTC · model grok-4.3

classification 💻 cs.AI

keywords tacit knowledge · knowledge graph construction · neuro-symbolic methods · logic augmented generation · active inference · manufacturing procedures · ontology grounding · procedural knowledge extraction

The pith

A neuro-symbolic framework extracts tacit knowledge from procedural videos into ontology-grounded knowledge graphs with greater completeness.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to solve the problem of capturing unspoken human expertise in procedural tasks such as assembly and repair, which current systems miss because they rely only on explicit instructions. It proposes a framework that combines logic-augmented generation with an active-inference approach to build structured knowledge graphs grounded in ontologies. This matters because procedural domains depend on implicit assumptions, contextual judgments, and embodied skills that are rarely written down, limiting machine reuse and reasoning. The method is tested on instructional videos of assembly-like repair procedures as a stand-in for manufacturing. Evaluation results indicate gains in how much knowledge is captured and how accurately it fits semantic standards.

Core claim

The authors introduce a neuro-symbolic framework that combines Logic-Augmented Generation and an Active-Inference-inspired approach for ontology-grounded Knowledge Graph construction. They evaluate the approach in a knowledge transfer case study in manufacturing, using assembly-like repair procedures from instructional videos as a reproducible proxy domain. Results show that the proposed solution improves completeness and semantic quality, advancing neuro-symbolic knowledge engineering for industrial domains.

What carries the argument

The neuro-symbolic framework that integrates Logic-Augmented Generation with an Active-Inference-inspired approach to produce ontology-grounded knowledge graphs from procedural video descriptions.

If this is right

Tacit elements in procedural instructions can be turned into knowledge graphs that machines can query and reason over.
Knowledge engineering pipelines in industrial settings gain higher coverage of implicit constraints and judgments.
The extracted graphs support validation, reuse, and transfer of expert procedures across similar tasks.
Neuro-symbolic techniques become more practical for domains where execution relies on experience-based decisions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same extraction pipeline could be tested on other procedural domains such as medical protocols or maintenance routines.
Integration with sensor data from physical execution might further ground the graphs in embodied performance.
Automated extraction at scale could lower the cost of building maintainable knowledge bases in factories.

Load-bearing premise

That performance gains seen in a proxy setting of instructional videos for assembly repairs will carry over to actual manufacturing tasks without major domain-specific adjustments.

What would settle it

Apply the method to a real manufacturing assembly line, build the knowledge graph, and have domain experts independently create one for the same process, then compare both for measured completeness and semantic quality.
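
The comparison described above can be sketched as a set operation over triples. This is a minimal illustration, not the paper's evaluation protocol: the triples, the property names (`usesTool`, `actsOn`, `requiresCondition`), and the `compare_graphs` helper are all hypothetical, and the sketch assumes both graphs already share the same identifiers, which in practice would require an entity-alignment step first.

```python
# Hypothetical sketch: compare an extracted graph with an expert-built one.
# Triples are plain (subject, predicate, object) tuples; all names are invented.
Triple = tuple[str, str, str]


def compare_graphs(extracted: set[Triple], expert: set[Triple]) -> dict:
    """Report how much of the expert graph was recovered and what differs."""
    shared = extracted & expert
    return {
        # Share of expert triples that the extraction recovered.
        "coverage": len(shared) / len(expert) if expert else 0.0,
        # Expert triples the extraction missed (candidate tacit knowledge gaps).
        "missing": sorted(expert - extracted),
        # Extracted triples the expert graph does not contain.
        "unsupported": sorted(extracted - expert),
    }


expert = {
    ("step1", "usesTool", "torque_wrench"),
    ("step1", "actsOn", "bearing"),
    ("step2", "requiresCondition", "bearing_seated"),
    ("step2", "usesTool", "press"),
}
extracted = {
    ("step1", "usesTool", "torque_wrench"),
    ("step1", "actsOn", "bearing"),
    ("step2", "usesTool", "press"),
}

report = compare_graphs(extracted, expert)
print(report["coverage"])
print(report["missing"])
```

In this toy case the only missing triple is the precondition on `step2`, which is the kind of implicit constraint the paper is concerned with. Exact triple matching is a deliberately strict choice; a real comparison would likely need looser, expert-judged matching.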

Figures

Figures reproduced from arXiv: 2605.07639 by Aldo Gangemi, Alessio Giberti, Andrea Giovanni Nuzzolese, Francesco Poggi, Lorenzo Lamazzi, Mattia Torta, Vittorio Andrea Rocca.

**Figure 1.** Overview of the proposed ontology.

**Figure 2.** LAG pipeline. The final output is a Turtle file instantiating the ontology and representing the extracted procedural knowledge as a KG. The generated graph is then validated against the reference ontology through SHACL constraints. Shape-based validations verify that the extracted individuals comply with the expected ontological structure, while a set of global consistency checks detects major extraction…

**Figure 3.** LAG framework performance on tools and artifacts extraction. Precision, recall, and F1 score are reported for all evaluated models. Transcript-based models are shown with solid bars, video-based models with highlighted hatched bars. All models achieve perfect precision (1.00), while recall varies substantially across models and modalities, with transcript-based models showing the largest gap.

**Figure 4.** Active Inference impact on tool and artifact extraction. The upper and…

**Figure 5.** Application of the proposed framework. (a) Electrospindle assembly by…
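
The Figure 3 caption reports precision, recall, and F1 for tool and artifact extraction, with perfect precision and lower recall. A minimal sketch of how those three scores relate for set-valued extractions is given below. The item names and the `precision_recall_f1` helper are assumptions made for illustration; they are not taken from the paper's data or code.

```python
# Hypothetical sketch: set-based precision, recall, and F1 for extracted items.
def precision_recall_f1(predicted: set[str], gold: set[str]) -> tuple[float, float, float]:
    """Score a set of extracted items against a gold-standard set."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Invented example: every extracted item is correct, but half are missed.
gold = {"screwdriver", "spudger", "battery", "adhesive_strip"}
predicted = {"screwdriver", "battery"}

p, r, f1 = precision_recall_f1(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

The example reproduces the pattern the caption describes: nothing extracted is wrong, so precision is 1.00, yet the items never mentioned explicitly pull recall and F1 down.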

Original abstract

Tacit knowledge plays a central role in human expertise, yet it remains difficult to capture, formalize, and reuse in machine-interpretable form. This challenge is especially relevant in procedural domains, where successful execution depends not only on explicit instructions, but also on implicit assumptions, contextual constraints, embodied skills, and experience-based judgments rarely documented. As a result, current knowledge engineering pipelines struggle to transform tacit and process-centric knowledge into formally specified, machine-interpretable representations that can be queried, validated, reasoned over, and reused. In this paper, we introduce a neuro-symbolic framework that combines Logic-Augmented Generation and an Active-Inference-inspired approach for ontology-grounded Knowledge Graph construction. We evaluate the approach in a knowledge transfer case study in manufacturing, using assembly-like repair procedures from instructional videos as a reproducible proxy domain. Results show that the proposed solution improves completeness and semantic quality, advancing neuro-symbolic knowledge engineering for industrial domains.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces a neuro-symbolic framework that integrates Logic-Augmented Generation with an Active-Inference-inspired mechanism to construct ontology-grounded knowledge graphs from tacit knowledge sources. It evaluates the approach via a case study in manufacturing that uses assembly-like repair procedures extracted from instructional videos as a reproducible proxy domain, reporting improvements in completeness and semantic quality relative to baseline knowledge engineering pipelines.

Significance. If the reported gains in completeness and semantic quality are robust and the pipeline generalizes beyond the proxy, the work could meaningfully advance neuro-symbolic methods for formalizing procedural tacit knowledge in industrial settings. The combination of logic augmentation and active inference for KG construction is a plausible direction, though the current evidence base is limited to a single proxy domain without demonstrated transfer.

major comments (2)

[Evaluation / Case Study] The evaluation (case study section) relies exclusively on instructional videos of assembly-like repair procedures as a proxy for manufacturing tacit knowledge. This domain omits embodied execution, real-time sensor feedback, and on-floor expert validation that define tacit knowledge in actual production environments. No transfer experiment, domain-adaptation ablation, or comparison against real manufacturing data is reported, so the claim that the results advance neuro-symbolic knowledge engineering for industrial domains rests on an unverified extrapolation.
[Abstract and Results] The abstract and results summary assert improvements in completeness and semantic quality, yet no concrete metrics, baselines, statistical tests, or inter-annotator agreement figures are supplied in the provided text. Without these details it is impossible to determine whether the observed gains are substantive or merely artifacts of the chosen proxy and evaluation protocol.

minor comments (1)

[Method] Notation for the Active-Inference component and the precise interface between Logic-Augmented Generation and the ontology grounding step should be defined more explicitly, ideally with a small diagram or pseudocode.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. We appreciate the emphasis on evaluation rigor and reporting clarity. Below we address each major comment point by point, providing honest clarifications based on the current manuscript while outlining targeted revisions.

Referee: [Evaluation / Case Study] The evaluation (case study section) relies exclusively on instructional videos of assembly-like repair procedures as a proxy for manufacturing tacit knowledge. This domain omits embodied execution, real-time sensor feedback, and on-floor expert validation that define tacit knowledge in actual production environments. No transfer experiment, domain-adaptation ablation, or comparison against real manufacturing data is reported, so the claim that the results advance neuro-symbolic knowledge engineering for industrial domains rests on an unverified extrapolation.

Authors: We agree that the evaluation is limited to a proxy domain of instructional videos for assembly-like repair procedures, which does not encompass embodied execution, real-time sensor feedback, or direct on-floor expert validation characteristic of live manufacturing environments. This proxy was deliberately chosen to support reproducibility and to sidestep proprietary data access barriers that typically hinder academic studies in industrial settings. The manuscript frames the contribution as an initial demonstration within this controlled proxy rather than a blanket claim of immediate transfer to all production contexts. In revision we will expand the Discussion and Limitations sections to explicitly delineate the proxy's boundaries, temper language around industrial advancement, and add a forward-looking subsection detailing planned transfer experiments, domain-adaptation ablations, and pathways for obtaining real manufacturing data (e.g., via industry partnerships). No new empirical experiments can be conducted for this revision cycle, but the textual clarifications will prevent over-extrapolation. revision: partial
Referee: [Abstract and Results] The abstract and results summary assert improvements in completeness and semantic quality, yet no concrete metrics, baselines, statistical tests, or inter-annotator agreement figures are supplied in the provided text. Without these details it is impossible to determine whether the observed gains are substantive or merely artifacts of the chosen proxy and evaluation protocol.

Authors: We acknowledge that the abstract and high-level summary in the reviewed version do not embed the specific quantitative details. The full evaluation section reports concrete metrics for completeness (recall of tacit elements against ground-truth annotations), semantic quality (ontology alignment precision and coherence scores), direct comparisons to baselines (standard LLM extraction pipelines and conventional knowledge-engineering workflows), and statistical tests (paired t-tests with p-values). Inter-annotator agreement was computed via Cohen's kappa on a double-annotated subset. To resolve the concern we will revise the abstract to include the key numerical results and ensure the Results section features a dedicated table summarizing all metrics, baselines, and agreement statistics with appropriate statistical reporting. This will make the substantive nature of the gains transparent. revision: yes
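
The response above names Cohen's kappa as the agreement statistic. For reference, a minimal sketch of that statistic for two annotators is shown here. The labels and the `cohens_kappa` helper are hypothetical and say nothing about the paper's actual annotation scheme or scores.

```python
# Hypothetical sketch: Cohen's kappa for two annotators labelling the same items.
from collections import Counter


def cohens_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Chance-corrected agreement between two equally long label sequences."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both annotators.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if each annotator labelled independently at their own rates.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Invented labels: was each extracted element tacit or explicit?
annotator_a = ["tacit", "tacit", "explicit", "explicit", "tacit", "explicit", "tacit", "explicit"]
annotator_b = ["tacit", "tacit", "explicit", "tacit", "tacit", "explicit", "explicit", "explicit"]

kappa = cohens_kappa(annotator_a, annotator_b)
print(round(kappa, 2))
```

Here the annotators agree on 6 of 8 items (0.75), chance agreement is 0.5, and kappa is therefore 0.5, which shows why raw agreement alone overstates reliability.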

Circularity Check

0 steps flagged

No circularity: empirical evaluation on proxy domain stands independently of inputs

The paper introduces a neuro-symbolic framework (Logic-Augmented Generation combined with Active Inference for KG construction) and reports empirical improvements in completeness and semantic quality on a stated proxy domain of instructional videos for assembly-like procedures. No equations, parameter-fitting steps, or self-citations appear in the provided text that would reduce the reported results or the advancement claim to a definitional equivalence or forced prediction. The proxy-domain evaluation is presented as an independent test case rather than a renaming or self-referential derivation, leaving the central claims self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no explicit free parameters, axioms, or invented entities; all elements are described at a high level without derivation details.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

Paper passage: "neuro-symbolic framework that combines Logic-Augmented Generation (LAG) and an Active-Inference-inspired approach for ontology-grounded Knowledge Graph construction"

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

Paper passage: "Active-Inference-inspired strategy ... observation phase ... hidden state inference phase ... policy reconstruction phase ... affordance reasoning phase"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

14 extracted references · 14 canonical work pages

[1] Anderson, J.R.: The Architecture of Cognition. Harvard University Press, Cambridge, MA (1983)

[2] Argote, L., Ingram, P.: Knowledge transfer: A basis for competitive advantage in firms. Organizational Behavior and Human Decision Processes 82(1), 150–169 (2000)

[3] Blomqvist, E., Hammar, K., Presutti, V.: Engineering Ontologies with Patterns - The eXtreme Design Methodology. In: Hitzler, P., Gangemi, A., Janowicz, K., Krisnadhi, A., Presutti, V. (eds.) Ontology Engineering with Ontology Design Patterns, Studies on the Semantic Web, vol. 25, pp. 23–50. IOS Press (2016). https://doi.org/10.3233/978-1-61499-676-7-23

[4] Friston, K., FitzGerald, T., Rigoli, F., Schwartenbeck, P., O’Doherty, J., Pezzulo, G.: Active inference and learning. Neuroscience & Biobehavioral Reviews 68, 862–879 (2016). https://doi.org/10.1016/j.neubiorev.2016.06.022

[5] Friston, K., FitzGerald, T., Rigoli, F., Schwartenbeck, P., Pezzulo, G.: Active inference: A process theory. Neural Computation 29(1), 1–49 (2017). https://doi.org/10.1162/NECO_a_00912

[6] Gangemi, A.: Ontology design patterns for semantic web content. In: The Semantic Web – ISWC 2005, Lecture Notes in Computer Science, vol. 3729, pp. 262–276. Springer (2005). https://doi.org/10.1007/11574620_21

[7] Gangemi, A., Nuzzolese, A.G.: Logic augmented generation. Journal of Web Semantics 85 (2025). https://doi.org/10.1016/j.websem.2024.100859

[8] Hogan, A., Blomqvist, E., Cochez, M., D’Amato, C., Melo, G.D., Gutierrez, C., Kirrane, S., Gayo, J.E.L., Navigli, R., Neumaier, S., Ngomo, A.C.N., Polleres, A., Rashid, S.M., Rula, A., Schmelzeisen, L., Sequeda, J., Staab, S., Zimmermann, A.: Knowledge graphs. ACM Computing Surveys 54(4) (2022). https://doi.org/10.1145/3447772

[9] Mihindukulasooriya, N., Tiwari, S., Enguix, C.F., Lata, K.: Text2kgbench: A benchmark for ontology-driven knowledge graph generation from text. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 14266 LNCS, 247–265 (2023). https://doi.org/10.1007/978-3-031-47243-5_14

[10] Paulheim, H.: Knowledge graph refinement: A survey of approaches and evaluation methods. Semantic Web 8(3), 489–508 (2017). https://doi.org/10.3233/SW-160218

[11] Polanyi, M.: The Tacit Dimension. Doubleday and Company, Garden City, NY (1966)

[12] Radford, A., Kim, J.W., Xu, T., Brockman, G., McLeavey, C., Sutskever, I.: Robust speech recognition via large-scale weak supervision. In: International conference on machine learning. pp. 28492–28518. PMLR (2023)

[13] Ryle, G.: The Concept of Mind. Hutchinson, London (1949)

[14] Zhong, L., Wu, J., Li, Q., Peng, H., Wu, X.: A comprehensive survey on automatic knowledge graph construction. ACM Computing Surveys 56(4) (2024). https://doi.org/10.1145/3618295
