arxiv: 2604.13042 · v1 · submitted 2026-02-27 · 💻 cs.DB · cs.AI · cs.SE

A Pythonic Functional Approach for Semantic Data Harmonisation in the ILIAD Project

Pith reviewed 2026-05-15 18:50 UTC · model grok-4.3

classification 💻 cs.DB · cs.AI · cs.SE
keywords semantic data harmonisation · Python functions · Ocean Information Model · RDF generation · ontology design patterns · ILIAD project · data interoperability · digital twins

The pith

Python functions encode ocean ontology patterns so data scientists generate valid RDF through simple calls.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a layered set of Python libraries for semantic data harmonisation in the ILIAD project. Low-level functions expose basic OWL and RDF syntax, mid-level functions wrap the design patterns of the Ocean Information Model, and high-level functions coordinate domain tasks such as harmonising heterogeneous environmental data. This structure lets users avoid learning namespaces, IRIs, OWL constructors, and specialised tools like RML or OTTR while staying inside their existing Python environment. Feedback from ILIAD data scientists reports that the approach meets their practical requirements and increases their direct involvement in harmonisation work, as shown in the aquaculture pilot.

Core claim

The central claim is that a Pythonic functional approach, implemented as libraries organised at three levels of abstraction, encodes the design patterns of the Ocean Information Model so that correct RDF for data harmonisation can be produced by ordinary function calls rather than by writing specialised mapping syntax or mastering semantic web details.

What carries the argument

The three-tier Python function libraries that encode Ocean Information Model (OIM) ontology design patterns, with low-level functions exposing raw RDF/OWL syntax, mid-level functions packaging reusable patterns, and high-level functions orchestrating complete harmonisation tasks.
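
The layering can be pictured with a minimal sketch. It is not the ILIAD library, which is unreleased and builds on RDFLib; to stay dependency-free it represents a graph as an immutable set of N-Triples terms. The function names, the `EX` base IRI, and the record fields are hypothetical, and the SOSA terms used are the standard `sosa:Observation`, `sosa:observedProperty`, `sosa:hasSimpleResult`, and `sosa:resultTime`.

```python
from typing import FrozenSet, Tuple

Triple = Tuple[str, str, str]   # (subject, predicate, object) as N-Triples terms
Graph = FrozenSet[Triple]       # immutable, so no function can mutate shared state

RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
SOSA = "http://www.w3.org/ns/sosa/"
XSD = "http://www.w3.org/2001/XMLSchema#"
EX = "https://example.org/iliad/"  # hypothetical base IRI for minted resources


# Low level: thin wrappers over RDF syntax.
def iri(namespace: str, local: str) -> str:
    return f"<{namespace}{local}>"


def literal(value: object, datatype: str) -> str:
    # No escaping is done here; adequate for numbers and timestamps only.
    return f'"{value}"^^<{XSD}{datatype}>'


# Mid level: one function per modelling pattern.
def observation_pattern(obs: str, prop: str, value: float, time: str) -> Graph:
    return frozenset({
        (obs, RDF_TYPE, iri(SOSA, "Observation")),
        (obs, iri(SOSA, "observedProperty"), prop),
        (obs, iri(SOSA, "hasSimpleResult"), literal(value, "double")),
        (obs, iri(SOSA, "resultTime"), literal(time, "dateTime")),
    })


# High level: a domain task that only knows about input records.
def harmonise_sea_temperature(record: dict) -> Graph:
    obs = iri(EX, f"observation/{record['id']}")
    prop = iri(EX, "property/seaTemperature")
    return observation_pattern(obs, prop, record["temp_c"], record["time"])


def to_ntriples(graph: Graph) -> str:
    return "\n".join(f"{s} {p} {o} ." for s, p, o in sorted(graph))


g = harmonise_sea_temperature(
    {"id": "42", "temp_c": 8.5, "time": "2025-01-01T12:00:00Z"}
)
print(to_ntriples(g))
```

The caller of `harmonise_sea_temperature` never sees a namespace or a predicate name, which is the usability argument the paper makes; whether the real mid-level functions encode the OIM patterns faithfully is a separate question that this sketch cannot answer.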

If this is right

  • Data scientists can perform harmonisation tasks without learning RML, OTTR, or semantic web syntax.
  • High-level functions compose mid-level pattern functions to handle entire domain workflows.
  • The generated RDF directly supports interoperable digital twins of the ocean.
  • The same library structure applies to the ILIAD aquaculture pilot and other environmental data sets.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same multi-level library pattern could be replicated for other modular ontology families outside ocean data.
  • Integration into standard data-science notebooks might reduce transcription errors that arise when moving between mapping tools and Python analysis code.
  • Equivalence tests against existing RML mappings on the same datasets would provide an independent check of semantic fidelity.
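
One way such an equivalence test might look, assuming both pipelines can serialise to N-Triples and emit no blank nodes, is a plain set comparison of statements. The helper names and sample triples below are hypothetical. The comparison is lexical, so `"8.5"` and `"8.50"` would count as different even though they denote the same value; graphs with blank nodes would need an isomorphism check instead (RDFLib offers `rdflib.compare.isomorphic` for that case).

```python
from typing import Dict, Iterable, List, Set


def ground_statements(lines: Iterable[str]) -> Set[str]:
    """Collect non-empty, non-comment N-Triples lines as a set of statements."""
    statements = set()
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Conservative guard: set equality is only sound for ground triples.
        if line.startswith("_:") or " _:" in line:
            raise ValueError("blank nodes present; use graph isomorphism instead")
        statements.add(line)
    return statements


def diff_outputs(python_out: Iterable[str], rml_out: Iterable[str]) -> Dict[str, List[str]]:
    """Report statements produced by one pipeline but not the other."""
    a = ground_statements(python_out)
    b = ground_statements(rml_out)
    return {"only_in_python": sorted(a - b), "only_in_rml": sorted(b - a)}


# Hypothetical outputs of the two pipelines for the same input record.
PY = [
    '<https://example.org/obs/1> <http://www.w3.org/ns/sosa/hasSimpleResult> '
    '"8.5"^^<http://www.w3.org/2001/XMLSchema#double> .',
    '<https://example.org/obs/1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> '
    '<http://www.w3.org/ns/sosa/Observation> .',
]
RML = list(reversed(PY)) + [""]  # same statements, different order, trailing blank line

report = diff_outputs(PY, RML)
print(report)
```

An empty report on both sides says only that the two pipelines agree with each other, not that either one is correct with respect to the OIM.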

Load-bearing premise

The Python functions must correctly and completely encode the OIM design patterns so that every generated RDF graph remains semantically valid for all intended use cases.

What would settle it

Apply the high-level functions to a sample ILIAD aquaculture dataset, then check whether the resulting RDF validates against the OIM ontologies and supports expected digital-twin queries; mismatch or validation failure would falsify the claim.
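
A rough, dependency-free stand-in for the validation step is sketched below. It checks that every node typed as an observation carries exactly one value for each required property. The `REQUIRED` table is a hypothetical constraint set written with prefixed names for brevity; it is not taken from the OIM, and a real check would more plausibly run SHACL shapes [24] over the generated graph.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]
RDF_TYPE = "rdf:type"

# Hypothetical cardinality constraints: class -> {predicate: exact count}.
REQUIRED: Dict[str, Dict[str, int]] = {
    "sosa:Observation": {
        "sosa:observedProperty": 1,
        "sosa:hasSimpleResult": 1,
        "sosa:resultTime": 1,
    }
}


def check_shapes(triples: Iterable[Triple],
                 required: Dict[str, Dict[str, int]] = REQUIRED) -> List[str]:
    """Return a sorted list of human-readable constraint violations."""
    by_subject = defaultdict(lambda: defaultdict(set))
    for s, p, o in triples:
        by_subject[s][p].add(o)

    problems = []
    for subject, props in by_subject.items():
        for cls in props.get(RDF_TYPE, ()):
            for pred, count in required.get(cls, {}).items():
                found = len(props.get(pred, ()))
                if found != count:
                    problems.append(
                        f"{subject}: expected {count} {pred}, found {found}"
                    )
    return sorted(problems)


good = [
    ("ex:obs1", RDF_TYPE, "sosa:Observation"),
    ("ex:obs1", "sosa:observedProperty", "ex:seaTemperature"),
    ("ex:obs1", "sosa:hasSimpleResult", "8.5"),
    ("ex:obs1", "sosa:resultTime", "2025-01-01T12:00:00Z"),
]
bad = good[:-1]  # drop the result time

print(check_shapes(good))
print(check_shapes(bad))
```

A non-empty list for output produced by the high-level functions would count against the claim; an empty list is only as strong as the constraints that were checked.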

Figures

Figures reproduced from arXiv: 2604.13042 by Erik Johan Nystad, Francisco Martín-Recuerda.

Figure 1: Data-ingestion and harmonisation pipeline in the ILIAD Aquaculture pilot.
Figure 2: ILIAD Dashboard displaying a sea temperature measurement at a specific time and location.
OTTR is a framework for defining and instantiating OWL ontology design patterns, supported by a family of template languages and associated tools. Unlike RML, which specifies mappings between JSON fields and RDF predicates, OTTR takes a template-based approach …
Figure 3: Example of RML mappings for sea-temperature observations and results.
… of classes and functions. Even the use of Python libraries for interpreting OTTR templates, such as maplib [1], were not well-accepted by ILIAD data scientists. They demanded a pure Python approach where no new syntaxes or tools are required. In response, we designed a …
Figure 4: Example of OTTR template for sea-temperature observation.
… more functional Pythonic approach, where the modelling patterns themselves are expressed as ordinary Python functions without an intermediate template language. This approach is presented in the next section.
3. A Pythonic Template-Based Approach to Semantic Data Harmonisation
Instead of describing patterns in a separate OTTR file or an RML mapping …
Figure 5: Diagram of a hierarchy of Python functions.
As a result of the previous discussion, we defined five main design principles for our Pythonic approach for semantic data harmonisation. (1) Functional programming for encoding ontology templates. Ontology template functions are defined according to core functional programming principles, such as immutability, composability, and the use of small, reusable functi…
Figure 6: Example of a high-level harmonisation function for sea temperature.
A possible implementation of the latter function is depicted in …
Figure 7: Example of a mid-level harmonisation function for sea temperature.
    def create_sosa_result(
        measured_value: float,
        observation_uri: URIRef,
        unit: URIRef,
        result_uri: URIRef,
    ) -> Graph:
        g = Graph()
        # Add the result triples to the graph
        g.add((result_uri, RDF.type, SOSA.Result))
        g.add((result_uri, SOSA.hasValue, Literal(measured_value, datatype=XSD.float)))
        g.add((result_uri, SOSA.hasUnit, unit))
        # Add as a…
Figure 8: Examples of low-level functions used in the sea temperature example.
… functions means the implementation is not purely functional. This is a deliberate trade-off: RDFLib provides mature and well-tested RDF serialisation capabilities, and wrapping it in a strictly functional interface would have added complexity without clear practical benefit. The functional discipline is maintained at the mid- and high-le…
Figure 9: Data pipeline for generating QUDT Python functions using Jinja templates.
… approach greatly improves maintainability and scalability, ensuring that the library remains synchronised with evolving semantic standards at minimal human cost.
4. Related Work
Semantic data harmonisation has been addressed through various approaches, each balancing expressiveness, automation, and usability differently. In Section 2…
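
The generation step named in the Figure 9 caption can be approximated as follows. The paper reports using Jinja templates [36]; this sketch substitutes the standard library's `string.Template` so that it runs without extra dependencies. The `unit_*` naming scheme and the two-entry `UNITS` extract are hypothetical, although the two QUDT unit IRIs themselves are real.

```python
from string import Template

# One generated function per unit; $name, $label and $iri are filled per entry.
FUNC_TEMPLATE = Template('''\
def unit_$name() -> str:
    """Return the IRI of the QUDT unit $label."""
    return "$iri"
''')

# Hypothetical extract of unit metadata; a real pipeline would read the
# QUDT vocabulary rather than a hand-written list.
UNITS = [
    {"name": "deg_c", "label": "degree Celsius",
     "iri": "http://qudt.org/vocab/unit/DEG_C"},
    {"name": "m", "label": "metre",
     "iri": "http://qudt.org/vocab/unit/M"},
]


def render_module(units) -> str:
    """Render the source code of a module with one function per unit."""
    return "\n\n".join(FUNC_TEMPLATE.substitute(u) for u in units)


source = render_module(UNITS)

# Load the generated code to show it is valid Python. The "name" values must
# be trusted, valid identifiers, since they end up in executable source.
namespace = {}
exec(compile(source, "<generated_qudt>", "exec"), namespace)

print(source)
print(namespace["unit_deg_c"]())
```

Regenerating the module whenever the vocabulary changes is what would keep such a library synchronised with the standard, which is the maintainability benefit the excerpt describes.
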
Original abstract

Semantic data harmonisation is a central requirement in the ILIAD project, where heterogeneous environmental data must be harmonised according to the Ocean Information Model (OIM), a modular family of ontologies for enabling the implementation of interoperable Digital Twins of the Ocean. Existing approaches to Semantic Data Harmonisation, such as RML and OTTR, offer valuable abstractions but require extensive knowledge of the technical intricacies of the OIM and the Semantic Web standards, including namespaces, IRIs, OWL constructors, and ontology design patterns. Furthermore, RML and OTTR oblige practitioners to learn specialised syntaxes and dedicated tooling. Data scientists in ILIAD have found these approaches overly cumbersome and have therefore expressed the need for a solution that abstracts away these technical details while remaining seamlessly integrated into their Python-based environments. To address these requirements, we have developed a Pythonic functional approach to semantic data harmonisation that enables users to produce correct RDF through simple function calls. The functions, structured as Python libraries, encode the design patterns of the OIM and are organised across multiple levels of abstraction. Low-level functions directly expose OWL and RDF syntax, mid-level functions encapsulate ontology design patterns, and high-level domain-specific functions orchestrate data harmonisation tasks by invoking mid-level functions. According to feedback from ILIAD data scientists, this approach satisfies their requirements and substantially enhances their ability to participate in harmonisation activities. In this paper, we present the details of our Pythonic functional approach to semantic data harmonisation and demonstrate its applicability within the ILIAD Aquaculture pilot.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper presents a layered Python library for semantic data harmonisation in the ILIAD project. Low-level functions expose OWL/RDF constructors, mid-level functions encapsulate OIM ontology design patterns, and high-level domain-specific functions allow data scientists to generate RDF via simple calls. The authors claim this abstracts away Semantic Web technicalities, integrates with Python workflows, and meets ILIAD requirements based on feedback from project data scientists, with a demonstration in the Aquaculture pilot.

Significance. If the functions correctly encode OIM patterns and produce valid RDF, the work could meaningfully lower barriers for domain experts to contribute to semantic interoperability in environmental data projects like Digital Twins of the Ocean, offering a practical alternative to RML/OTTR for Python-centric teams.

major comments (2)
  1. [Abstract and demonstration section] The central claim that the layered functions produce semantically valid RDF for all intended use cases rests solely on unspecified qualitative feedback from ILIAD data scientists. No explicit input-to-triple mappings, OWL reasoner results, SHACL validation outputs, test suites, or quantitative comparison of generated RDF against ground truth are reported anywhere in the manuscript, which is load-bearing for the correctness and applicability assertions.
  2. [Implementation and pilot demonstration] The libraries are described as encoding OIM design patterns but are not released, and the Aquaculture pilot demonstration provides no concrete examples of function calls, generated triples, or verification steps, leaving the mid- and high-level abstractions untested in the text.
minor comments (2)
  1. [Approach description] The paper would benefit from a table or figure explicitly mapping high-level function signatures to the OIM patterns they invoke.
  2. [Low-level functions] No discussion of edge cases, error handling, or namespace/IRI management in the low-level functions is provided.
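
To make the second minor comment concrete, the sketch below shows the kind of IRI handling a low-level helper might need. It is an illustration under stated assumptions, not the paper's code: `mint_iri` is a hypothetical name, the base IRI is an example value, and the helper assumes slash-terminated namespaces (hash namespaces would need different joining logic).

```python
from urllib.parse import quote, urlsplit


def mint_iri(base: str, *segments: str) -> str:
    """Build an IRI from a base and path segments, failing loudly on bad input."""
    parts = urlsplit(base)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"base must be an absolute http(s) IRI: {base!r}")
    if not segments:
        raise ValueError("at least one path segment is required")

    encoded = []
    for seg in segments:
        seg = str(seg).strip()
        if not seg:
            raise ValueError("empty path segment")
        # safe="" also percent-encodes "/", so a raw identifier such as
        # "site A/1" cannot silently introduce an extra path level.
        encoded.append(quote(seg, safe=""))

    return base.rstrip("/") + "/" + "/".join(encoded)


print(mint_iri("https://example.org/iliad/", "observation", "site A/1"))
```

Raising on malformed input, rather than emitting a syntactically broken IRI, keeps errors close to the data record that caused them.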

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address the major points below and agree to strengthen the demonstration with concrete examples and mappings in a revision.

Point-by-point responses
  1. Referee: [Abstract and demonstration section] The central claim that the layered functions produce semantically valid RDF for all intended use cases rests solely on unspecified qualitative feedback from ILIAD data scientists. No explicit input-to-triple mappings, OWL reasoner results, SHACL validation outputs, test suites, or quantitative comparison of generated RDF against ground truth are reported anywhere in the manuscript, which is load-bearing for the correctness and applicability assertions.

    Authors: We acknowledge that the manuscript supports its central claim primarily through qualitative feedback from ILIAD data scientists rather than explicit technical validations such as input-to-triple mappings or SHACL outputs. This feedback directly reflects the project's requirements and usability for domain experts. In revision we will add representative input-to-triple mappings and verification steps drawn from the Aquaculture pilot to make the correctness claims more concrete and testable. revision: yes

  2. Referee: [Implementation and pilot demonstration] The libraries are described as encoding OIM design patterns but are not released, and the Aquaculture pilot demonstration provides no concrete examples of function calls, generated triples, or verification steps, leaving the mid- and high-level abstractions untested in the text.

    Authors: The manuscript focuses on the methodological and functional approach rather than serving as a software release note; the libraries remain internal to the ILIAD project at present. We agree that the pilot section lacks concrete illustrations. In the revised version we will insert explicit examples of high-level and mid-level function calls, the resulting triples, and the verification steps performed against OIM patterns, allowing readers to evaluate the abstractions directly from the text. revision: yes

Circularity Check

0 steps flagged

No circularity: software design description without derivational steps

full rationale

The paper describes a multi-level Python library that encodes OIM ontology design patterns into functions for RDF generation. No equations, fitted parameters, predictions, or uniqueness theorems appear. Core claims rest on qualitative user feedback rather than any self-referential derivation or self-citation chain. The work is a self-contained implementation report whose correctness assertions are external to any internal reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the assumption that OIM patterns can be faithfully captured by Python functions and that user feedback is sufficient evidence of correctness and usability. No quantitative parameters are introduced; the only new entities are the high-level harmonisation functions listed below.

axioms (1)
  • domain assumption: OIM ontology design patterns can be encoded in Python functions without semantic loss or incorrect RDF output.
    Invoked when mid-level functions are said to encapsulate the patterns.
invented entities (1)
  • High-level domain-specific harmonisation functions (no independent evidence)
    purpose: Orchestrate calls to mid-level pattern functions for concrete ILIAD tasks
    New functions introduced by the authors

Reference graph

Works this paper leans on

42 extracted references · 42 canonical work pages · 2 internal anchors

  [1] Magnus Bakken. maplib: interactive, literal RDF model mapping for industry. IEEE Access, 11:39990–40005, 2023.
  [2] Stefan Bischof, Stefan Decker, Thomas Krennwallner, Nuno Lopes, and Axel Polleres. Mapping between rdf and xml with xsparql. Journal on Data Semantics, 1(3):147–185, 2012.
  [3] Jeremy J Carroll, Ian Dickinson, Chris Dollin, Dave Reynolds, Andy Seaborne, and Kevin Wilkinson. Jena: implementing the semantic web recommendations. In Proceedings of the 13th international World Wide Web conference on Alternate track papers & posters, pages 74–83, 2004.
  [4] Andrea Cimmino and Raúl García-Castro. Helio: a framework for implementing the life cycle of knowledge graphs. Semantic Web, 15(1):223–249, 2024.
  [5] Knut-Frode Dagestad, Johannes Röhrs, Øyvind Breivik, and Bjørn Ådlandsvik. Opendrift v1.0: a generic framework for trajectory modelling. Geoscientific Model Development, 11(4):1405–1420, 2018.
  [6] Dagster Labs. Dagster. https://dagster.io/, 2024.
  [7] Anastasia Dimou, Miel Vander Sande, Pieter Colpaert, Ruben Verborgh, Erik Mannens, and Rik Van de Walle. Rml: A generic language for integrated rdf mappings of heterogeneous data. In Proceedings of the Workshop on Linked Data on the Web (LDOW), 2014.
  [8] Aldo Gangemi and Valentina Presutti. Ontology design patterns. In Handbook on ontologies, pages 221–243. Springer, 2009.
  [9] Herminio García-González, Iovka Boneva, Sławek Staworko, José Emilio Labra-Gayo, and Juan Manuel Cueva Lovelle. Shexml: improving the usability of heterogeneous data mapping languages for first-time users. PeerJ Computer Science, 6:e318, 2020.
  [10] QUDT Working Group. QUDT – quantities, units, dimensions and data types ontologies. https://github.com/qudt/qudt-public-repo, 2025.
  [11] Karl Hammar. Ontology design patterns in Webprotégé. In ISWC 2015 Posters & Demonstrations Track co-located with the 14th International Semantic Web Conference (ISWC-2015), Bethlehem, USA, October 11, 2015. CEUR-WS, 2015.
  [12] Pieter Heyvaert, Ben De Meester, Anastasia Dimou, and Ruben Verborgh. Declarative rules for linked data generation at your fingertips! In European Semantic Web Conference, pages 213–217. Springer, 2018.
  [13] Pieter Heyvaert, Dylan Van Assche, Ben De Meester, Gerald Haesendonck, Els de Vleeschauwer, and Sitt Min Oo. RMLMapper-JAVA. https://github.com/RMLio/rmlmapper-java, 2023.
  [14] Pascal Hitzler, Aldo Gangemi, Krzysztof Janowicz, Adila Krisnadhi, and Valentina Presutti. Ontology engineering with ontology design patterns: foundations and applications, volume 25. IOS Press, 2016.
  [15] Matthew Horridge and Sean Bechhofer. The OWL API: A Java API for OWL ontologies. Semantic web, 2(1):11–21, 2011.
  [16] Paul Hudak. Conception, evolution, and application of functional programming languages. ACM Computing Surveys (CSUR), 21(3):359–411, 1989.
  [17] Enrique Iglesias, Samaneh Jozashoori, David Chaves-Fraga, Diego Collarana, and Maria-Esther Vidal. Sdm-rdfizer: An rml interpreter for the efficient creation of rdf knowledge graphs. In Proceedings of the 29th ACM international conference on Information & Knowledge Management, pages 3039–3046, 2020.
  [18] ILIAD Consortium. Iliad: Digital twin of the ocean. https://ocean-twin.eu/, 2025. Project website. Accessed: 2025-02-22.
  [19] ILIAD Consortium. Ocean Information Model (OIM). https://github.com/ILIAD-ocean-twin/OIM, 2025.
  [20] Filip Ilievski, Daniel Garijo, Hans Chalupsky, Naren Teja Divvala, Yixiang Yao, Craig Rogers, Rongpeng Li, Jun Liu, Amandeep Singh, Daniel Schwabe, et al. Kgtk: a toolkit for large knowledge graph manipulation and analysis. In International Semantic Web Conference, pages 278–293. Springer, 2020.
  [21] Rebecca C Jackson, James P Balhoff, Eric Douglass, Nomi L Harris, Christopher J Mungall, and James A Overton. Robot: a tool for automating ontology workflows. BMC bioinformatics, 20(1):407, 2019.
  [22] Krzysztof Janowicz, Armin Haller, Simon JD Cox, Danh Le Phuoc, and Maxime Lefrançois. SOSA: A lightweight ontology for sensors, observations, samples, and actuators. Journal of Web Semantics, 56:1–10, 2019.
  [23] Eugene Kindler and Ivan Krivy. Object-oriented simulation of systems with sophisticated control. International Journal of General Systems, 40(3):313–343, 2011.
  [24] Holger Knublauch and Dimitris Kontokostas. Shapes Constraint Language (SHACL). https://www.w3.org/TR/shacl/, 2017. W3C Recommendation.
  [25] Daniel Krech et al. RDFLib. https://github.com/RDFLib/rdflib, 2025.
  [26] Bernd Krieg-Brückner and Till Mossakowski. Generic ontologies and generic ontology design patterns. In WOP@ISWC, 2017.
  [27] Jean-Baptiste Lamy. Owlready2. https://github.com/pwin/owlready2, 2024.
  [28] Maxime Lefrançois, Raphaël Troncy, and Fabien Gandon. Sparql-generate: A sparql extension for generating rdf from heterogeneous data. Proceedings of the 14th Extended Semantic Web Conference (ESWC), 2017.
  [29] Phillip Lord. The semantic web takes wing: Programming ontologies with tawny-owl. arXiv preprint arXiv:1303.0213, 2013.
  [30] F. Maali, E. Simperl, et al. Data Catalog Vocabulary (DCAT) – Version 2. https://www.w3.org/TR/vocab-dcat-2/, 2019. W3C Recommendation.
  [31] Thomas Minier et al. pyOTTR. https://github.com/Callidon/pyOTTR
  [32] Till Mossakowski, Mihai Codescu, Fabian Neuhaus, and Oliver Kutz. The distributed ontology, modeling and specification language–dol. In The Road to Universal Logic: Festschrift for the 50th Birthday of Jean-Yves Béziau Volume II, pages 489–520. Springer, 2015.
  [33] Norwegian Coastal Administration. Barentswatch API. https://www.barentswatch.no
  [34] Ontology Engineering Group, UPM. Morph-RDB. https://github.com/oeg-upm/morph-rdb, 2020.
  [35] Ontop Team. Ontop. https://github.com/ontop/ontop, 2024.
  [36] Pallets Projects. Jinja documentation. https://jinja.palletsprojects.com/, 2024. Accessed: 2025-02-03.
  [37] Eric Prud’hommeaux, Jose Emilio Labra Gayo, and Harold Solbrig. Shape expressions: an rdf validation and transformation language. In Proceedings of the 10th International Conference on Semantic Systems, pages 32–40, 2014.
  [38] Cogan Shimizu and Karl Hammar. Comodide–the comprehensive modular ontology engineering ide. In ISWC 2019 Satellite Tracks (Posters & Demonstrations, Industry, and Outrageous Ideas) co-located with 18th International Semantic Web Conference (ISWC 2019) Auckland, New Zealand, October 26-30, 2019, volume 2456, pages 249–252. CEUR-WS, 2019.
  [39] Cogan Shimizu, Quinn Hirt, and Pascal Hitzler. Modl: A modular ontology design library. arXiv preprint arXiv:1904.05405, 2019.
  [40] Martin Georg Skjæveland and Leif Harald Karlsen. The reasonable ontology templates framework. Transactions on Graph Data and Knowledge, 2(2):5–1, 2024.
  [41] Tarql Contributors. TARQL: SPARQL for Tables. https://tarql.github.io/, 2019.
  [42] World Wide Web Consortium. W3C Semantic Web standards. https://www.w3.org/2001/sw/wiki/Main_Page