pith. sign in

arxiv: 1907.01627 · v1 · pith:JKHLZGCNnew · submitted 2019-07-02 · 💻 cs.DB · cs.AI

Rule Applicability on RDF Triplestore Schemas

Pith reviewed 2026-05-25 10:04 UTC · model grok-4.3

classification 💻 cs.DB cs.AI
keywords RDFtriplestore schemarule applicabilityquery rewritingcritical instanceIoThealth policiesdata schema
0
0 comments X

The pith

A query rewriting method decides rule applicability on RDF triplestore schemas and produces output schemas.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper defines a triplestore schema as a way to model collections of RDF graphs. It then shows how to determine whether a given set of rules applies to graphs that fit the schema and what the schema of the graphs after rule application would look like. Two methods are developed for this task: one that builds a canonical critical instance of the schema, and a new method that rewrites the rules into queries. The rewriting method is shown through theory, complexity analysis, and experiments to be more efficient while remaining correct.

Core claim

Given a triplestore schema and a set of rules, rule applicability can be decided and an output schema produced by rewriting the rules into queries that check for matches within the schema constraints, avoiding the need to construct a full canonical instance of the schema.

What carries the argument

The query rewriting technique that transforms rule application checks into schema queries to determine applicability without materializing data instances.

Load-bearing premise

The triplestore schema is assumed to exactly capture the family of RDF graphs to which the rules might be applied.

What would settle it

A counterexample schema and rule set where the rewriting method and the critical instance method produce different applicability results or different output schemas.

Figures

Figures reproduced from arXiv: 1907.01627 by George Konstantinidis, Murat \c{S}ensoy, Paolo Pareti, Timothy J. Norman.

Figure 2
Figure 2. Figure 2: Average time to compute 20 schema consequences using [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗
Figure 1
Figure 1. Figure 1: Average time to compute 20 schema consequences using [PITH_FULL_IMAGE:figures/full_fig_p005_1.png] view at source ↗
read the original abstract

Rule-based systems play a critical role in health and safety, where policies created by experts are usually formalised as rules. When dealing with increasingly large and dynamic sources of data, as in the case of Internet of Things (IoT) applications, it becomes important not only to efficiently apply rules, but also to reason about their applicability on datasets confined by a certain schema. In this paper we define the notion of a triplestore schema which models a set of RDF graphs. Given a set of rules and such a schema as input we propose a method to determine rule applicability and produce output schemas. Output schemas model the graphs that would be obtained by running the rules on the graph models of the input schema. We present two approaches: one based on computing a canonical (critical) instance of the schema, and a novel approach based on query rewriting. We provide theoretical, complexity and evaluation results that show the superior efficiency of our rewriting approach.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper defines a triplestore schema as a structure that models a set of RDF graphs. Given a set of rules and such a schema, it proposes two methods to decide rule applicability and produce an output schema modeling the graphs after rule application: (1) computation of a canonical (critical) instance of the schema, and (2) a novel query-rewriting approach. Theoretical results, complexity analysis, and experimental evaluation are claimed to establish that the rewriting method is more efficient.

Significance. If the soundness and complexity claims hold, the work would be significant for rule-based policy systems in IoT and safety-critical domains by enabling applicability checks and schema evolution without materializing full instances. The dual-method presentation (canonical instance plus rewriting) and the explicit comparison of their efficiency constitute a clear contribution if the derivations are gap-free.

major comments (2)
  1. [Abstract, §1] Abstract and §1: the central efficiency claim rests on the rewriting approach being sound and complete for the given schema; however, the manuscript provides no mechanism, proof obligation, or discussion showing how a schema designer can guarantee that the input schema exactly captures (neither over- nor under-approximates) the intended family of RDF graphs. Any mismatch renders both the applicability decision and the produced output schema incorrect for the real data.
  2. [§4] §4 (or the section presenting the rewriting reduction): the claim that the rewriting approach yields a parameter-free decision procedure is load-bearing for the superiority result, yet the reduction appears to inherit the same schema-fidelity precondition as the canonical-instance method; without an explicit statement of the preconditions under which the rewriting is guaranteed to be both sound and complete, the complexity comparison cannot be evaluated.
minor comments (2)
  1. [§2] Notation for the triplestore schema (e.g., how constraints on predicates and objects are encoded) should be introduced with a small running example before the formal definitions.
  2. [Evaluation] The experimental section should report the precise schema sizes, rule counts, and query-rewriting times rather than aggregated speed-up factors alone.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address the concerns about schema fidelity and the preconditions for the rewriting method point by point.

read point-by-point responses
  1. Referee: [Abstract, §1] Abstract and §1: the central efficiency claim rests on the rewriting approach being sound and complete for the given schema; however, the manuscript provides no mechanism, proof obligation, or discussion showing how a schema designer can guarantee that the input schema exactly captures (neither over- nor under-approximates) the intended family of RDF graphs. Any mismatch renders both the applicability decision and the produced output schema incorrect for the real data.

    Authors: The triplestore schema is defined as an input that models a set of RDF graphs, and both methods (canonical instance and rewriting) are proven sound and complete with respect to this schema. The paper focuses on the problem of rule applicability checking given such a schema, rather than on how to construct or verify the schema itself. This is analogous to how query rewriting in databases assumes a given schema. We agree that a discussion of this modeling assumption would strengthen the paper and will add a paragraph in §1 clarifying that the results hold relative to the provided schema, and that fidelity to real-world data is the designer's responsibility. No automated mechanism is provided because it would require access to the actual instance or additional oracles, which is outside the paper's scope. revision: partial

  2. Referee: [§4] §4 (or the section presenting the rewriting reduction): the claim that the rewriting approach yields a parameter-free decision procedure is load-bearing for the superiority result, yet the reduction appears to inherit the same schema-fidelity precondition as the canonical-instance method; without an explicit statement of the preconditions under which the rewriting is guaranteed to be both sound and complete, the complexity comparison cannot be evaluated.

    Authors: We will revise the presentation in §4 to explicitly state the preconditions: the input must be a valid triplestore schema (as defined in §3), and the rules are standard Datalog-style rules over RDF. Under these conditions, the rewriting is sound and complete, and parameter-free in the sense that it does not require choosing additional parameters such as instance sizes or approximations. The complexity results compare the two methods under identical assumptions on the input. We believe this clarification will address the concern without altering the core claims. revision: yes

Circularity Check

0 steps flagged

No circularity: algorithmic constructions are self-contained

full rationale

The paper defines triplestore schemas as modeling sets of RDF graphs and presents two direct algorithmic methods (canonical instance and query rewriting) to decide rule applicability and produce output schemas. These are supported by theoretical results, complexity analysis, and evaluation, with no reduction of any central claim to a fitted parameter, self-citation chain, or definitional equivalence. The input schema assumption is stated explicitly but does not create a circular derivation; the methods operate on the given schema without smuggling in the target result by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entities

The paper introduces the triplestore schema as a modelling device and relies on standard notions of RDF entailment and rule application; no numerical free parameters are mentioned.

axioms (2)
  • domain assumption A triplestore schema can be defined that models a set of RDF graphs
    Central modelling step stated in the abstract.
  • domain assumption Rule applicability can be decided by testing on a canonical instance or by query rewriting
    The two approaches presuppose that these reductions preserve semantics.
invented entities (1)
  • triplestore schema no independent evidence
    purpose: Compact representation of an entire family of RDF graphs for rule applicability checking
    New modelling construct introduced to stand for sets of graphs.

pith-pipeline@v0.9.0 · 5696 in / 1437 out tokens · 40330 ms · 2026-05-25T10:04:17.026330+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. A Blueprint for AI-Driven Software Quality: Integrating LLMs with Established Standards

    cs.SE 2025-05 unverdicted novelty 3.0

    Survey mapping LLM applications in software quality assurance to established standards including ISO/IEC 12207, ISO 25010, CMMI, and TMM, with case studies, challenges, and future directions.

Reference graph

Works this paper leans on

29 extracted references · 29 canonical work pages · cited by 1 Pith paper

  1. [1]

    Addison-Wesley Longman Publishing Co., Inc

    Abiteboul, S., Hull, R., Vianu, V .: Foundations of databases: the logical level. Addison-Wesley Longman Publishing Co., Inc. (1995)

  2. [2]

    CoRR abs/1801.09061 (2018), http://arxiv

    Bassiliades, N.: SWRL2SPIN: A tool for transforming SWRL rule bases in OWL ontologies to object-oriented SPIN rules. CoRR abs/1801.09061 (2018), http://arxiv. org/abs/1801.09061

  3. [3]

    Data Knowl

    Beimel, D., Peleg, M.: Editorial: Using owl and swrl to represent and reason with situation-based access control policies. Data Knowl. Eng. 70(6), 596–615 (2011)

  4. [4]

    In: Proceedings of the 36th ACM SIGMOD- SIGACT-SIGAI Symposium on Principles of Database Systems

    Benedikt, M., Konstantinidis, G., Mecca, G., Motik, B., Papotti, P., Santoro, D., Tsamoura, E.: Benchmarking the chase. In: Proceedings of the 36th ACM SIGMOD- SIGACT-SIGAI Symposium on Principles of Database Systems. pp. 37–52. ACM (2017)

  5. [5]

    Semantic Web 8(3), 471–487 (2017)

    Calvanese, D., Cogrel, B., Komla-Ebri, S., Kontchakov, R., Lanti, D., Rezk, M., Rodriguez-Muro, M., Xiao, G.: Ontop: Answering SPARQL queries over relational databases. Semantic Web 8(3), 471–487 (2017)

  6. [6]

    IEEE Transactions on Knowledge and Data Engineering 1(1), 146–166 (1989)

    Ceri, S., Gottlob, G., Tanca, L.: What you always wanted to know about datalog (and never dared to ask). IEEE Transactions on Knowledge and Data Engineering 1(1), 146–166 (1989)

  7. [7]

    2016 IEEE Int

    Chaochaisit, W., Sakamura, K., Koshizuka, N., Bessho, M.: CSV-X: A Linked Data Enabled Schema Lan- guage, Model, and Processing Engine for Non-Uniform CSV. 2016 IEEE Int. Conf. on Internet of Things (iThings) and IEEE Green Computing and Communica- tions (GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom) and IEEE Smart Data (Smart- Data) pp...

  8. [8]

    W3C Recom- mendation, W3C (2014), http://www.w3.org/TR/2014/ REC-rdf11-concepts-20140225/

    Cyganiak, R., Wood, D., Markus Lanthaler, G.: RDF 1.1 Concepts and Abstract Syntax. W3C Recom- mendation, W3C (2014), http://www.w3.org/TR/2014/ REC-rdf11-concepts-20140225/

  9. [9]

    ACM Computing Surveys (CSUR) 33(3), 374– 425 (2001)

    Dantsin, E., Eiter, T., Gottlob, G., V oronkov, A.: Complexity and expressive power of logic program- ming. ACM Computing Surveys (CSUR) 33(3), 374– 425 (2001)

  10. [10]

    International Journal of Prod- uct Lifecycle Management 7(1), 75–93 (2014), pMID: 65458

    Fortineau, V ., Fiorentini, X., Paviot, T., Louis-Sidney, L., Lamouri, S.: Expressing formal rules within ontology-based models using SWRL: an application to the nuclear industry. International Journal of Prod- uct Lifecycle Management 7(1), 75–93 (2014), pMID: 65458

  11. [11]

    In: International Semantic Web Conference

    Glimm, B., Kazakov, Y ., Liebig, T., Tran, T.K., Vialard, V .: Abstraction refinement for ontology materialization. In: International Semantic Web Conference. pp. 180–

  12. [12]

    In: 26th International Conference on World Wide Web Companion

    Gyrard, A., Serrano, M., Jares, J.B., Datta, S.K., Ali, M.I.: Sensor-based Linked Open Rules (S-LOR): An Automated Rule Discovery Approach for IoT Applica- tions and Its Use in Smart Cities. In: 26th International Conference on World Wide Web Companion. pp. 1153–

  13. [13]

    W3C Recommendation, W3C (2013), https://www.w3

    Harris, S., Seaborne, A.: SPARQL 1.1 Query Language. W3C Recommendation, W3C (2013), https://www.w3. org/TR/sparql11-query/

  14. [14]

    In: Rules and Reasoning

    Hecham, A., Bisquert, P., Croitoru, M.: On the Chase for All Provenance Paths with Existential Rules. In: Rules and Reasoning. pp. 135–150. Springer Interna- tional Publishing (2017)

  15. [15]

    W3C Member Submission, W3C (2004), https://www.w3.org/Submission/SWRL/

    Horrocks, I., Patel-Schneider, P.F., Boley, H., Tabet, S., Grosof, B., Dean, M., et al.: SWRL: A se- mantic web rule language combining OWL and RuleML. W3C Member Submission, W3C (2004), https://www.w3.org/Submission/SWRL/

  16. [16]

    6331 on Occupational Health and Safety (2012), https://www.ilo.org/dyn/natlex/natlex4.detail?p lang= fr&p isn=92011

    International Labour Organization: Act No. 6331 on Occupational Health and Safety (2012), https://www.ilo.org/dyn/natlex/natlex4.detail?p lang= fr&p isn=92011

  17. [17]

    W3C Member Submission, W3C (2011), http://www.w3.org/Submission/spin-sparql/

    Knublauch, H.: SPIN - SPARQL Syntax. W3C Member Submission, W3C (2011), http://www.w3.org/Submission/spin-sparql/

  18. [18]

    W3C Recommendation, W3C (2017), https://www.w3.org/TR/shacl/

    Knublauch, H., Kontokostas, D.: Shapes constraint lan- guage (SHACL). W3C Recommendation, W3C (2017), https://www.w3.org/TR/shacl/

  19. [19]

    W3C Recommendation, W3C (2017), https:// www.w3.org/TR/2017/REC-vocab-ssn-20171019/

    Lefranc ¸ois, M., Cox, S., Taylor, K., Haller, A., Janow- icz, K., Phuoc, D.L.: Semantic Sensor Network On- tology. W3C Recommendation, W3C (2017), https:// www.w3.org/TR/2017/REC-vocab-ssn-20171019/

  20. [20]

    In: Proc

    Marnette, B.: Generalized schema-mappings: from ter- mination to tractability. In: Proc. of the twenty-eighth ACM SIGMOD-SIGACT-SIGART symp. on Principles of database systems. pp. 13–22. ACM (2009)

  21. [21]

    IEEE Communications Surveys Tu- torials 16(1), 414–454 (2014)

    Perera, C., Zaslavsky, A., Christen, P., Georgakopou- los, D.: Context Aware Computing for The Internet of Things: A Survey. IEEE Communications Surveys Tu- torials 16(1), 414–454 (2014)

  22. [22]

    ACM Transactions on Database Systems 34(3), 16:1–16:45 (2009)

    P´erez, J., Arenas, M., Gutierrez, C.: Semantics and Complexity of SPARQL. ACM Transactions on Database Systems 34(3), 16:1–16:45 (2009)

  23. [23]

    In: Proceedings of the 10th Inter- national Conference on Semantic Systems

    Prud’hommeaux, E., Labra Gayo, J.E., Solbrig, H.: Shape Expressions: An RDF Validation and Transfor- mation Language. In: Proceedings of the 10th Inter- national Conference on Semantic Systems. pp. 32–40. SEM ’14, ACM (2014)

  24. [24]

    Building Blocks for IoT Analytics p

    Serrano, M., Gyrard, A.: A Review of Tools for IoT Se- mantics and Data Streaming Analytics. Building Blocks for IoT Analytics p. 139 (2016)

  25. [25]

    In: Proceedings of the Fourteenth Annual ACM Symposium on Theory of Computing

    Vardi, M.Y .: The Complexity of Relational Query Lan- guages (Extended Abstract). In: Proceedings of the Fourteenth Annual ACM Symposium on Theory of Computing. pp. 137–146. STOC ’82, ACM, New York, NY , USA (1982) A Proof of Theorem 1 Theorem 1 For all rules r : A→ C and triplestore schemas S, I(score(S, r)) = I(critical(S, r)). Proof: Since the mappings...

  26. [26]

    Moreover, since tS A must be a model oftI A, it follows thattS A[i] = tI A[i] and therefore tS A[i] = tI A[i]

    If tA[i] and tS A[i] are both constants, then sincetI A∈ m(A), it must be true that tA[i] = tI A[i]. Moreover, since tS A must be a model oftI A, it follows thattS A[i] = tI A[i] and therefore tS A[i] = tI A[i]. We set element tq[i] to be the constant tA[i], which matches the constant tS A[i]

  27. [27]

    We set element tq[i] to be the constant :λ, which matches the constant tS A[i]

    If tA[i] is a constant but tS A[i] is a variable, then we know that tS A[i] = :λ. We set element tq[i] to be the constant :λ, which matches the constant tS A[i]

  28. [28]

    Therefore mapping m must contain binding ?v→ x

    If tA[i] is a variable ?v and tS A[i] is a constant x, then we know that tI A[i] = tS A[i] = x. Therefore mapping m must contain binding ?v→ x. We set element tq[i] to be the variable ?v, so that when we complete the construction of q, m∗ will contain mapping ?v→ x

  29. [29]

    If tA[i] and tS A[i] are both variables, it must be that tS A[i] = :λ. If it exists a triple t′ A ∈ A and a modelling triple tS′ A of m(t′ A), and position j such that t′ A[i] is variable ?v and tS′ A [i] is a constant, then we set element tq[i] to be the constant :λ (tS A[i] will be :λ in our sandobox graph). Note that even though we don’t use variablet′...