
arxiv: 1907.10603 · v1 · pith:TNM2HEMOnew · submitted 2019-07-24 · 💻 cs.DB

Semi Automatic Construction of ShEx and SHACL Schemas

Pith reviewed 2026-05-24 16:52 UTC · model grok-4.3

classification 💻 cs.DB
keywords: ShEx, SHACL, RDF, schema construction, semi-automatic, shape constraints, interactive workflow, sample nodes

The pith

An algorithm that builds shape constraints from sample nodes, guided by schema patterns, combines with an interactive editor to produce ShEx or SHACL schemas for RDF datasets.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper describes a semi-automatic approach to creating constraints in ShEx or SHACL for an existing RDF dataset. One part is an algorithm that receives sets of sample nodes and outputs a shape for each set; it accepts schema patterns that encode structural rules or external knowledge from experts and ontologies. The second part is an interactive workflow that displays dataset statistics, runs validation against the growing schema, and supplies editing actions that feed back into the algorithm. Together they let users start from samples and patterns and iteratively reach a complete schema without writing every constraint by hand.

Core claim

The central claim is that shape constraints can be generated automatically from collections of sample nodes once the generation process is parametrized by reusable schema patterns, and that an interactive loop of validation feedback and editing operations can refine those initial shapes into a full, usable ShEx or SHACL schema.

What carries the argument

The schema construction algorithm carries the argument: given sample node sets and an optional schema pattern, it produces one shape constraint per sample set. The algorithm is invoked inside an interactive workflow that supplies validation results and editing operations.
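As a rough illustration of the construction step, the sketch below derives one shape for a sample set by counting, for every predicate, how often it occurs on each sample node and mapping the observed minimum and maximum to the four cardinality intervals that appear in the paper's figures ({1;1}, {0;1}, {1;*}, {0;*}). The tuple representation of triples, the function names `to_interval` and `infer_shape`, and the toy data (including the placeholder object identifiers) are assumptions made for this example. It ignores value constraints and schema patterns and should not be read as the authors' algorithm.

```python
from collections import Counter

def to_interval(lo, hi):
    """Map observed min/max occurrence counts to one of the four intervals."""
    lower = 1 if lo >= 1 else 0
    upper = "1" if hi <= 1 else "*"
    return f"{{{lower};{upper}}}"

def infer_shape(triples, sample):
    """Return {predicate: interval} for the nodes in `sample`.

    `triples` is an iterable of (subject, predicate, object) tuples.
    """
    sample = list(sample)
    counts = {node: Counter() for node in sample}
    for s, p, _ in triples:
        if s in counts:
            counts[s][p] += 1
    # Every predicate used by at least one sample node.
    predicates = set().union(*counts.values())
    shape = {}
    for p in sorted(predicates):
        per_node = [counts[n][p] for n in sample]  # 0 when the node lacks p
        shape[p] = to_interval(min(per_node), max(per_node))
    return shape

# Toy data; subjects and objects are made-up placeholders.
triples = [
    ("auckland", "wdt:P17", "country_a"),
    ("auckland", "p:P17", "stmt_1"),
    ("auckland", "wdt:P6", "person_a"),
    ("vienna", "wdt:P17", "country_b"),
    ("vienna", "p:P17", "stmt_2"),
    ("vienna", "p:P17", "stmt_3"),
]
shape = infer_shape(triples, ["auckland", "vienna"])
for predicate, interval in shape.items():
    print(predicate, interval)
```

A predicate missing from any sample node gets lower bound 0, and any node with more than one value pushes the upper bound to `*`, which is why the choice of sample nodes matters so much for the result.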

If this is right

  • Schemas can be started from any chosen sample sets rather than from the entire graph at once.
  • Domain knowledge expressed as schema patterns is injected once and then reused across multiple construction steps.
  • Validation results shown during editing directly guide the next automatic construction call.
  • The same algorithm and workflow apply unchanged to both ShEx and SHACL targets.
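The validation feedback mentioned in the list above can be pictured with a minimal sketch that checks one node against a shape and reports the violated cardinalities. The shape representation (a dictionary from predicate to an interval string such as `{1;1}`), the function names, and the toy triples are assumptions for illustration. The check treats shapes as open, meaning predicates not mentioned in the shape are ignored, which is a simplification rather than the semantics of any particular ShEx or SHACL validator.

```python
from collections import Counter

def parse_interval(interval):
    """Turn '{lo;hi}' into (lo, hi), with hi = None for an unbounded '*'."""
    lo, hi = interval.strip("{}").split(";")
    return int(lo), (None if hi == "*" else int(hi))

def validate(triples, node, shape):
    """Return a list of human-readable cardinality violations for `node`."""
    counts = Counter(p for s, p, _ in triples if s == node)
    errors = []
    for p, interval in shape.items():
        lo, hi = parse_interval(interval)
        n = counts[p]
        if n < lo or (hi is not None and n > hi):
            errors.append(f"{p}: found {n}, expected {interval}")
    return errors

# Hypothetical shape and data; the object names are placeholders.
shape = {"wdt:P17": "{1;1}", "wdt:P6": "{0;1}"}
triples = [
    ("kobe", "wdt:P17", "country_a"),
    ("kobe", "wdt:P17", "country_b"),
    ("monterey", "wdt:P17", "country_c"),
]
errors_kobe = validate(triples, "kobe", shape)
errors_monterey = validate(triples, "monterey", shape)
print(errors_kobe)
print(errors_monterey)
```

In a workflow of the kind described, a node that fails validation could either be treated as a data error or be added to the sample set so that the next construction call widens the shape.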

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach could be tested on datasets where an ontology already exists, to measure how much the ontology-derived patterns reduce the number of manual edits required.
  • If sample selection were itself automated by clustering nodes with similar neighborhood signatures, the whole pipeline would move closer to fully automatic schema induction.
  • The interactive workflow naturally supports collaborative construction, because validation feedback and pattern suggestions can be shared among several users.
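The second extension above, automated sample selection, can be sketched in its crudest form by grouping nodes whose outgoing predicate sets are identical. Exact signature equality is used here as a stand-in for real clustering, and the function name `group_by_signature` and the toy data are assumptions for illustration only.

```python
from collections import defaultdict

def group_by_signature(triples):
    """Group subjects by the set of predicates they use (their signature)."""
    signature = defaultdict(set)
    for s, p, _ in triples:
        signature[s].add(p)
    groups = defaultdict(list)
    for node, preds in signature.items():
        groups[frozenset(preds)].append(node)
    # Sort members so the result does not depend on input order.
    return {sig: sorted(nodes) for sig, nodes in groups.items()}

triples = [
    ("a", "foaf:name", "x"),
    ("a", "foaf:knows", "b"),
    ("b", "foaf:name", "y"),
    ("b", "foaf:knows", "a"),
    ("c", "rdfs:label", "z"),
]
groups = group_by_signature(triples)
for sig, nodes in groups.items():
    print(sorted(sig), nodes)
```

Each resulting group could serve as a candidate sample set. Real data would likely need a similarity threshold rather than exact equality, since optional predicates split otherwise similar nodes into separate groups.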

Load-bearing premise

The chosen sample nodes must represent the structural patterns that actually occur in the full dataset, and the supplied schema patterns must correctly capture the intended constraints.

What would settle it

Run the algorithm on an RDF dataset whose complete, manually authored ShEx schema is already known; if the shapes produced from representative sample sets differ in structure or cardinality from the known schema even after the interactive workflow is applied, the method fails.
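The comparison proposed above could be mechanized along the following lines. The sketch assumes both the produced and the reference shape are available as dictionaries from predicate to cardinality interval; the function name `diff_shapes`, the report layout, and the example shapes are hypothetical.

```python
def diff_shapes(produced, reference):
    """Compare two {predicate: interval} maps and list the disagreements."""
    report = {"missing": [], "extra": [], "cardinality": []}
    for p in sorted(set(produced) | set(reference)):
        if p not in produced:
            report["missing"].append(p)      # required by the reference only
        elif p not in reference:
            report["extra"].append(p)        # produced but not in the reference
        elif produced[p] != reference[p]:
            report["cardinality"].append((p, produced[p], reference[p]))
    return report

produced = {"wdt:P17": "{1;1}", "p:P17": "{1;1}", "wdt:P6": "{0;1}"}
reference = {"wdt:P17": "{1;1}", "p:P17": "{1;*}", "p:P6": "{0;*}"}
report = diff_shapes(produced, reference)
print(report)
```

An empty report on every shape would count as agreement in structure and cardinality; any remaining entry after the interactive workflow has been applied is the kind of discrepancy the test is looking for.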

Figures

Figures reproduced from arXiv: 1907.10603 by Daniel Fernández Álvarez, Iovka Boneva, Jérémie Dusart, Jose Emilio Labra Gayo.

Figure 1
Figure 1: Nodes of type foaf:Person. We now introduce the abstract Shapes Constraint Language (SCL), which contains the main features of ShEx and SHACL and has a quite straightforward translation to both of them, described in Appendix A of the long version. Its syntax is very similar to ShEx compact syntax. A shape constraint Constr is defined by the abstract syntax in …
Figure 2
Figure 2: Syntax of the constraint language. A Constr is satisfied by a node if this node and its neighbourhood satisfy both the value and the neighbourhood constraints. Each ValueConstr restricts the …
Figure 3
Figure 3: Syntax of uniform constraints. …mark that uniform constraints belong to the SCL language defined in Sect. 2. 3.1 Most Specific Constraint. Let $F$ be the set of all constraints definable by ValueConstrs and let $\preceq$ be the (partial) ordering relation over $F$ defined by $V_1 \preceq V_2$ if all nodes that satisfy $V_1$ also satisfy $V_2$. Then for any prefix pr: and for any XSD datatype X it holds pr: $\preceq$ iri $\preceq$ nonlit $\preceq$ any, blank $\preceq$ …
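The ordering quoted in the Figure 3 excerpt (a prefix constraint is more specific than iri, which is more specific than nonlit, which is more specific than any) suggests how a most specific value constraint covering several sample values might be found: walk up from each constraint until the paths meet. The sketch below encodes only the chain that is stated explicitly in the excerpt. The remaining relations (for blank and XSD datatypes) are truncated in the source and are left out. Treating a string ending in a colon as a prefix constraint, treating distinct prefixes as unrelated, and the function names are all assumptions.

```python
# Parent links for the chain stated in the excerpt: pr: < iri < nonlit < any.
PARENT = {"iri": "nonlit", "nonlit": "any", "any": None}

def parent(v):
    """Return the next more general constraint, or None for 'any'."""
    if v in PARENT:
        return PARENT[v]
    if v.endswith(":"):          # assumption: 'wd:' style strings are prefixes
        return "iri"
    raise ValueError(f"unknown value constraint: {v}")

def generalisations(v):
    """Return v followed by every more general constraint, most specific first."""
    chain = [v]
    while parent(chain[-1]) is not None:
        chain.append(parent(chain[-1]))
    return chain

def least_general(v1, v2):
    """Most specific constraint that is at least as general as both inputs."""
    above_v2 = set(generalisations(v2))
    for candidate in generalisations(v1):
        if candidate in above_v2:
            return candidate
    raise AssertionError("unreachable: every chain ends in 'any'")

print(least_general("wd:", "wd:"))     # same prefix
print(least_general("wd:", "wdt:"))    # two different prefixes
print(least_general("wd:", "nonlit"))
```

Because every chain ends in `any`, a common generalisation always exists. With nested prefixes, which this sketch does not model, a longer shared prefix could be a more specific answer than `iri`.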
Figure 4
Figure 4: Extract of the Wikidata entry for Auckland. … construction by describing a general form for the target schema, but can also be used simply to restrict the namespaces of predicates of interest, or to give a particular role to some values such as the value of rdf:type. We start with a motivating example based on Wikidata that we use for an informal description of schema patterns in Sect. 4.1. Then in Sect. 4.2 we g…
Figure 5
Figure 5: Most specific constraint for a sample of Wikidata cities.

    <City> {
      a wikibase: {1;1} ;
      wdt:P17 wd: {1;1} ;      # exactly one direct country statement
      p:P17 @<Y_P17> {1;*} ;   # several country statements
      wdt:P6 wd: {0;1} ;       # optional direct h. gov. statement
      p:P6 @<Y_P6> {0;*} }     # 0 or more h. gov. statements
    <Y_P17> {
      a [ wikibase:Statement ] {1;1} ;
      ps:P17 wd: {1;*} }       # at least one country statement
    <Y_P6> { a…
Figure 6
Figure 6: Schema $S_{city}$ for Wikidata cities. … poor to represent the actual structure of the data. A more appropriate schema $S_{city}$ for cities in Wikidata is given in …
Figure 7
Figure 7: Schema pattern $P_{city}$ for Wikidata cities. As we saw, schema patterns are particularly fitted for the Wikidata dataset as it strongly relies on reification. But they are also useful for all datasets that use reification [5] or repetitive local structure. Simpler schema patterns can also be used for e.g. restraining the properties of interest. 4.2 Formal Definition. Assume a countable set $V_{sch}$ of shape label v…
Figure 8
Figure 8: Syntax of constraint patterns. A schema pattern defines a most specific uniform schema (or a largely accepted consensus schema), whose definition is given in Appendix B of the long version due to space limitations. 4.3 Patterns for Ontologies. We show here how a schema pattern can be used to encode some of the information available in an existing ontology that might be relevant for the schema construction. C…
Figure 9
Figure 9 shows a consolidated view of three of the already implemented components. We consider a sample with four Wikidata entries: Auckland, Monterey, Vienna and Kobe (restricted to a part of their predicates only). The top right panel contains the schema that was automatically constructed using a schema pattern similar to the one presented in …
Figure 10
Figure 10: Shape <Person> in SHACL.

    <Person> {
      rdf:type [foaf:person] ;
      owl:sameAs IRI * ;
      foaf:name xsd:string ;
      foaf:familyName xsd:string ;
      ( bio:birth xsd:gYear | rdgr2:dateOfBirth @<Date> ) }
    <Date> {
      rdf:type [time:Instant] ;
      rdfs:label xsd:int }
Figure 11
Figure 11: Shape <Person> in ShEx. … production suite. The validation results in the interactive tool are computed using the translation to the chosen target schema language, whichever it is. When the desired abstract SCL schema is ready, it is exported and can be used in production. B Formal Definition of Shape Patterns. A schema pattern is defined by the syntax in …
Original abstract

We present a method for the construction of SHACL or ShEx constraints for an existing RDF dataset. It has two components that are used conjointly: an algorithm for automatic schema construction, and an interactive workflow for editing the schema. The schema construction algorithm takes as input sets of sample nodes and constructs a shape constraint for every sample set. It can be parametrized by a schema pattern that defines structural requirements for the schema to be constructed. Schema patterns are also used to feed the algorithm with relevant information about the dataset coming from a domain expert or from some ontology. The interactive workflow provides useful information about the dataset, shows validation results w.r.t. the schema under construction, and offers schema editing operations that combined with the schema construction algorithm allow to build a complex ShEx or SHACL schema.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The paper presents a semi-automatic method for constructing ShEx or SHACL shape constraints over an existing RDF dataset. The method combines (1) an automatic construction algorithm that, given sets of sample nodes and optional schema patterns (encoding structural requirements or domain knowledge from experts/ontologies), produces a shape for each sample set, with (2) an interactive editing workflow that supplies dataset statistics, validation feedback against the partial schema, and editing primitives that can be interleaved with the automatic step to produce a complete schema.

Significance. If the algorithm and workflow are realized as described and produce usable schemas on realistic RDF graphs, the contribution would be practically useful for lowering the barrier to schema adoption in the RDF ecosystem. The explicit support for schema patterns as a vehicle for injecting external knowledge is a reasonable design choice that distinguishes the approach from purely data-driven miners.

minor comments (3)
  1. [Introduction / Method] The abstract and introduction describe the algorithm and workflow at a high level; the manuscript would benefit from a dedicated section (or appendix) containing the pseudocode or a precise functional specification of the automatic construction procedure so that readers can reproduce or compare the method.
  2. [Evaluation] No experimental evaluation, complexity analysis, or case-study results are referenced in the provided abstract. Adding at least a small-scale validation (e.g., schema quality metrics on one or two public RDF datasets) would strengthen the claim that the conjoint use of the two components yields useful schemas.
  3. [Preliminaries] Notation for shapes, constraints, and schema patterns should be introduced once and used consistently; currently the abstract mixes “shape constraint,” “shape,” and “schema” without a clarifying definition.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive summary of the work and for recommending minor revision. The report does not enumerate any specific major comments.

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper describes a method with an automatic schema construction algorithm (taking sample node sets and optional patterns) and an interactive editing workflow. No equations, derivations, fitted parameters, predictions, or uniqueness theorems appear. The central claim is simply the presentation of the algorithm and workflow descriptions themselves; no load-bearing step reduces by construction to its own inputs or to a self-citation chain. This is a standard non-circular contribution in algorithm-description papers.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract supplies no concrete parameters, axioms, or invented entities; the ledger is therefore empty.

pith-pipeline@v0.9.0 · 5677 in / 1020 out tokens · 22123 ms · 2026-05-24T16:52:02.151746+00:00 · methodology



Reference graph

Works this paper leans on

16 extracted references · 16 canonical work pages

  1. [1] Čebirić, Š., Goasdoué, F., Kondylakis, H., Kotzinos, D., Manolescu, I., Troullinou, G., Zneika, M.: Summarizing semantic graphs: a survey. The VLDB Journal (2018)

  2. [2] van Dam, Jesse, C., Koehorst, J.J., Schaap, P.J., Martins, V.A., Suarez-Diez, M.: RDF2Graph a tool to recover, understand and validate the ontology of an RDF resource. Journal of Biomedical Semantics 6 (2015)

  3. [3] De Meester, B., Heyvaert, P., Dimou, A., Verborgh, R.: Towards a uniform user interface for editing data shapes. In: 4th VOILA. vol. 2187, pp. 13–24 (2018)

  4. [4] Fernández-Álvarez, D., García-González, H., Frey, J., Hellmann, S., Labra Gayo, J.E.: Inference of Latent Shape Expressions Associated to DBpedia Ontology. In: International Semantic Web Conference (2018)

  5. [5] Frey, J., Müller, K., Hellmann, S., Rahm, E., Vidal, M.E.: Evaluation of Metadata Representations in RDF stores. Semantic Web (Preprint), 1–25 (2017)

  6. [6] Knublauch, H., Ryman, A.: Shapes constraint language (SHACL). W3C Candidate Recommendation, W3C (July 2017), https://www.w3.org/TR/shacl/

  7. [7] Labra Gayo, J.E., Fernández-Álvarez, D., García-González, H.: RDFShape: An RDF playground based on Shapes. In: Proceedings of ISWC (2018)

  8. [8] Labra Gayo, J.E., García-González, H., Fernández-Álvarez, D., Prud'hommeaux, E.: Challenges in RDF Validation. In: Alor-Hernández, G., Sánchez-Cervantes, J.L., Rodríguez-González, A. (eds.) Current Trends in Semantic Web Technologies: Theory and Practice, chap. 6, pp. 121–151 (2018)

  9. [9] Labra Gayo, J.E., Prud'hommeaux, E., Boneva, I., Kontokostas, D.: Validating RDF Data, vol. 7. Morgan & Claypool Publishers LLC (2017)

  10. [10] Melo, A.: Automatic refinement of large-scale cross-domain knowledge graphs. Ph.D. thesis (2018)

  11. [11] Melo, A., Paulheim, H.: Detection of relation assertion errors in knowledge graphs. In: Proceedings of the Knowledge Capture Conference. p. 22. ACM (2017)

  12. [12] Potoniec, J., Jakubowski, P., Lawrynowicz, A.: Swift linked data miner: Mining OWL 2 EL class expressions directly from on-line RDF datasets. Journal of Web Semantics First Look (2017)

  13. [13] Principe, R.A.A., Spahiu, B., Palmonari, M., Rula, A., De Paoli, F., Maurino, A.: ABSTAT 1.0: Compute, Manage and Share Semantic Profiles of RDF Knowledge Graphs. In: European Semantic Web Conference. pp. 170–175 (2018)

  14. [14] Prud'hommeaux, E., Boneva, I., Labra Gayo, J.E., Gregg, K.: Shape Expressions Language (ShEx). W3C Shape Expressions Community Group Draft Report (2018)

  15. [15] Spahiu, B., Maurino, A., Palmonari, M.: Towards Improving the Quality of Knowledge Graphs with Data-driven Ontology Patterns and SHACL. In: WOP@ISWC (2018)

  16. [16] Werkmeister, L.: Schema Inference on Wikidata. Master Thesis (2018)

A The Shape Constraint Language, SHACL and ShEx. We explain here how SCL is translated to SHACL and to ShEx, then we explain the small differences in semantics depending on the target language. Translation to SHACL: SCL's ValueConstrs are represented using sh:nodeKind (for lit, nonlit, blank...
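The appendix excerpt states that SCL value constraints are represented in SHACL using sh:nodeKind, but the excerpt is cut off before the details. As a hedged sketch of what such a translation might look like, the code below renders one triple constraint as a SHACL property shape in Turtle syntax. The terms sh:path, sh:nodeKind, sh:minCount, sh:maxCount and the node kind values come from the SHACL vocabulary. The particular mapping table, the function name `property_shape`, and the handling of cardinalities are assumptions made for this example and are not taken from the paper.

```python
# Assumed mapping from SCL value constraint keywords to SHACL node kinds.
NODE_KIND = {
    "iri": "sh:IRI",
    "blank": "sh:BlankNode",
    "lit": "sh:Literal",
    "nonlit": "sh:BlankNodeOrIRI",
}

def property_shape(predicate, value_constr, lo, hi):
    """Render one triple constraint as a SHACL property shape (Turtle text).

    `hi` is None for an unbounded upper cardinality.
    """
    parts = [f"sh:path {predicate}"]
    if value_constr in NODE_KIND:        # 'any' adds no node kind restriction
        parts.append(f"sh:nodeKind {NODE_KIND[value_constr]}")
    if lo > 0:
        parts.append(f"sh:minCount {lo}")
    if hi is not None:
        parts.append(f"sh:maxCount {hi}")
    return "[ " + " ; ".join(parts) + " ]"

same_as = property_shape("owl:sameAs", "iri", 0, None)
name = property_shape("foaf:name", "lit", 1, 1)
print(same_as)
print(name)
```

The two calls mirror the `owl:sameAs IRI *` and `foaf:name` constraints of the <Person> shape, with the datatype restriction on the name left out for brevity.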