Semi Automatic Construction of ShEx and SHACL Schemas
Pith reviewed 2026-05-24 16:52 UTC · model grok-4.3
The pith
An algorithm that builds shape constraints from sample nodes, guided by schema patterns, combines with an interactive editor to produce ShEx or SHACL schemas for RDF datasets.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that shape constraints can be generated automatically from collections of sample nodes once the generation process is parametrized by reusable schema patterns, and that an interactive loop of validation feedback and editing operations can refine those initial shapes into a full, usable ShEx or SHACL schema.
What carries the argument
The schema construction algorithm that, given sample node sets and an optional schema pattern, produces one shape constraint per sample set; this algorithm is invoked inside an interactive workflow that supplies validation results and editing operations.
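To make the input and output of such a construction step concrete, here is a minimal Python sketch. It is not the paper's algorithm: the triple list, the `build_shape` function, and the use of a plain predicate set as a stand-in for a schema pattern are all assumptions made for illustration. It derives one constraint per predicate (minimum count, maximum count, value kind) from a set of sample nodes.

```python
from collections import defaultdict

# Toy RDF graph as (subject, predicate, object) triples; all names are hypothetical.
TRIPLES = [
    ("ex:alice", "foaf:name", '"Alice"'),
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:name", '"Bob"'),
    ("ex:bob", "foaf:knows", "ex:alice"),
    ("ex:bob", "foaf:knows", "ex:carol"),
    ("ex:carol", "foaf:name", '"Carol"'),
]


def value_kind(term):
    """Classify an object term as 'literal' or 'iri' (deliberately simplified)."""
    return "literal" if term.startswith('"') else "iri"


def build_shape(triples, sample_nodes, pattern=None):
    """Return {predicate: (min_count, max_count, kind)} for the sample nodes.

    `pattern` is an optional set of predicates to keep. It is only a crude
    stand-in for a schema pattern, not the notion defined in the paper.
    """
    counts = {node: defaultdict(int) for node in sample_nodes}
    kinds = defaultdict(set)
    for s, p, o in triples:
        if s in counts and (pattern is None or p in pattern):
            counts[s][p] += 1
            kinds[p].add(value_kind(o))
    shape = {}
    for p in sorted(kinds):
        # Nodes that never use the predicate contribute a count of 0.
        per_node = [counts[node][p] for node in sample_nodes]
        kind = next(iter(kinds[p])) if len(kinds[p]) == 1 else "any"
        shape[p] = (min(per_node), max(per_node), kind)
    return shape


person_shape = build_shape(TRIPLES, ["ex:alice", "ex:bob", "ex:carol"])
names_only = build_shape(TRIPLES, ["ex:alice", "ex:bob"], pattern={"foaf:name"})
```

In this sketch the cardinality bounds are simply the smallest and largest counts seen in the sample, so the result is only as good as the sample is representative.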
If this is right
- Schemas can be started from any chosen sample sets rather than from the entire graph at once.
- Domain knowledge expressed as schema patterns is injected once and then reused across multiple construction steps.
- Validation results shown during editing directly guide the next automatic construction call.
- The same algorithm and workflow apply unchanged to both ShEx and SHACL targets.
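The validation feedback mentioned in the list above can be pictured with a small Python sketch. It assumes a shape is just a mapping from predicate to cardinality bounds; the `validate` function and the data are hypothetical and only illustrate the kind of per-node report an editor could display.

```python
# Toy RDF graph as (subject, predicate, object) triples; all names are hypothetical.
TRIPLES = [
    ("ex:alice", "foaf:name", '"Alice"'),
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:name", '"Bob"'),
    ("ex:bob", "foaf:knows", "ex:alice"),
    ("ex:carol", "foaf:name", '"Carol"'),
]


def validate(triples, nodes, shape):
    """Return {node: [messages]} for the nodes that violate the shape.

    `shape` maps each predicate to an inclusive (min_count, max_count) pair.
    """
    report = {}
    for node in nodes:
        problems = []
        for pred, (lo, hi) in shape.items():
            found = sum(1 for s, p, _ in triples if s == node and p == pred)
            if found < lo or found > hi:
                problems.append(f"{pred}: found {found}, expected {lo}..{hi}")
        if problems:
            report[node] = problems
    return report


draft_shape = {"foaf:name": (1, 1), "foaf:knows": (1, 2)}
report = validate(TRIPLES, ["ex:alice", "ex:bob", "ex:carol"], draft_shape)
```

A user seeing that `ex:carol` fails only on `foaf:knows` might relax the lower bound to 0, or move that node to a different sample set before the next construction call.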
Where Pith is reading between the lines
- The approach could be tested on datasets where an ontology already exists, to measure how much the ontology-derived patterns reduce the number of manual edits required.
- If sample selection were itself automated by clustering nodes with similar neighborhood signatures, the whole pipeline would move closer to fully automatic schema induction.
- The interactive workflow naturally supports collaborative construction, because validation feedback and pattern suggestions can be shared among several users.
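The idea of automating sample selection by neighborhood signature can be sketched in a few lines of Python. This is a speculative illustration, not something the paper describes: the signature used here is simply the set of outgoing predicates of a node, and `cluster_by_signature` is a hypothetical helper.

```python
from collections import defaultdict

# Toy RDF graph as (subject, predicate, object) triples; all names are hypothetical.
TRIPLES = [
    ("ex:alice", "foaf:name", '"Alice"'),
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:name", '"Bob"'),
    ("ex:bob", "foaf:knows", "ex:alice"),
    ("ex:carol", "foaf:name", '"Carol"'),
]


def cluster_by_signature(triples):
    """Group subjects by the set of predicates they use (their signature)."""
    signature = defaultdict(set)
    for s, p, _ in triples:
        signature[s].add(p)
    clusters = defaultdict(list)
    for node in sorted(signature):
        clusters[frozenset(signature[node])].append(node)
    return dict(clusters)


clusters = cluster_by_signature(TRIPLES)
```

Each resulting cluster could serve as one candidate sample set. Exact signature equality is a strict criterion; a real clustering step would likely need a similarity threshold instead.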
Load-bearing premise
The chosen sample nodes must represent the structural patterns that actually occur in the full dataset, and the supplied schema patterns must correctly capture the intended constraints.
What would settle it
Run the algorithm on an RDF dataset whose complete, manually authored ShEx schema is already known; if the shapes produced from representative sample sets differ in structure or cardinality from the known schema even after the interactive workflow is applied, the method fails.
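One way to operationalize this comparison is sketched below in Python, assuming that both the produced and the reference shape are reduced to a mapping from predicate to cardinality bounds. The `diff_shapes` function and the example shapes are hypothetical, and real ShEx or SHACL schemas carry more structure than this representation captures.

```python
def diff_shapes(produced, reference):
    """List structural and cardinality differences between two shapes.

    Each shape maps a predicate to a (min_count, max_count) pair.
    """
    diffs = []
    for pred in sorted(set(produced) | set(reference)):
        if pred not in produced:
            diffs.append((pred, "missing in produced"))
        elif pred not in reference:
            diffs.append((pred, "extra in produced"))
        elif produced[pred] != reference[pred]:
            diffs.append((pred, f"cardinality {produced[pred]} != {reference[pred]}"))
    return diffs


produced = {"foaf:name": (1, 1), "foaf:knows": (0, 2)}
reference = {"foaf:name": (1, 1), "foaf:knows": (0, 5), "foaf:mbox": (0, 1)}
differences = diff_shapes(produced, reference)
```

An empty list would mean the two shapes agree on this simplified representation; any remaining entry after the interactive workflow counts against the method under the test described above.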
Original abstract
We present a method for the construction of SHACL or ShEx constraints for an existing RDF dataset. It has two components that are used conjointly: an algorithm for automatic schema construction, and an interactive workflow for editing the schema. The schema construction algorithm takes as input sets of sample nodes and constructs a shape constraint for every sample set. It can be parametrized by a schema pattern that defines structural requirements for the schema to be constructed. Schema patterns are also used to feed the algorithm with relevant information about the dataset coming from a domain expert or from some ontology. The interactive workflow provides useful information about the dataset, shows validation results w.r.t. the schema under construction, and offers schema editing operations that combined with the schema construction algorithm allow to build a complex ShEx or SHACL schema.
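To illustrate what targeting both languages can look like, the following Python sketch renders one simple shape as ShEx compact syntax and as SHACL Turtle. The intermediate shape representation, the function names, and the shape name `PersonShape` are assumptions for illustration; the paper's own translation from its Shape Constraint Language is not reproduced here. Prefix and base declarations are omitted from the output.

```python
# Map a simplified value kind to its (ShEx keyword, SHACL node kind) pair.
NODE_KIND = {"iri": ("IRI", "sh:IRI"), "literal": ("LITERAL", "sh:Literal")}


def to_shex(name, shape):
    """Render {predicate: (min, max, kind)} as a ShEx compact-syntax shape."""
    parts = [
        f"  {pred} {NODE_KIND[kind][0]} {{{lo},{hi}}}"
        for pred, (lo, hi, kind) in sorted(shape.items())
    ]
    return f"<{name}> {{\n" + " ;\n".join(parts) + "\n}"


def to_shacl(name, shape):
    """Render the same shape as a SHACL node shape in Turtle."""
    props = [
        f"  sh:property [ sh:path {pred} ; sh:minCount {lo} ; "
        f"sh:maxCount {hi} ; sh:nodeKind {NODE_KIND[kind][1]} ]"
        for pred, (lo, hi, kind) in sorted(shape.items())
    ]
    return f"<{name}> a sh:NodeShape ;\n" + " ;\n".join(props) + " ."


shape = {"foaf:name": (1, 1, "literal"), "foaf:knows": (0, 2, "iri")}
shex_text = to_shex("PersonShape", shape)
shacl_text = to_shacl("PersonShape", shape)
```

Both renderings express the same cardinality and node-kind constraints. This sketch ignores the semantic differences between the two languages.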
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents a semi-automatic method for constructing ShEx or SHACL shape constraints over an existing RDF dataset. The method combines (1) an automatic construction algorithm that, given sets of sample nodes and optional schema patterns (encoding structural requirements or domain knowledge from experts/ontologies), produces a shape for each sample set, with (2) an interactive editing workflow that supplies dataset statistics, validation feedback against the partial schema, and editing primitives that can be interleaved with the automatic step to produce a complete schema.
Significance. If the algorithm and workflow are realized as described and produce usable schemas on realistic RDF graphs, the contribution would be practically useful for lowering the barrier to schema adoption in the RDF ecosystem. The explicit support for schema patterns as a vehicle for injecting external knowledge is a reasonable design choice that distinguishes the approach from purely data-driven miners.
minor comments (3)
- [Introduction / Method] The abstract and introduction describe the algorithm and workflow at a high level; the manuscript would benefit from a dedicated section (or appendix) containing the pseudocode or a precise functional specification of the automatic construction procedure so that readers can reproduce or compare the method.
- [Evaluation] No experimental evaluation, complexity analysis, or case-study results are referenced in the provided abstract. Adding at least a small-scale validation (e.g., schema quality metrics on one or two public RDF datasets) would strengthen the claim that the conjoint use of the two components yields useful schemas.
- [Preliminaries] Notation for shapes, constraints, and schema patterns should be introduced once and used consistently; currently the abstract mixes “shape constraint,” “shape,” and “schema” without a clarifying definition.
Simulated Author's Rebuttal
We thank the referee for their positive summary of the work and for recommending minor revision. The report does not enumerate any specific major comments.
Circularity Check
No significant circularity
The paper describes a method with an automatic schema construction algorithm (taking sample node sets and optional patterns) and an interactive editing workflow. No equations, derivations, fitted parameters, predictions, or uniqueness theorems appear. The central claim is simply the presentation of the algorithm and workflow descriptions themselves; no load-bearing step reduces by construction to its own inputs or to a self-citation chain. This is a standard non-circular contribution in algorithm-description papers.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, reality_from_one_distinction (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: algorithm that takes as input a set of sample nodes ... constructs a shape constraint ... parametrized by a schema pattern ... interactive workflow ... editing operations
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: most specific uniform constraint msc(N) ... largely accepted consensus constraint lace(N) ... consensus(O,⪯,W,π,t)
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.