pith. sign in

arxiv: 2604.20946 · v2 · submitted 2026-04-22 · 💻 cs.LO · cs.DB

Common Foundations for Recursive Shape Languages

Pith reviewed 2026-05-14 21:06 UTC · model grok-4.3

classification 💻 cs.LO cs.DB
keywords ShExSHACLRDF schemasrecursive semanticsfixpoint semanticsdualityvalidation complexitystratified fragments
0
0 comments X

The pith

ShEx and SHACL contain large fragments with identical expressive power connected by a duality between least and greatest fixpoint semantics.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper supplies a single formal semantics that covers least fixpoint, greatest fixpoint, and supported-model interpretations for shape languages. It then proves that, once restricted to stratified fragments, the least-fixpoint version of SHACL and the greatest-fixpoint version of ShEx coincide exactly in what they can express. The same duality maps every query in one fragment to a corresponding query in the other. The work also gives a full complexity map for validation under each semantics, showing that supported-model semantics is harder in the combined case. Practitioners gain a clear picture of which shape constraints can be moved between the two languages without changing their meaning.

Core claim

Although ShEx and SHACL appear to commit to opposite fixpoint semantics, the paper shows that their stratified fragments are equally expressive and stand in one-to-one correspondence via an explicit duality that swaps least and greatest fixpoints while preserving the sets of satisfying graphs.

What carries the argument

A unifying formal semantics that interprets shape schemas under least fixpoint (LFP), greatest fixpoint (GFP), and supported model (SMS) semantics and isolates a duality operator on stratified fragments.

If this is right

  • Any shape constraint written in one language's stratified fragment can be mechanically rewritten into the other language while preserving exactly the same set of valid graphs.
  • Validation algorithms developed for one semantics can be reused for the dual fragment without loss of correctness.
  • The data-complexity and combined-complexity bounds for LFP, GFP, and SMS validation are now known for both languages, allowing implementers to choose the cheapest semantics that still meets requirements.
  • Interoperability between ShEx and SHACL validators becomes feasible on the identified common fragments without additional translation layers.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Tool builders can now offer a single validation engine that switches between LFP and GFP modes while guaranteeing identical results on the dual fragments.
  • The complexity results suggest that supported-model semantics may be avoided in practice unless its extra cost is justified by specific modeling needs.
  • The duality may extend to other schema languages that mix recursion with negation, offering a general technique for comparing fixpoint-based formalisms.

Load-bearing premise

The chosen unifying semantics correctly models both the official greatest-fixpoint behavior of ShEx and the emerging least-fixpoint direction for SHACL, and the duality continues to hold for the non-stratified fragments that appear in real schemas.

What would settle it

An RDF graph together with a shape expression that is stratified, satisfies the shape under the GFP semantics of ShEx, yet fails to satisfy the dual shape under the LFP semantics of SHACL.

Figures

Figures reproduced from arXiv: 2604.20946 by Cem Okulmus, Dominik Tomaszuk, Fabio Mogavero, Filip Murlak, Iovka Boneva, Jan Hidders, Jose-Emilio Labra-Gayo, Mantas \v{S}imkus, Maxime Jakubowski, Ognjen Savkovi\'c, Shqiponja Ahmetaj, Wim Martens.

Figure 1
Figure 1. Figure 1: A graph G and a catalogue C with a, b, p, q ∈ IRIs, ”d” ∈ Literals, and s1, s2, s3 ∈ Names. The semantics of a catalogue C on a graph G is defined in terms of shape assignments, which indicate which shape names hold in which nodes of G. A shape assignment for C and G is a relation α ⊆ dom(C) × (Nodes(G) ∪ Const(C)). We define the semantics of shape φ in catalogue C on graph G under assignment α inductively… view at source ↗
read the original abstract

As schema languages for RDF data become more mature, we are seeing efforts to extend them with recursive semantics, applying diverse ideas from logic programming and description logics. While ShEx has an official recursive semantics based on greatest fixpoints (GFP), the discussion for SHACL is ongoing and seems to be converging towards least fixpoints (LFP). A practical study we perform shows that, indeed, ShEx validators implement GFP, whereas SHACL validators are more heterogeneous. This situation creates tension between ShEx and SHACL, as their semantic commitments appear to diverge, potentially undermining interoperability and predictability. We aim to clarify this design space by comparing the main semantic options in a principled yet accessible way, hoping to engage both theoreticians and practitioners, especially those involved in developing tools and standards. We present a unifying formal semantics that treats LFP, GFP, and supported model semantics (SMS), clarifying their relationships and highlighting a duality between LFP and GFP on stratified fragments. Next, we investigate to which extent the directions taken by SHACL and ShEx are compatible. We show that, although ShEx and SHACL seem to be going in different directions, they include large fragments with identical expressive power. Moreover, there is a strong correspondence between these fragments through the aforementioned principle of duality. Finally, we present a complete picture of the data and combined complexity of ShEx and SHACL validation under LFP, GFP, and SMS, showing that SMS comes at a higher computational cost under standard complexity-theoretic assumptions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The paper presents a unifying formal semantics for recursive shape languages that treats least fixpoints (LFP), greatest fixpoints (GFP), and supported model semantics (SMS) in a common framework. It establishes a duality between LFP and GFP on stratified fragments, proves that large fragments of ShEx (GFP) and SHACL (LFP) have identical expressive power via this duality, and gives a complete data and combined complexity classification for validation under all three semantics.

Significance. If the formal development holds, the results clarify the apparent divergence between ShEx and SHACL semantics, demonstrate concrete interoperability via duality on practically relevant fragments, and supply tight complexity bounds that directly inform validator design. The grounding in standard fixpoint theory and the explicit complexity picture are strengths that make the work useful to both theoreticians and implementers.

major comments (2)
  1. [§4.2, Theorem 4.3] §4.2, Theorem 4.3 (expressive-power equivalence): the reduction from stratified ShEx to stratified SHACL (and vice versa) is stated to be polynomial, but the proof sketch does not explicitly address how the translation preserves the stratified stratification ordering when shape references cross multiple components; this step is load-bearing for the claim that the fragments used in practice coincide.
  2. [§5.1, Table 2] §5.1, Table 2 (complexity summary): the combined-complexity entry for SMS is listed as PSPACE-complete, yet the reduction from QBF is only outlined; a missing detail is whether the reduction still works when the input shape graph contains cycles that are not stratified, which would affect the tightness of the bound.
minor comments (3)
  1. [§3] Notation for the three semantics (LFP/GFP/SMS) is introduced in §3 but reused without reminder in later complexity statements; adding a one-line reminder in each theorem statement would improve readability.
  2. [§2.3] The practical validator survey in §2.3 reports implementation choices but does not list the exact versions or test suites used; a short appendix table would make the empirical claim reproducible.
  3. [Figure 1] Figure 1 (duality diagram) uses arrows whose direction is not explained in the caption; the caption should state whether an arrow denotes a translation or a semantic equivalence.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive assessment and constructive comments. We address each major point below and will revise the manuscript accordingly.

read point-by-point responses
  1. Referee: [§4.2, Theorem 4.3] §4.2, Theorem 4.3 (expressive-power equivalence): the reduction from stratified ShEx to stratified SHACL (and vice versa) is stated to be polynomial, but the proof sketch does not explicitly address how the translation preserves the stratified stratification ordering when shape references cross multiple components; this step is load-bearing for the claim that the fragments used in practice coincide.

    Authors: We agree the proof sketch would benefit from greater explicitness. The translation is defined stratum-by-stratum: each stratum is rewritten independently, and the dependency relation (which defines the stratification order) is preserved verbatim because the mapping replaces shape expressions without introducing forward references across strata. We will add a short lemma to the proof of Theorem 4.3 establishing that the image of any stratified shape graph remains stratified with identical stratum ordering. This will be included in the revised version. revision: yes

  2. Referee: [§5.1, Table 2] §5.1, Table 2 (complexity summary): the combined-complexity entry for SMS is listed as PSPACE-complete, yet the reduction from QBF is only outlined; a missing detail is whether the reduction still works when the input shape graph contains cycles that are not stratified, which would affect the tightness of the bound.

    Authors: The QBF reduction constructs a stratified shape graph whose strata mirror the quantifier prefix; this already yields PSPACE-hardness for SMS validation on stratified inputs. Because stratified graphs form a valid subclass of all inputs, hardness carries over to the general (possibly cyclic) case. Membership in PSPACE is shown separately for arbitrary graphs via a direct encoding into quantified Boolean formulas that respects the supported-model definition. We will insert one clarifying sentence in §5.1 noting that the lower bound is witnessed already on the stratified fragment. revision: yes

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The paper establishes a unifying semantics relating LFP, GFP, and SMS for ShEx and SHACL fragments, then derives duality and expressive-power equivalence on stratified cases directly from the fixpoint definitions and standard complexity results. No step reduces by construction to a fitted parameter, self-citation chain, or renamed input; the duality follows from the mathematical properties of least/greatest fixpoints on the given stratified fragments without external load-bearing assumptions that loop back to the paper's own claims. The results are therefore self-contained against external benchmarks of fixpoint theory.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The work relies on standard fixpoint semantics from logic programming and description logics; no new free parameters or invented entities are introduced beyond the three semantic options already discussed in the field.

axioms (2)
  • standard math Standard least and greatest fixpoint semantics for recursive definitions
    Invoked when defining LFP and GFP for shape expressions
  • domain assumption Stratification condition on rule sets
    Used to establish the duality between LFP and GFP

pith-pipeline@v0.9.0 · 5619 in / 1257 out tokens · 21440 ms · 2026-05-14T21:06:25.281589+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

8 extracted references · 8 canonical work pages

  1. [1]

    InISWC, 104–120

    Semantics and validation of shapes schemas for RDF. InISWC, 104–120. Bray, T.; Paoli, J.; Sperberg-McQueen, C. M.; Maler, E.; and Yergeau, F. 2008. Extensible markup language (XML) 1.0 (fifth edition). Technical report, World Wide Web Consor- tium. Corman, J.; Florenzano, F.; Reutter, J. L.; and Savkovic, O. 2019. Validating SHACL constraints over a SPARQ...

  2. [2]

    Formalizing and validating Wikidata’s property con- straints using SHACL and SPARQL.Semantic Web. Gao, S. S.; Sperberg-McQueen, C. M.; Thompson, H. S.; Mendelsohn, N.; Beech, D.; and Maloney, M. 2012. W3C XML Schema Definition Language (XSD) 1.1 Part 1: Struc- tures. Technical report, World Wide Web Consortium. Giacomo, G. D., and Lenzerini, M. 1994. Conc...

  3. [3]

    Database Syst

    BonXai: Combining the simplicity of DTD with the expressiveness of XML schema.ACM Trans. Database Syst. 42(3):15:1–15:42. Mohamed, A.; Abuoda, G.; Ghanem, A.; Kaoudi, Z.; and Aboulnaga, A. 2022. RDFFrames: knowledge graph access for machine learning tools.The VLDB Journal31(2):321– 346. Nebel, B. 1991. Terminological reasoning is inherently intractable. I...

  4. [4]

    InISWC20, LNCS 12506, 474–493

    SHACL Satisfiability and Containment. InISWC20, LNCS 12506, 474–493. Springer. Pareti, P.; Konstantinidis, G.; and Mogavero, F. 2022. Sat- isfiability and Containment of Recursive SHACL.JWS 74:100721:1–24. Prud’hommeaux, E., and Baker, T. 2017. ShapeMap Struc- ture and Language. W3C draft community group report, W3C. http://shex.io/shape-map/. Prud’hommea...

  5. [5]

    Take two fresh shape namess ∃π.φ ands φ

  6. [6]

    Add the shape declarations φ :φtoC

  7. [7]

    Replace each occurrence of∃π.φinCwiths ∃π.φ

  8. [8]

    For every state q of A, take a fresh shape name sq and add the following declarations to C: (a)s ∃π.φ :s q0 ∨ · · · ∨s qn where q0,

    Construct a non-deterministic finite automaton A that accepts the language of π. For every state q of A, take a fresh shape name sq and add the following declarations to C: (a)s ∃π.φ :s q0 ∨ · · · ∨s qn where q0, . . . , qn are the initial states ofA, (b)s q :s φ for each accepting stateqofA, and (c)s q :∃r.s q′ for each transition(q, r, q ′)ofA. The resu...