pith.

arxiv: 2606.03502 · v1 · pith:64EK2O75 · submitted 2026-06-02 · 💻 cs.DB

A Community Survey on SHACL and ShEx: Briding Gaps in RDF Validation

Pith reviewed 2026-06-28 08:04 UTC · model grok-4.3

classification 💻 cs.DB
keywords SHACL, ShEx, RDF validation, community survey, data quality, knowledge graphs, semantic web, user needs

The pith

RDF validation is widely used but users need better documentation, tool support, performance, and expressiveness.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper reports findings from a survey of SHACL and ShEx users in academia and industry to map current RDF validation practices. It establishes that these languages are adopted because they improve data quality in knowledge graphs, yet users encounter persistent limitations that reduce effectiveness on large or complex datasets. The survey data on tool usage, benefits, and desired changes is meant to direct future tool development and standardization work. A sympathetic reader would care because clearer paths to better validation directly affect the reliability of semantic data applications.

Core claim

The analysis shows that while RDF validation is widely adopted and valued for enhancing data quality, significant challenges remain. In particular, users report a need for better documentation, improved tool support, enhanced performance, and greater language expressiveness to handle complex large-scale validation tasks effectively.

What carries the argument

The community survey targeting diverse RDF validation technology users across academia and industry, which collected responses on practices, tool usage, perceived benefits, limitations, and desired enhancements.

Load-bearing premise

The survey respondents are representative of the broader population of RDF validation users, and their self-reported perceptions match their actual practices and needs.

What would settle it

A larger follow-up survey using different sampling methods that finds most users are satisfied with existing documentation, tool support, and performance levels.

Figures

Figures reproduced from arXiv: 2606.03502 by Dominik Tomaszuk, Katja Hose, Maxime Jakubowski.

Figure 1: User Background and Usage Profile
…SHACL appears consistent across different professional backgrounds observed in the survey. While we cannot pinpoint the exact reasons for this extensive familiarity using the survey data, it is likely influenced significantly by SHACL's W3C recommendation status, alongside factors like perceived technical advantages or available tooling. In summary, the user profile emergi…
Figure 2: Methods for Shape Creation
…new tooling and investigation, which currently appears under-explored. Furthermore, respondents explicitly noted a need for clearer mappings between OWL ontologies and SHACL shapes. Regarding shape authoring tooling (Table 1a), general text editors, Protégé, and commercial solutions like TopBraid EDG are the most common tools reported for authoring shapes. Conversely, while …
Figure 3: Validation Usage Frequency
Accompanying table excerpt (Ac. = academia, Ind. = industry, Both = academia and industry):

Perceived Advantage                Ac.  Ind.  Both
Compatibility with RDF              34    22    20
Strong validation capabilities      21    17    20
Flexible constraints                23    13    17
Integration with existing tools     14    10    12
Ease of use                         16    10     7
Advanced constraints                 9     6    13
Documentation quality               12     6     7
Community support                   13     6     5
Comprehensive documentation          5     4     2
Performance and scalability          3     3     4

Issue / Limitation                 Ac.  Ind.  Both
D…
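The referee's request for per-question distributions can be partly illustrated with the only counts this page exposes. The Python sketch below totals the Perceived Advantage counts from the Figure 3 excerpt per advantage and per group. It assumes, since the excerpt does not say, that the Ac., Ind., and Both columns are disjoint respondent groups and that each cell counts mentions from a multi-select question; the names `COUNTS`, `row_totals`, and `column_totals` are hypothetical.

```python
# Mention counts copied from the "Perceived Advantage" excerpt of Figure 3,
# ordered as (Ac., Ind., Both). Assumption: the three columns are disjoint groups.
COUNTS = {
    "Compatibility with RDF": (34, 22, 20),
    "Strong validation capabilities": (21, 17, 20),
    "Flexible constraints": (23, 13, 17),
    "Integration with existing tools": (14, 10, 12),
    "Ease of use": (16, 10, 7),
    "Advanced constraints": (9, 6, 13),
    "Documentation quality": (12, 6, 7),
    "Community support": (13, 6, 5),
    "Comprehensive documentation": (5, 4, 2),
    "Performance and scalability": (3, 3, 4),
}
GROUPS = ("Ac.", "Ind.", "Both")


def row_totals(counts):
    """Total mentions per advantage across the three groups, highest first."""
    totals = [(name, sum(values)) for name, values in counts.items()]
    return sorted(totals, key=lambda item: item[1], reverse=True)


def column_totals(counts):
    """Total mentions per group, summed over all listed advantages."""
    return {
        group: sum(values[i] for values in counts.values())
        for i, group in enumerate(GROUPS)
    }


# Print a ranked list of advantages, then the per-group mention totals.
for name, total in row_totals(COUNTS):
    print(f"{name:<34}{total:>4}")
print(column_totals(COUNTS))
```

Under these assumptions, Compatibility with RDF leads with 76 mentions and Performance and scalability trails with 10, which is consistent with performance appearing among the reported needs. Because the number of respondents per question is not given, these are mention totals, not shares of respondents. That is the gap the second major comment points to.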
Figure 4: SHACL-SPARQL Usage
…nent challenges. The absence of foundational learning resources, such as an official W3C SHACL Primer comparable to those for other RDF standards, was specifically highlighted as a notable shortcoming. Furthermore, two critical technical challenges emerged from the survey: ensuring adequate performance, particularly for large datasets, and addressing perceived gaps in the expressiveness …
Figure 5: Methods for Validating Evolving KGs
…as an opportunity. Table 4a indicates that while SHACL validation reports are generally satisfactory, there is considerable scope for enhancing their usability, potentially through more informative error details or better integration with debugging workflows. Finally, the survey touched upon the increasingly relevant challenge of managing evolving knowledge graphs [24]. …
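The excerpt notes room for more informative error details in SHACL validation reports. The toy Python sketch below shows what a single report entry carries. It is not a SHACL engine. It hard-codes one property shape (target class, path, `sh:minCount`) over a hypothetical in-memory set of string triples and emits result records whose fields mirror the SHACL validation-result properties. The data graph, the `check_min_count` function, and the string-based triple representation are illustrative assumptions.

```python
from dataclasses import dataclass

# Hypothetical toy data graph: (subject, predicate, object) triples with
# prefixed names kept as plain strings; no RDF library is used.
DATA_GRAPH = {
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:alice", "ex:name", '"Alice"'),
    ("ex:bob", "rdf:type", "ex:Person"),  # ex:bob has no ex:name value
}


@dataclass
class ValidationResult:
    """One report entry; field names mirror SHACL result properties
    (sh:focusNode, sh:resultPath, sh:sourceConstraintComponent,
    sh:resultSeverity, sh:resultMessage)."""
    focus_node: str
    result_path: str
    source_constraint_component: str
    result_severity: str
    result_message: str


def check_min_count(graph, target_class, path, min_count):
    """Emulate one property shape with sh:targetClass, sh:path and sh:minCount.

    Simplification: focus nodes are direct rdf:type instances only;
    real SHACL also follows rdfs:subClassOf when resolving sh:targetClass.
    """
    focus_nodes = sorted(
        s for (s, p, o) in graph if p == "rdf:type" and o == target_class
    )
    results = []
    for node in focus_nodes:
        # Count the values reachable from the focus node via the path predicate.
        count = sum(1 for (s, p, _o) in graph if s == node and p == path)
        if count < min_count:
            results.append(ValidationResult(
                focus_node=node,
                result_path=path,
                source_constraint_component="sh:MinCountConstraintComponent",
                result_severity="sh:Violation",
                result_message=(
                    f"Expected at least {min_count} value(s) for {path}, "
                    f"found {count}"
                ),
            ))
    return results


report = check_min_count(DATA_GRAPH, "ex:Person", "ex:name", 1)
conforms = not report  # sh:conforms is true only when there are no results
for r in report:
    print(r.focus_node, r.result_path, r.source_constraint_component, r.result_message)
```

Here only `ex:bob` violates the constraint, so the report holds one result and `conforms` is `False`. Real engines such as pySHACL or Apache Jena SHACL express results as an RDF graph. More informative error details would be layered on top of fields like these.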
Figure 6: Advanced Analysis
…infrastructure and usability remains vital for effective adoption. This includes better documentation and tutorials (e.g., official primers), more mature tooling, and enhanced validation report utility to aid debugging and understanding of existing features.
5 Interactive Data Exploration
In our commitment to open science and fostering community engagement, this section details the method…
Original abstract

This paper examines RDF validation practices and challenges to understand stakeholder applications, their needs, and identify areas for improvement in technologies and methodologies, thereby guiding future research and standardization efforts. A community survey was conducted, targeting a diverse group of RDF validation technology users across academia and industry. The survey collected data on current practices, tool usage, perceived benefits, limitations, and desired enhancements to gain a broad overview of the validation landscape. Our analysis shows that while RDF validation is widely adopted and valued for enhancing data quality, significant challenges remain. In particular, users report a need for better documentation, improved tool support, enhanced performance, and greater language expressiveness to handle complex large-scale validation tasks effectively. This work provides crucial insights into the RDF validation landscape, highlighting current practices and key areas for development. It offers a foundation for researchers, developers, and standardization bodies to address current limitations and advance validation technologies, ultimately improving data quality and usability in knowledge graphs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper reports results from a community survey of RDF validation technology users (SHACL and ShEx) across academia and industry. It collects data on current practices, tool usage, perceived benefits, limitations, and desired enhancements, concluding that RDF validation is widely adopted for data quality improvement but that users need better documentation, improved tool support, enhanced performance, and greater language expressiveness for complex large-scale tasks. The work aims to guide future research and standardization.

Significance. If the survey sample is representative, the findings would supply actionable community input on practical barriers in RDF validation, an area where empirical stakeholder perspectives are limited. Such insights could directly inform tool developers and standards bodies on priorities for documentation, performance, and expressiveness, ultimately supporting higher-quality knowledge graphs.

major comments (2)
  1. [Survey Methodology] Survey Methodology section: The paper provides no details on sampling frame, recruitment channels, total invitations sent, response rate, stratification by sector/tool, or any weighting/post-stratification. This is load-bearing for the central claims, as the assertions of 'widely adopted' status and specific documented needs for documentation/tool support/performance/expressiveness rest entirely on the representativeness of the responses.
  2. [Results] Results section: The analysis states that users 'report a need for' the four enhancements without reporting per-question sample sizes, response distributions, or how free-text answers were coded/quantified. This prevents assessment of whether the reported needs are broadly supported or driven by a small subset of respondents.
minor comments (2)
  1. [Title] The title uses 'Briding' (typo for 'Bridging').
  2. [Abstract] Abstract and introduction could more explicitly link the four reported needs to concrete gaps in the current SHACL/ShEx specifications or reference implementations.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive comments on our survey paper. We address the two major comments point by point below and will revise the manuscript to improve transparency where feasible.

Point-by-point responses
  1. Referee: [Survey Methodology] Survey Methodology section: The paper provides no details on sampling frame, recruitment channels, total invitations sent, response rate, stratification by sector/tool, or any weighting/post-stratification. This is load-bearing for the central claims, as the assertions of 'widely adopted' status and specific documented needs for documentation/tool support/performance/expressiveness rest entirely on the representativeness of the responses.

    Authors: We agree that additional methodological details would improve the paper. The survey was distributed as an open online instrument via public Semantic Web mailing lists, RDF community channels, and announcements at relevant conferences. Because it was not a closed invitation-based survey, we did not record total invitations sent or compute a response rate. In revision we will expand the methodology section with all available information on recruitment channels, respondent demographics, and any stratification by sector or tool. We will also qualify claims of 'widely adopted' status to reflect the self-selected sample. We cannot supply the missing quantitative sampling metrics because they were never collected. revision: partial

  2. Referee: [Results] Results section: The analysis states that users 'report a need for' the four enhancements without reporting per-question sample sizes, response distributions, or how free-text answers were coded/quantified. This prevents assessment of whether the reported needs are broadly supported or driven by a small subset of respondents.

    Authors: We accept that reporting response counts, distributions, and coding procedures is necessary for interpretability. The revised Results section will include per-question sample sizes, summary distributions for closed questions, and a description of how free-text responses were coded and aggregated. These additions will be incorporated in the next version. revision: yes

standing simulated objections not resolved
  • Total invitations sent and formal response rate, because the survey was distributed through open public channels without individual tracking.

Circularity Check

0 steps flagged

No circularity: observational survey with no derivation chain

full rationale

This is a community survey paper reporting self-reported practices, tool usage, and perceived needs from respondents. No equations, first-principles derivations, fitted parameters, or predictions appear in the manuscript. The central claims rest on direct tabulation and qualitative summary of survey responses rather than any reduction to prior inputs by construction. Representativeness of the sample is an external-validity assumption, not a circularity pattern. No self-citation load-bearing steps, ansatz smuggling, or renaming of known results are present. The paper is self-contained as an empirical report.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The work is an empirical survey report; it does not introduce mathematical free parameters, domain axioms, or invented entities. It implicitly relies on standard assumptions of survey methodology (representative sampling, honest self-reporting) but these are not formalized in the abstract.

pith-pipeline@v0.9.1-grok · 5694 in / 1028 out tokens · 16511 ms · 2026-06-28T08:04:09.541341+00:00 · methodology


Reference graph

Works this paper leans on

38 extracted references · 3 canonical work pages · 1 internal anchor

  1. [1]

    AKSW. SHACLGEN. https://github.com/AKSW/shaclgen, 2025

  2. [2]

    Apache Software Foundation. Apache Jena SHACL, 2025

  3. [3]

    Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, and Zachary Ives. DBpedia: a nucleus for a web of open data. In ISWC, pages 722–735. Springer-Verlag, 2007

  4. [4]

    Thomas Bergwinkl. shacl-engine, 2025. Accessed: May 15, 2025

  5. [5]

    Iovka Boneva, Jérémie Dusart, Daniel Fernández Alvarez, and Jose Emilio Labra Gayo. Shape designer for ShEx and SHACL constraints. In ISWC, 2019

  6. [6]

    Nicholas Car and Eliana Papoutsoglou. Data shapes working group charter. W3C Working Group Charter, December 2024

  7. [7]

    Andrea Cimmino, Alba Fernández-Izquierdo, and Raúl García-Castro. Astrea: Automatic generation of SHACL shapes from ontologies. In The Semantic Web, pages 497–513. Springer International Publishing, 2020

  8. [8]

    Corese. Corese core, 2025. Accessed: May 15, 2025

  9. [9]

    DataTreeHouse. maplib, 2025. Accessed: May 15, 2025

  10. [10]

    Darren Edge, Ha Trinh, Newman Cheng, Joshua Bradley, Alex Chao, Apurva Mody, Steven Truitt, and Jonathan Larson. From local to global: A graph RAG approach to query-focused summarization. CoRR, abs/2404.16130, 2024

  11. [11]

    Daniel Fernandez-Álvarez, Jose Emilio Labra-Gayo, and Daniel Gayo-Avello. Automatic extraction of shapes using shexer. Knowledge-Based Systems, 238:107975, 2022

  12. [12]

    Mónica Figuera, Philipp D. Rohde, and Maria-Esther Vidal. Trav-SHACL: Efficiently validating networks of SHACL constraints. In WWW, pages 3337–3348. ACM / IW3C2, 2021

  13. [13]

    Thomas Francart. SHACL Play! https://github.com/sparna-git/shacl-play, 2025

  14. [14]

    José Emilio Labra Gayo, Daniel Fernández-Álvarez, and Herminio García-González. RDFShape: An RDF playground based on shapes. In ISWC Posters & Demonstrations, volume 2180 of CEUR Workshop Proceedings. CEUR-WS.org, 2018

  15. [15]

    Maxime Jakubowski and Jan Van den Bussche. Compiling SHACL into SQL. In ISWC (2), volume 15232 of Lecture Notes in Computer Science, pages 59–77. Springer, 2024

  16. [16]

    Dimitris Kontokostas and Holger Knublauch. Shapes constraint language (SHACL). W3C recommendation, W3C, July 2017. https://www.w3.org/TR/2017/REC-shacl-20170720/

  17. [17]

    Jose Emilio Labra-Gayo. rudof, 2025. Accessed: May 15, 2025

  18. [18]

    Sven Lieber, Ben De Meester, Pieter Heyvaert, Femke Brückmann, Ruben Wambacq, Erik Mannens, Ruben Verborgh, and Anastasia Dimou. Visual notations for viewing RDF constraints with UnSHACLed. Semantic Web, 13(5):757–792, 2022

  19. [19]

    Sierra AT Moxon, Harold Solbrig, Deepak R Unni, Dazhi Jiao, Richard M Bruskiewich, James P Balhoff, Gaurav Vaidya, William D Duncan, Harshad Hegde, Mark Miller, et al. The linked data modeling language (LinkML): A general-purpose data modeling framework grounded in machine-readable semantics. ICBO, 3073:148–151, 2021

  20. [20]

    Mark A. Musen. The Protégé project: a look back and a look forward. AI Matters, 1(4):4–12, June 2015

  21. [21]

    Natasha Noy, Yuqing Gao, Anshu Jain, Anant Narayanan, Alan Patterson, and Jamie Taylor. Industry-scale knowledge graphs: lessons and challenges. CACM, 62(8):36–43, July 2019

  22. [22]

    Boci Peng, Yun Zhu, Yongchao Liu, Xiaohe Bo, Haizhou Shi, Chuntao Hong, Yan Zhang, and Siliang Tang. Graph retrieval-augmented generation: A survey. arXiv preprint arXiv:2408.08921, 2024

  23. [23]

    Felipe Pezoa, Juan L Reutter, Fernando Suarez, Martín Ugarte, and Domagoj Vrgoč. Foundations of JSON schema. In The Web Conference, pages 263–273, 2016

  24. [24]

    Axel Polleres, Romana Pernisch, Angela Bonifati, Daniele Dell'Aglio, Daniil Dobriy, Stefania Dumbrava, Lorena Etcheverry, Nicolas Ferranti, Katja Hose, Ernesto Jiménez-Ruiz, Matteo Lissandrini, Ansgar Scherp, Riccardo Tommasini, and Johannes Wachs. How does knowledge evolve in open knowledge graphs? TGDK, 1(1):11:1–11:59, 2023

  25. [25]

    Eric Prud'hommeaux, Iovka Boneva, Jose Emilio Labra Gayo, and Gregg Kellogg. Shape expressions language 2.1. Final community group report, W3C, October 2019

  26. [26]

    Kashif Rabbani, Matteo Lissandrini, and Katja Hose. SHACL and ShEx in the wild: a community survey on validating shapes generation and adoption. In Companion Proceedings of the Web Conference 2022, pages 260–263, 2022

  27. [27]

    Kashif Rabbani, Matteo Lissandrini, and Katja Hose. Extraction of validating shapes from very large knowledge graphs. Proc. VLDB Endow., 16(5):1023–1032, 2023

  28. [28]

    RDF 1.1 concepts and abstract syntax. W3C Recommendation, February 2014

  29. [29]

    Ashley Sommer and Nicholas Car. pySHACL, March 2025

  30. [30]

    Blerina Spahiu, Andrea Maurino, and Matteo Palmonari. Towards improving the quality of knowledge graphs with data-driven ontology patterns and SHACL. In Emerging Topics in Semantic Technologies, pages 103–117. IOS Press, 2018

  31. [31]

    Stardog. Stardog enterprise knowledge graph platform, 2025. Accessed: May 15, 2025

  32. [32]

    Fabian M Suchanek, Mehwish Alam, Thomas Bonald, Lihu Chen, Pierre-Henri Paris, and Jules Soria. YAGO 4.5: A large and clean knowledge base with a rich taxonomy. In ACM SIGIR, pages 131–140, 2024

  33. [33]

    The Eclipse Foundation. Eclipse RDF4J, 2025. Accessed: May 15, 2025

  34. [34]

    TopQuadrant. shacl-js. Accessed: May 15, 2025

  35. [35]

    TopQuadrant. TopBraid EDG, 2025. Accessed: May 15, 2025

  36. [36]

    Denny Vrandečić and Markus Krötzsch. Wikidata: a free collaborative knowledgebase. Communications of the ACM, 57(10):78–85, 2014

  37. [37]

    Zazuko. rdf-validate-shacl, 2025. Accessed: May 15, 2025

  38. [38]

    Qinggang Zhang, Shengyuan Chen, Yuanchen Bei, Zheng Yuan, Huachi Zhou, Zijin Hong, Junnan Dong, Hao Chen, Yi Chang, and Xiao Huang. A survey of graph retrieval-augmented generation for customized large language models. arXiv preprint arXiv:2501.13958, 2025