A Community Survey on SHACL and ShEx: Briding Gaps in RDF Validation
Pith reviewed 2026-06-28 08:04 UTC · model grok-4.3
The pith
RDF validation is widely used but users need better documentation, tool support, performance, and expressiveness.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The analysis shows that while RDF validation is widely adopted and valued for enhancing data quality, significant challenges remain. In particular, users report a need for better documentation, improved tool support, enhanced performance, and greater language expressiveness to handle complex large-scale validation tasks effectively.
What carries the argument
The argument rests on a community survey of RDF validation technology users across academia and industry, which collected responses on practices, tool usage, perceived benefits, limitations, and desired enhancements.
Load-bearing premise
The survey respondents represent the full population of RDF validation users and their self-reported perceptions match actual practices and needs.
What would settle it
A larger follow-up survey using different sampling methods that finds most users are satisfied with existing documentation, tool support, and performance levels.
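One way to make this criterion concrete is to ask whether the share of satisfied respondents in a follow-up survey is credibly above one half. The sketch below computes a Wilson score interval for a proportion. The function name `wilson_interval` and the counts are hypothetical and not taken from the paper, and the interval only reflects sampling noise, not self-selection bias.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Hypothetical follow-up result: 140 of 200 respondents report being satisfied.
low, high = wilson_interval(successes=140, n=200)

# "Most users are satisfied" is supported only if the whole interval sits above 0.5.
majority_satisfied = low > 0.5
print(f"interval=({low:.3f}, {high:.3f}), majority_satisfied={majority_satisfied}")
```

With these made-up counts the lower bound stays above one half, so such a result would count against the paper's claim, provided the follow-up sample were drawn more rigorously than the original.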
Original abstract
This paper examines RDF validation practices and challenges to understand stakeholder applications, their needs, and identify areas for improvement in technologies and methodologies, thereby guiding future research and standardization efforts. A community survey was conducted, targeting a diverse group of RDF validation technology users across academia and industry. The survey collected data on current practices, tool usage, perceived benefits, limitations, and desired enhancements to gain a broad overview of the validation landscape. Our analysis shows that while RDF validation is widely adopted and valued for enhancing data quality, significant challenges remain. In particular, users report a need for better documentation, improved tool support, enhanced performance, and greater language expressiveness to handle complex large-scale validation tasks effectively. This work provides crucial insights into the RDF validation landscape, highlighting current practices and key areas for development. It offers a foundation for researchers, developers, and standardization bodies to address current limitations and advance validation technologies, ultimately improving data quality and usability in knowledge graphs.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper reports results from a community survey of RDF validation technology users (SHACL and ShEx) across academia and industry. It collects data on current practices, tool usage, perceived benefits, limitations, and desired enhancements, concluding that RDF validation is widely adopted for data quality improvement but that users need better documentation, improved tool support, enhanced performance, and greater language expressiveness for complex large-scale tasks. The work aims to guide future research and standardization.
Significance. If the survey sample is representative, the findings would supply actionable community input on practical barriers in RDF validation, an area where empirical stakeholder perspectives are limited. Such insights could directly inform tool developers and standards bodies on priorities for documentation, performance, and expressiveness, ultimately supporting higher-quality knowledge graphs.
major comments (2)
- [Survey Methodology] Survey Methodology section: The paper provides no details on sampling frame, recruitment channels, total invitations sent, response rate, stratification by sector/tool, or any weighting/post-stratification. This is load-bearing for the central claims, as the assertions of 'widely adopted' status and the specific reported needs for documentation/tool support/performance/expressiveness rest entirely on the representativeness of the responses.
- [Results] Results section: The analysis states that users 'report a need for' the four enhancements without reporting per-question sample sizes, response distributions, or how free-text answers were coded/quantified. This prevents assessment of whether the reported needs are broadly supported or driven by a small subset of respondents.
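The per-question reporting requested in the second major comment can be illustrated with a small tabulation. This is a minimal sketch, assuming responses are stored as one dictionary per respondent with `None` for skipped questions. The field names and answers are hypothetical and do not come from the paper's dataset.

```python
from collections import Counter

# Hypothetical responses: one dict per respondent; None marks a skipped question.
responses = [
    {"sector": "academia", "needs_better_docs": "yes"},
    {"sector": "industry", "needs_better_docs": "yes"},
    {"sector": "industry", "needs_better_docs": "no"},
    {"sector": "academia", "needs_better_docs": None},
]


def summarize(responses, question):
    """Return (n, distribution) for one question, ignoring skipped answers."""
    answers = [r[question] for r in responses if r.get(question) is not None]
    n = len(answers)
    counts = Counter(answers)
    distribution = {answer: count / n for answer, count in counts.items()} if n else {}
    return n, distribution


n_docs, dist_docs = summarize(responses, "needs_better_docs")
print(f"needs_better_docs: n={n_docs}, distribution={dist_docs}")
```

Reporting `n` next to each distribution shows whether a stated need comes from most respondents or from the few who answered that question.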
minor comments (2)
- [Title] The title uses 'Briding' (typo for 'Bridging').
- [Abstract] The abstract and introduction could more explicitly link the four reported needs to concrete gaps in the current SHACL/ShEx specifications or reference implementations.
Simulated Author's Rebuttal
We thank the referee for the constructive comments on our survey paper. We address the two major comments point by point below and will revise the manuscript to improve transparency where feasible.
- Referee: [Survey Methodology] Survey Methodology section: The paper provides no details on sampling frame, recruitment channels, total invitations sent, response rate, stratification by sector/tool, or any weighting/post-stratification. This is load-bearing for the central claims, as the assertions of 'widely adopted' status and the specific reported needs for documentation/tool support/performance/expressiveness rest entirely on the representativeness of the responses.
  Authors: We agree that additional methodological details would improve the paper. The survey was distributed as an open online instrument via public Semantic Web mailing lists, RDF community channels, and announcements at relevant conferences. Because it was not a closed, invitation-based survey, we did not record total invitations sent or compute a response rate. In the revision, we will expand the methodology section with all available information on recruitment channels, respondent demographics, and any stratification by sector or tool. We will also qualify claims of 'widely adopted' status to reflect the self-selected sample. We cannot supply the missing quantitative sampling metrics because they were never collected. revision: partial
- Referee: [Results] Results section: The analysis states that users 'report a need for' the four enhancements without reporting per-question sample sizes, response distributions, or how free-text answers were coded/quantified. This prevents assessment of whether the reported needs are broadly supported or driven by a small subset of respondents.
  Authors: We accept that reporting response counts, distributions, and coding procedures is necessary for interpretability. The revised Results section will include per-question sample sizes, summary distributions for closed questions, and a description of how free-text responses were coded and aggregated. These additions will be incorporated in the next version. revision: yes
- Total invitations sent and formal response rate, because the survey was distributed through open public channels without individual tracking.
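The weighting and post-stratification raised in the first major comment can be sketched as follows. This is an illustration only: the population shares per sector are invented, and the rebuttal indicates that no such sampling metrics were collected for the actual survey. The function name and record fields are likewise hypothetical.

```python
from collections import Counter


def poststratified_share(sample, population_shares, stratum_key, answer_key, target):
    """Estimate the share of `target` answers after reweighting each stratum
    so that its weight matches an assumed population share."""
    n = len(sample)
    stratum_sizes = Counter(r[stratum_key] for r in sample)
    weighted_hits = 0.0
    total_weight = 0.0
    for r in sample:
        stratum = r[stratum_key]
        # Weight = assumed population share / observed sample share of the stratum.
        weight = population_shares[stratum] / (stratum_sizes[stratum] / n)
        total_weight += weight
        if r[answer_key] == target:
            weighted_hits += weight
    return weighted_hits / total_weight


# Hypothetical sample that over-represents academia (3 of 4 respondents).
sample = [
    {"sector": "academia", "needs_better_docs": "yes"},
    {"sector": "academia", "needs_better_docs": "yes"},
    {"sector": "academia", "needs_better_docs": "yes"},
    {"sector": "industry", "needs_better_docs": "no"},
]
# Assumed (not measured) population split between sectors.
population_shares = {"academia": 0.5, "industry": 0.5}

raw_share = sum(r["needs_better_docs"] == "yes" for r in sample) / len(sample)
adjusted_share = poststratified_share(
    sample, population_shares, "sector", "needs_better_docs", "yes"
)
print(f"raw={raw_share:.2f}, post-stratified={adjusted_share:.2f}")
```

In this toy case the raw share of respondents asking for better documentation drops from 0.75 to 0.50 once sectors are rebalanced, which shows how a headline finding can depend on who chose to respond.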
Circularity Check
No circularity: observational survey with no derivation chain
This is a community survey paper reporting self-reported practices, tool usage, and perceived needs from respondents. No equations, first-principles derivations, fitted parameters, or predictions appear in the manuscript. The central claims rest on direct tabulation and qualitative summary of survey responses rather than any reduction to prior inputs by construction. Representativeness of the sample is an external-validity assumption, not a circularity pattern. No self-citation load-bearing steps, ansatz smuggling, or renaming of known results are present. The paper is self-contained as an empirical report.