pith. sign in

arxiv: 2606.06127 · v1 · pith:S7BY7COEnew · submitted 2026-06-04 · 💻 cs.DB

Validation of graph databases against PG-Schema

Pith reviewed 2026-06-27 23:09 UTC · model grok-4.3

classification 💻 cs.DB
keywords graph database validationPG-Schemacomputational complexityNP-completenessproperty graphsschema conformancecombined complexitydata complexity
0
0 comments X

The pith

Validating graph databases against PG-Schema types without integrity constraints is NP-complete in combined complexity but polynomial-time in data complexity.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that determining whether a given graph database instance satisfies a PG-Schema type description, when no integrity constraints are present, is NP-complete if both the type and the instance are allowed to vary. This combined complexity becomes polynomial when the schema restricts how type combinations and unions alternate. In contrast, when the schema is fixed and only the database instance grows, the validation task stays in polynomial time. These results highlight that the computational difficulty arises primarily from the interplay between the schema structure and the data rather than from data size alone.

Core claim

The validation problem for PG-Schema graph types without integrity constraints is NP-complete in combined complexity and in PTIME in data complexity. The combined complexity drops to PTIME when the alternation between type combinations and unions is suitably restricted.

What carries the argument

Complexity analysis of the validation problem for PG-Schema graph types, focusing on how type combinations and unions interact to determine hardness.

If this is right

  • Efficient algorithms exist when only the database instance varies with a fixed schema.
  • Restricting alternation between type combinations and unions makes combined validation tractable.
  • The hardness originates in schema-data interaction rather than instance size.
  • Systems can choose algorithms based on schema structure without integrity constraints.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Schema designers could limit alternation depth to guarantee efficient validation in practice.
  • Similar complexity separations may exist in other graph schema languages that separate type structure from constraints.
  • Extending the analysis to include integrity constraints would likely increase complexity and requires separate study.

Load-bearing premise

The complexity results are established specifically for PG-Schema types that contain no integrity constraints.

What would settle it

Discovery of a polynomial-time algorithm solving the unrestricted combined-complexity case for arbitrary PG-Schema types would disprove the NP-completeness claim.

Figures

Figures reproduced from arXiv: 2606.06127 by Dominik Tomaszuk, Filip Murlak, Jacek Ciszewski, Jakub K{\l}os, Maxime Jakubowski.

Figure 1
Figure 1. Figure 1: A small customer graph. For an edge e ∈ E with ρG(e) = (u, v), the nodes u and v are the endpoints of e, where u is the source and v is the target of e. For an element x ∈ N ∪ E, the record π(x) collects all properties of x (key-value pairs) and is called the content of x [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: A diagram of a customer graph schema. CREATE GRAPH TYPE customerGraph STRICT { ( person : Person { name STRING , birthdate DATE }) , ( company : Company { name STRING , VAT_Number STRING }) , ( customer : Customer & ( person | company ) { id INT32 }) , ( account : Account { iban STRING }) , (: customer ) -[ owns : Owns { since DATE }] - > (: account ) , FOR (c: customer ) EXCLUSIVE c. id , FOR (a: account … view at source ↗
Figure 3
Figure 3. Figure 3: PG-Schema of a customer graph schema. either a person or a company. This graph schema can be expressed in PG￾Schema as shown in [PITH_FULL_IMAGE:figures/full_fig_p003_3.png] view at source ↗
read the original abstract

The problem of validating a given graph database instance against a given PG-Schema graph type without integrity constraints is NP- complete in terms of combined complexity and in PTIME in terms of data complexity. The combined complexity drops to PTIME when the alternation between type combinations and unions is suitably restricted

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The paper claims that the problem of validating a graph database instance against a given PG-Schema graph type (without integrity constraints) is NP-complete in combined complexity and in PTIME in data complexity. It further claims that the combined complexity drops to PTIME when the alternation between type combinations and unions is suitably restricted.

Significance. If the complexity classifications hold, the results delineate the computational boundaries of schema validation in property-graph databases, distinguishing combined vs. data complexity and identifying a syntactic restriction that restores tractability. This is useful for algorithm design and for deciding which PG-Schema features can be supported efficiently at scale.

minor comments (3)
  1. The abstract states the main theorems but the manuscript should include a brief proof sketch or high-level reduction argument in the introduction or a dedicated complexity section to allow readers to assess the claims without reading the full proofs.
  2. Clarify the precise syntactic restriction on 'alternation between type combinations and unions' (e.g., bound the number of alternations or forbid certain nestings) with a formal definition or example early in the paper.
  3. Ensure that all complexity statements explicitly restate the scope limitation 'without integrity constraints' in the theorem statements themselves, not only in the abstract.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the careful review and the positive recommendation of minor revision. The referee's summary accurately captures the main results of the paper.

Circularity Check

0 steps flagged

No significant circularity in complexity classification

full rationale

The paper states direct complexity results (NP-complete combined complexity, PTIME data complexity, with a syntactic restriction recovering PTIME) for validating graph instances against PG-Schema types that contain no integrity constraints. No equations, fitted parameters, self-citations, or ansatzes appear in the abstract or described claims. The result is a standard complexity classification with no reduction of the claimed theorem to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Abstract-only review; no free parameters, invented entities, or non-standard axioms are visible. The result relies on the standard definitions of combined versus data complexity and the definition of PG-Schema without integrity constraints.

axioms (2)
  • standard math Standard definitions of combined complexity and data complexity in database theory
    The NP-completeness and PTIME statements are expressed using these established notions.
  • domain assumption PG-Schema types are defined without integrity constraints
    The abstract explicitly scopes the result to schemas without integrity constraints.

pith-pipeline@v0.9.1-grok · 5569 in / 1200 out tokens · 22969 ms · 2026-06-27T23:09:57.965088+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

32 extracted references · 10 canonical work pages

  1. [1]

    Hopcroft and Richard M

    John E. Hopcroft and Richard M. Karp , title =. 1973 , url =. doi:10.1137/0202019 , timestamp =

  2. [2]

    PG-Schema: Schemas for Property Graphs , author =. Proc. doi:10.1145/3589778 , url =

  3. [3]

    net/forum?id=rJXMpikCZ

    PG-Keys: Keys for Property Graphs , author =. doi:10.1145/3448016.3457561 , url =

  4. [4]

    Reutter and Ognjen Savkovi

    Julien Corman and Juan L. Reutter and Ognjen Savkovi. Semantics and Validation of Recursive. The Semantic Web -- ISWC 2018, Proceedings, Part I , publisher =

  5. [5]

    Stupp and Jos

    Katherine Thornton and Harold Solbrig and Gregory S. Stupp and Jos. Using Shape Expressions (. The Semantic Web -- ESWC 2019, Proceedings , publisher =. doi:10.1007/978-3-030-21348-0_39 , note =

  6. [6]

    Semantics and Validation of Shape Schemas for

    Iovka Boneva and Eric Prud'hommeaux and Jos. Semantics and Validation of Shape Schemas for. The Semantic Web – ISWC 2017, Part I , publisher =

  7. [7]

    GRADES-NDA'19: 2nd Joint Int

    Defining Schemas for Property Graphs by using the GraphQL Schema Definition Language , author =. GRADES-NDA'19: 2nd Joint Int. Workshop on Graph Data Management Experiences & Systems (GRADES) and Network Data Analytics (NDA) , publisher =

  8. [8]

    Conceptual Modeling: 38th International Conference, ER 2019, Salvador, Brazil, Proceedings , publisher =

    Schema Validation and Evolution for Graph Databases , author =. Conceptual Modeling: 38th International Conference, ER 2019, Salvador, Brazil, Proceedings , publisher =

  9. [9]

    Companion Proceedings of the 23rd International Conference on World Wide Web (WWW Companion) , publisher =

    Test-Driven Evaluation of Linked Data Quality , author =. Companion Proceedings of the 23rd International Conference on World Wide Web (WWW Companion) , publisher =

  10. [10]

    EDBT 2021 — 24th International Conference on Extending Database Technology , publisher =

    Schema Inference for Property Graphs , author =. EDBT 2021 — 24th International Conference on Extending Database Technology , publisher =

  11. [11]

    Proceedings of the VLDB Endowment (PVLDB) , volume = 15, number = 12, pages =

    DiscoPG: Property Graph Schema Discovery and Exploration , author =. Proceedings of the VLDB Endowment (PVLDB) , volume = 15, number = 12, pages =

  12. [12]

    Complexity and Expressiveness of

    S. Complexity and Expressiveness of. 18th International Conference on Database Theory (ICDT 2015) , publisher =

  13. [13]

    Semantics and Complexity of

    Jorge P. Semantics and Complexity of. The Semantic Web – ISWC 2006 , publisher =

  14. [14]

    Kostylev and Juan L

    Egor V. Kostylev and Juan L. Reutter and Miguel Romero and Domagoj Vrgo. The Semantic Web – ISWC 2015, Part I , publisher =

  15. [15]

    Foundations of Databases , author =

  16. [16]

    Computational Complexity: A Modern Approach , author =

  17. [17]

    Shapes Constraint Language (

  18. [18]

    ACM Computing Surveys , volume = 50, number = 5, doi =

    Foundations of Modern Query Languages for Graph Databases , author =. ACM Computing Surveys , volume = 50, number = 5, doi =

  19. [19]

    Graph Theory , author =

  20. [20]

    Digraphs: Theory, Algorithms and Applications , author =

  21. [21]

    Algorithmic Graph Theory and Perfect Graphs , author =

  22. [22]

    Companion Proceedings of the ACM Web Conference 2023 , pages =

    Wikidata: The making of , author =. Companion Proceedings of the ACM Web Conference 2023 , pages =

  23. [23]

    Gabb and Denise Gosnell and Alastair Green and Zhihui Guo and Keith W

    G\'abor Sz\'arnyas and Brad Bebee and Altan Birler and Alin Deutsch and George Fletcher and Henry A. Gabb and Denise Gosnell and Alastair Green and Zhihui Guo and Keith W. Hare and Jan Hidders and Alexandru Iosup and Atanas Kiryakov and Tomas Kovatchev and Xinsheng Li and Leonid Libkin and Heng Lin and Xiaojian Luo and Arnau Prat-P\'erez and David P\"uroj...

  24. [24]

    2024 , url =

    Crime Investigation Graph Example (POLE): Person--Object--Location--Event Dataset , howpublished =. 2024 , url =

  25. [25]

    2022 , url =

    Star Wars Graph Example: Exploring the Kaggle Star Wars Dataset in Neo4j , howpublished =. 2022 , url =

  26. [26]

    2024 , url =

    Twitch Network Analysis Graph Example , howpublished =. 2024 , url =

  27. [27]

    eLife , volume =

    Takemura, Shin-Ya and Aso, Yoshinori and Hige, Toshihide and Wong, Allan and Lu, Zhiyuan and Xu, C Shan and Rivlin, Patricia K and Hess, Harald and Zhao, Ting and Parag, Toufiq and Berg, Stuart and Huang, Gary and Katz, William and Olbris, Donald J and Plaza, Stephen and Umayam, Lowell and Aniceto, Roxanne and Chang, Lei-Ann and Lauchie, Shirley and Ogund...

  28. [28]

    eLife , volume =

    Himmelstein, Daniel Scott and Lizee, Antoine and Hessler, Christine and Brueggeman, Leo and Chen, Sabrina L and Hadley, Dexter and Green, Ari and Khankhanian, Pouya and Baranzini, Sergio E , title =. eLife , volume =. 2017 , month = sep, day =. doi:10.7554/eLife.26726 , url =

  29. [29]

    Proceedings of the National Academy of Sciences of the United States of America , volume =

    Takemura, Shin-ya and Xu, C Shan and Lu, Zhiyuan and Rivlin, Patricia K and Parag, Toufiq and Olbris, Donald J and Plaza, Stephen and Zhao, Ting and Katz, William T and Umayam, Lowell and Weaver, Charlotte and Hess, Harald F and Horne, Jane Anne and Nunez-Iglesias, Juan and Aniceto, Roxanne and Chang, Lei-Ann and Lauchie, Shirley and Nasca, Ashley and Ogu...

  30. [30]

    Exploring Use Cases for CovidGraph , booktitle =

    G. Exploring Use Cases for CovidGraph , booktitle =. 2023 , publisher =. doi:10.3233/SHTI230254 , url =

  31. [31]

    covidgraph/data\_cord19: Transfers CORD-19 data into a neo4j graph , howpublished =

  32. [32]

    The hetnet awakens at https://neo4j.het.io [poster]

    Daniel Himmelstein and Antoine Lizee and Pouya Khankhanian and Leo Brueggeman and Sabrina Chen and Dexter Hadley and Chrissy Hessler and Ari Green and Sergio Baranzini. The hetnet awakens at https://neo4j.het.io [poster]. 2016. doi:10.6084/m9.figshare.4054902.v3