A Graph-Native Approach to Normalization
Pith reviewed 2026-05-15 16:43 UTC · model grok-4.3
The pith
Graph-native normalization reduces redundancies in labeled property graphs by defining functional dependencies across nodes, edges, and their combinations.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that normalization can be lifted from the relational setting to graphs by introducing graph object functional dependencies that capture constraints inside nodes, inside edges, and between nodes and edges, then using these to define a family of graph-native normal forms together with algorithms that rewrite a graph to satisfy the chosen form.
What carries the argument
Graph object functional dependencies (GOFDs) together with the associated graph-native normal forms, which identify and remove redundant data patterns involving nodes, edges, and node-edge combinations.
If this is right
- Graphs can be systematically rewritten into higher normal forms that eliminate node-edge redundancies.
- The same dependency notion supports algorithms that decide whether a given graph already satisfies a chosen normal form.
- Knowledge-graph applications obtain data with fewer update and deletion anomalies after the transformation.
- Normalization decisions can now be made on the combined structure of nodes and edges rather than nodes alone.
Where Pith is reading between the lines
- The approach opens a route to automatic normalization tools inside existing graph database systems.
- Similar dependency-based rewriting might be applied to other graph models such as RDF or hypergraphs.
- Normalized graphs could improve the accuracy of graph-query optimizers that rely on dependency information.
- The framework supplies a measurable way to compare the quality of different graph datasets before and after cleaning.
Load-bearing premise
Functional dependencies that involve edges in real labeled property graphs can be discovered and used for normalization in the same direct way as relational functional dependencies without creating fresh inconsistencies.
What would settle it
A concrete counter-example would be a labeled property graph on which the proposed transformation algorithms either lose information that should be preserved or introduce new anomalies that were absent in the original data.
Figures
read the original abstract
In recent years, knowledge graphs (KGs) - in particular in the form of labeled property graphs (LPGs) - have become essential components in a broad range of applications. Although the absence of strict schemas for KGs facilitates structural issues that lead to redundancies and subsequently to inconsistencies and anomalies, the problem of KG quality has so far received only little attention. Inspired by normalization using functional dependencies for relational data, a first approach exploiting dependencies within nodes has been proposed. However, real-world KGs also expose functional dependencies involving edges. In this paper, we therefore propose graph-native normalization, which considers dependencies within nodes, edges, and their combination. We define a range of graph-native normal forms and graph object functional dependencies and propose algorithms for transforming graphs accordingly. We evaluate our contributions using a broad range of synthetic and native graph datasets.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes graph-native normalization for labeled property graphs (LPGs) in knowledge graphs. It extends relational normalization by introducing graph object functional dependencies (GOFDs) that capture dependencies within nodes, edges, and their combinations, defines corresponding graph-native normal forms, presents transformation algorithms, and evaluates the approach on synthetic and native graph datasets.
Significance. If the GOFD semantics and algorithms can be shown to eliminate redundancies and anomalies without introducing inconsistencies in LPGs, the work would provide a foundational extension of Codd-style normalization to graph data models. This addresses an important gap in KG quality management and could influence schema design and data cleaning practices in graph databases.
major comments (3)
- [Abstract] Abstract: The central claim that GOFDs enable normalization 'analogous to relational FDs' is load-bearing but unsupported; the manuscript supplies no concrete formal semantics for edge GOFDs (e.g., how a property on an edge is determined by source/target node properties or other edges) or proofs that enforcement preserves graph structure (multi-edges, directionality, connectivity).
- [Algorithms] Algorithms section: The transformation algorithms are described at a high level but lack any proof of correctness, termination, or anomaly elimination; without these, it is impossible to verify that normalization removes redundancies without creating new inconsistencies in the graph.
- [Evaluation] Evaluation section: The paper states that it evaluates the contributions on synthetic and native datasets, yet no metrics, baseline comparisons, or quantitative results (e.g., redundancy reduction, anomaly counts before/after) are provided, leaving the empirical support for the claims unverifiable.
minor comments (1)
- [Abstract] The abstract would be strengthened by including a small concrete example of a node-edge GOFD and the resulting normalized graph.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive feedback. We address each major comment below and will revise the manuscript accordingly to provide stronger formal foundations and empirical support.
read point-by-point responses
-
Referee: [Abstract] The central claim that GOFDs enable normalization 'analogous to relational FDs' is load-bearing but unsupported; the manuscript supplies no concrete formal semantics for edge GOFDs (e.g., how a property on an edge is determined by source/target node properties or other edges) or proofs that enforcement preserves graph structure (multi-edges, directionality, connectivity).
Authors: We agree the abstract is high-level. Section 3 of the manuscript formally defines GOFDs, including edge GOFDs where an edge property is determined by a combination of source-node properties, target-node properties, and other edge properties. To address the concern, we will expand the abstract and add an explicit subsection with the full semantics plus a proof sketch showing that normalization steps preserve multi-edges, directionality, and connectivity. revision: yes
-
Referee: [Algorithms] The transformation algorithms are described at a high level but lack any proof of correctness, termination, or anomaly elimination; without these, it is impossible to verify that normalization removes redundancies without creating new inconsistencies in the graph.
Authors: We acknowledge the algorithms section presents pseudocode and high-level steps without formal proofs. We will add a new subsection containing proofs of correctness, termination (leveraging the finite size of the graph and strict reduction in dependency violations), and anomaly elimination that does not introduce inconsistencies while preserving graph invariants. revision: yes
-
Referee: [Evaluation] The paper states that it evaluates the contributions on synthetic and native datasets, yet no metrics, baseline comparisons, or quantitative results (e.g., redundancy reduction, anomaly counts before/after) are provided, leaving the empirical support for the claims unverifiable.
Authors: The current evaluation provides qualitative discussion of the datasets. We agree quantitative evidence is needed and will revise the section to include concrete metrics (redundancy reduction ratios, pre/post anomaly counts) together with baseline comparisons on both the synthetic and native graph datasets. revision: yes
Circularity Check
No circularity: graph-native normal forms extend relational theory independently
full rationale
The paper defines graph object functional dependencies (GOFDs) and graph-native normal forms by direct analogy to relational functional dependencies, explicitly covering nodes, edges, and their combinations as new constructs. No equations or algorithms reduce the proposed normal forms back to the input data or prior definitions by construction; the transformation algorithms are presented as novel procedures rather than tautological renamings or self-fitted parameters. The derivation chain remains self-contained against external relational benchmarks without load-bearing self-citations or ansatz smuggling.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Functional dependencies can be defined for nodes, edges, and their combinations in labeled property graphs
Reference graph
Works this paper leans on
-
[1]
Serge Abiteboul, Richard Hull, and Victor Vianu. 1995.Foundations of Databases. Addison-Wesley
work page 1995
-
[2]
Shqiponja Ahmetaj, Iovka Boneva, Jan Hidders, Katja Hose, Maxime Jakubowski, José Emilio Labra Gayo, Wim Martens, Fabio Mogavero, Filip Murlak, Cem Okul- mus, Axel Polleres, Ognjen Savkovic, Mantas Simkus, and Dominik Tomaszuk
-
[3]
Common Foundations for SHACL, ShEx, and PG-Schema. InWWW. ACM, 8–21
-
[4]
Renzo Angles, Angela Bonifati, Stefania Dumbrava, George Fletcher, Alastair Green, Jan Hidders, Bei Li, Leonid Libkin, Victor Marsault, Wim Martens, Filip Murlak, Stefan Plantikow, Ognjen Savkovic, Michael Schmidt, Juan Sequeda, Slawek Staworko, Dominik Tomaszuk, Hannes Voigt, Domagoj Vrgoc, Mingxi Wu, and Dusan Zivkovic. 2023. PG-Schema: Schemas for Prop...
work page 2023
-
[5]
Renzo Angles, Angela Bonifati, Stefania Dumbrava, George Fletcher, Keith W. Hare, Jan Hidders, Victor E. Lee, Bei Li, Leonid Libkin, Wim Martens, Filip Murlak, Josh Perryman, Ognjen Savkovic, Michael Schmidt, Juan F. Sequeda, Slawek Staworko, and Dominik Tomaszuk. 2021. PG-Keys: Keys for Property Graphs. InSIGMOD Conference. ACM, 2423–2436
work page 2021
-
[6]
William Ward Armstrong. 1974. Dependency Structures of Data Base Relation- ships. InIFIP Congress. North-Holland, 580–583
work page 1974
-
[7]
2016.Data and Information Quality - Dimensions, Principles and Techniques
Carlo Batini and Monica Scannapieco. 2016.Data and Information Quality - Dimensions, Principles and Techniques. Springer
work page 2016
-
[8]
Angela Bonifati. 2025. Versatile Property Graph Transformations.Proc. VLDB Endow.18, 12 (2025), 5516–5526
work page 2025
-
[9]
Angela Bonifati, George H. L. Fletcher, Hannes Voigt, and Nikolay Yakovets. 2018.Querying Graphs. Morgan & Claypool Publishers
work page 2018
-
[10]
Antoon Bronselaer. 2021. Data Quality Management: An Overview of Methods and Challenges. InFQAS (Lecture Notes in Computer Science, Vol. 12871). Springer, 127–141
work page 2021
-
[11]
E. F. Codd. 1970. A Relational Model of Data for Large Shared Data Banks. Commun. ACM13, 6 (1970), 377–387
work page 1970
-
[12]
E. F. Codd. 1971. Further Normalization of the Data Base Relational Model. Research Report / RJ / IBM / San Jose, CaliforniaRJ909 (1971)
work page 1971
-
[13]
E. F. Codd. 1974. Recent Investigations in Relational Data Base Systems. InIFIP Congress. North-Holland, 1017–1021
work page 1974
-
[14]
Claude Delobel and Richard G. Casey. 1973. Decomposition of a Data Base and the Theory of Boolean Switching Functions.IBM J. Res. Dev.17, 5 (1973), 374–386
work page 1973
-
[15]
Lisa Ehrlinger and Wolfram Wöß. 2018. A Novel Data Quality Metric for Mini- mality. InQUAT@WISE (Lecture Notes in Computer Science, Vol. 11235). Springer, 1–15
work page 2018
-
[16]
Nadime Francis, Amélie Gheerbrant, Paolo Guagliardo, Leonid Libkin, Victor Marsault, Wim Martens, Filip Murlak, Liat Peterfreund, Alexandra Rogova, and Domagoj Vrgoc. 2023. GPC: A Pattern Calculus for Property Graphs. InPODS. ACM, 241–250
work page 2023
-
[17]
Nadime Francis, Amélie Gheerbrant, Paolo Guagliardo, Leonid Libkin, Victor Marsault, Wim Martens, Filip Murlak, Liat Peterfreund, Alexandra Rogova, and Domagoj Vrgoc. 2023. A Researcher’s Digest of GQL (Invited Talk). InICDT (LIPIcs, Vol. 255). Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 1:1–1:22
work page 2023
-
[18]
Jelle Hellings, Marc Gyssens, Jan Paredaens, and Yuqing Wu. 2014. Implication and Axiomatization of Functional Constraints on Patterns with an Application to the RDF Data Model. InFoIKS (Lecture Notes in Computer Science, Vol. 8367). Springer, 250–269
work page 2014
-
[19]
2018.Datenbanken - Konzepte und Sprachen, 6
Andreas Heuer, Gunter Saake, and Kai-Uwe Sattler. 2018.Datenbanken - Konzepte und Sprachen, 6. Auflage. MITP
work page 2018
-
[20]
Rashid, Anisa Rula, Lukas Schmelzeisen, Juan F
Aidan Hogan, Eva Blomqvist, Michael Cochez, Claudia d’Amato, Gerard de Melo, Claudio Gutierrez, Sabrina Kirrane, José Emilio Labra Gayo, Roberto Nav- igli, Sebastian Neumaier, Axel-Cyrille Ngonga Ngomo, Axel Polleres, Sabbir M. Rashid, Anisa Rula, Lukas Schmelzeisen, Juan F. Sequeda, Steffen Staab, and Antoine Zimmermann. 2022. Knowledge Graphs.ACM Comput...
work page 2022
-
[21]
ISO/IEC 39075:2024 - Information technology — Database languages — GQL
ISO GQL. ISO/IEC 39075:2024 - Information technology — Database languages — GQL. https://www.iso.org/standard/76120.html
work page 2024
-
[22]
Ernests Lavrinovics, Russa Biswas, Johannes Bjerva, and Katja Hose. 2025. Knowl- edge Graphs, Large Language Models, and Hallucinations: An NLP Perspective. J. Web Semant.85 (2025), 100844
work page 2025
-
[23]
Ulf Leser and Felix Naumann. 2007.Informationsintegration - Architekturen und Methoden zur Integration verteilter und heterogener Datenquellen. dpunkt.verlag
work page 2007
-
[24]
Junhong Lin, Xiaojie Guo, Yada Zhu, Samuel Mitchell, Erik Altman, and Julian Shun. 2024. FraudGT: A Simple, Effective, and Efficient Graph Transformer for Financial Fraud Detection. InICAIF. ACM, 292–300
work page 2024
-
[25]
Maude Manouvrier and Khalid Belhajjame. 2026. Graph functional dependencies: Analysis and translation to PG-schema.Inf. Syst.136 (2026), 102633
work page 2026
-
[26]
Memgraph. Graph data model. https://memgraph.com/docs/data-modeling/ graph-data-model
-
[27]
Neo4j. Graph database concepts. https://neo4j.com/docs/getting-started/ appendix/graphdb-concepts/
-
[28]
Neo4j. 2026.Northwind Graph Example. https://github.com/neo4j-graph- examples/northwind
work page 2026
-
[29]
International Consortium of Investigative Journalists. Offshore Leaks Database. https://github.com/ICIJ/offshoreleaks-data-packages
-
[30]
1989.The Structure of the Relational Database Model
Jan Paredaens, Paul De Bra, Marc Gyssens, and Dirk Van Gucht. 1989.The Structure of the Relational Database Model. EATCS Monographs on Theoretical Computer Science, Vol. 17. Springer
work page 1989
-
[31]
Terence Parr. ANTLR. https://www.antlr.org
-
[32]
Kashif Rabbani, Matteo Lissandrini, Angela Bonifati, and Katja Hose. 2024. Trans- forming RDF Graphs to Property Graphs using Standardized Schemas.Proc. ACM Manag. Data2, 6 (2024), 242:1–242:25
work page 2024
-
[33]
Aref, Marcelo Arenas, Maciej Besta, Peter A
Sherif Sakr, Angela Bonifati, Hannes Voigt, Alexandru Iosup, Khaled Ammar, Renzo Angles, Walid G. Aref, Marcelo Arenas, Maciej Besta, Peter A. Boncz, Johannes Schrott, Maxime Jakubowski, and Katja Hose Khuzaima Daudjee, Emanuele Della Valle, Stefania Dumbrava, Olaf Hartig, Bern- hard Haslhofer, Tim Hegeman, Jan Hidders, Katja Hose, Adriana Iamnitchi, Vasi...
work page 2021
-
[34]
Johannes Schrott. 2026.London Public Transport. doi:10.5281/zenodo.18479731
-
[35]
Johannes Schrott. 2026.Train Services. doi:10.5281/zenodo.18478421
-
[36]
Philipp Skavantzos and Sebastian Link. 2023. Normalizing Property Graphs. Proc. VLDB Endow.16, 11 (2023), 3031–3043
work page 2023
-
[37]
Philipp Skavantzos and Sebastian Link. 2025. Third and Boyce-Codd normal form for property graphs.VLDB J.34, 2 (2025), 23
work page 2025
-
[38]
Philipp Skavantzos, Kaiqi Zhao, and Sebastian Link. 2021. Uniqueness Constraints on Property Graphs. InCAiSE (Lecture Notes in Computer Science, Vol. 12751). Springer, 280–295
work page 2021
-
[39]
2000.Entity-relationship modeling - foundations of database technology
Bernhard Thalheim. 2000.Entity-relationship modeling - foundations of database technology. Springer
work page 2000
-
[40]
Bernhard Thalheim. 2020. Schema Optimisation Instead of (Local) Normalisation. InFoIKS (Lecture Notes in Computer Science, Vol. 12012). Springer, 281–300
work page 2020
-
[41]
Domagoj Vrgoc, Carlos Rojas, Renzo Angles, Marcelo Arenas, Diego Arroyuelo, Carlos Buil-Aranda, Aidan Hogan, Gonzalo Navarro, Cristian Riveros, and Juan Romero. 2023. MillenniumDB: An Open-Source Graph Database System.Data Intell.5, 3 (2023), 560–610. A Graph-Native Approach to Normalization A Appendix As supplementary material to the experimental evaluat...
work page 2023
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.