Graph-Native Normalization

Johannes Schrott; Katja Hose; Maxime Jakubowski

arxiv: 2603.02995 · v3 · pith:IWXW2TNQnew · submitted 2026-03-03 · 💻 cs.DB

Graph-Native Normalization

Johannes Schrott , Maxime Jakubowski , Katja Hose This is my paper

Pith reviewed 2026-05-15 16:43 UTC · model grok-4.3

classification 💻 cs.DB

keywords graph normalizationfunctional dependencieslabeled property graphsknowledge graphsdata qualitynormal formsgraph databases

0 comments

The pith

Graph-native normalization reduces redundancies in labeled property graphs by defining functional dependencies across nodes, edges, and their combinations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Labeled property graphs in knowledge graph applications often contain redundancies that lead to inconsistencies because they lack rigid schemas. Earlier work handled functional dependencies only inside nodes, yet real graphs also show dependencies that cross edges. The paper defines graph-native normal forms and graph object functional dependencies that treat nodes, edges, and their joint patterns together. It supplies algorithms that transform a graph into these normal forms while preserving its information. Experiments on synthetic and real graph datasets confirm that the transformations can be carried out in practice.

Core claim

The central claim is that normalization can be lifted from the relational setting to graphs by introducing graph object functional dependencies that capture constraints inside nodes, inside edges, and between nodes and edges, then using these to define a family of graph-native normal forms together with algorithms that rewrite a graph to satisfy the chosen form.

What carries the argument

Graph object functional dependencies (GOFDs) together with the associated graph-native normal forms, which identify and remove redundant data patterns involving nodes, edges, and node-edge combinations.

If this is right

Graphs can be systematically rewritten into higher normal forms that eliminate node-edge redundancies.
The same dependency notion supports algorithms that decide whether a given graph already satisfies a chosen normal form.
Knowledge-graph applications obtain data with fewer update and deletion anomalies after the transformation.
Normalization decisions can now be made on the combined structure of nodes and edges rather than nodes alone.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The approach opens a route to automatic normalization tools inside existing graph database systems.
Similar dependency-based rewriting might be applied to other graph models such as RDF or hypergraphs.
Normalized graphs could improve the accuracy of graph-query optimizers that rely on dependency information.
The framework supplies a measurable way to compare the quality of different graph datasets before and after cleaning.

Load-bearing premise

Functional dependencies that involve edges in real labeled property graphs can be discovered and used for normalization in the same direct way as relational functional dependencies without creating fresh inconsistencies.

What would settle it

A concrete counter-example would be a labeled property graph on which the proposed transformation algorithms either lose information that should be preserved or introduce new anomalies that were absent in the original data.

Figures

Figures reproduced from arXiv: 2603.02995 by Johannes Schrott, Katja Hose, Maxime Jakubowski.

**Figure 1.** Figure 1: Examples of denormalized LPGs describing courses, lecturers, and students. Redundancies are ............... underlined....... with ...... dots. :Course n1 title: “Database Systems” language: “English” .............. usingBook: . ......... “Alice” :Lecturer n3 :teaches e2 at: 2026-02-16 name: “Maxime” :Lecturer n2 :teaches e1 at: 2026-02-09 name: “Johannes” :Lecturer n4 :teaches e3 at: 2026-02-23 name: “Kat… view at source ↗

**Figure 4.** Figure 4: Resolving redundancies in Figure 1 related to [PITH_FULL_IMAGE:figures/full_fig_p006_4.png] view at source ↗

**Figure 5.** Figure 5: GO-FDs of different shapes for which no redun [PITH_FULL_IMAGE:figures/full_fig_p006_5.png] view at source ↗

**Figure 6.** Figure 6: Example graph for the calculation of per [PITH_FULL_IMAGE:figures/full_fig_p011_6.png] view at source ↗

**Figure 7.** Figure 7: Percentual changes (log scale) to our structural per-graph metrics following graph-native [PITH_FULL_IMAGE:figures/full_fig_p012_7.png] view at source ↗

**Figure 8.** Figure 8: Comparison of query performance (𝑚Q.DBHit) between original and rewritten queries the average number of properties but, due to reification, create substantially more new nodes and edges. Overall, Figure 7a shows that graph-native normalization almost always increases the number of graph objects and decreases the average number of properties per object. The exception is𝜓between-n-ep, which “moves” a prope… view at source ↗

read the original abstract

In recent years, knowledge graphs (KGs) - in particular in the form of labeled property graphs (LPGs) - have become essential components in a broad range of applications. Although the absence of strict schemas for KGs facilitates structural issues that lead to redundancies and subsequently to inconsistencies and anomalies, the problem of KG quality has so far received only little attention. Inspired by normalization using functional dependencies for relational data, a first approach exploiting dependencies within nodes has been proposed. However, real-world KGs also expose functional dependencies involving edges. In this paper, we therefore propose graph-native normalization, which considers dependencies within nodes, edges, and their combination. We define a range of graph-native normal forms and graph object functional dependencies and propose algorithms for transforming graphs accordingly. We evaluate our contributions using a broad range of synthetic and native graph datasets.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This paper extends relational normalization to labeled property graphs by defining functional dependencies that cover edges as well as nodes.

read the letter

The main point is that this work defines graph object functional dependencies (GOFDs) spanning nodes, edges, and their combinations, then gives algorithms to reach a set of graph-native normal forms. That extension to edges is the actual novelty over the node-only prior work referenced in the abstract. They also run an evaluation on synthetic graphs plus real labeled property graph datasets, which is the right move for a methods paper in this area. The framing is straightforward: without schemas, KGs accumulate redundancies that cause anomalies, and treating the graph structure directly makes more sense than shoehorning everything into relations. The proposal builds cleanly on established relational concepts without obvious circularity. The soft spot is the formal semantics for edge dependencies. The abstract states the definitions and algorithms exist but supplies none of the concrete rules, so it is impossible to check whether an edge GOFD can be enforced without allowing duplicate edges, breaking direction, or creating fresh inconsistencies when properties are updated. If the body has clear examples and proofs that anomalies are removed without new problems, the claim holds; otherwise the central step remains unverified. This is aimed at database researchers who work on knowledge graph quality and schema design. A reader already thinking about data cleaning in LPGs would find the normal forms and transformation steps useful as a starting point. It deserves peer review because the problem is real, the proposal is concrete, and the evaluation gives something to discuss even if the formal details need tightening in revision.

Referee Report

3 major / 1 minor

Summary. The paper proposes graph-native normalization for labeled property graphs (LPGs) in knowledge graphs. It extends relational normalization by introducing graph object functional dependencies (GOFDs) that capture dependencies within nodes, edges, and their combinations, defines corresponding graph-native normal forms, presents transformation algorithms, and evaluates the approach on synthetic and native graph datasets.

Significance. If the GOFD semantics and algorithms can be shown to eliminate redundancies and anomalies without introducing inconsistencies in LPGs, the work would provide a foundational extension of Codd-style normalization to graph data models. This addresses an important gap in KG quality management and could influence schema design and data cleaning practices in graph databases.

major comments (3)

[Abstract] Abstract: The central claim that GOFDs enable normalization 'analogous to relational FDs' is load-bearing but unsupported; the manuscript supplies no concrete formal semantics for edge GOFDs (e.g., how a property on an edge is determined by source/target node properties or other edges) or proofs that enforcement preserves graph structure (multi-edges, directionality, connectivity).
[Algorithms] Algorithms section: The transformation algorithms are described at a high level but lack any proof of correctness, termination, or anomaly elimination; without these, it is impossible to verify that normalization removes redundancies without creating new inconsistencies in the graph.
[Evaluation] Evaluation section: The paper states that it evaluates the contributions on synthetic and native datasets, yet no metrics, baseline comparisons, or quantitative results (e.g., redundancy reduction, anomaly counts before/after) are provided, leaving the empirical support for the claims unverifiable.

minor comments (1)

[Abstract] The abstract would be strengthened by including a small concrete example of a node-edge GOFD and the resulting normalized graph.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback. We address each major comment below and will revise the manuscript accordingly to provide stronger formal foundations and empirical support.

read point-by-point responses

Referee: [Abstract] The central claim that GOFDs enable normalization 'analogous to relational FDs' is load-bearing but unsupported; the manuscript supplies no concrete formal semantics for edge GOFDs (e.g., how a property on an edge is determined by source/target node properties or other edges) or proofs that enforcement preserves graph structure (multi-edges, directionality, connectivity).

Authors: We agree the abstract is high-level. Section 3 of the manuscript formally defines GOFDs, including edge GOFDs where an edge property is determined by a combination of source-node properties, target-node properties, and other edge properties. To address the concern, we will expand the abstract and add an explicit subsection with the full semantics plus a proof sketch showing that normalization steps preserve multi-edges, directionality, and connectivity. revision: yes
Referee: [Algorithms] The transformation algorithms are described at a high level but lack any proof of correctness, termination, or anomaly elimination; without these, it is impossible to verify that normalization removes redundancies without creating new inconsistencies in the graph.

Authors: We acknowledge the algorithms section presents pseudocode and high-level steps without formal proofs. We will add a new subsection containing proofs of correctness, termination (leveraging the finite size of the graph and strict reduction in dependency violations), and anomaly elimination that does not introduce inconsistencies while preserving graph invariants. revision: yes
Referee: [Evaluation] The paper states that it evaluates the contributions on synthetic and native datasets, yet no metrics, baseline comparisons, or quantitative results (e.g., redundancy reduction, anomaly counts before/after) are provided, leaving the empirical support for the claims unverifiable.

Authors: The current evaluation provides qualitative discussion of the datasets. We agree quantitative evidence is needed and will revise the section to include concrete metrics (redundancy reduction ratios, pre/post anomaly counts) together with baseline comparisons on both the synthetic and native graph datasets. revision: yes

Circularity Check

0 steps flagged

No circularity: graph-native normal forms extend relational theory independently

full rationale

The paper defines graph object functional dependencies (GOFDs) and graph-native normal forms by direct analogy to relational functional dependencies, explicitly covering nodes, edges, and their combinations as new constructs. No equations or algorithms reduce the proposed normal forms back to the input data or prior definitions by construction; the transformation algorithms are presented as novel procedures rather than tautological renamings or self-fitted parameters. The derivation chain remains self-contained against external relational benchmarks without load-bearing self-citations or ansatz smuggling.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The proposal rests on the domain assumption that graph objects admit functional dependencies analogous to relational tuples; no free parameters or invented entities are introduced in the abstract.

axioms (1)

domain assumption Functional dependencies can be defined for nodes, edges, and their combinations in labeled property graphs
Direct extension of relational functional dependencies stated in the abstract as the foundation for graph-native normal forms.

pith-pipeline@v0.9.0 · 5434 in / 1024 out tokens · 27637 ms · 2026-05-15T16:43:55.218084+00:00 · methodology

Graph-Native Normalization

Core claim

What carries the argument

If this is right

Where Pith is reading between the lines

Load-bearing premise

What would settle it

discussion (0)