pith. sign in

arxiv: 2406.05417 · v2 · pith:WWSM2NEInew · submitted 2024-06-08 · 💻 cs.DB

Optimizing Navigational Graph Queries

Pith reviewed 2026-05-24 00:26 UTC · model grok-4.3

classification 💻 cs.DB
keywords navigational graph queriesquery optimizationgraph databasesrecursive queriespattern matchingintermediate resultsquery evaluation
0
0 comments X

The pith

Novel optimization techniques for navigational graph queries constrain intermediate results to achieve orders of magnitude faster evaluation on real workloads.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper focuses on evaluating queries that combine recursive navigation with pattern matching on graphs, where current methods struggle in practice. It introduces several optimization techniques specifically aimed at limiting the growth of intermediate results generated during evaluation. These techniques are planned and executed in ways that support efficient processing of complex queries. Experiments across diverse datasets and query types show substantial performance gains compared to prior approaches.

Core claim

We present a number of novel powerful optimization techniques which aim to constrain the intermediate results during query evaluation. We show how these techniques can be planned effectively and executed efficiently towards the first practical evaluation solution for complex navigational queries on real-world workloads. Indeed, our experimental results show several orders of magnitude improvement in query evaluation performance over state-of-the-art techniques on a wide range of queries on diverse datasets.

What carries the argument

Optimization techniques that constrain intermediate results during evaluation of queries combining recursive and pattern-matching fragments.

If this is right

  • Complex navigational queries become practically evaluable on real-world graph data without excessive resource use.
  • Query engines can handle wider ranges of recursive and pattern-matching combinations at scale.
  • Performance advantages appear consistently across diverse datasets and query workloads.
  • The approach reduces reliance on ad-hoc tuning for graph query performance.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same constraint-based approach might apply to optimizing other graph query languages that mix recursion with joins.
  • Database systems could incorporate these planners to lower overall memory demands during query execution.
  • Future work might test whether the techniques scale to streaming or dynamic graph settings.

Load-bearing premise

The proposed optimization techniques can be planned effectively and executed efficiently to constrain intermediate results for complex navigational queries on real-world workloads.

What would settle it

A test on real-world datasets where the techniques produce no reduction in intermediate result sizes or yield no measurable speedup over existing methods on navigational queries would disprove the central claim.

Figures

Figures reproduced from arXiv: 2406.05417 by George Fletcher, Nikolay Yakovets, Thomas Mulder.

Figure 1
Figure 1. Figure 1: A property graph capturing financial data [PITH_FULL_IMAGE:figures/full_fig_p004_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Query-graphs of various queries and templates [PITH_FULL_IMAGE:figures/full_fig_p007_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Application of seeding in Program D2 Seeding query Seeds Seeded closures Rewritten query [PITH_FULL_IMAGE:figures/full_fig_p007_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Application of seeding in Program D3 enumerator. The enumerator’s overall design and the set of rules it is configured to use at any one time to￾gether determine the set of query plans that will be con￾sidered for an input query. To extend the enumerator’s capabilities, new query features can be easily supported by new rules, and new rules can easily be added to the set of rules. We adopt the rule-based de… view at source ↗
Figure 5
Figure 5. Figure 5: Application of seeding in Program D4 captured by a recursive parent-child relationship. Most operator types are defined in terms of one or more child operators. We define eleven types of operators to be used in query plans. Six of these operator types are well-known from the Relational Algebra, although they might differ slightly from Codd’s original specification [8]. The eleven types are as follows: – E(… view at source ↗
Figure 6
Figure 6. Figure 6: Example enumeration process (§ 4.2) databases [10, 12, 13, 14, 24, 27]. Hence, we employ a state-of-the-art algorithm developed in the relational context for precisly this purpose, namely MinCutBranch (MCB)[12]. V (s, t) ←E(s, e, t), P(e, label, l1) W(s, t) ←E(s, e, t), P(e, label, l2) Y (s, t) ←E(s, e, t), P(e, label, l3) Z(s, t) ←E(s, e, t), P(e, label, l4) Ans(x,y,z) ←V +(s, x), W+(x, y), Y +(y, z), Z(x… view at source ↗
Figure 7
Figure 7. Figure 7: Constructing a seeded plan (§ 4.3.4) tor. These queries correspond to the recursive Dat￾alog rules → WT (x, y) ← → WT (x, w), W(w, y) for Q1, ← Y T (y, z) ← Y (y, w), ← Y T (w, z) for Q2 and ← V S (s, x) ← V (s, w), ← V S (w, x) for Q3. 4.4 Complexity analysis We analyze the impact that the application of the pro￾posed optimizations has on the complexity of the enu￾meration procedure. To this end, we will … view at source ↗
Figure 8
Figure 8. Figure 8: Star-shaped queries of n terms (§ 4.4) is proportional to the number of sub-queries of the in￾put query [27]. The number of sub-queries is in turn proportional to the number of times the seeding rule can be applied. We derive Pu(n) and Po(n) based on the query shapes presented in Figure 8a and Figure 8b, respec￾tively. By deriving Pu(n) based on the non-recursive query in Figure 8a, we ensure by constructi… view at source ↗
Figure 10
Figure 10. Figure 10: The optimization time scales well in terms of [PITH_FULL_IMAGE:figures/full_fig_p017_10.png] view at source ↗
Figure 9
Figure 9. Figure 9: Query evaluation time on STRING (left) and DBPedia (middle & right). Significant speed-ups of up to [PITH_FULL_IMAGE:figures/full_fig_p019_9.png] view at source ↗
Figure 11
Figure 11. Figure 11: An embedding of PCC2 in DBPedia (§ 6). 7 Related work Navigational queries are a key class of queries for graph database systems, and while the problem of optimizing their evaluation has received considerable attention in recent years, many systems still struggle with the eval￾uation of such queries in practice [1]. While evaluating navigational queries certainly in￾troduces a set of challenges, it also i… view at source ↗
Figure 12
Figure 12. Figure 12: Plans pu (left) and po (right) for a PCC2 instance on DBPedia (§ 6) References 1. Angles, R., Aranda, C.B., Hogan, A., Ro￾jas, C., Vrgoˇc, D.: Wdbench: A wikidata graph query benchmark. In: The Seman￾tic Web – ISWC 2022: 21st International Semantic Web Conference, Virtual Event, Oc￾tober 23–27, 2022, Proceedings, p. 714–731. Springer-Verlag, Berlin, Heidelberg (2022). DOI 10.1007/978-3-031-19433-7{\ }41. … view at source ↗
read the original abstract

We study the optimization of navigational graph queries, i.e., queries which combine recursive and pattern-matching fragments. Current approaches to their evaluation are not effective in practice. Towards addressing this, we present a number of novel powerful optimization techniques which aim to constrain the intermediate results during query evaluation. We show how these techniques can be planned effectively and executed efficiently towards the first practical evaluation solution for complex navigational queries on real-world workloads. Indeed, our experimental results show several orders of magnitude improvement in query evaluation performance over state-of-the-art techniques on a wide range of queries on diverse datasets.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The paper studies the optimization of navigational graph queries combining recursive and pattern-matching fragments. It presents novel optimization techniques to constrain intermediate results during evaluation, claims these can be planned effectively and executed efficiently to yield the first practical evaluation solution for complex navigational queries on real-world workloads, and reports experimental results showing several orders of magnitude improvement over state-of-the-art techniques across a wide range of queries on diverse datasets.

Significance. If the techniques and results hold, the work would address a practical gap in graph database query evaluation where current approaches are ineffective, potentially enabling efficient handling of complex queries in real-world settings.

major comments (1)
  1. [Abstract] Abstract: the central claim of 'several orders of magnitude improvement' and 'first practical evaluation solution' rests entirely on experimental results, yet the provided manuscript text supplies no details on the optimization techniques, planning/execution methods, experimental setup, datasets, or baselines, rendering the claim impossible to assess for soundness.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their review. We address the single major comment below.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the central claim of 'several orders of magnitude improvement' and 'first practical evaluation solution' rests entirely on experimental results, yet the provided manuscript text supplies no details on the optimization techniques, planning/execution methods, experimental setup, datasets, or baselines, rendering the claim impossible to assess for soundness.

    Authors: The abstract is a concise summary. The full manuscript details the optimization techniques (Sections 3–4), planning and execution methods (Section 4), and experimental setup including datasets and baselines (Section 5). These sections support assessment of the claims. revision: no

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper describes novel optimization techniques for navigational graph queries and validates them via experimental results showing performance gains on real-world workloads. No equations, derivations, fitted parameters presented as predictions, or self-citation chains appear in the provided abstract or described content. The central claims rest on empirical evaluation of planning and execution efficiency rather than any self-definitional or constructionally equivalent steps. This is a standard experimental systems paper whose derivation chain is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are described in the abstract; the work appears empirical rather than axiomatic.

pith-pipeline@v0.9.0 · 5610 in / 894 out tokens · 14284 ms · 2026-05-24T00:26:00.054457+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

36 extracted references · 36 canonical work pages · 1 internal anchor

  1. [1]

    In: The Seman- tic Web – ISWC 2022: 21st International Semantic Web Conference, Virtual Event, Oc- tober 23–27, 2022, Proceedings, p

    Angles, R., Aranda, C.B., Hogan, A., Ro- jas, C., Vrgoˇ c, D.: Wdbench: A wikidata graph query benchmark. In: The Seman- tic Web – ISWC 2022: 21st International Semantic Web Conference, Virtual Event, Oc- tober 23–27, 2022, Proceedings, p. 714–731. Springer-Verlag, Berlin, Heidelberg (2022). DOI 10.1007/978-3-031-19433-7 {\ }41. URL https: //doi.org/10.10...

  2. [2]

    The VLDB Journal pp

    Arroyuelo, D., G´ omez-Brand´ on, A., Hogan, A., Navarro, G., Rojas-Ledesma, J.: Optimizing rpqs over a compact graph representation. The VLDB Journal pp. 1–26 (2023)

  3. [3]

    In: 38th IEEE International Con- ference on Data Engineering, ICDE 2022, Kuala Lumpur, Malaysia, May 9-12, 2022, pp

    Arroyuelo, D., Hogan, A., Navarro, G., Rojas- Ledesma, J.: Time- and space-efficient regular path queries. In: 38th IEEE International Con- ference on Data Engineering, ICDE 2022, Kuala Lumpur, Malaysia, May 9-12, 2022, pp. 3091–

  4. [4]

    DOI 10.1109/ICDE53745

    IEEE (2022). DOI 10.1109/ICDE53745. 2022.00277. URL https://doi.org/10.1109/ ICDE53745.2022.00277

  5. [5]

    Barcel´ o, P., P´ erez, J., Reutter, J.L.: Relative ex- pressiveness of nested regular expressions. In: J. Freire, D. Suciu (eds.) Proceedings of the 6th Al- berto Mendelzon International Workshop on Foun- dations of Data Management, Ouro Preto, Brazil, June 27-30, 2012, CEUR Workshop Proceedings , vol. 866, pp. 180–195. CEUR-WS.org (2012). URL https://ce...

  6. [6]

    Synthesis Lectures on Data Management

    Bonifati, A., Fletcher, G., Voigt, H., Yakovets, N.: Querying Graphs. Synthesis Lectures on Data Management. Morgan & Clay- pool Publishers (2018). DOI 10.2200/ S00873ED1V01Y201808DTM051. URL https:// doi.org/10.2200/S00873ED1V01Y201808DTM051

  7. [7]

    Bonifati, A., Martens, W., Timm, T.: An an- alytical study of large SPARQL query logs. VLDB J. 29(2-3), 655–679 (2020). DOI 10.1007/ s00778-019-00558-9. URL https://doi.org/10. 1007/s00778-019-00558-9

  8. [8]

    Calvanese, D., Giacomo, G.D., Lenzerini, M., Vardi, M.Y.: Containment of conjunctive regu- lar path queries with inverse. In: A.G. Cohn, F. Giunchiglia, B. Selman (eds.) KR 2000, Prin- ciples of Knowledge Representation and Reasoning Proceedings of the Seventh International Confer- ence, Breckenridge, Colorado, USA, April 11-15, 2000, pp. 176–185. Morgan ...

  9. [9]

    Codd, E.F.: A relational model of data for large shared data banks. Commun. ACM 13(6), 377–387 (1970). DOI 10.1145/362384.362685. URL https: //doi.org/10.1145/362384.362685

  10. [10]

    URL https://string-db.org/ cgi/about?footer{_}active{_}subpage= statistics

    CONSORTIUM, S.: String database — statis- tics (2023). URL https://string-db.org/ cgi/about?footer{_}active{_}subpage= statistics. Accessed: 13 Oct 2023 16:00

  11. [11]

    DeHaan, D., Tompa, F.W.: Optimal top-down join enumeration. In: C.Y. Chan, B.C. Ooi, A. Zhou (eds.) Proceedings of the ACM SIGMOD Interna- tional Conference on Management of Data, Beijing, China, June 12-14, 2007, pp. 785–796. ACM (2007). 22 Thomas Mulder et al. DOI 10.1145/1247480.1247567. URL https:// doi.org/10.1145/1247480.1247567

  12. [12]

    TigerGraph: A Native MPP Graph Database

    Deutsch, A., Xu, Y., Wu, M., Lee, V.E.: Tiger- graph: A native MPP graph database. CoRR abs/1901.08248 (2019). URL http://arxiv. org/abs/1901.08248

  13. [13]

    Fender, P., Moerkotte, G.: A new, highly effi- cient, and easy to implement top-down join enu- meration algorithm. In: S. Abiteboul, K. B¨ ohm, C. Koch, K. Tan (eds.) Proceedings of the 27th In- ternational Conference on Data Engineering, ICDE 2011, April 11-16, 2011, Hannover, Germany, pp. 864–875. IEEE Computer Society (2011). DOI 10.1109/ICDE.2011.5767...

  14. [14]

    IEEE Trans

    Fender, P., Moerkotte, G.: Reassessing top-down join enumeration. IEEE Trans. Knowl. Data Eng. 24(10), 1803–1818 (2012). DOI 10.1109/TKDE. 2011.235. URL https://doi.org/10.1109/TKDE. 2011.235

  15. [15]

    Fender, P., Moerkotte, G.: Counter strike: Generic top-down join enumeration for hypergraphs. Proc. VLDB Endow. 6(14), 1822–1833 (2013). DOI 10. 14778/2556549.2556565. URL http://www.vldb. org/pvldb/vol6/p1822-fender.pdf

  16. [16]

    https://github.com/kuzudb/kuzu (2022)

    Feng, X., Jin, G., Chen, Z., Liu, C., Saliho˘ glu, S.: K` uzu Database Management System Source Code. https://github.com/kuzudb/kuzu (2022)

  17. [17]

    In: CIDR (2023)

    Feng, X., Jin, G., Chen, Z., Liu, C., Saliho˘ glu, S.: K` uzu graph database management system. In: CIDR (2023)

  18. [18]

    Francis, N., Gheerbrant, A., Guagliardo, P., Libkin, L., Marsault, V., Martens, W., Murlak, F., Peter- freund, L., Rogova, A., Vrgoc, D.: A researcher’s digest of GQL (invited talk). In: F. Geerts, B. Van- devoort (eds.) 26th International Conference on Database Theory, ICDT 2023, March 28-31, 2023, Ioannina, Greece, LIPIcs, vol. 255, pp. 1:1–1:22 (2023)

  19. [19]

    Han, Y., Wu, Z., Wu, P., Zhu, R., Yang, J., Tan, L.W., Zeng, K., Cong, G., Qin, Y., Pfadler, A., Qian, Z., Zhou, J., Li, J., Cui, B.: Cardinality estimation in dbms: a comprehensive benchmark evaluation. Proc. VLDB Endow. 15(4), 752–765 (2021). DOI 10.14778/3503585.3503586. URL https://doi.org/10.14778/3503585.3503586

  20. [20]

    Hofer, M., Hellmann, S., Dojchinovski, M., Frey, J.: The new dbpedia release cycle: Increasing agility and efficiency in knowledge extraction work- flows. In: E. Blomqvist, P. Groth, V. de Boer, T. Pellegrini, M. Alam, T. K¨ afer, P. Kiese- berg, S. Kirrane, A. Mero˜ no-Pe˜ nuela, H.J. Pan- dit (eds.) Semantic Systems. In the Era of Knowl- edge Graphs - 1...

  21. [21]

    Jachiet, L., Genev` es, P., Gesbert, N., Laya¨ ıda, N.: On the optimization of recursive relational queries: Application to graph queries. In: D. Maier, R. Pot- tinger, A. Doan, W. Tan, A. Alawini, H.Q. Ngo (eds.) Proceedings of the 2020 International Confer- ence on Management of Data, SIGMOD Conference 2020, online conference [Portland, OR, USA], June 1...

  22. [22]

    IEEE Trans

    van Leeuwen, W., Fletcher, G., Yakovets, N.: A general cardinality estimation framework for sub- graph matching in property graphs. IEEE Trans. Knowl. Data Eng. 35(6), 5485–5505 (2023). DOI 10.1109/TKDE.2022.3161328. URL https://doi. org/10.1109/TKDE.2022.3161328

  23. [23]

    Semantic Web 6(2), 167–195 (2015)

    Lehmann, J., Isele, R., Jakob, M., Jentzsch, A., Kontokostas, D., Mendes, P.N., Hellmann, S., Morsey, M., van Kleef, P., Auer, S., Bizer, C.: Db- pedia - A large-scale, multilingual knowledge base extracted from wikipedia. Semantic Web 6(2), 167–195 (2015). DOI 10.3233/SW-140134. URL https://doi.org/10.3233/SW-140134

  24. [24]

    Leis, V., Radke, B., Gubichev, A., Mirchev, A., Boncz, P.A., Kemper, A., Neumann, T.: Query optimization through the looking glass, and what we found running the join order benchmark. VLDB J. 27(5), 643–668 (2018). DOI 10.1007/ S00778-017-0480-7. URL https://doi.org/10. 1007/s00778-017-0480-7

  25. [25]

    Moerkotte, G., Neumann, T.: Dynamic program- ming strikes back. In: J.T. Wang (ed.) Proceed- ings of the ACM SIGMOD International Confer- ence on Management of Data, SIGMOD 2008, Van- couver, BC, Canada, June 10-12, 2008, pp. 539–552. ACM (2008). DOI 10.1145/1376616.1376672. URL https://doi.org/10.1145/1376616.1376672

  26. [26]

    Neumann, T.: Efficient generation and execution of dag-structured query graphs. Ph.D. thesis, University of Mannheim, Germany (2005). URL http://bibserv7.bib.uni-mannheim.de/madoc/ volltexte/2005/1089/index.html

  27. [27]

    In: 10th Conference on Innovative Data Systems Research, CIDR 2020, Amsterdam, The Netherlands, January Optimizing Navigational Graph Queries 23 12-15, 2020, Online Proceedings

    Neumann, T., Freitag, M.J.: Umbra: A disk-based system with in-memory performance. In: 10th Conference on Innovative Data Systems Research, CIDR 2020, Amsterdam, The Netherlands, January Optimizing Navigational Graph Queries 23 12-15, 2020, Online Proceedings. www.cidrdb.org (2020). URL http://cidrdb.org/cidr2020/ papers/p29-neumann-cidr20.pdf

  28. [28]

    In: Proceedings of the Sixteenth International Confer- ence on Very Large Databases, p

    Ono, K., Lohman, G.M.: Measuring the complex- ity of join enumeration in query optimization. In: Proceedings of the Sixteenth International Confer- ence on Very Large Databases, p. 314–325. Mor- gan Kaufmann Publishers Inc., San Francisco, CA, USA (1990)

  29. [29]

    URL https://github.com/duckdb/duckdb

    Raasveldt, M., Muehleisen, H.: DuckDB (2023). URL https://github.com/duckdb/duckdb

  30. [30]

    Theory Com- put

    Reutter, J.L., Romero, M., Vardi, M.Y.: Regu- lar queries on graph databases. Theory Com- put. Syst. 61(1), 31–83 (2017). DOI 10.1007/ s00224-016-9676-2. URL https://doi.org/10. 1007/s00224-016-9676-2

  31. [31]

    Nucleic Acids Research51(D1), D638–D646 (2022)

    Szklarczyk, D., Kirsch, R., Koutrouli, M., Nas- tou, K., Mehryary, F., Hachilif, R., Gable, A.L., Fang, T., Doncheva, N., Pyysalo, S., Bork, P., Jensen, L., von Mering, C.: The STRING database in 2023: protein–protein association networks and functional enrichment analyses for any sequenced genome of interest. Nucleic Acids Research51(D1), D638–D646 (2022...

  32. [32]

    Willsey, M., Nandi, C., Wang, Y.R., Flatt, O., Tat- lock, Z., Panchekha, P.: Egg: Fast and extensible equality saturation. Proc. ACM Program. Lang. 5(POPL) (2021). DOI 10.1145/3434304. URL https://doi.org/10.1145/3434304

  33. [33]

    In: 13th Confer- ence on Innovative Data Systems Research, CIDR 2023, Amsterdam, The Netherlands, January 8-11,

    ten Wolde, D., Singh, T., Sz´ arnyas, G., Boncz, P.A.: Duckpgq: Efficient property graph queries in an analytical RDBMS. In: 13th Confer- ence on Innovative Data Systems Research, CIDR 2023, Amsterdam, The Netherlands, January 8-11,

  34. [34]

    URL https://www

    www.cidrdb.org (2023). URL https://www. cidrdb.org/cidr2023/papers/p66-wolde.pdf

  35. [35]

    Yakovets, N., Godfrey, P., Gryz, J.: Query plan- ning for evaluating SPARQL property paths. In: F. ¨Ozcan, G. Koutrika, S. Madden (eds.) Pro- ceedings of the 2016 International Conference on Management of Data, SIGMOD Conference 2016, San Francisco, CA, USA, June 26 - July 01, 2016, pp. 1875–1889. ACM (2016). DOI 10. 1145/2882903.2882944. URL https://doi....

  36. [36]

    URL https://github.com/ ymous5380/ongq

    Ymous, A.: Optimizing Navigational Queries On- line Appendix (2023). URL https://github.com/ ymous5380/ongq