Efficient Path Query Processing in Relational Database Systems
Pith reviewed 2026-05-13 19:50 UTC · model grok-4.3
The pith
ReCAP lets standard relational systems like DuckDB push property constraints deep into path query plans for speedups up to 400000x.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
ReCAP provides an abstraction that encodes early filtering for path queries by requiring only an NFA-style state transition function and a handful of functions mirroring those needed for user-defined aggregates; this lets a relational optimizer push arbitrary property constraints deep into the execution plan, regardless of constraint complexity.
What carries the argument
ReCAP abstraction, which encodes early filtering opportunities via NFA state transitions and user-defined aggregate functions so the optimizer can apply constraints to partial paths.
If this is right
- Relational DBMS can match or exceed specialized graph DBMS performance on path queries that include property constraints.
- Early elimination of invalid intermediate results becomes feasible without custom engine modifications for each new constraint type.
- Implementation cost for supporting a new constraint remains low, limited to the state transition and aggregate functions.
- Speedups grow with the number of partial paths that can be pruned early by the constraints.
Where Pith is reading between the lines
- The same early-filtering pattern could be applied to other relational query problems that involve reachability or sequence constraints.
- Vendors could expose ReCAP-like hooks natively to reduce the need for users to write the transition functions themselves.
- Hybrid systems might avoid maintaining separate graph engines if relational optimizers become reliably competitive on path workloads.
- One could test whether automatic derivation of the NFA transition from constraint definitions is feasible.
Load-bearing premise
The relational query optimizer will reliably detect and exploit the early filtering opportunities encoded through ReCAP whenever such filtering is theoretically possible.
What would settle it
A set of queries and graphs where ReCAP is fully implemented yet the optimizer produces the same plan and execution time as the baseline without ReCAP.
Figures
read the original abstract
Path queries are crucial for property graphs, and there is growing interest in queries that combine regular expressions over labels with constraints on property values of vertices and edges. Efficient evaluation of such general path queries requires that intermediate results be eliminated early when there is no possible completion to a full result path. Neither state-of-the-art (SOA) graph DBMS nor relational DBMS currently can do this effectively for a large class of queries. We show that this problem can be addressed by giving a relational optimizer ``a little help'' by specifying early filtering opportunities explicitly in the query. To this end, we propose ReCAP, an abstraction that greatly simplifies the implementation of early filtering techniques for any type of property constraint for which such early filtering can be derived. No matter how complex the constraint, one only needs to implement (1) an NFA-style state transition function and (2) a handful of functions that mirror those needed for user-defined aggregates. We show that when using ReCAP, a standard relational DBMS like DuckDB can effectively push property constraints deep into the query plan, beating the SOA graph and relational DBMS by a factor up to 400,000 over a variety of queries and input graphs.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes ReCAP, an abstraction for early filtering of property constraints in path queries over property graphs evaluated on relational DBMS. Users implement an NFA-style state transition function plus a small set of functions mirroring user-defined aggregates; the claim is that this suffices for a standard optimizer (DuckDB) to push constraints deep into the plan, yielding speedups up to 400,000× versus state-of-the-art graph and relational engines across a variety of queries and input graphs.
Significance. If the central claim holds, the work is significant because it shows how commodity relational engines can handle complex regular-path queries with property constraints without custom graph engines or hand-tuned rewrites. The simplicity of the ReCAP interface for arbitrary constraints is a practical strength and could lower the barrier to efficient path query processing in existing RDBMS deployments.
major comments (2)
- [ReCAP Abstraction and Query Optimization] The central claim that an NFA-style transition plus UDA-mirroring functions is sufficient for the relational optimizer to reliably push early filtering for history-dependent or non-local constraints is load-bearing yet under-validated. The manuscript must include concrete query-plan excerpts or cost-model analysis (e.g., in the optimization or experimental sections) demonstrating that DuckDB actually performs the desired pushdown for the more complex constraints used in the evaluation.
- [Experimental Evaluation] Table or figure reporting the 400,000× speedup must be accompanied by the exact query definitions, graph characteristics (size, label distribution, property selectivity), and error bars or run-time variance; without these the magnitude of the reported gains cannot be assessed for robustness.
minor comments (2)
- [ReCAP Abstraction] Notation for the state-transition function and the UDA-mirroring primitives should be introduced with a small, self-contained example in the ReCAP section to improve readability.
- [Related Work] The related-work discussion should explicitly contrast ReCAP with prior techniques for pushing regular-expression constraints into relational plans (e.g., via recursive CTEs or custom operators).
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and the recommendation for major revision. We address each major comment below and will revise the manuscript to incorporate the requested clarifications and additions.
read point-by-point responses
-
Referee: [ReCAP Abstraction and Query Optimization] The central claim that an NFA-style transition plus UDA-mirroring functions is sufficient for the relational optimizer to reliably push early filtering for history-dependent or non-local constraints is load-bearing yet under-validated. The manuscript must include concrete query-plan excerpts or cost-model analysis (e.g., in the optimization or experimental sections) demonstrating that DuckDB actually performs the desired pushdown for the more complex constraints used in the evaluation.
Authors: We agree that explicit evidence of the optimizer's pushdown decisions for complex, history-dependent constraints would strengthen the central claim. In the revised manuscript we will add DuckDB query-plan excerpts (with annotations) for the more complex constraints from the evaluation, together with a short cost-model discussion in the optimization section explaining how the NFA transition function and UDA-mirroring primitives enable the desired early filtering. These additions will directly demonstrate that the ReCAP interface suffices for reliable pushdown. revision: yes
-
Referee: [Experimental Evaluation] Table or figure reporting the 400,000× speedup must be accompanied by the exact query definitions, graph characteristics (size, label distribution, property selectivity), and error bars or run-time variance; without these the magnitude of the reported gains cannot be assessed for robustness.
Authors: We acknowledge that the current experimental presentation lacks sufficient detail for independent assessment of the largest speedups. In the revision we will expand the relevant table/figure and its caption to include the exact query definitions, complete graph characteristics (vertex/edge counts, label distributions, and property-value selectivities), and run-time variance reported as error bars or standard deviations over repeated executions. This will allow readers to evaluate the robustness of all reported gains, including the 400,000× case. revision: yes
Circularity Check
No significant circularity; ReCAP is a user-defined abstraction evaluated via benchmarks
full rationale
The paper's core contribution is the ReCAP abstraction, which requires users to supply an NFA-style state transition function plus a small set of UDA-mirroring functions so that a standard relational optimizer (DuckDB) can push property constraints early. Performance claims rest on experimental comparisons against SOA systems rather than any derivation that reduces to fitted parameters, self-citations, or by-construction equivalence. No load-bearing step in the provided text equates a prediction to its own input or imports uniqueness via author-overlapping citations; the approach is self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption NFA-style state transition functions plus a handful of user-defined-aggregate-style functions are sufficient to derive early filtering for any property constraint
invented entities (1)
-
ReCAP abstraction
no independent evidence
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/ArithmeticFromLogic.leanLogicNat recovery and embed_strictMono unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
ReCAP abstraction: NFA for label regex + selective aggregate (init_d, update_d, is_viable_d, finalize_d) for property constraints, compiled to recursive CTE with early is_viable_d filter
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel and Jcost_pos_of_ne_one unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
early filtering of doomed prefixes via incremental state (last_time, edge_ids, max/min_amount)
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
[n. d.]. SQL/PGQ. https://duckpgq.org/documentation/sql_pgq/. [Accessed 1-Nov-2025]
work page 2025
-
[2]
Renzo Angles, Marcelo Arenas, Pablo Barceló, Aidan Hogan, Juan Reutter, and Do- magoj Vrgoč. 2017. Foundations of modern query languages for graph databases. ACM Computing Surveys (CSUR)50, 5 (2017), 1–40
work page 2017
-
[3]
Diego Arroyuelo, Aidan Hogan, Gonzalo Navarro, and Javiel Rojas-Ledesma. 2022. Time-and space-efficient regular path queries. In2022 IEEE 38th International Conference on Data Engineering (ICDE). IEEE, 3091–3105
work page 2022
-
[4]
Jorge Baier, Dietrich Daroch, Juan L. Reutter, and Domagoj Vrgoč. 2017. Evalu- ating Navigational RDF Queries over the Web. InProceedings of the 28th ACM Conference on Hypertext and Social Media(Prague, Czech Republic). Association for Computing Machinery, New York, NY, USA, 165–174. doi:10.1145/3078714. 3078731
-
[5]
Pablo Barceló Baeza. 2013. Querying graph databases. InProceedings of the 32nd ACM SIGMOD-SIGACT-SIGAI symposium on Principles of database systems. 175–188
work page 2013
- [6]
-
[7]
Lars Brenna, Alan Demers, Johannes Gehrke, Mingsheng Hong, Joel Ossher, Biswanath Panda, Mirek Riedewald, Mohit Thatte, and Walker White. 2007. Cayuga: a high-performance event processing engine. InProceedings of the 2007 ACM SIGMOD international conference on Management of data (SIGMOD ’07). Association for Computing Machinery, New York, NY, USA, 1100–11...
- [8]
-
[9]
Chainalysis Team. 2023. Case Study: How We Track Crypto Money Laundering for Off-Chain Crime. https://www.chainalysis.com/blog/japan-cryptocurrency- money-laundering-case-study/ Accessed: 2026-03-26
work page 2023
-
[10]
Sarah Chlyah, Pierre Genevès, and Nabil Layaïda. 2025. Distributed Evaluation of Graph Queries using Recursive Relational Algebra. In2025 IEEE 41st International Conference on Data Engineering (ICDE). IEEE, 2351–2365
work page 2025
-
[11]
Faizan Iftikhar Janjua. 2023. Metaverse Financial Transactions Dataset. https://www.kaggle.com/datasets/faizaniftikharjanjua/metaverse-financial- transactions-dataset. Kaggle dataset. Accessed: 2026-03-26
work page 2023
- [12]
-
[13]
Amela Fejza, Pierre Genevès, Nabil Layaïda, and Sarah Chlyah. 2023. The 𝜇-ra system for recursive path queries over graphs. InProceedings of the 32nd ACM International Conference on Information and Knowledge Management. 5041–5045
work page 2023
-
[14]
Xiyang Feng, Guodong Jin, Ziyi Chen, Chang Liu, and Semih Salihoğlu. 2023. Kùzu graph database management system. InThe Conference on Innovative Data Systems Research, Vol. 7. 25–35
work page 2023
-
[15]
Diego Figueira, Artur Jez, and Anthony W Lin. 2022. Data path queries over embedded graph databases. InProceedings of the 41st ACM SIGMOD-SIGACT- SIGAI Symposium on Principles of Database Systems. 189–201
work page 2022
-
[16]
Diego Figueira and Leonid Libkin. 2015. Path logics for querying graphs: Com- bining expressiveness and efficiency. In2015 30th Annual ACM/IEEE Symposium on Logic in Computer Science. IEEE, 329–340
work page 2015
-
[17]
Nadime Francis, Amélie Gheerbrant, Paolo Guagliardo, Leonid Libkin, Victor Marsault, Wim Martens, Filip Murlak, Liat Peterfreund, Alexandra Rogova, and Diego Rivera Correa and Mirek Riedewald Domagoj Vrgoc. 2023. A Researcher’s Digest of GQL. InThe 26th International Conference on Database Theory, 2023. Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 1–1
work page 2023
- [18]
-
[19]
Amélie Gheerbrant, Leonid Libkin, Liat Peterfreund, and Alexandra Rogova
- [20]
-
[21]
Amélie Gheerbrant, Leonid Libkin, and Alexandra Rogova. 2025. Dangers of List Processing in Querying Property Graphs.Proceedings of the ACM on Management of Data3, 3 (2025), 1–25
work page 2025
-
[22]
Daniel Gyllstrom, Eugene Wu, Hee-Jin Chae, Yanlei Diao, Patrick Stahlberg, and Gordon Anderson. 2006. SASE: Complex event processing over streams.arXiv preprint cs/0612128(2006)
work page internal anchor Pith review Pith/arXiv arXiv 2006
-
[23]
Mohamed Hassan, Tatiana Kuznetsova, Hyun Chai Jeong, Walid Aref, and Mohammad Sadoghi. 2018. Extending In-Memory Relational Database En- gines with Native Graph Support. InProceedings of the 21st International Conference on Extending Database Technology (EDBT). OpenProceedings.org. doi:10.5441/002/EDBT.2018.04
-
[24]
Alexandru Iosup, Tim Hegeman, Wing Lung Ngai, Stijn Heldens, Arnau Prat- Pérez, Thomas Manhardto, Hassan Chafio, Mihai Capotă, Narayanan Sundaram, et al. 2016. LDBC Graphalytics: A benchmark for large-scale graph analysis on parallel and distributed platforms.Proceedings of the VLDB Endowment9, 13 (2016), 1317–1328
work page 2016
- [25]
-
[26]
Michael Kaminski and Nissim Francez. 1994. Finite-memory automata.Theoreti- cal Computer Science134, 2 (1994), 329–363
work page 1994
-
[27]
André Koschmieder and Ulf Leser. 2012. Regular path queries on large graphs. InProceedings of the 24th International Conference on Scientific and Statistical Database Management(Chania, Crete, Greece)(SSDBM’12). Springer-Verlag, Berlin, Heidelberg, 177–194. doi:10.1007/978-3-642-31235-9_12
-
[28]
Srijan Kumar, Bryan Hooi, Disha Makhija, Mohit Kumar, Christos Faloutsos, and VS Subrahmanian. 2018. Rev2: Fraudulent user prediction in rating platforms. InProceedings of the Eleventh ACM International Conference on Web Search and Data Mining. ACM, 333–341
work page 2018
-
[29]
Srijan Kumar, Francesca Spezzano, VS Subrahmanian, and Christos Faloutsos
-
[30]
InData Mining (ICDM), 2016 IEEE 16th International Conference on
Edge weight prediction in weighted signed networks. InData Mining (ICDM), 2016 IEEE 16th International Conference on. IEEE, 221–230
work page 2016
-
[31]
Srijan Kumar, Xikun Zhang, and Jure Leskovec. 2019. Predicting dynamic em- bedding trajectory in temporal interaction networks. InProceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM, 1269–1278
work page 2019
-
[32]
Jure Leskovec. [n. d.]. Stanford Large Network Dataset Collection. https: //snap.stanford.edu/data/ [Accessed: 2024]
work page 2024
- [33]
-
[34]
Leonid Libkin, Wim Martens, Filip Murlak, Liat Peterfreund, and Domagoj Vrgoč
-
[35]
InCompanion of the 44th Symposium on Principles of Database Systems
Querying Graph Data: Where We Are and Where To Go. InCompanion of the 44th Symposium on Principles of Database Systems. 9–26
-
[36]
Leonid Libkin, Wim Martens, and Domagoj Vrgoč. 2016. Querying graphs with data.Journal of the ACM (JACM)63, 2 (2016), 1–53
work page 2016
-
[37]
Leonid Libkin and Domagoj Vrgoč. 2012. Regular path queries on graphs with data. InProceedings of the 15th International Conference on Database Theory (Berlin, Germany)(ICDT ’12). Association for Computing Machinery, New York, NY, USA, 74–85. doi:10.1145/2274576.2274585
-
[38]
Chunbin Lin, Benjamin Mandel, Yannis Papakonstantinou, and Matthias Springer
-
[39]
Fast in-memory SQL analytics on typed graphs.Proc. VLDB Endow.10, 3 (Nov. 2016), 265–276. doi:10.14778/3021924.3021941
-
[40]
Katja Losemann and Wim Martens. 2013. The complexity of regular expressions and property paths in SPARQL.ACM Transactions on Database Systems (TODS) 38, 4 (2013), 1–39
work page 2013
-
[41]
Wim Martens, Matthias Niewerth, Tina Popp, Carlos Rojas, Stijn Vansummeren, and Domagoj Vrgoč. 2023. Representing Paths in Graph Database Pattern Match- ing.Proc. VLDB Endow.16, 7 (March 2023), 1790–1803. doi:10.14778/3587136. 3587151
-
[42]
József Marton, Gábor Szárnyas, and Dániel Varró. 2017. Formalising open- Cypher graph queries in relational algebra. InEuropean Conference on Advances in Databases and Information Systems. Springer, 182–196
work page 2017
-
[43]
Memgraph. 2024. Fastest, Most Affordable Graph Database. https://memgraph. com/ [Accessed: 2024]
work page 2024
-
[44]
Alberto O. Mendelzon and Peter T. Wood. 1995. Finding Regular Simple Paths in Graph Databases.SIAM J. Comput.24, 6 (1995), 1235–1258. arXiv:https://doi.org/10.1137/S009753979122370X doi:10.1137/ S009753979122370X
-
[45]
Merkle Science. n.d.. Crypto Risk Scores: How They Help Prevent Financial Crime. https://www.merklescience.com/blog/crypto-risk-scores-how-they- help-prevent-financial-crime Accessed: 2026-03-26
work page 2026
-
[46]
Amine Mhedhbi, Amol Deshpande, and Semih Salihoğlu. 2024. Modern Tech- niques For Querying Graph-structured Databases.Found. Trends Databases14, 2 (Oct. 2024), 72–185. doi:10.1561/1900000090
-
[47]
Neo4j. 2024. Neo4j Graph Database and Analytics. https://neo4j.com/ [Accessed: 2024]
work page 2024
-
[48]
Van-Quyet Nguyen, Van-Hau Nguyen, Minh-Quy Nguyen, Quyet-Thang Huynh, and Kyungbaek Kim. 2021. Efficiently Estimating Joining Cost of Subqueries in Regular Path Queries.Electronics10, 9 (2021), 990
work page 2021
-
[49]
Van-Quyet Nguyen, Van-Hau Nguyen, Huy-The Vu, Minh-Quy Nguyen, Quyet- Thang Huynh, and Kyungbaek Kim. 2020. Accelerating Parallel Evaluation of Regular Path Queries on Large Graphs by Estimating Joining Cost of Subqueries. InThe 9th International Conference on Smart Media and Applications. 464–469
work page 2020
-
[50]
Anil Pacaci, Angela Bonifati, and M Tamer Özsu. 2020. Regular path query evalu- ation on streaming graphs. InProceedings of the 2020 ACM SIGMOD International Conference on Management of Data. 1415–1430
work page 2020
-
[51]
Mark Raasveldt and Hannes Mühleisen. 2019. Duckdb: an embeddable analytical database. InProceedings of the 2019 international conference on management of data. 1981–1984
work page 2019
-
[52]
Michael Sipser. 1996. Introduction to the Theory of Computation.ACM Sigact News27, 1 (1996), 27–29
work page 1996
-
[53]
Frank Tetzel, Romans Kasperovics, and Wolfgang Lehner. 2019. Graph traversals for regular path queries. InProceedings of the 2nd Joint International Workshop on Graph Data Management Experiences & Systems (GRADES) and Network Data Analytics (NDA). 1–8
work page 2019
-
[54]
Domagoj Vrgoč. 2015. Using variable automata for querying data graphs.Inform. Process. Lett.115, 3 (2015), 425–430
work page 2015
-
[55]
Domagoj Vrgoč, Carlos Rojas, Renzo Angles, Marcelo Arenas, Diego Arroyuelo, Carlos Buil-Aranda, Aidan Hogan, Gonzalo Navarro, Cristian Riveros, and Juan Romero. 2023. MillenniumDB: An Open-Source Graph Database System.Data Intelligence5, 3 (08 2023), 560–610. arXiv:https://direct.mit.edu/dint/article- pdf/5/3/560/2158194/dint_a_00229.pdf doi:10.1162/dint_a_00229
-
[56]
Sarisht Wadhwa, Anagh Prasad, Sayan Ranu, Amitabha Bagchi, and Srikanta Bedathur. 2019. Efficiently Answering Regular Simple Path Queries on Large Labeled Networks. InProceedings of the 2019 International Conference on Man- agement of Data(Amsterdam, Netherlands)(SIGMOD ’19). Association for Com- puting Machinery, New York, NY, USA, 1463–1480. doi:10.1145...
-
[57]
Daniel ten Wolde, Gábor Szárnyas, and Peter Boncz. 2023. Duckpgq: Bringing sql/pgq´ to duckdb.Proceedings of the VLDB Endowment16, 12 (2023), 4034–4037
work page 2023
-
[58]
Nikolay Yakovets, Parke Godfrey, and Jarek Gryz. 2013. Evaluation of SPARQL Property Paths via Recursive SQL.AMW1087 (2013)
work page 2013
-
[59]
Nikolay Yakovets, Parke Godfrey, and Jarek Gryz. 2015. WAVEGUIDE: Evaluating SPARQL Property Path Queries.. InEDBT, Vol. 2015. 525–528
work page 2015
-
[60]
Nikolay Yakovets, Parke Godfrey, and Jarek Gryz. 2016. Query planning for evalu- ating SPARQL property paths. InProceedings of the 2016 International Conference on Management of Data. 1875–1889
work page 2016
-
[61]
Guangyi Yang, Xiaoxing Liu, and Beixin Li. 2023. Anti-money laundering super- vision by intelligent algorithm.Computers & Security132 (2023), 103344
work page 2023
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.