Efficient Path Query Processing in Relational Database Systems

Diego Rivera Correa; Mirek Riedewald

arxiv: 2604.02553 · v1 · submitted 2026-04-02 · 💻 cs.DB

Efficient Path Query Processing in Relational Database Systems

Diego Rivera Correa , Mirek Riedewald This is my paper

Pith reviewed 2026-05-13 19:50 UTC · model grok-4.3

classification 💻 cs.DB

keywords path queriesproperty graphsquery optimizationearly filteringrelational databasesNFADuckDBgraph databases

0 comments

The pith

ReCAP lets standard relational systems like DuckDB push property constraints deep into path query plans for speedups up to 400000x.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Path queries on property graphs combine regular expressions over labels with value constraints on vertices and edges. Current graph and relational database systems cannot eliminate invalid partial paths early enough for many such queries, causing large amounts of wasted work. The paper introduces ReCAP, an abstraction that makes it simple to encode early filtering opportunities for any property constraint. Users implement only an NFA-style state transition function and a small set of functions similar to those for user-defined aggregates. When these are supplied, a relational optimizer can apply the constraints much earlier in the plan, delivering large performance gains on diverse workloads.

Core claim

ReCAP provides an abstraction that encodes early filtering for path queries by requiring only an NFA-style state transition function and a handful of functions mirroring those needed for user-defined aggregates; this lets a relational optimizer push arbitrary property constraints deep into the execution plan, regardless of constraint complexity.

What carries the argument

ReCAP abstraction, which encodes early filtering opportunities via NFA state transitions and user-defined aggregate functions so the optimizer can apply constraints to partial paths.

If this is right

Relational DBMS can match or exceed specialized graph DBMS performance on path queries that include property constraints.
Early elimination of invalid intermediate results becomes feasible without custom engine modifications for each new constraint type.
Implementation cost for supporting a new constraint remains low, limited to the state transition and aggregate functions.
Speedups grow with the number of partial paths that can be pruned early by the constraints.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same early-filtering pattern could be applied to other relational query problems that involve reachability or sequence constraints.
Vendors could expose ReCAP-like hooks natively to reduce the need for users to write the transition functions themselves.
Hybrid systems might avoid maintaining separate graph engines if relational optimizers become reliably competitive on path workloads.
One could test whether automatic derivation of the NFA transition from constraint definitions is feasible.

Load-bearing premise

The relational query optimizer will reliably detect and exploit the early filtering opportunities encoded through ReCAP whenever such filtering is theoretically possible.

What would settle it

A set of queries and graphs where ReCAP is fully implemented yet the optimizer produces the same plan and execution time as the baseline without ReCAP.

Figures

Figures reproduced from arXiv: 2604.02553 by Diego Rivera Correa, Mirek Riedewald.

**Figure 1.** Figure 1: Graph representing accounts (vertices) transactions between them (edges). The label is shown next to the ID; properties [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗

**Figure 2.** Figure 2: Runtime of ReCAP vs. SOA competitors for varying path-length limits. All competitors scale poorly and exceed a 2-hour timeout for ℓ > 5. the number of candidate paths, this wasted computation dominates runtime [PITH_FULL_IMAGE:figures/full_fig_p002_2.png] view at source ↗

**Figure 3.** Figure 3: Query plan for Example 1 on Neo4j. It shows all paths up to length ℓ being generated first (left-most column), and then the constraints are applied in the end (middle and right-most column). Why can the optimizer not find a better query plan? Surely the timestamp monotonicity could be checked any time an edge is appended to a path prefix. For example, for a 3-way self-join of the edge relation, a condition… view at source ↗

**Figure 4.** Figure 4: Left: NFA for regex Domestic+ Foreign. Right: Tabular representation of the NFA. Listing 3: NFA in SQL. WITH RECURSIVE Paths AS ( SELECT s as v, 𝑞0 as q UNION ALL SELECT E.dst as v, T.to_state as q, FROM Paths P JOIN Edges E ON P.v = E.src JOIN Transitions T ON T.from_state = P.q WHERE T.label = E.label ) SELECT ... FROM Paths WHERE q IN 𝑞𝐹 for a given path query 𝑄 = (𝑆, 𝑅, 𝜑) is to use standard textbook … view at source ↗

**Figure 6.** Figure 6: NFA representation for 𝑄𝐵 (Example 4). The regex is represented as a standard NFA, but with functions update_d and is_viable_d to maintain the dictionary and use it for identifying doomed intermediate paths, respectively. 𝑞0 AS q, init_d() AS D UNION ALL SELECT E.dst AS v, T.to_state AS q, update_d(P.D, P.q, T.to_state, E.∗) AS D FROM Paths P JOIN Edges E ON P.v = E.src JOIN Transitions T ON P.q = T.from_s… view at source ↗

**Figure 7.** Figure 7: init_d inlined into the anchor’s SELECT clause. WITH RECURSIVE Paths AS ( SELECT src as v, q0 as q, init_d() as D, 0 as path_length UNION ALL … flatten D … JOIN Edges E ON … JOIN Transitions T ON … WHERE T.label = E.label AND is_viable_d(P.D, P.q, T.to_state, E.*) … inline is_viable_d() … JOIN Edges E ON … JOIN Transitions T ON … WHERE T.label = E.label AND CASE WHEN P.q = 1 AND T.to_state = 2 THEN TRUE WH… view at source ↗

**Figure 11.** Figure 11: Fully optimized code for the ReCAP from Example 4. Graph |𝑉 | |𝐸| Bitcoin 5k 35k Metaverse 1.3k 78k Reddit 35k 286k LDBC100 (P-K-P) 448k 19.9M Datagen-7.6 754k 84.3M Datagen-7.7 13.1M 53.7M [PITH_FULL_IMAGE:figures/full_fig_p009_11.png] view at source ↗

**Figure 12.** Figure 12: Running-time comparison between ReCAP- Standard vs ReCAP- Optimized. All y-axis are log scale. graph DBMS stems from the implementation of the regex: while the former uses our implementation as a join with the transitions table, the latter apply the forced splitting that causes additional overhead. However, all competitors followed the same approach for the property constraints, applying them in the end.… view at source ↗

**Figure 13.** Figure 13: Running-time comparison across different queries and graphs. The competitors behave very similar to each other [PITH_FULL_IMAGE:figures/full_fig_p012_13.png] view at source ↗

**Figure 14.** Figure 14: Total number of intermediate paths generated across different queries and datasets. [PITH_FULL_IMAGE:figures/full_fig_p012_14.png] view at source ↗

read the original abstract

Path queries are crucial for property graphs, and there is growing interest in queries that combine regular expressions over labels with constraints on property values of vertices and edges. Efficient evaluation of such general path queries requires that intermediate results be eliminated early when there is no possible completion to a full result path. Neither state-of-the-art (SOA) graph DBMS nor relational DBMS currently can do this effectively for a large class of queries. We show that this problem can be addressed by giving a relational optimizer ``a little help'' by specifying early filtering opportunities explicitly in the query. To this end, we propose ReCAP, an abstraction that greatly simplifies the implementation of early filtering techniques for any type of property constraint for which such early filtering can be derived. No matter how complex the constraint, one only needs to implement (1) an NFA-style state transition function and (2) a handful of functions that mirror those needed for user-defined aggregates. We show that when using ReCAP, a standard relational DBMS like DuckDB can effectively push property constraints deep into the query plan, beating the SOA graph and relational DBMS by a factor up to 400,000 over a variety of queries and input graphs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

ReCAP gives a lightweight way to push property constraints into relational path query plans, but the huge speedups rest on unshown experiments and an optimizer that may not always cooperate.

read the letter

The core idea is straightforward: ReCAP lets you hand the optimizer an NFA-style state machine plus a few UDA-style functions so that DuckDB can drop non-viable path prefixes early, even when the constraints are complicated. That is the actual novelty. It decouples the filtering logic from the core optimizer, which is cleaner than trying to teach the planner every possible constraint from scratch. If the implementation really is as small as claimed, it could let ordinary relational engines handle a slice of graph workloads without a separate engine. That part looks useful on paper. The experiments are the weak link. The abstract throws around a 400,000x number but supplies no query definitions, graph sizes, or baseline details. Without those, it is impossible to tell whether the gains come from the abstraction itself or from favorable cases where the cost model already cooperates. The stress-test concern is also real: nothing guarantees that DuckDB’s rewrite rules will always push the custom state deep enough when the transition logic is non-local or history-dependent. If the paper only shows hand-picked queries where it works, the practical takeaway shrinks. This is the kind of work that belongs in a reading group for people who care about query optimization or graph-on-relational systems. It is worth a serious referee round because the abstraction is new and the problem is real, but only if the full paper includes reproducible experiments and a clear discussion of when the optimizer actually fails to exploit the hooks. I would send it out rather than desk-reject.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes ReCAP, an abstraction for early filtering of property constraints in path queries over property graphs evaluated on relational DBMS. Users implement an NFA-style state transition function plus a small set of functions mirroring user-defined aggregates; the claim is that this suffices for a standard optimizer (DuckDB) to push constraints deep into the plan, yielding speedups up to 400,000× versus state-of-the-art graph and relational engines across a variety of queries and input graphs.

Significance. If the central claim holds, the work is significant because it shows how commodity relational engines can handle complex regular-path queries with property constraints without custom graph engines or hand-tuned rewrites. The simplicity of the ReCAP interface for arbitrary constraints is a practical strength and could lower the barrier to efficient path query processing in existing RDBMS deployments.

major comments (2)

[ReCAP Abstraction and Query Optimization] The central claim that an NFA-style transition plus UDA-mirroring functions is sufficient for the relational optimizer to reliably push early filtering for history-dependent or non-local constraints is load-bearing yet under-validated. The manuscript must include concrete query-plan excerpts or cost-model analysis (e.g., in the optimization or experimental sections) demonstrating that DuckDB actually performs the desired pushdown for the more complex constraints used in the evaluation.
[Experimental Evaluation] Table or figure reporting the 400,000× speedup must be accompanied by the exact query definitions, graph characteristics (size, label distribution, property selectivity), and error bars or run-time variance; without these the magnitude of the reported gains cannot be assessed for robustness.

minor comments (2)

[ReCAP Abstraction] Notation for the state-transition function and the UDA-mirroring primitives should be introduced with a small, self-contained example in the ReCAP section to improve readability.
[Related Work] The related-work discussion should explicitly contrast ReCAP with prior techniques for pushing regular-expression constraints into relational plans (e.g., via recursive CTEs or custom operators).

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and the recommendation for major revision. We address each major comment below and will revise the manuscript to incorporate the requested clarifications and additions.

read point-by-point responses

Referee: [ReCAP Abstraction and Query Optimization] The central claim that an NFA-style transition plus UDA-mirroring functions is sufficient for the relational optimizer to reliably push early filtering for history-dependent or non-local constraints is load-bearing yet under-validated. The manuscript must include concrete query-plan excerpts or cost-model analysis (e.g., in the optimization or experimental sections) demonstrating that DuckDB actually performs the desired pushdown for the more complex constraints used in the evaluation.

Authors: We agree that explicit evidence of the optimizer's pushdown decisions for complex, history-dependent constraints would strengthen the central claim. In the revised manuscript we will add DuckDB query-plan excerpts (with annotations) for the more complex constraints from the evaluation, together with a short cost-model discussion in the optimization section explaining how the NFA transition function and UDA-mirroring primitives enable the desired early filtering. These additions will directly demonstrate that the ReCAP interface suffices for reliable pushdown. revision: yes
Referee: [Experimental Evaluation] Table or figure reporting the 400,000× speedup must be accompanied by the exact query definitions, graph characteristics (size, label distribution, property selectivity), and error bars or run-time variance; without these the magnitude of the reported gains cannot be assessed for robustness.

Authors: We acknowledge that the current experimental presentation lacks sufficient detail for independent assessment of the largest speedups. In the revision we will expand the relevant table/figure and its caption to include the exact query definitions, complete graph characteristics (vertex/edge counts, label distributions, and property-value selectivities), and run-time variance reported as error bars or standard deviations over repeated executions. This will allow readers to evaluate the robustness of all reported gains, including the 400,000× case. revision: yes

Circularity Check

0 steps flagged

No significant circularity; ReCAP is a user-defined abstraction evaluated via benchmarks

full rationale

The paper's core contribution is the ReCAP abstraction, which requires users to supply an NFA-style state transition function plus a small set of UDA-mirroring functions so that a standard relational optimizer (DuckDB) can push property constraints early. Performance claims rest on experimental comparisons against SOA systems rather than any derivation that reduces to fitted parameters, self-citations, or by-construction equivalence. No load-bearing step in the provided text equates a prediction to its own input or imports uniqueness via author-overlapping citations; the approach is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claim rests on the domain assumption that NFA-style transitions plus a small set of aggregate-like functions are sufficient to expose all useful early filtering opportunities to a standard relational optimizer. No free parameters or invented physical entities are introduced.

axioms (1)

domain assumption NFA-style state transition functions plus a handful of user-defined-aggregate-style functions are sufficient to derive early filtering for any property constraint
Invoked when describing what the user must implement to use ReCAP.

invented entities (1)

ReCAP abstraction no independent evidence
purpose: To simplify the implementation of early filtering techniques inside relational query plans
New abstraction introduced by the paper to bridge user-defined constraint logic with the relational optimizer.

pith-pipeline@v0.9.0 · 5501 in / 1294 out tokens · 35481 ms · 2026-05-13T19:50:56.069637+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/ArithmeticFromLogic.lean LogicNat recovery and embed_strictMono unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

ReCAP abstraction: NFA for label regex + selective aggregate (init_d, update_d, is_viable_d, finalize_d) for property constraints, compiled to recursive CTE with early is_viable_d filter
IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel and Jcost_pos_of_ne_one unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

early filtering of doomed prefixes via incremental state (last_time, edge_ids, max/min_amount)

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

61 extracted references · 61 canonical work pages · 1 internal anchor

[1]

[n. d.]. SQL/PGQ. https://duckpgq.org/documentation/sql_pgq/. [Accessed 1-Nov-2025]

work page 2025
[2]

Renzo Angles, Marcelo Arenas, Pablo Barceló, Aidan Hogan, Juan Reutter, and Do- magoj Vrgoč. 2017. Foundations of modern query languages for graph databases. ACM Computing Surveys (CSUR)50, 5 (2017), 1–40

work page 2017
[3]

Diego Arroyuelo, Aidan Hogan, Gonzalo Navarro, and Javiel Rojas-Ledesma. 2022. Time-and space-efficient regular path queries. In2022 IEEE 38th International Conference on Data Engineering (ICDE). IEEE, 3091–3105

work page 2022
[4]

Reutter, and Domagoj Vrgoč

Jorge Baier, Dietrich Daroch, Juan L. Reutter, and Domagoj Vrgoč. 2017. Evalu- ating Navigational RDF Queries over the Web. InProceedings of the 28th ACM Conference on Hypertext and Social Media(Prague, Czech Republic). Association for Computing Machinery, New York, NY, USA, 165–174. doi:10.1145/3078714. 3078731

work page doi:10.1145/3078714 2017
[5]

Pablo Barceló Baeza. 2013. Querying graph databases. InProceedings of the 32nd ACM SIGMOD-SIGACT-SIGAI symposium on Principles of database systems. 175–188

work page 2013
[6]

Georgiy Belyanin and Semyon Grigoriev. 2024. Single-Source Regular Path Querying in Terms of Linear Algebra.arXiv preprint arXiv:2412.10287(2024)

work page arXiv 2024
[7]

Lars Brenna, Alan Demers, Johannes Gehrke, Mingsheng Hong, Joel Ossher, Biswanath Panda, Mirek Riedewald, Mohit Thatte, and Walker White. 2007. Cayuga: a high-performance event processing engine. InProceedings of the 2007 ACM SIGMOD international conference on Management of data (SIGMOD ’07). Association for Computing Machinery, New York, NY, USA, 1100–11...

work page doi:10.1145/1247480.1247620 2007
[8]

Marco Bucchi, Alejandro Grez, Andrés Quintana, Cristian Riveros, and Stijn Vansummeren. 2021. CORE: a complex event recognition engine.arXiv preprint arXiv:2111.04635(2021)

work page arXiv 2021
[9]

Chainalysis Team. 2023. Case Study: How We Track Crypto Money Laundering for Off-Chain Crime. https://www.chainalysis.com/blog/japan-cryptocurrency- money-laundering-case-study/ Accessed: 2026-03-26

work page 2023
[10]

Sarah Chlyah, Pierre Genevès, and Nabil Layaïda. 2025. Distributed Evaluation of Graph Queries using Recursive Relational Algebra. In2025 IEEE 41st International Conference on Data Engineering (ICDE). IEEE, 2351–2365

work page 2025
[11]

Faizan Iftikhar Janjua. 2023. Metaverse Financial Transactions Dataset. https://www.kaggle.com/datasets/faizaniftikharjanjua/metaverse-financial- transactions-dataset. Kaggle dataset. Accessed: 2026-03-26

work page 2023
[12]

Benjamín Farías, Wim Martens, Carlos Rojas, and Domagoj Vrgoč. 2023. Pathfinder: A unified approach for handling paths in graph query languages. arXiv preprint arXiv:2306.02194(2023)

work page arXiv 2023
[13]

Amela Fejza, Pierre Genevès, Nabil Layaïda, and Sarah Chlyah. 2023. The 𝜇-ra system for recursive path queries over graphs. InProceedings of the 32nd ACM International Conference on Information and Knowledge Management. 5041–5045

work page 2023
[14]

Xiyang Feng, Guodong Jin, Ziyi Chen, Chang Liu, and Semih Salihoğlu. 2023. Kùzu graph database management system. InThe Conference on Innovative Data Systems Research, Vol. 7. 25–35

work page 2023
[15]

Diego Figueira, Artur Jez, and Anthony W Lin. 2022. Data path queries over embedded graph databases. InProceedings of the 41st ACM SIGMOD-SIGACT- SIGAI Symposium on Principles of Database Systems. 189–201

work page 2022
[16]

Diego Figueira and Leonid Libkin. 2015. Path logics for querying graphs: Com- bining expressiveness and efficiency. In2015 30th Annual ACM/IEEE Symposium on Logic in Computer Science. IEEE, 329–340

work page 2015
[17]

Nadime Francis, Amélie Gheerbrant, Paolo Guagliardo, Leonid Libkin, Victor Marsault, Wim Martens, Filip Murlak, Liat Peterfreund, Alexandra Rogova, and Diego Rivera Correa and Mirek Riedewald Domagoj Vrgoc. 2023. A Researcher’s Digest of GQL. InThe 26th International Conference on Database Theory, 2023. Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 1–1

work page 2023
[18]

Roberto García, Renzo Angles, Vicente Rojas, and Sebastián Ferrada. 2025. PathDB: A system for evaluating regular path queries.arXiv preprint arXiv:2507.01755(2025)

work page arXiv 2025
[19]

Amélie Gheerbrant, Leonid Libkin, Liat Peterfreund, and Alexandra Rogova

work page
[20]

Gql and sql/pgq: Theoretical models and expressive power.arXiv preprint arXiv:2409.01102(2024)

work page arXiv 2024
[21]

Amélie Gheerbrant, Leonid Libkin, and Alexandra Rogova. 2025. Dangers of List Processing in Querying Property Graphs.Proceedings of the ACM on Management of Data3, 3 (2025), 1–25

work page 2025
[22]

Daniel Gyllstrom, Eugene Wu, Hee-Jin Chae, Yanlei Diao, Patrick Stahlberg, and Gordon Anderson. 2006. SASE: Complex event processing over streams.arXiv preprint cs/0612128(2006)

work page internal anchor Pith review Pith/arXiv arXiv 2006
[23]

Mohamed Hassan, Tatiana Kuznetsova, Hyun Chai Jeong, Walid Aref, and Mohammad Sadoghi. 2018. Extending In-Memory Relational Database En- gines with Native Graph Support. InProceedings of the 21st International Conference on Extending Database Technology (EDBT). OpenProceedings.org. doi:10.5441/002/EDBT.2018.04

work page doi:10.5441/002/edbt.2018.04 2018
[24]

Alexandru Iosup, Tim Hegeman, Wing Lung Ngai, Stijn Heldens, Arnau Prat- Pérez, Thomas Manhardto, Hassan Chafio, Mihai Capotă, Narayanan Sundaram, et al. 2016. LDBC Graphalytics: A benchmark for large-scale graph analysis on parallel and distributed platforms.Proceedings of the VLDB Endowment9, 13 (2016), 1317–1328

work page 2016
[25]

Guodong Jin and Semih Salihoglu. 2021. Making RDBMSs efficient on graph workloads through predefined joins.arXiv preprint arXiv:2108.10540(2021)

work page arXiv 2021
[26]

Michael Kaminski and Nissim Francez. 1994. Finite-memory automata.Theoreti- cal Computer Science134, 2 (1994), 329–363

work page 1994
[27]

André Koschmieder and Ulf Leser. 2012. Regular path queries on large graphs. InProceedings of the 24th International Conference on Scientific and Statistical Database Management(Chania, Crete, Greece)(SSDBM’12). Springer-Verlag, Berlin, Heidelberg, 177–194. doi:10.1007/978-3-642-31235-9_12

work page doi:10.1007/978-3-642-31235-9_12 2012
[28]

Srijan Kumar, Bryan Hooi, Disha Makhija, Mohit Kumar, Christos Faloutsos, and VS Subrahmanian. 2018. Rev2: Fraudulent user prediction in rating platforms. InProceedings of the Eleventh ACM International Conference on Web Search and Data Mining. ACM, 333–341

work page 2018
[29]

Srijan Kumar, Francesca Spezzano, VS Subrahmanian, and Christos Faloutsos

work page
[30]

InData Mining (ICDM), 2016 IEEE 16th International Conference on

Edge weight prediction in weighted signed networks. InData Mining (ICDM), 2016 IEEE 16th International Conference on. IEEE, 221–230

work page 2016
[31]

Srijan Kumar, Xikun Zhang, and Jure Leskovec. 2019. Predicting dynamic em- bedding trajectory in temporal interaction networks. InProceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM, 1269–1278

work page 2019
[32]

Jure Leskovec. [n. d.]. Stanford Large Network Dataset Collection. https: //snap.stanford.edu/data/ [Accessed: 2024]

work page 2024
[33]

Qi Liang, Dian Ouyang, Fan Zhang, Jianye Yang, Xuemin Lin, and Zhihong Tian. 2024. Efficient Regular Simple Path Queries under Transitive Restricted Expressions.Proc. VLDB Endow.17, 7 (March 2024), 1710–1722. doi:10.14778/ 3654621.3654636

work page arXiv 2024
[34]

Leonid Libkin, Wim Martens, Filip Murlak, Liat Peterfreund, and Domagoj Vrgoč

work page
[35]

InCompanion of the 44th Symposium on Principles of Database Systems

Querying Graph Data: Where We Are and Where To Go. InCompanion of the 44th Symposium on Principles of Database Systems. 9–26

work page
[36]

Leonid Libkin, Wim Martens, and Domagoj Vrgoč. 2016. Querying graphs with data.Journal of the ACM (JACM)63, 2 (2016), 1–53

work page 2016
[37]

Leonid Libkin and Domagoj Vrgoč. 2012. Regular path queries on graphs with data. InProceedings of the 15th International Conference on Database Theory (Berlin, Germany)(ICDT ’12). Association for Computing Machinery, New York, NY, USA, 74–85. doi:10.1145/2274576.2274585

work page doi:10.1145/2274576.2274585 2012
[38]

Chunbin Lin, Benjamin Mandel, Yannis Papakonstantinou, and Matthias Springer

work page
[39]

VLDB Endow.10, 3 (Nov

Fast in-memory SQL analytics on typed graphs.Proc. VLDB Endow.10, 3 (Nov. 2016), 265–276. doi:10.14778/3021924.3021941

work page doi:10.14778/3021924.3021941 2016
[40]

Katja Losemann and Wim Martens. 2013. The complexity of regular expressions and property paths in SPARQL.ACM Transactions on Database Systems (TODS) 38, 4 (2013), 1–39

work page 2013
[41]

Wim Martens, Matthias Niewerth, Tina Popp, Carlos Rojas, Stijn Vansummeren, and Domagoj Vrgoč. 2023. Representing Paths in Graph Database Pattern Match- ing.Proc. VLDB Endow.16, 7 (March 2023), 1790–1803. doi:10.14778/3587136. 3587151

work page doi:10.14778/3587136 2023
[42]

József Marton, Gábor Szárnyas, and Dániel Varró. 2017. Formalising open- Cypher graph queries in relational algebra. InEuropean Conference on Advances in Databases and Information Systems. Springer, 182–196

work page 2017
[43]

Memgraph. 2024. Fastest, Most Affordable Graph Database. https://memgraph. com/ [Accessed: 2024]

work page 2024
[44]

Mendelzon and Peter T

Alberto O. Mendelzon and Peter T. Wood. 1995. Finding Regular Simple Paths in Graph Databases.SIAM J. Comput.24, 6 (1995), 1235–1258. arXiv:https://doi.org/10.1137/S009753979122370X doi:10.1137/ S009753979122370X

work page doi:10.1137/s009753979122370x 1995
[45]

Merkle Science. n.d.. Crypto Risk Scores: How They Help Prevent Financial Crime. https://www.merklescience.com/blog/crypto-risk-scores-how-they- help-prevent-financial-crime Accessed: 2026-03-26

work page 2026
[46]

Amine Mhedhbi, Amol Deshpande, and Semih Salihoğlu. 2024. Modern Tech- niques For Querying Graph-structured Databases.Found. Trends Databases14, 2 (Oct. 2024), 72–185. doi:10.1561/1900000090

work page doi:10.1561/1900000090 2024
[47]

Neo4j. 2024. Neo4j Graph Database and Analytics. https://neo4j.com/ [Accessed: 2024]

work page 2024
[48]

Van-Quyet Nguyen, Van-Hau Nguyen, Minh-Quy Nguyen, Quyet-Thang Huynh, and Kyungbaek Kim. 2021. Efficiently Estimating Joining Cost of Subqueries in Regular Path Queries.Electronics10, 9 (2021), 990

work page 2021
[49]

Van-Quyet Nguyen, Van-Hau Nguyen, Huy-The Vu, Minh-Quy Nguyen, Quyet- Thang Huynh, and Kyungbaek Kim. 2020. Accelerating Parallel Evaluation of Regular Path Queries on Large Graphs by Estimating Joining Cost of Subqueries. InThe 9th International Conference on Smart Media and Applications. 464–469

work page 2020
[50]

Anil Pacaci, Angela Bonifati, and M Tamer Özsu. 2020. Regular path query evalu- ation on streaming graphs. InProceedings of the 2020 ACM SIGMOD International Conference on Management of Data. 1415–1430

work page 2020
[51]

Mark Raasveldt and Hannes Mühleisen. 2019. Duckdb: an embeddable analytical database. InProceedings of the 2019 international conference on management of data. 1981–1984

work page 2019
[52]

Michael Sipser. 1996. Introduction to the Theory of Computation.ACM Sigact News27, 1 (1996), 27–29

work page 1996
[53]

Frank Tetzel, Romans Kasperovics, and Wolfgang Lehner. 2019. Graph traversals for regular path queries. InProceedings of the 2nd Joint International Workshop on Graph Data Management Experiences & Systems (GRADES) and Network Data Analytics (NDA). 1–8

work page 2019
[54]

Domagoj Vrgoč. 2015. Using variable automata for querying data graphs.Inform. Process. Lett.115, 3 (2015), 425–430

work page 2015
[55]

Domagoj Vrgoč, Carlos Rojas, Renzo Angles, Marcelo Arenas, Diego Arroyuelo, Carlos Buil-Aranda, Aidan Hogan, Gonzalo Navarro, Cristian Riveros, and Juan Romero. 2023. MillenniumDB: An Open-Source Graph Database System.Data Intelligence5, 3 (08 2023), 560–610. arXiv:https://direct.mit.edu/dint/article- pdf/5/3/560/2158194/dint_a_00229.pdf doi:10.1162/dint_a_00229

work page doi:10.1162/dint_a_00229 2023
[56]

Sarisht Wadhwa, Anagh Prasad, Sayan Ranu, Amitabha Bagchi, and Srikanta Bedathur. 2019. Efficiently Answering Regular Simple Path Queries on Large Labeled Networks. InProceedings of the 2019 International Conference on Man- agement of Data(Amsterdam, Netherlands)(SIGMOD ’19). Association for Com- puting Machinery, New York, NY, USA, 1463–1480. doi:10.1145...

work page doi:10.1145/3299869.3319882 2019
[57]

Daniel ten Wolde, Gábor Szárnyas, and Peter Boncz. 2023. Duckpgq: Bringing sql/pgq´ to duckdb.Proceedings of the VLDB Endowment16, 12 (2023), 4034–4037

work page 2023
[58]

Nikolay Yakovets, Parke Godfrey, and Jarek Gryz. 2013. Evaluation of SPARQL Property Paths via Recursive SQL.AMW1087 (2013)

work page 2013
[59]

Nikolay Yakovets, Parke Godfrey, and Jarek Gryz. 2015. WAVEGUIDE: Evaluating SPARQL Property Path Queries.. InEDBT, Vol. 2015. 525–528

work page 2015
[60]

Nikolay Yakovets, Parke Godfrey, and Jarek Gryz. 2016. Query planning for evalu- ating SPARQL property paths. InProceedings of the 2016 International Conference on Management of Data. 1875–1889

work page 2016
[61]

Guangyi Yang, Xiaoxing Liu, and Beixin Li. 2023. Anti-money laundering super- vision by intelligent algorithm.Computers & Security132 (2023), 103344

work page 2023

[1] [1]

[n. d.]. SQL/PGQ. https://duckpgq.org/documentation/sql_pgq/. [Accessed 1-Nov-2025]

work page 2025

[2] [2]

Renzo Angles, Marcelo Arenas, Pablo Barceló, Aidan Hogan, Juan Reutter, and Do- magoj Vrgoč. 2017. Foundations of modern query languages for graph databases. ACM Computing Surveys (CSUR)50, 5 (2017), 1–40

work page 2017

[3] [3]

Diego Arroyuelo, Aidan Hogan, Gonzalo Navarro, and Javiel Rojas-Ledesma. 2022. Time-and space-efficient regular path queries. In2022 IEEE 38th International Conference on Data Engineering (ICDE). IEEE, 3091–3105

work page 2022

[4] [4]

Reutter, and Domagoj Vrgoč

Jorge Baier, Dietrich Daroch, Juan L. Reutter, and Domagoj Vrgoč. 2017. Evalu- ating Navigational RDF Queries over the Web. InProceedings of the 28th ACM Conference on Hypertext and Social Media(Prague, Czech Republic). Association for Computing Machinery, New York, NY, USA, 165–174. doi:10.1145/3078714. 3078731

work page doi:10.1145/3078714 2017

[5] [5]

Pablo Barceló Baeza. 2013. Querying graph databases. InProceedings of the 32nd ACM SIGMOD-SIGACT-SIGAI symposium on Principles of database systems. 175–188

work page 2013

[6] [6]

Georgiy Belyanin and Semyon Grigoriev. 2024. Single-Source Regular Path Querying in Terms of Linear Algebra.arXiv preprint arXiv:2412.10287(2024)

work page arXiv 2024

[7] [7]

Lars Brenna, Alan Demers, Johannes Gehrke, Mingsheng Hong, Joel Ossher, Biswanath Panda, Mirek Riedewald, Mohit Thatte, and Walker White. 2007. Cayuga: a high-performance event processing engine. InProceedings of the 2007 ACM SIGMOD international conference on Management of data (SIGMOD ’07). Association for Computing Machinery, New York, NY, USA, 1100–11...

work page doi:10.1145/1247480.1247620 2007

[8] [8]

Marco Bucchi, Alejandro Grez, Andrés Quintana, Cristian Riveros, and Stijn Vansummeren. 2021. CORE: a complex event recognition engine.arXiv preprint arXiv:2111.04635(2021)

work page arXiv 2021

[9] [9]

Chainalysis Team. 2023. Case Study: How We Track Crypto Money Laundering for Off-Chain Crime. https://www.chainalysis.com/blog/japan-cryptocurrency- money-laundering-case-study/ Accessed: 2026-03-26

work page 2023

[10] [10]

Sarah Chlyah, Pierre Genevès, and Nabil Layaïda. 2025. Distributed Evaluation of Graph Queries using Recursive Relational Algebra. In2025 IEEE 41st International Conference on Data Engineering (ICDE). IEEE, 2351–2365

work page 2025

[11] [11]

Faizan Iftikhar Janjua. 2023. Metaverse Financial Transactions Dataset. https://www.kaggle.com/datasets/faizaniftikharjanjua/metaverse-financial- transactions-dataset. Kaggle dataset. Accessed: 2026-03-26

work page 2023

[12] [12]

Benjamín Farías, Wim Martens, Carlos Rojas, and Domagoj Vrgoč. 2023. Pathfinder: A unified approach for handling paths in graph query languages. arXiv preprint arXiv:2306.02194(2023)

work page arXiv 2023

[13] [13]

Amela Fejza, Pierre Genevès, Nabil Layaïda, and Sarah Chlyah. 2023. The 𝜇-ra system for recursive path queries over graphs. InProceedings of the 32nd ACM International Conference on Information and Knowledge Management. 5041–5045

work page 2023

[14] [14]

Xiyang Feng, Guodong Jin, Ziyi Chen, Chang Liu, and Semih Salihoğlu. 2023. Kùzu graph database management system. InThe Conference on Innovative Data Systems Research, Vol. 7. 25–35

work page 2023

[15] [15]

Diego Figueira, Artur Jez, and Anthony W Lin. 2022. Data path queries over embedded graph databases. InProceedings of the 41st ACM SIGMOD-SIGACT- SIGAI Symposium on Principles of Database Systems. 189–201

work page 2022

[16] [16]

Diego Figueira and Leonid Libkin. 2015. Path logics for querying graphs: Com- bining expressiveness and efficiency. In2015 30th Annual ACM/IEEE Symposium on Logic in Computer Science. IEEE, 329–340

work page 2015

[17] [17]

Nadime Francis, Amélie Gheerbrant, Paolo Guagliardo, Leonid Libkin, Victor Marsault, Wim Martens, Filip Murlak, Liat Peterfreund, Alexandra Rogova, and Diego Rivera Correa and Mirek Riedewald Domagoj Vrgoc. 2023. A Researcher’s Digest of GQL. InThe 26th International Conference on Database Theory, 2023. Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 1–1

work page 2023

[18] [18]

Roberto García, Renzo Angles, Vicente Rojas, and Sebastián Ferrada. 2025. PathDB: A system for evaluating regular path queries.arXiv preprint arXiv:2507.01755(2025)

work page arXiv 2025

[19] [19]

Amélie Gheerbrant, Leonid Libkin, Liat Peterfreund, and Alexandra Rogova

work page

[20] [20]

Gql and sql/pgq: Theoretical models and expressive power.arXiv preprint arXiv:2409.01102(2024)

work page arXiv 2024

[21] [21]

Amélie Gheerbrant, Leonid Libkin, and Alexandra Rogova. 2025. Dangers of List Processing in Querying Property Graphs.Proceedings of the ACM on Management of Data3, 3 (2025), 1–25

work page 2025

[22] [22]

Daniel Gyllstrom, Eugene Wu, Hee-Jin Chae, Yanlei Diao, Patrick Stahlberg, and Gordon Anderson. 2006. SASE: Complex event processing over streams.arXiv preprint cs/0612128(2006)

work page internal anchor Pith review Pith/arXiv arXiv 2006

[23] [23]

Mohamed Hassan, Tatiana Kuznetsova, Hyun Chai Jeong, Walid Aref, and Mohammad Sadoghi. 2018. Extending In-Memory Relational Database En- gines with Native Graph Support. InProceedings of the 21st International Conference on Extending Database Technology (EDBT). OpenProceedings.org. doi:10.5441/002/EDBT.2018.04

work page doi:10.5441/002/edbt.2018.04 2018

[24] [24]

Alexandru Iosup, Tim Hegeman, Wing Lung Ngai, Stijn Heldens, Arnau Prat- Pérez, Thomas Manhardto, Hassan Chafio, Mihai Capotă, Narayanan Sundaram, et al. 2016. LDBC Graphalytics: A benchmark for large-scale graph analysis on parallel and distributed platforms.Proceedings of the VLDB Endowment9, 13 (2016), 1317–1328

work page 2016

[25] [25]

Guodong Jin and Semih Salihoglu. 2021. Making RDBMSs efficient on graph workloads through predefined joins.arXiv preprint arXiv:2108.10540(2021)

work page arXiv 2021

[26] [26]

Michael Kaminski and Nissim Francez. 1994. Finite-memory automata.Theoreti- cal Computer Science134, 2 (1994), 329–363

work page 1994

[27] [27]

André Koschmieder and Ulf Leser. 2012. Regular path queries on large graphs. InProceedings of the 24th International Conference on Scientific and Statistical Database Management(Chania, Crete, Greece)(SSDBM’12). Springer-Verlag, Berlin, Heidelberg, 177–194. doi:10.1007/978-3-642-31235-9_12

work page doi:10.1007/978-3-642-31235-9_12 2012

[28] [28]

Srijan Kumar, Bryan Hooi, Disha Makhija, Mohit Kumar, Christos Faloutsos, and VS Subrahmanian. 2018. Rev2: Fraudulent user prediction in rating platforms. InProceedings of the Eleventh ACM International Conference on Web Search and Data Mining. ACM, 333–341

work page 2018

[29] [29]

Srijan Kumar, Francesca Spezzano, VS Subrahmanian, and Christos Faloutsos

work page

[30] [30]

InData Mining (ICDM), 2016 IEEE 16th International Conference on

Edge weight prediction in weighted signed networks. InData Mining (ICDM), 2016 IEEE 16th International Conference on. IEEE, 221–230

work page 2016

[31] [31]

Srijan Kumar, Xikun Zhang, and Jure Leskovec. 2019. Predicting dynamic em- bedding trajectory in temporal interaction networks. InProceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM, 1269–1278

work page 2019

[32] [32]

Jure Leskovec. [n. d.]. Stanford Large Network Dataset Collection. https: //snap.stanford.edu/data/ [Accessed: 2024]

work page 2024

[33] [33]

Qi Liang, Dian Ouyang, Fan Zhang, Jianye Yang, Xuemin Lin, and Zhihong Tian. 2024. Efficient Regular Simple Path Queries under Transitive Restricted Expressions.Proc. VLDB Endow.17, 7 (March 2024), 1710–1722. doi:10.14778/ 3654621.3654636

work page arXiv 2024

[34] [34]

Leonid Libkin, Wim Martens, Filip Murlak, Liat Peterfreund, and Domagoj Vrgoč

work page

[35] [35]

InCompanion of the 44th Symposium on Principles of Database Systems

Querying Graph Data: Where We Are and Where To Go. InCompanion of the 44th Symposium on Principles of Database Systems. 9–26

work page

[36] [36]

Leonid Libkin, Wim Martens, and Domagoj Vrgoč. 2016. Querying graphs with data.Journal of the ACM (JACM)63, 2 (2016), 1–53

work page 2016

[37] [37]

Leonid Libkin and Domagoj Vrgoč. 2012. Regular path queries on graphs with data. InProceedings of the 15th International Conference on Database Theory (Berlin, Germany)(ICDT ’12). Association for Computing Machinery, New York, NY, USA, 74–85. doi:10.1145/2274576.2274585

work page doi:10.1145/2274576.2274585 2012

[38] [38]

Chunbin Lin, Benjamin Mandel, Yannis Papakonstantinou, and Matthias Springer

work page

[39] [39]

VLDB Endow.10, 3 (Nov

Fast in-memory SQL analytics on typed graphs.Proc. VLDB Endow.10, 3 (Nov. 2016), 265–276. doi:10.14778/3021924.3021941

work page doi:10.14778/3021924.3021941 2016

[40] [40]

Katja Losemann and Wim Martens. 2013. The complexity of regular expressions and property paths in SPARQL.ACM Transactions on Database Systems (TODS) 38, 4 (2013), 1–39

work page 2013

[41] [41]

Wim Martens, Matthias Niewerth, Tina Popp, Carlos Rojas, Stijn Vansummeren, and Domagoj Vrgoč. 2023. Representing Paths in Graph Database Pattern Match- ing.Proc. VLDB Endow.16, 7 (March 2023), 1790–1803. doi:10.14778/3587136. 3587151

work page doi:10.14778/3587136 2023

[42] [42]

József Marton, Gábor Szárnyas, and Dániel Varró. 2017. Formalising open- Cypher graph queries in relational algebra. InEuropean Conference on Advances in Databases and Information Systems. Springer, 182–196

work page 2017

[43] [43]

Memgraph. 2024. Fastest, Most Affordable Graph Database. https://memgraph. com/ [Accessed: 2024]

work page 2024

[44] [44]

Mendelzon and Peter T

Alberto O. Mendelzon and Peter T. Wood. 1995. Finding Regular Simple Paths in Graph Databases.SIAM J. Comput.24, 6 (1995), 1235–1258. arXiv:https://doi.org/10.1137/S009753979122370X doi:10.1137/ S009753979122370X

work page doi:10.1137/s009753979122370x 1995

[45] [45]

Merkle Science. n.d.. Crypto Risk Scores: How They Help Prevent Financial Crime. https://www.merklescience.com/blog/crypto-risk-scores-how-they- help-prevent-financial-crime Accessed: 2026-03-26

work page 2026

[46] [46]

Amine Mhedhbi, Amol Deshpande, and Semih Salihoğlu. 2024. Modern Tech- niques For Querying Graph-structured Databases.Found. Trends Databases14, 2 (Oct. 2024), 72–185. doi:10.1561/1900000090

work page doi:10.1561/1900000090 2024

[47] [47]

Neo4j. 2024. Neo4j Graph Database and Analytics. https://neo4j.com/ [Accessed: 2024]

work page 2024

[48] [48]

Van-Quyet Nguyen, Van-Hau Nguyen, Minh-Quy Nguyen, Quyet-Thang Huynh, and Kyungbaek Kim. 2021. Efficiently Estimating Joining Cost of Subqueries in Regular Path Queries.Electronics10, 9 (2021), 990

work page 2021

[49] [49]

Van-Quyet Nguyen, Van-Hau Nguyen, Huy-The Vu, Minh-Quy Nguyen, Quyet- Thang Huynh, and Kyungbaek Kim. 2020. Accelerating Parallel Evaluation of Regular Path Queries on Large Graphs by Estimating Joining Cost of Subqueries. InThe 9th International Conference on Smart Media and Applications. 464–469

work page 2020

[50] [50]

Anil Pacaci, Angela Bonifati, and M Tamer Özsu. 2020. Regular path query evalu- ation on streaming graphs. InProceedings of the 2020 ACM SIGMOD International Conference on Management of Data. 1415–1430

work page 2020

[51] [51]

Mark Raasveldt and Hannes Mühleisen. 2019. Duckdb: an embeddable analytical database. InProceedings of the 2019 international conference on management of data. 1981–1984

work page 2019

[52] [52]

Michael Sipser. 1996. Introduction to the Theory of Computation.ACM Sigact News27, 1 (1996), 27–29

work page 1996

[53] [53]

Frank Tetzel, Romans Kasperovics, and Wolfgang Lehner. 2019. Graph traversals for regular path queries. InProceedings of the 2nd Joint International Workshop on Graph Data Management Experiences & Systems (GRADES) and Network Data Analytics (NDA). 1–8

work page 2019

[54] [54]

Domagoj Vrgoč. 2015. Using variable automata for querying data graphs.Inform. Process. Lett.115, 3 (2015), 425–430

work page 2015

[55] [55]

Domagoj Vrgoč, Carlos Rojas, Renzo Angles, Marcelo Arenas, Diego Arroyuelo, Carlos Buil-Aranda, Aidan Hogan, Gonzalo Navarro, Cristian Riveros, and Juan Romero. 2023. MillenniumDB: An Open-Source Graph Database System.Data Intelligence5, 3 (08 2023), 560–610. arXiv:https://direct.mit.edu/dint/article- pdf/5/3/560/2158194/dint_a_00229.pdf doi:10.1162/dint_a_00229

work page doi:10.1162/dint_a_00229 2023

[56] [56]

Sarisht Wadhwa, Anagh Prasad, Sayan Ranu, Amitabha Bagchi, and Srikanta Bedathur. 2019. Efficiently Answering Regular Simple Path Queries on Large Labeled Networks. InProceedings of the 2019 International Conference on Man- agement of Data(Amsterdam, Netherlands)(SIGMOD ’19). Association for Com- puting Machinery, New York, NY, USA, 1463–1480. doi:10.1145...

work page doi:10.1145/3299869.3319882 2019

[57] [57]

Daniel ten Wolde, Gábor Szárnyas, and Peter Boncz. 2023. Duckpgq: Bringing sql/pgq´ to duckdb.Proceedings of the VLDB Endowment16, 12 (2023), 4034–4037

work page 2023

[58] [58]

Nikolay Yakovets, Parke Godfrey, and Jarek Gryz. 2013. Evaluation of SPARQL Property Paths via Recursive SQL.AMW1087 (2013)

work page 2013

[59] [59]

Nikolay Yakovets, Parke Godfrey, and Jarek Gryz. 2015. WAVEGUIDE: Evaluating SPARQL Property Path Queries.. InEDBT, Vol. 2015. 525–528

work page 2015

[60] [60]

Nikolay Yakovets, Parke Godfrey, and Jarek Gryz. 2016. Query planning for evalu- ating SPARQL property paths. InProceedings of the 2016 International Conference on Management of Data. 1875–1889

work page 2016

[61] [61]

Guangyi Yang, Xiaoxing Liu, and Beixin Li. 2023. Anti-money laundering super- vision by intelligent algorithm.Computers & Security132 (2023), 103344

work page 2023