gMatch: Fine-Grained and Hardware-Efficient Subgraph Matching on GPUs

Cheng Chen; Minyi Guo; Shixuan Sun; Weitian Chen; Yingqian Hu; Yongmin Hu

arxiv: 2604.10601 · v1 · submitted 2026-04-12 · 💻 cs.DB

gMatch: Fine-Grained and Hardware-Efficient Subgraph Matching on GPUs

Weitian Chen , Shixuan Sun , Cheng Chen , Yongmin Hu , Yingqian Hu , Minyi Guo This is my paper

Pith reviewed 2026-05-10 16:07 UTC · model grok-4.3

classification 💻 cs.DB

keywords subgraph matchingGPU accelerationfine-grained executiongraph analyticsload balancingscalabilityperformance optimizationwarp-level processing

0 comments

The pith

A fine-grained GPU execution model for subgraph matching outperforms prior methods and scales to larger queries and datasets.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Subgraph matching locates every copy of a small pattern inside a large data graph and underpins applications from social network analysis to bioinformatics. Prior GPU methods rely on coarse-grained task assignment, which creates high memory demands and leaves many threads idle on irregular workloads. gMatch replaces the coarse model with a fine-grained one that breaks work into smaller units for individual threads, lowering memory use and permitting flexible scheduling. It adds warp-level batch exploration to process thread groups together and lightweight load balancing to keep utilization high. Experiments demonstrate faster runtimes and the ability to finish on query and data sizes where earlier systems either slow down sharply or fail outright.

Core claim

gMatch achieves subgraph matching on GPUs through a fine-grained execution model that reduces memory consumption and supports flexible task scheduling among threads, combined with warp-level batch exploration and lightweight load balancing, outperforming prior methods in both performance and scalability on diverse workloads and real-world datasets.

What carries the argument

The fine-grained execution model that assigns smaller tasks to individual threads to cut memory overhead and enable dynamic scheduling, supported by warp-level batch exploration and lightweight load balancing.

If this is right

Subgraph matching becomes practical for substantially larger real-world graphs in social networks and bioinformatics.
GPU thread utilization rises and memory pressure drops, shortening runtimes across varied workloads.
Mining systems limited to small patterns can be extended without the same scalability collapse.
Coarse-grained GPU designs are shown to hit inherent limits from memory waste and idle threads.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same fine-grained scheduling ideas could accelerate other irregular graph algorithms on GPUs.
Hardware vendors might add direct support for warp-level batching to improve graph processing.
Combining gMatch with distributed multi-GPU setups could push matching to even larger scales.
Gains may depend on graph density, suggesting further tuning for sparse versus dense cases.

Load-bearing premise

The fine-grained model, warp-level batching, and load balancing will deliver consistent speed and correctness gains without hidden overheads or trade-offs on every graph structure, query size, and GPU hardware.

What would settle it

A run on a query and dataset larger than those handled by STMatch or T-DFS where gMatch either fails to finish, returns incorrect matches, or shows no speedup relative to the best prior method.

Figures

Figures reproduced from arXiv: 2604.10601 by Cheng Chen, Minyi Guo, Shixuan Sun, Weitian Chen, Yingqian Hu, Yongmin Hu.

**Figure 1.** Figure 1: Example query graph and data graph. applications, including social network recommendation [15], fraud detection [21], bioinformatics [6, 46], and cheminformatics [39]. Given the importance of subgraph matching, numerous algorithms have been proposed [5, 16, 33]. Although they adopt different optimization techniques, most follow the same core idea: iteratively extend partial matches by mapping query vertic… view at source ↗

**Figure 2.** Figure 2: The search tree of coarse-grained parallel execution [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗

**Figure 3.** Figure 3: Coarse-grained parallel execution on the example [PITH_FULL_IMAGE:figures/full_fig_p004_3.png] view at source ↗

**Figure 4.** Figure 4: An overview of gMatch [PITH_FULL_IMAGE:figures/full_fig_p005_4.png] view at source ↗

**Figure 7.** Figure 7: Warp-level batch exploration. and complex queries, advanced filtering techniques [34, 45] further reduce the number of candidates. As a result, the local candidate set of a partial match is often small, and its task group cannot fully utilize all threads in a warp, leading to underutilized GPU resources. Experimental results in Section 2.3 confirm the severity of this issue. To address this, we propose war… view at source ↗

**Figure 5.** Figure 5: The search tree of fine-grained parallel execution [PITH_FULL_IMAGE:figures/full_fig_p006_5.png] view at source ↗

**Figure 6.** Figure 6: To complete the tasks at level 2, the warp simultane [PITH_FULL_IMAGE:figures/full_fig_p006_6.png] view at source ↗

**Figure 8.** Figure 8: Query graphs. Query Sets. We evaluate the performance of subgraph matching algorithms across different query graph sizes. For small query graphs 1https://github.com/HPC-Research-Lab/STMatch (commit: f98462a) 2https://github.com/RapidsAtHKUST/EGSM (commit: de854d5) 3https://github.com/lyuheng/tdfs (commit: 05ef1b1) 4https://github.com/Nagi-Research-Group/BEEP (commit: 3eca170) 5https://github.com/chenxuhao… view at source ↗

**Figure 9.** Figure 9: Comparison of search time between EGSM and gMatch (millisecond). [PITH_FULL_IMAGE:figures/full_fig_p010_9.png] view at source ↗

**Figure 11.** Figure 11: Speedup comparison vs. naive coarse-grained parallel execution [PITH_FULL_IMAGE:figures/full_fig_p011_11.png] view at source ↗

**Figure 12.** Figure 12: Dataset of user payment relationships [PITH_FULL_IMAGE:figures/full_fig_p011_12.png] view at source ↗

**Figure 13.** Figure 13: Comparison of search times for the methods under [PITH_FULL_IMAGE:figures/full_fig_p012_13.png] view at source ↗

**Figure 14.** Figure 14: a, the query time is generally longer with smaller query graph density. This is because sparse query graphs usually lead to larger intermediate results and are more difficult to solve. Analysis of Query Graph Size. To evaluate the impact of query graph size, we generate 100 random query graphs per dataset at each of three sizes: 8, 12 and 16 vertices. The average query time of solved queries is shown in F… view at source ↗

**Figure 16.** Figure 16: Search time under different initial task sizes. Each [PITH_FULL_IMAGE:figures/full_fig_p016_16.png] view at source ↗

**Figure 15.** Figure 15: Search time with varying data graph sizes. B EVALUATIONS OF LOAD BALANCING We evaluate the impact of the initialization phase and work stealing in our load-balancing strategy, respectively. Evaluation of Initialization Phase. To evaluate the impact of the initial task pool size 𝜏, we conduct experiments on the en and gw datasets using multiple 𝜏 values (105 , 106 , 107 ). Values below 105 cause significan… view at source ↗

read the original abstract

Subgraph matching is a core operation in graph analytics, supporting a broad spectrum of applications from social network analysis to bioinformatics. Recent GPU-based approaches accelerate subgraph matching by leveraging parallelism but rely on a coarse-grained execution model that suffers from scalability and efficiency issues due to high memory overhead and thread underutilization. In this paper, we propose gMatch, a hardware-efficient subgraph matching approach on GPUs. gMatch introduces a fine-grained execution model that reduces memory consumption and enables flexible task scheduling among threads. We further design warp-level batch exploration and lightweight load balancing to improve execution efficiency and scalability. Experiments on diverse workloads and real-world datasets show that gMatch outperforms state-of-the-art subgraph matching methods, including STMatch, T-DFS, and EGSM, in both performance and scalability. We also compare against state-of-the-art systems for mining small patterns, such as BEEP and G$^2$Miner. While these systems achieve better performance on small datasets, gMatch scales to substantially larger queries and datasets, where existing approaches degrade or fail to complete.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

gMatch swaps coarse-grained GPU subgraph matching for a fine-grained model with warp batching and light balancing, delivering claimed scalability wins over STMatch and similar systems.

read the letter

The main point is that gMatch replaces the coarse-grained task model common in prior GPU subgraph matchers with a finer-grained one that lowers memory use and allows more flexible thread scheduling. It layers on warp-level batch exploration and lightweight load balancing to cut underutilization. That specific combination is presented as new relative to STMatch, T-DFS, and EGSM, and the paper shows it lets the system handle larger queries and datasets where those baselines degrade or fail.

Referee Report

0 major / 3 minor

Summary. The paper introduces gMatch, a GPU-based subgraph matching system that replaces coarse-grained execution with a fine-grained model to lower memory overhead and improve thread utilization. It adds warp-level batch exploration for task scheduling and lightweight load balancing for efficiency. Experiments on real-world datasets and diverse workloads claim that gMatch outperforms STMatch, T-DFS, EGSM in speed and scalability, and scales to larger queries than BEEP and G²Miner where the baselines fail.

Significance. If the claimed speedups and scaling limits hold under controlled conditions, gMatch would provide a practical advance for GPU graph analytics by directly mitigating memory bloat and underutilization that limit prior coarse-grained methods. The design choices (fine-grained tasks, warp batches, lightweight balancing) are portable to other irregular graph workloads and could serve as a template for hardware-efficient implementations.

minor comments (3)

The abstract states performance and scalability wins but supplies no quantitative results, error bars, dataset sizes, or experimental controls. Add a sentence with representative speedups and the largest graph/query sizes tested.
Section 4 (or wherever the experimental setup appears) should explicitly list the GPU model, memory capacity, and whether results are averaged over multiple runs with standard deviation; this is needed to assess reproducibility of the scalability claims.
Figure captions and axis labels for the scaling plots should state the exact query sizes and graph edge counts at which baselines time out, so readers can judge the 'substantially larger' claim without consulting the text.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment of gMatch and the recommendation for minor revision. The review accurately captures the key ideas of replacing coarse-grained execution with fine-grained task scheduling, warp-level batch exploration, and lightweight load balancing to address memory overhead and thread utilization in GPU subgraph matching.

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper is a systems/engineering contribution that proposes a fine-grained GPU execution model for subgraph matching, along with warp-level batch exploration and lightweight load balancing. Its claims are supported by direct experimental comparisons against independent prior systems (STMatch, T-DFS, EGSM, BEEP, G²Miner) on real-world datasets and workloads. No mathematical derivations, equations, fitted parameters presented as predictions, or self-citation chains appear in the abstract or described approach. The central results do not reduce to inputs by construction; they are externally validated through implementation and benchmarking, making the work self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is an applied systems paper whose central claim rests on empirical performance measurements rather than mathematical axioms or new theoretical entities. No free parameters, axioms, or invented entities are described in the abstract.

pith-pipeline@v0.9.0 · 5495 in / 1126 out tokens · 68008 ms · 2026-05-10T16:07:52.905482+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

49 extracted references · 49 canonical work pages

[1]

Mohammad Almasri, Izzat El Hajj, Rakesh Nagi, Jinjun Xiong, and Wen-mei Hwu. 2022. Parallel K-clique counting on GPUs. InProceedings of the 36th ACM International Conference on Supercomputing(Virtual Event)(ICS ’22). Association for Computing Machinery, New York, NY, USA, Article 21, 14 pages

work page 2022
[2]

Mohammad Almasri, Neo Vasudeva, Rakesh Nagi, Jinjun Xiong, and Wen-Mei Hwu. 2021. HyKernel: A Hybrid Selection of One/Two-Phase Kernels for Triangle Counting on GPUs. In2021 IEEE High Performance Extreme Computing Conference (HPEC). 1–7

work page 2021
[3]

Junya Arai, Yasuhiro Fujiwara, and Makoto Onizuka. 2023. GuP: Fast Subgraph Matching by Guard-based Pruning.Proc. ACM Manag. Data1, 2, Article 167 (June 2023), 26 pages

work page 2023
[4]

Howie Huang

Bibek Bhattarai, Hang Liu, and H. Howie Huang. 2019. CECI: Compact Em- bedding Cluster Index for Scalable Subgraph Matching. InProceedings of the 2019 International Conference on Management of Data(Amsterdam, Nether- lands)(SIGMOD ’19). Association for Computing Machinery, New York, NY, USA, 1447–1462

work page 2019
[5]

Fei Bi, Lijun Chang, Xuemin Lin, Lu Qin, and Wenjie Zhang. 2016. Efficient Subgraph Matching by Postponing Cartesian Products. InProceedings of the 2016 International Conference on Management of Data, SIGMOD Conference 2016, San Francisco, CA, USA, June 26 - July 01, 2016, Fatma Özcan, Georgia Koutrika, and Sam Madden (Eds.). ACM, 1199–1214

work page 2016
[6]

Vincenzo Bonnici, Rosalba Giugno, Alfredo Pulvirenti, Dennis Shasha, and Al- fredo Ferro. 2013. A subgraph isomorphism algorithm and its application to biochemical data.BMC bioinformatics14 (2013), 1–13

work page 2013
[7]

Hongzhi Chen, Miao Liu, Yunjian Zhao, Xiao Yan, Da Yan, and James Cheng

work page
[8]

InProceedings of the Thirteenth EuroSys Conference(Porto, Portugal)(EuroSys ’18)

G-Miner: an efficient task-oriented graph mining system. InProceedings of the Thirteenth EuroSys Conference(Porto, Portugal)(EuroSys ’18). Association for Computing Machinery, New York, NY, USA, Article 32, 12 pages

work page
[9]

Xuhao Chen and Arvind. 2022. Efficient and Scalable Graph Pattern Mining on GPUs. In16th USENIX Symposium on Operating Systems Design and Implementa- tion (OSDI 22). USENIX Association, Carlsbad, CA, 857–877

work page 2022
[10]

Xuhao Chen, Roshan Dathathri, Gurbinder Gill, and Keshav Pingali. 2020. Pan- golin: an efficient and flexible graph mining system on CPU and GPU.Proc. VLDB Endow.13, 8 (April 2020), 1190–1205

work page 2020
[11]

Vinicius Dias, Carlos H. C. Teixeira, Dorgival Guedes, Wagner Meira, and Srini- vasan Parthasarathy. 2019. Fractal: A General-Purpose Graph Pattern Mining System. InProceedings of the 2019 International Conference on Management of Data (Amsterdam, Netherlands)(SIGMOD ’19). Association for Computing Machinery, New York, NY, USA, 1357–1374

work page 2019
[12]

Vibhor Dodeja, Mohammad Almasri, Rakesh Nagi, Jinjun Xiong, and Wen-mei Hwu. 2022. PARSEC: PARallel Subgraph Enumeration in CUDA. In2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS). 168–178

work page 2022
[13]

Orri Erling, Alex Averbuch, Josep-Lluis Larriba-Pey, Hassan Chafi, Andrey Gu- bichev, Arnau Prat-Pérez, Minh-Duc Pham, and Peter A. Boncz. 2015. The LDBC Social Network Benchmark: Interactive Workload. InSIGMOD. 619–630

work page 2015
[14]

Oded Green, Pavan Yalamanchili, and Lluís-Miquel Munguía. 2014. Fast Tri- angle Counting on the GPU. In2014 4th Workshop on Irregular Applications: Architectures and Algorithms (IA 3). 1–8

work page 2014
[15]

Wentian Guo, Yuchen Li, Mo Sha, Bingsheng He, Xiaokui Xiao, and Kian-Lee Tan. 2020. GPU-Accelerated Subgraph Enumeration on Partitioned Graphs. In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data(Portland, OR, USA)(SIGMOD ’20). Association for Computing Machinery, New York, NY, USA, 1067–1082

work page 2020
[16]

Pankaj Gupta, Venu Satuluri, Ajeet Grewal, Siva Gurumurthy, Volodymyr Zhabiuk, Quannan Li, and Jimmy Lin. 2014. Real-time twitter recommenda- tion: Online motif detection in large dynamic graphs.Proceedings of the VLDB Endowment7, 13 (2014), 1379–1380

work page 2014
[17]

Myoungji Han, Hyunjoon Kim, Geonmo Gu, Kunsoo Park, and Wook-Shin Han. 2019. Efficient Subgraph Matching: Harmonizing Dynamic Programming, Adaptive Matching Order, and Failing Set Together. InSIGMOD Conference. 1429–1446

work page 2019
[18]

Huahai He and Ambuj K. Singh. 2008. Graphs-at-a-time: query language and access methods for graph databases. InProceedings of the 2008 ACM SIGMOD International Conference on Management of Data(Vancouver, Canada)(SIGMOD ’08). Association for Computing Machinery, New York, NY, USA, 405–418

work page 2008
[19]

Howie Huang

Yang Hu, Hang Liu, and H. Howie Huang. 2018. High-Performance Triangle Counting on GPUs. In2018 IEEE High Performance extreme Computing Conference (HPEC). 1–5

work page 2018
[20]

Kasra Jamshidi, Rakesh Mahadasa, and Keval Vora. 2020. Peregrine: a pattern- aware graph mining system. InProceedings of the Fifteenth European Conference on Computer Systems(Heraklion, Greece)(EuroSys ’20). Association for Computing Machinery, New York, NY, USA, Article 13, 16 pages

work page 2020
[21]

Guanxian Jiang, Qihui Zhou, Tatiana Jin, Boyang Li, Yunjian Zhao, Yichao Li, and James Cheng. 2022. VSGM: view-based GPU-accelerated subgraph matching on large graphs. InProceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis(Dallas, Texas)(SC ’22). IEEE Press, Article 52, 15 pages

work page 2022
[22]

Jiawei Jiang, Yusong Hu, Xiaosen Li, Wen Ouyang, Zhitao Wang, Fangcheng Fu, and Bin Cui. 2022. Analyzing Online Transaction Networks with Network Motifs. InProceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining(Washington DC, USA)(KDD ’22). Association for Computing Machinery, New York, NY, USA, 3098–3106

work page 2022
[23]

Tatiana Jin, Boyang Li, Yichao Li, Qihui Zhou, Qianli Ma, Yunjian Zhao, Hongzhi Chen, and James Cheng. 2023. Circinus: Fast Redundancy-Reduced Subgraph Matching.Proc. ACM Manag. Data1, 1, Article 12 (May 2023), 26 pages

work page 2023
[24]

Samiran Kawtikwar, Mohammad Almasri, Wen-Mei Hwu, Rakesh Nagi, and Jinjun Xiong. 2023. Beep: Balanced efficient subgraph enumeration in parallel. In Proceedings of the 52nd International Conference on Parallel Processing. 142–152

work page 2023
[25]

Vinayak Kesarwani, Shivangi Gaur, Sudeep Ranjan Sahoo, Kishan Tamboli, and Vishwesh Jatala. 2024. A Partitioning Scheme for Large Scale Clique Counting on Single GPU. In2024 IEEE 31st International Conference on High Performance Computing, Data and Analytics Workshop (HiPCW). 167–168

work page 2024
[26]

Farzad Khorasani, Rajiv Gupta, and Laxmi N. Bhuyan. 2015. Scalable SIMD- Efficient Graph Processing on GPUs. InProceedings of the 24th International Conference on Parallel Architectures and Compilation Techniques (PACT ’15). 39– 50

work page 2015
[27]

Hyunjoon Kim, Yunyoung Choi, Kunsoo Park, Xuemin Lin, Seok-Hee Hong, and Wook-Shin Han. 2021. Versatile Equivalences: Speeding up Subgraph Query Processing and Subgraph Matching. InProceedings of the 2021 International Con- ference on Management of Data(Virtual Event, China)(SIGMOD ’21). Association for Computing Machinery, New York, NY, USA, 925–937

work page 2021
[28]

Bhowmick, Wook-Shin Han, JeongHoon Lee, Seongyun Ko, and Moath H.A

Hyeonji Kim, Juneyoung Lee, Sourav S. Bhowmick, Wook-Shin Han, JeongHoon Lee, Seongyun Ko, and Moath H.A. Jarrah. 2016. DUALSIM: Parallel Subgraph Enumeration in a Massive Graph on a Single Machine. InProceedings of the 2016 International Conference on Management of Data(San Francisco, California, USA)(SIGMOD ’16). Association for Computing Machinery, New...

work page 2016
[29]

Jure Leskovec and Andrej Krevl. 2014. SNAP Datasets: Stanford Large Network Dataset Collection. http://snap.stanford.edu/data. Accessed: April 11, 2026

work page 2014
[30]

Yujie Lu, Zhijie Zhang, and Weiguo Zheng. 2025. B S○X: Subgraph Matching with Batch Backtracking Search.Proc. ACM Manag. Data3, 1, Article 15 (Feb. 2025), 27 pages

work page 2025
[31]

Chenhao Ma, Reynold Cheng, Laks V. S. Lakshmanan, Tobias Grubenmann, Yixiang Fang, and Xiaodong Li. 2019. LINC: a motif counting algorithm for uncertain graphs.Proc. VLDB Endow.13, 2 (Oct. 2019), 155–168

work page 2019
[32]

Daniel Mawhirter and Bo Wu. 2019. AutoMine: harmonizing high-level abstrac- tion and high performance for graph mining. InProceedings of the 27th ACM Symposium on Operating Systems Principles(Huntsville, Ontario, Canada)(SOSP ’19). Association for Computing Machinery, New York, NY, USA, 509–523

work page 2019
[33]

Shixuan Sun and Qiong Luo. 2020. In-Memory Subgraph Matching: An In- depth Study. InProceedings of the 2020 ACM SIGMOD International Conference on Management of Data(Portland, OR, USA)(SIGMOD ’20). Association for Computing Machinery, New York, NY, USA, 1083–1098

work page 2020
[34]

Shixuan Sun, Xibo Sun, Yulin Che, Qiong Luo, and Bingsheng He. 2020. Rapid- Match: a holistic approach to subgraph query processing.Proc. VLDB Endow.14, 2 (Oct. 2020), 176–188

work page 2020
[35]

Xibo Sun and Qiong Luo. 2023. Efficient GPU-Accelerated Subgraph Matching. Proc. ACM Manag. Data1, 2, Article 181 (June 2023), 26 pages

work page 2023
[36]

Carlos H. C. Teixeira, Alexandre J. Fonseca, Marco Serafini, Georgos Siganos, Mohammed J. Zaki, and Ashraf Aboulnaga. 2015. Arabesque: a system for distributed graph mining. InProceedings of the 25th Symposium on Operating Systems Principles(Monterey, California)(SOSP ’15). Association for Computing Machinery, New York, NY, USA, 425–440

work page 2015
[37]

Ha-Nguyen Tran, Jung-jae Kim, and Bingsheng He. 2015. Fast Subgraph Matching on Large Graphs using Graphics Processors. InDatabase Systems for Advanced Ap- plications, Matthias Renz, Cyrus Shahabi, Xiaofang Zhou, and Muhammad Aamir Cheema (Eds.). Springer International Publishing, Cham, 299–315

work page 2015
[38]

J. R. Ullmann. 1976. An Algorithm for Subgraph Isomorphism.J. ACM23, 1 (Jan. 1976), 31–42

work page 1976
[39]

Yihua Wei and Peng Jiang. 2022. STMatch: accelerating graph pattern matching on GPU with stack-based loop optimizations. InProceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (Dallas, Texas)(SC ’22). IEEE Press, Article 53, 13 pages

work page 2022
[40]

1999.Matching of Chemical and Biological Structures Using Subgraph and Maximal Common Subgraph Isomorphism Algorithms

Peter Willett. 1999.Matching of Chemical and Biological Structures Using Subgraph and Maximal Common Subgraph Isomorphism Algorithms. Springer New York, New York, NY, 11–38

work page 1999
[41]

Lizhi Xiang, Arif Khan, Edoardo Serra, Mahantesh Halappanavar, and Aravind Sukumaran-Rajam. 2021. cuTS: scaling subgraph isomorphism on distributed multi-GPU systems using trie based data structure. InProceedings of the Inter- national Conference for High Performance Computing, Networking, Storage and Analysis(St. Louis, Missouri)(SC ’21). Association for...

work page 2021
[42]

Rongjian Yang, Zhijie Zhang, Weiguo Zheng, and Jeffrey Xu Yu. 2023. Fast Con- tinuous Subgraph Matching over Streaming Graphs via Backtracking Reduction. Proc. ACM Manag. Data1, 1, Article 15 (May 2023), 26 pages

work page 2023
[43]

Lyuheng Yuan, Akhlaque Ahmad, Da Yan, Jiao Han, Saugat Adhikari, Xiaodong Yu, and Yang Zhou. 2024. G2-AIMD: A Memory-Efficient Subgraph-Centric Framework for Efficient Subgraph Finding on GPUs. In2024 IEEE 40th Interna- tional Conference on Data Engineering (ICDE). 3164–3177

work page 2024
[44]

Lyuheng Yuan, Da Yan, Jiao Han, Akhlaque Ahmad, Yang Zhou, and Zhe Jiang

work page
[45]

In2024 IEEE 40th Interna- tional Conference on Data Engineering (ICDE)

Faster Depth-First Subgraph Matching on GPUs. In2024 IEEE 40th Interna- tional Conference on Data Engineering (ICDE). 3151–3163

work page
[46]

Tamer Özsu

Li Zeng, Lei Zou, and M. Tamer Özsu. 2023. SGSI – A Scalable GPU-Friendly Subgraph Isomorphism Algorithm.IEEE Transactions on Knowledge and Data Engineering35, 11 (2023), 11899–11916

work page 2023
[47]

Tamer Özsu, Lin Hu, and Fan Zhang

Li Zeng, Lei Zou, M. Tamer Özsu, Lin Hu, and Fan Zhang. 2020. GSI: GPU- friendly Subgraph Isomorphism. In2020 IEEE 36th International Conference on Data Engineering (ICDE). 1249–1260

work page 2020
[48]

Shijie Zhang, Shirong Li, and Jiong Yang. 2009. GADDI: distance index based subgraph matching in biological networks. InProceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology (Saint Petersburg, Russia)(EDBT ’09). Association for Computing Machinery, New York, NY, USA, 192–203

work page 2009
[49]

Zhijie Zhang, Yujie Lu, Weiguo Zheng, and Xuemin Lin. 2024. A Comprehensive Survey and Experimental Study of Subgraph Matching: Trends, Unbiasedness, and Interaction.Proc. ACM Manag. Data2, 1, Article 60 (March 2024), 29 pages. A SCALABILITY EV ALUATIONS To evaluate the scalability of our algorithm, we conduct four ex- periments on two representative data...

work page 2024

[1] [1]

Mohammad Almasri, Izzat El Hajj, Rakesh Nagi, Jinjun Xiong, and Wen-mei Hwu. 2022. Parallel K-clique counting on GPUs. InProceedings of the 36th ACM International Conference on Supercomputing(Virtual Event)(ICS ’22). Association for Computing Machinery, New York, NY, USA, Article 21, 14 pages

work page 2022

[2] [2]

Mohammad Almasri, Neo Vasudeva, Rakesh Nagi, Jinjun Xiong, and Wen-Mei Hwu. 2021. HyKernel: A Hybrid Selection of One/Two-Phase Kernels for Triangle Counting on GPUs. In2021 IEEE High Performance Extreme Computing Conference (HPEC). 1–7

work page 2021

[3] [3]

Junya Arai, Yasuhiro Fujiwara, and Makoto Onizuka. 2023. GuP: Fast Subgraph Matching by Guard-based Pruning.Proc. ACM Manag. Data1, 2, Article 167 (June 2023), 26 pages

work page 2023

[4] [4]

Howie Huang

Bibek Bhattarai, Hang Liu, and H. Howie Huang. 2019. CECI: Compact Em- bedding Cluster Index for Scalable Subgraph Matching. InProceedings of the 2019 International Conference on Management of Data(Amsterdam, Nether- lands)(SIGMOD ’19). Association for Computing Machinery, New York, NY, USA, 1447–1462

work page 2019

[5] [5]

Fei Bi, Lijun Chang, Xuemin Lin, Lu Qin, and Wenjie Zhang. 2016. Efficient Subgraph Matching by Postponing Cartesian Products. InProceedings of the 2016 International Conference on Management of Data, SIGMOD Conference 2016, San Francisco, CA, USA, June 26 - July 01, 2016, Fatma Özcan, Georgia Koutrika, and Sam Madden (Eds.). ACM, 1199–1214

work page 2016

[6] [6]

Vincenzo Bonnici, Rosalba Giugno, Alfredo Pulvirenti, Dennis Shasha, and Al- fredo Ferro. 2013. A subgraph isomorphism algorithm and its application to biochemical data.BMC bioinformatics14 (2013), 1–13

work page 2013

[7] [7]

Hongzhi Chen, Miao Liu, Yunjian Zhao, Xiao Yan, Da Yan, and James Cheng

work page

[8] [8]

InProceedings of the Thirteenth EuroSys Conference(Porto, Portugal)(EuroSys ’18)

G-Miner: an efficient task-oriented graph mining system. InProceedings of the Thirteenth EuroSys Conference(Porto, Portugal)(EuroSys ’18). Association for Computing Machinery, New York, NY, USA, Article 32, 12 pages

work page

[9] [9]

Xuhao Chen and Arvind. 2022. Efficient and Scalable Graph Pattern Mining on GPUs. In16th USENIX Symposium on Operating Systems Design and Implementa- tion (OSDI 22). USENIX Association, Carlsbad, CA, 857–877

work page 2022

[10] [10]

Xuhao Chen, Roshan Dathathri, Gurbinder Gill, and Keshav Pingali. 2020. Pan- golin: an efficient and flexible graph mining system on CPU and GPU.Proc. VLDB Endow.13, 8 (April 2020), 1190–1205

work page 2020

[11] [11]

Vinicius Dias, Carlos H. C. Teixeira, Dorgival Guedes, Wagner Meira, and Srini- vasan Parthasarathy. 2019. Fractal: A General-Purpose Graph Pattern Mining System. InProceedings of the 2019 International Conference on Management of Data (Amsterdam, Netherlands)(SIGMOD ’19). Association for Computing Machinery, New York, NY, USA, 1357–1374

work page 2019

[12] [12]

Vibhor Dodeja, Mohammad Almasri, Rakesh Nagi, Jinjun Xiong, and Wen-mei Hwu. 2022. PARSEC: PARallel Subgraph Enumeration in CUDA. In2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS). 168–178

work page 2022

[13] [13]

Orri Erling, Alex Averbuch, Josep-Lluis Larriba-Pey, Hassan Chafi, Andrey Gu- bichev, Arnau Prat-Pérez, Minh-Duc Pham, and Peter A. Boncz. 2015. The LDBC Social Network Benchmark: Interactive Workload. InSIGMOD. 619–630

work page 2015

[14] [14]

Oded Green, Pavan Yalamanchili, and Lluís-Miquel Munguía. 2014. Fast Tri- angle Counting on the GPU. In2014 4th Workshop on Irregular Applications: Architectures and Algorithms (IA 3). 1–8

work page 2014

[15] [15]

Wentian Guo, Yuchen Li, Mo Sha, Bingsheng He, Xiaokui Xiao, and Kian-Lee Tan. 2020. GPU-Accelerated Subgraph Enumeration on Partitioned Graphs. In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data(Portland, OR, USA)(SIGMOD ’20). Association for Computing Machinery, New York, NY, USA, 1067–1082

work page 2020

[16] [16]

Pankaj Gupta, Venu Satuluri, Ajeet Grewal, Siva Gurumurthy, Volodymyr Zhabiuk, Quannan Li, and Jimmy Lin. 2014. Real-time twitter recommenda- tion: Online motif detection in large dynamic graphs.Proceedings of the VLDB Endowment7, 13 (2014), 1379–1380

work page 2014

[17] [17]

Myoungji Han, Hyunjoon Kim, Geonmo Gu, Kunsoo Park, and Wook-Shin Han. 2019. Efficient Subgraph Matching: Harmonizing Dynamic Programming, Adaptive Matching Order, and Failing Set Together. InSIGMOD Conference. 1429–1446

work page 2019

[18] [18]

Huahai He and Ambuj K. Singh. 2008. Graphs-at-a-time: query language and access methods for graph databases. InProceedings of the 2008 ACM SIGMOD International Conference on Management of Data(Vancouver, Canada)(SIGMOD ’08). Association for Computing Machinery, New York, NY, USA, 405–418

work page 2008

[19] [19]

Howie Huang

Yang Hu, Hang Liu, and H. Howie Huang. 2018. High-Performance Triangle Counting on GPUs. In2018 IEEE High Performance extreme Computing Conference (HPEC). 1–5

work page 2018

[20] [20]

Kasra Jamshidi, Rakesh Mahadasa, and Keval Vora. 2020. Peregrine: a pattern- aware graph mining system. InProceedings of the Fifteenth European Conference on Computer Systems(Heraklion, Greece)(EuroSys ’20). Association for Computing Machinery, New York, NY, USA, Article 13, 16 pages

work page 2020

[21] [21]

Guanxian Jiang, Qihui Zhou, Tatiana Jin, Boyang Li, Yunjian Zhao, Yichao Li, and James Cheng. 2022. VSGM: view-based GPU-accelerated subgraph matching on large graphs. InProceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis(Dallas, Texas)(SC ’22). IEEE Press, Article 52, 15 pages

work page 2022

[22] [22]

Jiawei Jiang, Yusong Hu, Xiaosen Li, Wen Ouyang, Zhitao Wang, Fangcheng Fu, and Bin Cui. 2022. Analyzing Online Transaction Networks with Network Motifs. InProceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining(Washington DC, USA)(KDD ’22). Association for Computing Machinery, New York, NY, USA, 3098–3106

work page 2022

[23] [23]

Tatiana Jin, Boyang Li, Yichao Li, Qihui Zhou, Qianli Ma, Yunjian Zhao, Hongzhi Chen, and James Cheng. 2023. Circinus: Fast Redundancy-Reduced Subgraph Matching.Proc. ACM Manag. Data1, 1, Article 12 (May 2023), 26 pages

work page 2023

[24] [24]

Samiran Kawtikwar, Mohammad Almasri, Wen-Mei Hwu, Rakesh Nagi, and Jinjun Xiong. 2023. Beep: Balanced efficient subgraph enumeration in parallel. In Proceedings of the 52nd International Conference on Parallel Processing. 142–152

work page 2023

[25] [25]

Vinayak Kesarwani, Shivangi Gaur, Sudeep Ranjan Sahoo, Kishan Tamboli, and Vishwesh Jatala. 2024. A Partitioning Scheme for Large Scale Clique Counting on Single GPU. In2024 IEEE 31st International Conference on High Performance Computing, Data and Analytics Workshop (HiPCW). 167–168

work page 2024

[26] [26]

Farzad Khorasani, Rajiv Gupta, and Laxmi N. Bhuyan. 2015. Scalable SIMD- Efficient Graph Processing on GPUs. InProceedings of the 24th International Conference on Parallel Architectures and Compilation Techniques (PACT ’15). 39– 50

work page 2015

[27] [27]

Hyunjoon Kim, Yunyoung Choi, Kunsoo Park, Xuemin Lin, Seok-Hee Hong, and Wook-Shin Han. 2021. Versatile Equivalences: Speeding up Subgraph Query Processing and Subgraph Matching. InProceedings of the 2021 International Con- ference on Management of Data(Virtual Event, China)(SIGMOD ’21). Association for Computing Machinery, New York, NY, USA, 925–937

work page 2021

[28] [28]

Bhowmick, Wook-Shin Han, JeongHoon Lee, Seongyun Ko, and Moath H.A

Hyeonji Kim, Juneyoung Lee, Sourav S. Bhowmick, Wook-Shin Han, JeongHoon Lee, Seongyun Ko, and Moath H.A. Jarrah. 2016. DUALSIM: Parallel Subgraph Enumeration in a Massive Graph on a Single Machine. InProceedings of the 2016 International Conference on Management of Data(San Francisco, California, USA)(SIGMOD ’16). Association for Computing Machinery, New...

work page 2016

[29] [29]

Jure Leskovec and Andrej Krevl. 2014. SNAP Datasets: Stanford Large Network Dataset Collection. http://snap.stanford.edu/data. Accessed: April 11, 2026

work page 2014

[30] [30]

Yujie Lu, Zhijie Zhang, and Weiguo Zheng. 2025. B S○X: Subgraph Matching with Batch Backtracking Search.Proc. ACM Manag. Data3, 1, Article 15 (Feb. 2025), 27 pages

work page 2025

[31] [31]

Chenhao Ma, Reynold Cheng, Laks V. S. Lakshmanan, Tobias Grubenmann, Yixiang Fang, and Xiaodong Li. 2019. LINC: a motif counting algorithm for uncertain graphs.Proc. VLDB Endow.13, 2 (Oct. 2019), 155–168

work page 2019

[32] [32]

Daniel Mawhirter and Bo Wu. 2019. AutoMine: harmonizing high-level abstrac- tion and high performance for graph mining. InProceedings of the 27th ACM Symposium on Operating Systems Principles(Huntsville, Ontario, Canada)(SOSP ’19). Association for Computing Machinery, New York, NY, USA, 509–523

work page 2019

[33] [33]

Shixuan Sun and Qiong Luo. 2020. In-Memory Subgraph Matching: An In- depth Study. InProceedings of the 2020 ACM SIGMOD International Conference on Management of Data(Portland, OR, USA)(SIGMOD ’20). Association for Computing Machinery, New York, NY, USA, 1083–1098

work page 2020

[34] [34]

Shixuan Sun, Xibo Sun, Yulin Che, Qiong Luo, and Bingsheng He. 2020. Rapid- Match: a holistic approach to subgraph query processing.Proc. VLDB Endow.14, 2 (Oct. 2020), 176–188

work page 2020

[35] [35]

Xibo Sun and Qiong Luo. 2023. Efficient GPU-Accelerated Subgraph Matching. Proc. ACM Manag. Data1, 2, Article 181 (June 2023), 26 pages

work page 2023

[36] [36]

Carlos H. C. Teixeira, Alexandre J. Fonseca, Marco Serafini, Georgos Siganos, Mohammed J. Zaki, and Ashraf Aboulnaga. 2015. Arabesque: a system for distributed graph mining. InProceedings of the 25th Symposium on Operating Systems Principles(Monterey, California)(SOSP ’15). Association for Computing Machinery, New York, NY, USA, 425–440

work page 2015

[37] [37]

Ha-Nguyen Tran, Jung-jae Kim, and Bingsheng He. 2015. Fast Subgraph Matching on Large Graphs using Graphics Processors. InDatabase Systems for Advanced Ap- plications, Matthias Renz, Cyrus Shahabi, Xiaofang Zhou, and Muhammad Aamir Cheema (Eds.). Springer International Publishing, Cham, 299–315

work page 2015

[38] [38]

J. R. Ullmann. 1976. An Algorithm for Subgraph Isomorphism.J. ACM23, 1 (Jan. 1976), 31–42

work page 1976

[39] [39]

Yihua Wei and Peng Jiang. 2022. STMatch: accelerating graph pattern matching on GPU with stack-based loop optimizations. InProceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (Dallas, Texas)(SC ’22). IEEE Press, Article 53, 13 pages

work page 2022

[40] [40]

1999.Matching of Chemical and Biological Structures Using Subgraph and Maximal Common Subgraph Isomorphism Algorithms

Peter Willett. 1999.Matching of Chemical and Biological Structures Using Subgraph and Maximal Common Subgraph Isomorphism Algorithms. Springer New York, New York, NY, 11–38

work page 1999

[41] [41]

Lizhi Xiang, Arif Khan, Edoardo Serra, Mahantesh Halappanavar, and Aravind Sukumaran-Rajam. 2021. cuTS: scaling subgraph isomorphism on distributed multi-GPU systems using trie based data structure. InProceedings of the Inter- national Conference for High Performance Computing, Networking, Storage and Analysis(St. Louis, Missouri)(SC ’21). Association for...

work page 2021

[42] [42]

Rongjian Yang, Zhijie Zhang, Weiguo Zheng, and Jeffrey Xu Yu. 2023. Fast Con- tinuous Subgraph Matching over Streaming Graphs via Backtracking Reduction. Proc. ACM Manag. Data1, 1, Article 15 (May 2023), 26 pages

work page 2023

[43] [43]

Lyuheng Yuan, Akhlaque Ahmad, Da Yan, Jiao Han, Saugat Adhikari, Xiaodong Yu, and Yang Zhou. 2024. G2-AIMD: A Memory-Efficient Subgraph-Centric Framework for Efficient Subgraph Finding on GPUs. In2024 IEEE 40th Interna- tional Conference on Data Engineering (ICDE). 3164–3177

work page 2024

[44] [44]

Lyuheng Yuan, Da Yan, Jiao Han, Akhlaque Ahmad, Yang Zhou, and Zhe Jiang

work page

[45] [45]

In2024 IEEE 40th Interna- tional Conference on Data Engineering (ICDE)

Faster Depth-First Subgraph Matching on GPUs. In2024 IEEE 40th Interna- tional Conference on Data Engineering (ICDE). 3151–3163

work page

[46] [46]

Tamer Özsu

Li Zeng, Lei Zou, and M. Tamer Özsu. 2023. SGSI – A Scalable GPU-Friendly Subgraph Isomorphism Algorithm.IEEE Transactions on Knowledge and Data Engineering35, 11 (2023), 11899–11916

work page 2023

[47] [47]

Tamer Özsu, Lin Hu, and Fan Zhang

Li Zeng, Lei Zou, M. Tamer Özsu, Lin Hu, and Fan Zhang. 2020. GSI: GPU- friendly Subgraph Isomorphism. In2020 IEEE 36th International Conference on Data Engineering (ICDE). 1249–1260

work page 2020

[48] [48]

Shijie Zhang, Shirong Li, and Jiong Yang. 2009. GADDI: distance index based subgraph matching in biological networks. InProceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology (Saint Petersburg, Russia)(EDBT ’09). Association for Computing Machinery, New York, NY, USA, 192–203

work page 2009

[49] [49]

Zhijie Zhang, Yujie Lu, Weiguo Zheng, and Xuemin Lin. 2024. A Comprehensive Survey and Experimental Study of Subgraph Matching: Trends, Unbiasedness, and Interaction.Proc. ACM Manag. Data2, 1, Article 60 (March 2024), 29 pages. A SCALABILITY EV ALUATIONS To evaluate the scalability of our algorithm, we conduct four ex- periments on two representative data...

work page 2024