A Benchmark Dataset for Graph Regression with Homogeneous and Multi-Relational Variants

Antonio Longa; Marcus Vukojevic; Morteza Haghir Chehreghani; Peter Samoaa

arXiv: 2505.23875 · v1 · submitted 2025-05-29 · cs.LG · cs.AI


Pith reviewed 2026-05-19 13:06 UTC · model grok-4.3


keywords graph regression · benchmark dataset · program graphs · multi-relational graphs · graph neural networks · execution time · source code analysis


The pith

RelSC provides a new benchmark for graph regression using program graphs labeled by execution time in both homogeneous and multi-relational forms.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper presents RelSC, a dataset of graphs derived from source code that combine syntactic and semantic details, each labeled with the program's execution time as a continuous target. It includes two versions: one homogeneous with a single edge type and rich node features, and one multi-relational that keeps multiple distinct edge types for different relationships. Testing various graph neural networks on these reveals consistent differences in performance depending on whether the multi-relational structure is preserved. The dataset aims to diversify benchmarks away from molecules and citations to better test generalization in graph regression tasks.

Core claim

RelSC is a graph-regression dataset constructed from program graphs that integrate syntactic and semantic information from source code, with each graph annotated by the execution-time cost of the program. The dataset comes in a homogeneous variant (RelSC-H) with a single edge type and a multi-relational variant (RelSC-M) that maintains multiple edge types, allowing comparison of how representation choice affects model performance. Evaluations demonstrate that graph neural networks exhibit different behaviors across these variants.

What carries the argument

The RelSC dataset and its homogeneous (RelSC-H) versus multi-relational (RelSC-M) variants, which encode program structure for predicting continuous execution costs.
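As a rough illustration of the two variants, the sketch below stores one toy program graph as typed edges (the multi-relational view) and then collapses it to a single edge type (the homogeneous view). The node names, the relation names (`ast_child`, `control_flow`, `data_flow`), and the helpers `group_by_relation` and `to_homogeneous` are hypothetical; they are not taken from the paper or its released code, and the sketch covers only the edge-type difference, not how RelSC-H node features are built.

```python
from collections import defaultdict

# Toy multi-relational graph: each edge is (source, relation, target).
# Relation names are illustrative assumptions, not the dataset's schema.
typed_edges = [
    ("method", "ast_child", "loop"),
    ("loop", "ast_child", "call"),
    ("method", "control_flow", "loop"),
    ("loop", "control_flow", "call"),
    ("loop", "data_flow", "call"),
]


def group_by_relation(edges):
    """Multi-relational view: one edge list per relation type."""
    grouped = defaultdict(list)
    for src, rel, dst in edges:
        grouped[rel].append((src, dst))
    return dict(grouped)


def to_homogeneous(edges):
    """Homogeneous view: drop relation labels and merge duplicate edges."""
    return sorted({(src, dst) for src, _rel, dst in edges})


multi_view = group_by_relation(typed_edges)
homo_view = to_homogeneous(typed_edges)

print(len(multi_view))  # number of distinct relation types
print(homo_view)        # untyped edge list
```

In this toy case the five typed edges collapse to two untyped edges, which is the kind of information a single-edge-type model no longer sees.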

If this is right

Graph models need to account for both single-relation and multi-relation structures to perform well on diverse data.
The choice of graph representation significantly influences regression accuracy on execution time.
Continuous labels from runtime costs provide a regression target distinct from typical discrete or property-based ones in other benchmarks.
This setup can help develop models that generalize better across homogeneous and heterogeneous graphs.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Similar datasets could be created from other programming languages or domains to test broader applicability.
The performance gaps might point to specific ways multi-relational edges capture semantic dependencies useful for prediction.
This benchmark could support research into efficient code analysis tools that predict runtime without execution.

Load-bearing premise

The syntactic and semantic information extracted from source code into graph form sufficiently captures the factors that determine execution time.

What would settle it

The claim would be undermined if experiments showed that execution-time labels cannot be predicted from the graphs better than a simple baseline, or if the performance difference between the homogeneous and multi-relational variants disappeared under different training setups.
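The first part of this test can be sketched as a comparison against a constant predictor. The snippet below assumes mean absolute error as the metric and a train-set mean as the "simple baseline"; both are assumptions made for illustration, since the page does not say which metric or baseline the paper uses. The function names and the numbers are placeholders, not results.

```python
from statistics import mean


def mean_absolute_error(targets, predictions):
    """Average absolute gap between targets and predictions."""
    return mean(abs(t - p) for t, p in zip(targets, predictions))


def beats_mean_baseline(train_targets, test_targets, model_predictions):
    """Compare a model against predicting the training-set mean everywhere."""
    baseline_value = mean(train_targets)
    baseline_mae = mean_absolute_error(
        test_targets, [baseline_value] * len(test_targets)
    )
    model_mae = mean_absolute_error(test_targets, model_predictions)
    return model_mae < baseline_mae, model_mae, baseline_mae


# Placeholder execution times (arbitrary units), not values from RelSC.
train = [1.0, 2.0, 3.0, 6.0]
test = [2.0, 4.0]
preds = [2.5, 3.5]

better, model_mae, baseline_mae = beats_mean_baseline(train, test, preds)
print(better, model_mae, baseline_mae)
```

A model whose error is not clearly below the baseline error would suggest the graphs carry little usable signal about execution time.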

Figures

Figures reproduced from arXiv: 2505.23875 by Antonio Longa, Marcus Vukojevic, Morteza Haghir Chehreghani, Peter Samoaa.

**Figure 2.** CFG of the method presented in Listing 1. A Control Flow Graph (CFG) is a directed graph that models the execution flow of a program. Formally, a CFG is defined as a tuple $G_{CFG} = (V, E)$, where $V$ represents a set of basic blocks (sequences of statements with a single entry and exit point) and $E$ denotes directed edges that capture control flow transitions, such as sequential execution, branches, and loops [53…

**Figure 1.** Simplified abstract syntax tree (AST) representing the …

**Figure 3.** DFG of the method presented in Listing 1. A Data Flow Graph (DFG) is a directed graph that models the flow of data within a program. Formally, a DFG is defined as a tuple $G_{DFG} = (V, E)$, where $V$ represents a set of nodes corresponding to variables or computations, and $E$ denotes directed edges that capture data dependencies, such as variable definitions and their subsequent uses. Unlike CFGs, which represe…

**Figure 4.** (Left) RelSC-H graph for the example presented in Listing 1. (Right) RelSC-M graph for the example presented in Listing 1.

**Figure 6.** Distribution of target values in OssBuilds (left) and Hadoop (right).

**Figure 7.** Test predictions versus target values for the PNA model in OssBuilds (left) and Hadoop (right). The results highlight the challenges posed by the proposed datasets and the varying performance of different models. PNA achieves the best results on RelSC-H datasets, while HeteroGAT outperforms HeteroSAGE on RelSC-M datasets. However, HeteroGAT struggles on smaller datasets, such as SystemDS and H2, indicati…

**Figure 9.** Node Category Distribution for RelSC-M SystemDS dataset.

**Figure 11.** Node Category Distribution for RelSC-M Dubbo dataset.

**Figure 12.** Average number of relations for dataset RelSC-M Dubbo.

**Figure 15.** Average number of relations for dataset RelSC-M OssBuilds.

**Figure 16.** Average number of relations for dataset RelSC-M RDF4J.

**Figure 18.** Example of RelSC-H and RelSC-M graphs from Hadoop.

**Figure 19.** Example of RelSC-H and RelSC-M graphs from OssBuilds.

**Figure 20.** Degree distributions of OssBuilds (left) and Hadoop (right).

**Figure 21.** Distribution of target values for SystemDS, H2, Dubbo, and RDF4J, subprojects of OssBuilds.
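The CFG and DFG definitions quoted in the captions for Figures 2 and 3 both describe a directed graph (V, E). A minimal sketch of the two views for one hypothetical method follows. The method, the node labels, and the `DirectedGraph` class are invented for illustration and do not come from the paper's Listing 1; each node is a single statement rather than a full basic block, which is a simplification.

```python
from dataclasses import dataclass, field


@dataclass
class DirectedGraph:
    """A directed graph G = (V, E) with hashable node labels."""
    nodes: set = field(default_factory=set)
    edges: set = field(default_factory=set)

    def add_edge(self, src, dst):
        self.nodes.update((src, dst))
        self.edges.add((src, dst))

    def successors(self, node):
        return sorted(dst for src, dst in self.edges if src == node)


# Hypothetical method:
#   read: x = read()
#   cond: while x > 0:
#   body:     x = x - 1
#   ret:  return x

# Control flow: which statement may execute next.
cfg = DirectedGraph()
cfg.add_edge("read", "cond")  # sequential execution
cfg.add_edge("cond", "body")  # branch taken
cfg.add_edge("body", "cond")  # loop back edge
cfg.add_edge("cond", "ret")   # branch not taken

# Data flow: which definition of x may reach which use of x.
dfg = DirectedGraph()
for definition, use in [
    ("read", "cond"), ("read", "body"), ("read", "ret"),
    ("body", "cond"), ("body", "body"), ("body", "ret"),
]:
    dfg.add_edge(definition, use)

print(cfg.successors("cond"))
print(dfg.successors("read"))
```

Both graphs share the same four nodes but have different edge sets, which is why keeping them as separate relation types preserves information that a merged edge set would lose.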

Original abstract

Graph-level regression underpins many real-world applications, yet public benchmarks remain heavily skewed toward molecular graphs and citation networks. This limited diversity hinders progress on models that must generalize across both homogeneous and heterogeneous graph structures. We introduce RelSC, a new graph-regression dataset built from program graphs that combine syntactic and semantic information extracted from source code. Each graph is labelled with the execution-time cost of the corresponding program, providing a continuous target variable that differs markedly from those found in existing benchmarks. RelSC is released in two complementary variants. RelSC-H supplies rich node features under a single (homogeneous) edge type, while RelSC-M preserves the original multi-relational structure, connecting nodes through multiple edge types that encode distinct semantic relationships. Together, these variants let researchers probe how representation choice influences model behaviour. We evaluate a diverse set of graph neural network architectures on both variants of RelSC. The results reveal consistent performance differences between the homogeneous and multi-relational settings, emphasising the importance of structural representation. These findings demonstrate RelSC's value as a challenging and versatile benchmark for advancing graph regression methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

RelSC gives a fresh code-graph regression benchmark with matched homogeneous and multi-relational versions, but the reported performance gaps may not isolate edge-type effects cleanly.


The paper's core contribution is a new dataset, RelSC, built from program graphs that mix syntactic and semantic information from source code, with execution time as the regression target. It ships two versions: RelSC-H with a single edge type and richer node features, and RelSC-M that keeps the original multi-relational edges. This setup is genuinely new relative to the molecular and citation benchmarks that dominate the field, and it gives researchers a concrete way to test how representation choices affect graph regression models on code-like data. The evaluation runs a range of GNN architectures on both variants and notes consistent differences, which at least shows the dataset is usable for that kind of comparison. That is useful work.

The main soft spot is the comparison itself. The abstract and stress-test note leave open whether the node-feature sets and preprocessing steps are identical across the two variants. If RelSC-H carries extra or differently scaled features while RelSC-M does not, the performance gaps cannot be cleanly attributed to the presence or absence of multiple edge types. The paper would need explicit confirmation that the only controlled difference is the relational structure, plus standard details on splits, hyperparameter search, and error bars to make the empirical claims robust. Without those, the headline observation stays suggestive rather than conclusive.

This paper is aimed at graph ML researchers who need benchmarks outside molecules and citations, especially anyone working on code or program analysis tasks. A reader who wants a new dataset to run their own models on will find it worth downloading and testing. It is coherent enough on its own terms to deserve a serious referee, provided the review focuses on tightening the variant controls and evaluation reporting. I would send it to peer review with those specific requests rather than desk-reject it.

Referee Report

2 major / 2 minor

Summary. The paper introduces RelSC, a new graph-regression dataset constructed from program graphs that encode both syntactic and semantic information extracted from source code. Each graph is paired with a continuous label given by the execution-time cost of the corresponding program. The dataset is released in two variants: RelSC-H, which uses a single homogeneous edge type together with rich node features, and RelSC-M, which retains the original multi-relational edge types. A range of graph neural network architectures is evaluated on both variants; the results show consistent performance differences between the homogeneous and multi-relational settings, which the authors interpret as evidence that structural representation choice matters for graph regression.

Significance. If the reported performance gaps prove robust, RelSC would supply a useful addition to the limited set of public graph-regression benchmarks. The continuous execution-time target differs from the discrete or molecular-property targets that dominate existing collections, and the paired homogeneous/multi-relational variants enable controlled investigation of representation effects. The explicit release of both variants is a constructive feature that could support future ablation studies.

major comments (2)

[Evaluation] Evaluation section: the abstract and results description state that consistent performance differences appear between RelSC-H and RelSC-M, yet no information is supplied on train/validation/test splits, hyper-parameter selection protocol, number of random seeds, error bars, or statistical significance tests. Without these details it is impossible to determine whether the observed gaps are stable or sensitive to post-hoc choices.
[Methods] Methods / Dataset construction: the two variants are described as differing primarily in edge-type encoding, but the manuscript does not state that the node-feature matrices (including feature sets and dimensionality) are identical across RelSC-H and RelSC-M. Any systematic mismatch in node features or preprocessing would confound the attribution of performance differences to relational structure rather than to feature or extraction artifacts.
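The reporting requested in the first major comment can be sketched as follows: run each configuration over several random seeds, report mean and standard deviation, and check whether the gap between variants has a consistent sign across seeds. The numbers below are placeholders made up for illustration, not results from the paper, and the names `variant_a_mae` and `variant_b_mae` are hypothetical. A fuller analysis would apply a paired significance test on the per-seed differences.

```python
from statistics import mean, stdev


def summarize_runs(errors_by_seed):
    """Return (mean, sample standard deviation) of a per-seed metric."""
    return mean(errors_by_seed), stdev(errors_by_seed)


def paired_differences(errors_a, errors_b):
    """Per-seed difference a - b; assumes both lists use the same seeds."""
    return [a - b for a, b in zip(errors_a, errors_b)]


# Placeholder per-seed test errors for two dataset variants (five seeds each).
variant_a_mae = [0.40, 0.42, 0.38, 0.41, 0.39]
variant_b_mae = [0.45, 0.47, 0.44, 0.46, 0.43]

mean_a, std_a = summarize_runs(variant_a_mae)
mean_b, std_b = summarize_runs(variant_b_mae)
diffs = paired_differences(variant_a_mae, variant_b_mae)
consistent_sign = all(d < 0 for d in diffs) or all(d > 0 for d in diffs)

print(f"variant A: {mean_a:.3f} +/- {std_a:.3f}")
print(f"variant B: {mean_b:.3f} +/- {std_b:.3f}")
print("gap has the same sign on every seed:", consistent_sign)
```

Reporting the spread alongside the mean makes it possible to judge whether a gap between variants is larger than seed-to-seed noise.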

minor comments (2)

[Abstract] The abstract refers to “rich node features” for RelSC-H and “original multi-relational structure” for RelSC-M; a short table comparing the exact node-feature dimensions and edge-type counts of the two variants would improve clarity.
[Introduction] A few sentences in the introduction repeat the motivation for graph regression benchmarks; tightening the prose would reduce redundancy.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and positive assessment of RelSC as a potential benchmark. We address the two major comments point by point below and will revise the manuscript to incorporate the requested clarifications and details.


Referee: [Evaluation] Evaluation section: the abstract and results description state that consistent performance differences appear between RelSC-H and RelSC-M, yet no information is supplied on train/validation/test splits, hyper-parameter selection protocol, number of random seeds, error bars, or statistical significance tests. Without these details it is impossible to determine whether the observed gaps are stable or sensitive to post-hoc choices.

Authors: We agree that these experimental details are essential for reproducibility and for confirming that the observed performance differences are robust. In the revised manuscript we will expand the Evaluation section to explicitly describe the train/validation/test splits, the hyper-parameter selection protocol, the number of random seeds, the reporting of error bars, and the statistical significance tests used to compare results between the two variants. revision: yes
Referee: [Methods] Methods / Dataset construction: the two variants are described as differing primarily in edge-type encoding, but the manuscript does not state that the node-feature matrices (including feature sets and dimensionality) are identical across RelSC-H and RelSC-M. Any systematic mismatch in node features or preprocessing would confound the attribution of performance differences to relational structure rather than to feature or extraction artifacts.

Authors: We confirm that the node-feature matrices (feature sets and dimensionality) are identical in RelSC-H and RelSC-M; the variants differ only in edge-type encoding. To eliminate any possible ambiguity we will add an explicit statement to this effect in the Methods / Dataset construction section of the revised manuscript. revision: yes
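A reader who wants to verify the claim that node features match across variants could run a check along these lines. The sketch assumes each variant's features can be loaded as a mapping from node identifier to a list of numbers; that layout and the function name `feature_mismatches` are hypothetical and do not describe the released data format.

```python
def feature_mismatches(features_h, features_m, tolerance=0.0):
    """Compare two {node_id: feature_vector} mappings and list any problems."""
    problems = []
    for node_id in sorted(set(features_h) | set(features_m)):
        if node_id not in features_h or node_id not in features_m:
            problems.append((node_id, "missing in one variant"))
            continue
        vec_h, vec_m = features_h[node_id], features_m[node_id]
        if len(vec_h) != len(vec_m):
            problems.append((node_id, "dimension mismatch"))
        elif any(abs(a - b) > tolerance for a, b in zip(vec_h, vec_m)):
            problems.append((node_id, "value mismatch"))
    return problems


# Toy feature tables for one graph; values are placeholders.
homogeneous = {0: [1.0, 0.0], 1: [0.0, 1.0]}
relational_same = {0: [1.0, 0.0], 1: [0.0, 1.0]}
relational_diff = {0: [1.0, 0.0], 1: [0.0, 1.0, 0.5]}

print(feature_mismatches(homogeneous, relational_same))
print(feature_mismatches(homogeneous, relational_diff))
```

An empty result for every graph would support attributing performance gaps to edge-type encoding rather than to feature differences.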

Circularity Check

0 steps flagged

No circularity: dataset introduction with empirical observations only


The paper introduces the RelSC benchmark dataset from program graphs labeled by execution time and releases two variants (RelSC-H homogeneous with rich node features; RelSC-M multi-relational). It then reports empirical GNN performance differences between variants. No derivation chain, first-principles prediction, equation, or fitted parameter is claimed or present; the work contains no self-definitional steps, no predictions that reduce to inputs by construction, and no load-bearing self-citations of uniqueness theorems. The central claims rest on dataset construction and direct experimental comparison, which are self-contained against external benchmarks and do not reduce to the paper's own fitted values or prior author work.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is an empirical benchmark dataset paper. No mathematical derivations, fitted parameters, background axioms, or newly postulated entities are required or introduced.

pith-pipeline@v0.9.0 · 5737 in / 1126 out tokens · 38927 ms · 2026-05-19T13:06:04.860949+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

Paper passage: "We introduce RelSC, a new graph-regression dataset built from program graphs that combine syntactic and semantic information extracted from source code. Each graph is labelled with the execution-time cost of the corresponding program"

IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

Paper passage: "RelSC-H supplies rich node features under a single (homogeneous) edge type, while RelSC-M preserves the original multi-relational structure"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

99 extracted references · 99 canonical work pages · 2 internal anchors

[1] Franco Scarselli, Marco Gori, Ah Chung Tsoi, Markus Hagenbuchner, and Gabriele Monfardini. The graph neural network model. IEEE Transactions on Neural Networks, 20(1):61–80, 2008.
[2] Alessio Micheli. Neural network for graphs: A contextual constructive approach. IEEE Transactions on Neural Networks, 20(3):498–511, 2009.
[3] Thomas N Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907, 2016.
[4] Will Hamilton, Zhitao Ying, and Jure Leskovec. Inductive representation learning on large graphs. Advances in neural information processing systems, 30, 2017.
[5] Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. Graph attention networks. In International Conference on Learning Representations, 2018.
[6] Johannes Gasteiger, Aleksandar Bojchevski, and Stephan Günnemann. Predict then propagate: Graph neural networks meet personalized pagerank. arXiv preprint arXiv:1810.05997, 2018.
[7] Muhan Zhang and Yixin Chen. Link prediction based on graph neural networks. Advances in neural information processing systems, 31, 2018.
[8] Felix Wu, Amauri Souza, Tianyi Zhang, Christopher Fifty, Tao Yu, and Kilian Weinberger. Simplifying graph convolutional networks. In International conference on machine learning, pages 6861–6871. PMLR, 2019.
[9] Muhan Zhang, Pan Li, Yinglong Xia, Kai Wang, and Long Jin. Labeling trick: A theory of using graph neural networks for multi-node representation learning. Advances in Neural Information Processing Systems, 34:9061–9073, 2021.
[10] Veronica Lachi, Francesco Ferrini, Antonio Longa, Bruno Lepri, and Andrea Passerini. A simple and expressive graph neural network based method for structural link representation. In ICML 2024 Workshop on Geometry-grounded Representation Learning and Generative Modeling, 2024.
[11] Olga Zaghen, Antonio Longa, Steve Azzolin, Lev Telyatnikov, Andrea Passerini, and Pietro Lio. Sheaf diffusion goes nonlinear: Enhancing GNNs with adaptive sheaf laplacians. In ICML 2024 Workshop on Geometry-grounded Representation Learning and Generative Modeling, 2024.
[12] Timothy G Armstrong, Vamsi Ponnekanti, Dhruba Borthakur, and Mark Callaghan. Linkbench: a database benchmark based on the facebook social graph. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, pages 1185–1196, 2013.
[13] Weihua Hu, Matthias Fey, Marinka Zitnik, Yuxiao Dong, Hongyu Ren, Bowen Liu, Michele Catasta, and Jure Leskovec. Open graph benchmark: Datasets for machine learning on graphs. Advances in neural information processing systems, 33:22118–22133, 2020.
[14] Christopher Morris, Nils M Kriege, Franka Bause, Kristian Kersting, Petra Mutzel, and Marion Neumann. Tudataset: A collection of benchmark datasets for learning with graphs. arXiv preprint arXiv:2007.08663, 2020.
[15] Vijay Prakash Dwivedi, Ladislav Rampášek, Michael Galkin, Ali Parviz, Guy Wolf, Anh Tuan Luu, and Dominique Beaini. Long range graph benchmark. Advances in Neural Information Processing Systems, 35:22326–22340, 2022.
[16] Zhou Zhiyao, Sheng Zhou, Bochao Mao, Xuanyi Zhou, Jiawei Chen, Qiaoyu Tan, Daochen Zha, Yan Feng, Chun Chen, and Can Wang. Opengsl: A comprehensive benchmark for graph structure learning. Advances in Neural Information Processing Systems, 36, 2024.
[17] Shenyang Huang, Farimah Poursafaei, Jacob Danovitch, Matthias Fey, Weihua Hu, Emanuele Rossi, Jure Leskovec, Michael Bronstein, Guillaume Rabusseau, and Reihaneh Rabbany. Temporal graph benchmark for machine learning on temporal graphs. Advances in Neural Information Processing Systems, 36, 2024.
[18] Xiao-Meng Zhang, Li Liang, Lin Liu, and Ming-Jing Tang. Graph neural networks and their current applications in bioinformatics. Frontiers in genetics, 12:690049, 2021.
[19] Pietro Bongini, Niccolò Pancino, Franco Scarselli, and Monica Bianchini. Biognn: how graph neural networks can solve biological problems. In Artificial Intelligence and Machine Learning for Healthcare: Vol. 1: Image and Data Analytics, pages 211–231. Springer, 2022.
[20] Weiwei Jiang and Jiayun Luo. Graph neural network for traffic forecasting: A survey. Expert Systems with Applications, 207:117921, 2022.
[21] Xiao Li, Li Sun, Mengjie Ling, and Yan Peng. A survey of graph neural network based recommendation in social networks. Neurocomputing, 549:126441, 2023.
[22] Wenqi Fan, Yao Ma, Qing Li, Yuan He, Eric Zhao, Jiliang Tang, and Dawei Yin. Graph neural networks for social recommendation. In The world wide web conference, pages 417–426, 2019.
[23] Dejun Jiang, Zhenxing Wu, Chang-Yu Hsieh, Guangyong Chen, Ben Liao, Zhe Wang, Chao Shen, Dongsheng Cao, Jian Wu, and Tingjun Hou. Could graph neural networks learn better molecular representation for drug discovery? a comparison study of descriptor-based and graph-based models. Journal of Cheminformatics, 13(1):12, Feb 2021.
[24] Oliver Wieder, Stefan Kohlbacher, Mélaine Kuenemann, Arthur Garon, Pierre Ducrot, Thomas Seidel, and Thierry Langer. A compact review of molecular property prediction with graph neural networks. Drug Discovery Today: Technologies, 37:1–12, 2020.
[25] Zehong Zhang, Lifan Chen, Feisheng Zhong, Dingyan Wang, Jiaxin Jiang, Sulin Zhang, Hualiang Jiang, Mingyue Zheng, and Xutong Li. Graph neural network approaches for drug-target interactions. Current Opinion in Structural Biology, 73:102327, 2022.
[26] Chris Harrelson. Performance improvements in chrome’s rendering pipeline. chromium blog, 2017.
[27] Michael Lindon, Chris Sanden, and Vaché Shirikian. Rapid regression detection in software deployments through sequential testing. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 3336–3346, 2022.
[28] Chidera Biringa and Gökhan Kul. Pace: A program analysis framework for continuous performance prediction. ACM Transactions on Software Engineering and Methodology, 33(4):1–23, 2024.
[29] John McCarthy. Recursive functions of symbolic expressions and their computation by machine, part i. Communications of the ACM, 3(4):184–195, 1960.
[30] Iulian Neamtiu, Jeffrey S Foster, and Michael Hicks. Understanding source code evolution using abstract syntax tree matching. In Proceedings of the 2005 international workshop on Mining software repositories, pages 1–5, 2005.
[31] Jian Zhang, Xu Wang, Hongyu Zhang, Hailong Sun, Kaixuan Wang, and Xudong Liu. A novel neural source code representation based on abstract syntax tree. In 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE), pages 783–794. IEEE, 2019.
[32] Ensheng Shi, Yanlin Wang, Lun Du, Hongyu Zhang, Shi Han, Dongmei Zhang, and Hongbin Sun. Cast: Enhancing code summarization with hierarchical splitting and reconstruction of abstract syntax trees. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 4053–4062, 2021.
[33] Peter Samoaa, Firas Bayram, Pasquale Salza, and Philipp Leitner. A systematic mapping study of source code representation for deep learning in software engineering. IET Software, 16(4):351–385, 2022.
[34] Frances E Allen. Control flow analysis. ACM Sigplan Notices, 5(7):1–19, 1970.
[35] Simone Campanoni and Stefano Crespi Reghizzi. Traces of control-flow graphs. In Developments in Language Theory: 13th International Conference, DLT 2009, Stuttgart, Germany, June 30-July 3, 2009. Proceedings 13, pages 156–169. Springer, 2009.
[36] James Koppel, Jackson Kearl, and Armando Solar-Lezama. Automatically deriving control-flow graph generators from operational semantics. Proceedings of the ACM on Programming Languages, 6(ICFP):742–771, 2022.
[37] Shaswata Mitra, Stephen A Torri, and Sudip Mittal. Survey of malware analysis through control flow graph using machine learning. In 2023 IEEE 22nd International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), pages 1554–1561. IEEE, 2023.
[38] Jack B Dennis and David P Misunas. A preliminary architecture for a basic data-flow processor. In Proceedings of the 2nd annual symposium on Computer architecture, pages 126–132, 1974.
[39]

Davis and Robert M

Alan L. Davis and Robert M. Keller. Data flow program graphs.Computer, 15(02):26–41, 1982

work page 1982
[40]

A formal definition of data flow graph models.IEEE Transactions on computers, 100(11):940–948, 1986

Kavi, Buckles, and Bhat. A formal definition of data flow graph models.IEEE Transactions on computers, 100(11):940–948, 1986

work page 1986
[41]

Graphiler: Optimizing graph neural networks with message passing data flow graph.Proceedings of Machine Learning and Systems, 4:515–528, 2022

Zhiqiang Xie, Minjie Wang, Zihao Ye, Zheng Zhang, and Rui Fan. Graphiler: Optimizing graph neural networks with message passing data flow graph.Proceedings of Machine Learning and Systems, 4:515–528, 2022

work page 2022
[42] Peter Samoaa, Antonio Longa, Mazen Mohamad, Morteza Haghir Chehreghani, and Philipp Leitner. Tep-gnn: Accurate execution time prediction of functional tests using graph neural networks. In Davide Taibi, Marco Kuhrmann, Tommi Mikkonen, Jil Klünder, and Pekka Abrahamsson, editors, Product-Focused Software Process Improvement, pages 464–479, Cham, 2022. Spri...
[43] Zhenqin Wu, Bharath Ramsundar, Evan N Feinberg, Joseph Gomes, Caleb Geniesse, Aneesh S Pappu, Karl Leswing, and Vijay Pande. Moleculenet: a benchmark for molecular machine learning. Chemical science, 9(2):513–530, 2018.
[44] Rafael Gómez-Bombarelli, Jennifer N Wei, David Duvenaud, José Miguel Hernández-Lobato, Benjamín Sánchez-Lengeling, Dennis Sheberla, Jorge Aguilera-Iparraguirre, Timothy D Hirzel, Ryan P Adams, and Alán Aspuru-Guzik. Automatic chemical design using a data-driven continuous representation of molecules. ACS central science, 4(2):268–276, 2018.
[45] Yuquan Li, Chang-Yu Hsieh, Ruiqiang Lu, Xiaoqing Gong, Xiaorui Wang, Pengyong Li, Shuo Liu, Yanan Tian, Dejun Jiang, Jiaxian Yan, et al. An adaptive graph learning method for automated molecular interactions and properties predictions. Nature machine intelligence, 4(7):645–651, 2022.
[46] David L Mobley and J Peter Guthrie. Freesolv: a database of experimental and calculated hydration free energies, with input files. Journal of computer-aided molecular design, 28:711–720, 2014.
[47] Zhihai Liu, Yan Li, Li Han, Jie Li, Jie Liu, Zhixiong Zhao, Wei Nie, Yuchen Liu, and Renxiao Wang. Pdb-wide collection of binding data: current status of the pdbbind database. Bioinformatics, 31(3):405–412, 2015.
[48] Gang Liu, Tong Zhao, Jiaxin Xu, Tengfei Luo, and Meng Jiang. Graph rationalization with environment-based augmentations. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 1069–1078, 2022.
[49] Lucija Šikić, Adrian Satja Kurdija, Klemo Vladimir, and Marin Šilić. Graph neural network for source code defect prediction. IEEE access, 10:10402–10415, 2022.
[50] Van-Anh Nguyen, Dai Quoc Nguyen, Van Nguyen, Trung Le, Quan Hung Tran, and Dinh Phung. Regvd: Revisiting graph neural networks for vulnerability detection. In Proceedings of the ACM/IEEE 44th International Conference on Software Engineering: Companion Proceedings, pages 178–182, 2022.
[51] Miltiadis Allamanis. Graph neural networks in program analysis. Graph neural networks: foundations, frontiers, and applications, pages 483–497, 2022.
[52] Jiahao Liu, Jun Zeng, Xiang Wang, and Zhenkai Liang. Learning graph-based code representations for source-level functional similarity detection. In 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE), pages 345–357. IEEE, 2023.
[53] Miltiadis Allamanis, Marc Brockschmidt, and Mahmoud Khademi. Learning to represent programs with graphs. In International Conference on Learning Representations, 2018.
[54] Daya Guo, Shuo Ren, Shuai Lu, Zhangyin Feng, Duyu Tang, Shujie Liu, Long Zhou, Nan Duan, Alexey Svyatkovskiy, Shengyu Fu, Michele Tufano, Shao Kun Deng, Colin Clement, Dawn Drain, Neel Sundaresan, Jian Yin, Daxin Jiang, and Ming Zhou. Graphcodebert: Pre-training code representations with data flow, 2021.
[55] Paras Jain, Ajay Jain, Tianjun Zhang, Pieter Abbeel, Joseph Gonzalez, and Ion Stoica. Contrastive code representation learning. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2021.
[56] Yaqin Zhou, Shangqing Liu, Jingkai Siow, Xiaoning Du, and Yang Liu. Devign: Effective vulnerability identification by learning comprehensive program semantics via graph neural networks. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 32. Curran Assoc...
[57] David Hin, Andrey Kan, Huaming Chen, and M. Ali Babar. Linevd: statement-level vulnerability detection using graph neural networks. In Proceedings of the 19th International Conference on Mining Software Repositories, MSR ’22, page 596–607, New York, NY, USA, 2022. Association for Computing Machinery.
[58] Jian Zhang, Xu Wang, Hongyu Zhang, Hailong Sun, Kaixuan Wang, and Xudong Liu. A novel neural source code representation based on abstract syntax tree. In Proceedings of the 41st International Conference on Software Engineering, ICSE ’19, page 783–794. IEEE Press, 2019.
[59] Liuqing Li, He Feng, Wenjie Zhuang, Na Meng, and Barbara Ryder. Cclearner: A deep learning-based clone detection approach. In 2017 IEEE international conference on software maintenance and evolution (ICSME), pages 249–260. IEEE, 2017.
[60] Rebecca Russell, Louis Kim, Lei Hamilton, Tomo Lazovich, Jacob Harer, Onur Ozdemir, Paul Ellingwood, and Marc McConley. Automated vulnerability detection in source code using deep representation learning. In 2018 17th IEEE international conference on machine learning and applications (ICMLA), pages 757–762. IEEE, 2018.
[61] Md Nakhla Rafi, Dong Jae Kim, An Ran Chen, Tse-Hsun (Peter) Chen, and Shaowei Wang. Towards better graph neural network-based fault localization through enhanced code representation. Proc. ACM Softw. Eng., 1(FSE), July 2024.
[62] Zhangyin Feng, Daya Guo, Duyu Tang, Nan Duan, Xiaocheng Feng, Ming Gong, Linjun Shou, Bing Qin, Ting Liu, Daxin Jiang, and Ming Zhou. Codebert: A pre-trained model for programming and natural languages, 2020.
[63] Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, Alex Ray, Raul Puri, Gretchen Krueger, Michael Petrov, Heidy Khlaaf, Girish Sastry, Pamela Mishkin, Brooke Chan, Scott Gray, Nick Ryder, Mikhail Pavlov, Alethea Power, Lukasz Kaiser, Mohammad Bavarian...
[64] Marie-Anne Lachaux, Baptiste Roziere, Marc Szafraniec, and Guillaume Lample. Dobf: a deobfuscation pre-training objective for programming languages. In Proceedings of the 35th International Conference on Neural Information Processing Systems, NIPS ’21, Red Hook, NY, USA, 2021. Curran Associates Inc.
[65] Baptiste Roziere, Marie-Anne Lachaux, Marc Szafraniec, and Guillaume Lample. Dobf: A deobfuscation pre-training objective for programming languages, 2021.
[66] Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, and Caiming Xiong. Codegen: An open large language model for code with multi-turn program synthesis, 2023.
[67] Yue Wang, Weishi Wang, Shafiq Joty, and Steven C.H. Hoi. CodeT5: Identifier-aware unified pre-trained encoder-decoder models for code understanding and generation. In Marie-Francine Moens, Xuanjing Huang, Lucia Specia, and Scott Wen-tau Yih, editors, Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 8696–8708, O...
[68] Christoph Laaber, Mikael Basmaci, and Pasquale Salza. Predicting unstable software benchmarks using static source code features. Empirical Softw. Engg., 26(6), November 2021.
[69] Huong Ha and Hongyu Zhang. Deepperf: performance prediction for configurable software with deep sparse neural network. In Proceedings of the 41st International Conference on Software Engineering, ICSE ’19, page 1095–1106. IEEE Press, 2019.
[70] Peter Samoaa, Linus Aronsson, Philipp Leitner, and Morteza Haghir Chehreghani. Batch mode deep active learning for regression on graph data. In 2023 IEEE International Conference on Big Data (BigData), pages 5904–5913, 2023.
[71] Patrick Thomson. Static analysis: An introduction: The fundamental challenge of software engineering is one of complexity. Queue, 19(4):29–41, September 2021.
[72] Zhen Li, Deqing Zou, Shouhuai Xu, Xinyu Ou, Hai Jin, Sujuan Wang, Zhijun Deng, and Yuyi Zhong. Vuldeepecker: A deep learning-based system for vulnerability detection. In Proceedings 2018 Network and Distributed System Security Symposium, NDSS 2018. Internet Society, 2018.
[73] Zhonghao Jiang, Weifeng Sun, Xiaoyan Gu, Jiaxin Wu, Tao Wen, Haibo Hu, and Meng Yan. Dfept: Data flow embedding for enhancing pre-trained model based vulnerability detection. In Proceedings of the 15th Asia-Pacific Symposium on Internetware, Internetware ’24, page 95–104, New York, NY, USA, 2024. Association for Computing Machinery.
[74] Vincent J. Hellendoorn, Charles Sutton, Rishabh Singh, Petros Maniatis, and David Bieber. Global relational models of source code. In International Conference on Learning Representations, 2020.
[75] Justin Gilmer, Samuel S Schoenholz, Patrick F Riley, Oriol Vinyals, and George E Dahl. Neural message passing for quantum chemistry. In International conference on machine learning, pages 1263–1272. PMLR, 2017.
[76] Joan Bruna, Wojciech Zaremba, Arthur Szlam, and Yann LeCun. Spectral networks and locally connected networks on graphs. arXiv preprint arXiv:1312.6203, 2013.
[77] Hao Yuan and Shuiwang Ji. Structpool: Structured graph pooling via conditional random fields. In Proceedings of the 8th International Conference on Learning Representations, 2020.
[78] Amir Hosein Khasahmadi, Kaveh Hassani, Parsa Moradi, Leo Lee, and Quaid Morris. Memory-based graph networks. In International Conference on Learning Representations, 2020.
[79] Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. How powerful are graph neural networks? In International Conference on Learning Representations, 2019.
[80] Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. Convolutional neural networks on graphs with fast localized spectral filtering. Advances in neural information processing systems, 29, 2016.

[1] Franco Scarselli, Marco Gori, Ah Chung Tsoi, Markus Hagenbuchner, and Gabriele Monfardini. The graph neural network model. IEEE transactions on neural networks, 20(1):61–80, 2008.

[2] Alessio Micheli. Neural network for graphs: A contextual constructive approach. IEEE Transactions on Neural Networks, 20(3):498–511, 2009.

[3] Thomas N Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907, 2016.

[4] Will Hamilton, Zhitao Ying, and Jure Leskovec. Inductive representation learning on large graphs. Advances in neural information processing systems, 30, 2017.

[5] Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. Graph attention networks. In International Conference on Learning Representations, 2018.

[6] Johannes Gasteiger, Aleksandar Bojchevski, and Stephan Günnemann. Predict then propagate: Graph neural networks meet personalized pagerank. arXiv preprint arXiv:1810.05997, 2018.

[7] Muhan Zhang and Yixin Chen. Link prediction based on graph neural networks. Advances in neural information processing systems, 31, 2018.

[8] Felix Wu, Amauri Souza, Tianyi Zhang, Christopher Fifty, Tao Yu, and Kilian Weinberger. Simplifying graph convolutional networks. In International conference on machine learning, pages 6861–6871. PMLR, 2019.

Footnote 8: https://anonymous.4open.science/r/graph_regression_datasets-407E/

[9] Muhan Zhang, Pan Li, Yinglong Xia, Kai Wang, and Long Jin. Labeling trick: A theory of using graph neural networks for multi-node representation learning. Advances in Neural Information Processing Systems, 34:9061–9073, 2021.

[10] Veronica Lachi, Francesco Ferrini, Antonio Longa, Bruno Lepri, and Andrea Passerini. A simple and expressive graph neural network based method for structural link representation. In ICML 2024 Workshop on Geometry-grounded Representation Learning and Generative Modeling, 2024.

[11] Olga Zaghen, Antonio Longa, Steve Azzolin, Lev Telyatnikov, Andrea Passerini, and Pietro Lio. Sheaf diffusion goes nonlinear: Enhancing GNNs with adaptive sheaf laplacians. In ICML 2024 Workshop on Geometry-grounded Representation Learning and Generative Modeling, 2024.

[12] Timothy G Armstrong, Vamsi Ponnekanti, Dhruba Borthakur, and Mark Callaghan. Linkbench: a database benchmark based on the facebook social graph. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, pages 1185–1196, 2013.

[13] Weihua Hu, Matthias Fey, Marinka Zitnik, Yuxiao Dong, Hongyu Ren, Bowen Liu, Michele Catasta, and Jure Leskovec. Open graph benchmark: Datasets for machine learning on graphs. Advances in neural information processing systems, 33:22118–22133, 2020.

[14] Christopher Morris, Nils M Kriege, Franka Bause, Kristian Kersting, Petra Mutzel, and Marion Neumann. Tudataset: A collection of benchmark datasets for learning with graphs. arXiv preprint arXiv:2007.08663, 2020.

[15] Vijay Prakash Dwivedi, Ladislav Rampášek, Michael Galkin, Ali Parviz, Guy Wolf, Anh Tuan Luu, and Dominique Beaini. Long range graph benchmark. Advances in Neural Information Processing Systems, 35:22326–22340, 2022.

[16] Zhou Zhiyao, Sheng Zhou, Bochao Mao, Xuanyi Zhou, Jiawei Chen, Qiaoyu Tan, Daochen Zha, Yan Feng, Chun Chen, and Can Wang. Opengsl: A comprehensive benchmark for graph structure learning. Advances in Neural Information Processing Systems, 36, 2024.

[17] Shenyang Huang, Farimah Poursafaei, Jacob Danovitch, Matthias Fey, Weihua Hu, Emanuele Rossi, Jure Leskovec, Michael Bronstein, Guillaume Rabusseau, and Reihaneh Rabbany. Temporal graph benchmark for machine learning on temporal graphs. Advances in Neural Information Processing Systems, 36, 2024.

[18] Xiao-Meng Zhang, Li Liang, Lin Liu, and Ming-Jing Tang. Graph neural networks and their current applications in bioinformatics. Frontiers in genetics, 12:690049, 2021.

[19] Pietro Bongini, Niccolò Pancino, Franco Scarselli, and Monica Bianchini. Biognn: how graph neural networks can solve biological problems. In Artificial Intelligence and Machine Learning for Healthcare: Vol. 1: Image and Data Analytics, pages 211–231. Springer, 2022.

[20] Weiwei Jiang and Jiayun Luo. Graph neural network for traffic forecasting: A survey. Expert Systems with Applications, 207:117921, 2022.

[21] Xiao Li, Li Sun, Mengjie Ling, and Yan Peng. A survey of graph neural network based recommendation in social networks. Neurocomputing, 549:126441, 2023.

[22] Wenqi Fan, Yao Ma, Qing Li, Yuan He, Eric Zhao, Jiliang Tang, and Dawei Yin. Graph neural networks for social recommendation. In The world wide web conference, pages 417–426, 2019.

[23] Dejun Jiang, Zhenxing Wu, Chang-Yu Hsieh, Guangyong Chen, Ben Liao, Zhe Wang, Chao Shen, Dongsheng Cao, Jian Wu, and Tingjun Hou. Could graph neural networks learn better molecular representation for drug discovery? A comparison study of descriptor-based and graph-based models. Journal of Cheminformatics, 13(1):12, Feb 2021.

[24] Oliver Wieder, Stefan Kohlbacher, Mélaine Kuenemann, Arthur Garon, Pierre Ducrot, Thomas Seidel, and Thierry Langer. A compact review of molecular property prediction with graph neural networks. Drug Discovery Today: Technologies, 37:1–12, 2020.

[25] Zehong Zhang, Lifan Chen, Feisheng Zhong, Dingyan Wang, Jiaxin Jiang, Sulin Zhang, Hualiang Jiang, Mingyue Zheng, and Xutong Li. Graph neural network approaches for drug-target interactions. Current Opinion in Structural Biology, 73:102327, 2022.

[26] Chris Harrelson. Performance improvements in chrome’s rendering pipeline. Chromium blog, 2017.

[27] Michael Lindon, Chris Sanden, and Vaché Shirikian. Rapid regression detection in software deployments through sequential testing. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 3336–3346, 2022.

[28] Chidera Biringa and Gökhan Kul. Pace: A program analysis framework for continuous performance prediction. ACM Transactions on Software Engineering and Methodology, 33(4):1–23, 2024.

[29] John McCarthy. Recursive functions of symbolic expressions and their computation by machine, part i. Communications of the ACM, 3(4):184–195, 1960.

[30] Iulian Neamtiu, Jeffrey S Foster, and Michael Hicks. Understanding source code evolution using abstract syntax tree matching. In Proceedings of the 2005 international workshop on Mining software repositories, pages 1–5, 2005.

[31] Jian Zhang, Xu Wang, Hongyu Zhang, Hailong Sun, Kaixuan Wang, and Xudong Liu. A novel neural source code representation based on abstract syntax tree. In 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE), pages 783–794. IEEE, 2019.

[32] Ensheng Shi, Yanlin Wang, Lun Du, Hongyu Zhang, Shi Han, Dongmei Zhang, and Hongbin Sun. Cast: Enhancing code summarization with hierarchical splitting and reconstruction of abstract syntax trees. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 4053–4062, 2021.

[33] Peter Samoaa, Firas Bayram, Pasquale Salza, and Philipp Leitner. A systematic mapping study of source code representation for deep learning in software engineering. IET Software, 16(4):351–385, 2022.

[34] Frances E Allen. Control flow analysis. ACM Sigplan Notices, 5(7):1–19, 1970.

[35] Simone Campanoni and Stefano Crespi Reghizzi. Traces of control-flow graphs. In Developments in Language Theory: 13th International Conference, DLT 2009, Stuttgart, Germany, June 30-July 3, 2009. Proceedings 13, pages 156–169. Springer, 2009.

[36] [36]

Automatically deriving control-flow graph generators from operational semantics.Proceedings of the ACM on Programming Languages, 6(ICFP):742–771, 2022

James Koppel, Jackson Kearl, and Armando Solar-Lezama. Automatically deriving control-flow graph generators from operational semantics.Proceedings of the ACM on Programming Languages, 6(ICFP):742–771, 2022

work page 2022

[37] [37]

Survey of malware analysis through control flow graph using machine learning

Shaswata Mitra, Stephen A Torri, and Sudip Mittal. Survey of malware analysis through control flow graph using machine learning. In2023 IEEE 22nd International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), pages 1554–1561. IEEE, 2023

work page 2023

[38] [38]

A preliminary architecture for a basic data-flow processor

Jack B Dennis and David P Misunas. A preliminary architecture for a basic data-flow processor. InProceedings of the 2nd annual symposium on Computer architecture, pages 126–132, 1974

work page 1974

[39] [39]

Davis and Robert M

Alan L. Davis and Robert M. Keller. Data flow program graphs.Computer, 15(02):26–41, 1982

work page 1982

[40] [40]

A formal definition of data flow graph models.IEEE Transactions on computers, 100(11):940–948, 1986

Kavi, Buckles, and Bhat. A formal definition of data flow graph models.IEEE Transactions on computers, 100(11):940–948, 1986

work page 1986

[41] [41]

Graphiler: Optimizing graph neural networks with message passing data flow graph.Proceedings of Machine Learning and Systems, 4:515–528, 2022

Zhiqiang Xie, Minjie Wang, Zihao Ye, Zheng Zhang, and Rui Fan. Graphiler: Optimizing graph neural networks with message passing data flow graph.Proceedings of Machine Learning and Systems, 4:515–528, 2022

work page 2022

[42] [42]

Tep-gnn: Accurate execution time prediction of functional tests using graph neural networks

Peter Samoaa, Antonio Longa, Mazen Mohamad, Morteza Haghir Chehreghani, and Philipp Leitner. Tep-gnn: Accurate execution time prediction of functional tests using graph neural networks. In Davide Taibi, Marco Kuhrmann, Tommi Mikkonen, Jil Kl¨under, and Pekka Abrahamsson, editors,Product-Focused Software Process Improvement, pages 464–479, Cham, 2022. Spri...

work page 2022

[43] [43]

Moleculenet: a benchmark for molecular machine learning.Chemical science, 9(2):513–530, 2018

Zhenqin Wu, Bharath Ramsundar, Evan N Feinberg, Joseph Gomes, Caleb Geniesse, Aneesh S Pappu, Karl Leswing, and Vijay Pande. Moleculenet: a benchmark for molecular machine learning.Chemical science, 9(2):513–530, 2018

work page 2018

[44] [44]

Automatic chemical design using a data-driven continuous representation of molecules.ACS central science, 4(2):268–276, 2018

Rafael G´omez-Bombarelli, Jennifer N Wei, David Duvenaud, Jos´e Miguel Hern´andez-Lobato, Benjam´ın S´anchez- Lengeling, Dennis Sheberla, Jorge Aguilera-Iparraguirre, Timothy D Hirzel, Ryan P Adams, and Al´an Aspuru- Guzik. Automatic chemical design using a data-driven continuous representation of molecules.ACS central science, 4(2):268–276, 2018

work page 2018

[45] [45]

An adaptive graph learning method for automated molecular interactions and properties predictions.nature machine intelligence, 4(7):645–651, 2022

Yuquan Li, Chang-Yu Hsieh, Ruiqiang Lu, Xiaoqing Gong, Xiaorui Wang, Pengyong Li, Shuo Liu, Yanan Tian, Dejun Jiang, Jiaxian Yan, et al. An adaptive graph learning method for automated molecular interactions and properties predictions.nature machine intelligence, 4(7):645–651, 2022

work page 2022

[46] [46]

Freesolv: a database of experimental and calculated hydration free energies, with input files.Journal of computer-aided molecular design, 28:711–720, 2014

David L Mobley and J Peter Guthrie. Freesolv: a database of experimental and calculated hydration free energies, with input files.Journal of computer-aided molecular design, 28:711–720, 2014

work page 2014

[47] [47]

Pdb-wide collection of binding data: current status of the pdbbind database.Bioinformatics, 31(3):405–412, 2015

Zhihai Liu, Yan Li, Li Han, Jie Li, Jie Liu, Zhixiong Zhao, Wei Nie, Yuchen Liu, and Renxiao Wang. Pdb-wide collection of binding data: current status of the pdbbind database.Bioinformatics, 31(3):405–412, 2015

work page 2015

[48] [48]

Graph rationalization with environment-based augmentations

Gang Liu, Tong Zhao, Jiaxin Xu, Tengfei Luo, and Meng Jiang. Graph rationalization with environment-based augmentations. InProceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 1069–1078, 2022

work page 2022

[49] [49]

Graph neural network for source code defect prediction.IEEE access, 10:10402–10415, 2022

Lucija ˇSiki´c, Adrian Satja Kurdija, Klemo Vladimir, and Marin ˇSili´c. Graph neural network for source code defect prediction.IEEE access, 10:10402–10415, 2022. 14

work page 2022

[50] [50]

Regvd: Revisiting graph neural networks for vulnerability detection

Van-Anh Nguyen, Dai Quoc Nguyen, Van Nguyen, Trung Le, Quan Hung Tran, and Dinh Phung. Regvd: Revisiting graph neural networks for vulnerability detection. InProceedings of the ACM/IEEE 44th International Conference on Software Engineering: Companion Proceedings, pages 178–182, 2022

work page 2022

[51] [51]

Graph neural networks in program analysis.Graph neural networks: foundations, frontiers, and applications, pages 483–497, 2022

Miltiadis Allamanis. Graph neural networks in program analysis.Graph neural networks: foundations, frontiers, and applications, pages 483–497, 2022

work page 2022

[52] [52]

Learning graph-based code representations for source-level functional similarity detection

Jiahao Liu, Jun Zeng, Xiang Wang, and Zhenkai Liang. Learning graph-based code representations for source-level functional similarity detection. In2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE), pages 345–357. IEEE, 2023

work page 2023

[53] [53]

Learning to represent programs with graphs

Miltiadis Allamanis, Marc Brockschmidt, and Mahmoud Khademi. Learning to represent programs with graphs. InInternational Conference on Learning Representations, 2018

work page 2018

[54] [54]

Graphcodebert: Pre-training code representations with data flow, 2021

Daya Guo, Shuo Ren, Shuai Lu, Zhangyin Feng, Duyu Tang, Shujie Liu, Long Zhou, Nan Duan, Alexey Svyatkovskiy, Shengyu Fu, Michele Tufano, Shao Kun Deng, Colin Clement, Dawn Drain, Neel Sundaresan, Jian Yin, Daxin Jiang, and Ming Zhou. Graphcodebert: Pre-training code representations with data flow, 2021

work page 2021

[55] [55]

Contrastive code representation learning

Paras Jain, Ajay Jain, Tianjun Zhang, Pieter Abbeel, Joseph Gonzalez, and Ion Stoica. Contrastive code representation learning. InProceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2021

work page 2021

[56] [56]

Devign: Effective vulnerability identifi- cation by learning comprehensive program semantics via graph neural networks

Yaqin Zhou, Shangqing Liu, Jingkai Siow, Xiaoning Du, and Yang Liu. Devign: Effective vulnerability identifi- cation by learning comprehensive program semantics via graph neural networks. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alch´e-Buc, E. Fox, and R. Garnett, editors,Advances in Neural Information Processing Systems, volume 32. Curran Assoc...

work page 2019

[57] David Hin, Andrey Kan, Huaming Chen, and M. Ali Babar. Linevd: statement-level vulnerability detection using graph neural networks. In Proceedings of the 19th International Conference on Mining Software Repositories, MSR ’22, page 596–607, New York, NY, USA, 2022. Association for Computing Machinery

[58] Jian Zhang, Xu Wang, Hongyu Zhang, Hailong Sun, Kaixuan Wang, and Xudong Liu. A novel neural source code representation based on abstract syntax tree. In Proceedings of the 41st International Conference on Software Engineering, ICSE ’19, page 783–794. IEEE Press, 2019

[59] Liuqing Li, He Feng, Wenjie Zhuang, Na Meng, and Barbara Ryder. Cclearner: A deep learning-based clone detection approach. In 2017 IEEE international conference on software maintenance and evolution (ICSME), pages 249–260. IEEE, 2017

[60] Rebecca Russell, Louis Kim, Lei Hamilton, Tomo Lazovich, Jacob Harer, Onur Ozdemir, Paul Ellingwood, and Marc McConley. Automated vulnerability detection in source code using deep representation learning. In 2018 17th IEEE international conference on machine learning and applications (ICMLA), pages 757–762. IEEE, 2018

[61] Md Nakhla Rafi, Dong Jae Kim, An Ran Chen, Tse-Hsun (Peter) Chen, and Shaowei Wang. Towards better graph neural network-based fault localization through enhanced code representation. Proc. ACM Softw. Eng., 1(FSE), July 2024

[62] Zhangyin Feng, Daya Guo, Duyu Tang, Nan Duan, Xiaocheng Feng, Ming Gong, Linjun Shou, Bing Qin, Ting Liu, Daxin Jiang, and Ming Zhou. Codebert: A pre-trained model for programming and natural languages, 2020

[63] Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, Alex Ray, Raul Puri, Gretchen Krueger, Michael Petrov, Heidy Khlaaf, Girish Sastry, Pamela Mishkin, Brooke Chan, Scott Gray, Nick Ryder, Mikhail Pavlov, Alethea Power, Lukasz Kaiser, Mohammad Bavarian...

[64] Marie-Anne Lachaux, Baptiste Roziere, Marc Szafraniec, and Guillaume Lample. Dobf: a deobfuscation pre-training objective for programming languages. In Proceedings of the 35th International Conference on Neural Information Processing Systems, NIPS ’21, Red Hook, NY, USA, 2021. Curran Associates Inc

[65] Baptiste Roziere, Marie-Anne Lachaux, Marc Szafraniec, and Guillaume Lample. Dobf: A deobfuscation pre-training objective for programming languages, 2021

[66] Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, and Caiming Xiong. Codegen: An open large language model for code with multi-turn program synthesis, 2023

[67] Yue Wang, Weishi Wang, Shafiq Joty, and Steven C.H. Hoi. CodeT5: Identifier-aware unified pre-trained encoder-decoder models for code understanding and generation. In Marie-Francine Moens, Xuanjing Huang, Lucia Specia, and Scott Wen-tau Yih, editors, Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 8696–8708, O...

[68] Christoph Laaber, Mikael Basmaci, and Pasquale Salza. Predicting unstable software benchmarks using static source code features. Empirical Softw. Engg., 26(6), November 2021

[69] Huong Ha and Hongyu Zhang. Deepperf: performance prediction for configurable software with deep sparse neural network. In Proceedings of the 41st International Conference on Software Engineering, ICSE ’19, page 1095–1106. IEEE Press, 2019

[70] Peter Samoaa, Linus Aronsson, Philipp Leitner, and Morteza Haghir Chehreghani. Batch mode deep active learning for regression on graph data. In 2023 IEEE International Conference on Big Data (BigData), pages 5904–5913, 2023

[71] Patrick Thomson. Static analysis: An introduction: The fundamental challenge of software engineering is one of complexity. Queue, 19(4):29–41, September 2021

[72] Zhen Li, Deqing Zou, Shouhuai Xu, Xinyu Ou, Hai Jin, Sujuan Wang, Zhijun Deng, and Yuyi Zhong. Vuldeepecker: A deep learning-based system for vulnerability detection. In Proceedings 2018 Network and Distributed System Security Symposium, NDSS 2018. Internet Society, 2018

[73] Zhonghao Jiang, Weifeng Sun, Xiaoyan Gu, Jiaxin Wu, Tao Wen, Haibo Hu, and Meng Yan. Dfept: Data flow embedding for enhancing pre-trained model based vulnerability detection. In Proceedings of the 15th Asia-Pacific Symposium on Internetware, Internetware ’24, page 95–104, New York, NY, USA, 2024. Association for Computing Machinery

[74] Vincent J. Hellendoorn, Charles Sutton, Rishabh Singh, Petros Maniatis, and David Bieber. Global relational models of source code. In International Conference on Learning Representations, 2020

[75] Justin Gilmer, Samuel S Schoenholz, Patrick F Riley, Oriol Vinyals, and George E Dahl. Neural message passing for quantum chemistry. In International conference on machine learning, pages 1263–1272. PMLR, 2017

[76] Joan Bruna, Wojciech Zaremba, Arthur Szlam, and Yann LeCun. Spectral networks and locally connected networks on graphs. arXiv preprint arXiv:1312.6203, 2013

[77] Hao Yuan and Shuiwang Ji. Structpool: Structured graph pooling via conditional random fields. In Proceedings of the 8th International Conference on Learning Representations, 2020

[78] Amir Hosein Khasahmadi, Kaveh Hassani, Parsa Moradi, Leo Lee, and Quaid Morris. Memory-based graph networks. In International Conference on Learning Representations, 2020

[79] Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. How powerful are graph neural networks? In International Conference on Learning Representations, 2019

[80] Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. Convolutional neural networks on graphs with fast localized spectral filtering. Advances in neural information processing systems, 29, 2016