Building Deep Graph Predictors with Graph Imitation Learning

Andr\'e Eberhard; Gerhard Neumann; Pascal Friederich

arxiv: 2601.15133 · v3 · pith:H5UD2MYYnew · submitted 2026-01-21 · 💻 cs.CV · cs.LG

Building Deep Graph Predictors with Graph Imitation Learning

Andr\'e Eberhard , Gerhard Neumann , Pascal Friederich This is my paper

Pith reviewed 2026-05-21 15:40 UTC · model grok-4.3

classification 💻 cs.CV cs.LG

keywords graph generationimitation learningMarkov decision processsupervised graph predictionneural networkssequential generationgraph embeddings

0 comments

The pith

Neural networks can predict graphs by building them sequentially as a series of decisions over partial graph embeddings rather than forcing outputs onto fixed grids.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper identifies that representing graphs on fixed-size Euclidean grids creates theoretical pitfalls for training neural networks via backpropagation in supervised prediction tasks. It proposes GRAIL as an alternative that treats graph generation as a Markov decision process, where the model learns to add elements step by step by imitating expert sequences while working with embeddings of the growing partial graph. This sequential imitation approach sidesteps the representation problems and delivers competitive performance when tested across 18 different graph prediction benchmarks. A sympathetic reader would care because it offers a more direct way to handle the variable nature and structure of graphs as outputs without the distortions that come from padding or reshaping them into uniform arrays.

Core claim

GRAIL generates graphs sequentially through a Markov decision process over embeddings of partial graphs, thereby avoiding the representation issues associated with fixed-size grid graph representations and achieving competitive results on 18 benchmarks.

What carries the argument

GRAIL, a supervised imitation learning framework that models graph construction as a Markov decision process acting on embeddings of partial graphs to produce the final output graph element by element.

If this is right

Graph prediction models can be trained directly on variable-sized outputs without the need for artificial resizing or padding that distorts connectivity.
Imitation learning on sequential decisions provides a stable training signal for neural networks that produce structured graph outputs.
The same partial-graph embedding approach can be applied to other supervised tasks where the target is a graph rather than a fixed vector or image.
Competitive accuracy on diverse benchmarks suggests the method generalizes across domains that require outputting graphs from input data.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The sequential decision framing could transfer to generating other non-Euclidean structures such as molecular graphs or scene graphs in vision systems.
Combining this imitation setup with reinforcement learning rewards might enable better handling of long-horizon graph construction without expert demonstrations.
The avoidance of grid representations may reduce the computational overhead of graph neural network layers during inference on the generated outputs.

Load-bearing premise

Representing graphs on a fixed-size Euclidean grid is not the optimal choice for supervised graph prediction tasks and leads to identifiable theoretical challenges in neural training.

What would settle it

Train a fixed-grid neural model using the same imitation learning objective and partial-graph embedding features on the identical 18 benchmarks; if the grid model matches or exceeds GRAIL performance after equivalent hyperparameter search, the claimed representation problems would be called into question.

Figures

Figures reproduced from arXiv: 2601.15133 by Andr\'e Eberhard, Gerhard Neumann, Pascal Friederich.

**Figure 2.** Figure 2: Training dynamics of several synthetic tasks of varying complexity on trees with six to nine nodes. Shaded areas correspond to the standard deviation of a window of 10 iterations, i.e., 10M samples for training and 1000 trajectories for evaluation. Experiments X NC and Y EC have been conducted on graphs with X node and Y edge colors. k varies depending on the number of valid successor states. smaller insta… view at source ↗

**Figure 3.** Figure 3: Training dynamics of several synthetic tasks of varying complexity on trees with 10 to 15 nodes. Shaded areas correspond to the standard deviation of a window of 10 iterations, i.e., 10M samples for training and 1000 trajectories for evaluation. Experiments X NC and Y EC have been conducted on graphs with X node and Y edge colors. k varies depending on the number of valid successor states. issues unique to… view at source ↗

**Figure 4.** Figure 4: Accuracy on the QM9 dataset. OSRA GraSP (ours) MolGrapher DECIMER Accuracy 45.61% 67.51% 88.36% 92.08% TABLE I ACCURACY ON QM9 IMAGES WITH NO STEREO-BONDS AND IN GRAYSCALE. WE RANDOMLY SELECT 10, 000 IMAGES FOR EVALUATION AND KEEP THE REST FOR TRAINING. of using graphs as neural network outputs, we derived a generic, efficient and flexible formulation of sequential graph generation conditioned on an image.… view at source ↗

**Figure 6.** Figure 6: Training dynamics for graphs of size 10 to 15. [PITH_FULL_IMAGE:figures/full_fig_p011_6.png] view at source ↗

read the original abstract

Recent years have seen substantial progress in neural generation of text, images, and audio, supported by mature training pipelines and large-scale optimization. For graphs, however, comparable progress has been more limited. We attribute this gap to graph-specific optimization and representation challenges that undermine the effectiveness of training neural networks with backpropagation and gradient descent. We argue that representing graphs on a fixed-size Euclidean grid, as is common in recently proposed models for supervised graph prediction, may not be the optimal choice in these settings. To support our view, we provide an analysis of neural graph generation methods and identify theoretical challenges that lead to pitfalls when training neural networks to produce graphs as their output. Motivated by this analysis, we introduce \textbf{GRA}ph~\textbf{I}mitation~\textbf{L}earning~(GRAIL), a framework for training neural networks in supervised settings in which the supervision signal is a graph. GRAIL generates graphs sequentially through a Markov decision process over embeddings of partial graphs, thereby avoiding the representation issues associated with fixed-size grid graph representations. We empirically show that GRAIL achieves competitive results on supervised graph prediction across a comprehensive suite of 18 benchmarks, matching or surpassing state-of-the-art methods in several settings.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper identifies theoretical challenges in training neural networks for supervised graph prediction when using fixed-size Euclidean grid representations, including gradient instability, size mismatch, and backpropagation difficulties. Motivated by this analysis, it introduces GRAIL, a graph imitation learning framework that generates graphs sequentially as a Markov decision process over embeddings of partial graphs. The method is claimed to avoid the representation issues of grid-based approaches and is shown to achieve competitive or superior results on a suite of 18 benchmarks.

Significance. If the central claim holds, the work could be significant for advancing neural methods in graph generation and prediction by offering a sequential MDP-based alternative that sidesteps grid representation pitfalls. The comprehensive evaluation across 18 benchmarks is a positive aspect, and the use of imitation learning in this setting represents a structured approach to supervised graph tasks. However, the significance is tempered by the moderate soundness of the empirical support and the need for stronger evidence that the proposed method genuinely avoids analogous optimization issues.

major comments (2)

[Abstract and theoretical analysis] Abstract and theoretical analysis section: the load-bearing claim that GRAIL 'thereby avoiding the representation issues associated with fixed-size grid graph representations' is not secured by a direct argument or experiment showing that the partial-graph embedding function and policy network are free of equivalent sensitivities (e.g., embedding quality degradation with growing partial size or differentiation through variable-length sequences). The skeptic concern that the MDP may inherit analogous optimization issues is not explicitly ruled out.
[Empirical results] Empirical results section: the competitive performance across 18 benchmarks is reported without error bars, explicit data-split details, or statistical significance tests. This leaves open whether the results robustly support the superiority or equivalence claims or could stem from other modeling choices rather than the claimed avoidance of grid issues.

minor comments (2)

[Methods] The MDP transition and reward hyperparameters are listed as free parameters; a sensitivity analysis or default values would improve reproducibility.
[Methods] Notation for partial graph embeddings and the imitation learning objective could be clarified with an explicit diagram or pseudocode to aid readers unfamiliar with the MDP formulation.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful and constructive review. We address each major comment below and describe the changes we will make to the manuscript.

read point-by-point responses

Referee: [Abstract and theoretical analysis] Abstract and theoretical analysis section: the load-bearing claim that GRAIL 'thereby avoiding the representation issues associated with fixed-size grid graph representations' is not secured by a direct argument or experiment showing that the partial-graph embedding function and policy network are free of equivalent sensitivities (e.g., embedding quality degradation with growing partial size or differentiation through variable-length sequences). The skeptic concern that the MDP may inherit analogous optimization issues is not explicitly ruled out.

Authors: We acknowledge that the current theoretical analysis contrasts GRAIL with fixed-grid approaches but does not contain a dedicated experiment or formal argument that explicitly rules out every analogous sensitivity in the partial-graph embedding function or policy network. The sequential MDP formulation is intended to sidestep fixed-size representation problems by operating on variable-length partial graphs, and the imitation learning objective provides a stable training signal that does not rely on back-propagating through a complete fixed-size output. In the revised manuscript we will expand the theoretical analysis section with an explicit discussion of potential sensitivities (including embedding degradation with partial-graph size and differentiation through variable-length sequences) and explain how the MDP and imitation-learning setup are designed to mitigate them. We will also add a brief note on why we believe these issues are less severe than in grid-based methods. revision: partial
Referee: [Empirical results] Empirical results section: the competitive performance across 18 benchmarks is reported without error bars, explicit data-split details, or statistical significance tests. This leaves open whether the results robustly support the superiority or equivalence claims or could stem from other modeling choices rather than the claimed avoidance of grid issues.

Authors: We agree that the empirical section would be strengthened by additional statistical detail. In the revised version we will report mean performance together with standard deviations computed over multiple random seeds, provide explicit descriptions of the train/validation/test splits used for each of the 18 benchmarks, and include statistical significance tests (paired t-tests against the strongest baseline) to support the reported comparisons. These additions will help demonstrate that the observed performance is attributable to the GRAIL framework rather than other modeling choices. revision: yes

Circularity Check

0 steps flagged

Derivation chain is self-contained; no reductions to fitted inputs or self-citations

full rationale

The paper first analyzes theoretical challenges (gradient instability, size mismatch, backprop issues) with fixed-size Euclidean grid representations for supervised graph prediction. It then motivates and defines GRAIL as a sequential MDP over partial-graph embeddings to address those challenges. Competitive empirical results on 18 benchmarks are presented as validation rather than as a quantity forced by the method's own definitions or fitted parameters. No load-bearing step reduces a claimed prediction to an input by construction, and no self-citation chain or uniqueness theorem is invoked to justify the core framework.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The approach rests on standard supervised learning assumptions plus the domain claim that sequential MDP decisions on partial embeddings avoid grid-related pitfalls; no new physical entities are postulated.

free parameters (1)

MDP transition and reward hyperparameters
Typical neural network and imitation learning hyperparameters required to train the sequential policy; not enumerated in abstract.

axioms (1)

domain assumption Sequential decisions over partial graph embeddings can produce valid complete graphs without fixed-grid artifacts
Central modeling choice stated in the abstract as the solution to identified representation challenges.

pith-pipeline@v0.9.0 · 5745 in / 1106 out tokens · 41338 ms · 2026-05-21T15:40:32.340559+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

GRAIL generates graphs sequentially through a Markov decision process over embeddings of partial graphs, thereby avoiding the representation issues associated with fixed-size grid graph representations
IndisputableMonolith/Foundation/AlexanderDuality.lean alexander_duality_circle_linking unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We argue that representing graphs on a fixed-size Euclidean grid... may not be the optimal choice

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

43 extracted references · 43 canonical work pages · 3 internal anchors

[1]

Imagenet classification with deep convolutional neural networks,

A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,”Advances in neural informa- tion processing systems, vol. 25, 2012

work page 2012
[2]

Deep residual learning for image recognition,

K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” inProceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778

work page 2016
[3]

Rich feature hierarchies for accurate object detection and semantic segmentation,

R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation,” inProceedings of the IEEE conference on computer vision and pattern recognition, 2014, pp. 580–587

work page 2014
[4]

U-net: Convolutional networks for biomedical image segmentation

O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation.” Springer, 2015, pp. 234–241

work page 2015
[5]

You only look once: Unified, real-time object detection,

J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: Unified, real-time object detection,” inProceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 779– 788

work page 2016
[6]

Optical graph recognition,

C. Auer, C. Bachmaier, F. J. Brandenburg, A. Gleißner, and J. Reislhu- ber, “Optical graph recognition,” inGraph Drawing: 20th International Symposium, GD 2012, Redmond, WA, USA, September 19-21, 2012, Revised Selected Papers 20. Springer, 2013, pp. 529–540

work page 2012
[7]

A holistic approach for image-to-graph: application to optical music recognition,

C. Garrido-Munoz, A. Rios-Vila, and J. Calvo-Zaragoza, “A holistic approach for image-to-graph: application to optical music recognition,” International Journal on Document Analysis and Recognition (IJDAR), vol. 25, no. 4, pp. 293–303, 2022

work page 2022
[8]

Graph-based deep learning for medical diagnosis and analysis: past, present and future,

D. Ahmedt-Aristizabal, M. A. Armin, S. Denman, C. Fookes, and L. Petersson, “Graph-based deep learning for medical diagnosis and analysis: past, present and future,”Sensors, vol. 21, no. 14, p. 4758, 2021

work page 2021
[9]

Realtime multi-person 2d pose estimation using part affinity fields,

Z. Cao, T. Simon, S.-E. Wei, and Y . Sheikh, “Realtime multi-person 2d pose estimation using part affinity fields,” inProceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 7291– 7299

work page 2017
[10]

Image-conditioned graph generation for road network extraction,

D. Belli and T. Kipf, “Image-conditioned graph generation for road network extraction,”arXiv preprint arXiv:1910.14388, 2019

work page arXiv 1910
[11]

Chemgrapher: optical graph recognition of chemical compounds by deep learning,

M. Oldenhof, A. Arany, Y . Moreau, and J. Simm, “Chemgrapher: optical graph recognition of chemical compounds by deep learning,”Journal of chemical information and modeling, vol. 60, no. 10, pp. 4506–4517, 2020

work page 2020
[12]

Molgrapher: Graph-based visual recognition of chemical structures,

L. Morin, M. Danelljan, M. I. Agea, A. Nassar, V . Weber, I. Meijer, P. Staar, and F. Yu, “Molgrapher: Graph-based visual recognition of chemical structures,” inProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 10 2023, pp. 19 552–19 561

work page 2023
[13]

Scene parsing by integrating function, geometry and appearance models,

Y . Zhao and S.-C. Zhu, “Scene parsing by integrating function, geometry and appearance models,” inProceedings of the IEEE conference on computer vision and pattern recognition, 2013, pp. 3119–3126

work page 2013
[14]

Visual relationship detection with language priors,

C. Lu, R. Krishna, M. Bernstein, and L. Fei-Fei, “Visual relationship detection with language priors,” inComputer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14. Springer, 2016, pp. 852–869

work page 2016
[15]

Unbiased scene graph generation from biased training,

K. Tang, Y . Niu, J. Huang, J. Shi, and H. Zhang, “Unbiased scene graph generation from biased training,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 3716– 3725

work page 2020
[16]

Graph r-cnn for scene graph generation,

J. Yang, J. Lu, S. Lee, D. Batra, and D. Parikh, “Graph r-cnn for scene graph generation,” inProceedings of the European conference on computer vision (ECCV), 2018, pp. 670–685

work page 2018
[17]

Visual relationship detection with internal and external linguistic knowledge distillation,

R. Yu, A. Li, V . I. Morariu, and L. S. Davis, “Visual relationship detection with internal and external linguistic knowledge distillation,” in Proceedings of the IEEE international conference on computer vision, 2017, pp. 1974–1982

work page 2017
[18]

Scene graph generation with role-playing large language models,

G. Chen, J. Li, and W. Wang, “Scene graph generation with role-playing large language models,”Advances in Neural Information Processing Systems, vol. 37, pp. 132 238–132 266, 2024. 8

work page 2024
[19]

Egtr: Extracting graph from transformer for scene graph generation,

J. Im, J. Nam, N. Park, H. Lee, and S. Park, “Egtr: Extracting graph from transformer for scene graph generation,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2024, pp. 24 229–24 238

work page 2024
[20]

Graphrnn: Generating realistic graphs with deep auto-regressive models,

J. You, R. Ying, X. Ren, W. Hamilton, and J. Leskovec, “Graphrnn: Generating realistic graphs with deep auto-regressive models,” inInter- national conference on machine learning. PMLR, 2018, pp. 5708–5717

work page 2018
[21]

Efficient graph generation with graph recurrent attention networks,

R. Liao, Y . Li, Y . Song, S. Wang, W. Hamilton, D. K. Duvenaud, R. Ur- tasun, and R. Zemel, “Efficient graph generation with graph recurrent attention networks,”Advances in neural information processing systems, vol. 32, 2019

work page 2019
[22]

Learning Deep Generative Models of Graphs

Y . Li, O. Vinyals, C. Dyer, R. Pascanu, and P. Battaglia, “Learning deep generative models of graphs,”arXiv preprint arXiv:1803.03324, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[23]

Decimer. ai: an open platform for automated optical chemical structure identification, segmentation and recognition in scientific publications,

K. Rajan, H. O. Brinkhaus, M. I. Agea, A. Zielesny, and C. Steinbeck, “Decimer. ai: an open platform for automated optical chemical structure identification, segmentation and recognition in scientific publications,” Nature communications, vol. 14, no. 1, p. 5045, 2023

work page 2023
[24]

The bitter lesson,

R. Sutton, “The bitter lesson,”Incomplete Ideas (blog), vol. 13, no. 1, p. 38, 2019

work page 2019
[25]

Auto-Encoding Variational Bayes

D. P. Kingma and M. Welling, “Auto-encoding variational bayes,” in2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, April 14-16, 2014, Conference Track Proceedings, Y . Bengio and Y . LeCun, Eds., 2014. [Online]. Available: http://arxiv.org/abs/1312.6114

work page internal anchor Pith review Pith/arXiv arXiv 2014
[26]

Denoising diffusion probabilistic models,

J. Ho, A. Jain, and P. Abbeel, “Denoising diffusion probabilistic models,” Advances in neural information processing systems, vol. 33, pp. 6840– 6851, 2020

work page 2020
[27]

Language mod- els are few-shot learners,

T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askellet al., “Language mod- els are few-shot learners,”Advances in neural information processing systems, vol. 33, pp. 1877–1901, 2020

work page 1901
[28]

Variational Graph Auto-Encoders

T. N. Kipf and M. Welling, “Variational graph auto-encoders,”arXiv preprint arXiv:1611.07308, 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016
[29]

Score-based generative modeling of graphs via the system of stochastic differential equations,

J. Jo, S. Lee, and S. J. Hwang, “Score-based generative modeling of graphs via the system of stochastic differential equations,” inInterna- tional conference on machine learning. PMLR, 2022, pp. 10 362– 10 383

work page 2022
[30]

Graphvae: Towards generation of small graphs using variational autoencoders,

M. Simonovsky and N. Komodakis, “Graphvae: Towards generation of small graphs using variational autoencoders,” inInternational conference on artificial neural networks. Springer, 2018, pp. 412–422

work page 2018
[31]

Film: Visual reasoning with a general conditioning layer,

E. Perez, F. Strub, H. De Vries, V . Dumoulin, and A. Courville, “Film: Visual reasoning with a general conditioning layer,” inProceedings of the AAAI conference on artificial intelligence, vol. 32, no. 1, 2018

work page 2018
[32]

Neural message passing for quantum chemistry,

J. Gilmer, S. S. Schoenholz, P. F. Riley, O. Vinyals, and G. E. Dahl, “Neural message passing for quantum chemistry,” inInternational conference on machine learning. PMLR, 2017, pp. 1263–1272

work page 2017
[33]

Identity mappings in deep residual networks,

K. He, X. Zhang, S. Ren, and J. Sun, “Identity mappings in deep residual networks,” inComputer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part IV 14. Springer, 2016, pp. 630–645

work page 2016
[34]

Optical structure recognition soft- ware to recover chemical information: Osra, an open source solution,

I. V . Filippov and M. C. Nicklaus, “Optical structure recognition soft- ware to recover chemical information: Osra, an open source solution,” 2009

work page 2009
[35]

Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules,

D. Weininger, “Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules,”Journal of chemical information and computer sciences, vol. 28, no. 1, pp. 31–36, 1988

work page 1988
[36]

Quantum chemistry structures and properties of 134 kilo molecules,

R. Ramakrishnan, P. O. Dral, M. Rupp, and O. A. V on Lilienfeld, “Quantum chemistry structures and properties of 134 kilo molecules,” Scientific data, vol. 1, no. 1, pp. 1–7, 2014

work page 2014
[37]

Focal loss for dense object detection,

T.-Y . Lin, P. Goyal, R. Girshick, K. He, and P. Doll ´ar, “Focal loss for dense object detection,” inProceedings of the IEEE international conference on computer vision, 2017, pp. 2980–2988

work page 2017
[38]

Asymmetric loss for multi-label classification,

T. Ridnik, E. Ben-Baruch, N. Zamir, A. Noy, I. Friedman, M. Protter, and L. Zelnik-Manor, “Asymmetric loss for multi-label classification,” inProceedings of the IEEE/CVF international conference on computer vision, 2021, pp. 82–91

work page 2021
[39]

On the variance of the adaptive learning rate and beyond.arXiv preprint arXiv:1908.03265,

L. Liu, H. Jiang, P. He, W. Chen, X. Liu, J. Gao, and J. Han, “On the variance of the adaptive learning rate and beyond,”arXiv preprint arXiv:1908.03265, 2019. 1 APPENDIXA ONLINE REPOSITORY We provide additional information in an anonymized GitHub repository located at: https://github.com/c72bcbf4/ grasp. Reviewers are encouraged to try out our model inte...

work page arXiv 1908
[40]

We sample a target graph and generate a corresponding image of it

work page
[41]

To obtain image- graph pairs, we run several DFS decompositions of the target graph by deleting edges that do not disconnect the graph

We then need to decide how and how many subgraphs with labels to generate per image, since we need both diverse images and image-graph pairs. To obtain image- graph pairs, we run several DFS decompositions of the target graph by deleting edges that do not disconnect the graph. To avoid duplicates, we hash every graph and skip decomposition branches which ...

work page
[42]

By approximate graph matching, we decide whether a successor is a positive or negative instance, and add these samples to our dataset

We then need to proceed to generate negative examples, which we do, for every positive subgraph in the previous step, by looking at a subset of possible modifications and the resulting graphs. By approximate graph matching, we decide whether a successor is a positive or negative instance, and add these samples to our dataset

work page
[43]

A termi- nal subgraph is only a positive instance if it is the target graph and not otherwise

As our algorithm must distinguish between terminal and non-terminal graphs, we add to each graph a terminal self-transition, which we use during decoding. A termi- nal subgraph is only a positive instance if it is the target graph and not otherwise. As we are free to choose the data distribution, we randomly select10%of all samples to be terminal states. ...

work page

[1] [1]

Imagenet classification with deep convolutional neural networks,

A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,”Advances in neural informa- tion processing systems, vol. 25, 2012

work page 2012

[2] [2]

Deep residual learning for image recognition,

K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” inProceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778

work page 2016

[3] [3]

Rich feature hierarchies for accurate object detection and semantic segmentation,

R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation,” inProceedings of the IEEE conference on computer vision and pattern recognition, 2014, pp. 580–587

work page 2014

[4] [4]

U-net: Convolutional networks for biomedical image segmentation

O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation.” Springer, 2015, pp. 234–241

work page 2015

[5] [5]

You only look once: Unified, real-time object detection,

J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: Unified, real-time object detection,” inProceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 779– 788

work page 2016

[6] [6]

Optical graph recognition,

C. Auer, C. Bachmaier, F. J. Brandenburg, A. Gleißner, and J. Reislhu- ber, “Optical graph recognition,” inGraph Drawing: 20th International Symposium, GD 2012, Redmond, WA, USA, September 19-21, 2012, Revised Selected Papers 20. Springer, 2013, pp. 529–540

work page 2012

[7] [7]

A holistic approach for image-to-graph: application to optical music recognition,

C. Garrido-Munoz, A. Rios-Vila, and J. Calvo-Zaragoza, “A holistic approach for image-to-graph: application to optical music recognition,” International Journal on Document Analysis and Recognition (IJDAR), vol. 25, no. 4, pp. 293–303, 2022

work page 2022

[8] [8]

Graph-based deep learning for medical diagnosis and analysis: past, present and future,

D. Ahmedt-Aristizabal, M. A. Armin, S. Denman, C. Fookes, and L. Petersson, “Graph-based deep learning for medical diagnosis and analysis: past, present and future,”Sensors, vol. 21, no. 14, p. 4758, 2021

work page 2021

[9] [9]

Realtime multi-person 2d pose estimation using part affinity fields,

Z. Cao, T. Simon, S.-E. Wei, and Y . Sheikh, “Realtime multi-person 2d pose estimation using part affinity fields,” inProceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 7291– 7299

work page 2017

[10] [10]

Image-conditioned graph generation for road network extraction,

D. Belli and T. Kipf, “Image-conditioned graph generation for road network extraction,”arXiv preprint arXiv:1910.14388, 2019

work page arXiv 1910

[11] [11]

Chemgrapher: optical graph recognition of chemical compounds by deep learning,

M. Oldenhof, A. Arany, Y . Moreau, and J. Simm, “Chemgrapher: optical graph recognition of chemical compounds by deep learning,”Journal of chemical information and modeling, vol. 60, no. 10, pp. 4506–4517, 2020

work page 2020

[12] [12]

Molgrapher: Graph-based visual recognition of chemical structures,

L. Morin, M. Danelljan, M. I. Agea, A. Nassar, V . Weber, I. Meijer, P. Staar, and F. Yu, “Molgrapher: Graph-based visual recognition of chemical structures,” inProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 10 2023, pp. 19 552–19 561

work page 2023

[13] [13]

Scene parsing by integrating function, geometry and appearance models,

Y . Zhao and S.-C. Zhu, “Scene parsing by integrating function, geometry and appearance models,” inProceedings of the IEEE conference on computer vision and pattern recognition, 2013, pp. 3119–3126

work page 2013

[14] [14]

Visual relationship detection with language priors,

C. Lu, R. Krishna, M. Bernstein, and L. Fei-Fei, “Visual relationship detection with language priors,” inComputer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14. Springer, 2016, pp. 852–869

work page 2016

[15] [15]

Unbiased scene graph generation from biased training,

K. Tang, Y . Niu, J. Huang, J. Shi, and H. Zhang, “Unbiased scene graph generation from biased training,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 3716– 3725

work page 2020

[16] [16]

Graph r-cnn for scene graph generation,

J. Yang, J. Lu, S. Lee, D. Batra, and D. Parikh, “Graph r-cnn for scene graph generation,” inProceedings of the European conference on computer vision (ECCV), 2018, pp. 670–685

work page 2018

[17] [17]

Visual relationship detection with internal and external linguistic knowledge distillation,

R. Yu, A. Li, V . I. Morariu, and L. S. Davis, “Visual relationship detection with internal and external linguistic knowledge distillation,” in Proceedings of the IEEE international conference on computer vision, 2017, pp. 1974–1982

work page 2017

[18] [18]

Scene graph generation with role-playing large language models,

G. Chen, J. Li, and W. Wang, “Scene graph generation with role-playing large language models,”Advances in Neural Information Processing Systems, vol. 37, pp. 132 238–132 266, 2024. 8

work page 2024

[19] [19]

Egtr: Extracting graph from transformer for scene graph generation,

J. Im, J. Nam, N. Park, H. Lee, and S. Park, “Egtr: Extracting graph from transformer for scene graph generation,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2024, pp. 24 229–24 238

work page 2024

[20] [20]

Graphrnn: Generating realistic graphs with deep auto-regressive models,

J. You, R. Ying, X. Ren, W. Hamilton, and J. Leskovec, “Graphrnn: Generating realistic graphs with deep auto-regressive models,” inInter- national conference on machine learning. PMLR, 2018, pp. 5708–5717

work page 2018

[21] [21]

Efficient graph generation with graph recurrent attention networks,

R. Liao, Y . Li, Y . Song, S. Wang, W. Hamilton, D. K. Duvenaud, R. Ur- tasun, and R. Zemel, “Efficient graph generation with graph recurrent attention networks,”Advances in neural information processing systems, vol. 32, 2019

work page 2019

[22] [22]

Learning Deep Generative Models of Graphs

Y . Li, O. Vinyals, C. Dyer, R. Pascanu, and P. Battaglia, “Learning deep generative models of graphs,”arXiv preprint arXiv:1803.03324, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018

[23] [23]

Decimer. ai: an open platform for automated optical chemical structure identification, segmentation and recognition in scientific publications,

K. Rajan, H. O. Brinkhaus, M. I. Agea, A. Zielesny, and C. Steinbeck, “Decimer. ai: an open platform for automated optical chemical structure identification, segmentation and recognition in scientific publications,” Nature communications, vol. 14, no. 1, p. 5045, 2023

work page 2023

[24] [24]

The bitter lesson,

R. Sutton, “The bitter lesson,”Incomplete Ideas (blog), vol. 13, no. 1, p. 38, 2019

work page 2019

[25] [25]

Auto-Encoding Variational Bayes

D. P. Kingma and M. Welling, “Auto-encoding variational bayes,” in2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, April 14-16, 2014, Conference Track Proceedings, Y . Bengio and Y . LeCun, Eds., 2014. [Online]. Available: http://arxiv.org/abs/1312.6114

work page internal anchor Pith review Pith/arXiv arXiv 2014

[26] [26]

Denoising diffusion probabilistic models,

J. Ho, A. Jain, and P. Abbeel, “Denoising diffusion probabilistic models,” Advances in neural information processing systems, vol. 33, pp. 6840– 6851, 2020

work page 2020

[27] [27]

Language mod- els are few-shot learners,

T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askellet al., “Language mod- els are few-shot learners,”Advances in neural information processing systems, vol. 33, pp. 1877–1901, 2020

work page 1901

[28] [28]

Variational Graph Auto-Encoders

T. N. Kipf and M. Welling, “Variational graph auto-encoders,”arXiv preprint arXiv:1611.07308, 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016

[29] [29]

Score-based generative modeling of graphs via the system of stochastic differential equations,

J. Jo, S. Lee, and S. J. Hwang, “Score-based generative modeling of graphs via the system of stochastic differential equations,” inInterna- tional conference on machine learning. PMLR, 2022, pp. 10 362– 10 383

work page 2022

[30] [30]

Graphvae: Towards generation of small graphs using variational autoencoders,

M. Simonovsky and N. Komodakis, “Graphvae: Towards generation of small graphs using variational autoencoders,” inInternational conference on artificial neural networks. Springer, 2018, pp. 412–422

work page 2018

[31] [31]

Film: Visual reasoning with a general conditioning layer,

E. Perez, F. Strub, H. De Vries, V . Dumoulin, and A. Courville, “Film: Visual reasoning with a general conditioning layer,” inProceedings of the AAAI conference on artificial intelligence, vol. 32, no. 1, 2018

work page 2018

[32] [32]

Neural message passing for quantum chemistry,

J. Gilmer, S. S. Schoenholz, P. F. Riley, O. Vinyals, and G. E. Dahl, “Neural message passing for quantum chemistry,” inInternational conference on machine learning. PMLR, 2017, pp. 1263–1272

work page 2017

[33] [33]

Identity mappings in deep residual networks,

K. He, X. Zhang, S. Ren, and J. Sun, “Identity mappings in deep residual networks,” inComputer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part IV 14. Springer, 2016, pp. 630–645

work page 2016

[34] [34]

Optical structure recognition soft- ware to recover chemical information: Osra, an open source solution,

I. V . Filippov and M. C. Nicklaus, “Optical structure recognition soft- ware to recover chemical information: Osra, an open source solution,” 2009

work page 2009

[35] [35]

Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules,

D. Weininger, “Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules,”Journal of chemical information and computer sciences, vol. 28, no. 1, pp. 31–36, 1988

work page 1988

[36] [36]

Quantum chemistry structures and properties of 134 kilo molecules,

R. Ramakrishnan, P. O. Dral, M. Rupp, and O. A. V on Lilienfeld, “Quantum chemistry structures and properties of 134 kilo molecules,” Scientific data, vol. 1, no. 1, pp. 1–7, 2014

work page 2014

[37] [37]

Focal loss for dense object detection,

T.-Y . Lin, P. Goyal, R. Girshick, K. He, and P. Doll ´ar, “Focal loss for dense object detection,” inProceedings of the IEEE international conference on computer vision, 2017, pp. 2980–2988

work page 2017

[38] [38]

Asymmetric loss for multi-label classification,

T. Ridnik, E. Ben-Baruch, N. Zamir, A. Noy, I. Friedman, M. Protter, and L. Zelnik-Manor, “Asymmetric loss for multi-label classification,” inProceedings of the IEEE/CVF international conference on computer vision, 2021, pp. 82–91

work page 2021

[39] [39]

On the variance of the adaptive learning rate and beyond.arXiv preprint arXiv:1908.03265,

L. Liu, H. Jiang, P. He, W. Chen, X. Liu, J. Gao, and J. Han, “On the variance of the adaptive learning rate and beyond,”arXiv preprint arXiv:1908.03265, 2019. 1 APPENDIXA ONLINE REPOSITORY We provide additional information in an anonymized GitHub repository located at: https://github.com/c72bcbf4/ grasp. Reviewers are encouraged to try out our model inte...

work page arXiv 1908

[40] [40]

We sample a target graph and generate a corresponding image of it

work page

[41] [41]

To obtain image- graph pairs, we run several DFS decompositions of the target graph by deleting edges that do not disconnect the graph

We then need to decide how and how many subgraphs with labels to generate per image, since we need both diverse images and image-graph pairs. To obtain image- graph pairs, we run several DFS decompositions of the target graph by deleting edges that do not disconnect the graph. To avoid duplicates, we hash every graph and skip decomposition branches which ...

work page

[42] [42]

By approximate graph matching, we decide whether a successor is a positive or negative instance, and add these samples to our dataset

We then need to proceed to generate negative examples, which we do, for every positive subgraph in the previous step, by looking at a subset of possible modifications and the resulting graphs. By approximate graph matching, we decide whether a successor is a positive or negative instance, and add these samples to our dataset

work page

[43] [43]

A termi- nal subgraph is only a positive instance if it is the target graph and not otherwise

As our algorithm must distinguish between terminal and non-terminal graphs, we add to each graph a terminal self-transition, which we use during decoding. A termi- nal subgraph is only a positive instance if it is the target graph and not otherwise. As we are free to choose the data distribution, we randomly select10%of all samples to be terminal states. ...

work page