pith. sign in

arxiv: 2606.06415 · v1 · pith:XES6CAEGnew · submitted 2026-06-04 · ❄️ cond-mat.mtrl-sci

PolyGraphPy: A unified Python framework for atomistic simulation and machine learning-driven polymer design

Pith reviewed 2026-06-28 00:32 UTC · model grok-4.3

classification ❄️ cond-mat.mtrl-sci
keywords polymer designgraph neural networksBayesian methodsgenerative modelsDFTB simulationsacrylatesmaterial informaticsproperty prediction
0
0 comments X

The pith

PolyGraphPy integrates DFTB simulations with Bayesian GNNs and generative models for polymer property prediction and design.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents PolyGraphPy as an open-source Python framework that automates Density Functional Tight Binding calculations to build structured datasets for monomers, homopolymers, and alternating copolymers. It employs Bayesian Graph Neural Networks with stochastic graph representations to predict properties such as static polarizability while quantifying uncertainty. Two generative models, a SELFIES-based GPT and a BRICS-based genetic algorithm, support de novo design of molecules with desired traits. The approach is demonstrated on an acrylate dataset to create a customizable end-to-end pipeline.

Core claim

PolyGraphPy supplies a unified platform that links efficient atomistic simulations for dataset generation with Bayesian GNN property predictors and complementary generative models for targeted polymer creation, shown to work on acrylates.

What carries the argument

Bayesian Graph Neural Networks using stochastic graph representations for property prediction and uncertainty quantification, paired with SELFIES-GPT and BRICS-GA generative models.

Load-bearing premise

Bayesian Graph Neural Networks with stochastic graph representations deliver accurate predictions and robust uncertainty quantification for properties such as static polarizability.

What would settle it

Comparison of Bayesian GNN predictions for static polarizability against experimental measurements on a held-out set of acrylate polymers.

Figures

Figures reproduced from arXiv: 2606.06415 by Jo\~ao G. C. S. Duarte, Ketson R. M. dos Santos, Morgan Cencer, Shruti Venkatram, Traian Dumitric\v{a}.

Figure 1
Figure 1. Figure 1: Architecture, core Python scripts, and associated classes of [PITH_FULL_IMAGE:figures/full_fig_p003_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Steps for the construction of acrylate homopolymers. [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Steps for the construction of acrylate copolymers. [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Graph representation of a methyl acrylate molecule. Gray circles represent hydrogen (H) atoms, black [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Stochastic connection representation showing the frequency of links between repeating units (e.g., AA for [PITH_FULL_IMAGE:figures/full_fig_p009_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Graph U-Net model architecture. 2.4. Property-guided molecular generation Generative models have emerged as powerful tools in computational chemistry for the de novo design of molecules with targeted properties. Originally developed for natural language processing [55, 24], generative pretrained transformers (GPT) have been adapted for molecular discovery by exploiting text-like representations of chemical… view at source ↗
Figure 7
Figure 7. Figure 7: GPT model pretraining, training, and generation process. [PITH_FULL_IMAGE:figures/full_fig_p012_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: GA-based generative algorithm pipeline implemented in [PITH_FULL_IMAGE:figures/full_fig_p013_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Histograms of numeric molecular descriptors found in the first dataset: (a) molecular weight computed [PITH_FULL_IMAGE:figures/full_fig_p014_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: Plot showing the distribution of polymers per chain size (a), alongside histograms of static polarizability [PITH_FULL_IMAGE:figures/full_fig_p014_10.png] view at source ↗
Figure 11
Figure 11. Figure 11: Distributions of static polarizability for the copolymer and homopolymer/monomer datasets. [PITH_FULL_IMAGE:figures/full_fig_p015_11.png] view at source ↗
Figure 12
Figure 12. Figure 12: Training and validation losses for (a) the monomer and homopolymer dataset, and (b) the copolymer [PITH_FULL_IMAGE:figures/full_fig_p016_12.png] view at source ↗
Figure 13
Figure 13. Figure 13: Predicted versus ground truth polarizability for the validation sets of (a) the monomer and homopolymer [PITH_FULL_IMAGE:figures/full_fig_p016_13.png] view at source ↗
Figure 14
Figure 14. Figure 14: Comparison of ground truth and predicted static polarizability from 100 Monte Carlo runs with dropout [PITH_FULL_IMAGE:figures/full_fig_p016_14.png] view at source ↗
Figure 15
Figure 15. Figure 15: Scaled polarizability standard deviation (STD) per molecule from 100 Monte Carlo runs with dropout [PITH_FULL_IMAGE:figures/full_fig_p017_15.png] view at source ↗
Figure 16
Figure 16. Figure 16: Estimated PDFs of MAPE, R2 , and MSE for 100 validation runs, showing concentrated means and near￾Gaussian behavior for monomers/homopolymers (a–c) and copolymers (d–f). 10 15 20 25 30 Static Polarizability (˚A 3 ) −0.3 −0.2 −0.1 0.0 Fitness Score [PITH_FULL_IMAGE:figures/full_fig_p018_16.png] view at source ↗
Figure 17
Figure 17. Figure 17: Scatter plot showing the fitness score as a function of the static polarizability for the generated monomers. [PITH_FULL_IMAGE:figures/full_fig_p018_17.png] view at source ↗
Figure 18
Figure 18. Figure 18: Distributions of (a) static polarizability and (b) fitness scores for the valid generated monomers. [PITH_FULL_IMAGE:figures/full_fig_p019_18.png] view at source ↗
Figure 19
Figure 19. Figure 19: Distributions of (a) static polarizability and (b) relative error for the valid monomers generated by the [PITH_FULL_IMAGE:figures/full_fig_p019_19.png] view at source ↗
Figure 20
Figure 20. Figure 20: t-SNE visualization of the chemical space comparing the original acrylate dataset with the valid monomers [PITH_FULL_IMAGE:figures/full_fig_p020_20.png] view at source ↗
read the original abstract

Polymers are indispensable materials with applications ranging from electronics to medicine owing to their versatility, which can be tailored by adjusting their chemical composition and architecture. The design space for these compounds is vast and governed by factors such as monomer classes, copolymer configurations (e.g., linear, branched, random, and alternating), chain size, stoichiometry, and material properties (e.g., density, refractive index, solubility, and Poisson's ratio). Exploring this space requires efficient computational methodologies for polymer science. To address this challenge, we introduce PolyGraphPy, an open-source Python framework that integrates atomistic simulations with machine learning for accurate property prediction and property-guided polymer design. The framework automates Density Functional Tight Binding calculations to efficiently construct structured datasets for monomers, homopolymers, and alternating copolymers. For property prediction, PolyGraphPy employs Bayesian Graph Neural Networks (GNNs) with stochastic graph representations to predict target properties, such as static polarizability, while providing robust uncertainty quantification. Furthermore, the platform incorporates two complementary generative models for the de novo design of targeted molecules: a SELFIES-based Generative Pretrained Transformer (GPT) and a Genetic Algorithm (GA) based on BRICS graph fragmentation. Demonstrated on a dataset of acrylates, PolyGraphPy provides a highly customizable end-to-end pipeline that reduces computational costs and accelerates data-driven polymer informatics.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces PolyGraphPy, an open-source Python framework that automates DFTB calculations to build datasets for monomers, homopolymers, and copolymers; employs Bayesian GNNs with stochastic graph representations for property prediction (e.g., static polarizability) with uncertainty quantification; and integrates SELFIES-based GPT and BRICS-GA generative models for property-guided de novo polymer design. It claims this end-to-end pipeline, demonstrated on acrylates, reduces computational costs and accelerates polymer informatics.

Significance. A validated, integrated framework combining automated atomistic simulation, Bayesian GNN surrogates, and generative design could meaningfully lower barriers to polymer property prediction and inverse design. The open-source release and focus on customizable pipelines are positive features. However, the absence of any quantitative validation metrics means the significance cannot yet be assessed from the manuscript.

major comments (2)
  1. [Abstract / acrylates demonstration] Abstract and demonstration section: the central claim that Bayesian GNNs with stochastic graph representations deliver accurate predictions and robust UQ for properties such as static polarizability (required for property-guided design) is unsupported by any reported error metrics, calibration plots, comparison to DFTB/experiment, or ablation on the stochastic representation.
  2. [Abstract] Abstract: the assertion that the pipeline 'reduces computational costs and accelerates data-driven polymer informatics' lacks any supporting numbers (timings, dataset sizes, scaling comparisons, or baseline costs), which is load-bearing for the end-to-end utility claim.
minor comments (1)
  1. The manuscript should explicitly state code availability, installation instructions, and example notebooks to enable reproducibility of the claimed framework.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. We agree that the current manuscript lacks the quantitative metrics needed to substantiate the central claims regarding prediction accuracy, uncertainty quantification, and computational efficiency. We will revise the manuscript to address these gaps directly. Our point-by-point responses follow.

read point-by-point responses
  1. Referee: [Abstract / acrylates demonstration] Abstract and demonstration section: the central claim that Bayesian GNNs with stochastic graph representations deliver accurate predictions and robust UQ for properties such as static polarizability (required for property-guided design) is unsupported by any reported error metrics, calibration plots, comparison to DFTB/experiment, or ablation on the stochastic representation.

    Authors: We acknowledge that the submitted manuscript does not report error metrics (MAE, RMSE, R²), calibration plots, comparisons against DFTB or experimental values, or an ablation study isolating the stochastic graph representation. This is a genuine presentational gap that leaves the accuracy and UQ claims unsupported. In the revised manuscript we will add these elements to the demonstration section on acrylates, including tabulated performance numbers, reliability diagrams for the Bayesian predictions, and an explicit ablation comparing stochastic versus fixed graph inputs. revision: yes

  2. Referee: [Abstract] Abstract: the assertion that the pipeline 'reduces computational costs and accelerates data-driven polymer informatics' lacks any supporting numbers (timings, dataset sizes, scaling comparisons, or baseline costs), which is load-bearing for the end-to-end utility claim.

    Authors: We agree that the abstract claim is unsupported by any quantitative data in the current text. Although the framework description mentions automation of DFTB calculations, no timings, dataset cardinalities, or baseline comparisons appear. We will revise both the abstract and the methods/results sections to include concrete figures: number of monomers/homopolymers/copolymers generated, wall-clock times for the automated pipeline versus manual DFTB runs, and any scaling observations with system size. revision: yes

Circularity Check

0 steps flagged

No circularity: software framework paper with no derivation chain

full rationale

The manuscript describes an open-source Python framework (PolyGraphPy) that automates DFTB calculations, trains Bayesian GNNs on polymer graphs, and deploys generative models (SELFIES-GPT and BRICS-GA). No equations, uniqueness theorems, fitted-parameter predictions, or self-citation load-bearing arguments appear in the provided text. The central claims are engineering and demonstration statements about an end-to-end pipeline; they do not reduce to any input by construction. The paper is therefore self-contained as a tool description and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are stated in the provided text.

pith-pipeline@v0.9.1-grok · 5806 in / 1054 out tokens · 30449 ms · 2026-06-28T00:32:23.819346+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

67 extracted references · 1 canonical work pages · 1 internal anchor

  1. [1]

    S. Wu, H. Yamada, Y. Hayashi, M. Zamengo, R. Yoshida, Potentials and challenges of polymer informatics: exploiting machine learning for polymer design, 2020. URL:https://arxiv.org/ abs/2010.07683.arXiv:2010.07683

  2. [2]

    J. S. Peerless, N. J. B. Milliken, T. J. Oweida, M. D. Manning, Y. G. Yingling, Soft matter informatics: Current progress and challenges, Advanced Theory and Simulations 2 (2019) 1800129

  3. [3]

    Weininger, Smiles, a chemical language and information system

    D. Weininger, Smiles, a chemical language and information system. 1. introduction to method- ology and encoding rules, Journal of Chemical Information and Computer Sciences 28 (1988) 31–36

  4. [4]

    T.-S. Lin, C. W. Coley, H. Mochigase, H. K. Beech, W. Wang, Z. Wang, E. Woods, S. L. Craig, J. A. Johnson, J. A. Kalow, K. F. Jensen, B. D. Olsen, Bigsmiles: A structurally-based line notation for describing macromolecules, ACS Central Science 5 (2019) 1523–1531. PMID: 31572779

  5. [5]

    S. R. Heller, A. McNaught, I. Pletnev, S. Stein, D. Tchekhovskoi, Inchi, the iupac international chemical identifier, Journal of Cheminformatics 7 (2015) 23

  6. [6]

    M. Guo, W. Shou, L. Makatura, T. Erps, M. Foshey, W. Matusik, Polygrammar: Grammar for digital polymer representation and generation, Advanced Science 9 (2022) 2101864

  7. [7]

    Krenn, F

    M. Krenn, F. Häse, A. Nigam, P. Friederich, A. Aspuru-Guzik, Self-referencing embedded strings (selfies): A 100% robust molecular string representation, Machine Learning: Science and Technology 1 (2020) 045024

  8. [8]

    Aldeghi, C

    M. Aldeghi, C. W. Coley, A graph representation of molecular ensembles for polymer property prediction, Chemical Science 13 (2022) 10486–10498

  9. [9]

    C. Kim, A. Chandrasekaran, T. D. Huan, D. Das, R. Ramprasad, Polymer genome: A data- powered polymer informatics platform for property predictions, The Journal of Physical Chem- istry C 122 (2018) 17575–17585

  10. [10]

    Doan Tran, C

    H. Doan Tran, C. Kim, L. Chen, A. Chandrasekaran, R. Batra, S. Venkatram, D. Kamal, J. P. Lightstone, R. Gurnani, P. Shetty, M. Ramprasad, J. Laws, M. Shelton, R. Ramprasad, Machine-learning predictions of polymer properties with polymer genome, Journal of Applied Physics 128 (2020) 171104

  11. [11]

    K. Yang, K. Swanson, W. Jin, C. Coley, P. Eiden, H. Gao, A. Guzman-Perez, T. Hopper, B. Kelley, M. Mathea, A. Palmer, V. Settels, T. Jaakkola, K. Jensen, R. Barzilay, Analyzing learned molecular representations for property prediction, Journal of Chemical Information and Modeling 59 (2019) 3370–3388. PMID: 31361484

  12. [12]

    E. Heid, K. P. Greenman, Y. Chung, S.-C. Li, D. E. Graff, F. H. Vermeire, H. Wu, W. H. Green, C. J. McGill, Chemprop: A machine learning package for chemical property prediction, Journal of Chemical Information and Modeling 64 (2024) 9–17. PMID: 38147829

  13. [13]

    Ignacz, M

    G. Ignacz, M. I. Baig, K. Gopalsamy, A. Villa, S. Nunes, B. Ghanem, T. Shastry, S. K. Kumar, G.Szekely, Adata-drivenapproachtointerfacialpolymerizationexploitingmachinelearningfor predicting thin-film composite membrane formation, Materials Horizons 12 (2025) 9009–9025. 23

  14. [14]

    S. Sun, F. Tian, C. Zhao, M. Xie, W. Li, W. Yu, K. Cui, L. Li, Directed message passing neural networks enhanced graph convolutional learning for accurate polymer density prediction, The Journal of Chemical Physics 163 (2025)

  15. [15]

    Correia, J

    J. Correia, J. Capela, M. Rocha, Deepmol: an automated machine and deep learning framework for computational chemistry, Journal of Cheminformatics 16 (2024) 136

  16. [16]

    Bicerano, D

    J. Bicerano, D. Rigby, C. Freeman, B. LeBlanc, J. Aubry, Polymer expert – a software tool for de novo polymer design, Computational Materials Science 235 (2024) 112810

  17. [17]

    Nanjo, Arifin, H

    S. Nanjo, Arifin, H. Maeda, Y. Hayashi, K. Hatakeyama-Sato, R. Himeno, T. Hayakawa, R. Yoshida, Spacier: On-demand polymer design with fully automated all-atom classical molecular dynamics integrated into machine learning pipelines, npj Computational Materi- als 11 (2025) 16

  18. [18]

    Priyadarsini, S

    I. Priyadarsini, S. Takeda, L. Hamada, E. V. Brazil, E. Soares, H. Shinohara, Self-bart: A transformer-based molecular representation model using selfies, 2024. URL:https://arxiv. org/abs/2410.12348.arXiv:2410.12348

  19. [19]

    Lewis, Y

    M. Lewis, Y. Liu, N. Goyal, M. Ghazvininejad, A. Mohamed, O. Levy, V. Stoyanov, L. Zettlemoyer, Bart: Denoising sequence-to-sequence pre-training for natural language gen- eration, translation, and comprehension, 2019. URL:https://arxiv.org/abs/1910.13461. arXiv:1910.13461

  20. [20]

    H. Kim, M. Kim, S. Choi, J. Park, Genetic-guided gflownets for sample efficient molecular optimization, 2024. URL:https://arxiv.org/abs/2402.05961.arXiv:2402.05961

  21. [21]

    Bongini, M

    P. Bongini, M. Bianchini, F. Scarselli, Molecular generative graph neural networks for drug discovery, Neurocomputing 450 (2021) 242–252

  22. [22]

    Elstner, Scc-dftb: What is the proper degree of self-consistency?, The Journal of Physical Chemistry A 111 (2007) 5614–5621

    M. Elstner, Scc-dftb: What is the proper degree of self-consistency?, The Journal of Physical Chemistry A 111 (2007) 5614–5621. PMID: 17564420

  23. [23]

    Hourahine, B

    B. Hourahine, B. Aradi, V. Blum, F. Bonafé, A. Buccheri, C. Camacho, C. Cevallos, M. Y. Deshaye, T. Dumitrică, A. Dominguez, S. Ehlert, M. Elstner, T. van der Heide, J. Hermann, S. Irle, J. J. Kranz, C. Köhler, T. Kowalczyk, T. Kubař, I. S. Lee, V. Lutsker, R. J. Maurer, S. K. Min, I. Mitchell, C. Negre, T. A. Niehaus, A. M. N. Niklasson, A. J. Page, A. P...

  24. [24]

    Radford, J

    A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, I. Sutskever, Language models are unsu- pervised multitask learners, OpenAI (2019). Accessed: 2024-11-15

  25. [25]

    J. H. Holland, Genetic algorithms, Scientific American 267 (1992) 66–73

  26. [26]

    Katoch, S

    S. Katoch, S. S. Chauhan, V. Kumar, A review on genetic algorithm: past, present, and future, Multimedia Tools and Applications 80 (2021) 8091–8126

  27. [27]

    Scarselli, M

    F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, G. Monfardini, The graph neural network model, IEEE Transactions on Neural Networks 20 (2009) 61–80. 24

  28. [28]

    T. N. Kipf, M. Welling, Variational graph auto-encoders, 2016. URL:https://arxiv.org/ abs/1611.07308.arXiv:1611.07308

  29. [29]

    H. Gao, S. Ji, Graph u-nets, 2019. URL:https://arxiv.org/abs/1905.05178. arXiv:1905.05178

  30. [30]

    Degen, C

    J. Degen, C. Wegscheid-Gerlach, A. Zaliani, M. Rarey, On the art of compiling and using ’drug-like’ chemical fragment spaces, ChemMedChem 3 (2008) 1503–1507

  31. [31]

    Paszke, S

    A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Köpf, E. Yang, Z. DeVito, M. Raison, A. Te- jani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, S. Chintala, Pytorch: An imperative style, high-performance deep learning library, 2019. URL:https://arxiv.org/abs/1912.01703. arXiv:...

  32. [32]

    M. Fey, J. E. Lenssen, Fast graph representation learning with pytorch geometric, 2019. URL: https://arxiv.org/abs/1903.02428.arXiv:1903.02428

  33. [33]

    Accessed: 2025- 07-12

    DFTB+ Development Team, DFTB+: A software package for efficient approximate density functional theory based atomistic simulations,https://dftbplus.org, 2025. Accessed: 2025- 07-12

  34. [34]

    Hourahine, M

    B. Hourahine, M. Berdakin, J. A. Bich, F. P. Bonafé, C. Camacho, Q. Cui, M. Y. Deshaye, G. Díaz Mirón, S. Ehlert, M. Elstner, T. Frauenheim, N. Goldman, R. A. González León, T. van der Heide, S. Irle, T. Kowalczyk, T. Kubař, I. S. Lee, C. R. Lien-Medrano, A. Maryewski, T. Melson, S. K. Min, T. Niehaus, A. M. N. Niklasson, A. Pecchia, K. Reuter, C. G. Sánc...

  35. [35]

    Hohenberg, W

    P. Hohenberg, W. Kohn, Inhomogeneous electron gas, Phys. Rev. 136 (1964) B864–B871

  36. [36]

    W. Kohn, L. J. Sham, Self-consistent equations including exchange and correlation effects, Phys. Rev. 140 (1965) A1133–A1138

  37. [37]

    S. U. Patil, M. S. Radue, W. A. Pisani, P. Deshpande, H. Xu, H. Al Mahmud, T. Dumitrică, G. M. Odegard, Interfacial characteristics between flattened cnt stacks and polyimides: A molecular dynamics study, Computational Materials Science 185 (2020) 109970

  38. [38]

    Elstner, T

    M. Elstner, T. Frauenheim, E. Kaxiras, G. Seifert, S. Suhai, A self-consistent charge density- functional based tight-binding scheme for large biomolecules, physica status solidi (b) 217 (2000) 357–376

  39. [39]

    M. Gaus, X. Lu, M. Elstner, Q. Cui, Parameterization of dftb3/3ob for sulfur and phosphorus for chemical and biological applications, Journal of Chemical Theory and Computation 10 (2014) 1518–1537. PMID: 24803865

  40. [40]

    J. C. Slater, G. F. Koster, Simplified lcao method for the periodic potential problem, Physical Review 94 (1954) 1498–1524

  41. [41]

    Turcani, E

    L. Turcani, E. Berardo, K. E. Jelfs, stk: A python toolkit for supramolecular assembly, Journal of Computational Chemistry 39 (2018) 1456–1465. 25

  42. [42]

    M. B. Oviedo, C. F. A. Negre, C. G. Sánchez, Dynamical simulation of the optical response of photosynthetic pigments, Phys. Chem. Chem. Phys. 12 (2010) 6706–6711

  43. [43]

    Kearnes, K

    S. Kearnes, K. McCloskey, M. Berndl, V. Pande, P. Riley, Molecular graph convolutions: Moving beyond fingerprints, Journal of Computer-Aided Molecular Design 30 (2016) 595–608

  44. [44]

    J. M. Stokes, K. Yang, K. Swanson, W. Jin, A. Cubillos-Ruiz, N. M. Donghia, C. R. MacNair, S. French, L. A. Carfrae, Z. Bloom-Ackermann, V. M. Tran, A. Chiappino-Pepe, A. H. Badran, I. W. Andrews, E. J. Chory, G. M. Church, E. D. Brown, T. S. Jaakkola, R. Barzilay, J. J. Collins, A deep learning approach to antibiotic discovery, Cell 180 (2020) 688–702.e13

  45. [45]

    Jiang, Z

    D. Jiang, Z. Wu, C.-Y. Hsieh, G. Chen, B. Liao, Z. Wang, C. Shen, D. Cao, J. Wu, T. Hou, Could graph neural networks learn better molecular representation for drug discovery? a com- parison study of descriptor-based and graph-based models, Journal of Cheminformatics 13 (2021) 12

  46. [46]

    M. Zeng, J. N. Kumar, Z. Zeng, R. Savitha, V. R. Chandrasekhar, K. Hippalgaonkar, Graph convolutional neural networks for polymers property prediction, 2018. URL:https://arxiv. org/abs/1811.06231.arXiv:1811.06231

  47. [47]

    Gurnani, C

    R. Gurnani, C. Kuenneth, A. Toland, R. Ramprasad, Polymer informatics at scale with mul- titask graph neural networks, Chemistry of Materials 35 (2023) 1560–1567

  48. [48]

    F. Wang, W. Guo, M. Cheng, S. Yuan, H. Xu, Z. Gao, Mmpolymer: A multimodal multitask pretraining framework for polymer property prediction, 2024. URL:https://arxiv.org/abs/ 2406.04727.arXiv:2406.04727

  49. [49]

    T. N. Kipf, M. Welling, Semi-supervised classification with graph convolutional networks, 2017. URL:https://arxiv.org/abs/1609.02907.arXiv:1609.02907

  50. [50]

    Wasserman, Bayesian model selection and model averaging, Journal of Mathematical Psychology 44 (2000) 92–107

    L. Wasserman, Bayesian model selection and model averaging, Journal of Mathematical Psychology 44 (2000) 92–107

  51. [51]

    D. M. Blei, A. Kucukelbir, J. D. McAuliffe, Variational inference: A review for statisticians, Journal of the American Statistical Association 112 (2017) 859–877

  52. [52]

    Y. Gal, Z. Ghahramani, Bayesian convolutional neural networks with bernoulli approximate variational inference, 2015. URL:https://arxiv.org/abs/1506.02158. doi:10.48550/ARXIV. 1506.02158

  53. [53]

    L. V. Jospin, H. Laga, F. Boussaid, W. Buntine, M. Bennamoun, Hands-on bayesian neural networks—a tutorial for deep learning users, IEEE Computational Intelligence Magazine 17 (2022) 29–48

  54. [54]

    Ronneberger, P

    O. Ronneberger, P. Fischer, T. Brox, U-net: Convolutional networks for biomedical image segmentation, 2015. URL:https://arxiv.org/abs/1505.04597.arXiv:1505.04597

  55. [55]

    A.Vaswani, N.Shazeer, N.Parmar, J.Uszoreit, L.Jones, A.N.Gomez, L.Kaiser, I.Polosukhin, Attention is all you need, Advances in Neural Information Processing Systems 30 (2017) 5998– 6008

  56. [56]

    Bilodeau, W

    C. Bilodeau, W. Jin, T. Jaakkola, R. Barzilay, K. F. Jensen, Generative models for molecular discovery: Recent advances and challenges, Wiley Interdisciplinary Reviews: Computational Molecular Science 12 (2022) e1608. 26

  57. [57]

    N. C. Frey, V. Gadepally, S. Samsi, A. Speth, B. Subramanian, Scalable generative models for molecular design, Journal of Chemical Information and Modeling 63 (2023) 1905–1915

  58. [58]

    Nigam, R

    A. Nigam, R. Pollice, M. F. D. Hurley, R. J. Hickman, M. Aldeghi, N. Yoshikawa, S. Chithrananda, A. Aspuru-Guzik, Artificial intelligence in chemistry: Current trends and future directions, Journal of Chemical Information and Modeling 60 (2020) 6025–6041

  59. [59]

    J. H. Jensen, A graph-based genetic algorithm and generative model/monte carlo tree search for the exploration of chemical space, Chemical Science 10 (2019) 3567–3572

  60. [60]

    Available athttps://huggingface.co/openai-community/gpt2

    Hugging Face, Gpt-2: Language models are unsupervised multitask learners, Hugging Face Model Hub, 2019. Available athttps://huggingface.co/openai-community/gpt2

  61. [61]

    Loshchilov, F

    I. Loshchilov, F. Hutter, Decoupled weight decay regularization, 2019. URL:https://arxiv. org/abs/1711.05101.arXiv:1711.05101

  62. [62]

    S. Kim, J. Chen, T. Cheng, A. Gindulyte, J. He, S. He, Q. Li, B. A. Shoemaker, P. A. Thiessen, B. Yu, L. Zaslavsky, J. Zhang, E. E. Bolton, Pubchem in 2021: New data content and improved web interfaces, Nucleic Acids Research 49 (2021) D1388–D1395

  63. [63]

    nlm.nih.gov/, 2025

    National Center for Biotechnology Information (NCBI), Pubchem,https://pubchem.ncbi. nlm.nih.gov/, 2025. Accessed: [Current Date]

  64. [64]

    R. J. Young, P. A. Lovell, Introduction to polymers, Chapman and Hall (1999)

  65. [65]

    Y. Gal, Z. Ghahramani, Dropout as a bayesian approximation: Representing model uncertainty in deep learning, 2016. URL:https://arxiv.org/abs/1506.02142.arXiv:1506.02142

  66. [66]

    Y. Gal, Z. Ghahramani, Dropout as a bayesian approximation: Appendix, 2016. URL:https: //arxiv.org/abs/1506.02157.arXiv:1506.02157

  67. [67]

    van der Maaten, G

    L. van der Maaten, G. Hinton, Visualizing data using t-sne, Journal of Machine Learning Research 9 (2008) 2579–2605. 27