
arxiv: 2410.02082 · v4 · submitted 2024-10-02 · 💻 cs.LG · q-bio.QM

FARM: Enhancing Molecular Representations with Functional Group Awareness

Pith reviewed 2026-05-23 19:46 UTC · model grok-4.3

classification 💻 cs.LG q-bio.QM
keywords functional groups · molecular representations · SMILES · graph neural networks · contrastive learning · MoleculeNet · drug discovery · molecular graphs

The pith

Adding functional group annotations to atoms in SMILES strings and graphs yields unified embeddings that reach state-of-the-art results on eight of thirteen MoleculeNet tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces FARM to bridge SMILES strings, natural language, and molecular graphs by injecting functional group membership directly into atomic tokens. This produces FG-enhanced SMILES for masked language modeling and FG graphs for graph neural network processing of group connectivity. Contrastive learning then aligns the two resulting views into a single embedding space that carries both atom-level detail and higher-level topology. The resulting representations improve molecular property prediction enough to outperform prior methods on most MoleculeNet benchmarks and to generalize to a photostability dataset. A reader would care because chemically richer embeddings could speed up virtual screening and property prediction in drug and materials design.
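To make the idea of injecting functional group membership into atomic tokens concrete, here is a minimal sketch. It is a toy under stated assumptions: the `atom|group` token format, the `fg_enhance` helper, and the hand-written group labels are hypothetical and are not FARM's actual vocabulary or functional group detector.

```python
# Toy illustration only: the "<atom>|<group>" token format and the hand-written
# group labels are assumptions, not FARM's actual vocabulary or FG detector.

def fg_enhance(atom_tokens, fg_labels):
    """Attach a functional-group label to every atom token."""
    if len(atom_tokens) != len(fg_labels):
        raise ValueError("need exactly one functional-group label per atom token")
    return [f"{atom}|{group}" for atom, group in zip(atom_tokens, fg_labels)]


# Acetic acid, SMILES "CC(=O)O"; branch and bond symbols are left out for brevity.
atoms = ["C", "C", "O", "O"]
groups = ["alkyl", "carboxyl", "carboxyl", "carboxyl"]  # assigned by hand

enhanced = fg_enhance(atoms, groups)
print(enhanced)

# The plain atoms use 2 distinct tokens; the enhanced sequence uses 3.
print(len(set(atoms)), len(set(enhanced)))
```

The second print shows the vocabulary effect in miniature: the two carbons share one token in plain SMILES but receive different tokens once their group context is attached.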

Core claim

FARM learns molecular representations from two complementary perspectives: masked language modeling on FG-enhanced SMILES that captures atom-level features enriched with functional context, and graph neural networks on FG graphs that model higher-level molecular topology through functional group connectivity. Contrastive learning aligns these views into a unified embedding space. When evaluated on the MoleculeNet benchmark this produces state-of-the-art performance on 8 out of 13 tasks and supports transfer to a photostability dataset for quantum mechanical properties.
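The alignment step can be pictured with a symmetric InfoNCE objective, a common choice for contrastive learning. The source does not state which contrastive loss FARM uses, so the loss form, the temperature value, and the two-dimensional toy embeddings below are all assumptions made for illustration.

```python
import math


def _normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce(view_a, view_b, temperature=0.1):
    """Mean cross-entropy of matching row i of view_a to row i of view_b."""
    a = [_normalize(v) for v in view_a]
    b = [_normalize(v) for v in view_b]
    total = 0.0
    for i, ai in enumerate(a):
        # Cosine similarity of molecule i (view A) to every molecule in view B.
        logits = [sum(x * y for x, y in zip(ai, bj)) / temperature for bj in b]
        m = max(logits)  # subtract the max for a stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(a)


def symmetric_info_nce(view_a, view_b, temperature=0.1):
    """Average the A-to-B and B-to-A directions."""
    return 0.5 * (info_nce(view_a, view_b, temperature)
                  + info_nce(view_b, view_a, temperature))


# Hypothetical embeddings of two molecules from the SMILES view and the graph view.
smiles_emb = [[1.0, 0.0], [0.0, 1.0]]
graph_aligned = [[0.9, 0.1], [0.1, 0.9]]   # each graph row matches its SMILES row
graph_swapped = [[0.1, 0.9], [0.9, 0.1]]   # rows paired with the wrong molecule

aligned_loss = symmetric_info_nce(smiles_emb, graph_aligned)
swapped_loss = symmetric_info_nce(smiles_emb, graph_swapped)
print(aligned_loss < swapped_loss)
```

The loss is small when the two views of the same molecule sit close together and large when they are mismatched, which is the pressure that pulls the views into one embedding space.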

What carries the argument

Functional group-enhanced SMILES and FG graphs aligned by contrastive learning between a masked language model and a graph neural network.

If this is right

  • The same representations support stronger transfer learning across drug discovery and materials science tasks.
  • Atom-level functional context improves predictions for both small-molecule properties and quantum mechanical quantities.
  • The unified embedding space enables applications in pharmaceutical research and functional material design.
  • FG-enhanced tokenization expands the effective molecular vocabulary for Transformer models.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same annotation scheme could be tested on larger or more diverse chemical libraries to check whether the gains scale.
  • If functional groups prove decisive, similar label injection might improve other graph-language hybrid models beyond this architecture.
  • The contrastive alignment step could be replaced or augmented with different objectives to isolate which part drives the observed gains.

Load-bearing premise

Functional group annotations at the atomic level supply chemical knowledge that standard SMILES tokenization and atom-level graphs do not already capture, and the contrastive alignment between the two views produces a genuinely more informative embedding.

What would settle it

An ablation that removes all functional group annotations while keeping the rest of the architecture and training identical. If it produced the same or better MoleculeNet scores, the load-bearing premise would fail.

Figures

Figures reproduced from arXiv: 2410.02082 by Ge Liu, Heng Ji, Kuan-Hao Huang, Martin D. Burke, Thao Nguyen, Ying Diao.

Figure 1: FARM’s molecular representation learning framework. Given a molecule as input, we apply ...
Figure 2: (a) FARM’s molecular representation learning model architecture. (b) Functional group ...
Figure 3: Visualization of the attention map of the BERT model trained with functional group ...
Figure 4: (a) Visualization of functional group knowledge graph embedding space: Clusters of five ...
Figure 5: (a) Example of naming a fused ring system in 4 steps: generate the core structure of the ...
Figure 6: Number of functional groups associated with different chemical elements in the FG ...
Figure 7: Loss curves for the masked language model (MLM) during training on two datasets: ...
Figure 8: Link prediction model performance: Similar to word embedding analogies in NLP, replacing ...
Original abstract

We introduce Functional Group-Aware Representations for Small Molecules (FARM), a novel foundation model designed to bridge the gap between SMILES, natural language, and molecular graphs. The key idea behind FARM is the incorporation of functional group (FG) annotations at the atomic level, enabling both FG-enhanced SMILES and FG graphs. In this representation, SMILES strings are enriched with functional group information that identifies the group membership of each atom, while the FG graph captures molecular structure by representing how functional groups are connected. This tokenization injects chemical knowledge into SMILES and expands the effective molecular vocabulary, making the representation more suitable for Transformer-based models and more aligned with natural language structure. FARM learns molecular representations from two complementary perspectives to jointly encode functional and structural information. Masked language modeling on FG-enhanced SMILES captures atom-level features enriched with functional context, while graph neural networks model higher-level molecular topology through functional group connectivity. Contrastive learning is then used to align these two views into a unified embedding space, ensuring that both atom-level detail and functional group structure are jointly represented. We evaluate FARM on the MoleculeNet benchmark and achieve state-of-the-art performance on 8 out of 13 tasks. We further validate its generalization ability on a photostability dataset for quantum mechanical properties. These results demonstrate that FARM improves molecular representation learning, supports strong transfer learning across drug discovery and materials science, and enables broad applications in pharmaceutical research and functional material design.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The paper introduces FARM, a foundation model for small molecules that augments SMILES strings with functional group (FG) annotations at the atomic level and constructs corresponding FG graphs. It trains via masked language modeling on the FG-enhanced SMILES, graph neural networks on the FG graphs, and contrastive alignment between the two views to produce unified embeddings. The central empirical claim is state-of-the-art performance on 8 out of 13 MoleculeNet tasks together with generalization results on a photostability dataset for quantum-mechanical properties.

Significance. If the reported gains hold under the stated evaluation protocol, the explicit injection of functional-group information could supply chemically grounded features that standard SMILES tokenization or atom-level graphs do not fully capture, with downstream utility in drug discovery and materials design. The work is an empirical ML study that uses canonical MoleculeNet splits, reports standard deviations, and presents internally consistent ablations; the stress-test concern about information gain from FG annotations does not introduce circularity or inconsistency in the manuscript.

minor comments (2)
  1. Abstract: the SOTA claim on 8/13 tasks would be clearer if the abstract briefly noted the use of standard MoleculeNet splits, the set of baselines, and the reporting of standard deviations (these details appear in the main text).
  2. The definition and construction of the FG graph (how functional groups are identified and connected) could be illustrated with a small concrete example in the methods section to aid reproducibility.
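The kind of small example the second comment asks for might look like the sketch below. It assumes the FG graph has one node per functional group and an edge wherever a bond crosses between two groups; the `build_fg_graph` helper and the hand-assigned grouping of ethyl acetate are hypothetical and may differ from the paper's actual construction.

```python
def build_fg_graph(atom_to_group, bonds):
    """Collapse an atom-level graph into a functional-group graph.

    atom_to_group: list where entry i is the group id of atom i.
    bonds: iterable of (i, j) atom index pairs.
    Returns (sorted group ids, sorted edges between distinct groups).
    """
    nodes = sorted(set(atom_to_group))
    edges = set()
    for i, j in bonds:
        gi, gj = atom_to_group[i], atom_to_group[j]
        if gi != gj:  # bonds inside one group do not become edges
            edges.add(tuple(sorted((gi, gj))))
    return nodes, sorted(edges)


# Ethyl acetate, SMILES "CC(=O)OCC", heavy atoms indexed left to right:
# 0 C (methyl), 1 C (carbonyl), 2 O (carbonyl), 3 O (ester), 4 C and 5 C (ethyl).
# The grouping is assigned by hand for illustration.
atom_to_group = ["methyl", "ester", "ester", "ester", "ethyl", "ethyl"]
bonds = [(0, 1), (1, 2), (1, 3), (3, 4), (4, 5)]

nodes, edges = build_fg_graph(atom_to_group, bonds)
print(nodes)
print(edges)
```

Six atoms and five bonds collapse to three nodes and two edges, with the ester group sitting between the methyl and ethyl groups.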

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment of FARM, the recognition of its empirical contributions on MoleculeNet, and the recommendation for minor revision. No major comments appear in the report.

Circularity Check

0 steps flagged

No significant circularity

full rationale

This is an empirical machine-learning paper introducing a molecular representation model (FG-enhanced SMILES + FG graphs + MLM + GNN + contrastive alignment) and reporting benchmark results on MoleculeNet. No derivation chain, equations, or 'predictions' are present that reduce by construction to fitted parameters or self-defined quantities inside the paper. The central performance claims rest on standard training and evaluation protocols on public data splits; no self-citation load-bearing steps, uniqueness theorems, or ansatz smuggling appear in the provided text. The work is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The central claim rests on the domain assumption that functional groups can be reliably annotated at the atomic level and that this annotation supplies information orthogonal to standard molecular graphs and SMILES. Standard deep-learning training assumptions (convergence of contrastive objectives, absence of data leakage) are also required but not stated explicitly.

free parameters (1)
  • loss weighting coefficients among MLM, GNN, and contrastive terms
    Joint training of three objectives requires tunable scalars whose values are not reported in the abstract.
axioms (1)
  • domain assumption: Functional groups can be accurately and unambiguously annotated at the atomic level for arbitrary small molecules
    The key idea of FG-enhanced SMILES and FG graphs depends on this annotation step being reliable and chemically meaningful.
invented entities (1)
  • FG graph (no independent evidence)
    purpose: Represent molecular topology at the level of functional-group connectivity rather than atom connectivity
    The FG graph is introduced as a new structural view whose utility is demonstrated only through the reported benchmark gains.
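
The free parameter in this ledger can be written out explicitly. The expression below assumes the three objectives are combined in a single weighted sum; the abstract does not say whether training is joint or staged, and the symbols $\lambda_{\text{MLM}}$, $\lambda_{\text{GNN}}$, and $\lambda_{\text{con}}$ are hypothetical names for the unreported weights.

```latex
\mathcal{L}_{\text{total}}
  = \lambda_{\text{MLM}}\,\mathcal{L}_{\text{MLM}}
  + \lambda_{\text{GNN}}\,\mathcal{L}_{\text{GNN}}
  + \lambda_{\text{con}}\,\mathcal{L}_{\text{con}}
```

If training is staged instead, each stage minimizes one term on its own and the weights are replaced by per-stage schedules.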

pith-pipeline@v0.9.0 · 5809 in / 1537 out tokens · 32670 ms · 2026-05-23T19:46:59.522253+00:00 · methodology


