MOLAR: Learning Multimodal Molecular Representations from Noisy Labels

Eran Segal; Kunyu Zhang; Nan Yin; Yingxu Wang; Yu Li

arxiv: 2606.18390 · v1 · pith:LQZG4P5Tnew · submitted 2026-06-16 · 💻 cs.LG · q-bio.QM

Pith reviewed 2026-06-27 01:14 UTC · model grok-4.3

classification 💻 cs.LG q-bio.QM

keywords noisy labels · multimodal learning · molecular property prediction · graph neural networks · label reliability · representation learning · textual molecular descriptions

The pith

MOLAR separates latent clean-property inference from recorded-label observation using graph and text views.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that multimodal molecular representations can be learned from noisy labels by separating the inference of clean properties from the observation of recorded labels. Graph and text modalities provide evidence for the clean properties, while a dedicated channel explains the noisy labels. This matters because treating noisy labels as ground truth leads models to memorize errors, and graph-text fusion or alignment can propagate those errors across modalities. Deriving reliability and evidence scores from the model is meant to deliver both better performance and interpretability on molecular property tasks.

Core claim

MOLAR separates latent clean-property inference from recorded-label observation: graph and text views contribute residual evidence to a clean-property distribution, and a categorical label-observation channel maps this distribution to recorded labels for training. This formulation derives posterior label reliability and modality-specific molecular evidence from the model.
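The abstract gives no equations, so the following is only a minimal sketch of how such a two-stage model could be wired, not the authors' implementation. It assumes that "residual evidence" means per-class logit offsets added by each view to a shared base logit, that the observation channel is a row-stochastic matrix `channel[k][j] = p(recorded = j | clean = k)`, and that "posterior label reliability" is the posterior probability that the clean class equals the recorded label. All function names are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def clean_distribution(base_logits, graph_evidence, text_evidence):
    """Assumed form: each view adds residual evidence to shared base logits."""
    logits = [b + g + t for b, g, t in zip(base_logits, graph_evidence, text_evidence)]
    return softmax(logits)

def recorded_distribution(clean_probs, channel):
    """Marginal over recorded labels: sum_k p(clean=k) * p(recorded=j | clean=k)."""
    num_recorded = len(channel[0])
    return [
        sum(clean_probs[k] * channel[k][j] for k in range(len(clean_probs)))
        for j in range(num_recorded)
    ]

def label_reliability(clean_probs, channel, recorded_label):
    """Bayes posterior that the clean class matches the recorded label."""
    joint = [clean_probs[k] * channel[k][recorded_label] for k in range(len(clean_probs))]
    return joint[recorded_label] / sum(joint)

# Toy binary example with made-up numbers.
p_clean = clean_distribution([0.0, 0.0], graph_evidence=[1.0, 0.0], text_evidence=[0.5, 0.0])
channel = [[0.9, 0.1], [0.2, 0.8]]
print(recorded_distribution(p_clean, channel))
print(label_reliability(p_clean, channel, recorded_label=1))
```

Under this reading, training would fit the recorded-label marginal, while predictions and diagnostics would be read from the clean distribution and the posterior.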

What carries the argument

The separation of a latent clean-property distribution (from graph and text) and a categorical label-observation channel (to recorded labels).

If this is right

MOLAR outperforms baselines on naturally noisy molecular benchmarks.
It also outperforms baselines on controlled label-flipping benchmarks.
The model derives posterior label reliability scores.
Visualization shows modality-specific evidence and reliability diagnostics.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Reliability scores could help curate better training sets from noisy databases.
The method might generalize to other domains with noisy scientific annotations, such as imaging or sequence data.
If the clean distribution is accurate, predictions could be more robust to changes in label collection methods.

Load-bearing premise

The proposed generative separation between the clean-property distribution and the label-observation channel can be learned from noisy data without needing clean labels or detailed noise models.

What would settle it

Controlled experiments showing that MOLAR performs no better than standard multimodal models when labels are flipped at known rates would falsify the utility of the separation.
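A controlled test of this kind needs labels corrupted at a known rate. The sketch below shows one common way to do that, symmetric flipping to a uniformly chosen different class. The abstract does not say which flipping protocol the paper uses, so the function name, the symmetric scheme, and the seed handling are all assumptions for illustration.

```python
import random

def flip_labels(labels, num_classes, flip_rate, seed=0):
    """Return (noisy_labels, flipped_mask) under symmetric label flipping.

    Each label is replaced, with probability flip_rate, by a different class
    drawn uniformly at random. Requires num_classes >= 2.
    """
    rng = random.Random(seed)  # local generator keeps runs reproducible
    noisy, flipped = [], []
    for y in labels:
        if rng.random() < flip_rate:
            alternatives = [c for c in range(num_classes) if c != y]
            noisy.append(rng.choice(alternatives))
            flipped.append(True)
        else:
            noisy.append(y)
            flipped.append(False)
    return noisy, flipped

clean = [0, 1, 2, 1, 0] * 20
noisy, mask = flip_labels(clean, num_classes=3, flip_rate=0.3, seed=42)
print(sum(mask) / len(mask))  # empirical flip rate, close to but not exactly 0.3
```

Keeping the mask makes it possible to check afterwards whether a model's reliability scores are lower on the flipped examples than on the untouched ones.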

Original abstract

Motivation: Noisy labels are a common challenge in molecular property prediction because molecular annotations are often obtained from assays, curated databases, or weak annotation pipelines rather than directly observed clean biological states. Treating recorded labels as reliable supervision can cause models to memorize corrupted observations and learn misleading molecular evidence. In multimodal molecular representation learning, this issue can be amplified by graph-text fusion or alignment, which may propagate label-induced errors across modalities.

Results: We propose MOLAR, a noise-aware framework for learning multimodal molecular representations from noisy labels. MOLAR separates latent clean-property inference from recorded-label observation: graph and text views contribute residual evidence to a clean-property distribution, and a categorical label-observation channel maps this distribution to recorded labels for training. This formulation derives posterior label reliability and modality-specific molecular evidence from the model. Experiments on naturally noisy molecular benchmarks and controlled label-flipping benchmarks show that MOLAR consistently outperforms representative baselines. Visualization analyses further show that MOLAR provides interpretable reliability and modality-evidence diagnostics.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

MOLAR sets up a generative split between a latent clean property distribution (fed by graph and text) and a label observation channel, but that split is probably not identifiable from noisy labels alone.

The main thing to know is that MOLAR models a clean-property distribution informed by both graph and text views, then routes it through a separate categorical channel to explain the recorded noisy labels, and claims this lets the model output reliability scores and modality evidence.

What is new is the concrete multimodal molecular version of this separation; the general noisy-label factorization idea is older, but applying it to graph-plus-text molecular data with derived diagnostics is the incremental step here.

The paper does well on the empirical side. It reports consistent gains over baselines on naturally noisy molecular benchmarks and on controlled label-flip tests, and the visualizations of reliability and per-modality evidence are a practical plus for users who want diagnostics.

The soft spot is identifiability. The stress-test concern lands: the observed likelihood is unchanged if probability mass is swapped between the clean posterior and the observation channel, so nothing pins down which factorization is the “true” one. The abstract gives no sign of anchor points, a fixed noise-transition matrix, or a clean-label subset, which means the posterior reliabilities could be an arbitrary but convenient decomposition rather than recovered latent quantities. If the full paper has ablations or external validation that show the reliabilities track something measurable outside the model, that would help; otherwise the interpretability story rests on the assumption that the separation can be learned from noisy supervision alone.
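The swap described above can be made concrete with a small numeric sketch. It treats the clean label as categorical, so the marginal is a sum over classes, and compares two hypothetical parameterizations: model A uses a noisy channel, while model B folds that channel into its clean distribution and uses an identity channel. The numbers and function names are invented for illustration; this shows the general argument and says nothing about constraints the actual MOLAR architecture may impose.

```python
def observe(clean_probs, channel):
    """Recorded-label marginal: sum_k p(clean=k) * p(recorded=j | clean=k)."""
    return [
        sum(clean_probs[k] * channel[k][j] for k in range(len(clean_probs)))
        for j in range(len(channel[0]))
    ]

def reliability(clean_probs, channel, recorded):
    """Posterior probability that the clean class equals the recorded label."""
    joint = [clean_probs[k] * channel[k][recorded] for k in range(len(clean_probs))]
    return joint[recorded] / sum(joint)

def compare(clean_a, channel_a, recorded):
    """Build model B from model A and report what each one implies."""
    observed_a = observe(clean_a, channel_a)
    clean_b = observed_a  # model B absorbs the channel into the clean posterior
    n = len(clean_b)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    observed_b = observe(clean_b, identity)
    return {
        "observed_a": observed_a,
        "observed_b": observed_b,
        "reliability_a": reliability(clean_a, channel_a, recorded),
        "reliability_b": reliability(clean_b, identity, recorded),
    }

result = compare([0.8, 0.2], [[0.9, 0.1], [0.2, 0.8]], recorded=1)
print(result)
```

Both models assign the same probability to every recorded label, so the observed-data likelihood cannot tell them apart, yet they disagree on how reliable a recorded label is. That gap is why external validation of the reliability scores matters.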

This is for people working on molecular property prediction who already deal with assay or database noise and want a method that also produces reliability estimates. A reader focused on practical gains in noisy multimodal settings would get something from the benchmarks.

It deserves peer review because the problem is real and the results look usable, even if the generative claims need checking.

Referee Report

1 major / 2 minor

Summary. The paper proposes MOLAR, a noise-aware framework for multimodal molecular representation learning. It models the generative process by separating latent clean-property inference (from graph and text views) from a categorical label-observation channel that maps the clean distribution to recorded noisy labels, enabling derivation of posterior label reliability and modality-specific evidence. Experiments on naturally noisy molecular benchmarks and controlled label-flipping settings show consistent outperformance over baselines, with additional visualization analyses for interpretability.

Significance. If the separation between clean-property distribution and label-observation channel is identifiable and learnable from noisy supervision alone, the framework could provide a principled way to obtain interpretable reliability diagnostics in a domain where assay-derived labels are frequently noisy; the multimodal aspect and empirical gains on both natural and synthetic noise would be of practical interest to molecular ML.

major comments (1)

[Method (generative factorization and posterior derivation)] The central claim rests on recovering the factorization p(y_recorded | x) = ∫ p(y_recorded | y_clean) p(y_clean | x_graph, x_text) dy_clean by maximum likelihood on observed (x, y_recorded) pairs alone. Without an explicit noise-transition matrix, anchor points, or a clean-label subset, the observed-data likelihood is invariant under reparameterizations that trade probability mass between the clean posterior and the observation channel; consequently the derived posterior reliabilities and modality-specific evidence are not guaranteed to recover the intended latent quantities. This identifiability issue is load-bearing for the separation and interpretability claims (see the generative model description and the derivation of posteriors).

minor comments (2)

[Abstract and Experiments] The abstract and results section state that MOLAR 'consistently outperforms representative baselines' but do not list the exact baselines, the magnitude of gains, or statistical significance tests; adding these details would strengthen the experimental claims.
[Method] Notation for the clean-property distribution and the observation channel should be introduced with explicit equations early in the method section to avoid ambiguity when later referring to 'residual evidence' and 'posterior reliability'.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the thoughtful and detailed review. The identifiability concern regarding the generative factorization is a substantive point that we address directly below. We believe the empirical evidence and modeling choices provide support for the framework's utility, while acknowledging the theoretical subtlety.

Referee: [Method (generative factorization and posterior derivation)] The central claim rests on recovering the factorization p(y_recorded | x) = ∫ p(y_recorded | y_clean) p(y_clean | x_graph, x_text) dy_clean by maximum likelihood on observed (x, y_recorded) pairs alone. Without an explicit noise-transition matrix, anchor points, or a clean-label subset, the observed-data likelihood is invariant under reparameterizations that trade probability mass between the clean posterior and the observation channel; consequently the derived posterior reliabilities and modality-specific evidence are not guaranteed to recover the intended latent quantities. This identifiability issue is load-bearing for the separation and interpretability claims (see the generative model description and the derivation of posteriors).

Authors: We agree that identifiability of the clean-property posterior and the label-observation channel from noisy observations alone is not guaranteed in general without additional structure. Our formulation parameterizes the observation channel as a learnable categorical conditional distribution p(y_recorded | y_clean) that is jointly optimized with the multimodal clean-property inference network via the observed-data marginal likelihood. The multimodal (graph + text) inputs supply complementary evidence that, in practice, regularizes the decomposition. While we do not claim unique recovery of the ground-truth latent quantities, the resulting model yields (i) improved predictive performance on both naturally noisy and controlled label-flip benchmarks and (ii) post-hoc reliability and modality-evidence scores that align with domain expectations in visualization analyses. We will revise the manuscript to include an explicit discussion of the modeling assumptions, the lack of anchor-point or clean-label supervision, and the empirical rather than theoretical guarantees on the recovered posteriors. revision: partial
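The joint optimization described in this response can be sketched as a negative log marginal likelihood over recorded labels. The sketch assumes the learnable channel is stored as unconstrained logits and turned into a row-stochastic matrix with a row-wise softmax; that parameterization, and the function names, are assumptions rather than details confirmed by the source. Gradient computation is left out, since any autodiff framework would supply it.

```python
import math

def row_softmax(logit_rows):
    """Turn each row of unconstrained logits into a probability row."""
    rows = []
    for row in logit_rows:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        rows.append([e / total for e in exps])
    return rows

def marginal_nll(clean_probs_batch, channel_logits, recorded_labels):
    """Mean negative log-likelihood of recorded labels under the marginal.

    clean_probs_batch: one clean-class distribution per molecule, as produced
    by the (not shown) graph-plus-text inference network.
    """
    channel = row_softmax(channel_logits)
    total = 0.0
    for clean_probs, y in zip(clean_probs_batch, recorded_labels):
        p_recorded = sum(pk * channel[k][y] for k, pk in enumerate(clean_probs))
        total -= math.log(p_recorded)
    return total / len(recorded_labels)

batch = [[0.9, 0.1], [0.2, 0.8]]
labels = [0, 1]
print(marginal_nll(batch, [[0.0, 0.0], [0.0, 0.0]], labels))  # uninformative channel
print(marginal_nll(batch, [[5.0, 0.0], [0.0, 5.0]], labels))  # near-diagonal channel
```

Only the recorded labels enter this objective, which is consistent with the authors' concession that the guarantees on the recovered posteriors are empirical rather than theoretical.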

Circularity Check

0 steps flagged

No circularity: derivation not reducible to inputs in provided text

The abstract describes a generative separation between clean-property distribution and label-observation channel but supplies no equations, no explicit likelihood, and no derivation steps. Without visible formulas showing that posterior reliabilities reduce to fitted parameters by construction or that any quantity is renamed as a prediction, no load-bearing step matches the enumerated circularity patterns. The central claim remains a modeling assumption whose identifiability is an external statistical question rather than an internal definitional collapse. The paper is therefore self-contained against the supplied text; external benchmarks or full equations would be required to raise the score.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no free parameters, axioms, or invented entities are specified in sufficient detail to populate the ledger.


[1] [1]

Machine learning assisted hit prioritization for high throughput screening in drug discovery.ACS Central Science, 2024

Davide Boldini, Lukas Friedrich, Daniel Kuhn, and Stephan A Sieber. Machine learning assisted hit prioritization for high throughput screening in drug discovery.ACS Central Science, 2024

2024

[2] [2]

Mf-pcba: Multifidelity high-throughput screening benchmarks for drug discovery and machine learning

David Buterez, Jon Paul Janet, Steven J Kiddle, and Pietro Li` o. Mf-pcba: Multifidelity high-throughput screening benchmarks for drug discovery and machine learning. Journal of Chemical Information and Modeling, 2023

2023

[3] [3]

A systematic study of key elements underlying molecular property prediction

Jianyuan Deng, Zhibo Yang, Hehe Wang, Iwao Ojima, Dimitris Samaras, and Fusheng Wang. A systematic study of key elements underlying molecular property prediction. Nature Communications, 2023

2023

[4] [4]

Convolutional networks on graphs for learning molecular fingerprints.NIPS, 2015

David K Duvenaud, Dougal Maclaurin, Jorge Iparraguirre, Rafael Bombarell, Timothy Hirzel, Al´ an Aspuru-Guzik, and Ryan P Adams. Convolutional networks on graphs for learning molecular fingerprints.NIPS, 2015

2015

[5] [5]

Translation between molecules and natural language

Carl Edwards, Tuan Lai, Kevin Ros, Garrett Honke, Kyunghyun Cho, and Heng Ji. Translation between molecules and natural language. InEMNLP, 2022

2022

[6] [6]

Mol-instructions: A large-scale biomolecular instruction dataset for large language models

Yin Fang, Xiaozhuan Liang, Ningyu Zhang, Kangwei Liu, Rui Huang, Zhuo Chen, Xiaohui Fan, and Huajun Chen. Mol-instructions: A large-scale biomolecular instruction dataset for large language models. InICLR, 2024

2024

[7] [7]

Mdfcl: Multimodal data fusion-based graph contrastive learning framework for molecular property prediction.Pattern Recognition, 2025

Xu Gong, Maotao Liu, Qun Liu, Yike Guo, and Guoyin Wang. Mdfcl: Multimodal data fusion-based graph contrastive learning framework for molecular property prediction.Pattern Recognition, 2025

2025

[8] [8]

Kipf and Max Welling

Thomas N. Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. InICLR, 2017

2017

[9] [9]

Pre-training graph neural networks on molecules by using subgraph-conditioned graph information bottleneck

O-Joun Lee et al. Pre-training graph neural networks on molecules by using subgraph-conditioned graph information bottleneck. InAAAI, 2025

2025

[10] [10]

Instance-dependent label distribution estimation for learning with label noise.IJCV, 2025

Zehui Liao, Shishuai Hu, Yutong Xie, and Yong Xia. Instance-dependent label distribution estimation for learning with label noise.IJCV, 2025

2025

[11] [11]

Learning the latent causal structure for modeling label noise.NIPS, 2024

Yexiong Lin, Yu Yao, and Tongliang Liu. Learning the latent causal structure for modeling label noise.NIPS, 2024

2024

[12] [12]

Multi-modal molecule structure– text model for text-based retrieval and editing.Nature Machine Intelligence, 2023

Shengchao Liu, Weili Nie, Chengpeng Wang, Jiarui Lu, Zhuoran Qiao, Ling Liu, Jian Tang, Chaowei Xiao, and Animashree Anandkumar. Multi-modal molecule structure– text model for text-based retrieval and editing.Nature Machine Intelligence, 2023

2023

[13] [13]

Identifiability of label noise transition matrix

Yang Liu, Hao Cheng, and Kun Zhang. Identifiability of label noise transition matrix. InICLR, 2023

2023

[14] Zhiyuan Liu, Sihang Li, Yanchen Luo, Hao Fei, Yixin Cao, Kenji Kawaguchi, Xiang Wang, and Tat-Seng Chua. Molca: Molecular graph-language modeling with cross-modal projector and uni-modal adapter. In EMNLP, 2023.

[15] Zhiyuan Liu, Yaorui Shi, An Zhang, Enzhi Zhang, Kenji Kawaguchi, Xiang Wang, and Tat-Seng Chua. Rethinking tokenizer and decoder in masked graph modeling for molecules. NIPS, 2023.

[16] Xiaohua Lu, Liangxu Xie, Lei Xu, Rongzhi Mao, Xiaojun Xu, and Shan Chang. Multimodal fused deep learning for drug property prediction: Integrating chemical language and molecular graph. Computational and Structural Biotechnology Journal, 2024.

[17] Hehuan Ma, Yatao Bian, Yu Rong, Wenbing Huang, Tingyang Xu, Weiyang Xie, Geyan Ye, and Junzhou Huang. Cross-dependent graph neural networks for molecular property prediction. Bioinformatics, 2022.

[18] Miles McGibbon, Steven Shave, Jie Dong, Yumiao Gao, Douglas R Houston, Jiancong Xie, Yuedong Yang, Philippe Schwaller, and Vincent Blay. From intuition to ai: evolution of small molecule representations in drug discovery. Briefings in Bioinformatics, 2024.

[19] Tri Nguyen, Shahana Ibrahim, and Xiao Fu. Noisy label learning with instance-dependent outliers: Identifiability via crowd wisdom. NIPS, 2024.

[20] Qizhi Pei, Lijun Wu, Kaiyuan Gao, Xiaozhuan Liang, Yin Fang, Jinhua Zhu, Shufang Xie, Tao Qin, and Rui Yan. Biot5+: Towards generalized biological understanding with iupac integration and multi-task tuning. In ACL, 2024.

[21] Qizhi Pei, Wei Zhang, Jinhua Zhu, Kehan Wu, Kaiyuan Gao, Lijun Wu, Yingce Xia, and Rui Yan. Biot5: Enriching cross-modal integration in biology with chemical knowledge and natural language associations. In EMNLP, 2023.

[22] Siyi Qian, Haochao Ying, Renjun Hu, Jingbo Zhou, Jintai Chen, Danny Z Chen, and Jian Wu. Robust training of graph neural networks via noise governance. In WSDM, 2023.

[23] David Rogers and Mathew Hahn. Extended-connectivity fingerprints. Journal of Chemical Information and Modeling, 2010.

[24] Zachary A Rollins, Alan C Cheng, and Essam Metwally. Molprop: Molecular property prediction with multimodal language and graph fusion. Journal of Cheminformatics, 2024.

[25] Yu Rong, Yatao Bian, Tingyang Xu, Weiyang Xie, Ying Wei, Wenbing Huang, and Junzhou Huang. Self-supervised graph transformer on large-scale molecular data. NIPS, 2020.

[26] Jerret Ross, Brian Belgodere, Vijil Chenthamarakshan, Inkit Padhi, Youssef Mroueh, and Payel Das. Large-scale chemical language representations capture molecular structure and properties. Nature Machine Intelligence, 2022.

[27] Shaghayegh Sadeghi, Alan Bui, Ali Forooghi, Jianguo Lu, and Alioune Ngom. Can large language models understand molecules? BMC Bioinformatics, 2024.

[28] Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. Graph attention networks. In ICLR, 2018.

[29] Honghao Wang, Acong Zhang, Yuan Zhong, Junlei Tang, Kai Zhang, and Ping Li. Chain-aware graph neural networks for molecular property prediction. Bioinformatics, 2024.

[30] Yingxu Wang, Victor Liang, Nan Yin, Siwei Liu, and Eran Segal. Sgac: a graph neural network framework for imbalanced and structure-aware amp classification. Briefings in Bioinformatics, 27(1):bbag038, 2026.

[31] Yingxu Wang, Xinwang Liu, Mengzhu Wang, Siyang Gao, and Nan Yin. Riemannian flow matching for disentangled graph domain adaptation. arXiv preprint arXiv:2602.00656, 2026.

[32] Yingxu Wang, Mengzhu Wang, Zhichao Huang, Suyu Liu, and Nan Yin. Nested graph pseudo-label refinement for noisy label domain adaptation learning. In AAAI, 2026.

[33] Yingxu Wang, Mengzhu Wang, Houcheng Su, Nan Yin, Quanming Yao, and James Kwok. Degree-conscious spiking graph for cross-domain adaptation. arXiv preprint arXiv:2410.06883, 2024.

[34] Yingxu Wang, Nan Yin, Mingyan Xiao, Xinhao Yi, Siwei Liu, and Shangsong Liang. Dusego: Dual second-order equivariant graph ordinary differential equation. TKDD, 2025.

[35] Yingxu Wang, Kunyu Zhang, Jiaxin Huang, Nan Yin, Siwei Liu, and Eran Segal. Protomol: enhancing molecular property prediction via prototype-guided multimodal learning. Briefings in Bioinformatics, 2025.

[36] Yingxu Wang, Kunyu Zhang, Mengzhu Wang, Siyang Gao, and Nan Yin. Usbd: Universal structural basis distillation for source-free graph domain adaptation. arXiv preprint arXiv:2602.08431, 2026.

[37] Yingxu Wang, Kunyu Zhang, Yanwu Yang, Thomas Wolfers, Yujie Wu, Siyang Gao, and Nan Yin. When brain networks travel: Learning beyond site. arXiv preprint arXiv:2605.06050, 2026.

[38] Zhengyang Wang, Meng Liu, Youzhi Luo, Zhao Xu, Yaochen Xie, Limei Wang, Lei Cai, Qi Qi, Zhuoning Yuan, Tianbao Yang, et al. Advanced graph and sequence neural networks for molecular property prediction and drug discovery. Bioinformatics, 2022.

[39] Zhonghao Wang, Yuanchen Bei, Sheng Zhou, Zhiyao Zhou, Jiapei Fan, Hui Xue, Haishuai Wang, and Jiajun Bu. Learning from graph: Mitigating label noise on graph through topological feature reconstruction. In CIKM, 2025.

[40] Qi Wei, Lei Feng, Haoliang Sun, Ren Wang, Chenhui Guo, and Yilong Yin. Fine-grained classification with noisy labels. In CVPR, 2023.

[41] David Weininger. Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of Chemical Information and Computer Sciences, 1988.

[43] Tianyu Wu, Yang Tang, Qiyu Sun, and Luolin Xiong. Molecular joint representation learning via multi-modal information of smiles and graphs. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2023.

[44] Zhenqin Wu, Bharath Ramsundar, Evan N Feinberg, Joseph Gomes, Caleb Geniesse, Aneesh S Pappu, Karl Leswing, and Vijay Pande. Moleculenet: a benchmark for molecular machine learning. Chemical Science, 2018.

[45] Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. How powerful are graph neural networks? arXiv preprint arXiv:1810.00826, 2018.

[46] Nan Yin, Li Shen, Chong Chen, Xian-sheng Hua, and Xiao Luo. Sport: A subgraph perspective on graph classification with label noise. TKDD, 2024.

[47] Nan Yin, Li Shen, Mengzhu Wang, Xiao Luo, Zhigang Luo, and Dacheng Tao. Omg: Towards effective graph classification against label noise. TKDE, 2023.

[48] Yasuhiro Yoshikai, Tadahaya Mizuno, Shumpei Nemoto, and Hiroyuki Kusuhara. Difficulty in chirality recognition for transformer architectures learning chemical structures from string representations. Nature Communications, 2024.

[49] Yuning You, Tianlong Chen, Yongduo Sui, Ting Chen, Zhangyang Wang, and Yang Shen. Graph contrastive learning with augmentations. NIPS, 2020.

[50] Ru Zhang, Yanmei Lin, Yijia Wu, Lei Deng, Hao Zhang, Mingzhi Liao, and Yuzhong Peng. Mvmrl: a multi-view molecular representation learning method for molecular property prediction. Briefings in Bioinformatics, 2024.

[51] Bangyi Zhao, Weixia Xu, Jihong Guan, and Shuigeng Zhou. Molecular property prediction based on graph structure learning. Bioinformatics, 2024.

[52] Yizhen Zheng, Huan Yee Koh, Jiaxin Ju, Anh TN Nguyen, Lauren T May, Geoffrey I Webb, and Shirui Pan. Large language models for scientific discovery in molecular property prediction. Nature Machine Intelligence, 2025.

[53] Gengmo Zhou, Zhifeng Gao, Qiankun Ding, Hang Zheng, Hongteng Xu, Zhewei Wei, Linfeng Zhang, and Guolin Ke. Uni-mol: A universal 3d molecular representation learning framework. In ICLR, 2023.

Notation summary

Symbol: Meaning
G_i = (V_i, E_i, X_i): Molecular graph for molecule i with atoms, bonds, and atom features
T_i: Text-derived molecula...