MOLAR: Learning Multimodal Molecular Representations from Noisy Labels
Pith reviewed 2026-06-27 01:14 UTC · model grok-4.3
The pith
MOLAR separates latent clean-property inference from recorded-label observation using graph and text views.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
MOLAR separates latent clean-property inference from recorded-label observation: graph and text views contribute residual evidence to a clean-property distribution, and a categorical label-observation channel maps this distribution to recorded labels for training. This formulation derives posterior label reliability and modality-specific molecular evidence from the model.
What carries the argument
The separation of a latent clean-property distribution, inferred from the graph and text views, from a categorical label-observation channel that maps that distribution to the recorded labels.
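To make the two-stage structure concrete, here is a minimal NumPy sketch of a forward pass and training loss of this general shape. It is not the paper's implementation: treating "residual evidence" as additive logits on a shared base, the function names, and the toy numbers are all assumptions made for illustration.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def clean_distribution(base_logits, graph_residual, text_residual):
    # ASSUMPTION: each view adds residual logits to a shared base.
    # Returns p(y_clean | x) with shape (n_molecules, n_classes).
    return softmax(base_logits + graph_residual + text_residual)

def recorded_distribution(p_clean, channel):
    # channel[k, j] = p(y_recorded = j | y_clean = k), rows sum to 1.
    # Marginalizes the latent clean label: sum_k p_clean[:, k] * channel[k, j].
    return p_clean @ channel

def noisy_nll(p_recorded, recorded_labels):
    # Negative log-likelihood of the recorded (possibly noisy) labels.
    idx = np.arange(len(recorded_labels))
    return -np.mean(np.log(p_recorded[idx, recorded_labels]))

# Toy example: two molecules, two classes (all values hypothetical).
base = np.zeros((2, 2))
graph_res = np.array([[2.0, 0.0], [0.0, 1.0]])
text_res = np.array([[1.0, 0.0], [0.0, 1.0]])
channel = np.array([[0.9, 0.1],
                    [0.2, 0.8]])

p_clean = clean_distribution(base, graph_res, text_res)
p_rec = recorded_distribution(p_clean, channel)
loss = noisy_nll(p_rec, np.array([0, 1]))
```

The point of the design is that the loss only ever touches `p_rec`, while predictions and diagnostics are read from `p_clean`.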
If this is right
- MOLAR outperforms baselines on naturally noisy molecular benchmarks.
- It also outperforms baselines on controlled label-flipping benchmarks.
- The model derives posterior label reliability scores.
- Visualization shows modality-specific evidence and reliability diagnostics.
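A posterior reliability score of the kind listed above can be sketched with Bayes' rule. The definition used here, the posterior probability that the clean label equals the recorded one, is an assumption; the paper's exact score may differ, and the channel and clean-distribution values are made up.

```python
import numpy as np

def clean_posterior(p_clean, channel, recorded_label):
    # p(y_clean = k | x, y_recorded) is proportional to
    # p(y_clean = k | x) * channel[k, y_recorded].
    joint = p_clean * channel[:, recorded_label]
    return joint / joint.sum()

def label_reliability(p_clean, channel, recorded_label):
    # ASSUMPTION: reliability = posterior mass on "clean label == recorded label".
    return clean_posterior(p_clean, channel, recorded_label)[recorded_label]

# Hypothetical channel: channel[k, j] = p(y_recorded = j | y_clean = k).
channel = np.array([[0.9, 0.1],
                    [0.2, 0.8]])
# The model is fairly sure the clean class is 0 for this molecule.
p_clean = np.array([0.95, 0.05])

r_agree = label_reliability(p_clean, channel, recorded_label=0)
r_disagree = label_reliability(p_clean, channel, recorded_label=1)
```

With these numbers a recorded label of 0 is judged highly reliable, while a recorded label of 1 receives a reliability of 0.04 / 0.135, about 0.30, and would be flagged as a likely corruption.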
Where Pith is reading between the lines
- Reliability scores could help curate better training sets from noisy databases.
- The method might generalize to other domains with noisy scientific annotations, such as imaging or sequence data.
- If the clean distribution is accurate, predictions could be more robust to changes in label collection methods.
Load-bearing premise
The proposed generative separation between the clean-property distribution and the label-observation channel can be learned from noisy data without needing clean labels or detailed noise models.
What would settle it
Controlled experiments showing that MOLAR performs no better than standard multimodal models when labels are flipped at known rates would falsify the utility of the separation.
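A controlled flipping protocol of this kind could look like the sketch below. Symmetric noise (a flipped label moves to a different class chosen uniformly) is an assumption; the paper's own corruption scheme is not specified in the text reviewed here, and `flip_labels` is a hypothetical helper.

```python
import numpy as np

def flip_labels(labels, flip_rate, num_classes, seed=0):
    # ASSUMPTION: symmetric noise. Each label is flipped with probability
    # flip_rate to a different class drawn uniformly at random.
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    flip_mask = rng.random(labels.shape[0]) < flip_rate
    # An offset in 1..num_classes-1 guarantees the new class differs.
    offsets = rng.integers(1, num_classes, size=labels.shape[0])
    noisy = np.where(flip_mask, (labels + offsets) % num_classes, labels)
    return noisy, flip_mask

clean = np.zeros(10000, dtype=int)
noisy, mask = flip_labels(clean, flip_rate=0.2, num_classes=2)
observed_rate = float(np.mean(noisy != clean))
```

Because `mask` records exactly which labels were corrupted, the same run can also check whether low reliability scores concentrate on the flipped examples.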
Original abstract
Motivation: Noisy labels are a common challenge in molecular property prediction because molecular annotations are often obtained from assays, curated databases, or weak annotation pipelines rather than directly observed clean biological states. Treating recorded labels as reliable supervision can cause models to memorize corrupted observations and learn misleading molecular evidence. In multimodal molecular representation learning, this issue can be amplified by graph-text fusion or alignment, which may propagate label-induced errors across modalities.
Results: We propose MOLAR, a noise-aware framework for learning multimodal molecular representations from noisy labels. MOLAR separates latent clean-property inference from recorded-label observation: graph and text views contribute residual evidence to a clean-property distribution, and a categorical label-observation channel maps this distribution to recorded labels for training. This formulation derives posterior label reliability and modality-specific molecular evidence from the model. Experiments on naturally noisy molecular benchmarks and controlled label-flipping benchmarks show that MOLAR consistently outperforms representative baselines. Visualization analyses further show that MOLAR provides interpretable reliability and modality-evidence diagnostics.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes MOLAR, a noise-aware framework for multimodal molecular representation learning. It models the generative process by separating latent clean-property inference (from graph and text views) from a categorical label-observation channel that maps the clean distribution to recorded noisy labels, enabling derivation of posterior label reliability and modality-specific evidence. Experiments on naturally noisy molecular benchmarks and controlled label-flipping settings show consistent outperformance over baselines, with additional visualization analyses for interpretability.
Significance. If the separation between clean-property distribution and label-observation channel is identifiable and learnable from noisy supervision alone, the framework could provide a principled way to obtain interpretable reliability diagnostics in a domain where assay-derived labels are frequently noisy; the multimodal aspect and empirical gains on both natural and synthetic noise would be of practical interest to molecular ML.
major comments (1)
- [Method (generative factorization and posterior derivation)] The central claim rests on recovering the factorization p(y_recorded | x) = ∫ p(y_recorded | y_clean) p(y_clean | x_graph, x_text) dy_clean by maximum likelihood on observed (x, y_recorded) pairs alone. Without an explicit noise-transition matrix, anchor points, or a clean-label subset, the observed-data likelihood is invariant under reparameterizations that trade probability mass between the clean posterior and the observation channel; consequently the derived posterior reliabilities and modality-specific evidence are not guaranteed to recover the intended latent quantities. This identifiability issue is load-bearing for the separation and interpretability claims (see the generative model description and the derivation of posteriors).
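The invariance the referee describes can be checked numerically on a toy case. In the sketch below, two different (clean posterior, channel) pairs produce identical recorded-label distributions, and therefore identical likelihoods, yet disagree sharply on reliability. All numbers are hypothetical, and the example assumes the clean-property network is flexible enough to represent either posterior.

```python
import numpy as np

def reliability(p_clean, channel, y_rec):
    # Posterior probability that the clean label equals the recorded one.
    joint = p_clean * channel[:, y_rec]
    return joint[y_rec] / joint.sum()

# Model A: sharp clean posteriors plus a noisy observation channel.
P_a = np.array([[0.95, 0.05],
                [0.30, 0.70]])
T_a = np.array([[0.9, 0.1],
                [0.2, 0.8]])

# Model B: the noise is absorbed into the "clean" posterior and the
# channel is the identity, i.e. labels are declared noise-free.
P_b = P_a @ T_a
T_b = np.eye(2)

obs_a = P_a @ T_a  # p(y_recorded | x) under model A
obs_b = P_b @ T_b  # p(y_recorded | x) under model B

# Same molecule, same recorded label, very different diagnostics.
rel_a = reliability(P_a[0], T_a, y_rec=1)
rel_b = reliability(P_b[0], T_b, y_rec=1)
```

Here `obs_a` and `obs_b` coincide, so maximum likelihood on recorded labels cannot prefer one model over the other, while `rel_a` is about 0.30 and `rel_b` is exactly 1. Any preference for model A has to come from extra structure such as architecture, regularization, or anchor points.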
minor comments (2)
- [Abstract and Experiments] The abstract and results section state that MOLAR 'consistently outperforms representative baselines' but do not list the exact baselines, the magnitude of gains, or statistical significance tests; adding these details would strengthen the experimental claims.
- [Method] Notation for the clean-property distribution and the observation channel should be introduced with explicit equations early in the method section to avoid ambiguity when later referring to 'residual evidence' and 'posterior reliability'.
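As an illustration of the explicit notation the second minor comment asks for, the quantities could be written as follows. The symbols \pi_k, Q_{kj}, and K are placeholders chosen here, not the paper's notation; because the channel is categorical, the marginalization over the clean label is a finite sum.

```latex
\begin{align}
\pi_k(x) &= p(y_{\mathrm{clean}} = k \mid x_{\mathrm{graph}}, x_{\mathrm{text}}), \\
Q_{kj} &= p(y_{\mathrm{recorded}} = j \mid y_{\mathrm{clean}} = k), \\
p(y_{\mathrm{recorded}} = j \mid x) &= \sum_{k=1}^{K} \pi_k(x)\, Q_{kj}, \\
p(y_{\mathrm{clean}} = k \mid x, y_{\mathrm{recorded}} = j)
  &= \frac{\pi_k(x)\, Q_{kj}}{\sum_{k'=1}^{K} \pi_{k'}(x)\, Q_{k'j}}.
\end{align}
```

Under this notation, "posterior reliability" of a recorded label j would naturally be the last expression evaluated at k = j, though the paper may define it differently.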
Simulated Author's Rebuttal
We thank the referee for the thoughtful and detailed review. The identifiability concern regarding the generative factorization is a substantive point that we address directly below. We believe the empirical evidence and modeling choices provide support for the framework's utility, while acknowledging the theoretical subtlety.
Referee: [Method (generative factorization and posterior derivation)] The central claim rests on recovering the factorization p(y_recorded | x) = ∫ p(y_recorded | y_clean) p(y_clean | x_graph, x_text) dy_clean by maximum likelihood on observed (x, y_recorded) pairs alone. Without an explicit noise-transition matrix, anchor points, or a clean-label subset, the observed-data likelihood is invariant under reparameterizations that trade probability mass between the clean posterior and the observation channel; consequently the derived posterior reliabilities and modality-specific evidence are not guaranteed to recover the intended latent quantities. This identifiability issue is load-bearing for the separation and interpretability claims (see the generative model description and the derivation of posteriors).
Authors: We agree that identifiability of the clean-property posterior and the label-observation channel from noisy observations alone is not guaranteed in general without additional structure. Our formulation parameterizes the observation channel as a learnable categorical conditional distribution p(y_recorded | y_clean) that is jointly optimized with the multimodal clean-property inference network via the observed-data marginal likelihood. The multimodal (graph + text) inputs supply complementary evidence that, in practice, regularizes the decomposition. While we do not claim unique recovery of the ground-truth latent quantities, the resulting model yields (i) improved predictive performance on both naturally noisy and controlled label-flip benchmarks and (ii) post-hoc reliability and modality-evidence scores that align with domain expectations in visualization analyses. We will revise the manuscript to include an explicit discussion of the modeling assumptions, the lack of anchor-point or clean-label supervision, and the empirical rather than theoretical guarantees on the recovered posteriors.
Revision: partial
Circularity Check
No circularity: derivation not reducible to inputs in provided text
The abstract describes a generative separation between clean-property distribution and label-observation channel but supplies no equations, no explicit likelihood, and no derivation steps. Without visible formulas showing that posterior reliabilities reduce to fitted parameters by construction or that any quantity is renamed as a prediction, no load-bearing step matches the enumerated circularity patterns. The central claim remains a modeling assumption whose identifiability is an external statistical question rather than an internal definitional collapse. The paper is therefore self-contained against the supplied text; external benchmarks or full equations would be required to raise the score.