SPECTRA: Spectral Domain-Aware Graph Generation for Imbalanced Molecular Property Regression

Brenda Nogueira; Gisela A. Gonzalez-Montiel; Meng Jiang; Nitesh V. Chawla; Nuno Moniz

arxiv: 2511.04838 · v2 · pith:6BYWBHWMnew · submitted 2025-11-06 · 💻 cs.LG · math.SP · q-bio.MN

Pith reviewed 2026-05-22 12:55 UTC · model grok-4.3

classification 💻 cs.LG · math.SP · q-bio.MN

keywords molecular property regression · graph generation · spectral methods · imbalanced regression · Laplacian spectra · graph neural networks · Chebyshev convolutions

The pith

SPECTRA generates molecular graphs by interpolating Laplacian spectra to improve regression on rare but relevant property targets.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces SPECTRA to tackle imbalanced molecular property regression, where standard methods fail on chemically important but underrepresented target values. It uses a rarity-aware budgeting scheme to focus generation where data are scarce, aligns graphs via target neighbors to establish structural correspondence, and interpolates Laplacian spectra together with node features and targets to create new data points. These synthetic graphs feed into a spectral GNN that employs edge-aware Chebyshev convolutions. According to the abstract, the approach is competitive with leading methods in the relevant target ranges while requiring roughly four times less computation time. This matters because it offers a way to produce useful molecular representations instead of the meaningless ones that arise from simple oversampling.

Core claim

SPECTRA shows that a combination of rarity-aware budgeting, target-neighbor graph alignment, and direct interpolation across Laplacian spectra, node features, and targets produces synthetic molecular graphs that, when paired with edge-aware Chebyshev spectral convolutions, raise prediction accuracy specifically in the underrepresented yet chemically relevant ranges of molecular properties.
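
To make the spectral interpolation idea concrete, here is a minimal pure-Python sketch under loud assumptions: the two graphs have the same number of nodes and are already aligned, only the Laplacian eigenvalues are blended, and the eigenvectors of the first graph are reused. None of these choices is confirmed by the abstract, and `jacobi_eigh`, `laplacian`, and `interpolate_spectrum` are hypothetical helper names, not the authors' code.

```python
import math

def jacobi_eigh(matrix, sweeps=50, tol=1e-20):
    """Eigen-decompose a small symmetric matrix with cyclic Jacobi rotations.

    Returns (values, vectors): values sorted ascending, and column k of
    `vectors` is the eigenvector paired with values[k].
    """
    n = len(matrix)
    a = [list(map(float, row)) for row in matrix]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                y, x = 2.0 * a[p][q], a[q][q] - a[p][p]
                if y == 0.0:
                    continue
                if x < 0.0:  # keep the rotation angle within [-pi/4, pi/4]
                    y, x = -y, -x
                theta = 0.5 * math.atan2(y, x)
                c, s = math.cos(theta), math.sin(theta)
                for m in (a, v):  # column rotation: A <- A J and V <- V J
                    for k in range(n):
                        mp, mq = m[k][p], m[k][q]
                        m[k][p], m[k][q] = c * mp - s * mq, s * mp + c * mq
                for k in range(n):  # row rotation: A <- J^T A
                    ap, aq = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * ap - s * aq, s * ap + c * aq
    order = sorted(range(n), key=lambda k: a[k][k])
    values = [a[k][k] for k in order]
    vectors = [[v[i][k] for k in order] for i in range(n)]
    return values, vectors

def laplacian(adj):
    """Combinatorial Laplacian L = D - A of a (possibly weighted) adjacency matrix."""
    n = len(adj)
    return [[sum(adj[i]) if i == j else -adj[i][j] for j in range(n)]
            for i in range(n)]

def interpolate_spectrum(lap_a, lap_b, alpha):
    """Blend eigenvalues of two Laplacians; reuse lap_a's eigenvectors (assumption)."""
    vals_a, vecs_a = jacobi_eigh(lap_a)
    vals_b, _ = jacobi_eigh(lap_b)
    mixed = [(1.0 - alpha) * x + alpha * y for x, y in zip(vals_a, vals_b)]
    n = len(mixed)
    lap_mix = [[sum(vecs_a[i][k] * mixed[k] * vecs_a[j][k] for k in range(n))
                for j in range(n)] for i in range(n)]
    return mixed, lap_mix

# Toy graphs: a three-node path and a triangle.
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
mixed, lap_mix = interpolate_spectrum(laplacian(path), laplacian(triangle), 0.5)
```

In this toy case the blended Laplacian implies a 0-2 edge of weight 0.5, halfway between absent (path) and present (triangle). Fractional weights like this are what a reconstruction step would still need to turn into valid bonds.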

What carries the argument

Rarity-aware interpolation of Laplacian spectra with target-neighbor alignment for synthetic molecular graph generation.

If this is right

Prediction accuracy rises for the scarce but chemically important molecular property ranges.
Computational cost drops by a factor of about four relative to the leading state-of-the-art methods the paper compares against.
Generated graphs remain chemically meaningful, unlike the meaningless structures that oversampling often creates.
The same spectral GNN backbone with edge-aware Chebyshev convolutions integrates directly with the new data without architectural changes.
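
For readers unfamiliar with Chebyshev convolutions, the following is a minimal sketch of the standard Chebyshev polynomial filter used in spectral GNNs, applied to a single scalar node signal. It is not the paper's edge-aware layer, whose exact form is not given here. Two assumptions are made for illustration: edge attributes enter only as scalar weights in the adjacency matrix (for example bond orders), and the largest eigenvalue of the normalized Laplacian is approximated by 2. The function names are hypothetical.

```python
import math

def scaled_laplacian(adj):
    """Return L~ = L_sym - I = -D^{-1/2} A D^{-1/2}, assuming lambda_max ~ 2."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in deg]
    return [[-inv_sqrt[i] * adj[i][j] * inv_sqrt[j] for j in range(n)]
            for i in range(n)]

def matvec(m, x):
    """Dense matrix-vector product."""
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in m]

def cheb_filter(adj, x, theta):
    """Compute sum_k theta[k] * T_k(L~) x with the Chebyshev recurrence.

    T_0 x = x, T_1 x = L~ x, T_k x = 2 L~ T_{k-1} x - T_{k-2} x.
    """
    lap = scaled_laplacian(adj)
    t_prev, t_curr = x[:], matvec(lap, x)
    out = [theta[0] * v for v in t_prev]
    if len(theta) > 1:
        out = [o + theta[1] * v for o, v in zip(out, t_curr)]
    for k in range(2, len(theta)):
        t_next = [2.0 * a - b for a, b in zip(matvec(lap, t_curr), t_prev)]
        out = [o + theta[k] * v for o, v in zip(out, t_next)]
        t_prev, t_curr = t_curr, t_next
    return out

# Toy three-atom chain with a single bond (weight 1) and a double bond (weight 2).
adj = [[0, 1, 0], [1, 0, 2], [0, 2, 0]]
x = [1.0, 0.0, 0.0]
y = cheb_filter(adj, x, theta=[0.5, 0.3, 0.2])
```

A filter of order K mixes information from nodes up to K hops away, which is why a signal placed on atom 0 reaches atom 2 only through the second-order term.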

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The spectral interpolation technique could transfer to other graph regression settings where target values are unevenly distributed.
Because the method works directly in the Laplacian domain, it may reveal structure-property links that are harder to see in raw coordinate or fingerprint representations.
Scaling the rarity-aware budget to very large molecular libraries could test whether the fourfold speed gain holds when dataset size increases.

Load-bearing premise

Interpolating Laplacian spectra together with node features and targets produces chemically valid and distributionally useful molecular graphs that improve downstream regression on underrepresented targets rather than introducing artifacts or noise.
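
One way to probe this premise is to check whether an interpolated graph can be turned into something that respects basic valence limits. The sketch below is a deliberately simplified stand-in for a real cheminformatics sanitization step: it snaps real-valued edge weights to integer bond orders and flags atoms whose summed bond order exceeds a small, hard-coded valence table. The table, the threshold, and the function names are all illustrative assumptions, not part of the paper.

```python
# Simplified maximum valences for neutral atoms (illustrative assumption).
MAX_VALENCE = {"H": 1, "C": 4, "N": 3, "O": 2, "F": 1}

def round_to_bonds(weights, threshold=0.5):
    """Snap a real-valued symmetric weight matrix to integer bond orders.

    Weights below `threshold` become 0 (no bond); others round half up.
    """
    n = len(weights)
    return [[int(weights[i][j] + 0.5) if i != j and weights[i][j] >= threshold else 0
             for j in range(n)] for i in range(n)]

def valence_violations(atoms, bond_orders):
    """Return indices of atoms whose summed bond order exceeds MAX_VALENCE."""
    return [i for i, symbol in enumerate(atoms)
            if sum(bond_orders[i]) > MAX_VALENCE[symbol]]

# Toy interpolated graph: one carbon and two oxygens with fractional edge weights.
atoms = ["C", "O", "O"]
weights = [[0.0, 1.6, 0.9],
           [1.6, 0.0, 1.2],
           [0.9, 1.2, 0.0]]
bonds = round_to_bonds(weights)
violations = valence_violations(atoms, bonds)
```

Here the first oxygen ends up with a total bond order of 3, which exceeds its allowed valence of 2, so the candidate would be rejected or repaired before being used for training.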

What would settle it

Direct validation showing that the generated graphs violate chemical rules or that prediction error on rare target ranges remains unchanged or worsens compared with standard training would falsify the central claim.
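
The second half of this test, error on rare target ranges, can be measured by binning the mean absolute error by target value rather than averaging over the whole test set. The sketch below shows one way to do that; the bin edges and the toy numbers are made up for illustration and are not results from the paper.

```python
def mae_by_range(y_true, y_pred, edges):
    """Mean absolute error per target bin.

    Bin i covers [edges[i], edges[i + 1]); the last bin also includes its
    upper edge. Empty bins are reported as None.
    """
    n_bins = len(edges) - 1
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for t, p in zip(y_true, y_pred):
        for i in range(n_bins):
            upper_ok = t < edges[i + 1] or (i == n_bins - 1 and t == edges[-1])
            if edges[i] <= t and upper_ok:
                sums[i] += abs(t - p)
                counts[i] += 1
                break
    return [s / c if c else None for s, c in zip(sums, counts)]

# Toy data: four common targets near 1.0 and one rare target at 4.5.
y_true = [1.0, 1.2, 0.8, 1.1, 4.5]
y_pred = [1.1, 1.0, 0.9, 1.1, 3.0]
per_range = mae_by_range(y_true, y_pred, edges=[0.0, 2.0, 4.0, 6.0])
overall = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
```

In this toy example the overall MAE is about 0.38, while the rare bin has an MAE of 1.5. An aggregate number can therefore hide exactly the failure the paper is concerned with.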

Figures

Figures reproduced from arXiv: 2511.04838 by Brenda Nogueira, Gisela A. Gonzalez-Montiel, Meng Jiang, Nitesh V. Chawla, Nuno Moniz.

**Figure 2.** Pipeline of spectral molecular interpolation. Molecular graphs are first aligned via Gro…

**Figure 3.** Joint distribution plots of molecular properties versus task targets for original (blue, cir…

**Figure 4.** Mean Absolute Error (MAE) distribution across target value ranges for each dataset. Colors correspond to different models as indicated in the legend.

**Figure 5.** Time vs. MAE across models and datasets. Each point represents the average runtime (log scale) and mean absolute error (MAE) of a model–dataset pair. Black hollow circles and connecting lines indicate the Pareto frontier for each dataset.

Original abstract

Molecular property regression struggles with cases in chemically relevant target ranges that are underrepresented in datasets. Standard average error minimization approaches underperform in these highly relevant cases, and oversampling approaches lead to meaningless molecular representations. In this paper, we propose SPECTRA, a spectral, domain-aware graph generation method designed to improve the prediction of underrepresented but relevant molecular property values. It combines a rarity-aware budgeting scheme to focus generation where data are scarce, target-neighbors graph alignment to establish structural correspondence, and interpolation of Laplacian spectra, node features, and targets. Coupled with spectral GNN using edge-aware Chebyshev convolutions, SPECTRA shows its effectiveness in property prediction benchmarks with competitive performance over leading state-of-the-art methods in relevant target ranges, while requiring ~4x less computational time.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

SPECTRA introduces rarity-aware spectral interpolation for generating molecular graphs to address imbalance in property regression, but the abstract leaves the experimental support and graph validity unclear.

The core idea in this paper is a graph generation pipeline called SPECTRA that targets underrepresented molecular property values. It uses a rarity-aware budgeting step to decide where to generate, aligns graphs by target neighbors, then interpolates Laplacian spectra together with node features and the target values themselves. The generated graphs feed into an edge-aware Chebyshev spectral GNN. The abstract claims performance competitive with leading methods on the relevant target ranges while requiring about four times less computation time.
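
The pairing and interpolation steps in this pipeline can be sketched in a few lines. This is a hedged illustration only: it assumes that "target neighbors" means the samples whose target values are closest to a seed's, and that node features and targets are mixed by a simple convex combination once nodes have been put in correspondence. The structural alignment itself is not reproduced, and both function names are hypothetical.

```python
def target_neighbors(targets, seed, k):
    """Indices of the k samples whose targets are closest to targets[seed]."""
    others = [i for i in range(len(targets)) if i != seed]
    others.sort(key=lambda i: abs(targets[i] - targets[seed]))
    return others[:k]

def interpolate_pair(feat_a, feat_b, y_a, y_b, alpha):
    """Convex combination of aligned node-feature matrices and scalar targets."""
    feats = [[(1.0 - alpha) * a + alpha * b for a, b in zip(row_a, row_b)]
             for row_a, row_b in zip(feat_a, feat_b)]
    return feats, (1.0 - alpha) * y_a + alpha * y_b

# Toy targets: sample 3 sits in a sparse, high-value region.
targets = [1.0, 1.2, 3.9, 4.0, 0.5]
partners = target_neighbors(targets, seed=3, k=2)

# Two aligned graphs with two nodes each and one-hot node features.
feat_a = [[1.0, 0.0], [0.0, 1.0]]
feat_b = [[0.0, 1.0], [0.0, 1.0]]
new_feats, new_y = interpolate_pair(feat_a, feat_b, targets[3], targets[2], 0.5)
```

Pairing by target proximity keeps the interpolated label close to both parents, so a synthetic sample generated for a rare range stays in that range.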

Referee Report

2 major / 2 minor

Summary. The manuscript presents SPECTRA, a spectral domain-aware graph generation method for imbalanced molecular property regression. It introduces a rarity-aware budgeting scheme, target-neighbor graph alignment, and interpolation of Laplacian spectra together with node features and targets. These generated graphs are used to augment training for a spectral GNN employing edge-aware Chebyshev convolutions. The central claim is that this yields competitive performance on property prediction benchmarks in relevant (underrepresented) target ranges while requiring approximately 4x less computational time than leading state-of-the-art methods.

Significance. If the quantitative claims and chemical validity of the generated graphs are substantiated, the work could provide a useful contribution to handling data imbalance in molecular machine learning. The spectral interpolation approach offers a domain-specific alternative to generic oversampling, with potential for improved focus on chemically relevant but scarce property values.

major comments (2)

[Abstract] Abstract: the claim of 'competitive performance over leading state-of-the-art methods in relevant target ranges' and '~4x less computational time' is stated without any quantitative metrics, error bars, dataset details, ablation studies, or specific benchmark numbers. This absence makes it impossible to evaluate whether the central claim is supported by the experiments.
[Method] Method section (spectral interpolation and reconstruction): separate interpolation of Laplacian eigenvalues/eigenvectors, node features, and scalar targets does not include an explicit reconstruction procedure that enforces molecular constraints such as valence rules, bond orders, or RDKit sanitization. Because Laplacian spectra are not graph-unique, the resulting adjacency matrices may produce chemically invalid or non-isomorphic structures that act as noise rather than useful augmentations for rare targets.

minor comments (2)

[Abstract] The abstract refers to 'oversampling approaches lead to meaningless molecular representations' without citing specific prior works or explaining why the proposed spectral method avoids the same issue.
[Method] Notation for the rarity-aware budgeting parameters and the target-neighbor alignment procedure should be introduced with explicit equations or pseudocode for reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments on our manuscript. We address the major comments point by point below and have updated the manuscript accordingly to improve clarity and address concerns about the presentation of results and methodological details.

Referee: [Abstract] Abstract: the claim of 'competitive performance over leading state-of-the-art methods in relevant target ranges' and '~4x less computational time' is stated without any quantitative metrics, error bars, dataset details, ablation studies, or specific benchmark numbers. This absence makes it impossible to evaluate whether the central claim is supported by the experiments.

Authors: We agree with the referee that the abstract would be strengthened by the inclusion of specific quantitative metrics to support our claims. Due to space limitations in the original abstract, we focused on a high-level summary. In the revised manuscript, we have updated the abstract to include key benchmark performance numbers, error bars where applicable, dataset information, and references to ablation studies, while keeping it concise. The detailed experimental results, including comparisons with state-of-the-art methods, remain fully documented in the main body of the paper.
revision: yes

Referee: [Method] Method section (spectral interpolation and reconstruction): separate interpolation of Laplacian eigenvalues/eigenvectors, node features, and scalar targets does not include an explicit reconstruction procedure that enforces molecular constraints such as valence rules, bond orders, or RDKit sanitization. Because Laplacian spectra are not graph-unique, the resulting adjacency matrices may produce chemically invalid or non-isomorphic structures that act as noise rather than useful augmentations for rare targets.

Authors: This is a valid concern, as non-unique spectra could indeed lead to invalid molecular graphs if not properly handled. Our method incorporates target-neighbor graph alignment to establish correspondence and guide the interpolation towards chemically meaningful structures. To explicitly address this, we have added a detailed description of the reconstruction procedure in the revised Method section. This includes steps for converting interpolated spectra back to adjacency matrices, followed by RDKit-based sanitization, enforcement of valence rules, and bond order validation. Furthermore, we have included quantitative results on the chemical validity of the generated graphs in the experimental evaluation to demonstrate that they serve as useful augmentations rather than noise.
revision: yes

Circularity Check

0 steps flagged

SPECTRA introduces independent algorithmic components (rarity budgeting, spectral interpolation) evaluated on external benchmarks with no reduction to fitted inputs or self-definitional claims.

full rationale

The derivation chain proposes a new combination of rarity-aware budgeting, target-neighbors graph alignment, and interpolation of Laplacian spectra/node features/targets, then couples it to an edge-aware Chebyshev spectral GNN. Performance claims rest on empirical benchmarks against SOTA methods rather than any equation that forces the reported gains by construction. No load-bearing self-citation or uniqueness theorem is invoked to justify the core method; any minor self-citations (if present) are not central to the result. The approach remains self-contained against external validation data.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The approach rests on the domain assumption that Laplacian spectra are a suitable basis for interpolating molecular graphs and that rarity-aware allocation plus target alignment will yield useful training examples; no explicit free parameters or invented entities are named in the abstract.

free parameters (1)

rarity-aware budgeting parameters
Controls how generation effort is allocated to scarce target regions; must be chosen or tuned.
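
To illustrate what such parameters might look like, here is a minimal sketch of a rarity-aware budget built on a Gaussian kernel density estimate of the labels, which this page links to the budgeting scheme. The rarity definition (one minus the normalized density), the proportional allocation rule, and the default bandwidth are assumptions made for illustration. The paper's actual formula is not given here.

```python
import math

def gaussian_kde(values, bandwidth):
    """Gaussian kernel density estimate evaluated at each value in `values`."""
    n = len(values)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((v - u) / bandwidth) ** 2) for u in values)
            for v in values]

def rarity_budget(targets, total_budget, bandwidth=0.5):
    """Split `total_budget` synthetic samples across seeds, favouring rare targets."""
    density = gaussian_kde(targets, bandwidth)
    d_max = max(density)
    # Hypothetical rarity score: high where the label density is low.
    rarity = [1.0 - d / d_max for d in density]
    total = sum(rarity)
    if total == 0.0:  # all labels equally dense: nothing counts as rare
        return [0] * len(targets)
    # Shares are floored, so a few samples of the budget may stay unallocated.
    return [int(total_budget * r / total) for r in rarity]

# Toy labels: five common values near 1.0 and one rare value at 4.0.
targets = [1.0, 1.1, 0.9, 1.05, 0.95, 4.0]
budget = rarity_budget(targets, total_budget=10)
```

With these toy labels almost the entire budget goes to the isolated target at 4.0. The bandwidth and the total budget are the knobs that would need to be chosen or tuned.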

axioms (1)

domain assumption: Laplacian spectra, node features, and targets can be meaningfully interpolated to produce valid molecular graphs
Core generation step invoked in the method description.

pith-pipeline@v0.9.0 · 5683 in / 1341 out tokens · 35010 ms · 2026-05-22T12:55:57.153093+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Paper passage: rarity-aware budgeting scheme derived from kernel density estimation of labels

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.


[1] [1]

Overcoming class imbalance in drug discovery problems: Graph neural networks and balancing approaches

Rafael Lopes Almeida, Vinícius Gonçalves Maltarollo, and Frederico Gualberto Ferreira Coelho. Overcoming class imbalance in drug discovery problems: Graph neural networks and balancing approaches. Journal of Molecular Graphics and Modelling, 126: 0 108627, 2024

work page 2024

[2] Steven H Bertz. The first general index of molecular complexity. Journal of the American Chemical Society, 103(12):3599–3601, 1981.

[3] G Richard Bickerton, Gaia V Paolini, Jérémy Besnard, Sorel Muresan, and Andrew L Hopkins. Quantifying the chemical beauty of drugs. Nature Chemistry, 4(2):90–98, 2012.

[4] Deyu Bo, Chuan Shi, Lele Wang, and Renjie Liao. Specformer: Spectral graph neural networks meet transformers. arXiv preprint arXiv:2303.01028, 2023a.

[5] Deyu Bo, Chuan Zheng, Xinchen Wang, Peipei Jiao, Shirui Zhou, Hao Zhang, Zhewei Wei, and Chuan Shi. A survey on spectral graph neural networks. arXiv preprint arXiv:2302.05631, 2023b.

[6] Paula Branco, Luís Torgo, and Rita P Ribeiro. SMOGN: a pre-processing approach for imbalanced regression. In First International Workshop on Learning with Imbalanced Domains: Theory and Applications, pp. 36–50. PMLR, 2017.

[7] Kaidi Cao, Colin Wei, Adrien Gaidon, Nikos Arechiga, and Tengyu Ma. Learning imbalanced datasets with label-distribution-aware margin loss. Advances in Neural Information Processing Systems, 32, 2019.

[8] Nitesh V Chawla, Kevin W Bowyer, Lawrence O Hall, and W Philip Kegelmeyer. SMOTE: synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16:321–357, 2002.

[9] Yangyang Chen, Zixu Wang, Lei Wang, Jianmin Wang, Pengyong Li, Dongsheng Cao, Xiangxiang Zeng, Xiucai Ye, and Tetsuya Sakurai. Deep generative model for drug design from protein target sequence. Journal of Cheminformatics, 15(1):38, 2023.

[10] Yin Cui, Menglin Jia, Tsung-Yi Lin, Yang Song, and Serge Belongie. Class-balanced loss based on effective number of samples. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9268–9277, 2019.

[11] Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. Convolutional neural networks on graphs with fast localized spectral filtering. Advances in Neural Information Processing Systems, 29, 2016.

[12] Peter Ertl and Ansgar Schuffenhauer. Estimation of synthetic accessibility score of drug-like molecules based on molecular complexity and fragment contributions. Journal of Cheminformatics, 1(1):8, 2009.

[13] Peter Ertl, Silvio Roggo, and Ansgar Schuffenhauer. Natural product-likeness score and its application for prioritization of compound libraries. Journal of Chemical Information and Modeling, 48(1):68–74, 2008.

[14] Zhe Fan, Junda Yu, Xiangyu Zhang, Yuhan Chen, Shuqian Sun, Yuyang Zhang, Ming Chen, Feng Xiao, Wei Wu, Xiang-Nan Li, et al. Reducing overconfident errors in molecular property classification using posterior network. Patterns, 2024.

[15] Daniel Flam-Shepherd, Kevin Zhu, and Alán Aspuru-Guzik. Language models can learn complex molecular distributions. Nature Communications, 13(1):3293, 2022.

[16] Yu Gong, Greg Mori, and Frederick Tung. RankSim: Ranking similarity regularization for deep imbalanced regression. arXiv preprint arXiv:2205.15236, 2022.

[17] Woosung Jeon and Dongsup Kim. Autonomous molecule generation using reinforcement learning and docking to develop potential novel inhibitors. Scientific Reports, 10(1):22104, 2020.

[18] Wengong Jin, Regina Barzilay, and Tommi Jaakkola. Junction tree variational autoencoder for molecular graph generation. International Conference on Machine Learning, pp. 2323–2332, 2018.

[19] Mohammadreza Karamad, Rishi Magar, Yanming Shi, Samira Siahrostami, Ian D Gates, and Amir Barati Farimani. Orbital graph convolutional neural network for material property prediction. Physical Review Materials, 4(9):093801, 2020.

[20] Yash Khemchandani, Stephen O’Hagan, Soumitra Samanta, Neil Swainston, Timothy J Roberts, Danushka Bollegala, and Douglas B Kell. DeepGraphMolGen, a multi-objective, computational strategy for generating molecules with desirable properties: a graph convolution and reinforcement learning approach. Journal of Cheminformatics, 12(1):53, 2020.

[21] Myeonghun Lee and Kyoungmin Min. MGCVAE: multi-objective inverse design via molecular graph conditional variational autoencoder. Journal of Chemical Information and Modeling, 62(12):2943–2950, 2022.

[22] Tianyi Li, Hongxu Yin, Chuan Shi, and Wei Lin. Large-scale spectral graph neural networks via Laplacian sparsification: Technical report. arXiv preprint arXiv:2501.04570, 2025.

[23] Jaechang Lim, Seongok Ryu, Kyubyong Park, Yo Jun Choe, Jiyeon Ham, and Woo Youn Kim. Predicting drug-target interaction using a novel graph neural network with 3D structure-embedded graph representation. Journal of Chemical Information and Modeling, 59(9):3981–3988, 2019.

[24] Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, pp. 2980–2988, 2017.

[25] Gang Liu, Tong Zhao, Eric Inae, Tengfei Luo, and Meng Jiang. Semi-supervised graph imbalanced regression. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 1453–1465, 2023a.

[26] Gang Liu, Tong Zhao, Eric Inae, Tengfei Luo, and Meng Jiang. Semi-supervised graph imbalanced regression. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD '23, pp. 1453–1465, New York, NY, USA, 2023b. Association for Computing Machinery. ISBN 9798400701030. doi:10.1145/3580305.3599497. URL https://doi.org/10....

[27] Aditya Krishna Menon, Sadeep Jayasumana, Ankit Singh Rawat, Himanshu Jain, Andreas Veit, and Sanjiv Kumar. Long-tail learning via logit adjustment. arXiv preprint arXiv:2007.07314, 2020.

[28] Oleksii Prykhodko, Simon Viet Johansson, Panagiotis-Christos Kotsias, Josep Arús-Pous, Esben Jannik Bjerrum, Ola Engkvist, and Hongming Chen. A de novo molecular generation method using latent vector based generative adversarial network. Journal of Cheminformatics, 11(1):74, 2019.

[29] Jiawei Ren, Mingyuan Zhang, Cunjun Yu, and Ziwei Liu. Balanced MSE for imbalanced visual regression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7926–7935, 2022.

[30] Rita P. Ribeiro and Nuno Moniz. Imbalanced regression and extreme value prediction. Machine Learning, 109(9):1803–1835, 2020a.

[31] Rita P. Ribeiro and Nuno Moniz. Imbalanced regression and extreme value prediction. Machine Learning, 109(9):1803–1835, September 2020b. ISSN 1573-0565. doi:10.1007/s10994-020-05900-9. URL https://doi.org/10.1007/s10994-020-05900-9

[32] Jerret Ross, Brian Belgodere, Vijil Chenthamarakshan, Inkit Padhi, Youssef Mroueh, and Payel Das. Large-scale chemical language representations capture molecular structure and properties. Nature Machine Intelligence, 4(12):1256–1264, 2022.

[33] Junjiao Tian, Yen-Cheng Liu, Nathaniel Glaser, Yen-Chang Hsu, and Zsolt Kira. Posterior re-calibration for imbalanced datasets. Advances in Neural Information Processing Systems, 33:8101–8113, 2020.

[34] Jessica Vamathevan, Dominic Clark, Paul Czodrowski, Ian Dunham, Edgardo Ferran, George Lee, Bin Li, Anant Madabhushi, Parantu Shah, Michaela Spitzer, et al. Applications of machine learning in drug discovery and development. Nature Reviews Drug Discovery, 18(6):463–477, 2019.

[35] Xiyuan Wang and Ming Zhang. How powerful are spectral graph neural networks. arXiv preprint arXiv:2205.11172, 2022.

[36] Yuyang Wang, Jianren Wang, Zhonglin Cao, and Amir Barati Farimani. Molecular contrastive learning of representations via graph neural networks. Nature Machine Intelligence, 4(3):279–287, 2022.

[37] Scott A Wildman and Gordon M Crippen. Prediction of physicochemical parameters by atomic contributions. Journal of Chemical Information and Computer Sciences, 39(5):868–873, 1999.

[38] Zhenqin Wu, Bharath Ramsundar, Evan N Feinberg, Joseph Gomes, Caleb Geniesse, Aneesh S Pappu, Karl Leswing, and Vijay Pande. MoleculeNet: a benchmark for molecular machine learning. Chemical Science, 9(2):513–530, 2018.

[39] Ruoyan Xia, Chao Zhang, and Yongdong Zhang. A novel graph oversampling framework for node classification in class-imbalanced graphs. Science China Information Sciences, 67(1):162101, 2024.

[40] Zhaoping Xiong, Dingyan Wang, Xiaohong Liu, Feisheng Zhong, Xutong Wan, Xiang Li, Zhaojian Li, Xiaomin Luo, Kaixian Chen, Hualiang Jiang, et al. Attentive FP: Augmenting graph neural networks with attentive message passing for molecular property prediction. Journal of Chemical Information and Modeling, 60(6):2213–2228, 2020.

[41] Kaiqi Yang, Haoyu Han, Wei Jin, and Hui Liu. Spectral-aware augmentation for enhanced graph representation learning. In Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, pp. 2837–2847, 2024.

[42] Yuzhe Yang, Kaiwen Zha, Yingcong Chen, Hao Wang, and Dina Katabi. Delving into deep imbalanced regression. In International Conference on Machine Learning, pp. 11842–11851. PMLR, 2021.

[43] Rufan Yao, Zhenhua Shen, Xinyi Xu, Guixia Ling, Rongwu Xiang, Tingyan Song, Fei Zhai, and Yuxuan Zhai. Knowledge mapping of graph neural networks for drug discovery: a bibliometric and visualized analysis. Frontiers in Pharmacology, 15, 2024.

[44] Yuning You, Tianlong Chen, Yongduo Sui, Ting Chen, Zhangyang Wang, and Yang Shen. Graph contrastive learning with augmentations. Advances in Neural Information Processing Systems, 33:5812–5823, 2020.

[45] Xuan Zang, Xianbing Zhao, and Buzhou Tang. Hierarchical molecular graph self-supervised learning for property prediction. Communications Chemistry, 6(1):34, 2023.

[46] Bin Zhang and Mengjun Tu. A review on graph neural networks for predicting synergistic drug combinations. Artificial Intelligence Review, 2023.

[47] Nannan Zong, Songzhi Su, and Changle Zhou. Boosting semi-supervised learning under imbalanced regression via pseudo-labeling. Concurrency and Computation: Practice and Experience, 36(19):e8103, 2024.
