Protein Representation Learning with Secondary-Structure and Energy-Filtered Hydrogen-Bond Graphs

Dongqi Fu; El houcine Bergou; Hajar El Hammouti; Lamiae Azizi; Limei Wang; Mohamed Mouhajir

arxiv: 2606.19374 · v1 · pith:B3JQ2E2Unew · submitted 2026-06-12 · 💻 cs.LG · cs.AI


Pith reviewed 2026-06-27 04:58 UTC · model grok-4.3

classification 💻 cs.LG cs.AI

keywords: protein representation learning, graph neural networks, secondary structure, hydrogen bonds, protein folding, inductive bias, structural motifs


The pith

Incorporating secondary structure assignments and energy-filtered hydrogen-bond edges improves protein graph representations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Proteins fold into shapes stabilized by secondary structures, such as helices and sheets, that are held together by hydrogen bonds. Yet many graph models for proteins use only sequence order or geometric proximity between residues. This paper builds graphs in which each residue node carries its secondary-structure label and edges connect only residues linked by sufficiently strong hydrogen bonds. The resulting graph neural network learns representations that better reflect the principles of protein folding and stability. Evaluation on common protein benchmarks shows consistent gains over previous graph methods, and the learned connectivity matches known structural-biology patterns.

Core claim

By augmenting residue-level node features with secondary structure assignments and constructing graph edges exclusively from energy-filtered hydrogen-bond interactions, the model captures both recurring local motifs and long-range stabilizing couplings that govern protein stability and function, leading to improved performance on protein benchmarks and greater alignment with biological motifs.
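The sketch below illustrates the node-feature side of this claim. It concatenates a one-hot amino-acid encoding with a one-hot secondary-structure label for each residue. Several details are assumptions made for illustration: the DSSP-style alphabet (with '-' for coil and 'P' for polyproline), the helper names `one_hot` and `ss_augmented_features`, and the plain concatenation. This page does not describe the paper's exact feature layout. The DSSP string would come from running DSSP on the structure.

```python
import numpy as np

AA_ALPHABET = "ACDEFGHIKLMNPQRSTVWY"   # 20 standard amino acids
SS_ALPHABET = "HBEGITSP-"              # assumed DSSP-style codes; '-' marks coil

def one_hot(symbols, alphabet):
    """Return a (len(symbols), len(alphabet)) float32 one-hot matrix."""
    index = {s: k for k, s in enumerate(alphabet)}
    out = np.zeros((len(symbols), len(alphabet)), dtype=np.float32)
    for row, s in enumerate(symbols):
        out[row, index[s]] = 1.0   # raises KeyError for symbols outside the alphabet
    return out

def ss_augmented_features(sequence, dssp):
    """Concatenate amino-acid and secondary-structure one-hots per residue."""
    if len(sequence) != len(dssp):
        raise ValueError("sequence and DSSP string must be aligned residue by residue")
    dssp = dssp.replace(" ", "-")   # classic DSSP output uses a blank for coil
    return np.concatenate([one_hot(sequence, AA_ALPHABET), one_hot(dssp, SS_ALPHABET)], axis=1)

x = ss_augmented_features("MKTAYIA", "-HHHHT-")   # shape (7, 29): 20 residue + 9 structure columns
```

A real pipeline also needs a policy for non-standard residue codes such as 'X', which this sketch simply rejects.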

What carries the argument

Secondary-structure-augmented graph neural network with edges defined by energy-filtered hydrogen-bond interactions
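In a message-passing network, a residue only receives information from the residues it is connected to. An H-bond edge between two residues that are far apart in sequence, for example across a β-sheet, therefore gives them a direct path after a single layer. The layer below is hypothetical: mean aggregation with two weight matrices, used purely for illustration. It is not the architecture used in the paper, which this page does not detail.

```python
import numpy as np

def mean_aggregate(x, edge_index):
    """Average the features of each node's incoming neighbours (zeros if it has none)."""
    src, dst = edge_index
    summed = np.zeros_like(x)
    degree = np.zeros(x.shape[0])
    np.add.at(summed, dst, x[src])   # accumulate messages along src -> dst
    np.add.at(degree, dst, 1.0)
    return summed / np.maximum(degree, 1.0)[:, None]

def gnn_layer(x, edge_index, w_self, w_nbr):
    """Hypothetical layer: ReLU(x @ W_self + mean_neighbours(x) @ W_nbr)."""
    return np.maximum(x @ w_self + mean_aggregate(x, edge_index) @ w_nbr, 0.0)

# Toy graph: 4 residues, one H-bond between residues 0 and 3, stored in both directions.
x = np.eye(4)
edge_index = np.array([[0, 3], [3, 0]])
h = gnn_layer(x, edge_index, np.eye(4), np.eye(4))
```

After one layer, residues 0 and 3 each carry the other's features. Residues 1 and 2 have no H-bond partner in this toy graph and keep only their own.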

If this is right

Consistent improvements over existing graph-based methods on protein benchmarks
Enhanced biological interpretability of the learned graph representations
Learned connectivity aligns with established structural motifs

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Removing the energy filter on hydrogen bonds should reduce performance toward levels seen with proximity-based graphs
The same node and edge construction could be tested on related tasks such as mutation stability prediction
The topology may transfer to modeling interactions between proteins and small molecules
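The first prediction can be checked by swapping only the edge set while keeping the model fixed. A minimal sketch of the two usual baselines, sequence adjacency and geometric proximity, is below. Several details are assumptions for illustration rather than values from the paper: the window of 1, the 8 Å Cα radius, and the helper names `sequence_edges` and `radius_edges`. Both helpers return a `(2, num_edges)` index array, so either can be passed to the same GNN as a hydrogen-bond edge list.

```python
import numpy as np

def sequence_edges(num_residues, window=1):
    """Connect residues i and j when 0 < |i - j| <= window (both directions)."""
    src, dst = [], []
    for i in range(num_residues):
        for j in range(max(0, i - window), min(num_residues, i + window + 1)):
            if i != j:
                src.append(i)
                dst.append(j)
    return np.array([src, dst], dtype=np.int64)

def radius_edges(ca, radius=8.0):
    """Connect residues whose C-alpha atoms lie within `radius` Å (both directions)."""
    d = np.linalg.norm(ca[:, None, :] - ca[None, :, :], axis=-1)
    i, j = np.nonzero((d <= radius) & ~np.eye(len(ca), dtype=bool))
    return np.stack([i, j])
```

A proximity graph typically includes many contacts that no hydrogen bond supports. This is the extra connectivity the energy filter is meant to prune.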

Load-bearing premise

Secondary-structure assignments and energy-filtered hydrogen-bond interactions supply a graph topology that is meaningfully more informative for stability and function than sequence adjacency or geometric proximity alone.

What would settle it

Train the same GNN architecture on identical protein data but with three different edge sets (sequence adjacency, geometric proximity, and the proposed energy-filtered hydrogen bonds), then measure whether the hydrogen-bond version produces statistically significant gains on a held-out stability or function prediction benchmark.
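One way to run this significance check is a paired sign-flip permutation test over per-seed (or per-split) scores. Each hydrogen-bond run is paired with the baseline run that shares its seed and split. The sketch uses only the Python standard library. The function name and the choice of this test over, say, a paired t-test are assumptions, and the scores in the usage lines are made up for illustration, not results from the paper.

```python
import itertools
import random
import statistics

def paired_permutation_test(scores_a, scores_b, n_resamples=10_000, seed=0):
    """Two-sided sign-flip test on paired differences a - b.

    Returns (mean difference, p-value). All 2**n sign patterns are enumerated
    when that is at most n_resamples; otherwise patterns are sampled at random.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("scores must be paired (same length)")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    observed = abs(statistics.fmean(diffs))
    if 2 ** n <= n_resamples:
        patterns = itertools.product((1, -1), repeat=n)
        total = 2 ** n
    else:
        rng = random.Random(seed)
        patterns = ([rng.choice((1, -1)) for _ in range(n)] for _ in range(n_resamples))
        total = n_resamples
    # Count sign patterns whose |mean| is at least the observed one (tolerance for float ties).
    extreme = sum(
        abs(statistics.fmean([s * d for s, d in zip(signs, diffs)])) >= observed - 1e-12
        for signs in patterns
    )
    return statistics.fmean(diffs), extreme / total

# Made-up accuracies for five seeds, for illustration only.
mean_gain, p = paired_permutation_test(
    [0.80, 0.82, 0.81, 0.83, 0.80],   # H-bond edge set
    [0.70, 0.71, 0.72, 0.70, 0.71],   # proximity edge set, same seeds and splits
)
```

With five paired runs there are only 2^5 = 32 sign patterns. Even a gain that is positive on every seed therefore gives p = 2/32 = 0.0625 under this two-sided test, so reaching p < 0.05 needs at least six paired runs.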

Figures

Figures reproduced from arXiv: 2606.19374 by Dongqi Fu, El houcine Bergou, Hajar El Hammouti, Lamiae Azizi, Limei Wang, Mohamed Mouhajir.

**Figure 2.** Comparison of graph construction strategies.

**Figure 3.** Examples of 3D protein structures from the FOLD dataset annotated by DSSP. Colors indicate DSSP secondary-structure ...

Original abstract

Graph-based representations are widely used in protein modeling, yet many existing approaches rely primarily on sequence adjacency or geometric proximity, which only partially reflect the principles governing protein folding. Proteins instead adopt complex three-dimensional conformations organized around secondary structure elements, such as $\alpha$-helices and $\beta$-sheets, which encode recurring local motifs and stabilizing hydrogen-bond interactions. In this work, we introduce a secondary-structure-aware graph neural network for protein representation learning. Residue-level node representations are augmented with secondary structure assignments, and graph edges are constructed from hydrogen-bond interactions filtered by their energetic strength. This design enables the model to capture both local structural context and long-range couplings that are central to protein stability and function. We evaluate the proposed approach on commonly used protein benchmarks and observe consistent improvements over existing graph-based methods. In addition, the resulting graph representations offer enhanced biological interpretability, as the learned connectivity aligns with established structural motifs. These findings suggest that incorporating secondary structure and energy-filtered hydrogen-bond topology provides an effective inductive bias for protein representation learning. The code is released at https://github.com/mohamedmohamed2021/SSProNet

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

Energy-filtered H-bond graphs with secondary structure labels improve protein representations on benchmarks, with code released.


The punchline is that this paper's graph construction, which uses secondary-structure labels on residues and keeps only the stronger hydrogen bonds as edges, produces better results than standard sequence- or distance-based graphs on the benchmarks the authors tested, and the code is public.

What is new is the energy filter on the hydrogen bonds combined with the secondary structure augmentation. That is a reasonable way to inject folding principles into the graph without adding much complexity.

The paper does the right thing by running the comparison to the obvious baselines and by releasing the implementation. That lets the claim about the inductive bias be checked directly.

The only soft spot I see is that the abstract does not give the size of the improvements or the variance, so a reader has to go to the tables to judge whether the edge is large enough to matter in practice. Everything else lines up.

This is the sort of incremental but well-executed methods paper that people working on protein ML will want to try. It deserves a serious referee because the evaluation tests the central idea and the code is there to inspect.

Referee Report

0 major / 2 minor

Summary. The manuscript introduces SSProNet, a graph neural network for protein representation learning. Residue nodes are augmented with secondary-structure assignments, and edges are defined via energetically filtered hydrogen-bond interactions rather than sequence adjacency or geometric proximity alone. The model is evaluated on standard protein benchmarks, where it reports consistent gains over prior graph-based methods and improved alignment with known structural motifs. Code is released publicly.

Significance. If the empirical gains hold under rigorous controls, the work supplies a biologically motivated inductive bias that directly encodes secondary-structure elements and stabilizing hydrogen bonds, which are central to folding and function. Public code release strengthens reproducibility.

minor comments (2)

Abstract: the claim of 'consistent improvements' should be backed in the results section by explicit baseline comparisons, dataset sizes, and statistical significance tests, so that the magnitude of the gains can be assessed directly.
Methods: the precise energy threshold and filtering procedure for hydrogen-bond edges should be stated with a reference to the underlying energy function or software used.
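For reference, DSSP (DSSP 4 is [21] in the reference list) uses the Kabsch–Sander electrostatic energy $E = 0.084 \times 332\,(1/r_{ON} + 1/r_{CH} - 1/r_{OH} - 1/r_{CN})$ kcal/mol. Distances are in Å, between the donor N–H of one residue and the acceptor C=O of another, and DSSP counts a hydrogen bond when $E < -0.5$ kcal/mol. Whether the paper uses this energy and this cutoff is exactly what the comment asks the authors to state, so the sketch below treats both as assumptions. The names `hbond_energy_matrix`, `hbond_edges`, `cutoff` and `min_seq_sep` are hypothetical. Amide hydrogen positions are also assumed to be given; DSSP itself places them from the backbone geometry when they are missing.

```python
import numpy as np

# q1 * q2 * f from the Kabsch-Sander model: 0.42e * 0.20e * 332 kcal*Å/(mol*e^2)
COULOMB_PREFACTOR = 0.42 * 0.20 * 332.0

def pairwise_dist(a, b):
    """(L, 3) x (L, 3) coordinates -> (L, L) Euclidean distances in Å."""
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)

def hbond_energy_matrix(n, h, c, o):
    """E[i, j] in kcal/mol for donor N-H of residue i and acceptor C=O of residue j."""
    r_on = pairwise_dist(n, o)
    r_ch = pairwise_dist(h, c)
    r_oh = pairwise_dist(h, o)
    r_cn = pairwise_dist(n, c)
    return COULOMB_PREFACTOR * (1.0 / r_on + 1.0 / r_ch - 1.0 / r_oh - 1.0 / r_cn)

def hbond_edges(n, h, c, o, cutoff=-0.5, min_seq_sep=2):
    """Directed donor -> acceptor edges with energy below `cutoff` (both values assumed).

    Pairs closer than `min_seq_sep` in sequence are skipped, so a residue is never
    bonded to itself or its immediate neighbour.
    """
    energy = hbond_energy_matrix(n, h, c, o)
    i, j = np.nonzero(energy < cutoff)
    keep = np.abs(i - j) >= min_seq_sep
    return np.stack([i[keep], j[keep]]), energy[i[keep], j[keep]]
```

Lowering `cutoff`, for example to −1.0 kcal/mol, keeps only the stronger bonds. That threshold is the knob the comment asks the authors to report.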

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment of our work on SSProNet and the recommendation for minor revision. The report lists no major comments, so there are no major points to answer individually. We are happy to incorporate the minor suggestions during the revision process.

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper proposes a graph construction for protein representation learning that augments nodes with secondary-structure assignments and defines edges via energy-filtered hydrogen-bond interactions. This is presented as an inductive bias choice and is evaluated empirically on external protein benchmarks with reported gains over sequence-adjacency and geometric baselines. No equations, parameter fits, or self-citations are described that reduce any claimed result to an input by construction. The central claim rests on benchmark comparisons that are independent of the method definition itself.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The approach rests on domain assumptions from structural biology rather than new mathematical axioms or invented entities.

axioms (2)

domain assumption: Secondary-structure assignments accurately capture recurring local motifs that influence folding.
Used to augment node representations.
domain assumption: Hydrogen-bond energy calculations reliably identify stabilizing long-range interactions.
Used to filter graph edges.
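The second assumption can be probed on any H-bond edge set by measuring how far apart in sequence the connected residues are. Backbone H-bonds inside an α-helix link residue i to i+4 (i+3 in a 3₁₀ helix), whereas β-sheet partners can be arbitrarily far apart. The sketch below splits edges into local and non-local groups. The cutoff `local_max=5` and the function name are hypothetical choices for illustration, not part of the paper.

```python
import numpy as np

def separation_profile(edge_index, local_max=5):
    """Count edges with |i - j| <= local_max (local) versus the rest (non-local)."""
    sep = np.abs(np.asarray(edge_index[0]) - np.asarray(edge_index[1]))
    local = int(np.count_nonzero(sep <= local_max))
    return {"local": local, "non_local": int(sep.size - local)}

# Toy donor -> acceptor edges: two helix-like (i+4 -> i) bonds and one long-range sheet-like bond.
print(separation_profile(np.array([[4, 5, 40], [0, 1, 10]])))   # {'local': 2, 'non_local': 1}
```

If almost every retained edge turns out to be local, the graph mostly re-encodes helices, and the "long-range coupling" part of the claim rests on a small number of edges.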

pith-pipeline@v0.9.1-grok · 5753 in / 1026 out tokens · 24808 ms · 2026-06-27T04:58:18.417020+00:00 · methodology


Reference graph

Works this paper leans on

51 extracted references · 6 canonical work pages

[1] Vishal Agrawal and Radha KV Kishan. Functional evolution of two subtly different (similar) folds. BMC Structural Biology, 1(1):5, 2001.
[2] Patrick A Alexander, Yanan He, Yihong Chen, John Orban, and Philip N Bryan. A minimal sequence code for switching protein structure and function. Proceedings of the National Academy of Sciences, 106(50):21149–21154, 2009.
[3] Miltiadis Allamanis, Marc Brockschmidt, and Mahmoud Khademi. Learning to represent programs with graphs. arXiv preprint arXiv:1711.00740, 2017.
[4] Federico Baldassarre, David Menéndez Hurtado, Arne Elofsson, and Hossein Azizpour. GraphQA: protein model quality assessment using graph convolutional networks. Bioinformatics, 37(3):360–366, 2021.
[5] Albert-Laszlo Barabasi and Zoltan N Oltvai. Network biology: understanding the cell’s functional organization. Nature Reviews Genetics, 5(2):101–113, 2004.
[6] Ilyes Batatia, David P Kovacs, Gregor Simm, Christoph Ortner, and Gabor Csanyi. MACE: Higher order equivariant message passing neural networks for fast and accurate force fields. In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, editors, Advances in Neural Information Processing Systems, volume 35, pages 11423–11436. Curran Associates, ... 2022.
[7] Helen M. Berman, John Westbrook, Zukang Feng, Gary Gilliland, T. N. Bhat, Helge Weissig, Ilya N. Shindyalov, and Philip E. Bourne. The Protein Data Bank. Nucleic Acids Research, 28(1):235–242, January 2000. doi: 10.1093/nar/28.1.235.
[8] Johannes Brandstetter, Rob Hesselink, Elise van der Pol, Erik J. Bekkers, and Max Welling. Geometric and physical quantities improve E(3) equivariant message passing. CoRR, abs/2110.02905, 2021.
[9] Michael M Bronstein, Joan Bruna, Yann LeCun, Arthur Szlam, and Pierre Vandergheynst. Geometric deep learning: going beyond Euclidean data. IEEE Signal Processing Magazine, 34(4):18–42, 2017.
[10] Jose M. Dana, Aleksandras Gutmanas, Nidhi Tyagi, Guoying Qi, Claire O’Donovan, Maria Martin, and Sameer Velankar. SIFTS: updated structure integration with function, taxonomy and sequences resource allows 40-fold increase in coverage of structure-based annotations for proteins. Nucleic Acids Research, 47(D1):D482–D489, January 2019. doi: 10.1093/nar/gky1114.
[11] David Easley and Jon Kleinberg. Networks, crowds, and markets. Econ. Theory, 26:1–28, 2010.
[12] Hehe Fan, Zhangyang Wang, Yi Yang, and Mohan Kankanhalli. Continuous-discrete convolution for geometry-sequence modeling in proteins. In The Eleventh International Conference on Learning Representations, 2022.
[13] Matthias Fey and Jan E. Lenssen. Fast graph representation learning with PyTorch Geometric. In ICLR Workshop on Representation Learning on Graphs and Manifolds, 2019.
[14] Alex Fout, Jonathon Byrd, Basir Shariat, and Asa Ben-Hur. Protein interface prediction using graph convolutional networks. Advances in Neural Information Processing Systems, 30, 2017.
[15] Dongqi Fu and Jingrui He. DPPIN: A biological repository of dynamic protein-protein interaction network data. In IEEE International Conference on Big Data, Big Data 2022, Osaka, Japan, December 17-20, 2022, pages 5269–5277. IEEE, 2022.
[16] doi: 10.1109/BIGDATA55660.2022.10020904. URL https://doi.org/10.1109/BigData55660.2022.10020904
[17] Dongqi Fu, Liri Fang, Ross Maciejewski, Vetle I. Torvik, and Jingrui He. Meta-learned metrics over multi-evolution temporal graphs. In KDD ’22: The 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, August 14-18, 2022, pages 367–377. ACM, 2022. doi: 10.1145/3534678.3539313. URL https://doi.org/10.1145/3534678.3539313
[18] Hongyang Gao and Shuiwang Ji. Graph U-Nets. In International Conference on Machine Learning, pages 2083–2092. PMLR, 2019.
[19] Hongyang Gao, Yi Liu, and Shuiwang Ji. Topology-aware graph pooling networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(12):4512–4518, 2021.
[20] Shurui Gui, Xiner Li, Limei Wang, and Shuiwang Ji. GOOD: A graph out-of-distribution benchmark. Advances in Neural Information Processing Systems, 35:2059–2073, 2022.
[21] Maarten L Hekkelman, Daniel Álvarez Salmoral, Anastassis Perrakis, and Robbie P Joosten. DSSP 4: FAIR annotation of protein secondary structure. Protein Science, 2025.
[22] Pedro Hermosilla and Timo Ropinski. Contrastive representation learning for 3D protein structures. arXiv preprint arXiv:2205.15675, 2022.
[23] Pedro Hermosilla, Marco Schäfer, Matěj Lang, Gloria Fackelmann, Pere Pau Vázquez, Barbora Kozlíková, Michael Krone, Tobias Ritschel, and Timo Ropinski. Intrinsic-extrinsic convolution and pooling for learning on 3D protein structures. arXiv preprint arXiv:2007.06252, 2020.
[24] Jie Hou, Badri Adhikari, and Jianlin Cheng. DeepSF: deep convolutional neural network for mapping protein sequences to folds. Bioinformatics, 34(8):1295–1303, 2018.
Protein Representation Learning with Secondary-Structure and Energy-Filtered Hydrogen-Bond Graphs BIOKDD ’26, August 10, 2026, Jeju, Korea
[25] Bozhen Hu, Cheng Tan, Jun Xia, Yue Liu, Lirong Wu, Jiangbin Zheng, Yongjie Xu, Yufei Huang, and Stan Z Li. Learning complete protein representation by dynamically coupling of sequence and structure. Advances in Neural Information Processing Systems, 37:137673–137697, 2024.
[26] Weihua Hu, Matthias Fey, Marinka Zitnik, Yuxiao Dong, Hongyu Ren, Bowen Liu, Michele Catasta, and Jure Leskovec. Open Graph Benchmark: Datasets for machine learning on graphs. Advances in Neural Information Processing Systems, 33:22118–22133, 2020.
[27] Bowen Jing, Stephan Eismann, Patricia Suriana, Raphael JL Townshend, and Ron Dror. Learning from protein structure with geometric vector perceptrons. arXiv preprint arXiv:2009.01411, 2020.
[28] John Jumper, Richard Evans, Alexander Pritzel, Tim Green, Michael Figurnov, Olaf Ronneberger, Kathryn Tunyasuvunakool, Russ Bates, Augustin Žídek, Anna Potapenko, et al. Highly accurate protein structure prediction with AlphaFold. Nature, 596(7873):583–589, 2021.
[29] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In International Conference on Learning Representations (ICLR), 2015.
[30] Thomas N. Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. ICLR, 2017.
[31] Michael Levitt and Cyrus Chothia. Structural patterns in globular proteins. Nature, 261(5561):552–558, 1976.
[32] Jiahan Li. Directed weight neural networks for protein structure representation learning. arXiv preprint arXiv:2201.13299, 2022.
[33] Yi Liu, Limei Wang, Meng Liu, Yuchao Lin, Xuan Zhang, Bora Oztekin, and Shuiwang Ji. Spherical message passing for 3D molecular graphs. In International Conference on Learning Representations (ICLR), 2022.
[34] Zhihai Liu, Yan Li, Li Han, Jie Li, Jie Liu, Zhixiong Zhao, Wei Nie, Yuchen Liu, and Renxiao Wang. PDB-wide collection of binding data: current status of the PDBbind database. Bioinformatics, 31(3):405–412, February 2015. doi: 10.1093/bioinformatics/btu626.
[35] Shengjie Luo, Tianlang Chen, Yixian Xu, Shuxin Zheng, Tie-Yan Liu, Liwei Wang, and Di He. One transformer can understand both 2D & 3D molecular data. arXiv preprint arXiv:2210.01765, 2022.
[36] Christopher Morris, Martin Ritzert, Matthias Fey, William L Hamilton, Jan Eric Lenssen, Gaurav Rattan, and Martin Grohe. Weisfeiler and Leman go neural: Higher-order graph neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 4602–4609, 2019.
[37] Mikhail V. Omelchenko, Michael Y. Galperin, Yuri I. Wolf, et al. Non-homologous isofunctional enzymes: A systematic analysis of alternative solutions in enzyme evolution. Biology Direct, 5:31, 2010. doi: 10.1186/1745-6150-5-31.
[38] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. PyTorch: An imperative style, high-per... 2019.
[39] Georg E Schulz and R Heiner Schirmer. Principles of protein structure. Springer Science & Business Media, 2013.
[40] Martin Simonovsky and Nikos Komodakis. Dynamic edge-conditioned filters in convolutional neural networks on graphs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3693–3702, 2017.
[41] Vignesh Ram Somnath, Charlotte Bunne, and Andreas Krause. Multi-scale representation learning on proteins. Advances in Neural Information Processing Systems, 34, 2021.
[42] Raphael John Lamarre Townshend, Martin Vögele, Patricia Adriana Suriana, Alexander Derry, Alexander Powers, Yianni Laloudakis, Sidhika Balachandar, Bowen Jing, Brandon M. Anderson, Stephan Eismann, Risi Kondor, Russ Altman, and Ron O. Dror. ATOM3D: Tasks on molecules in three dimensions. In Thirty-fifth Conference on Neural Information Processing System... 2021.
[43] Limei Wang, Haoran Liu, Yi Liu, Jerry Kurtin, and Shuiwang Ji. Learning hierarchical protein representations via complete 3D graph networks. ICLR, 2022.
[44] Renxiao Wang, Xueliang Fang, Yipin Lu, and Shaomeng Wang. The PDBbind database: Collection of binding affinities for protein–ligand complexes with known three-dimensional structures. Journal of Medicinal Chemistry, 47(12):2977–2980, 2004. doi: 10.1021/jm030580l.
[45] Shih-Hsin Wang, Yuhao Huang, Justin M Baker, Yuan-En Sun, Qi Tang, and Bao Wang. A theoretically-principled sparse, connected, and rigid graph representation of molecules. In The Thirteenth International Conference on Learning Representations, 2025.
[46] Zhengyang Wang, Meng Liu, Youzhi Luo, Zhao Xu, Yaochen Xie, Limei Wang, Lei Cai, Qi Qi, Zhuoning Yuan, Tianbao Yang, et al. Advanced graph and sequence neural networks for molecular property prediction and drug discovery. Bioinformatics, 38(9):2579–2586, 2022.
[47] Zhenqin Wu, Bharath Ramsundar, Evan N Feinberg, Joseph Gomes, Caleb Geniesse, Aneesh S Pappu, Karl Leswing, and Vijay Pande. MoleculeNet: a benchmark for molecular machine learning. Chemical Science, 9(2):513–530, 2018.
[48] Yaochen Xie, Sumeet Katariya, Xianfeng Tang, Edward Huang, Nikhil Rao, Karthik Subbian, and Shuiwang Ji. Task-agnostic graph explanations. Advances in Neural Information Processing Systems, 35:12027–12039, 2022.
[49] Keqiang Yan, Yi Liu, Yuchao Lin, and Shuiwang Ji. Periodic graph transformers for crystal material property prediction. Advances in Neural Information Processing Systems, 35:15066–15080, 2022.
[50] Haiyang Yu, Limei Wang, Bokun Wang, Meng Liu, Tianbao Yang, and Shuiwang Ji. GraphFM: Improving large-scale GNN training via feature momentum. In International Conference on Machine Learning, pages 25684–25701. PMLR, 2022.
[51] Zuobai Zhang, Minghao Xu, Arian Jamasb, Vijil Chenthamarakshan, Aurelie Lozano, Payel Das, and Jian Tang. Protein representation learning by geometric structure pretraining. ICLR, 2023.

[1] [1]

Functional evolution of two subtly different (similar) folds.BMC structural biology, 1(1):5, 2001

Vishal Agrawal and Radha KV Kishan. Functional evolution of two subtly different (similar) folds.BMC structural biology, 1(1):5, 2001

2001

[2] [2]

A minimal sequence code for switching protein structure and function.Proceedings of the National Academy of Sciences, 106(50):21149–21154, 2009

Patrick A Alexander, Yanan He, Yihong Chen, John Orban, and Philip N Bryan. A minimal sequence code for switching protein structure and function.Proceedings of the National Academy of Sciences, 106(50):21149–21154, 2009

2009

[3] [3]

Learning to represent programs with graphs.arXiv preprint arXiv:1711.00740, 2017

Miltiadis Allamanis, Marc Brockschmidt, and Mahmoud Khademi. Learning to represent programs with graphs.arXiv preprint arXiv:1711.00740, 2017

Pith/arXiv arXiv 2017

[4] [4]

Graphqa: protein model quality assessment using graph convolutional networks.Bioinformatics, 37(3):360–366, 2021

Federico Baldassarre, David Menéndez Hurtado, Arne Elofsson, and Hossein Azizpour. Graphqa: protein model quality assessment using graph convolutional networks.Bioinformatics, 37(3):360–366, 2021

2021

[5] [5]

Network biology: understanding the cell’s functional organization.Nature reviews genetics, 5(2):101–113, 2004

Albert-Laszlo Barabasi and Zoltan N Oltvai. Network biology: understanding the cell’s functional organization.Nature reviews genetics, 5(2):101–113, 2004

2004

[6] [6]

Mace: Higher order equivariant message passing neural networks for fast and accurate force fields

Ilyes Batatia, David P Kovacs, Gregor Simm, Christoph Ortner, and Gabor Csanyi. Mace: Higher order equivariant message passing neural networks for fast and accurate force fields. In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, editors,Advances in Neural Information Processing Systems, volume 35, pages 11423–11436. Curran Associates, ...

2022

[7] [7]

Berman, John Westbrook, Zukang Feng, Gary Gilliland, T

Helen M. Berman, John Westbrook, Zukang Feng, Gary Gilliland, T. N. Bhat, Helge Weissig, Ilya N. Shindyalov, and Philip E. Bourne. The protein data bank. Nucleic Acids Research, 28(1):235–242, January 2000. doi: 10.1093/nar/28.1.235

work page doi:10.1093/nar/28.1.235 2000

[8] [8]

Bekkers, and Max Welling

Johannes Brandstetter, Rob Hesselink, Elise van der Pol, Erik J. Bekkers, and Max Welling. Geometric and physical quantities improve E(3) equivariant message passing.CoRR, abs/2110.02905, 2021

arXiv 2021

[9] [9]

Geometric deep learning: going beyond euclidean data.IEEE Signal Processing Magazine, 34(4):18–42, 2017

Michael M Bronstein, Joan Bruna, Yann LeCun, Arthur Szlam, and Pierre Van- dergheynst. Geometric deep learning: going beyond euclidean data.IEEE Signal Processing Magazine, 34(4):18–42, 2017

2017

[10] [10]

Dana, Aleksandras Gutmanas, Nidhi Tyagi, Guoying Qi, Claire O’Donovan, Maria Martin, and Sameer Velankar

Jose M. Dana, Aleksandras Gutmanas, Nidhi Tyagi, Guoying Qi, Claire O’Donovan, Maria Martin, and Sameer Velankar. Sifts: updated structure inte- gration with function, taxonomy and sequences resource allows 40-fold increase in coverage of structure-based annotations for proteins.Nucleic Acids Research, 47(D1):D482–D489, January 2019. doi: 10.1093/nar/gky1114

work page doi:10.1093/nar/gky1114 2019

[11] [11]

Networks, crowds, and markets.Econ

David Easley and Jon Kleinberg. Networks, crowds, and markets.Econ. Theory, 26:1–28, 2010

2010

[12] [12]

Continuous- discrete convolution for geometry-sequence modeling in proteins

Hehe Fan, Zhangyang Wang, Yi Yang, and Mohan Kankanhalli. Continuous- discrete convolution for geometry-sequence modeling in proteins. InThe Eleventh International Conference on Learning Representations, 2022

2022

[13] [13]

Matthias Fey and Jan E. Lenssen. Fast graph representation learning with PyTorch geometric. InICLR Workshop on Representation Learning on Graphs and Manifolds, 2019

2019

[14] [14]

Protein interface prediction using graph convolutional networks.Advances in neural information processing systems, 30, 2017

Alex Fout, Jonathon Byrd, Basir Shariat, and Asa Ben-Hur. Protein interface prediction using graph convolutional networks.Advances in neural information processing systems, 30, 2017

2017

[15] [15]

DPPIN: A biological repository of dynamic protein- protein interaction network data

Dongqi Fu and Jingrui He. DPPIN: A biological repository of dynamic protein- protein interaction network data. InIEEE International Conference on Big Data, Big Data 2022, Osaka, Japan, December 17-20, 2022, pages 5269–5277. IEEE,

2022

[16] [16]

URL https://doi.org/10.1109/ BigData55660.2022.10020904

doi: 10.1109/BIGDATA55660.2022.10020904. URL https://doi.org/10.1109/ BigData55660.2022.10020904

work page doi:10.1109/bigdata55660.2022.10020904 2022

[17] [17]

Torvik, and Jingrui He

Dongqi Fu, Liri Fang, Ross Maciejewski, Vetle I. Torvik, and Jingrui He. Meta- learned metrics over multi-evolution temporal graphs. InKDD ’22: The 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, August 14 - 18, 2022, pages 367–377. ACM, 2022. doi: 10.1145/3534678. 3539313. URL https://doi.org/10.1145/3534678.3539313

work page doi:10.1145/3534678 2022

[18] [18]

Graph u-nets

Hongyang Gao and Shuiwang Ji. Graph u-nets. Ininternational conference on machine learning, pages 2083–2092. PMLR, 2019

2083

[19] [19]

Topology-aware graph pooling networks

Hongyang Gao, Yi Liu, and Shuiwang Ji. Topology-aware graph pooling networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(12):4512–4518, 2021

2021

[20] [20]

Good: A graph out-of- distribution benchmark.Advances in Neural Information Processing Systems, 35: 2059–2073, 2022

Shurui Gui, Xiner Li, Limei Wang, and Shuiwang Ji. Good: A graph out-of- distribution benchmark.Advances in Neural Information Processing Systems, 35: 2059–2073, 2022

2059

[21] Maarten L Hekkelman, Daniel Álvarez Salmoral, Anastassis Perrakis, and Robbie P Joosten. DSSP 4: Fair annotation of protein secondary structure. Protein Science, 2025.

[22] Pedro Hermosilla and Timo Ropinski. Contrastive representation learning for 3D protein structures. arXiv preprint arXiv:2205.15675, 2022.

[23] Pedro Hermosilla, Marco Schäfer, Matěj Lang, Gloria Fackelmann, Pere Pau Vázquez, Barbora Kozlíková, Michael Krone, Tobias Ritschel, and Timo Ropinski. Intrinsic-extrinsic convolution and pooling for learning on 3D protein structures. arXiv preprint arXiv:2007.06252, 2020.

[24] Jie Hou, Badri Adhikari, and Jianlin Cheng. DeepSF: deep convolutional neural network for mapping protein sequences to folds. Bioinformatics, 34(8):1295–1303, 2018.

Protein Representation Learning with Secondary-Structure and Energy-Filtered Hydrogen-Bond Graphs BIOKDD ’26, August 10, 2026, Jeju, Korea

[25] Bozhen Hu, Cheng Tan, Jun Xia, Yue Liu, Lirong Wu, Jiangbin Zheng, Yongjie Xu, Yufei Huang, and Stan Z Li. Learning complete protein representation by dynamically coupling of sequence and structure. Advances in Neural Information Processing Systems, 37:137673–137697, 2024.

[26] Weihua Hu, Matthias Fey, Marinka Zitnik, Yuxiao Dong, Hongyu Ren, Bowen Liu, Michele Catasta, and Jure Leskovec. Open graph benchmark: Datasets for machine learning on graphs. Advances in Neural Information Processing Systems, 33:22118–22133, 2020.

[27] Bowen Jing, Stephan Eismann, Patricia Suriana, Raphael JL Townshend, and Ron Dror. Learning from protein structure with geometric vector perceptrons. arXiv preprint arXiv:2009.01411, 2020.

[28] John Jumper, Richard Evans, Alexander Pritzel, Tim Green, Michael Figurnov, Olaf Ronneberger, Kathryn Tunyasuvunakool, Russ Bates, Augustin Žídek, Anna Potapenko, et al. Highly accurate protein structure prediction with AlphaFold. Nature, 596(7873):583–589, 2021.

[29] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In International Conference on Learning Representations (ICLR), 2015.

[30] Thomas N. Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. ICLR, 2017.

[31] Michael Levitt and Cyrus Chothia. Structural patterns in globular proteins. Nature, 261(5561):552–558, 1976.

[32] Jiahan Li. Directed weight neural networks for protein structure representation learning. arXiv preprint arXiv:2201.13299, 2022.

[33] Yi Liu, Limei Wang, Meng Liu, Yuchao Lin, Xuan Zhang, Bora Oztekin, and Shuiwang Ji. Spherical message passing for 3D molecular graphs. In International Conference on Learning Representations (ICLR), 2022.

[34] Zhihai Liu, Yan Li, Li Han, Jie Li, Jie Liu, Zhixiong Zhao, Wei Nie, Yuchen Liu, and Renxiao Wang. PDB-wide collection of binding data: current status of the PDBbind database. Bioinformatics, 31(3):405–412, February 2015. doi: 10.1093/bioinformatics/btu626.

[35] Shengjie Luo, Tianlang Chen, Yixian Xu, Shuxin Zheng, Tie-Yan Liu, Liwei Wang, and Di He. One transformer can understand both 2D & 3D molecular data. arXiv preprint arXiv:2210.01765, 2022.

[36] Christopher Morris, Martin Ritzert, Matthias Fey, William L Hamilton, Jan Eric Lenssen, Gaurav Rattan, and Martin Grohe. Weisfeiler and Leman go neural: Higher-order graph neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 4602–4609, 2019.

[37] Mikhail V. Omelchenko, Michael Y. Galperin, Yuri I. Wolf, et al. Non-homologous isofunctional enzymes: A systematic analysis of alternative solutions in enzyme evolution. Biology Direct, 5:31, 2010. doi: 10.1186/1745-6150-5-31.

[38] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. PyTorch: An imperative style, high-performance deep learning library. ... 2019.

[39] Georg E Schulz and R Heiner Schirmer. Principles of protein structure. Springer Science & Business Media, 2013.

[40] Martin Simonovsky and Nikos Komodakis. Dynamic edge-conditioned filters in convolutional neural networks on graphs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3693–3702, 2017.

[41] Vignesh Ram Somnath, Charlotte Bunne, and Andreas Krause. Multi-scale representation learning on proteins. Advances in Neural Information Processing Systems, 34, 2021.

[42] Raphael John Lamarre Townshend, Martin Vögele, Patricia Adriana Suriana, Alexander Derry, Alexander Powers, Yianni Laloudakis, Sidhika Balachandar, Bowen Jing, Brandon M. Anderson, Stephan Eismann, Risi Kondor, Russ Altman, and Ron O. Dror. ATOM3D: Tasks on molecules in three dimensions. In Thirty-fifth Conference on Neural Information Processing System... 2021.

[43] Limei Wang, Haoran Liu, Yi Liu, Jerry Kurtin, and Shuiwang Ji. Learning hierarchical protein representations via complete 3D graph networks. ICLR, 2022.

[44] Renxiao Wang, Xueliang Fang, Yipin Lu, and Shaomeng Wang. The PDBbind database: Collection of binding affinities for protein–ligand complexes with known three-dimensional structures. Journal of Medicinal Chemistry, 47(12):2977–2980, 2004. doi: 10.1021/jm030580l.

[45] Shih-Hsin Wang, Yuhao Huang, Justin M Baker, Yuan-En Sun, Qi Tang, and Bao Wang. A theoretically-principled sparse, connected, and rigid graph representation of molecules. In The Thirteenth International Conference on Learning Representations, 2025.

[46] Zhengyang Wang, Meng Liu, Youzhi Luo, Zhao Xu, Yaochen Xie, Limei Wang, Lei Cai, Qi Qi, Zhuoning Yuan, Tianbao Yang, et al. Advanced graph and sequence neural networks for molecular property prediction and drug discovery. Bioinformatics, 38(9):2579–2586, 2022.

[47] Zhenqin Wu, Bharath Ramsundar, Evan N Feinberg, Joseph Gomes, Caleb Geniesse, Aneesh S Pappu, Karl Leswing, and Vijay Pande. MoleculeNet: a benchmark for molecular machine learning. Chemical Science, 9(2):513–530, 2018.

[48] Yaochen Xie, Sumeet Katariya, Xianfeng Tang, Edward Huang, Nikhil Rao, Karthik Subbian, and Shuiwang Ji. Task-agnostic graph explanations. Advances in Neural Information Processing Systems, 35:12027–12039, 2022.

[49] Keqiang Yan, Yi Liu, Yuchao Lin, and Shuiwang Ji. Periodic graph transformers for crystal material property prediction. Advances in Neural Information Processing Systems, 35:15066–15080, 2022.

[50] Haiyang Yu, Limei Wang, Bokun Wang, Meng Liu, Tianbao Yang, and Shuiwang Ji. GraphFM: Improving large-scale GNN training via feature momentum. In International Conference on Machine Learning, pages 25684–25701. PMLR, 2022.

[51] Zuobai Zhang, Minghao Xu, Arian Jamasb, Vijil Chenthamarakshan, Aurelie Lozano, Payel Das, and Jian Tang. Protein representation learning by geometric structure pretraining. ICLR, 2023.