Mamba-Based Graph Convolutional Networks: Tackling Over-smoothing with Selective State Space
Pith reviewed 2026-05-23 04:39 UTC · model grok-4.3
The pith
MbaGCN borrows selective state space modeling from sequences to let graph networks distinguish neighborhood messages and avoid over-smoothing in deep layers.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
MbaGCN introduces a new GNN backbone built from the Message Aggregation Layer, the Selective State Space Transition Layer, and the Node State Prediction Layer. These components together adaptively aggregate neighborhood information by importing the selective state space mechanism originally developed for sequence modeling. The resulting architecture supplies greater flexibility and scalability to deep graph models and thereby addresses the root cause of over-smoothing.
What carries the argument
The Selective State Space Transition Layer, which applies input-dependent selection to neighborhood messages so that more relevant information is retained while less relevant information is filtered.
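To make "input-dependent selection" concrete, here is a minimal scalar sketch of a selective state space recurrence in the style used for sequences. It is not taken from the MbaGCN paper: treating a node's messages as a list ordered by hop, the scalar state, and the weights `w_delta`, `w_b`, `w_c` are all assumptions made for illustration.

```python
import math


def softplus(x):
    # Numerically stable softplus; keeps the step size strictly positive.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def step_decay(x, a=-1.0, w_delta=1.0):
    # Input-dependent decay exp(delta * a); mathematically in (0, 1) when a < 0.
    return math.exp(softplus(w_delta * x) * a)


def selective_scan(messages, a=-1.0, w_delta=1.0, w_b=1.0, w_c=1.0):
    """Scalar selective recurrence over `messages`.

    messages: list of floats, assumed here to be one aggregated message per hop.
    a: fixed negative state coefficient (hypothetical value).
    w_delta, w_b, w_c: hypothetical scalar weights that make the step size,
        input gate, and readout depend on the current message.
    """
    h = 0.0
    outputs = []
    for x in messages:
        a_bar = step_decay(x, a, w_delta)        # how much old state survives
        b_bar = (a_bar - 1.0) / a * (w_b * x)    # zero-order-hold input gate
        h = a_bar * h + b_bar * x                # state update
        outputs.append((w_c * x) * h)            # input-dependent readout
    return outputs


print(selective_scan([0.5, 1.0, -0.3]))
```

A larger step size gives a smaller decay factor, so the state forgets more of its history and weights the current message more heavily. That is the sense in which the recurrence "selects"; whether the same behavior carries over to graph neighborhoods is the premise examined below.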
If this is right
- Deep GNNs become able to maintain distinguishable node representations instead of collapsing to a single vector.
- Message aggregation gains input-dependent flexibility that fixed-sum or mean operations lack.
- The same three-layer pattern supplies a reusable starting point for other sequence-to-graph transfers.
- Scalability improves because the selective mechanism does not require extra normalization or residual tricks to reach greater depths.
Where Pith is reading between the lines
- If the transfer succeeds, the same selective-state idea could be tested on other non-Euclidean domains such as meshes or point clouds.
- One could measure whether the added selection step changes the effective receptive field size compared with ordinary message passing.
- The framework invites direct replacement of the state-space transition with other linear-time sequence operators to test which property of Mamba matters most for graphs.
Load-bearing premise
The selective state space selection rule developed for linear sequences can be moved directly onto graph neighborhoods and will correctly rank the importance of messages arriving from different nodes.
What would settle it
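Training MbaGCN at increasing depths on a standard citation or social graph and measuring whether the average pairwise distance between node embeddings continues to shrink toward zero. If it does, the selective mechanism has not prevented over-smoothing; if the distance plateaus well above zero, the claim holds at the depths tested.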
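The measurement itself is simple to sketch. The example below is a probe under stated assumptions, not the paper's evaluation protocol: it uses plain mean aggregation with self-loops as a stand-in for an untrained, non-selective GNN, and the four-node path graph and its features are hypothetical. It shows the collapse that the test is meant to detect; swapping in embeddings from a trained model at each depth would give the actual experiment.

```python
import math


def mean_pairwise_distance(embeddings):
    # Average Euclidean distance over all unordered node pairs.
    n = len(embeddings)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += math.dist(embeddings[i], embeddings[j])
    return total / (n * (n - 1) / 2)


def mean_propagate(adj, embeddings):
    # One round of mean aggregation over each node and its neighbors.
    out = []
    for v, nbrs in enumerate(adj):
        group = [v] + list(nbrs)
        dim = len(embeddings[v])
        out.append([sum(embeddings[u][d] for u in group) / len(group)
                    for d in range(dim)])
    return out


def distance_by_depth(adj, embeddings, depths):
    # Record the mean pairwise distance after the requested numbers of rounds.
    results = {}
    h = embeddings
    for depth in range(max(depths) + 1):
        if depth in depths:
            results[depth] = mean_pairwise_distance(h)
        h = mean_propagate(adj, h)
    return results


# Hypothetical toy input: path graph 0-1-2-3 with 2-d features.
adj = [[1], [0, 2], [1, 3], [2]]
features = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
curve = distance_by_depth(adj, features, depths=(0, 2, 8, 32))
for depth, dist in curve.items():
    print(depth, round(dist, 6))
```

On this toy graph the distance falls from above 1.5 at depth 0 to below 0.001 at depth 32, which is the over-smoothing signature. A model that resists over-smoothing should produce a curve that levels off instead.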
Original abstract
Graph Neural Networks (GNNs) have shown great success in various graph-based learning tasks. However, they often face the issue of over-smoothing as the model depth increases, which causes all node representations to converge to a single value and become indistinguishable. This issue stems from the inherent limitations of GNNs, which struggle to distinguish the importance of information from different neighborhoods. In this paper, we introduce MbaGCN, a novel graph convolutional architecture that draws inspiration from the Mamba paradigm, originally designed for sequence modeling. MbaGCN presents a new backbone for GNNs, consisting of three key components: the Message Aggregation Layer, the Selective State Space Transition Layer, and the Node State Prediction Layer. These components work in tandem to adaptively aggregate neighborhood information, providing greater flexibility and scalability for deep GNN models. While MbaGCN may not consistently outperform all existing methods on each dataset, it provides a foundational framework that demonstrates the effective integration of the Mamba paradigm into graph representation learning. Through extensive experiments on benchmark datasets, we demonstrate that MbaGCN paves the way for future advancements in graph neural network research.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces MbaGCN, a novel GNN backbone inspired by the Mamba selective state space model. It consists of three components—the Message Aggregation Layer, the Selective State Space Transition Layer, and the Node State Prediction Layer—that are claimed to work together to adaptively aggregate neighborhood information, thereby mitigating over-smoothing and enabling deeper, more scalable GNNs. Experiments on benchmark datasets are presented to support the framework, with the authors noting that it may not consistently outperform all existing methods.
Significance. If the claimed transfer of Mamba's input-dependent selectivity to graph neighborhoods can be rigorously shown to preserve stability and permutation invariance while distinguishing neighborhood importance, the work would provide a new architectural primitive for deep GNNs. This could open avenues for scalable models that avoid the uniform convergence typical of repeated message passing. The honest admission that performance gains are not guaranteed across all datasets is a positive aspect of the presentation.
major comments (2)
- The Selective State Space Transition Layer is presented as the key mechanism for distinguishing importance across neighborhoods via Mamba-style input-dependent selection. However, the manuscript provides no equations or derivation showing how the discretization parameters (B, C, Δ) or the state transition are redefined to operate on unordered graph neighborhoods rather than ordered 1D sequences (see abstract and model description). This mapping is load-bearing for the central claim that the architecture solves over-smoothing through adaptive aggregation.
- No stability analysis or multi-hop propagation properties are given for the hidden-state update under the proposed graph-adapted SSM. Without this, it is unclear whether repeated application of the Selective State Space Transition Layer remains well-defined or avoids the very convergence the paper seeks to prevent.
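For reference, the sequence-side quantities named in the first major comment can be written out. The block below restates the standard selective state space formulation as published for Mamba on ordered sequences; it is not drawn from the MbaGCN manuscript, and the projection symbols $W_B$, $W_C$, $W_\Delta$, $b_\Delta$ are notational labels chosen here. How the index $t$ should be read on an unordered neighborhood is exactly what the comment asks the authors to specify.

```latex
\begin{aligned}
h'(t) &= A\,h(t) + B\,x(t), & y(t) &= C\,h(t) \\
\bar{A}_t &= \exp(\Delta_t A), & \bar{B}_t &= (\Delta_t A)^{-1}\bigl(\exp(\Delta_t A) - I\bigr)\,\Delta_t B_t \\
h_t &= \bar{A}_t\,h_{t-1} + \bar{B}_t\,x_t, & y_t &= C_t\,h_t \\
B_t &= W_B\,x_t, \quad C_t = W_C\,x_t, & \Delta_t &= \operatorname{softplus}(b_\Delta + W_\Delta\,x_t)
\end{aligned}
```

For a diagonal $A$ with negative real entries and $\Delta_t > 0$, every entry of $\bar{A}_t$ lies in $(0, 1)$, which bounds the homogeneous part of the recurrence on sequences. Whether an analogous bound holds once the recurrence is applied across graph hops is the open point in the second major comment.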
minor comments (1)
- The abstract repeats the high-level description of the three layers without adding concrete technical distinctions; a single crisp sentence on the novelty of the state-transition adaptation would improve clarity.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and for recognizing the potential of transferring Mamba-style selectivity to graph neighborhoods. We address each major comment below and will revise the manuscript accordingly to provide the requested mathematical details and analysis.
- Referee: The Selective State Space Transition Layer is presented as the key mechanism for distinguishing importance across neighborhoods via Mamba-style input-dependent selection. However, the manuscript provides no equations or derivation showing how the discretization parameters (B, C, Δ) or the state transition are redefined to operate on unordered graph neighborhoods rather than ordered 1D sequences (see abstract and model description). This mapping is load-bearing for the central claim that the architecture solves over-smoothing through adaptive aggregation.
Authors: We agree that an explicit derivation of the graph adaptation is necessary to substantiate the central claim. In the revised manuscript we will insert a new subsection (immediately following the description of the Selective State Space Transition Layer) that supplies the missing equations. The subsection will (i) define how the discretization parameters B, C, and Δ are computed from the aggregated neighborhood features rather than from a linear sequence, (ii) show that the resulting state transition remains permutation-invariant by construction (via symmetric aggregation before the selective SSM step), and (iii) demonstrate that input-dependent selectivity is preserved because the selection is driven by the aggregated node features rather than by arbitrary node ordering.
Revision: yes
- Referee: No stability analysis or multi-hop propagation properties are given for the hidden-state update under the proposed graph-adapted SSM. Without this, it is unclear whether repeated application of the Selective State Space Transition Layer remains well-defined or avoids the very convergence the paper seeks to prevent.
Authors: We acknowledge the absence of a formal stability argument. The revised manuscript will contain an additional analysis section that examines the multi-hop behavior of the graph-adapted SSM. The section will (i) provide a recurrence relation for the hidden-state update across layers, (ii) argue that the selective mechanism (via input-dependent Δ) introduces an adaptive forgetting factor that prevents uniform convergence to a single representation, and (iii) support the argument with both a sketch of bounded-norm propagation and new empirical results on deeper (8- and 16-layer) variants of MbaGCN.
Revision: yes
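The permutation-invariance argument in the first response can be checked mechanically. The sketch below assumes one possible reading of "symmetric aggregation before the selective SSM step": neighbor features are summed, and the selection parameters are then computed from that sum. The weight vectors, the coefficient `a`, and the toy features are hypothetical, and nothing here is confirmed as the authors' construction.

```python
import math


def softplus(x):
    # Numerically stable softplus; keeps the step size strictly positive.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def aggregate(neighbor_features):
    # Sum over neighbors, coordinate by coordinate. math.fsum returns the
    # correctly rounded sum, so the result does not depend on neighbor order.
    dim = len(neighbor_features[0])
    return [math.fsum(f[d] for f in neighbor_features) for d in range(dim)]


def selection_params(agg, w_delta, w_b, w_c, a=-1.0):
    # Selection parameters computed from the aggregated vector only.
    dot = lambda w: math.fsum(wi * ai for wi, ai in zip(w, agg))
    delta = softplus(dot(w_delta))   # input-dependent step size
    forget = math.exp(delta * a)     # adaptive forgetting factor
    return forget, dot(w_b), dot(w_c)


# Hypothetical neighborhood and weights.
neighbors = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.25]]
shuffled = [neighbors[2], neighbors[0], neighbors[1]]
weights = dict(w_delta=[0.3, -0.2], w_b=[1.0, 0.5], w_c=[-0.4, 0.8])

params_a = selection_params(aggregate(neighbors), **weights)
params_b = selection_params(aggregate(shuffled), **weights)
print(params_a == params_b)
```

Under this reading the selection is order-independent, but it also acts on the pooled neighborhood rather than on individual neighbors, so it cannot rank messages from different nodes within the same hop. The promised derivation would need to say which of the two behaviors the architecture intends.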
Circularity Check
No circularity detected; architecture proposal is self-contained
The paper proposes MbaGCN as a new GNN backbone inspired by the external Mamba paradigm, describing three components at a conceptual level to adaptively aggregate neighborhood information. No equations, parameter fits, self-citations, or uniqueness theorems appear in the provided text that would reduce any claim to a self-referential definition or input. The transfer of selective state space ideas is presented as an analogy for design, not a derived result that loops back by construction. This is a standard model-introduction paper with independent content.