Dual Mamba for Node-Specific Representation Learning: Tackling Over-Smoothing with Selective State Space Modeling
Pith reviewed 2026-05-17 23:57 UTC · model grok-4.3
The pith
A dual-Mamba graph network models node-specific state evolution locally while adding global context to keep representations distinct in deep layers.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that DMbaGCN, built from Local State-Evolution Mamba (LSEMba) for node-specific local dynamics and Global Context-Aware Mamba (GCAMba) for global information, enhances node discriminability in deep GNNs and thereby mitigates over-smoothing more effectively than residual connections or skip layers alone.
What carries the argument
The argument rests on the DMbaGCN framework, which pairs two modules: LSEMba, which applies selective state-space modeling to capture progressive, node-specific representation changes during local neighborhood aggregation, and GCAMba, which injects global context for each node.
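To make the idea of node-specific state evolution concrete, here is a minimal sketch, not the paper's implementation. It treats one node's per-layer inputs as a sequence and runs a scalar selective state-space recurrence over it. The function name `selective_scan`, the softplus step size, the simplified input term, and the weights `w_delta`, `w_b`, `w_c` are all illustrative assumptions.

```python
import math


def softplus(x):
    # Numerically stable softplus; keeps the step size positive.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def selective_scan(layer_inputs, a=-1.0, w_delta=1.0, w_b=1.0, w_c=1.0):
    """Scalar selective SSM over one node's sequence of per-layer inputs.

    Hypothetical toy: the step size, input gate, and readout all depend on
    the current input, which is what makes the recurrence "selective".
    """
    h = 0.0
    outputs = []
    for x in layer_inputs:
        delta = softplus(w_delta * x)   # input-dependent step size
        a_bar = math.exp(delta * a)     # discretized decay, in (0, 1) when a < 0
        b_bar = delta * (w_b * x)       # simplified input gate B(x), scaled by delta
        h = a_bar * h + b_bar * x       # state update
        outputs.append((w_c * x) * h)   # input-dependent readout C(x) * h
    return outputs
```

Calling `selective_scan([1.0, 2.0])` and `selective_scan([2.0, 1.0])` gives different trajectories, because the decay and gates are recomputed from each input. That is the sense in which the update differs from a fixed residual weight.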
If this is right
- Deep GNNs using the dual-Mamba structure maintain higher node discriminability than those relying solely on residual connections.
- The selective state-space approach allows explicit modeling of how individual node representations evolve layer by layer.
- Incorporating global context via GCAMba supplies information that local aggregation alone cannot provide, further reducing convergence of representations.
- The resulting architecture demonstrates both effectiveness on node-level tasks and computational efficiency on standard graph benchmarks.
Where Pith is reading between the lines
- The same node-specific state tracking could be inserted into other message-passing architectures beyond GCNs, such as GAT or GraphSAGE variants.
- If the method scales, it may allow reliable training of GNNs with dozens of layers on large graphs where current residual techniques still saturate.
- A natural next measurement is whether the learned state transitions inside LSEMba correspond to interpretable structural roles of nodes.
Load-bearing premise
The premise is twofold: that Mamba's selective state-space modeling can be adapted directly to capture progressive, node-specific representation evolution across GNN layers, and that adding global context will meaningfully outperform existing residual or skip-connection techniques.
What would settle it
Train a standard GCN, a residual GCN, and DMbaGCN to 20 or more layers on a fixed benchmark such as Cora or ogbn-arxiv, then measure the average cosine similarity or mutual information among node embeddings at the final layer; if the dual-Mamba version shows no clear reduction in similarity relative to the residual baseline, the mitigation claim is refuted.
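The similarity measurement described above can be sketched as follows. This is a minimal illustration in plain Python that assumes the final-layer embeddings are already available as a list of equal-length vectors. The helper names `cosine` and `mean_pairwise_cosine` are hypothetical, and zero vectors are assigned a similarity of 0.0 by convention.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumed convention for zero vectors
    return dot / (norm_u * norm_v)


def mean_pairwise_cosine(embeddings):
    """Average cosine similarity over all unordered pairs of node embeddings."""
    n = len(embeddings)
    if n < 2:
        raise ValueError("need at least two embeddings")
    total = 0.0
    pairs = 0
    for i in range(n):
        for j in range(i + 1, n):
            total += cosine(embeddings[i], embeddings[j])
            pairs += 1
    return total / pairs
```

A value close to 1 at the final layer would indicate collapsed representations. The loop is quadratic in the number of nodes, so on a graph the size of ogbn-arxiv one would likely sample pairs rather than enumerate them all.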
Original abstract
Over-smoothing remains a fundamental challenge in deep Graph Neural Networks (GNNs), where repeated message passing causes node representations to become indistinguishable. While existing solutions, such as residual connections and skip layers, alleviate this issue to some extent, they fail to explicitly model how node representations evolve in a node-specific and progressive manner across layers. Moreover, these methods do not take global information into account, which is also crucial for mitigating the over-smoothing problem. To address the aforementioned issues, in this work, we propose a Dual Mamba-enhanced Graph Convolutional Network (DMbaGCN), which is a novel framework that integrates Mamba into GNNs to address over-smoothing from both local and global perspectives. DMbaGCN consists of two modules: the Local State-Evolution Mamba (LSEMba) for local neighborhood aggregation and utilizing Mamba's selective state space modeling to capture node-specific representation dynamics across layers, and the Global Context-Aware Mamba (GCAMba) that leverages Mamba's global attention capabilities to incorporate global context for each node. By combining these components, DMbaGCN enhances node discriminability in deep GNNs, thereby mitigating over-smoothing. Extensive experiments on multiple benchmarks demonstrate the effectiveness and efficiency of our method.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes DMbaGCN, a Dual Mamba-enhanced Graph Convolutional Network that integrates Local State-Evolution Mamba (LSEMba) for local neighborhood aggregation and node-specific representation dynamics via selective state space modeling, together with Global Context-Aware Mamba (GCAMba) for incorporating global context per node. The central claim is that this combination enhances node discriminability in deep GNNs and thereby mitigates over-smoothing, with effectiveness shown through experiments on multiple benchmarks.
Significance. If the empirical results and adaptation hold, the work offers a novel direction for applying selective SSMs to model progressive, node-specific evolution in GNN layers, potentially improving upon residual or skip-connection baselines for deeper architectures. The dual local-global design is a clear strength and could influence future SSM-graph hybrids, though its impact hinges on demonstrating concrete gains in discriminability metrics beyond existing techniques.
Major comments (2)
- Abstract: the claim that 'extensive experiments on multiple benchmarks demonstrate the effectiveness' supplies no quantitative results, baseline comparisons, over-smoothing metrics (e.g., MAD, Dirichlet energy), or ablation details, leaving the central empirical support for the dual-Mamba claim without visible grounding in the provided text.
- Method (LSEMba description): the assertion that selective state-space modeling captures 'node-specific representation dynamics across layers' to maintain discriminability assumes the SSM selection mechanism can counteract homogenization from repeated neighborhood averaging. No derivation is supplied showing how the combined LSEMba+GCAMba modules alter the contraction rate of the layer operator; if the SSM is applied post-aggregation without explicit topology-aware discretization, the update reduces to a learned residual without guaranteed advantage over skip connections.
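The homogenization concern in the second comment can be illustrated with a toy experiment. It is a sketch under stated assumptions, not a reproduction of the paper's setup: it uses mean aggregation with self-loops and no learned weights or nonlinearity, one common unnormalized form of Dirichlet energy, and an initial-residual mix (in the spirit of APPNP-style propagation) as a stand-in for a residual baseline. The function names and the path graph are hypothetical.

```python
def mean_aggregate(features, neighbors):
    """One round of mean aggregation over each node's neighbours plus itself."""
    out = []
    for i, nbrs in enumerate(neighbors):
        group = [i] + list(nbrs)
        dim = len(features[i])
        out.append([sum(features[j][d] for j in group) / len(group)
                    for d in range(dim)])
    return out


def dirichlet_energy(features, neighbors):
    """Sum of squared feature differences over undirected edges."""
    energy = 0.0
    for i, nbrs in enumerate(neighbors):
        for j in nbrs:
            if i < j:  # count each undirected edge once
                energy += sum((a - b) ** 2
                              for a, b in zip(features[i], features[j]))
    return energy


def energy_trace(features, neighbors, num_layers, alpha=0.0):
    """Energy after each layer; alpha mixes the initial features back in."""
    x0 = features
    x = features
    trace = [dirichlet_energy(x, neighbors)]
    for _ in range(num_layers):
        agg = mean_aggregate(x, neighbors)
        x = [[(1 - alpha) * a + alpha * b for a, b in zip(row, row0)]
             for row, row0 in zip(agg, x0)]
        trace.append(dirichlet_energy(x, neighbors))
    return trace


# Path graph 0-1-2-3 with scalar node features (illustrative data).
neighbors = [[1], [0, 2], [1, 3], [2]]
features = [[0.0], [1.0], [2.0], [3.0]]
plain = energy_trace(features, neighbors, 30)
residual = energy_trace(features, neighbors, 30, alpha=0.2)
```

On this graph the plain trace decays toward zero, while the trace with `alpha=0.2` settles at a positive value. A contraction-rate analysis of the kind the referee requests would need to show that the dual-Mamba update improves on this residual behavior rather than merely matching it.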
Minor comments (2)
- Notation for LSEMba and GCAMba should be accompanied by explicit equations or pseudocode to clarify how the selective SSM is discretized and applied to graph-structured inputs.
- The introduction could more explicitly contrast the proposed node-specific progressive modeling with prior residual and attention-based over-smoothing remedies.
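For reference, the kind of equations the first minor comment asks for might start from the standard state-space formulation that Mamba builds on. The first two lines below are the usual continuous-time system and its zero-order-hold discretization. The last line is an assumed adaptation, not taken from the paper, in which the sequence index is the GNN layer and x_v^(l) stands for node v's aggregated input at layer l. In a selective SSM the step size, input matrix, and readout are computed from the current input, which is why they carry node and layer indices here.

```latex
\begin{align}
% Continuous-time SSM (standard form)
h'(t) &= A\,h(t) + B\,x(t), \qquad y(t) = C\,h(t) \\
% Zero-order-hold discretization with step size \Delta
\bar{A} &= \exp(\Delta A), \qquad
\bar{B} = (\Delta A)^{-1}\bigl(\exp(\Delta A) - I\bigr)\,\Delta B \\
% Assumed layer-indexed recurrence for node v (illustrative only)
h_v^{(\ell)} &= \bar{A}_v^{(\ell)}\,h_v^{(\ell-1)} + \bar{B}_v^{(\ell)}\,x_v^{(\ell)}, \qquad
y_v^{(\ell)} = C_v^{(\ell)}\,h_v^{(\ell)}
\end{align}
```

Whether the paper discretizes along the layer axis in this way, or uses a topology-aware variant, is exactly what the requested equations would need to settle.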
Simulated Author's Rebuttal
We thank the referee for their constructive feedback on our manuscript. We address each major comment below and indicate the revisions we will make to strengthen the presentation and analysis.
- Referee: Abstract: the claim that 'extensive experiments on multiple benchmarks demonstrate the effectiveness' supplies no quantitative results, baseline comparisons, over-smoothing metrics (e.g., MAD, Dirichlet energy), or ablation details, leaving the central empirical support for the dual-Mamba claim without visible grounding in the provided text.
  Authors: We agree that the abstract would be strengthened by including concrete quantitative highlights. In the revised manuscript we will add specific performance gains (e.g., accuracy improvements on the cited benchmarks) together with references to the over-smoothing metrics (MAD, Dirichlet energy) and ablation results already reported in the experimental section.
  Revision: yes
- Referee: Method (LSEMba description): the assertion that selective state-space modeling captures 'node-specific representation dynamics across layers' to maintain discriminability assumes the SSM selection mechanism can counteract homogenization from repeated neighborhood averaging. No derivation is supplied showing how the combined LSEMba+GCAMba modules alter the contraction rate of the layer operator; if the SSM is applied post-aggregation without explicit topology-aware discretization, the update reduces to a learned residual without guaranteed advantage over skip connections.
  Authors: We acknowledge that a formal derivation of the contraction-rate change would provide additional theoretical support. Our current argument rests on the empirical behavior of the selective SSM, which permits input-dependent state transitions that adapt per node and per layer; this is distinct from a fixed residual because the selection parameters are conditioned on the current node features and aggregated neighborhood. We will expand the method section with a clearer mechanistic explanation of the LSEMba–GCAMba interaction and will include additional diagnostic plots (e.g., layer-wise MAD curves) to illustrate the effect. A complete contraction-rate analysis remains an open theoretical question that we flag for future work.
  Revision: partial
Circularity Check
No circularity: architectural proposal with independent empirical validation
The paper proposes DMbaGCN as a new architectural combination of LSEMba (for local neighborhood aggregation and node-specific state evolution via selective SSM) and GCAMba (for global context). The central claim—that this mitigates over-smoothing by enhancing node discriminability—is presented as the outcome of the design and supported by benchmark experiments, without any equations, fitted parameters renamed as predictions, or self-citation chains that reduce the result to its own inputs. The derivation chain consists of module definitions and their integration, which remain independent of the performance assertions and do not invoke uniqueness theorems or ansatzes from prior self-work in a load-bearing manner. This is a standard non-circular empirical contribution.