Adaptive Node Feature Selection For Graph Neural Networks
Pith reviewed 2026-05-18 09:54 UTC · model grok-4.3
The pith
Permuting each node feature's values and measuring the drop in validation performance lets GNNs remove unneeded features during training.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We propose an adaptive node feature selection approach for graph neural networks that identifies and removes unnecessary features during training based on changes in validation performance upon permuting feature values. We theoretically motivate our approach by characterizing how the relationships between node data and graph structure influence GNN performance. Empirically, our highly general approach rivals the performance of tailored feature selection approaches that exploit prior assumptions, returns meaningful feature importance scores well before the GNN is fully trained, and extracts relevant properties that inform feature importance for various graph learning settings.
What carries the argument
The permutation importance mechanism that scores each node feature by the change in validation performance when its values are randomly reassigned across nodes while the graph edges and other features remain fixed.
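The scoring step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: the model is stood in for by a hypothetical `predict(features, adjacency)` callable, accuracy is used as the validation metric, the column is shuffled across all nodes, and `mean_aggregate`, `toy_predict`, and the six-node graph are invented for the example.

```python
import random

def mean_aggregate(features, adjacency):
    """One round of mean aggregation over each node and its neighbors."""
    aggregated = []
    for node, neighbors in enumerate(adjacency):
        group = [node] + list(neighbors)
        dim = len(features[node])
        aggregated.append(
            [sum(features[j][k] for j in group) / len(group) for k in range(dim)]
        )
    return aggregated

def accuracy(predict, features, adjacency, labels, val_nodes):
    """Fraction of validation nodes whose prediction matches the label."""
    preds = predict(features, adjacency)
    return sum(preds[i] == labels[i] for i in val_nodes) / len(val_nodes)

def permutation_importance(predict, features, adjacency, labels, val_nodes,
                           feature_idx, n_repeats=10, seed=0):
    """Mean drop in validation accuracy when one feature column is shuffled
    across nodes; edges and all other columns are left untouched."""
    rng = random.Random(seed)
    baseline = accuracy(predict, features, adjacency, labels, val_nodes)
    drops = []
    for _ in range(n_repeats):
        column = [row[feature_idx] for row in features]
        rng.shuffle(column)
        # Build new rows so the original feature matrix is never mutated.
        permuted = [
            row[:feature_idx] + [value] + row[feature_idx + 1:]
            for row, value in zip(features, column)
        ]
        permuted_acc = accuracy(predict, permuted, adjacency, labels, val_nodes)
        drops.append(baseline - permuted_acc)
    return sum(drops) / len(drops)

# Toy stand-in for a trained GNN (assumption): classify by the sign of
# aggregated feature 0 and ignore feature 1 entirely.
def toy_predict(features, adjacency):
    return [1 if row[0] > 0 else 0 for row in mean_aggregate(features, adjacency)]

features = [[1.0, 0.3], [1.0, -0.7], [1.0, 0.1],
            [-1.0, 0.9], [-1.0, -0.2], [-1.0, 0.5]]
adjacency = [[1, 2], [0, 2], [0, 1], [4, 5], [3, 5], [3, 4]]  # two triangles
labels = [1, 1, 1, 0, 0, 0]
val_nodes = list(range(6))

score_used = permutation_importance(
    toy_predict, features, adjacency, labels, val_nodes, feature_idx=0)
score_ignored = permutation_importance(
    toy_predict, features, adjacency, labels, val_nodes, feature_idx=1)
```

Because `toy_predict` never reads feature 1, its score is exactly zero, while the score for feature 0 depends on how often a shuffle flips the majority sign inside a triangle. Averaging over `n_repeats` shuffles is a design choice made here to reduce the variance of a single random permutation.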
If this is right
- The method can be inserted into standard GNN training loops to drop low-scoring features without retraining from scratch.
- Importance scores become available after only a fraction of training epochs rather than at convergence.
- The same procedure applies to node classification, regression, and link-prediction tasks without modification.
- Dimensionality reduction occurs while preserving or improving final model accuracy on benchmark graphs.
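The first two consequences can be pictured as a training loop that pauses periodically to score and drop features. The sketch below is one possible arrangement, assuming hypothetical callbacks `train_one_epoch(active)` and `score_feature(feature, active)`, a fixed check interval, and a simple "keep if the score exceeds a threshold" rule; the summary above does not state the paper's actual schedule or removal rule.

```python
def train_with_pruning(train_one_epoch, score_feature, n_features, n_epochs,
                       check_every=5, threshold=0.0, min_features=1):
    """Train for n_epochs, dropping low-scoring features at fixed intervals.

    Returns the surviving feature indices and a log of (epoch, removed) pairs.
    """
    active = list(range(n_features))
    removal_log = []
    for epoch in range(1, n_epochs + 1):
        # Training continues from the current weights; nothing is reset.
        train_one_epoch(active)
        if epoch % check_every != 0 or len(active) <= min_features:
            continue
        scores = {f: score_feature(f, active) for f in active}
        keep = [f for f in active if scores[f] > threshold]
        if len(keep) < min_features:
            # Never prune below the floor: fall back to the top scorers.
            best = sorted(active, key=lambda f: scores[f], reverse=True)
            keep = sorted(best[:min_features])
        removed = [f for f in active if f not in keep]
        if removed:
            removal_log.append((epoch, removed))
        active = keep
    return active, removal_log

# Stub callbacks (assumptions) so the control flow can be exercised alone.
fixed_scores = {0: 0.20, 1: 0.00, 2: -0.01, 3: 0.05}
widths_seen = []
active, removal_log = train_with_pruning(
    train_one_epoch=lambda current: widths_seen.append(len(current)),
    score_feature=lambda f, current: fixed_scores[f],
    n_features=4,
    n_epochs=10,
)
```

In a real run, `score_feature` would compute a permutation-based validation drop for the partially trained model, and `train_one_epoch` would mask or slice the input to the active columns.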
Where Pith is reading between the lines
- Training-time feature pruning could reduce memory and compute costs on large graphs where full feature sets are expensive to process.
- The validation-performance signal might serve as a surrogate for gradient-based attribution when gradients are noisy due to graph convolutions.
- Extending the permutation test to subsets of features rather than single ones could capture joint contributions that single-feature swaps miss.
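The subset extension in the last point could look like the following sketch, which moves several columns between nodes with one shared permutation so that the relationship among the grouped features is preserved while their link to the node and its neighborhood is broken. The helper name `permute_feature_group` and the toy matrix are hypothetical.

```python
import random

def permute_feature_group(features, group, rng):
    """Return a copy of `features` in which the columns listed in `group`
    are reassigned across nodes using a single shared permutation."""
    order = list(range(len(features)))
    rng.shuffle(order)
    permuted = [list(row) for row in features]
    for target, source in enumerate(order):
        for k in group:
            permuted[target][k] = features[source][k]
    return permuted

# Columns 0 and 1 are identical by construction; column 2 is unrelated.
features = [[0, 0, 10], [1, 1, 11], [2, 2, 12], [3, 3, 13], [4, 4, 14]]
shuffled = permute_feature_group(features, group=(0, 1), rng=random.Random(0))
```

Scoring the group then amounts to measuring the validation drop on `shuffled` in the same way as for a single column. Columns 0 and 1 still agree in every row after the shuffle, which a per-column permutation would not guarantee.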
Load-bearing premise
Permuting individual node feature values and measuring the resulting change in validation performance accurately isolates each feature's contribution even when complex dependencies exist between features, nodes, and the graph structure.
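One way to write down the quantity this premise concerns, in notation assumed here rather than taken from the paper: let $X$ be the node feature matrix, $A$ the adjacency structure, $f_\theta$ the GNN in its current training state, $M$ the validation metric, and $\pi$ a random permutation of node indices.

```latex
\[
  \Delta_j \;=\; M\bigl(f_\theta(X, A)\bigr)
  \;-\; \mathbb{E}_{\pi}\Bigl[\, M\bigl(f_\theta(X^{(j,\pi)}, A)\bigr) \Bigr],
  \qquad
  X^{(j,\pi)}_{ik} =
  \begin{cases}
    X_{\pi(i)\,j} & \text{if } k = j,\\
    X_{ik} & \text{otherwise.}
  \end{cases}
\]
```

The premise holds to the extent that a large $\Delta_j$ reflects the contribution of feature $j$ itself. When column $j$ is statistically dependent on other columns or on the graph, $\Delta_j$ also absorbs the effect of breaking those dependencies.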
What would settle it
A controlled experiment on a synthetic graph whose features have known pairwise interactions. The premise would fail if the permutation scores assigned high importance to features that are known to be irrelevant once interactions are accounted for.
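The scaffolding for such a check might be sketched as follows. To keep it verifiable in isolation, this version makes strong simplifications that are assumptions of the example: the graph is omitted, the trained GNN is replaced by an oracle predictor, and the only interaction is an XOR between two binary features. A real test would train a GNN on a graph built over these nodes and then apply the same final check.

```python
import random

def make_xor_nodes(n_copies=25):
    """Node features (f0, f1, f2) over all 8 binary combinations; the label is
    f0 XOR f1, so f0 and f1 matter only jointly and f2 is irrelevant."""
    features, labels = [], []
    for _ in range(n_copies):
        for f0 in (0, 1):
            for f1 in (0, 1):
                for f2 in (0, 1):
                    features.append([f0, f1, f2])
                    labels.append(f0 ^ f1)
    return features, labels

def oracle_predict(features):
    """Stand-in for a perfectly trained model (assumption): reads f0, f1 only."""
    return [row[0] ^ row[1] for row in features]

def permutation_drop(features, labels, feature_idx, rng, n_repeats=20):
    """Mean accuracy drop after shuffling one feature column across nodes."""
    def acc(rows):
        preds = oracle_predict(rows)
        return sum(p == y for p, y in zip(preds, labels)) / len(labels)

    baseline = acc(features)
    total = 0.0
    for _ in range(n_repeats):
        column = [row[feature_idx] for row in features]
        rng.shuffle(column)
        permuted = [row[:feature_idx] + [v] + row[feature_idx + 1:]
                    for row, v in zip(features, column)]
        total += baseline - acc(permuted)
    return total / n_repeats

def flags_irrelevant_feature(scores, irrelevant, tolerance=0.0):
    """True if any feature known to be irrelevant scores above the tolerance,
    which is the outcome that would count against the premise."""
    return any(scores[j] > tolerance for j in irrelevant)

features, labels = make_xor_nodes()
rng = random.Random(0)
scores = {j: permutation_drop(features, labels, j, rng) for j in range(3)}
falsified = flags_irrelevant_feature(scores, irrelevant=[2])
```

With the oracle, the irrelevant feature scores exactly zero and each interacting feature loses about half the accuracy in expectation, so `falsified` is false. The informative case is what happens when the oracle is swapped for a GNN that has actually been trained.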
original abstract
We propose an adaptive node feature selection approach for graph neural networks (GNNs) that identifies and removes unnecessary features during training. The ability to measure how features contribute to model output is key for interpreting decisions and reducing dimensionality by eliminating unhelpful variables. However, graph-structured data introduces complex dependencies that may be unsuited to classical feature importance metrics. Inspired by this, we present a data-, model-, and task-agnostic method that determines relevant features during training based on changes in validation performance upon permuting feature values. We theoretically motivate our approach by characterizing how the relationships between node data and graph structure influence GNN performance. Empirically, we show that (i) our highly general approach rivals the performance of tailored feature selection approaches that exploit prior assumptions; (ii) we return meaningful feature importance scores well before the GNN is fully trained; and (iii) our scores demonstrably extract relevant properties that inform feature importance for various graph learning settings.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes an adaptive node feature selection method for GNNs that identifies and removes unnecessary features during training by measuring changes in validation performance after permuting individual node feature values. It is presented as data-, model-, and task-agnostic, with a theoretical motivation characterizing relationships between node data and graph structure, and empirical claims that the approach rivals tailored feature selection methods, yields meaningful importance scores early in training, and extracts relevant properties across graph learning settings.
Significance. If the permutation-based importance scores reliably isolate feature contributions despite inter-feature correlations and graph message-passing effects, the method would offer a general, assumption-light tool for dimensionality reduction and interpretability in GNNs that could apply across diverse tasks without requiring domain-specific priors.
major comments (2)
- [theoretical motivation section] Theoretical motivation section: the characterization of node-data/graph-structure relationships does not address how permuting a single feature across nodes simultaneously disrupts inter-feature correlations and alters aggregated neighborhood representations in GNN layers; this leaves open whether the observed validation-performance delta isolates marginal feature relevance or conflates joint effects, directly bearing on the claim that the scores are meaningful and general.
- [method description] Method description (permutation procedure): the approach relies on an external validation-performance signal after feature permutation rather than an internal model-derived quantity, yet no analysis is provided on how this behaves under feature dependence or when early-training models have not yet converged to stable representations; this underpins both the early-score claim and the rivalry with tailored methods.
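The dependence concern raised in the first major comment can be illustrated numerically. The sketch below is an illustration with invented data, not an analysis from the paper: it builds two perfectly correlated feature columns, shuffles one of them across nodes, and compares the Pearson correlation before and after.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(0)
feature_a = [float(i) for i in range(50)]
feature_b = [2.0 * a + 1.0 for a in feature_a]  # linear in a, so correlation is 1

corr_before = pearson(feature_a, feature_b)

# Shuffle only column a across nodes, as a single-feature permutation would.
shuffled_a = feature_a[:]
rng.shuffle(shuffled_a)
corr_after = pearson(shuffled_a, feature_b)
```

The marginal distribution of the shuffled column is unchanged, but its relationship to the other column is not. A model that relied on the two columns agreeing therefore sees inputs unlike any it was trained on, which is the joint effect the referee asks the authors to separate from marginal relevance.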
minor comments (1)
- [abstract] Abstract and experimental claims: quantitative details on baselines, dataset sizes, statistical significance, and ablation controls are absent, making it difficult to evaluate the 'rivals tailored methods' assertion.
Simulated Author's Rebuttal
We thank the referee for their insightful comments on our paper. We address each of the major comments in detail below. We have revised the manuscript to incorporate additional discussion and analysis where the comments highlight areas for improvement, strengthening the presentation of our method's theoretical and empirical foundations.
point-by-point responses
- Referee: [theoretical motivation section] Theoretical motivation section: the characterization of node-data/graph-structure relationships does not address how permuting a single feature across nodes simultaneously disrupts inter-feature correlations and alters aggregated neighborhood representations in GNN layers; this leaves open whether the observed validation-performance delta isolates marginal feature relevance or conflates joint effects, directly bearing on the claim that the scores are meaningful and general.
  Authors: We agree that the permutation of a single feature can disrupt inter-feature correlations and influence the neighborhood aggregations in GNN layers due to the message-passing mechanism. Our theoretical motivation section characterizes the broader relationships between node data and graph structure to explain why adaptive feature selection is beneficial for GNN performance. To address this point more explicitly, we have expanded the section to discuss how the validation performance delta, while reflecting joint effects in the presence of feature dependencies, still provides a meaningful measure of feature relevance in the context of the trained model. This is in line with the established use of permutation importance in other domains, and our empirical results across diverse graph settings support the generality of the scores.
  revision: partial
- Referee: [method description] Method description (permutation procedure): the approach relies on an external validation-performance signal after feature permutation rather than an internal model-derived quantity, yet no analysis is provided on how this behaves under feature dependence or when early-training models have not yet converged to stable representations; this underpins both the early-score claim and the rivalry with tailored methods.
  Authors: We acknowledge the value of providing more analysis on the permutation procedure, particularly regarding feature dependence and early training stages. In the revised version, we have added a new paragraph in the method description section that examines the impact of feature correlations on the importance scores using controlled experiments on synthetic graphs. Furthermore, we include additional plots and discussion demonstrating that the feature importance scores become informative early in the training process, prior to full model convergence, which bolsters the claim of obtaining meaningful scores early. These additions also help contextualize why our general approach can rival more specialized methods that rely on task-specific assumptions.
  revision: yes
Circularity Check
No significant circularity; importance derived from external validation deltas
full rationale
The paper grounds feature importance in measured changes to validation performance after permuting node feature values, an observable external signal independent of the model's internal fitted parameters. The theoretical motivation section characterizes relationships between node data and graph structure but does not define or reduce the performance delta itself to a fitted constant or self-referential quantity by construction. No load-bearing self-citations, uniqueness theorems, or ansatzes are invoked that collapse the central claims back to the inputs. The derivation therefore remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
free parameters (1)
- performance-change threshold for feature removal
axioms (1)
- domain assumption: Permuting a feature's values across nodes disrupts its relationship to the graph structure in a way that reliably indicates its contribution to model output.