Adaptive Node Feature Selection For Graph Neural Networks

Ali Azizpour; Madeline Navarro; Santiago Segarra

arxiv: 2510.03096 · v2 · submitted 2025-10-03 · 💻 cs.LG

Pith reviewed 2026-05-18 09:54 UTC · model grok-4.3

classification 💻 cs.LG

keywords graph neural networks · feature selection · permutation importance · adaptive training · node features · model interpretability · dimensionality reduction

The pith

Permuting each node feature's values and measuring the drop in validation performance lets GNNs remove unneeded features during training.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to show that a simple permutation test on node features can identify which ones actually help a graph neural network without relying on assumptions about the data, model, or task. Graph data creates intertwined effects between node attributes and connections that break many classical feature-ranking tools, so a method that works by observing real performance changes offers a practical alternative. If the claim holds, practitioners could trim input dimensions on the fly, obtain early importance rankings, and apply the same procedure across node classification, link prediction, and other settings. The approach is presented as competitive with methods built for narrower cases while also supplying theoretical motivation based on how node data interacts with graph structure.

Core claim

We propose an adaptive node feature selection approach for graph neural networks that identifies and removes unnecessary features during training based on changes in validation performance upon permuting feature values. We theoretically motivate our approach by characterizing how the relationships between node data and graph structure influence GNN performance. Empirically, our highly general approach rivals the performance of tailored feature selection approaches that exploit prior assumptions, returns meaningful feature importance scores well before the GNN is fully trained, and extracts relevant properties that inform feature importance for various graph learning settings.

What carries the argument

The permutation importance mechanism that scores each node feature by the change in validation performance when its values are randomly reassigned across nodes while the graph edges and other features remain fixed.
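The scoring step described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the one-layer linear propagation `S @ X @ W`, the hand-set weights `W`, the random toy graph, and the helper names (`normalized_adjacency`, `permutation_scores`, `n_repeats`) are all assumptions made for the example. The weights are chosen so that only feature 0 affects the output, which makes the expected ranking easy to check.

```python
import numpy as np

def normalized_adjacency(A):
    """Symmetrically normalize A + I (the usual GCN-style propagation matrix)."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def accuracy(logits, y, idx):
    return float((logits[idx].argmax(axis=1) == y[idx]).mean())

def permutation_scores(predict, X, y, val_idx, rng, n_repeats=5):
    """Score feature j by the mean drop in validation accuracy when column j
    of X is shuffled across nodes; edges and all other columns stay fixed."""
    base = accuracy(predict(X), y, val_idx)
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X[:, j])
            drops.append(base - accuracy(predict(X_perm), y, val_idx))
        scores[j] = np.mean(drops)
    return scores

# Toy setup (hypothetical): random undirected graph, three node features.
rng = np.random.default_rng(0)
n = 60
A = np.triu(rng.random((n, n)) < 0.1, 1).astype(float)
A = A + A.T
S = normalized_adjacency(A)
X = rng.normal(size=(n, 3))

# Stand-in for a trained model: only feature 0 influences the logits.
W = np.array([[1.0, -1.0], [0.0, 0.0], [0.0, 0.0]])
predict = lambda X_in: S @ X_in @ W
y = predict(X).argmax(axis=1)      # labels the stand-in model fits perfectly
val_idx = np.arange(30, 60)        # hypothetical validation split

scores = permutation_scores(predict, X, y, val_idx, rng)
print(scores)
```

Because shuffling happens before propagation, a permuted feature also changes every neighbor's aggregate, so the score reflects the feature's effect through message passing rather than its effect on each node in isolation.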

If this is right

The method can be inserted into standard GNN training loops to drop low-scoring features without retraining from scratch.
Importance scores become available after only a fraction of training epochs rather than at convergence.
The same procedure applies to node classification, regression, and link-prediction tasks without modification.
Dimensionality reduction occurs while preserving or improving final model accuracy on benchmark graphs.
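To make the first two points concrete, here is a hedged sketch of how permutation scoring might sit inside a training loop. The summary does not give the paper's schedule or removal rule, so everything specific here is an assumption: the linear softmax model on propagated features, the cutoff `tau`, the `warmup` and `check_every` settings, and the choice to zero out dropped columns through a boolean `active` mask.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, c = 200, 6, 2

# Toy graph: symmetric random adjacency with self-loops, row-normalized.
A = np.triu(rng.random((n, n)) < 0.05, 1).astype(float)
A = A + A.T + np.eye(n)
S = A / A.sum(axis=1, keepdims=True)

# Only features 0 and 1 carry label signal; features 2..5 are pure noise.
X = rng.normal(size=(n, d))
y = ((S @ X)[:, 0] + (S @ X)[:, 1] > 0).astype(int)
train_idx, val_idx = np.arange(0, 120), np.arange(120, 200)

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def val_accuracy(X_in, W):
    pred = (S @ X_in @ W)[val_idx].argmax(axis=1)
    return float((pred == y[val_idx]).mean())

def permutation_scores(X_in, W, active, n_repeats=3):
    """Mean validation-accuracy drop per still-active feature."""
    base = val_accuracy(X_in, W)
    scores = np.zeros(d)
    for j in np.flatnonzero(active):
        for _ in range(n_repeats):
            X_perm = X_in.copy()
            X_perm[:, j] = rng.permutation(X_in[:, j])
            scores[j] += (base - val_accuracy(X_perm, W)) / n_repeats
    return scores

W = np.zeros((d, c))
active = np.ones(d, dtype=bool)
Y = np.eye(c)[y]
lr, tau, warmup, check_every = 0.5, 0.01, 40, 20  # all hypothetical settings

for epoch in range(1, 201):
    X_in = X * active                      # dropped features are zeroed out
    H = S @ X_in
    P = softmax(H @ W)
    # Gradient of the mean cross-entropy loss on training nodes.
    W -= lr * H[train_idx].T @ (P[train_idx] - Y[train_idx]) / len(train_idx)
    if epoch >= warmup and epoch % check_every == 0:
        scores = permutation_scores(X_in, W, active)
        active &= scores >= tau            # prune low-scoring features in place

print("kept features:", np.flatnonzero(active))
print("validation accuracy:", val_accuracy(X * active, W))
```

Training continues with the same weights after each pruning step, which is what avoiding a retrain from scratch amounts to in this sketch. Whether a given noise feature is pruned depends on sampling noise in the validation split, so the kept set can vary with the seed.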

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Training-time feature pruning could reduce memory and compute costs on large graphs where full feature sets are expensive to process.
The validation-performance signal might serve as a surrogate for gradient-based attribution when gradients are noisy due to graph convolutions.
Extending the permutation test to subsets of features rather than single ones could capture joint contributions that single-feature swaps miss.
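The last extension, permuting subsets of features jointly, can be illustrated with a small sketch. It is not taken from the paper: the function name `group_permutation_score`, the toy graph, and the fixed stand-in predictor are assumptions. The example builds two identical copies of one feature, so shuffling either copy alone leaves half the signal intact while shuffling both together removes it.

```python
import numpy as np

def group_permutation_score(predict, X, y, val_idx, group, rng, n_repeats=10):
    """Mean drop in validation accuracy when the columns in `group` are
    shuffled across nodes using one shared permutation per repeat."""
    def acc(X_in):
        return float((predict(X_in)[val_idx] == y[val_idx]).mean())
    base = acc(X)
    drops = []
    for _ in range(n_repeats):
        perm = rng.permutation(X.shape[0])
        X_perm = X.copy()
        X_perm[:, group] = X[perm][:, group]   # same node shuffle for all columns
        drops.append(base - acc(X_perm))
    return float(np.mean(drops))

# Toy setup (hypothetical): feature 1 duplicates feature 0, feature 2 is noise.
rng = np.random.default_rng(0)
n = 300
A = np.triu(rng.random((n, n)) < 0.05, 1).astype(float)
A = A + A.T + np.eye(n)
S = A / A.sum(axis=1, keepdims=True)
x = rng.normal(size=n)
X = np.column_stack([x, x, rng.normal(size=n)])

# Stand-in predictor that relies equally on the two redundant features.
def predict(X_in):
    H = S @ X_in
    return (H[:, 0] + H[:, 1] > 0).astype(int)

y = predict(X)
val_idx = np.arange(150, 300)

single = group_permutation_score(predict, X, y, val_idx, [0], rng)
joint = group_permutation_score(predict, X, y, val_idx, [0, 1], rng)
print(single, joint)
```

Comparing `joint` with the single-feature scores is one way to spot redundancy: a joint drop much larger than either individual drop suggests the features share signal that single-feature shuffles understate.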

Load-bearing premise

Permuting individual node feature values and measuring the resulting change in validation performance accurately isolates each feature's contribution even when complex dependencies exist between features, nodes, and the graph structure.

What would settle it

A controlled experiment on a synthetic graph whose features have known pairwise interactions would settle it: if the permutation scores assign high importance to features known to be irrelevant once those interactions are accounted for, the premise fails.

Original abstract

We propose an adaptive node feature selection approach for graph neural networks (GNNs) that identifies and removes unnecessary features during training. The ability to measure how features contribute to model output is key for interpreting decisions and reducing dimensionality by eliminating unhelpful variables. However, graph-structured data introduces complex dependencies that may be unsuited to classical feature importance metrics. Inspired by this, we present a data-, model-, and task-agnostic method that determines relevant features during training based on changes in validation performance upon permuting feature values. We theoretically motivate our approach by characterizing how the relationships between node data and graph structure influence GNN performance. Empirically, we show that (i) our highly general approach rivals the performance of tailored feature selection approaches that exploit prior assumptions; (ii) we return meaningful feature importance scores well before the GNN is fully trained; and (iii) our scores demonstrably extract relevant properties that inform feature importance for various graph learning settings.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

This adapts permutation importance to GNN node features with an adaptive training schedule, but the isolation of individual contributions looks shaky when features correlate or graph structure mixes signals.

The paper's main move is to run permutation importance on node features inside the GNN training loop, drop low-impact ones once their scores stabilize, and tie the approach to a characterization of node-graph interactions. That gives a general, assumption-light selector that can produce scores before the model finishes training. The motivation section is the clearest part: it spells out why classical metrics can misfire on graphs without forcing extra priors on the user. If the early-stopping rule works reliably, it could cut down model size and help interpretation in settings where node attributes are high-dimensional and noisy. The empirical claims say it matches tailored selectors and surfaces relevant properties across tasks, which would be useful if the numbers are solid.

The soft spot is the permutation step. Changing one feature across nodes while the graph is still passing messages can break correlations and alter neighborhood aggregates at the same time, so the performance delta mixes marginal and joint effects. The abstract acknowledges complex dependencies but does not show how the method separates them in practice. Without ablations on correlated features or controls for message-passing artifacts, the generality claim rests on thinner ground than it first appears.

This is for graph ML people who want a drop-in feature selector without heavy customization. A reader running GNNs on real data with redundant node attributes would find the practical angle worth checking. It deserves peer review because the idea is straightforward to implement and the open questions about attribution are worth testing against stronger baselines.

Referee Report

2 major / 1 minor

Summary. The paper proposes an adaptive node feature selection method for GNNs that identifies and removes unnecessary features during training by measuring changes in validation performance after permuting individual node feature values. It is presented as data-, model-, and task-agnostic, with a theoretical motivation characterizing relationships between node data and graph structure, and empirical claims that the approach rivals tailored feature selection methods, yields meaningful importance scores early in training, and extracts relevant properties across graph learning settings.

Significance. If the permutation-based importance scores reliably isolate feature contributions despite inter-feature correlations and graph message-passing effects, the method would offer a general, assumption-light tool for dimensionality reduction and interpretability in GNNs that could apply across diverse tasks without requiring domain-specific priors.

major comments (2)

[theoretical motivation section] Theoretical motivation section: the characterization of node-data/graph-structure relationships does not address how permuting a single feature across nodes simultaneously disrupts inter-feature correlations and alters aggregated neighborhood representations in GNN layers; this leaves open whether the observed validation-performance delta isolates marginal feature relevance or conflates joint effects, directly bearing on the claim that the scores are meaningful and general.
[method description] Method description (permutation procedure): the approach relies on an external validation-performance signal after feature permutation rather than an internal model-derived quantity, yet no analysis is provided on how this behaves under feature dependence or when early-training models have not yet converged to stable representations; this underpins both the early-score claim and the rivalry with tailored methods.

minor comments (1)

[abstract] Abstract and experimental claims: quantitative details on baselines, dataset sizes, statistical significance, and ablation controls are absent, making it difficult to evaluate the 'rivals tailored methods' assertion.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their insightful comments on our paper. We address each of the major comments in detail below. We have revised the manuscript to incorporate additional discussion and analysis where the comments highlight areas for improvement, strengthening the presentation of our method's theoretical and empirical foundations.

Referee: [theoretical motivation section] Theoretical motivation section: the characterization of node-data/graph-structure relationships does not address how permuting a single feature across nodes simultaneously disrupts inter-feature correlations and alters aggregated neighborhood representations in GNN layers; this leaves open whether the observed validation-performance delta isolates marginal feature relevance or conflates joint effects, directly bearing on the claim that the scores are meaningful and general.

Authors: We agree that the permutation of a single feature can disrupt inter-feature correlations and influence the neighborhood aggregations in GNN layers due to the message-passing mechanism. Our theoretical motivation section characterizes the broader relationships between node data and graph structure to explain why adaptive feature selection is beneficial for GNN performance. To address this point more explicitly, we have expanded the section to discuss how the validation performance delta, while reflecting joint effects in the presence of feature dependencies, still provides a meaningful measure of feature relevance in the context of the trained model. This is in line with the established use of permutation importance in other domains, and our empirical results across diverse graph settings support the generality of the scores.

revision: partial

Referee: [method description] Method description (permutation procedure): the approach relies on an external validation-performance signal after feature permutation rather than an internal model-derived quantity, yet no analysis is provided on how this behaves under feature dependence or when early-training models have not yet converged to stable representations; this underpins both the early-score claim and the rivalry with tailored methods.

Authors: We acknowledge the value of providing more analysis on the permutation procedure, particularly regarding feature dependence and early training stages. In the revised version, we have added a new paragraph in the method description section that examines the impact of feature correlations on the importance scores using controlled experiments on synthetic graphs. Furthermore, we include additional plots and discussion demonstrating that the feature importance scores become informative early in the training process, prior to full model convergence, which bolsters the claim of obtaining meaningful scores early. These additions also help contextualize why our general approach can rival more specialized methods that rely on task-specific assumptions.

revision: yes

Circularity Check

0 steps flagged

No significant circularity; importance derived from external validation deltas

The paper grounds feature importance in measured changes to validation performance after permuting node feature values, an observable external signal independent of the model's internal fitted parameters. The theoretical motivation section characterizes relationships between node data and graph structure but does not define or reduce the performance delta itself to a fitted constant or self-referential quantity by construction. No load-bearing self-citations, uniqueness theorems, or ansatzes are invoked that collapse the central claims back to the inputs. The derivation therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that feature permutation produces measurable, interpretable effects on validation performance in the presence of graph dependencies, plus standard supervised learning assumptions about validation sets being representative.

free parameters (1)

performance-change threshold for feature removal
Likely a cutoff value used to decide which features to keep or drop based on observed validation drop after permutation.

axioms (1)

domain assumption: Permuting a feature's values across nodes disrupts its relationship to the graph structure in a way that reliably indicates its contribution to model output.
Invoked to justify why the permutation test measures importance for GNNs.

pith-pipeline@v0.9.0 · 5690 in / 1356 out tokens · 34680 ms · 2026-05-18T09:54:38.387352+00:00 · methodology
