Rethinking GNNs and Missing Features: Challenges, Evaluation and a Robust Solution

Andrea Passerini; Antonio Longa; Bruno Lepri; Francesco Ferrini; Manfred Jaeger; Matono Akiyoshi; Veronica Lachi; Xin Liu

arxiv: 2601.04855 · v2 · pith:LA4NDS2Xnew · submitted 2026-01-08 · 💻 cs.LG · cs.AI

Francesco Ferrini, Veronica Lachi, Antonio Longa, Bruno Lepri, Matono Akiyoshi, Andrea Passerini, Xin Liu, Manfred Jaeger

Pith reviewed 2026-05-21 15:24 UTC · model grok-4.3

keywords: graph neural networks, missing node features, node classification, missing data mechanisms, robust baselines, evaluation protocols

The pith

A simple baseline handles missing node features in GNNs as well as specialized architectures on new realistic tests.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that standard benchmarks for GNNs with missing features rely on high-dimensional sparse data where missingness removes little useful information, so all methods look artificially strong. It introduces one synthetic and three real-world datasets that use dense, meaningful features instead, along with evaluation setups that include non-random missingness patterns common in practice. A straightforward method called GNNmim is presented as a baseline that processes incomplete features without complex changes to the model. Experiments indicate this baseline matches or approaches the accuracy of more elaborate specialized architectures across the new tests and missingness types. Readers should care because domains such as healthcare and sensor networks often involve dense features and structured missingness, making prior benchmarks unreliable guides for deployment.

Core claim

High sparsity in node features substantially limits the information loss from missing values, which explains why existing models all appear robust and why comparisons have been uninformative. By creating datasets with dense semantically meaningful features and designing protocols that go beyond missing completely at random, the paper shows that a simple baseline for node classification with incomplete data achieves competitive performance with specialized architectures across varied missingness regimes.
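The intuition behind the sparsity argument can be illustrated numerically. The sketch below is not the paper's theorem or proof; it assumes MCAR masking followed by zero imputation (both choices are assumptions made here for illustration) and counts how many entries actually change value. The function name `changed_fraction` and the matrix sizes are hypothetical.

```python
import numpy as np

def changed_fraction(X, missing_rate, rng):
    """Fraction of entries whose value changes after MCAR masking + zero imputation."""
    mask = rng.random(X.shape) < missing_rate   # True = entry is missing (MCAR)
    X_imputed = np.where(mask, 0.0, X)          # zero imputation (assumption)
    return float(np.mean(X_imputed != X))

rng = np.random.default_rng(0)
n, d = 2000, 500

# Sparse, bag-of-words-like features: roughly 1% of entries are non-zero.
X_sparse = (rng.random((n, d)) < 0.01).astype(float)
# Dense features: every entry carries a real value.
X_dense = rng.normal(size=(n, d))

sparse_changed = changed_fraction(X_sparse, 0.5, rng)
dense_changed = changed_fraction(X_dense, 0.5, rng)
print(f"sparse: {sparse_changed:.4f}  dense: {dense_changed:.4f}")
```

With half of all entries masked, only the masked non-zero entries of the sparse matrix change, so the changed fraction is on the order of the missing rate times the density, whereas nearly every masked entry of the dense matrix loses a real value.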

What carries the argument

GNNmim, a simple baseline for node classification with incomplete feature data that directly addresses missing values without requiring architectural overhauls.
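Going by the paper's description of GNNmim as augmenting the node feature matrix with a binary mask of missing entries, the input construction might look roughly like the sketch below. The zero placeholder for missing values, the mean-aggregation layer, and the names `mask_augment` and `gcn_layer` are assumptions made for illustration; they are not taken from the authors' implementation.

```python
import numpy as np

def mask_augment(X):
    """Return [filled X | missingness mask] for a matrix that uses NaN for missing values."""
    M = np.isnan(X).astype(X.dtype)          # 1.0 where the feature is missing
    X_filled = np.where(np.isnan(X), 0.0, X) # placeholder value (assumption: zero)
    return np.concatenate([X_filled, M], axis=1)

def gcn_layer(A, H, W):
    """One mean-aggregation message-passing layer with self-loops and ReLU (assumed backbone)."""
    A_hat = A + np.eye(A.shape[0])           # add self-loops
    deg = A_hat.sum(axis=1, keepdims=True)   # row degrees for mean aggregation
    return np.maximum((A_hat / deg) @ H @ W, 0.0)

# Toy graph: 3 nodes on a path, 2 features, two missing entries.
X = np.array([[0.5, np.nan], [np.nan, 1.2], [0.3, 0.7]])
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)

X_aug = mask_augment(X)                      # shape (3, 4): values then mask
rng = np.random.default_rng(0)
W = rng.normal(size=(X_aug.shape[1], 4))     # random weights, untrained
H = gcn_layer(A, X_aug, W)
```

The point of the design is that the backbone GNN is left unchanged: only the input width doubles, and the network can learn to treat a placeholder zero differently from an observed zero because the mask column distinguishes them.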

If this is right

Standard sparse-feature benchmarks hide the performance gaps that arise when features are dense and missingness is structured.
GNNmim remains competitive with specialized models under multiple realistic missingness mechanisms.
Theoretical analysis of missingness assumptions can guide which methods are appropriate for a given data-generating process.
New dense-feature datasets make it possible to distinguish robust methods from those that only work under benign conditions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the simple baseline works across regimes, then many specialized missing-feature architectures may be solving an easier problem than previously thought.
The evaluation protocol could be reused to test other graph tasks such as link prediction when features are incomplete.
Sensor-failure patterns in real networks could be used to generate additional test cases that further stress the methods.

Load-bearing premise

The new synthetic and real-world datasets with dense features capture the real difficulties that missing node data poses in application areas such as healthcare and sensor networks.

What would settle it

A result in which GNNmim shows markedly lower accuracy than at least one specialized architecture on any of the introduced datasets under a non-MCAR missingness pattern would falsify the competitiveness claim.

Original abstract

Handling missing node features is a key challenge for deploying Graph Neural Networks (GNNs) in real-world domains such as healthcare and sensor networks. Existing studies mostly address relatively benign scenarios, namely benchmark datasets with (a) high-dimensional but sparse node features and (b) incomplete data generated under Missing Completely At Random (MCAR) mechanisms. For (a), we theoretically prove that high sparsity substantially limits the information loss caused by missingness, making all models appear robust and preventing a meaningful comparison of their performance. To overcome this limitation, we introduce one synthetic and three real-world datasets with dense, semantically meaningful features. For (b), we move beyond MCAR and design evaluation protocols with more realistic missingness mechanisms. Moreover, we provide a theoretical background to state explicit assumptions on the missingness process and analyze their implications for different methods. Building on this analysis, we propose GNNmim, a simple yet effective baseline for node classification with incomplete feature data. Experiments show that GNNmim is competitive with respect to specialized architectures across diverse datasets and missingness regimes.
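To make the distinction between MCAR and more structured missingness concrete, here is a minimal sketch of two mask generators. The specific mechanisms used in the paper's protocols are not given in this summary, so the value-dependent mechanism below (a logistic self-masking rule, one common form of missing-not-at-random) and the names `mcar_mask`, `value_dependent_mask`, and `strength` are hypothetical stand-ins.

```python
import numpy as np

def mcar_mask(X, rate, rng):
    """MCAR: each entry is missing with the same probability, independent of the data."""
    return rng.random(X.shape) < rate

def value_dependent_mask(X, rate, rng, strength=2.0):
    """Self-masking example: larger standardized values are more likely to be missing."""
    z = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
    bias = np.log(rate / (1.0 - rate))       # centers the probability near `rate` at z = 0
    prob = 1.0 / (1.0 + np.exp(-(strength * z + bias)))
    return rng.random(X.shape) < prob

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 8))

m_mcar = mcar_mask(X, 0.3, rng)
m_dep = value_dependent_mask(X, 0.3, rng)

# Difference between the mean of missing and observed values under each mechanism.
gap_mcar = X[m_mcar].mean() - X[~m_mcar].mean()
gap_dep = X[m_dep].mean() - X[~m_dep].mean()
print(f"MCAR gap: {gap_mcar:.3f}  value-dependent gap: {gap_dep:.3f}")
```

Under MCAR the observed entries remain a representative sample, so the gap stays near zero. Under the value-dependent rule the observed entries are systematically shifted, which is the kind of bias an MCAR-only benchmark cannot expose.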

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Sparse benchmarks hide how much missing features hurt GNNs, and the new dense datasets plus GNNmim baseline give a clearer picture of what works.

The central point is that high-dimensional but sparse node features in standard benchmarks limit the damage from missing data, which the authors' theoretical analysis shows directly. This explains why prior models all looked robust and why comparisons were not very informative. The authors address this by introducing one synthetic and three real-world datasets with dense, semantically meaningful features, along with evaluation protocols that go beyond MCAR to more realistic missingness mechanisms and state explicit assumptions about the missingness process.

Referee Report

2 major / 2 minor

Summary. The paper claims that existing GNN work on missing node features relies on sparse high-dimensional benchmarks and MCAR missingness, which theoretically limits information loss and prevents meaningful comparisons. It introduces one synthetic and three real-world datasets with dense semantically meaningful features, designs evaluation protocols using more realistic missingness mechanisms beyond MCAR with explicit assumptions, provides theoretical background on those assumptions, and proposes the simple GNNmim baseline, showing it is competitive with specialized architectures across the new datasets and regimes.

Significance. If the central claims hold, the work strengthens evaluation practices for missing features in GNNs by addressing sparsity-induced robustness artifacts and by grounding comparisons in explicit missingness assumptions. The theoretical analysis of sparsity effects and the introduction of denser-feature benchmarks are positive contributions that could improve robustness testing in domains like healthcare and sensor networks.

major comments (2)

[§4] §4 (Experiments) and dataset description: the competitiveness of GNNmim is demonstrated on the introduced synthetic and real-world datasets, but no quantitative verification is given that their missingness statistics (dependence on labels/features, temporal/spatial correlations) match those in target domains such as irregular clinical records or sensor dropouts. This is load-bearing for the transferability of the main claim.
[Theoretical analysis] Theoretical section on sparsity: the proof that high sparsity substantially limits information loss from missingness is central to motivating the new datasets, yet the manuscript does not explicitly state the assumptions on the GNN message-passing operator or feature distribution under which the bound holds, making it difficult to assess its generality.

minor comments (2)

[§4] The paper should include full implementation details and code for GNNmim and the missingness generation protocols to allow independent verification of the reported results.
[§3] Notation for the missingness mechanisms (e.g., how the explicit assumptions translate into the probability model) could be clarified with a dedicated table or pseudocode.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and for acknowledging the potential value of our contributions in strengthening evaluation practices for missing features in GNNs. We respond to each major comment below, clarifying our position and committing to revisions where appropriate to improve the manuscript.

Referee: [§4] §4 (Experiments) and dataset description: the competitiveness of GNNmim is demonstrated on the introduced synthetic and real-world datasets, but no quantitative verification is given that their missingness statistics (dependence on labels/features, temporal/spatial correlations) match those in target domains such as irregular clinical records or sensor dropouts. This is load-bearing for the transferability of the main claim.

Authors: We agree that stronger evidence linking the missingness statistics in our datasets to those in target application domains would bolster claims of transferability. Direct quantitative matching is challenging because many relevant datasets (e.g., irregular clinical records) are private or restricted. In the revision we will add a dedicated discussion subsection in §4 that (i) explicitly connects our missingness generation procedures to documented patterns in the healthcare and sensor-network literature and (ii) reports quantitative statistics (correlation coefficients, label-dependence measures, temporal/spatial autocorrelation) on any publicly available proxy datasets that share similar characteristics. This will make the modeling assumptions and their alignment with real-world regimes transparent while respecting data-access constraints.
Revision: yes

Referee: [Theoretical analysis] Theoretical section on sparsity: the proof that high sparsity substantially limits information loss from missingness is central to motivating the new datasets, yet the manuscript does not explicitly state the assumptions on the GNN message-passing operator or feature distribution under which the bound holds, making it difficult to assess its generality.

Authors: We acknowledge that the assumptions underlying the sparsity bound were not stated with sufficient precision. The proof assumes a Lipschitz-continuous aggregation operator (standard for most message-passing GNNs) and bounded node-feature distributions. In the revised manuscript we will insert an explicit paragraph at the opening of the theoretical section that lists these assumptions, states the precise conditions on the message-passing operator and feature distribution, and briefly discusses the scope of generality (including regimes where the bound may loosen, such as unbounded features or non-Lipschitz aggregators). This clarification will allow readers to evaluate the result without changing its substance.
Revision: yes
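The missingness statistics named in the first response (label dependence and correlation with feature values) could be computed along the following lines. This is a sketch under stated assumptions, not a procedure from the paper: the helper names are hypothetical, the example mask is synthetic, and the value correlation requires ground-truth features, which are only available when missingness is simulated.

```python
import numpy as np

def missingness_by_class(mask, y):
    """Mean missing rate of the feature entries for each class label."""
    return {int(c): float(mask[y == c].mean()) for c in np.unique(y)}

def mask_value_correlation(X, mask):
    """Pearson correlation between the mask and the true values (needs ground-truth X)."""
    return float(np.corrcoef(X.ravel(), mask.ravel().astype(float))[0, 1])

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=4000)            # binary node labels (synthetic)
X = rng.normal(size=(4000, 5))               # dense features (synthetic)

# Hypothetical label-dependent mechanism: class 1 loses 60% of entries, class 0 loses 10%.
rates = np.where(y == 1, 0.6, 0.1)[:, None]
mask = rng.random(X.shape) < rates

per_class = missingness_by_class(mask, y)
corr = mask_value_correlation(X, mask)
print(per_class, round(corr, 3))
```

Here the per-class rates differ sharply while the value correlation stays near zero, which separates missingness that depends on the label from missingness that depends on the feature values themselves.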

Circularity Check

0 steps flagged

No load-bearing circularity; new datasets and missingness analysis provide independent empirical support

The paper begins with a theoretical argument that high sparsity in existing benchmarks limits observable information loss from missingness, then introduces one synthetic and three real-world datasets with dense features plus evaluation protocols using non-MCAR mechanisms and explicit missingness assumptions. GNNmim is proposed as a baseline built on this analysis, with competitiveness shown via experiments on the new datasets. No derivation step reduces by construction to a fitted parameter, self-definition, or self-citation chain; the central claim rests on external benchmarks and stated assumptions rather than tautological renaming or forced predictions. This is the expected honest non-finding for a paper whose evaluation is self-contained against its introduced data.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The work relies on domain assumptions about missingness processes and the representativeness of the new datasets rather than free parameters or invented entities.

axioms (1)

Domain assumption: Explicit assumptions on the missingness process can be stated and their implications analyzed for different methods.
Basis: The paper supplies theoretical background to state explicit assumptions on the missingness process.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · tag: unclear
Paper passage: "We propose GNNmim, a simple yet effective baseline... augments the node feature matrix with a binary mask indicating which features are missing."

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · absolute_floor_iff_bare_distinguishability · tag: unclear
Paper passage: "Theorem 2... information loss induced by missingness is provably negligible unless missingness is extremely high."

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.