pith. sign in

arxiv: 2604.04365 · v1 · submitted 2026-04-06 · 🧮 math.ST · stat.ML· stat.TH

Attributed Network Alignment: Statistical Limits and Efficient Algorithm

Pith reviewed 2026-05-10 20:12 UTC · model grok-4.3

classification 🧮 math.ST stat.MLstat.TH
keywords network alignmentgraph matchinginformation theoretic thresholdsquadratic programmingGaussian Wigner modelexact recoverynode featurespartial recovery
0
0 comments X

The pith

Node features and edges together set sharp thresholds for recovering hidden vertex mappings between graphs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper studies recovering an unknown vertex correspondence between two graphs when both edge weights and node features are observed. It introduces the featured correlated Gaussian Wigner model, in which the graphs are linked by a shared unknown permutation and the node features are correlated under that same permutation. The work derives the optimal information-theoretic thresholds separating regimes of exact recovery from partial recovery and no recovery. It then introduces the QPAlign algorithm based on a quadratic programming relaxation, shows that the algorithm performs well on synthetic and real data, and supplies theoretical guarantees for its behavior.

Core claim

In the featured correlated Gaussian Wigner model, the optimal information-theoretic thresholds for exact recovery and partial recovery of the latent mapping are characterized. QPAlign, an algorithm based on a quadratic programming relaxation, is proposed and shown to achieve strong empirical performance on synthetic and real datasets while carrying theoretical guarantees.

What carries the argument

The featured correlated Gaussian Wigner model, which couples two graphs through an unknown vertex permutation with node features jointly distributed under the same permutation.

If this is right

  • Exact recovery of the full mapping succeeds with high probability once the combined edge and feature signal exceeds an explicit threshold.
  • Partial recovery of a positive fraction of the mapping is possible between a lower threshold and the exact-recovery threshold.
  • QPAlign returns a reliable solution with convergence guarantees whenever the model parameters place the instance above the relevant threshold.
  • The same algorithm outperforms purely edge-based methods once node features carry additional correlation information.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same quadratic-programming relaxation could be adapted to other correlation models beyond the Gaussian Wigner case.
  • Social-network de-anonymization pipelines that already use profile attributes would gain a statistically grounded performance guarantee by switching to QPAlign.
  • Large-scale experiments on graphs with millions of vertices would test whether the current relaxation remains computationally tractable without further approximation.

Load-bearing premise

Node features must be correlated under the same unknown permutation as the edges, and the joint distributions must obey the Gaussian Wigner structure.

What would settle it

A simulation drawn exactly from the featured correlated Gaussian Wigner model in which the mapping is recovered with high probability below the derived information-theoretic threshold would falsify the claimed optimality of those thresholds.

Figures

Figures reproduced from arXiv: 2604.04365 by Chenyang Tian, Dong Huang, Pengkun Yang.

Figure 1
Figure 1. Figure 1: Overlap between the estimator ˆπ in Algorithm 1 and the ground truth π ∗ in two models with n = 3000 and d = 512, evaluated across varying correlations ρ ∈ [0, 1] and r ∈ [0, 1]. Importantly, these numerical results are consistent with the information-theoretic exact recov￾ery thresholds given in Theorem 2. We also note that in certain intermediate regimes, there exists 11 [PITH_FULL_IMAGE:figures/full_fi… view at source ↗
Figure 2
Figure 2. Figure 2: Phase-transition boundaries of QPAlign under different regularization parameters [PITH_FULL_IMAGE:figures/full_fig_p012_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Overlap vs. λ on ACM-DBLP and Douban datasets. We report the experimental results averaged over 5 random seeds. The faint curves represent the results of individual runs, while the bold curves show their average. To fairly compare with baselines using the corresponding information source in [PITH_FULL_IMAGE:figures/full_fig_p013_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Overlap between the estimator ˆπ in Algorithm 1 and the ground truth π ∗ in two models with n = 100 and d = 16, evaluated across varying correlations ρ ∈ [0, 1] and r ∈ [0, 1]. results in terms of σ = p (1 − ρ 2)/ρ2 ∈ [0, 0.5], which serves as a noise-to-signal ratio relative to the original graph, as showed in [PITH_FULL_IMAGE:figures/full_fig_p015_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: reports the alignment accuracy as a function of σ = p (1 − ρ 2)/ρ2 with λ = 0.05, 0.1, and 0.15, respectively, where smaller σ corresponds to stronger graph correlation. Our method consistently outperforms the purely edge-based baselines (GW, Grampa, IsoRank, Umeyama) and the joint edge–feature baseline FGW across different feature correlations r. Notably, even when r is small (e.g., r = 0.1), our approach… view at source ↗
Figure 6
Figure 6. Figure 6: Overlap between the estimator ˆπ in Algorithm 1 and the ground truth π ∗ evaluated by different algorithms across varying correlations r ∈ [0, 1] with different λ. To investigate the effect of non-Gaussianity and heavy tails, we also conduct experiments under a Student-t model. Specifically, we consider the setting n = 100 and d = 16, and generate the edge weights and node features from Student-t distribut… view at source ↗
Figure 7
Figure 7. Figure 7: Overlap between the estimator ˆπ and the ground truth π ∗ under Student-t distributions with degrees of freedom ν ∈ {3, 5, 10}, for n = 100 and d = 16. To initialize for synthetic datasets, we leverage both feature and degree information to construct a mixed similarity matrix. Specifically, given node feature matrices X = [x ⊤ 1 , · · · , x ⊤ n ] ∈ R n×d and Y = [y ⊤ 1 , · · · , y ⊤ n ] ∈ R n×d , we first … view at source ↗
Figure 8
Figure 8. Figure 8: Alignment accuracy across different λ in ST data. In this following, we describe the experimental setup on simulated spatial transcriptomic data. Synthetic slices were generated by perturbing both spatial coordinates and transcript counts. Let (X, Z) denote a transcript count matrix X ∈ N p×n and spot coordinates Z ∈ R 2×n . To model sectioning variability, each coordinate was rotated by z ′ i = R(θ)zi , R… view at source ↗
read the original abstract

This paper studies the problem of recovering a hidden vertex correspondence between two correlated graphs when both edge weights and node features are observed. While most existing work on graph alignment relies primarily on edge information, many real-world applications provide informative node features in addition to graph topology. To capture this setting, we introduce the featured correlated Gaussian Wigner model, where two graphs are coupled through an unknown vertex permutation, and the node features are correlated under the same permutation. We characterize the optimal information-theoretic thresholds for exact recovery and partial recovery of the latent mapping. On the algorithmic side, we propose QPAlign, an algorithm based on a quadratic programming relaxation, and demonstrate its strong empirical performance on both synthetic and real datasets. Moreover, we also derive theoretical guarantees for the proposed procedure, supporting its reliability and providing convergence guarantees.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The paper introduces the featured correlated Gaussian Wigner model for attributed network alignment, where two graphs are coupled through an unknown vertex permutation with jointly Gaussian correlated edge weights and node features. It characterizes the optimal information-theoretic thresholds for exact recovery and partial recovery of the latent mapping. The authors propose QPAlign, a quadratic programming relaxation algorithm, demonstrate its empirical performance on synthetic and real datasets, and derive theoretical guarantees for the procedure.

Significance. If the characterizations and guarantees hold under the stated model, the work is significant for extending graph alignment theory to include node features, providing both sharp information-theoretic limits and a practical algorithm with supporting analysis. The explicit model definition means the stress-test concern about the Gaussian Wigner structure does not land as an internal inconsistency; the claims are scoped to this setting and the derivations follow from it in a standard manner for such models.

minor comments (3)
  1. Abstract: the claim of 'strong empirical performance' is stated without any quantitative metrics, baselines, or dataset sizes; a single sentence summarizing key results (e.g., recovery accuracy or runtime) would improve the summary.
  2. Model section: the joint distribution parameters (edge correlation rho and feature correlation) are introduced but their precise roles in the threshold expressions could be highlighted with a short table or remark for quick reference.
  3. Experiments: more detail on how the real-world datasets are preprocessed to fit the Gaussian Wigner assumption (or how deviations are handled) would help readers evaluate practical relevance.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the careful and positive review of our manuscript. We are pleased that the significance of the featured correlated Gaussian Wigner model, the information-theoretic thresholds, and the QPAlign algorithm with its guarantees has been recognized. The recommendation for minor revision is appreciated. No major comments were raised in the report.

Circularity Check

0 steps flagged

No circularity: model defined first, thresholds and algorithm derived from it

full rationale

The paper introduces the featured correlated Gaussian Wigner model explicitly, then derives information-theoretic thresholds for exact/partial recovery and provides guarantees for QPAlign. No quoted steps reduce a claimed prediction or first-principles result to its own inputs by construction, self-definition, or load-bearing self-citation. Derivations rely on standard analysis of the stated joint Gaussian structure rather than tautological renaming or fitted parameters presented as predictions. The derivation chain is self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claims rest on the definition of a new joint Gaussian model for edges and features under a hidden permutation, plus standard assumptions of Gaussian noise and correlation structure.

axioms (1)
  • domain assumption Edge weights and node features follow jointly Gaussian distributions under the unknown permutation
    Invoked in the model definition in the abstract.
invented entities (1)
  • featured correlated Gaussian Wigner model no independent evidence
    purpose: To jointly model correlated graphs with both topology and node attributes
    New model introduced to capture the attributed alignment setting

pith-pipeline@v0.9.0 · 5431 in / 1224 out tokens · 49496 ms · 2026-05-10T20:12:45.669018+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

17 extracted references · 17 canonical work pages · 1 internal anchor

  1. [1]

    Exact random graph matching with multiple graphs

    [AH24] Taha Ameen and Bruce Hajek. Exact random graph matching with multiple graphs. arXiv preprint arXiv:2405.12293,

  2. [2]

    Detecting correlation between multiple unlabeled Gaussian networks.arXiv preprint arXiv:2504.16279,

    [AH25] Taha Ameen and Bruce Hajek. Detecting correlation between multiple unlabeled Gaussian networks.arXiv preprint arXiv:2504.16279,

  3. [3]

    Exact alignment recovery for correlated Erd˝ os- R´ enyi graphs.arXiv preprint arXiv:1711.06783,

    [CK17] Daniel Cullina and Negar Kiyavash. Exact alignment recovery for correlated Erd˝ os- R´ enyi graphs.arXiv preprint arXiv:1711.06783,

  4. [4]

    Gaussian database alignment and gaussian planted matching.arXiv preprint arXiv:2307.02459,

    [DCK23] Osman Emre Dai, Daniel Cullina, and Negar Kiyavash. Gaussian database alignment and gaussian planted matching.arXiv preprint arXiv:2307.02459,

  5. [5]

    A polynomial-time iterative algorithm for random graph matching with non-vanishing correlation.arXiv preprint arXiv:2306.00266,

    [DL23] Jian Ding and Zhangsong Li. A polynomial-time iterative algorithm for random graph matching with non-vanishing correlation.arXiv preprint arXiv:2306.00266,

  6. [6]

    Optimal recovery of correlated Erd˝ os-R´ enyi graphs.arXiv preprint arXiv:2502.12077,

    [Du25] Hang Du. Optimal recovery of correlated Erd˝ os-R´ enyi graphs.arXiv preprint arXiv:2502.12077,

  7. [7]

    The Umeyama algorithm for matching correlated Gaussian geometric models in the low-dimensional regime

    [GL24] Shuyang Gong and Zhangsong Li. The Umeyama algorithm for matching corre- lated Gaussian geometric models in the low-dimensional regime.arXiv preprint arXiv:2402.15095,

  8. [8]

    Impossibility of partial re- covery in the graph alignment problem

    [GML21] Luca Ganassali, Laurent Massouli´ e, and Marc Lelarge. Impossibility of partial re- covery in the graph alignment problem. InConference on Learning Theory, pages 2080–2102. PMLR,

  9. [9]

    Sample complexity of correlation detection in the Gaussian Wigner model.arXiv preprint arXiv:2505.14138,

    [HY25] Dong Huang and Pengkun Yang. Sample complexity of correlation detection in the Gaussian Wigner model.arXiv preprint arXiv:2505.14138,

  10. [10]

    Strong recovery of geometric planted matchings

    [KNW22] Dmitriy Kunisky and Jonathan Niles-Weed. Strong recovery of geometric planted matchings. InProceedings of the 2022 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 834–876. SIAM,

  11. [11]

    Faster algorithms for the alignment of sparse correlated Erd˝ os-R´ enyi random graphs.Journal of Statistical Mechanics: The- ory and Experiment, 2024(11):113405,

    [MS24] Andrea Muratori and Guilhem Semerjian. Faster algorithms for the alignment of sparse correlated Erd˝ os-R´ enyi random graphs.Journal of Statistical Mechanics: The- ory and Experiment, 2024(11):113405,

  12. [12]

    Wein, and Shenduo Zhang

    [MWZ24] Cheng Mao, Alexander S. Wein, and Shenduo Zhang. Information-theoretic thresh- olds for planted dense cycles.arXiv preprint arXiv:2402.00305,

  13. [13]

    Align- ing random graphs with a sub-tree similarity message-passing algorithm.Journal of Statistical Mechanics: Theory and Experiment, 2022(6):063401,

    [PSSZ22] Giovanni Piccioli, Guilhem Semerjian, Gabriele Sicuro, and Lenka Zdeborov´ a. Align- ing random graphs with a sub-tree similarity message-passing algorithm.Journal of Statistical Mechanics: Theory and Experiment, 2022(6):063401,

  14. [14]

    Independence testing for inhomoge- neous random graphs.arXiv preprint arXiv:2304.09132,

    [SPT23] Yukun Song, Carey E Priebe, and Minh Tang. Independence testing for inhomoge- neous random graphs.arXiv preprint arXiv:2304.09132,

  15. [15]

    Nonparametric graphon estimation.arXiv preprint arXiv:1309.5936,

    51 [WO13] Patrick J Wolfe and Sofia C Olhede. Nonparametric graphon estimation.arXiv preprint arXiv:1309.5936,

  16. [16]

    Consistent polynomial-time unseeded graph matching for Lipschitz graphons.arXiv preprint arXiv:1807.11027,

    [Zha18] Yuan Zhang. Consistent polynomial-time unseeded graph matching for Lipschitz graphons.arXiv preprint arXiv:1807.11027,

  17. [17]

    PARROT: Position- aware regularized optimal transport for network alignment

    52 [ZZXT23] Zhichen Zeng, Si Zhang, Yinglong Xia, and Hanghang Tong. PARROT: Position- aware regularized optimal transport for network alignment. InProceedings of the ACM Web Conference 2023, pages 372–382. Association for Computing Machinery,