Temporal Motif Signatures for Temporal Graph Neural Networks

Dylan Sandfelder; Mihai Cucuringu; Xiaowen Dong

arxiv: 2606.01176 · v1 · pith:BY6BZKEFnew · submitted 2026-05-31 · 💻 cs.LG

Pith reviewed 2026-06-28 17:05 UTC · model grok-4.3

classification 💻 cs.LG

keywords temporal graphs · motif signatures · graph neural networks · link prediction · temporal motifs · feature augmentation · edge classification · Weisfeiler-Leman

The pith

A compact 13-feature motif map captures predictive patterns that temporal GNNs miss.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Temporal interaction streams contain short-horizon motif patterns such as repetition, reciprocity, star diversity and triadic flow that carry signals useful for prediction. Vanilla temporal graph neural networks often fail to make these patterns available to their edge scorers. The authors observe that motif activity organizes consistently along three scale-stable axes across real and synthetic data and use this structure to build a 13-coordinate leakage-safe feature map h(u,v,t). The map adds linearly to any existing static or temporal encoder without architectural modification. Readers would care because the same augmentation raises performance on link-property prediction, edge classification and graph-level tasks.
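
A minimal sketch of what a past-window, candidate-local motif count could look like, assuming events are (src, dst, time) triples and a hard window of length `delta`. The `Event` tuple, the function name `motif_features`, and the four coordinates chosen here are illustrative assumptions; they are not the paper's 13 coordinates.

```python
from collections import namedtuple

# Hypothetical event record; the paper's data layout is not specified here.
Event = namedtuple("Event", ["src", "dst", "time"])


def motif_features(events, u, v, t, delta):
    """Count a few short-horizon motifs around candidate (u, v) at time t.

    Only events with t - delta <= time < t are used, so nothing at or
    after the query time can leak into the features.
    """
    window = [e for e in events if t - delta <= e.time < t]

    # Dyadic axis: repeats of u -> v and reciprocal v -> u events.
    repetition = sum(1 for e in window if (e.src, e.dst) == (u, v))
    reciprocity = sum(1 for e in window if (e.src, e.dst) == (v, u))

    # Star axis: distinct partners that u sent to inside the window.
    out_star_u = len({e.dst for e in window if e.src == u})

    # Triadic axis (one orientation only): nodes w with u -> w followed
    # later by w -> v.
    first_out = {}
    for e in window:
        if e.src == u and e.dst != v:
            first_out[e.dst] = min(e.time, first_out.get(e.dst, e.time))
    triadic = len({
        e.src for e in window
        if e.dst == v and e.src in first_out and e.time > first_out[e.src]
    })

    return [repetition, reciprocity, out_star_u, triadic]


events = [
    Event(0, 1, 1.0), Event(1, 0, 2.0), Event(0, 2, 3.0),
    Event(2, 1, 4.0), Event(0, 3, 5.0), Event(0, 1, 9.0),
]
print(motif_features(events, 0, 1, t=6.0, delta=10.0))  # [1, 1, 3, 1]
```

The event at time 9.0 lies after the query time and is ignored, which is the behaviour a leakage-safe map needs: appending future events to the stream must not change the features of an earlier query.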

Core claim

What carries the argument

The 13-coordinate leakage-safe candidate-local motif feature map h(u, v, t) derived from three scale-stable axes of motif activity.

Load-bearing premise

Motif activity organizes consistently along three scale-stable axes across real and synthetic temporal datasets.

What would settle it

A temporal dataset in which the three axes fail to organize observed motif counts or in which the 13 features produce no lift on any baseline TGNN for link prediction.

Figures

Figures reproduced from arXiv: 2606.01176 by Dylan Sandfelder, Mihai Cucuringu, Xiaowen Dong.

**Figure 2.** Empirical motif signatures across real and synthetic temporal streams. Rows: …

**Figure 3.** Two-dimensional projection of per-edge motif vectors.

**Figure 4.** Motif signatures are scale-stable for most coordinates. Each bar reports the per-coordinate …

**Figure 5.** Witness for Theorem 6.2: C6 and 2K3 as anchored directed temporal streams (anchor edge (u, v) = (0, 1) in red; in 2K3 the anchored common-neighbor witness w = 2 is in green). Both streams are temporal-1-WL-equivalent at the anchor for every refinement depth, but the four A3 triadic-flow coordinates of h take value 0 in C6 and 1 in 2K3, isolating A3 as the separator. Why a whole-stream witness is impossible. By…

**Figure 6.** Effect of ∆ on Bitcoin Alpha PR-AUC.

**Figure 8.** Soft-kernel (e^(−δ/τ) with τ ∈ {∆/10, ∆/3, ∆}) vs hard-window (∆) PR-AUC across three seeds on Bitcoin Alpha, MOOC, and PaySim. Soft kernels match or slightly improve on hard windows for τ near ∆/3. The tgbl-review-v2 sweep was omitted because each seed required ∼13 h of wall-clock time, exceeding the available budget (see §G).

**Figure 9.** Learned linear motif embedding M for Bitcoin Alpha (left), Bitcoin OTC (middle), and MOOC (right). Trust networks concentrate mass on A1; MOOC on A2 and A3, reproducing the axis structure of Section 4. F PaySim stress test. The PaySim fraud-detection stream is included in this paper as a stress test rather than as a headline empirical result. Two structural properties of the dataset put it outside the regim…
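
The Figure 5 witness has a well-known static counterpart that is easy to check by hand. The sketch below is a simplified, undirected, untimed analogue and not the paper's anchored temporal construction: it runs plain 1-WL color refinement on C6 and on two disjoint triangles (2K3), then counts common neighbors of the anchor pair (0, 1). All function names are illustrative.

```python
from collections import Counter


def cycle(nodes):
    """Undirected cycle over the given node list, as an adjacency dict."""
    adj = {n: set() for n in nodes}
    for a, b in zip(nodes, nodes[1:] + nodes[:1]):
        adj[a].add(b)
        adj[b].add(a)
    return adj


def wl_colors(adj, rounds=3):
    """Plain 1-WL refinement; colors are nested tuples, so they are
    directly comparable across graphs."""
    colors = {n: () for n in adj}
    for _ in range(rounds):
        colors = {
            n: (colors[n], tuple(sorted(colors[m] for m in adj[n])))
            for n in adj
        }
    return Counter(colors.values())


def common_neighbors(adj, u, v):
    """Number of nodes adjacent to both u and v."""
    return len(adj[u] & adj[v])


c6 = cycle([0, 1, 2, 3, 4, 5])
two_k3 = {**cycle([0, 1, 2]), **cycle([3, 4, 5])}

print(wl_colors(c6) == wl_colors(two_k3))  # True: 1-WL cannot tell them apart
print(common_neighbors(c6, 0, 1))          # 0
print(common_neighbors(two_k3, 0, 1))      # 1 (the witness w = 2)
```

Every node in both graphs has degree 2, so refinement never splits the color classes, while the triangle-closing count at the anchor differs. That is the same separation the caption attributes to the A3 coordinates, here without direction or timestamps.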

Original abstract

Real temporal interaction streams carry predictive structure in short-horizon motif patterns -- repetition, reciprocity, star diversity, triadic flow -- that vanilla temporal graph neural networks (TGNNs) often fail to expose to their edge scorers. We show this concretely on MOOC interaction prediction, where a small four-feature family of past-window star counts already delivers most of the lift over a strong static GNN. Across a wide set of real and synthetic temporal datasets we find that motif activity organizes consistently along three scale-stable axes (dyadic recency/reciprocity, star diversity, triadic flow), and we use this empirical structure to design a compact 13-coordinate, leakage-safe, candidate-local motif feature map h(u, v, t) that linearly embeds into any static or temporal encoder without architectural changes. A temporal Weisfeiler-Leman (WL) analysis places the augmentation relative to the first level of an anchored temporal-WL hierarchy and exhibits a candidate-anchored pair on which motif features distinguish. We demonstrate empirically that the same augmentation consistently lifts performance across heterogeneous tasks: TGB link-property prediction across all five baselines, edge classification on Bitcoin Alpha/OTC and MOOC, and graph-level classification of synthetic temporal generators.
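
A minimal sketch of the "linear embedding without architectural changes" idea, assuming the encoder already produces a d-dimensional edge embedding z for the candidate (u, v, t). The name `augment`, the zero initialization of M, and applying the log10(1 + x) compression to every coordinate are assumptions made for illustration; the paper's appendix fragment quoted in the reference list applies the transform to all coordinates except one.

```python
import numpy as np


def augment(z, h, M):
    """Add a linearly embedded, log-compressed motif vector to an edge embedding.

    z: (d,) embedding from any static or temporal encoder.
    h: (13,) non-negative motif counts for the candidate (u, v, t).
    M: (13, d) linear map, learned jointly with the scorer in practice.
    """
    h = np.asarray(h, dtype=float)
    return z + np.log10(1.0 + h) @ M


rng = np.random.default_rng(0)
d = 8
z = rng.normal(size=d)   # stand-in for an encoder output
h = np.arange(13)        # placeholder counts, not real motif values
M = np.zeros((13, d))    # assumed zero init: the encoder output is unchanged at the start

print(np.allclose(augment(z, h, M), z))  # True
```

Because the motif term enters additively, the downstream edge scorer sees a vector of the same shape as before, which is what allows the augmentation to sit on top of an existing model.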

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The paper offers a compact 13-feature motif augmentation for TGNNs that claims easy lifts, but the three axes come from patterns in the evaluation data with no shown guarantee they hold elsewhere.

The main takeaway is a leakage-safe 13-coordinate feature map h(u, v, t) built from observed temporal motif patterns that you can add linearly to any existing encoder.

They start from the observation that short-horizon motifs like repetition, reciprocity, star counts, and triadic flow carry signal that vanilla TGNNs miss. On MOOC a handful of past-window star features already recover most of the gain over a static GNN. Across real and synthetic sets they report that motif activity lines up along three axes (dyadic recency/reciprocity, star diversity, triadic flow) and fix the 13 coordinates to those axes. A temporal WL argument places the map relative to the first level of an anchored temporal-WL hierarchy and exhibits a candidate-anchored pair that the motif features separate.

This is new as a small, architecture-agnostic construction rather than another model change. The candidate-local design and the claim of no leakage are practical pluses if they hold.

The soft spot is exactly the one in the stress-test note. The axes are extracted from the same class of datasets used for the reported lifts, so the fixed coordinates could be tuned to visible patterns rather than a stable property. Nothing in the WL placement supplies an a-priori reason the organization must persist on graphs where higher-order motifs or different time scales dominate. The abstract asserts consistent gains on TGB baselines, Bitcoin, MOOC, and synthetic generators but supplies no numbers, error bars, or leakage checks, which makes the size and reliability of the effect impossible to judge from the given material.

The work is for people doing link prediction or edge classification on temporal interaction data who want a lightweight feature add-on. It deserves peer review because the idea is concrete and testable even if the generalization claim needs stronger evidence.

Referee Report

3 major / 2 minor

Summary. The manuscript claims that motif activity in temporal graphs organizes consistently along three scale-stable axes (dyadic recency/reciprocity, star diversity, triadic flow) across real and synthetic datasets. It uses this structure to define a compact 13-coordinate, leakage-safe, candidate-local feature map h(u,v,t) that can be linearly embedded into any static or temporal GNN without architectural changes. A temporal Weisfeiler-Leman analysis situates the map relative to the first level of an anchored temporal-WL hierarchy, and empirical results are said to show consistent performance lifts on TGB link-property prediction (five baselines), edge classification (Bitcoin Alpha/OTC, MOOC), and graph-level classification of synthetic generators.

Significance. If the three-axis organization generalizes and the performance improvements prove robust under proper controls, the work would supply a lightweight, architecture-agnostic motif augmentation for TGNNs. The explicit temporal-WL placement is a strength, as it provides a theoretical anchor for the candidate-anchored distinction that the features are claimed to capture.

major comments (3)

[Abstract, §3] The central claim that motif activity 'organizes consistently along three scale-stable axes' is derived from patterns observed on the same class of datasets later used for evaluation. No quantitative measure of axis stability (e.g., correlation of axis loadings across held-out datasets or sensitivity to motif horizon) is supplied, which is load-bearing for the generalization of the fixed 13-coordinate map.
[§4, feature map definition] The 13 coordinates of h(u,v,t) are fixed from the empirical axes; the leakage-safe construction is asserted but no explicit argument or experiment demonstrates that the coordinate selection itself does not encode test-distribution information, leaving the 'candidate-local' guarantee unverified for new temporal graphs whose motif scale stability differs.
[§5, empirical evaluation] The abstract asserts 'consistent' lifts across all five TGB baselines and multiple tasks, yet the provided text supplies no quantitative tables, error bars, statistical tests, or ablation isolating the contribution of each axis. This prevents assessment of whether the reported improvements are statistically reliable or driven by the specific datasets used to discover the axes.
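
One way the axis-stability measure requested in the first comment could be computed is sketched below. It assumes each dataset yields a matrix of per-edge motif vectors (rows are edges, columns are the 13 coordinates), takes the top three principal directions of each, and compares the two loading subspaces through the cosines of their principal angles. The function names and the synthetic data are illustrative assumptions; nothing here reproduces the paper's analysis.

```python
import numpy as np


def top_axes(X, k=3):
    """Top-k principal directions (rows) of standardized per-edge motif vectors."""
    X = np.asarray(X, dtype=float)
    Xc = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[:k]


def axis_agreement(X_a, X_b, k=3):
    """Cosines of the principal angles between two k-dim loading subspaces.

    Values near 1 mean the datasets share the same axes; the measure is
    invariant to sign flips and rotations within each subspace.
    """
    A, B = top_axes(X_a, k), top_axes(X_b, k)
    return np.linalg.svd(A @ B.T, compute_uv=False)


# Synthetic check: two samples drawn from the same 3-factor structure.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 13))  # shared, made-up loadings
X_a = rng.normal(size=(2000, 3)) @ W + 0.05 * rng.normal(size=(2000, 13))
X_b = rng.normal(size=(2000, 3)) @ W + 0.05 * rng.normal(size=(2000, 13))

print(axis_agreement(X_a, X_b))
```

Reporting these cosines for every pair of datasets, and again for several window lengths, would give the held-out and horizon-sensitivity evidence the comment asks for.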

minor comments (2)

[§2, §4] Notation for the temporal WL hierarchy and the precise definition of 'anchored' pairs should be clarified with a small example in §2 or §4.
[§4] The manuscript would benefit from an explicit statement of the temporal horizon(s) used to count motifs when constructing the 13 features.
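
The horizon question in the last comment is easier to discuss with the two counting schemes side by side. The sketch below contrasts a hard window of length ∆ with the soft kernel named in the Figure 8 caption, under the assumption that δ there denotes the age t − s of a past event at time s. Function names and the example timestamps are illustrative.

```python
import math


def hard_window_count(event_times, t, delta):
    """Number of past events whose age t - s falls in (0, delta]."""
    return sum(1 for s in event_times if 0 < t - s <= delta)


def soft_kernel_count(event_times, t, tau):
    """Past events weighted by exp(-(t - s) / tau); no hard cutoff."""
    return sum(math.exp(-(t - s) / tau) for s in event_times if s < t)


times = [1.0, 4.0, 9.0, 12.0]   # made-up timestamps; 12.0 is in the future
t, delta = 10.0, 6.0

print(hard_window_count(times, t, delta))      # 2 (ages 6 and 1)
print(soft_kernel_count(times, t, delta / 3))  # sum of exp(-4.5), exp(-3), exp(-0.5)
```

Both variants look only at events strictly before t, so either choice keeps the features free of future information; they differ in how sharply old events are discounted.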

Simulated Author's Rebuttal

3 responses · 0 unresolved

We appreciate the referee's insightful comments on our manuscript. Below we provide point-by-point responses to the major comments, outlining clarifications and planned revisions.

Referee: [Abstract, §3] The central claim that motif activity 'organizes consistently along three scale-stable axes' is derived from patterns observed on the same class of datasets later used for evaluation. No quantitative measure of axis stability (e.g., correlation of axis loadings across held-out datasets or sensitivity to motif horizon) is supplied, which is load-bearing for the generalization of the fixed 13-coordinate map.

Authors: The identification of the three axes was based on observations across a diverse set of real and synthetic datasets, as described in §3. While the manuscript emphasizes the consistency observed, we agree that providing quantitative measures of stability would strengthen the claim. In the revision, we will include analyses such as the correlation of axis loadings across held-out dataset partitions and sensitivity tests to the motif horizon parameter.

revision: yes

Referee: [§4, feature map definition] The 13 coordinates of h(u,v,t) are fixed from the empirical axes; the leakage-safe construction is asserted but no explicit argument or experiment demonstrates that the coordinate selection itself does not encode test-distribution information, leaving the 'candidate-local' guarantee unverified for new temporal graphs whose motif scale stability differs.

Authors: The h(u,v,t) map is designed to be candidate-local, relying solely on information available at time t for the candidate pair without access to future events or test set statistics. The coordinates are fixed globally based on the empirical structure rather than being dataset-specific. We will expand §4 with a formal argument for leakage-safety and an additional experiment applying the fixed map to a new temporal graph with differing motif characteristics to verify the guarantee.

revision: yes

Referee: [§5, empirical evaluation] The abstract asserts 'consistent' lifts across all five TGB baselines and multiple tasks, yet the provided text supplies no quantitative tables, error bars, statistical tests, or ablation isolating the contribution of each axis. This prevents assessment of whether the reported improvements are statistically reliable or driven by the specific datasets used to discover the axes.

Authors: The manuscript contains quantitative results for the TGB link prediction tasks across the five baselines as well as the other tasks. To better demonstrate reliability, we will add error bars, statistical tests for significance, and ablations that isolate the contribution of each axis (dyadic, star, triadic) in the revised §5.

revision: yes

Circularity Check

0 steps flagged

No significant circularity in the derivation chain

The paper's derivation proceeds from empirical observation of motif patterns on real and synthetic datasets to the design of a fixed 13-coordinate feature map h(u,v,t), followed by independent empirical validation of performance lifts across TGB tasks, edge classification, and synthetic generators. No step reduces by construction to its inputs via self-definition, fitted parameters renamed as predictions, or load-bearing self-citations; the feature map is a data-motivated but non-adaptive construction whose utility is tested on held-out evaluations rather than forced by the initial observations.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the empirical observation that motif activity is organized along three stable axes and that the resulting features are predictive and leakage-safe. No explicit free parameters are named in the abstract. The design choices for the 13 coordinates and the three axes constitute domain assumptions extracted from data.

axioms (1)

domain assumption: Motif activity in temporal interaction streams organizes consistently along three scale-stable axes (dyadic recency/reciprocity, star diversity, triadic flow).
This organization is invoked to justify the design of the 13-feature map.

Reference graph

Works this paper leans on

15 extracted references · 7 canonical work pages · 4 internal anchors

[1] Jian Gao, Jianshe Wu, and JingYi Ding. Hyperevent: A strong baseline for dynamic link prediction via relative structural encoding. arXiv preprint arXiv:2507.11836.

[2] Shenyang Huang, Farimah Poursafaei, Jacob Danovitch, Matthias Fey, Weihua Hu, Emanuele Rossi, Jure Leskovec, Michael Bronstein, Guillaume Rabusseau, and Reihaneh Rabbany. Temporal graph benchmark for machine learning on temporal graphs. Advances in Neural Information Processing Systems, 36:2056–2073.

[3] Jianmo Ni, Jiacheng Li, and Julian McAuley. Justifying recommendations using distantly-labeled reviews and fine-grained aspects. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 188–197, 2019.

[4] Hao Peng, Jianxin Li, Qiran Gong, Senzhang Wang, Yuanxing Ning, and Philip S Yu. Graph convolutional neural networks via motif-based attention. arXiv preprint arXiv:1811.08270.

[5] Emanuele Rossi, Ben Chamberlain, Fabrizio Frasca, Davide Eynard, Federico Monti, and Michael Bronstein. Temporal graph networks for deep learning on dynamic graphs. arXiv preprint arXiv:2006.10637.

[6] Rakshit Trivedi, Mehrdad Farajtabar, Prasenjeet Biswal, and Hongyuan Zha. Representation learning over dynamic graphs. arXiv preprint arXiv:1803.04051.

[7] Jiafeng Xiong, Ahmad Zareie, and Rizos Sakellariou. A survey of link prediction in temporal networks. arXiv preprint arXiv:2502.21185.

[8] Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. How powerful are graph neural networks? arXiv preprint arXiv:1810.00826.

[9] Zhaoning Yu and Hongyang Gao. Motifexplainer: a motif-based graph neural network explainer. arXiv preprint arXiv:2202.00519.
[10] For heavy-tailed count coordinates (all except m2_ba_since_last), we apply the per-coordinate transform x ↦ log10(1 + x) before passing h to the linear embedding M. B.5 Neighbor cap and subsampling rule. To bound the cost of evaluating Axis-A2 and Axis-A3 features in the presence of high-degree hubs, we apply a global distinct-neighbor cap C ∈ N. If |N dir a ...

[11] bucketization. C.6 Temporal-k-WL and the hierarchy theorem. We define temporal-k-WL and prove Theorem C.7. Definition C.6 (Temporal-k-WL). Fix k ≥ 1. A temporal k-tuple at reference time t is an element (v⃗, s⃗) ∈ V^k × [t−∆, t]^k. Its initial color C^(0)_(v⃗, s⃗) encodes the isomorphism type of the ≤k-node subgraph induced by events in W^past_t(∆) between the nod...

[12] against the official TGB leaderboard / reported figures [Huang et al., 2023, Gastinger et al., 2024] for the same baselines, to make the faithfulness of our reproductions explicit. Differences are within the variation expected from seed counts, hyperparameter retuning, and minor pipeline conventions (negative sampling protocol, evaluator version). Where...

[13] F PaySim stress test. The PaySim fraud-detection stream is included in this paper as a stress test rather than as a headline empirical result. Two structural properties of the dataset put it outside the regime where motif augmentation is expected to help: (i) the fraud subgraph has negligible triadic flow, so the four A3 coordinates of h are dataset-level ...

[14] is the immediate predecessor of our feature family; in contrast to that paper’s aggregate enumeration, we use temporal motif counts per candidate edge in a past-only window. H.2 Motif-based GNN architectures. Several families of GNN architectures couple motifs to message passing or attention. MotifNet [Monti et al., 2018] builds one adjacency matrix per moti...

[15] features of the neighborhood at time t

analyze the expressive power of temporal graph networks and provide a related but distinct WL-style bound. Our temporal-WL hierarchy (Definitions C.1 and C.6) is self-contained, modeled on Morris et al. [2019], and makes the connection between motif order and WL rung explicit (Theorem C.7). Feature-augmented expressivity beyond 1-WL in static settings was...
