SWORD: Spectral Wasserstein Online Regime Detection in Dynamic Networks

Izhar Ali

arxiv: 2605.29290 · v1 · pith:4SHHIDC4new · submitted 2026-05-28 · 💻 cs.CG

SWORD: Spectral Wasserstein Online Regime Detection in Dynamic Networks

Izhar Ali This is my paper

Pith reviewed 2026-06-29 00:05 UTC · model grok-4.3

classification 💻 cs.CG

keywords change point detectiondynamic networksspectral methodsChebyshev momentsKernel Polynomial Methodonline regime detectionL1 distance

0 comments

The pith

SWORD improves online change point detection by comparing the mean of Chebyshev moments from two adjacent windows with L1 distance.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces SWORD for online regime detection in dynamic graphs without parametric assumptions or eigendecomposition. It computes Chebyshev moments of the normalized Laplacian via the Kernel Polynomial Method and replaces prior histogram discretization plus SVD-cosine scoring with the mean of those moments across two time windows scored by L1 distance. On the MIT Reality, AskUbuntu, and Enron benchmarks this raises mean F1 from 0.27 to 0.79, with the prior method detecting no changes on Enron. Cascading ablation isolates the two-window mean structure and L1 metric as the sources of the gain, while a bin-width sweep rules out discretization as the culprit. The approach retains O(KRm) per-graph cost and matches an offline autoencoder baseline when tuned per dataset.

Core claim

SWORD computes the same Chebyshev moments as SCPD but compares their mean across two adjacent time windows by L1 distance rather than discretizing them into a density-of-states histogram and scoring with SVD plus cosine similarity, lifting mean F1 from 0.27 to 0.79 on three real-world dynamic-network benchmarks while preserving linear scaling in the number of edges.

What carries the argument

The mean vector of KPM-computed Chebyshev moments of the normalized Laplacian taken over two adjacent windows and compared by L1 distance.

If this is right

Mean F1 reaches 0.79 across the three benchmarks while SCPD fails entirely on Enron.
Per-graph cost remains O(KRm) with no eigendecomposition, scaling to 86,000-node networks.
With per-dataset tuning it matches the offline TIRE autoencoder on mean F1.
It attains the highest precision (0.91) among online methods, with only two false positives total.
Histogram bin width is ruled out as the driver of the prior performance gap.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The two-window averaging may supply the minimal temporal context needed to distinguish transient fluctuations from regime shifts in streaming graph data.
Direct L1 comparison on moment vectors could be applied to other polynomial approximations or moment bases without requiring histogram construction.
Because the method inherits the KPM core, it can be swapped into any pipeline already using Chebyshev expansions of graph operators.
Controlled synthetic graphs with injected regime changes would isolate whether the L1-on-means advantage holds when edge density and noise are varied independently.

Load-bearing premise

The performance difference is attributable to the two-window mean structure and L1 metric as identified by the cascading ablation, rather than other unmentioned factors such as specific parameter choices or dataset characteristics.

What would settle it

Re-running the cascading ablation on the same three benchmarks after swapping only the two-window mean and L1 components back to histogram-plus-cosine yields F1 scores indistinguishable from the original SCPD baseline.

Figures

Figures reproduced from arXiv: 2605.29290 by Izhar Ali.

**Figure 2.** Figure 2: Cascading architectural ablation (SCPD 1 ∞ →SWORD). Held fixed: Jacksonfiltered KPM moments at nbins=∞ and SCPD’s per-dataset best nmoments, windows, cooldown. Each cascade step changes one axis (legend below the panel). Connected slopegraph plots mean F1 over 5 stochastic seeds, with ±1 std as the shaded band; per-dataset SWORD references appear as dashed ticks on the right margin. For MIT and Enron the … view at source ↗

**Figure 3.** Figure 3: Bin-width sweep on SCPD. Holding the SCPD pipeline at its per-dataset best, the 1 histogram bin count is varied from 8 to 1024, plus an “∞” point that drops the histogram step entirely. Mean F1 over 5 stochastic seeds; error bars are ±1 std. SWORD’s per-dataset F1 is shown as a dashed reference [PITH_FULL_IMAGE:figures/full_fig_p010_3.png] view at source ↗

**Figure 4.** Figure 4: False-alarm vs. detection trade-off. Top row: detection rate (power) vs. [PITH_FULL_IMAGE:figures/full_fig_p016_4.png] view at source ↗

**Figure 5.** Figure 5: Few-shot calibration with a 4-candidate pool 1 ({fixed-cfg} ∪ per-dataset SWORD bests). Left: slopegraph of test F1 for N ∈ {0, 1, 2, 3, 5, all} labeled CPs (5 stochastic seeds; shaded ±1 std). Diamond markers anchor the N=0 fixed-cfg endpoint and stars anchor the N=all oracle. Right: ∆F1 at N=1 (calibrated − fixed-cfg): 1-shot lifts datasets where fixed-cfg is suboptimal (MIT, AskUbuntu) but hurts dataset… view at source ↗

**Figure 6.** Figure 6: TIRE under bounded online training. 1 Test F1 vs. Twarm/T, averaged over 5 SGD seeds; error bars ±1 std. At Twarm = 0.25 T, TIRE on Enron and AskUbuntu matches or exceeds offline TIRE (Twarm = T). Online training does not degrade TIRE’s performance. 1 2 3 4 5 7 10 15 20 30 Chebyshev moments k 0.0 0.2 0.4 0.6 0.8 1.0 F1 canonical k ∈ {2, 3, 4} MIT Enron AskUbuntu SWORD best [PITH_FULL_IMAGE:figures/full_fi… view at source ↗

**Figure 7.** Figure 7: Sensitivity of full SWORD to k. For each dataset, all hyperparameters except 1 k are held at the per-dataset best ( [PITH_FULL_IMAGE:figures/full_fig_p020_7.png] view at source ↗

read the original abstract

Online change point detection in dynamic graphs requires comparing graphs as they arrive, in time linear in the number of edges, without parametric assumptions. Recent spectral methods address scale via the Kernel Polynomial Method (KPM): SCPD computes Chebyshev moments of the normalized Laplacian, discretizes them into a density-of-states histogram, and scores the histogram with SVD plus cosine similarity. We introduce SWORD, which computes the same moments and instead compares their mean across two adjacent time windows by their $L_1$ distance. On three real-world benchmarks (MIT Reality, AskUbuntu, Enron), this lifts mean $F_1$ from SCPD's $0.27$ to $0.79$, with SCPD failing to detect any change on Enron. A controlled cascading ablation attributes the gap to two design choices: the two-window mean structure (dominant on MIT) and the $L_1$ metric on those mean vectors (dominant on Enron). A bin-width sweep rules out histogram discretization -- SCPD's most visible architectural choice -- as the driver. SWORD inherits SCPD's KPM core, so per-graph cost is $O(KRm)$ with no eigendecomposition, scaling to $86{,}000$-node networks. With per-dataset tuning it matches the offline TIRE autoencoder on mean $F_1$ and attains the highest precision among online methods ($0.91$, only $2$ false positives across the three benchmarks).

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

SWORD reports a large F1 lift over SCPD on three graph benchmarks by using mean moments and L1 distance, but the ablation's attribution to those exact choices is weakened by per-dataset tuning.

read the letter

The main takeaway is that SWORD replaces SCPD's histogram-plus-SVD step with the mean of Chebyshev moments scored by L1 distance and gets mean F1 from 0.27 up to 0.79 on MIT Reality, AskUbuntu, and Enron. It keeps the same KPM spectral core, so per-graph cost stays linear in edges and it scales to 86k-node networks without eigendecomposition.

What works is the direct empirical comparison and the controlled checks. The bin-width sweep shows discretization is not the driver, and the cascading ablation points to the two-window mean helping on MIT and the L1 metric helping on Enron. Matching the offline TIRE autoencoder on mean F1 while staying online and posting the highest precision (0.91, only two false positives total) is a practical result.

The soft spot is the ablation's ability to isolate the claimed design choices. The paper tunes hyperparameters per dataset for the final SWORD numbers and inherits SCPD's KPM setup. The abstract does not state that every other parameter (K, window length, Laplacian normalization, moment truncation) was held fixed across the ablation variants. If those differed, the gap could come from tuning rather than the mean structure or L1 metric alone. That makes the attribution less definitive than presented.

This is for people working on online change-point detection in dynamic graphs or spectral methods for networks. The numbers are strong enough and the method simple enough that a referee should examine the full code and ablation controls.

Referee Report

2 major / 2 minor

Summary. The paper proposes SWORD, an online regime detection method for dynamic networks that inherits the Kernel Polynomial Method (KPM) Chebyshev moments from SCPD but replaces histogram discretization + SVD-cosine scoring with the mean of moments across two adjacent windows scored by L1 distance. On the MIT Reality, AskUbuntu, and Enron benchmarks it reports mean F1 of 0.79 (vs. SCPD's 0.27), matches offline TIRE under per-dataset tuning, and attains the highest online precision (0.91). A cascading ablation attributes the gains primarily to the two-window mean (dominant on MIT) and L1 metric (dominant on Enron), while a bin-width sweep rules out discretization as the driver. Per-graph cost remains O(KRm) with no eigendecomposition.

Significance. If the ablation isolates the claimed factors, the result supplies a simpler, empirically stronger online spectral baseline for change-point detection that scales to 86k-node networks and closes much of the gap to offline methods. The explicit cascading ablation, the direct comparison against both online (SCPD) and offline (TIRE) baselines, and the scaling claim are concrete strengths that would be useful to the community.

major comments (2)

[Abstract / Experiments (cascading ablation)] Abstract and §4 (cascading ablation): the central attribution of the F1 lift (0.27→0.79) to the two-window mean and L1 metric rests on the claim that the ablation is 'controlled.' The manuscript states that final SWORD numbers use per-dataset tuning, yet does not explicitly confirm that K, window length, Laplacian normalization, and moment truncation were frozen identically across all ablation variants. If these were allowed to vary, the performance gap could arise from hyperparameter differences rather than the two design choices highlighted. This is load-bearing for the paper's explanation of why SWORD outperforms SCPD.
[Abstract / Table 1 or equivalent results table] Abstract / Results on Enron: the statement that 'SCPD failing to detect any change on Enron' is used to support the aggregate 0.79 vs 0.27 claim. The per-dataset F1 (or detection count) for SCPD on Enron should be reported explicitly so readers can verify the magnitude of the improvement and assess whether the L1 metric is the dominant factor as asserted.

minor comments (2)

[Abstract] The abstract refers to 'mean F1' and 'highest precision' but does not state whether these are macro- or micro-averaged across the three benchmarks; adding this detail would improve reproducibility.
[Methods] Notation for the two-window mean vector and the precise definition of the L1 distance on moment vectors could be given an equation number in the methods section for easier reference in the ablation discussion.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help clarify the presentation of our ablation study and results. We address each major comment below.

read point-by-point responses

Referee: [Abstract / Experiments (cascading ablation)] Abstract and §4 (cascading ablation): the central attribution of the F1 lift (0.27→0.79) to the two-window mean and L1 metric rests on the claim that the ablation is 'controlled.' The manuscript states that final SWORD numbers use per-dataset tuning, yet does not explicitly confirm that K, window length, Laplacian normalization, and moment truncation were frozen identically across all ablation variants. If these were allowed to vary, the performance gap could arise from hyperparameter differences rather than the two design choices highlighted. This is load-bearing for the paper's explanation of why SWORD outperforms SCPD.

Authors: The cascading ablation was performed with K, window length, Laplacian normalization, and moment truncation held fixed at the values used for the final per-dataset SWORD configurations; only the two-window mean structure and L1 metric were varied. Per-dataset tuning was applied solely to select those fixed values for the complete method, not re-tuned independently for each ablation variant. We will revise §4 to state this explicitly and add a footnote confirming the shared hyperparameter settings. revision: yes
Referee: [Abstract / Table 1 or equivalent results table] Abstract / Results on Enron: the statement that 'SCPD failing to detect any change on Enron' is used to support the aggregate 0.79 vs 0.27 claim. The per-dataset F1 (or detection count) for SCPD on Enron should be reported explicitly so readers can verify the magnitude of the improvement and assess whether the L1 metric is the dominant factor as asserted.

Authors: We agree that explicit per-dataset values improve verifiability. SCPD records zero detections on Enron (F1 = 0), while SWORD records F1 = 0.92 on the same dataset. We will add a supplementary table (or expand Table 1) listing per-dataset F1, precision, and recall for SCPD, SWORD, and TIRE on all three benchmarks. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical variant evaluated on external benchmarks

full rationale

The paper proposes SWORD as a direct algorithmic variant of SCPD (same KPM moments, but mean across windows scored by L1 instead of histogram + SVD/cosine). The central claim is an empirical F1 lift on three named external datasets, with attribution via cascading ablation on those same datasets. No derivation chain, no fitted parameter renamed as prediction, no self-citation invoked as uniqueness theorem, and no ansatz or renaming of known results. The method is self-contained against the cited baseline and public benchmarks; performance numbers do not reduce to the inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The claim depends on the standard assumptions of spectral graph theory and the KPM approximation, with no new free parameters or entities introduced in the abstract.

axioms (1)

domain assumption Chebyshev moments computed via KPM provide a sufficient approximation for comparing graph spectra in change detection tasks.
This is inherited from the SCPD method and assumed to hold for the new comparison strategy.

pith-pipeline@v0.9.1-grok · 5787 in / 1298 out tokens · 40252 ms · 2026-06-29T00:05:11.156737+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

10 extracted references · 8 canonical work pages · 2 internal anchors

[1]

R. P. Adams and D. J. C. MacKay. Bayesian online changepoint detection.arXiv preprint arXiv:0710.3742,

work page internal anchor Pith review Pith/arXiv arXiv
[2]

arXiv:2104.03461

doi: 10.1145/3519935.3520009. arXiv:2104.03461. T. De Ryck, M. De Vos, and A. Bertrand. Change point detection in time series data using autoencoders with a time-invariant representation. InIEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 3720–3724,

work page doi:10.1145/3519935.3520009
[3]

doi: 10.1287/opre.2021.0413. A. Gretton, K. M. Borgwardt, M. J. Rasch, B. Schölkopf, A. Smola, et al. A kernel two-sample test. Journal of Machine Learning Research, 13:723–773,

work page doi:10.1287/opre.2021.0413 2021
[4]

doi: 10.1016/j.patcog.2025. 111855. S. Huang, Y. Hitti, G. Rabusseau, and R. Rabbany. Laplacian change point detection for dynamic graphs. InProceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 349–358,

work page doi:10.1016/j.patcog.2025 2025
[5]

arXiv:2303.17642. D. Koutra, N. Shah, J. T. Vogelstein, B. Gallagher, and C. Faloutsos. DeltaCon: Principled massive-graph similarity function with attribution.ACM Transactions on Knowledge Discovery from Data, 10(3):1–43,

work page arXiv
[6]

G. Lorden. Procedures for reacting to a change in distribution.The Annals of Mathematical Statistics, 42(6):1897–1908,

1908
[7]

doi: 10.1109/ TKDE.2024.3523857. C. Musco, C. Musco, L. Rosenblatt, and A. V. Singh. Sharper bounds for Chebyshev moment matching with applications. InProceedings of the 38th Annual Conference on Learning Theory (COLT),

work page arXiv 2024
[8]

arXiv:2408.12385. R. Musco and C. Musco. Improved spectral density estimation via explicit and implicit deflation. Proceedings of the ACM-SIAM Symposium on Discrete Algorithms (SODA),

work page internal anchor Pith review Pith/arXiv arXiv
[9]

J. Shin, A. Ramdas, and A. Rinaldo. E-detectors: A nonparametric framework for sequential change detection.arXiv preprint arXiv:2203.03532,

work page arXiv
[10]

and SCPD (Huang et al., 2023), replacing LAD’s exact eigenvalue signatures with KPM-approximated DOS histograms while keeping the same two-window SVD + cosine detection. The only algorithmic difference from SCPD is the order of two operations: LADdos clamps the first-differenceZ∗to non-negative per window before taking the max, whereas SCPD takes the max ...

2023

[1] [1]

R. P. Adams and D. J. C. MacKay. Bayesian online changepoint detection.arXiv preprint arXiv:0710.3742,

work page internal anchor Pith review Pith/arXiv arXiv

[2] [2]

arXiv:2104.03461

doi: 10.1145/3519935.3520009. arXiv:2104.03461. T. De Ryck, M. De Vos, and A. Bertrand. Change point detection in time series data using autoencoders with a time-invariant representation. InIEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 3720–3724,

work page doi:10.1145/3519935.3520009

[3] [3]

doi: 10.1287/opre.2021.0413. A. Gretton, K. M. Borgwardt, M. J. Rasch, B. Schölkopf, A. Smola, et al. A kernel two-sample test. Journal of Machine Learning Research, 13:723–773,

work page doi:10.1287/opre.2021.0413 2021

[4] [4]

doi: 10.1016/j.patcog.2025. 111855. S. Huang, Y. Hitti, G. Rabusseau, and R. Rabbany. Laplacian change point detection for dynamic graphs. InProceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 349–358,

work page doi:10.1016/j.patcog.2025 2025

[5] [5]

arXiv:2303.17642. D. Koutra, N. Shah, J. T. Vogelstein, B. Gallagher, and C. Faloutsos. DeltaCon: Principled massive-graph similarity function with attribution.ACM Transactions on Knowledge Discovery from Data, 10(3):1–43,

work page arXiv

[6] [6]

G. Lorden. Procedures for reacting to a change in distribution.The Annals of Mathematical Statistics, 42(6):1897–1908,

1908

[7] [7]

doi: 10.1109/ TKDE.2024.3523857. C. Musco, C. Musco, L. Rosenblatt, and A. V. Singh. Sharper bounds for Chebyshev moment matching with applications. InProceedings of the 38th Annual Conference on Learning Theory (COLT),

work page arXiv 2024

[8] [8]

arXiv:2408.12385. R. Musco and C. Musco. Improved spectral density estimation via explicit and implicit deflation. Proceedings of the ACM-SIAM Symposium on Discrete Algorithms (SODA),

work page internal anchor Pith review Pith/arXiv arXiv

[9] [9]

J. Shin, A. Ramdas, and A. Rinaldo. E-detectors: A nonparametric framework for sequential change detection.arXiv preprint arXiv:2203.03532,

work page arXiv

[10] [10]

and SCPD (Huang et al., 2023), replacing LAD’s exact eigenvalue signatures with KPM-approximated DOS histograms while keeping the same two-window SVD + cosine detection. The only algorithmic difference from SCPD is the order of two operations: LADdos clamps the first-differenceZ∗to non-negative per window before taking the max, whereas SCPD takes the max ...

2023