Patch-Level DINOv2 Scoring for Gravitational-Wave Glitch Detection: Breaking the Signal Dilution Barrier via Vector-Quantized Local Feature Indexing

Luca Cirfeta

arxiv: 2606.09933 · v1 · pith:DKXPH6TXnew · submitted 2026-06-07 · 🌌 astro-ph.IM · gr-qc


Pith reviewed 2026-06-27 17:48 UTC · model grok-4.3

classification 🌌 astro-ph.IM gr-qc

keywords gravitational wave glitch detection, DINOv2, vector quantization, patch-level scoring, LIGO spectrograms, unsupervised anomaly detection, signal dilution


The pith

Patch-level top-k scoring on DINOv2 token similarities to a vector-quantized index separates extended glitch signals from noise where global averaging fails.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that the CLS token in a frozen DINOv2 model performs global average pooling over 37 × 37 = 1369 patches and therefore suppresses signals that occupy less than 5 percent of the spectrogram grid. It replaces that global metric with a top-k order statistic computed on the similarities of individual patch tokens to a vector-quantized reference index containing 64 centroids for each of 19 Gravity Spy morphologies. On strain-domain injections into LIGO O4a L1 data, the new statistic yields a Kolmogorov-Smirnov separation of 0.963 for spatially extended morphologies such as SpiralBurst. The same construction yields spatial saliency maps that localize glitches, which the paper presents as a visualization aid rather than a binary classifier.
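The dilution argument can be checked with toy numbers. The sketch below is illustrative only: the per-patch scores are invented values, and `global_mean_score` is a simple stand-in for the global pooling the paper attributes to the CLS token, not the model's actual computation.

```python
import numpy as np

GRID = 37                    # 37 x 37 patch grid quoted in the abstract
N_PATCHES = GRID * GRID      # 1369 patch tokens


def global_mean_score(patch_scores):
    """Average over every patch: a stand-in for global pooling."""
    return float(np.mean(patch_scores))


def top_k_mean_score(patch_scores, k):
    """Mean of the k largest per-patch scores."""
    top = np.partition(patch_scores, -k)[-k:]
    return float(np.mean(top))


# Toy values (assumed): background patches score 0.0, signal patches 1.0.
n_signal = round(0.05 * N_PATCHES)   # 5 percent of the grid
scores = np.zeros(N_PATCHES)
scores[:n_signal] = 1.0

diluted = global_mean_score(scores)              # shrinks with grid size
focused = top_k_mean_score(scores, k=n_signal)   # unaffected by background
print(n_signal, round(diluted, 4), focused)
```

Five percent of 1369 patches rounds to 68, the same value as the reported optimal k. This page does not say whether that match is by design.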

Core claim

Replacing the global CLS similarity metric with a top-k order statistic over individual patch token similarities against a Vector-Quantized reference index (K=64 centroids per class, 19 Gravity Spy O3b morphologies, 1216 total centroids) mitigates the signal dilution limitation, producing KS=0.963 distributional separation for spatially extended morphologies such as SpiralBurst on LIGO O4a L1 data.

What carries the argument

The top-k order statistic over individual patch-token similarities to a vector-quantized reference index of 1216 centroids.
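A minimal sketch of such a score, under stated assumptions: per-patch novelty is taken as one minus the highest cosine similarity to any centroid, and the image-level score is the mean of the k largest novelties. The paper's exact definition is not given on this page, so `patch_novelty` and `top_k_score` are hypothetical names and the random arrays are placeholders for real DINOv2 tokens and centroids.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)


def patch_novelty(patch_tokens, centroids):
    """Assumed novelty: 1 - max cosine similarity to any centroid."""
    p = l2_normalize(patch_tokens)     # (n_patches, d)
    c = l2_normalize(centroids)        # (n_centroids, d)
    sims = p @ c.T                     # (n_patches, n_centroids)
    return 1.0 - sims.max(axis=1)


def top_k_score(patch_tokens, centroids, k=68):
    """Mean novelty of the k most novel patches."""
    nov = patch_novelty(patch_tokens, centroids)
    k = min(k, nov.size)
    return float(np.partition(nov, -k)[-k:].mean())


rng = np.random.default_rng(0)
d = 384                                        # ViT-S embedding width
centroids = rng.normal(size=(1216, d))         # placeholder index
# "Known" image: every patch coincides with some centroid.
known = centroids[rng.integers(0, 1216, size=1369)]
# "Novel" image: 68 patches replaced by unrelated random directions.
novel = known.copy()
novel[:68] = rng.normal(size=(68, d))

score_known = top_k_score(known, centroids)
score_novel = top_k_score(novel, centroids)
print(score_known, score_novel)
```

Because only the k largest novelties enter the score, the 1301 unchanged patches in the toy "novel" image do not pull it down.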

If this is right

A topological saliency map built from spatial patch similarity against a background matrix of 78 null segments correctly localizes signatures for Scattered_Light and injected SpiralBurst.
The method confirms a patch-size temporal resolution limit for ultra-short transients such as AsymBlip.
Max/Mean ratio analysis shows that patch-level saliency functions as a topological visualizer rather than a binary detector.
The observed behavior is consistent with the non-isotropic geometry of DINOv2 embedding space on GW spectrograms.
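The saliency map and the Max/Mean ratio can be sketched as follows. The sketch assumes that the background matrix is the per-position median of patch tokens over the null segments, that saliency is the cosine distance to that median, and that patches are stored in row-major order; none of these details is confirmed on this page. All data are synthetic.

```python
import numpy as np

GRID = 37


def cosine_distance(a, b, eps=1e-12):
    """Row-wise 1 - cosine similarity."""
    num = np.sum(a * b, axis=-1)
    den = np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1)
    return 1.0 - num / np.maximum(den, eps)


def saliency_map(test_tokens, null_tokens):
    """test_tokens: (1369, d); null_tokens: (n_null, 1369, d)."""
    null_median = np.median(null_tokens, axis=0)       # (1369, d)
    dist = cosine_distance(test_tokens, null_median)   # (1369,)
    return dist.reshape(GRID, GRID)                    # row-major (assumed)


def max_mean_ratio(sal):
    """Peak saliency relative to the map average."""
    return float(sal.max() / sal.mean())


rng = np.random.default_rng(1)
d = 32                                     # small width for a quick demo
base = rng.normal(size=(GRID * GRID, d))   # synthetic quiet background
null = base[None] + 0.05 * rng.normal(size=(78, GRID * GRID, d))
test = base + 0.05 * rng.normal(size=base.shape)

# Overwrite a 5 x 5 block (rows 10-14, columns 20-24) with unrelated tokens.
rows, cols = np.arange(10, 15), np.arange(20, 25)
idx = (rows[:, None] * GRID + cols[None, :]).ravel()
test[idx] = rng.normal(size=(idx.size, d))

sal = saliency_map(test, null)
peak = np.unravel_index(np.argmax(sal), sal.shape)
print(peak, max_mean_ratio(sal))
```

The ratio describes how concentrated the map is. It carries no decision threshold of its own, which is in line with the paper's description of the map as a visualizer rather than a detector.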

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Updating the vector-quantized index online could allow the detector to track slowly evolving glitch populations across observing runs.
The same patch-level indexing could be applied to other time-frequency representations used in radio or X-ray transient searches.
If the separation holds for rarer morphologies not present in the original 19-class index, the approach would reduce reliance on labeled training sets for new glitch types.

Load-bearing premise

Embeddings from a DINOv2 model pretrained on natural photographs remain sufficiently structured on gravitational-wave spectrograms to support meaningful nearest-centroid matching after vector quantization, without any domain-specific fine-tuning.

What would settle it

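Running the identical pipeline, with k fixed in advance, on held-out injections of a spatially extended morphology such as SpiralBurst and finding that the Kolmogorov-Smirnov statistic falls below statistical significance would falsify the central claim. A non-significant result for ultra-short transients such as AsymBlip would not, because the paper already reports that outcome as the patch-size resolution limit.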
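For reference, the two-sample Kolmogorov-Smirnov statistic and its usual large-sample significance threshold can be computed as in the sketch below. The sample sizes used by the paper are not stated on this page, so the values passed to `ks_critical_value` are placeholders.

```python
import numpy as np


def ks_two_sample(a, b):
    """Largest absolute gap between the two empirical CDFs."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))


def ks_critical_value(n, m, alpha=0.05):
    """Large-sample threshold c(alpha) * sqrt((n + m) / (n * m))."""
    c_alpha = np.sqrt(-0.5 * np.log(alpha / 2.0))
    return float(c_alpha * np.sqrt((n + m) / (n * m)))


separated = ks_two_sample([1, 2, 3], [4, 5, 6])    # disjoint samples
identical = ks_two_sample([1, 2, 3], [1, 2, 3])    # same sample twice
threshold = ks_critical_value(100, 100)            # placeholder sizes
print(separated, identical, round(threshold, 4))
```

The threshold shrinks as the samples grow, so whether a given KS value counts as significant depends on how many null and injected segments were scored.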

Figures

Figures reproduced from arXiv: 2606.09933 by Luca Cirfeta.

**Figure 1.** Architectural schematic comparing the global [CLS] token baseline against the proposed Patch-Level Top-k Novelty Scoring framework. The 37×37 spatial grid illustrates the isolation of the Top-68 most anomalous patches relative to the Vector-Quantized Reference Index.
2.4. Topological Saliency Map. While the VQ index successfully classifies global novelty, it introduces severe false positives if used for s…

**Figure 2.** Kolmogorov-Smirnov (KS) statistic vs. MatchedFilter SNR for AsymBlip, SpiralBurst, and HarmonicComb at optimal k = 68. The dashed line indicates the threshold for statistical significance (α = 0.05). The SpiralBurst experiences a transition to high separation at SNR ≈ 37, whereas AsymBlip remains strictly non-significant across the entire domain, mathematically confirming the ViT spatial diffraction lim…

**Figure 3.** Topological Saliency Map applied to an injected SpiralBurst (SNR ≈ 138). The spatial mapping isolates the morphological footprint of the transient, entirely ignoring Q-Transform boundary artifacts via purely spatial distance evaluations against the null median matrix.
4. DISCUSSION. 4.1. Mitigating the Signal Dilution Barrier. The results of the Micro-MDC confirm that extracting features at the 14 × 14 patc…

Original abstract

We present a patch-level scoring architecture for unsupervised gravitational-wave glitch detection that mitigates the signal dilution limitation identified in Cirfeta (2026b). The CLS token of frozen DINOv2 (ViT-S/14) performs global average pooling over 37x37=1369 patches, systematically suppressing signals occupying less than 5% of the spectrogram grid. We replace the global CLS similarity metric with a top-$k$ order statistic over individual patch token similarities against a Vector-Quantized reference index ($K=64$ centroids per class, 19 Gravity Spy O3b morphologies, 1216 total centroids). Applied to strain-domain injections in LIGO O4a L1 data (session 20260524), we demonstrate a statistically significant distributional separation ($\text{KS}=0.963$ at optimal $k=68$) for spatially extended morphologies (SpiralBurst), while confirming the patch-size temporal resolution limit for ultra-short transients (AsymBlip). A topological saliency map constructed from spatial patch similarity against a background matrix (78 null segments) correctly localizes glitch signatures for Scattered_Light and injected SpiralBurst. The Max/Mean ratio analysis demonstrates that patch-level saliency functions as a topological visualizer rather than a binary detector, consistent with the non-isotropic geometry of DINOv2 embedding space on GW spectrograms.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This is an incremental patch-level tweak on the author's prior global DINOv2 glitch work that reports a strong KS number but without independent validation or embedding diagnostics.


The core move is replacing the single CLS token similarity with a top-k order statistic over patch tokens matched to a 64-centroid VQ index built from 19 Gravity Spy classes. On the O4a L1 injections they show KS=0.963 separation for SpiralBurst while the saliency map localizes the signal against a background matrix.

That addresses the dilution issue for extended morphologies in a direct way and adds a visualization tool that the global method lacked. The confirmation that ultra-short transients hit the patch resolution limit is also useful.

The soft spots are the usual ones for this style of work. k=68 and the centroid count were selected on the same data segment used to quote the KS value, so the headline number is not a held-out result. No error bars, no repeated splits, and no side-by-side numbers against the earlier global CLS or against any standard glitch pipeline. The DINOv2 backbone stays frozen from natural-image pretraining with no reported checks on whether the patch embeddings actually organize by glitch morphology rather than generic texture.

The stress-test worry about embedding collapse therefore lands; nothing in the abstract or described methods rules it out.

This is for people already working on LIGO glitch rejection who want to try vision-model variants. A broader reader would get little unless they care about the specific top-k + VQ construction.

I would not bring it to reading group unless the group is deep in GW instrumentation. I would not cite it. It still deserves peer review because the dilution problem is real and the patch-level idea is concrete enough to test properly.

Referee Report

3 major / 2 minor

Summary. The paper introduces a patch-level scoring method for unsupervised gravitational-wave glitch detection that replaces the global CLS token similarity of frozen DINOv2 (ViT-S/14) with a top-k order statistic over individual patch-token similarities to a vector-quantized reference index (K=64 centroids per class across 19 Gravity Spy morphologies). Applied to LIGO O4a L1 strain injections, it reports a KS=0.963 distributional separation for extended morphologies such as SpiralBurst at k=68, while providing topological saliency maps that localize glitch signatures.

Significance. If the transfer of natural-image DINOv2 embeddings to single-channel spectrograms holds without domain adaptation, the method would provide a concrete mitigation of the signal-dilution problem for spatially extended glitches and a practical topological visualization tool. The use of real O4a data and the explicit reporting of a specific KS value on a named data segment are strengths; however, the absence of embedding diagnostics or independent validation limits the immediate impact.

major comments (3)

[Abstract (results paragraph)] The headline KS=0.963 result at k=68 is reported on the same LIGO O4a session used both to construct the 1216-centroid VQ index and to select the optimal k; no cross-validation, held-out segments, or description of the selection procedure is supplied, rendering the separation statistic non-independent.
[Abstract (method and results)] No embedding-space diagnostics (nearest-centroid purity, intra-/inter-class distances, or patch-token t-SNE) are presented to test whether the frozen ViT-S/14 tokens preserve morphology-specific geometry on GW spectrograms rather than collapsing to generic edge features; this assumption is load-bearing for the claim that VQ indexing, rather than generic saliency, drives the separation.
[Abstract] The dilution problem is defined solely by reference to the authors' prior work (Cirfeta 2026b) and the VQ centroids are fitted quantities; the manuscript supplies neither an external baseline comparison nor error bars on the KS statistic, weakening the quantitative claim of improvement.

minor comments (2)

[Abstract] The data segment identifier 'session 20260524' is used without definition or reference to its public availability.
[Abstract] Notation for the top-k order statistic and the background matrix (78 null segments) is introduced without an explicit equation or algorithmic pseudocode.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments and the opportunity to address the concerns raised. We respond point by point below, clarifying factual aspects of the method and committing to revisions that strengthen the manuscript without misrepresenting the current results.


Referee: [Abstract (results paragraph)] The headline KS=0.963 result at k=68 is reported on the same LIGO O4a session used both to construct the 1216-centroid VQ index and to select the optimal k; no cross-validation, held-out segments, or description of the selection procedure is supplied, rendering the separation statistic non-independent.

Authors: We clarify that the 1216-centroid VQ index (K=64 per class across 19 Gravity Spy O3b morphologies) is constructed exclusively from the independent O3b Gravity Spy dataset and is not derived from the O4a session. The O4a L1 data (session 20260524) with strain injections serves solely as the test set. However, we agree that the value k=68 was selected on this same test session to maximize the reported KS statistic, and no cross-validation or held-out segments were used for this choice. This limits the independence of the headline result. In the revised manuscript we will explicitly describe the k-selection procedure and add a 5-fold cross-validation across O4a segments to report the KS distribution at the selected k. revision: yes
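One way the promised held-out evaluation could look is sketched below: k is chosen on four folds and the KS statistic is reported on the fifth. This is an assumption about the procedure, not the authors' code. `cross_validated_ks` is a hypothetical name and the novelty arrays are synthetic.

```python
import numpy as np


def ks_stat(a, b):
    """Two-sample KS statistic: largest gap between empirical CDFs."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))


def top_k_scores(novelty, k):
    """novelty: (n_segments, n_patches). Mean of the k largest per row."""
    return np.partition(novelty, -k, axis=1)[:, -k:].mean(axis=1)


def cross_validated_ks(null_nov, inj_nov, k_grid, n_folds=5, seed=0):
    """Pick k on the training folds, report KS on the held-out fold."""
    rng = np.random.default_rng(seed)
    null_folds = np.array_split(rng.permutation(len(null_nov)), n_folds)
    inj_folds = np.array_split(rng.permutation(len(inj_nov)), n_folds)
    results = []
    for f in range(n_folds):
        null_tr = np.concatenate(
            [x for j, x in enumerate(null_folds) if j != f])
        inj_tr = np.concatenate(
            [x for j, x in enumerate(inj_folds) if j != f])

        def train_ks(k):
            return ks_stat(top_k_scores(null_nov[null_tr], k),
                           top_k_scores(inj_nov[inj_tr], k))

        best_k = max(k_grid, key=train_ks)   # first maximum wins on ties
        held_out = ks_stat(top_k_scores(null_nov[null_folds[f]], best_k),
                           top_k_scores(inj_nov[inj_folds[f]], best_k))
        results.append((best_k, held_out))
    return results


# Synthetic per-patch novelties: 50 null and 50 injected segments.
rng = np.random.default_rng(3)
null_nov = 0.1 * rng.random((50, 200))
inj_nov = 0.1 * rng.random((50, 200))
inj_nov[:, :20] += 1.0                       # 20 elevated patches each
folds = cross_validated_ks(null_nov, inj_nov, k_grid=[5, 10, 20, 50])
print(folds)
```

Reporting the spread of the held-out values, rather than a single number at the k that maximized it, is what removes the dependence flagged by the referee.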
Referee: [Abstract (method and results)] No embedding-space diagnostics (nearest-centroid purity, intra-/inter-class distances, or patch-token t-SNE) are presented to test whether the frozen ViT-S/14 tokens preserve morphology-specific geometry on GW spectrograms rather than collapsing to generic edge features; this assumption is load-bearing for the claim that VQ indexing, rather than generic saliency, drives the separation.

Authors: We acknowledge that the current manuscript does not include explicit embedding-space diagnostics. The claims rest on the observed KS separation for extended morphologies and the topological saliency maps. To directly address the concern, the revised version will add a supplementary analysis section reporting nearest-centroid purity, mean intra- versus inter-class Euclidean distances on patch tokens, and a t-SNE projection of the patch embeddings computed on the Gravity Spy O3b set. These diagnostics will test whether the frozen DINOv2 tokens retain morphology-specific structure on spectrograms. revision: yes
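Two of the promised diagnostics are simple to state in code. In the sketch below, purity is taken to mean the fraction of tokens whose nearest centroid carries the token's own class label, and the distance summary compares mean pairwise Euclidean distance within and between classes. Both definitions are assumptions, and the three-class data are synthetic.

```python
import numpy as np


def nearest_centroid_purity(tokens, labels, centroids, centroid_labels):
    """Fraction of tokens whose nearest centroid shares their label."""
    diff = tokens[:, None, :] - centroids[None, :, :]
    d2 = (diff ** 2).sum(axis=-1)                 # squared distances
    nearest = centroid_labels[np.argmin(d2, axis=1)]
    return float(np.mean(nearest == labels))


def intra_inter_distance(tokens, labels):
    """Mean pairwise distance within classes and between classes."""
    diff = tokens[:, None, :] - tokens[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(tokens), dtype=bool)   # drop self-pairs
    intra = dist[same & off_diag].mean()
    inter = dist[~same].mean()
    return float(intra), float(inter)


# Synthetic, well-separated classes (placeholder for real patch tokens).
rng = np.random.default_rng(4)
n_classes, per_class, d = 3, 40, 8
means = 10.0 * np.eye(n_classes, d)
labels = np.repeat(np.arange(n_classes), per_class)
tokens = means[labels] + rng.normal(size=(labels.size, d))
centroids = np.stack(
    [tokens[labels == c].mean(axis=0) for c in range(n_classes)])

purity = nearest_centroid_purity(
    tokens, labels, centroids, np.arange(n_classes))
intra, inter = intra_inter_distance(tokens, labels)
print(purity, intra, inter)
```

On real spectrogram tokens, purity near chance or an intra-class distance close to the inter-class distance would support the referee's concern that the embeddings do not organize by morphology.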
Referee: [Abstract] The dilution problem is defined solely by reference to the authors' prior work (Cirfeta 2026b) and the VQ centroids are fitted quantities; the manuscript supplies neither an external baseline comparison nor error bars on the KS statistic, weakening the quantitative claim of improvement.

Authors: We agree that a self-contained definition of the dilution problem would improve accessibility. The revised abstract and introduction will include a brief, standalone description of the CLS-token dilution effect. We will also add a direct baseline comparison by reporting the KS statistic obtained with the global CLS token on the identical O4a injection set. Finally, we will compute and report bootstrap-derived 95% confidence intervals on the KS=0.963 value using 1000 resamples of the test segments. revision: yes
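A percentile bootstrap of the kind described can be sketched as follows. It assumes that null and injected scores are resampled independently with replacement; the response does not spell out the resampling scheme, and the scores here are synthetic.

```python
import numpy as np


def ks_stat(a, b):
    """Two-sample KS statistic: largest gap between empirical CDFs."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))


def bootstrap_ks_ci(null_scores, inj_scores, n_resamples=1000,
                    level=0.95, seed=0):
    """Percentile bootstrap interval for the KS statistic."""
    rng = np.random.default_rng(seed)
    null_scores = np.asarray(null_scores, dtype=float)
    inj_scores = np.asarray(inj_scores, dtype=float)
    stats = np.empty(n_resamples)
    for i in range(n_resamples):
        a = rng.choice(null_scores, size=null_scores.size, replace=True)
        b = rng.choice(inj_scores, size=inj_scores.size, replace=True)
        stats[i] = ks_stat(a, b)
    tail = (1.0 - level) / 2.0
    lo, hi = np.quantile(stats, [tail, 1.0 - tail])
    return float(lo), float(hi)


# Synthetic scores standing in for per-segment top-k statistics.
rng = np.random.default_rng(5)
null_scores = rng.normal(0.0, 1.0, size=200)
inj_scores = rng.normal(1.5, 1.0, size=200)
lo, hi = bootstrap_ks_ci(null_scores, inj_scores)
print(ks_stat(null_scores, inj_scores), (lo, hi))
```

Since the KS statistic cannot exceed 1, an interval around a value such as 0.963 will be compressed against that upper bound, which is worth keeping in mind when reading the revised numbers.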

Circularity Check

2 steps flagged

Self-citation for dilution limit plus optimal-k selection on evaluation data reduce independence of KS=0.963 claim

specific steps

self-citation, load-bearing [Abstract, sentence 1]
"We present a patch-level scoring architecture for unsupervised gravitational-wave glitch detection that mitigates the signal dilution limitation identified in Cirfeta (2026b)."

The paper's premise and claim to break the 'signal dilution barrier' is justified solely by citation to prior work by the same author (Cirfeta 2026b); the limitation itself is not re-derived or externally benchmarked here.

fitted input called prediction [Abstract, results sentence]
"we demonstrate a statistically significant distributional separation (KS=0.963 at optimal k=68) for spatially extended morphologies (SpiralBurst)"

k=68 is explicitly labeled 'optimal' and the KS value is reported at that value on the identical LIGO O4a L1 injection dataset (session 20260524), so the separation statistic is obtained after fitting the order-statistic hyperparameter to the evaluation distribution.

full rationale

The paper's central motivation invokes a self-citation to define the signal dilution problem it claims to solve. The headline KS separation is reported specifically at the 'optimal k=68' chosen on the same LIGO O4a injection dataset used for the result, satisfying the fitted-input-called-prediction pattern. No other load-bearing steps reduce by construction; the VQ construction and DINOv2 usage remain independent of the target metric.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The central claim rests on the transferability of natural-image DINOv2 features to spectrograms, the representativeness of the 19 Gravity Spy classes, and the post-hoc selection of k and K; these are domain assumptions rather than derived quantities.

free parameters (2)

k (top-k order statistic) = 68
Optimal value 68 selected to maximize reported KS separation on the evaluation data.
K (centroids per class) = 64
Vector-quantization codebook size fixed at 64 per morphology class.
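With K = 64 centroids for each of 19 classes, the index holds 19 × 64 = 1216 centroids. The sketch below builds such an index with plain Lloyd k-means per class. The quantizer actually used by the paper is not specified on this page, so the algorithm, the function names, and the random tokens are all assumptions.

```python
import numpy as np


def kmeans(x, n_clusters, n_iter=20, seed=0):
    """Plain Lloyd iterations, initialized from random data points."""
    rng = np.random.default_rng(seed)
    start = rng.choice(len(x), size=n_clusters, replace=False)
    centroids = x[start].copy()
    for _ in range(n_iter):
        diff = x[:, None, :] - centroids[None, :, :]
        assign = (diff ** 2).sum(axis=-1).argmin(axis=1)
        for j in range(n_clusters):
            members = x[assign == j]
            if len(members):      # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids


def build_vq_index(tokens_by_class, n_clusters=64):
    """Stack per-class codebooks into a single reference index."""
    names = sorted(tokens_by_class)
    books = [kmeans(tokens_by_class[n], n_clusters) for n in names]
    index = np.concatenate(books, axis=0)
    labels = np.repeat(np.arange(len(names)), n_clusters)
    return index, labels, names


# Placeholder tokens: 19 hypothetical classes, 200 tokens each, width 16.
rng = np.random.default_rng(2)
toy = {f"class_{i:02d}": rng.normal(loc=i, size=(200, 16))
       for i in range(19)}
index, labels, names = build_vq_index(toy)
print(index.shape)
```

Keeping a class label for each centroid costs little and allows later checks, such as which morphology a given patch matched.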

axioms (2)

domain assumption: DINOv2 ViT-S/14 embeddings pretrained on natural images remain informative when applied to GW spectrogram patches without fine-tuning.
Frozen model is used throughout; no adaptation step is described.
domain assumption: The 19 Gravity Spy O3b morphologies constitute a sufficient and representative basis for unsupervised detection.
All reference centroids are derived from these 19 classes.

pith-pipeline@v0.9.1-grok · 5789 in / 1634 out tokens · 20061 ms · 2026-06-27T17:48:24.463374+00:00 · methodology


Reference graph

Works this paper leans on

12 extracted references · 1 canonical work pages

[1] Allen, B., Anderson, W. G., Brady, P. R., et al. 2012, Physical Review D, 85, 122006, https://doi.org/10.1103/PhysRevD.85.122006
[2] Cirfeta, L. 2026a, arXiv preprint arXiv:2605.28572
[3] Cirfeta, L. 2026b, arXiv preprint arXiv:2606.06237
[4] Darcet, T., Oquab, M., Doupé, E., & Bourdoukan, R. 2024, ICLR 2024, arXiv:2309.16588
[5] Glanzer, J., Banagiri, S., Coughlin, S. B., et al. 2023, Classical and Quantum Gravity, 40, 065004
[6] Kolmogorov, A. N. 1933, Giornale dell’Istituto Italiano degli Attuari, 4, 83–91
[7] Li, X., et al. 2024, Machine Learning: Science and Technology
[8] Oquab, M., Darcet, T., Moutakanni, T., et al. 2024, Transactions on Machine Learning Research
[9] Sculley, D. 2010, Proceedings of the 19th International Conference on World Wide Web (WWW ’10)
[10] Smirnov, N. 1948, Annals of Mathematical Statistics, 19(2), 279–281
[11] Soni, S., et al. 2025, arXiv preprint arXiv:2409.02831
[12] Zevin, M., et al. 2017, Classical and Quantum Gravity
