Automated Detection and Climatological Analysis of Ripple-Scale Gravity Wave Instabilities Using a Squeeze-and-Excitation Convolutional Neural Network

Adriana Feener; Alan Liu; Jiahui Hu; Jing Li; Tao Li; Wenjun Dong

arxiv: 2603.03669 · v3 · submitted 2026-03-04 · ⚛️ physics.ao-ph

Pith reviewed 2026-05-15 17:11 UTC · model grok-4.3

classification ⚛️ physics.ao-ph

keywords: gravity wave, airglow imaging, convolutional neural network, mesosphere, instability, ripple scale, automated detection, climatology

The pith

A squeeze-and-excitation CNN detects ripple-scale gravity wave instabilities in airglow images with 92% F1-score.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper develops an automated detection method based on a squeeze-and-excitation convolutional neural network to find ripple-scale gravity wave instabilities in all-sky OH airglow images taken near 87 km altitude. These features have 5-15 km horizontal wavelengths and brief lifetimes, making consistent manual identification difficult. The network is trained on normalized 41 by 41 pixel patches extracted from larger images and reaches a 92% F1-score on test patches. At the event level, the detections match roughly 90% of the ripples previously identified by hand while also flagging additional low-amplitude cases. The resulting catalog supports objective counts of ripple frequency, seasonal patterns, and lifetime statistics across long image archives.

Core claim

The SE-CNN classifier, applied via sliding window to time-differenced and MAD-normalized image patches, achieves 92% F1-score at the patch level and recovers approximately 90% of manually identified ripple events at the event level while also identifying additional low-amplitude occurrences, thereby enabling objective quantification of ripple occurrence frequency, seasonal modulation, and lifetime distributions from long-term airglow image archives.
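
For reference, the patch-level F1-score is the harmonic mean of precision and recall. The short sketch below works through that arithmetic; the confusion-matrix counts are hypothetical values chosen only so the result lands near the reported figure, and they are not taken from the paper.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall from confusion-matrix counts."""
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)  # fraction of predicted ripples that are real
    recall = tp / (tp + fn)     # fraction of real ripples that were found
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts (NOT from the paper), for illustration only.
tp, fp, fn = 460, 40, 40
score = f1_score(tp, fp, fn)
print(f"precision = {tp / (tp + fp):.2f}, recall = {tp / (tp + fn):.2f}, F1 = {score:.2f}")
```

Because F1 ignores true negatives, it is insensitive to how many non-ripple patches the test subset contains, which matters when ripples are rare in the archive.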

What carries the argument

Squeeze-and-excitation convolutional neural network (SE-CNN) trained to classify 41x41 pixel normalized patches as ripple or non-ripple, combined with sliding-window scanning and spatial-temporal clustering to define discrete events.
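
The squeeze-and-excitation block that gives the network its name is summarized elsewhere on this page as squeeze (global average pooling), excitation (a two-layer bottleneck with ReLU and sigmoid), and scaling. A minimal pure-Python sketch of that three-step operation follows. The function name `se_block`, the omission of bias terms, and the toy weights are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import List

Matrix = List[List[float]]


def se_block(channels: List[Matrix], w1: Matrix, w2: Matrix) -> List[Matrix]:
    """Apply squeeze-and-excitation to a list of C two-dimensional feature maps.

    w1 has shape (C // r, C) and w2 has shape (C, C // r); biases are omitted.
    """
    # Squeeze: global average pooling yields one scalar per channel.
    z = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in channels]
    # Excitation, layer 1: bottleneck projection followed by ReLU.
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    # Excitation, layer 2: expand back to C values, then a sigmoid gate in (0, 1).
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w2]
    # Scale: multiply every pixel of a channel by that channel's gate.
    return [[[g * px for px in line] for line in ch]
            for ch, g in zip(channels, gates)]


# Hypothetical toy input: C = 2 channels of 2 x 2 pixels, reduction ratio r = 2.
features = [[[1.0, 2.0], [3.0, 4.0]],
            [[0.0, 0.0], [0.0, 0.0]]]
w1 = [[1.0, 1.0]]    # shape (C // r, C) = (1, 2)
w2 = [[0.0], [1.0]]  # shape (C, C // r) = (2, 1)
scaled = se_block(features, w1, w2)
print(scaled[0])
```

The gate is a per-channel scalar, so the block reweights whole feature maps rather than individual pixels; in a trained network `w1` and `w2` would be learned.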

If this is right

Ripple occurrence frequency can be measured objectively without human bias across multi-year datasets.
Seasonal modulation of ripple events becomes quantifiable through consistent automated catalogs.
Lifetime distributions of these short-lived instabilities can be derived directly from the detections.
The approach scales to process entire long-term airglow archives without proportional increases in labor.
Additional weak ripples missed by manual review are now included in the statistics.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same patch-based classification could be retrained on data from other airglow wavelengths or instruments to broaden the climatology.
Combining the ripple detections with simultaneous wind or temperature profiles could link instability occurrence to wave breaking and momentum flux.
If run in near real time the method might support continuous monitoring networks for mesospheric dynamics.

Load-bearing premise

The manually annotated ripple and non-ripple patches form an unbiased and consistent ground truth without significant labeling errors or variability between annotators.

What would settle it

A new round of independent annotations by multiple observers on the same image set that shows the automated catalog disagrees with the original manual events on more than 20% of cases, especially low-amplitude ripples.
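
The check described above reduces to comparing two label sequences over the same cases. The sketch below computes the raw disagreement rate and, as an additional chance-corrected measure that the paper does not report, Cohen's kappa. The label lists are hypothetical, and the function names are illustrative.

```python
from typing import Sequence


def disagreement_rate(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of cases on which two label sequences differ."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("label sequences must be non-empty and equally long")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Chance-corrected agreement between two label sequences."""
    n = len(a)
    p_observed = 1.0 - disagreement_rate(a, b)
    labels = set(a) | set(b)
    # Agreement expected if both raters labeled independently at their own base rates.
    p_expected = sum((list(a).count(l) / n) * (list(b).count(l) / n) for l in labels)
    if p_expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical labels (1 = ripple, 0 = non-ripple), NOT from the paper.
manual = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
automated = [1, 1, 0, 0, 0, 0, 1, 0, 1, 1]
rate = disagreement_rate(manual, automated)
kappa = cohen_kappa(manual, automated)
print(f"disagreement = {rate:.0%}, kappa = {kappa:.2f}")
```

Restricting both sequences to the low-amplitude cases before calling these functions would target the subset where disagreement is most likely.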

Original abstract

All-sky OH airglow imaging provides two-dimensional observations of mesospheric gravity wave structure near ~87 km altitude. Ripple-scale instability signatures, characterized by 5-15 km horizontal wavelengths and short lifetimes, are particularly difficult to identify consistently using manual inspection. In this study, we develop a reproducible, automated detection framework based on a squeeze-and-excitation convolutional neural network (SE-CNN) trained on 41 x 41 pixel image patches to identify ripple-scale structures in 512 x 512 pixel all-sky airglow images acquired at Yucca Ridge Field Station (40.7° N, 104.9° W). The time-differenced images are normalized using a robust median-absolute-deviation (MAD) scaling procedure to mitigate star contamination and background variability. The model is trained and validated on manually annotated ripple and non-ripple patches, then evaluated using independent test subsets. The automated detection is performed using a sliding-window approach with spatial and temporal clustering criteria for event definition. At the patch level, the classifier achieves a 92% F1-score with high precision and recall. At the event level, automated detections recover approximately 90% of manually identified ripple events while identifying additional low-amplitude occurrences. Validated against a previous manual identification study, the automated detection catalog enables objective quantification of ripple occurrence frequency, seasonal modulation, and lifetime distributions. By emphasizing methodological transparency, calibration considerations, and validation metrics, this framework establishes a scalable measurement technique for systematic detection of mesospheric instability signatures in long-term airglow image archives.
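
The abstract's preprocessing step (time differencing followed by robust MAD scaling) can be sketched as follows. This is a minimal illustration under stated assumptions: consecutive-frame differencing, centering on the median, the conventional 1.4826 consistency factor, and a small `eps` guard are all choices made here for the example, since the abstract does not specify the exact procedure. The 3 x 3 frames are hypothetical.

```python
from statistics import median
from typing import List

Image = List[List[float]]


def time_difference(curr: Image, prev: Image) -> Image:
    """Pixel-wise difference of two frames; removes slowly varying background."""
    return [[c - p for c, p in zip(row_c, row_p)] for row_c, row_p in zip(curr, prev)]


def mad_normalize(img: Image, eps: float = 1e-9) -> Image:
    """Center on the median and scale by 1.4826 * MAD (assumed convention)."""
    flat = [px for row in img for px in row]
    med = median(flat)
    mad = median(abs(px - med) for px in flat)
    scale = 1.4826 * mad + eps  # eps avoids division by zero on flat images
    return [[(px - med) / scale for px in row] for row in img]


# Hypothetical frames; the 250.0 pixel stands in for a bright star.
prev = [[10.0, 10.0, 10.0] for _ in range(3)]
curr = [[11.0, 12.0, 10.0], [9.0, 10.0, 11.0], [10.0, 250.0, 8.0]]
norm = mad_normalize(time_difference(curr, prev))
for row in norm:
    print([round(v, 2) for v in row])
```

The star pixel remains an outlier after scaling, but it barely shifts the median or the MAD, so the normalization of the surrounding pixels is not distorted the way a mean and standard deviation would be. For full 512 x 512 frames an array library would be the natural choice.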

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The SE-CNN pipeline automates ripple detection in OH airglow images with solid patch-level numbers and recovers most manual events, but the whole claim rests on unverified manual labels that lack any agreement check.

The paper puts a squeeze-and-excitation CNN to work on 41x41 patches from time-differenced all-sky OH images, using MAD normalization to handle stars and background noise, then clusters the hits in space and time to define events. That combination is new enough for this narrow task and gives a practical way to scan long archives without staring at every frame. They report 92% F1 at the patch level and recover about 90% of the manually flagged events while picking up some extra low-amplitude ones, which lets them start talking about occurrence rates, seasonal changes, and lifetimes in a more systematic way. The independent test subsets and the focus on reproducibility are clear pluses for a methods paper in mesospheric remote sensing.

The soft spot is exactly what the stress-test note flags: the training labels come from manual annotation with no reported protocol, no number of labelers, and no inter-annotator agreement number. Low-amplitude ripples are the hard ones to call consistently by eye, so any bias there flows straight into the model and into the 90% recovery figure. There are also no error bars on the metrics and no test of how the clustering thresholds move the event counts. Those gaps are real but not fatal; they are the kind of thing a revision can fix with a short methods addendum and a sensitivity table.

This is for people who already work with airglow imagery and need to scale up instability catalogs. It is worth sending to peer review because the core pipeline is grounded, the metrics are concrete, and it solves a genuine throughput problem even if the validation details need tightening.

Referee Report

2 major / 2 minor

Summary. The manuscript develops a squeeze-and-excitation convolutional neural network (SE-CNN) trained on manually annotated 41×41 pixel patches extracted from time-differenced, MAD-normalized all-sky OH airglow images to detect ripple-scale gravity wave instabilities (5-15 km wavelengths). At patch level the classifier reports 92% F1-score on independent test subsets; at event level, after spatial-temporal clustering, it recovers ~90% of a prior manual catalog while flagging additional low-amplitude features, thereby enabling objective climatological statistics on occurrence frequency, seasonal modulation, and lifetime distributions.

Significance. If the central performance claims hold under improved validation, the work supplies a reproducible, scalable pipeline that can replace inconsistent manual inspection of long-term airglow archives, directly supporting quantitative studies of mesospheric instability processes that have previously been limited by subjective labeling.

major comments (2)

[Methods (data preparation and annotation)] The ground-truth labels are produced by manual annotation of 41×41 patches, yet the manuscript provides no annotation protocol, number of labelers, or inter-annotator agreement statistic. Because the abstract itself notes that low-amplitude ripples are “particularly difficult to identify consistently,” any systematic bias in these labels propagates directly into the reported 92% F1 and 90% event-recovery figures and undermines the claim that additional detections constitute objective gains.
[Results (event definition and clustering)] Event-level statistics depend on post-hoc spatial-temporal clustering thresholds whose values are listed among the free parameters but are not subjected to ablation; the manuscript therefore does not demonstrate that the ~90% recovery rate is robust to reasonable variations in those thresholds.

minor comments (2)

[Results (patch-level metrics)] Performance figures are given without error bars or confidence intervals; adding bootstrap or cross-validation uncertainty estimates would strengthen the quantitative claims.
[Methods (detection pipeline)] The exact sliding-window stride, overlap handling, and precise definition of an “event” after clustering should be stated explicitly to permit full reproducibility.
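
The uncertainty estimate requested in the first minor comment could take the form of a percentile bootstrap over test patches. The sketch below is one way to do it, assuming binary labels and resampling of individual patches with replacement. The label vectors are synthetic, and the function names and defaults are illustrative rather than anything the manuscript specifies.

```python
import random
from typing import List, Sequence, Tuple


def f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1-score for binary labels (1 = ripple, 0 = non-ripple)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def bootstrap_f1_ci(y_true: Sequence[int], y_pred: Sequence[int],
                    n_boot: int = 1000, alpha: float = 0.05,
                    seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap confidence interval for F1."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    n = len(y_true)
    scores: List[float] = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample patches with replacement
        scores.append(f1([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    scores.sort()
    lo = scores[int((alpha / 2) * n_boot)]
    hi = scores[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lo, hi


# Synthetic labels (NOT from the paper): 50 ripple and 50 non-ripple patches.
y_true = [1] * 50 + [0] * 50
y_pred = [1] * 46 + [0] * 4 + [1] * 4 + [0] * 46
point = f1(y_true, y_pred)
lo, hi = bootstrap_f1_ci(y_true, y_pred)
print(f"F1 = {point:.2f}, 95% CI = [{lo:.2f}, {hi:.2f}]")
```

Patches cut from the same image or the same night are correlated, so resampling whole nights instead of single patches would likely give a more honest (wider) interval.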

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments that highlight important aspects of reproducibility and robustness in our study. We provide detailed responses to each major comment and commit to revisions that strengthen the manuscript.

Referee: [Methods (data preparation and annotation)] The ground-truth labels are produced by manual annotation of 41×41 patches, yet the manuscript provides no annotation protocol, number of labelers, or inter-annotator agreement statistic. Because the abstract itself notes that low-amplitude ripples are “particularly difficult to identify consistently,” any systematic bias in these labels propagates directly into the reported 92% F1 and 90% event-recovery figures and undermines the claim that additional detections constitute objective gains.

Authors: We agree that the manuscript should provide more details on how the ground-truth labels were generated. In the revised version, we will expand the Methods section to include a full description of the annotation protocol. It will specify the visual criteria used for identifying ripple-scale instabilities (5-15 km wavelengths in time-differenced images), state that the annotations were carried out by a single experienced researcher to maintain consistency, and report the total number of patches labeled. Although inter-annotator agreement statistics are not available because a single annotator was used, we will add a paragraph discussing the challenges of low-amplitude ripple identification as mentioned in the abstract and how the automated approach offers improved consistency for climatological analysis. This will clarify that the additional detections are not undermined by label bias but rather highlight the model's ability to detect subtle features objectively.

revision: yes

Referee: [Results (event definition and clustering)] Event-level statistics depend on post-hoc spatial-temporal clustering thresholds whose values are listed among the free parameters but are not subjected to ablation; the manuscript therefore does not demonstrate that the ~90% recovery rate is robust to reasonable variations in those thresholds.

Authors: We concur that an ablation study on the clustering thresholds is necessary to validate the robustness of the event-level recovery rate. Accordingly, in the revised manuscript, we will add results from an ablation experiment where we systematically vary the spatial and temporal clustering parameters within physically plausible ranges and demonstrate that the recovery rate remains stable around 90%. This will be presented in a new figure or table, ensuring that the reported statistics are not dependent on specific threshold choices.

revision: yes
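
A threshold ablation of the kind discussed in this exchange could be organized as below. This is only a sketch: single-linkage grouping via union-find, the `min_size` rule, and the matching criterion in `recovery_rate` are assumptions made for illustration, because the page does not describe the manuscript's actual clustering rule. All detections and manual events are hypothetical.

```python
from itertools import combinations
from math import hypot
from typing import Dict, List, Tuple

Detection = Tuple[int, float, float]  # (frame index, x pixel, y pixel)


def cluster_events(dets: List[Detection], max_dist: float, max_dt: int,
                   min_size: int = 2) -> List[List[Detection]]:
    """Group detections that are close in both space and time (single linkage)."""
    parent = list(range(len(dets)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(dets)), 2):
        (t1, x1, y1), (t2, x2, y2) = dets[i], dets[j]
        if abs(t1 - t2) <= max_dt and hypot(x1 - x2, y1 - y2) <= max_dist:
            parent[find(i)] = find(j)  # merge the two groups
    groups: Dict[int, List[Detection]] = {}
    for i, d in enumerate(dets):
        groups.setdefault(find(i), []).append(d)
    # Isolated detections are discarded as non-events (assumed rule).
    return [g for g in groups.values() if len(g) >= min_size]


def recovery_rate(manual: List[Detection], events: List[List[Detection]],
                  max_dist: float, max_dt: int) -> float:
    """Fraction of manual events matched by a detection in a retained event."""
    def hit(m: Detection) -> bool:
        return any(abs(m[0] - t) <= max_dt and hypot(m[1] - x, m[2] - y) <= max_dist
                   for ev in events for (t, x, y) in ev)
    return sum(hit(m) for m in manual) / len(manual) if manual else 0.0


# Hypothetical patch detections and manual catalog entries (NOT from the paper).
dets = [(0, 100, 100), (1, 104, 102), (2, 108, 103),
        (10, 300, 300), (11, 305, 298), (20, 50, 400)]
manual = [(1, 102, 101), (10, 302, 299), (20, 52, 401)]

for max_dist in (5, 10, 20):
    for max_dt in (1, 2):
        events = cluster_events(dets, max_dist, max_dt)
        rate = recovery_rate(manual, events, max_dist, max_dt)
        print(f"max_dist={max_dist:>2} px, max_dt={max_dt}: "
              f"{len(events)} events, recovery={rate:.2f}")
```

Even on this toy data the event count and recovery rate change with `max_dist`, which is why a sensitivity table over the real thresholds is informative.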

Circularity Check

0 steps flagged

No circularity: standard supervised evaluation on held-out labels

The paper trains the SE-CNN on manually annotated 41x41 patches and reports patch-level F1 and event-level recovery metrics on independent test subsets and against a prior manual catalog. No equations reduce these metrics to fitted inputs by construction, no self-citations supply load-bearing uniqueness theorems or ansatzes, and the preprocessing (MAD normalization, sliding-window clustering) does not redefine the target quantities. The derivation chain is self-contained against external manual benchmarks.

Axiom & Free-Parameter Ledger

3 free parameters · 1 axiom · 0 invented entities

The framework rests on standard supervised-learning assumptions plus domain-specific preprocessing choices; no new physical entities are postulated.

free parameters (3)

41x41 patch size: chosen to match typical ripple horizontal scale
MAD scaling factor: robust normalization parameter to suppress star contamination
spatial-temporal clustering thresholds: post-processing rules that convert patch detections into events

axioms (1)

domain assumption: Human annotations provide reliable ground truth for ripple presence. Training and validation depend on manual labels without reported inter-annotator agreement metrics.

pith-pipeline@v0.9.0 · 5597 in / 1359 out tokens · 61167 ms · 2026-05-15T17:11:34.596014+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · relation to the paper passage: unclear

Paper passage: The model is trained and validated on manually annotated ripple and non-ripple patches... SE block performs squeeze (global average pooling), excitation (two-layer bottleneck with ReLU+sigmoid), scaling

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · reality_from_one_distinction · relation to the paper passage: unclear

Paper passage: MAD scaling... 92% F1-score... 90% event recovery

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.