Machine Learning Supports Existence of Previously Unrecognized Transient Astronomical Phenomena in Historical Observatory Images
Pith reviewed 2026-05-10 03:40 UTC · model grok-4.3
The pith
Machine learning filtering leaves intact the excess of transients near nuclear tests and their deficit in Earth's shadow in old plates.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
A machine learning classifier is trained on expert visual labels of 250 transient image pairs taken 30 minutes apart, then deployed on 107,875 candidates. After controlling for ML-identified artifacts, transient counts remain significantly elevated within the nuclear window, and the shadow deficit remains significant and is largest among the transients with the highest probability of being real.
What carries the argument
Machine learning classifier trained on 250 expert-labeled transient image pairs, which assigns each of 107,875 transients a probability of being real rather than a plate defect.
If this is right
- Transient counts remain significantly elevated for dates inside the nuclear window after controlling for ML-identified artifacts.
- Transients with the highest probability of being real are more likely to occur inside the nuclear window than lower-probability ones.
- The shadow deficit is significant overall and is largest in the highest-probability transients relative to lower-probability transients.
Where Pith is reading between the lines
- The same ML validation approach could be applied to other historical plate archives to search for additional high-probability candidates.
- If the high-probability transients are real, targeted modern observations timed to nuclear-test anniversaries or shadow geometry might detect similar brief events.
- Cross-matching the highest-probability transients against independent catalogs or re-imaging selected fields would provide a non-visual test of their reality.
Load-bearing premise
Expert visual classification of the 250 image pairs provides accurate, unbiased ground truth for training the model, and the model generalizes without systematic bias to the full set of transients.
What would settle it
Repeating the full analysis with a fresh set of expert labels, or with an independent non-ML classification method, and finding that the nuclear-window excess and the shadow deficit both lose statistical significance.
Original abstract
Transient, star-like point sources that appear and vanish over short timescales are described in astronomical images prior to launch of Sputnik. We have reported that transient numbers diminish significantly in Earth's shadow (shadow deficit) and are more likely within (plus/minus) one day of nuclear testing (nuclear window). These findings remain debated with some arguing that transients identified via existing automated pipelines are simply plate defects. Therefore, we use machine learning (ML) to enhance transient identification accuracy and validate the phenomenon. The model was trained against 250 transient image pairs taken 30 minutes apart that were classified as real versus plate defect by expert visual review; the model demonstrated good discrimination (out-of-fold AUC = 0.81; sensitivity = 0.71, specificity = 0.71). After deployment in a dataset of 107,875 previously-identified transients, the model assigned each a probability of being real. After controlling for ML-identified artifacts, transient counts were significantly elevated for dates within a nuclear window (p = .024); transients with the highest probability of being real were more likely to occur within a nuclear window (p < .0001). The shadow deficit was significant (p < .0001) and largest in the highest probability transients relative to lower probability transients (p = .003). Results strongly support existence of an unrecognized population of transient objects in historical astronomical plates warranting further study.
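As a rough illustration of how the discrimination figures quoted in the abstract are typically computed, the sketch below derives AUC, sensitivity, and specificity from held-out scores. It assumes each score was produced by a model that never saw that example during training (the out-of-fold setting). The function names, the 0.5 threshold, and the eight toy labels and scores are hypothetical and are not taken from the paper.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(labels, scores, threshold=0.5):
    """Sensitivity and specificity when scores >= threshold are called 'real'."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Toy data only (hypothetical): 1 = expert-labeled real, 0 = plate defect.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.7, 0.3, 0.2, 0.5]

auc = roc_auc(labels, scores)
sens, spec = sensitivity_specificity(labels, scores)
print(f"AUC={auc:.4f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

The pairwise definition of AUC is quadratic in sample size, which is acceptable for a 250-item labeled set but would be replaced by a rank-based computation on larger data.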
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript claims that a machine learning classifier, trained on 250 expert-labeled pairs of historical astronomical images (out-of-fold AUC=0.81, sensitivity=specificity=0.71), can be deployed on 107,875 previously identified transients to separate real events from plate defects. After this filtering, transient counts remain significantly elevated within one day before or after a nuclear test (the nuclear window; p=0.024), high-probability transients are more likely to fall in the nuclear window (p<0.0001), and a shadow deficit is confirmed (p<0.0001) with the effect strongest among the highest-probability subset (p=0.003). These results are presented as support for an unrecognized population of transient objects in pre-Sputnik plates.
Significance. If the central statistical claims survive additional validation, the work would constitute a meaningful methodological contribution to the analysis of historical plate archives by using ML to address long-standing artifact criticisms. It would elevate the nuclear-window and shadow-deficit signals from plausible to more robustly supported, potentially motivating targeted follow-up observations or re-processing of other plate collections. The approach demonstrates how supervised classification can be integrated with astronomical time-domain statistics when ground-truth labels are available.
major comments (2)
- [Results section describing ML deployment and probability-stratified tests] The reported sensitivity and specificity of 0.71 imply a ~29% misclassification rate. The key results (nuclear-window elevation after ML control, p=0.024; shadow deficit p<0.0001 strongest in high-probability transients, p=0.003) rest on the assumption that residual classification errors are statistically independent of observation date and shadow geometry. No diagnostic is described that stratifies error rates or false-positive rates by nuclear-window membership or shadow status; if plate defects or scanning artifacts vary systematically with era or observing conditions, the probability-weighted tests could produce spurious significance.
- [Methods section on ML training and expert labeling] The training set of 250 expert-classified image pairs is modest relative to the 107,875-transient deployment set. While out-of-fold AUC=0.81 is reported, the manuscript does not detail inter-rater reliability among experts, potential date-dependent labeling biases (e.g., emulsion or scanning quality changes that track nuclear-test eras), or explicit checks that the learned decision boundary generalizes without introducing systematic bias correlated with the nuclear-window or shadow variables.
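The first major comment can be made concrete with a short sketch. With sensitivity and specificity both equal to 0.71, the overall misclassification rate is prevalence x (1 - sensitivity) + (1 - prevalence) x (1 - specificity) = 0.29 whatever the prevalence of real transients. The second function illustrates the kind of stratified diagnostic the referee asks for: false-positive and false-negative rates computed separately per stratum. The record layout, stratum names, threshold, and toy values are all hypothetical; the paper does not describe such a diagnostic.

```python
def misclassification_rate(sensitivity, specificity, prevalence):
    """Overall error rate for a given share of truly real transients."""
    return prevalence * (1 - sensitivity) + (1 - prevalence) * (1 - specificity)


def error_rates_by_stratum(records, threshold=0.5):
    """Map each stratum to (false_positive_rate, false_negative_rate).

    records: iterable of (stratum, label, score), label 1 = real, 0 = defect.
    A rate is None when the stratum has no examples of the relevant class.
    """
    counts = {}
    for stratum, label, score in records:
        c = counts.setdefault(stratum, {"fp": 0, "tn": 0, "fn": 0, "tp": 0})
        predicted_real = score >= threshold
        if label == 1:
            c["tp" if predicted_real else "fn"] += 1
        else:
            c["fp" if predicted_real else "tn"] += 1
    rates = {}
    for stratum, c in counts.items():
        negatives = c["fp"] + c["tn"]
        positives = c["fn"] + c["tp"]
        fpr = c["fp"] / negatives if negatives else None
        fnr = c["fn"] / positives if positives else None
        rates[stratum] = (fpr, fnr)
    return rates


# With Se = Sp = 0.71 the error rate does not depend on prevalence.
for prevalence in (0.1, 0.5, 0.9):
    print(prevalence, round(misclassification_rate(0.71, 0.71, prevalence), 4))

# Toy labeled examples (hypothetical), tagged by nuclear-window membership.
records = [
    ("in_window", 1, 0.8), ("in_window", 1, 0.3),
    ("in_window", 0, 0.6), ("in_window", 0, 0.2),
    ("outside", 1, 0.9), ("outside", 1, 0.7),
    ("outside", 0, 0.1), ("outside", 0, 0.4),
]
rates = error_rates_by_stratum(records)
print(rates)
```

If the per-stratum rates differed materially on real labeled data, the independence assumption behind the probability-stratified tests would be in doubt. With only 250 labeled pairs, such per-stratum estimates would carry wide uncertainty.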
minor comments (2)
- [Abstract] The abstract supplies the AUC and sensitivity/specificity values but omits the ML algorithm type, feature set, and any class-imbalance mitigation strategy. Adding one sentence summarizing these choices would improve reproducibility and reader confidence without lengthening the abstract substantially.
- [Statistical methods] The three reported p-values (0.024, <0.0001, 0.003) are presented without explicit discussion of multiple-testing correction or family-wise error control. Even if the authors judge correction unnecessary, a brief statement would clarify the statistical procedure.
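To show what a family-wise correction would do to the values listed in the second minor comment, the sketch below applies Bonferroni and Holm adjustments. Two assumptions are made here and are not from the paper: the family consists of exactly the three values the referee lists (the abstract reports a second p < .0001, so the authors' family may be larger), and p < 0.0001 is replaced by its upper bound 0.0001, so the adjusted value for that test is itself an upper bound.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def holm(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest p is multiplied by m, the next by m - 1, and so on;
        # the running maximum keeps the adjusted values monotone.
        running_max = max(running_max, min(1.0, (m - rank) * p_values[i]))
        adjusted[i] = running_max
    return adjusted


# Values as listed in the referee comment; 0.0001 stands in for "p < 0.0001".
reported = [0.024, 0.0001, 0.003]
bonf = bonferroni(reported)
holm_adj = holm(reported)
print("Bonferroni:", [round(p, 4) for p in bonf])
print("Holm:      ", [round(p, 4) for p in holm_adj])
```

Under these assumptions the nuclear-window result (p = 0.024) becomes 0.072 with Bonferroni and stays at 0.024 with Holm, which is why an explicit statement of the procedure matters for that test in particular.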
Simulated Author's Rebuttal
We thank the referee for their careful reading and constructive comments, which have prompted us to strengthen the robustness section of the manuscript. We address each major comment below.
Point-by-point responses
- Referee: The reported sensitivity and specificity of 0.71 imply a ~29% misclassification rate. The key results (nuclear-window elevation after ML control, p=0.024; shadow deficit p<0.0001 strongest in high-probability transients, p=0.003) rest on the assumption that residual classification errors are statistically independent of observation date and shadow geometry. No diagnostic is described that stratifies error rates or false-positive rates by nuclear-window membership or shadow status; if plate defects or scanning artifacts vary systematically with era or observing conditions, the probability-weighted tests could produce spurious significance.
  Authors: We agree that residual misclassification could in principle introduce bias if error rates correlate with nuclear-window membership or shadow status, and that the current manuscript lacks an explicit diagnostic for this. The observed strengthening of both signals among the highest-probability transients is consistent with the classifier recovering genuine astrophysical differences rather than fabricating them, but this is indirect evidence. In the revised manuscript we will add a new Results subsection that (i) compares the full distribution of ML probabilities inside versus outside the nuclear window and inside versus outside the shadow, and (ii) reports a Kolmogorov-Smirnov test on these distributions. We will also tabulate the fraction of high-probability transients in each stratum. These additions will directly test the independence assumption.
  Revision: yes
- Referee: The training set of 250 expert-classified image pairs is modest relative to the 107,875-transient deployment set. While out-of-fold AUC=0.81 is reported, the manuscript does not detail inter-rater reliability among experts, potential date-dependent labeling biases (e.g., emulsion or scanning quality changes that track nuclear-test eras), or explicit checks that the learned decision boundary generalizes without introducing systematic bias correlated with the nuclear-window or shadow variables.
  Authors: The 250-pair training set is the complete expert-labeled resource available; its size is modest but yielded a usable out-of-fold AUC of 0.81. We will expand the Methods section to describe the labeling protocol, including that the single expert reviewer was blinded to nuclear-test dates and shadow geometry. We will also add a sensitivity check that recomputes AUC on training subsets drawn from different observation eras. Because only one expert performed the classifications, inter-rater reliability statistics were never collected and cannot be supplied.
  Revision: partial
- Inter-rater reliability metrics for the expert visual classifications, as only a single expert performed the labeling and no multi-rater data exist.
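The distributional check the authors propose in their first response, a Kolmogorov-Smirnov comparison of ML probabilities inside versus outside a stratum, could look roughly like the sketch below. It is a stdlib-only approximation: the p-value uses the standard asymptotic Kolmogorov series with a small-sample correction factor, which is a choice made here and not something the authors specify, and the probability lists are toy values rather than paper data. In practice `scipy.stats.ks_2samp` would be the usual tool.

```python
import math


def ks_two_sample(a, b):
    """Return (D, approximate p-value) for the two-sample KS test.

    D is the largest gap between the two empirical CDFs. The p-value is an
    asymptotic approximation and is rough for very small samples.
    """
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both samples must be non-empty")
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # Advance both CDFs past every value equal to x before comparing.
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 0.3:
        # The survival function is indistinguishable from 1 in this range.
        return d, 1.0
    series = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                 for k in range(1, 101))
    return d, min(1.0, max(0.0, 2.0 * series))


# Toy ML probabilities (hypothetical) for two strata.
inside_window = [0.62, 0.71, 0.55, 0.80, 0.67, 0.74]
outside_window = [0.41, 0.58, 0.36, 0.49, 0.63, 0.44]
d_stat, p_value = ks_two_sample(inside_window, outside_window)
print(f"D={d_stat:.3f} p~{p_value:.3f}")
```

Note that a difference in score distributions between strata is what the paper's hypothesis predicts as well, so this test alone cannot separate a genuine signal from stratum-dependent classifier error; it complements, and does not replace, error rates measured against labels within each stratum.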
Circularity Check
No significant circularity in the ML validation pipeline.
Full rationale
The paper trains an ML classifier on 250 expert-labeled image pairs (independent ground truth) and applies it to score 107,875 pre-identified transients. Statistical tests (p-values for nuclear window elevation and shadow deficit) are then performed on the probability-stratified counts. These steps do not reduce to self-definition or fitted inputs called predictions. While the transient catalog originates from the authors' prior work, the current results introduce new ML-based controls and are not forced by that prior catalog alone. No uniqueness theorems or ansatzes are smuggled in. The derivation remains self-contained against the external nuclear test dates and expert labels.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Expert visual review of 250 transient image pairs supplies accurate ground-truth labels separating real sources from plate defects.
- domain assumption: The definitions of the nuclear testing window and Earth's shadow region are unbiased and correctly implemented for the statistical tests.
Forward citations
Cited by 1 Pith paper
- Statistically Significant Linear Alignments Among High-Confidence Transient Candidates on POSS-I Photographic Plates
  Statistically significant linear alignments among high-confidence transient candidates on 1949-1957 photographic plates are detected, projecting to constant geographic longitudes with clustering near specific Earth sites.