arxiv: 2604.18799 · v2 · submitted 2026-04-20 · 🌌 astro-ph.IM

Recognition: unknown

Machine Learning Supports Existence of Previously Unrecognized Transient Astronomical Phenomena in Historical Observatory Images

Stephen Bruehl, Brian Doherty, Alina Streblyanska, Beatriz Villarroel


Pith reviewed 2026-05-10 03:40 UTC · model grok-4.3

classification 🌌 astro-ph.IM

keywords transient astronomical phenomena · historical observatory plates · machine learning classification · nuclear testing · Earth shadow deficit · photographic plate defects · pre-satellite astronomy


The pith

Machine learning filtering leaves intact the excess of transients near nuclear tests and their deficit in Earth's shadow in old plates.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether star-like transients that appear and disappear in historical observatory images taken before the launch of Sputnik are genuine sky objects or simply defects on photographic plates. Researchers trained a machine learning model on 250 image pairs that experts had visually classified as real versus artifact, then applied it to score more than 107,000 previously identified transients by probability of being real. After controlling for likely artifacts, the original statistical patterns survived and strengthened: transients appeared more often within one day of nuclear weapons tests, and they were rarer when the plates were taken in Earth's shadow. These effects were clearest for the transients the model rated as most probably real. A sympathetic reader would care because confirmation would indicate an unrecognized population of brief events in mid-20th-century skies that is not explained by plate flaws.

Core claim

After training a machine learning classifier on expert visual labels of 250 transient image pairs taken 30 minutes apart, deployment on 107,875 candidates shows that controlling for ML-identified artifacts leaves transient counts significantly elevated within the nuclear window and produces a significant shadow deficit that is largest among the highest-probability real transients.
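The "nuclear window" is a date-proximity criterion: a plate date counts as inside it if it falls within ±1 day of a nuclear test. A minimal sketch of that membership check, assuming plate and test dates have already been reduced to calendar dates in one common time zone; the paper's exact convention (UTC versus local date, exposure start versus midpoint) is not stated here, and the test dates below are placeholders rather than the authors' list:

```python
from datetime import date, timedelta


def in_nuclear_window(plate_date: date, test_dates: list, half_width_days: int = 1) -> bool:
    """Return True if plate_date lies within +/- half_width_days of any test date."""
    window = timedelta(days=half_width_days)
    # abs() of a timedelta gives the unsigned day difference between the two dates.
    return any(abs(plate_date - test) <= window for test in test_dates)


# Placeholder test dates for illustration only.
test_dates = [date(1952, 11, 1), date(1953, 3, 17)]

print(in_nuclear_window(date(1952, 11, 2), test_dates))  # True: one day after a test
print(in_nuclear_window(date(1952, 11, 4), test_dates))  # False: three days after
```

The window half-width is a parameter so that sensitivity to the ±1 day choice can be checked by rerunning with other widths.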

What carries the argument

Machine learning classifier trained on 250 expert-labeled transient image pairs, which assigns each of 107,875 transients a probability of being real rather than a plate defect.

If this is right

Transient counts remain significantly elevated for dates inside the nuclear window after controlling for ML-identified artifacts.
Transients with the highest probability of being real are more likely to occur inside the nuclear window than lower-probability ones.
The shadow deficit is significant overall and is largest in the highest-probability transients relative to lower-probability transients.
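The second item compares how often transients fall inside the nuclear window across probability bands. The paper's actual test is not named here; as a sketch under that caveat, a Pearson chi-square test of homogeneity on a 2 × 3 table (inside/outside the window × low/medium/high probability band) captures the idea. With two degrees of freedom the chi-square tail probability is exactly exp(−x/2), so no statistics library is needed. The counts are hypothetical:

```python
import math


def chi2_homogeneity_2x3(in_window, out_window):
    """Pearson chi-square for a 2 x 3 table (window status x probability band).

    in_window, out_window: counts per probability band (low, medium, high).
    Returns (statistic, p_value); for 2 degrees of freedom the chi-square
    survival function is exactly exp(-x / 2).
    """
    col_totals = [a + b for a, b in zip(in_window, out_window)]
    row_totals = [sum(in_window), sum(out_window)]
    n = sum(row_totals)
    stat = 0.0
    for row, row_total in zip((in_window, out_window), row_totals):
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / n  # expected count under homogeneity
            stat += (observed - expected) ** 2 / expected
    return stat, math.exp(-stat / 2)


# Hypothetical counts: 1,000 transients per probability band.
stat, p = chi2_homogeneity_2x3([50, 60, 90], [950, 940, 910])
print(round(stat, 2), p < 0.001)  # 13.93 True
```

This test ignores the ordering of the bands; a trend test such as Cochran–Armitage would use it and is usually more powerful when the effect rises monotonically with probability.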

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same ML validation approach could be applied to other historical plate archives to search for additional high-probability candidates.
If the high-probability transients are real, targeted modern observations timed to nuclear-test anniversaries or shadow geometry might detect similar brief events.
Cross-matching the highest-probability transients against independent catalogs or re-imaging selected fields would provide a non-visual test of their reality.
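The cross-matching extension reduces to a positional match on the sky. A brute-force sketch, assuming positions in decimal degrees (RA, Dec) and a hypothetical 5-arcsecond match radius; a real search would account for plate astrometric error and proper motion, and would use a spatial index for catalogs of this size:

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two (RA, Dec) positions in degrees (haversine form)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(min(1.0, h))))  # clamp guards rounding


def cross_match(transients, catalog, radius_arcsec=5.0):
    """For each transient, return the index of the nearest catalog source within radius, else None."""
    radius_deg = radius_arcsec / 3600.0
    matches = []
    for ra, dec in transients:
        best, best_sep = None, radius_deg
        for j, (cra, cdec) in enumerate(catalog):
            sep = angular_separation_deg(ra, dec, cra, cdec)
            if sep <= best_sep:
                best, best_sep = j, sep
        matches.append(best)
    return matches


# Hypothetical positions: the first transient has a catalog counterpart 1.8" away.
print(cross_match([(10.0, 20.0), (50.0, -30.0)], [(10.0, 20.0005), (200.0, 0.0)]))  # [0, None]
```

The haversine form is used because it stays numerically stable at the arcsecond separations relevant here, where the plain spherical law of cosines loses precision.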

Load-bearing premise

Expert visual classification of the 250 image pairs provides accurate, unbiased ground truth for training the model, and the model generalizes without systematic bias to the full set of transients.

What would settle it

Repeating the full analysis with a fresh set of expert labels, or with an independent non-ML classification method, and finding that both the nuclear-window excess and the shadow deficit lose statistical significance.

Original abstract

Transient, star-like point sources that appear and vanish over short timescales have been described in astronomical images taken prior to the launch of Sputnik. We have reported that transient numbers diminish significantly in Earth's shadow (shadow deficit) and that transients are more likely within ±1 day of nuclear testing (nuclear window). These findings remain debated, with some arguing that transients identified via existing automated pipelines are simply plate defects. Therefore, we use machine learning (ML) to enhance transient identification accuracy and validate the phenomenon. The model was trained on 250 transient image pairs taken 30 minutes apart that were classified as real versus plate defect by expert visual review; the model demonstrated good discrimination (out-of-fold AUC = 0.81; sensitivity = 0.71, specificity = 0.71). After deployment on a dataset of 107,875 previously identified transients, the model assigned each a probability of being real. After controlling for ML-identified artifacts, transient counts were significantly elevated for dates within a nuclear window (p = .024); transients with the highest probability of being real were more likely to occur within a nuclear window (p < .0001). The shadow deficit was significant (p < .0001) and largest in the highest-probability transients relative to lower-probability transients (p = .003). Results strongly support the existence of an unrecognized population of transient objects in historical astronomical plates, warranting further study.
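The discrimination figures above (AUC, sensitivity, specificity) can be recomputed from out-of-fold scores, meaning each image pair is scored by a model fit without it, together with the expert labels. The sketch below uses the Mann–Whitney form of the ROC AUC and a 0.5 decision threshold; the threshold, the classifier behind the scores, and the toy data are assumptions, since the abstract does not state them:

```python
def auc_mann_whitney(scores, labels):
    """ROC AUC as P(random real item outscores random defect); ties count 1/2.

    labels: 1 = real transient, 0 = plate defect (expert ground truth).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def sens_spec(scores, labels, threshold=0.5):
    """Sensitivity and specificity when scores >= threshold are called 'real'."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Toy out-of-fold scores for six labeled pairs (hypothetical).
scores = [0.9, 0.8, 0.4, 0.3, 0.7, 0.2]
labels = [1, 1, 1, 0, 0, 0]
print(auc_mann_whitney(scores, labels))  # 8/9, about 0.889
print(sens_spec(scores, labels))         # (2/3, 2/3)
```

The pairwise loop is quadratic, which is fine for 250 labeled pairs; for large sets a rank-based formula gives the same value faster.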

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

ML probabilities make the nuclear-window excess and shadow deficit look stronger, but the work still rests on untested assumptions about date-independent classification errors.


The main takeaway is that training a classifier on 250 expert-labeled transient pairs, and then showing that the reported signals strengthen in the high-probability subset, adds a useful internal check to the earlier claims. The out-of-fold AUC of 0.81 and the probability-stratified results (high-probability transients more likely to fall in the nuclear window, p<0.0001; a shadow deficit larger in the highest-probability transients than in lower-probability ones, p=0.003) are the genuinely new pieces here. This approach directly tackles the objection that the detections are mostly plate defects by letting the data indicate which subset behaves more like real events. It is a reasonable next step for a group that already has the catalog and the date associations in hand, and the paper executes it cleanly enough that the central pattern survives the re-weighting.

The soft spots sit mostly in the validation layer. The training set is small, the abstract gives no feature details, and there is no reported check that residual errors are uncorrelated with observation date or nuclear-test windows. With sensitivity and specificity both around 0.71, even modest date-dependent artifacts could leak into the high-probability bin and drive the statistics. That the same research program supplied both the original transients and the expert labels adds a modest circularity that reviewers will want to see addressed.

This is a paper for people already tracking archival-plate transients or working with legacy sky surveys. It is not a broad discovery but a methodological tightening of a contested result. A serious editor should send it to peer review: the ML step improves the evidence enough to merit referee time, provided the authors supply the missing error-independence tests and fuller methods.
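The concern about sensitivity and specificity near 0.71 can be made concrete. Writing π for the (unknown) fraction of candidates that are real, Se for sensitivity, and Sp for specificity, the overall error rate and the positive predictive value (the share of model-labeled "real" transients that are truly real) at a fixed threshold are:

```latex
\begin{aligned}
\Pr(\text{error}) &= \pi\,(1-\mathrm{Se}) + (1-\pi)\,(1-\mathrm{Sp}) = 0.29
  \quad (\mathrm{Se}=\mathrm{Sp}=0.71,\ \text{any } \pi),\\
\mathrm{PPV} &= \frac{\pi\,\mathrm{Se}}{\pi\,\mathrm{Se} + (1-\pi)\,(1-\mathrm{Sp})}.
\end{aligned}
```

At π = 0.5, PPV = 0.71; at π = 0.2, PPV = 0.142 / (0.142 + 0.232) ≈ 0.38. The error rate is fixed by the reported figures, but the purity of the "real" subset depends on a prevalence the abstract does not report.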

Referee Report

2 major / 2 minor

Summary. The manuscript claims that a machine learning classifier, trained on 250 expert-labeled pairs of historical astronomical images (out-of-fold AUC=0.81, sensitivity=specificity=0.71), can be deployed on 107,875 previously identified transients to separate real events from plate defects. After this filtering, transient counts remain significantly elevated within a one-day nuclear-testing window (p=0.024), high-probability transients are more likely to fall in the nuclear window (p<0.0001), and a shadow deficit is confirmed (p<0.0001) with the effect strongest among the highest-probability subset (p=0.003). These results are presented as support for an unrecognized population of transient objects in pre-Sputnik plates.

Significance. If the central statistical claims survive additional validation, the work would constitute a meaningful methodological contribution to the analysis of historical plate archives by using ML to address long-standing artifact criticisms. It would elevate the nuclear-window and shadow-deficit signals from plausible to more robustly supported, potentially motivating targeted follow-up observations or re-processing of other plate collections. The approach demonstrates how supervised classification can be integrated with astronomical time-domain statistics when ground-truth labels are available.

major comments (2)

[Results section describing ML deployment and probability-stratified tests] The reported sensitivity and specificity of 0.71 imply a ~29% misclassification rate. The key results (nuclear-window elevation after ML control, p=0.024; shadow deficit p<0.0001 strongest in high-probability transients, p=0.003) rest on the assumption that residual classification errors are statistically independent of observation date and shadow geometry. No diagnostic is described that stratifies error rates or false-positive rates by nuclear-window membership or shadow status; if plate defects or scanning artifacts vary systematically with era or observing conditions, the probability-weighted tests could produce spurious significance.
[Methods section on ML training and expert labeling] The training set of 250 expert-classified image pairs is modest relative to the 107,875-transient deployment set. While out-of-fold AUC=0.81 is reported, the manuscript does not detail inter-rater reliability among experts, potential date-dependent labeling biases (e.g., emulsion or scanning quality changes that track nuclear-test eras), or explicit checks that the learned decision boundary generalizes without introducing systematic bias correlated with the nuclear-window or shadow variables.
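The diagnostic both major comments ask for can be computed on the labeled pairs once each is tagged with its stratum (inside/outside the nuclear window, or in/out of shadow). A minimal sketch, assuming out-of-fold probabilities and a 0.5 threshold; with only 250 labeled pairs the per-stratum rates carry wide uncertainty, so this flags gross imbalances rather than proving independence. The records are hypothetical:

```python
from collections import defaultdict


def error_rates_by_stratum(records, threshold=0.5):
    """Per-stratum false-positive and false-negative rates of the classifier.

    records: iterable of (stratum, expert_label, model_prob), with expert_label
    1 = real, 0 = plate defect. Returns {stratum: (fpr, fnr)}, where
    fpr = defects scored >= threshold / all defects in the stratum and
    fnr = real transients scored < threshold / all real transients in the stratum.
    """
    counts = defaultdict(lambda: {"fp": 0, "neg": 0, "fn": 0, "pos": 0})
    for stratum, label, prob in records:
        c = counts[stratum]
        if label == 1:
            c["pos"] += 1
            c["fn"] += int(prob < threshold)
        else:
            c["neg"] += 1
            c["fp"] += int(prob >= threshold)
    nan = float("nan")  # rate undefined when a stratum has no items of that class
    return {
        s: (c["fp"] / c["neg"] if c["neg"] else nan,
            c["fn"] / c["pos"] if c["pos"] else nan)
        for s, c in counts.items()
    }


# Hypothetical labeled pairs tagged by nuclear-window membership.
records = [
    ("in", 0, 0.8), ("in", 0, 0.2), ("in", 1, 0.9),
    ("out", 0, 0.1), ("out", 0, 0.3), ("out", 1, 0.4), ("out", 1, 0.6),
]
print(error_rates_by_stratum(records))  # {'in': (0.5, 0.0), 'out': (0.0, 0.5)}
```

A false-positive rate that is markedly higher inside the window than outside it is the pattern the referee is worried about.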

minor comments (2)

[Abstract] The abstract supplies the AUC and sensitivity/specificity values but omits the ML algorithm type, feature set, and any class-imbalance mitigation strategy. Adding one sentence summarizing these choices would improve reproducibility and reader confidence without lengthening the abstract substantially.
[Statistical methods] The four reported p-values (0.024; <0.0001 for the nuclear-window probability comparison; <0.0001 for the overall shadow deficit; 0.003) are presented without explicit discussion of multiple-testing correction or family-wise error control. Even if the authors judge correction unnecessary, a brief statement would clarify the statistical procedure.
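One way to address the multiple-testing comment, assuming the four p-values in the abstract (0.024, <0.0001, <0.0001, 0.003) are treated as a single family (itself a judgment call): Holm's step-down procedure controls the family-wise error rate and is never less powerful than plain Bonferroni. The value 0.0001 stands in for each "<0.0001"; since that is an upper bound, the adjusted values are conservative. Under these assumptions Holm leaves all four below 0.05 (adjusted 0.024, 0.0004, 0.0004, 0.006), whereas Bonferroni would raise 0.024 to 0.096:

```python
def bonferroni(pvals):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def holm(pvals):
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # Multiply the k-th smallest p by (m - k + 1) and enforce monotonicity.
        running_max = max(running_max, (m - rank) * pvals[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted


# Nuclear-window counts, high-probability in window, shadow deficit, deficit by probability.
reported = [0.024, 0.0001, 0.0001, 0.003]
print(holm(reported))
print(bonferroni(reported))
```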

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for their careful reading and constructive comments, which have prompted us to strengthen the robustness section of the manuscript. We address each major comment below.


Referee: The reported sensitivity and specificity of 0.71 imply a ~29% misclassification rate. The key results (nuclear-window elevation after ML control, p=0.024; shadow deficit p<0.0001 strongest in high-probability transients, p=0.003) rest on the assumption that residual classification errors are statistically independent of observation date and shadow geometry. No diagnostic is described that stratifies error rates or false-positive rates by nuclear-window membership or shadow status; if plate defects or scanning artifacts vary systematically with era or observing conditions, the probability-weighted tests could produce spurious significance.

Authors: We agree that residual misclassification could in principle introduce bias if error rates correlate with nuclear-window membership or shadow status, and that the current manuscript lacks an explicit diagnostic for this. The observed strengthening of both signals among the highest-probability transients is consistent with the classifier recovering genuine astrophysical differences rather than fabricating them, but this is indirect evidence. In the revised manuscript we will add a new Results subsection that (i) compares the full distribution of ML probabilities inside versus outside the nuclear window and inside versus outside the shadow, and (ii) reports a Kolmogorov-Smirnov test on these distributions. We will also tabulate the fraction of high-probability transients in each stratum. These additions will directly test the independence assumption. revision: yes
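The proposed comparison of ML-probability distributions inside versus outside the nuclear window (or the shadow) is a two-sample Kolmogorov–Smirnov test. A stdlib-only sketch of the D statistic with its large-sample (asymptotic) p-value; for production use, `scipy.stats.ks_2samp` offers exact and asymptotic modes. Any sample values passed in would be hypothetical here:

```python
import math
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample KS statistic D = max |F_a(x) - F_b(x)| over the pooled sample points."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for x in a + b:
        fa = bisect_right(a, x) / len(a)  # empirical CDF of sample a at x
        fb = bisect_right(b, x) / len(b)  # empirical CDF of sample b at x
        d = max(d, abs(fa - fb))
    return d


def ks_pvalue_asymptotic(d, n, m, terms=100):
    """Two-sided p-value from the Kolmogorov limiting distribution (valid for large n, m)."""
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.2:
        return 1.0  # the series converges poorly here and the true value is ~1
    total = sum((-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam) for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2 * total))


# Usage: probabilities for transients inside vs. outside the window (hypothetical).
inside = [0.72, 0.81, 0.64, 0.90, 0.77]
outside = [0.41, 0.55, 0.38, 0.62, 0.47]
d = ks_statistic(inside, outside)
print(d, ks_pvalue_asymptotic(d, len(inside), len(outside)))
```

With five values per group the asymptotic p-value is only indicative; at the paper's sample sizes (tens of thousands per stratum) the approximation is appropriate.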
Referee: The training set of 250 expert-classified image pairs is modest relative to the 107,875-transient deployment set. While out-of-fold AUC=0.81 is reported, the manuscript does not detail inter-rater reliability among experts, potential date-dependent labeling biases (e.g., emulsion or scanning quality changes that track nuclear-test eras), or explicit checks that the learned decision boundary generalizes without introducing systematic bias correlated with the nuclear-window or shadow variables.

Authors: The 250-pair training set is the complete expert-labeled resource available; its size is modest but yielded a usable out-of-fold AUC of 0.81. We will expand the Methods section to describe the labeling protocol, including that the single expert reviewer was blinded to nuclear-test dates and shadow geometry. We will also add a sensitivity check that recomputes AUC on training subsets drawn from different observation eras. Because only one expert performed the classifications, inter-rater reliability statistics were never collected and cannot be supplied. revision: partial

Standing simulated objections (not resolved)

Inter-rater reliability metrics for the expert visual classifications, as only a single expert performed the labeling and no multi-rater data exist.
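If a second expert were ever to relabel all or part of the 250 pairs, the missing agreement statistic would typically be Cohen's kappa, which corrects raw agreement for agreement expected by chance. A minimal sketch with hypothetical labels (1 = real, 0 = plate defect):

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items."""
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal proportions per category.
    expected = sum((rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories)
    if expected == 1.0:
        return float("nan")  # undefined when both raters use a single identical label
    return (observed - expected) / (1 - expected)


# Hypothetical labels for eight pairs: raw agreement 0.75, chance agreement 0.5.
a = [1, 1, 0, 0, 1, 0, 1, 0]
b = [1, 1, 0, 0, 1, 0, 0, 1]
print(cohens_kappa(a, b))  # 0.5
```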

Circularity Check

0 steps flagged

No significant circularity in the ML validation pipeline.

full rationale

The paper trains an ML classifier on 250 expert-labeled image pairs (independent ground truth) and applies it to score 107,875 pre-identified transients. Statistical tests (p-values for nuclear window elevation and shadow deficit) are then performed on the probability-stratified counts. These steps do not reduce to self-definition or fitted inputs called predictions. While the transient catalog originates from the authors' prior work, the current results introduce new ML-based controls and are not forced by that prior catalog alone. No uniqueness theorems or ansatzes are smuggled in. The derivation remains self-contained against the external nuclear test dates and expert labels.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim depends on the reliability of expert visual labels as ground truth and on the appropriateness of the temporal and geometric windows used for statistical testing. No explicit numerical free parameters are stated beyond those implicit in model training. No new physical entities are postulated.

axioms (2)

domain assumption: Expert visual review of 250 transient image pairs supplies accurate ground-truth labels separating real sources from plate defects.
These labels are used to train and evaluate the machine learning model.
domain assumption: The definitions of the nuclear testing window and Earth's shadow region are unbiased and correctly implemented for the statistical tests.
These windows are central to the reported p-values for count elevation and deficit.
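The second axiom concerns how the shadow region enters the deficit test. The paper's computation is not given here; one simple version compares the observed number of transients in shadow with the number expected from the fraction of exposure (or searched area) that lay in shadow. The sketch below is a one-sided binomial test with a normal approximation, adequate for counts in the tens of thousands. `expected_fraction` is a hypothetical input, and treating transients as independent ignores clustering of several transients on one plate, which would overstate significance:

```python
import math


def shadow_deficit_test(n_total, n_in_shadow, expected_fraction):
    """One-sided test that fewer transients fall in shadow than expected.

    Normal approximation to Binomial(n_total, expected_fraction).
    Returns (observed/expected ratio, z score, lower-tail p-value).
    """
    expected = n_total * expected_fraction
    sd = math.sqrt(n_total * expected_fraction * (1 - expected_fraction))
    z = (n_in_shadow - expected) / sd
    p_lower = 0.5 * math.erfc(-z / math.sqrt(2))  # P(Z <= z) for a standard normal
    return n_in_shadow / expected, z, p_lower


# Hypothetical: 10,000 transients, 10% expected in shadow, 800 observed there.
ratio, z, p = shadow_deficit_test(10_000, 800, 0.10)
print(ratio, round(z, 2), p < 1e-9)  # 0.8 -6.67 True
```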

pith-pipeline@v0.9.0 · 5560 in / 1574 out tokens · 44829 ms · 2026-05-10T03:40:29.002436+00:00 · methodology


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Statistically Significant Linear Alignments Among High-Confidence Transient Candidates on POSS-I Photographic Plates
astro-ph.IM · 2026-05 · unverdicted · novelty 6.0

Statistically significant linear alignments among high-confidence transient candidates on 1949-1957 photographic plates are detected, projecting to constant geographic longitudes with clustering near specific Earth sites.

Reference graph

Works this paper leans on

19 extracted references · 7 canonical work pages · cited by 1 Pith paper · 2 internal anchors

[1] Bruehl, S., Villarroel, B. Transients in the Palomar Observatory Sky Survey (POSS-I) may be associated with nuclear testing and reports of unidentified anomalous phenomena. Sci Rep 2025; 15: 34125
[2] Busko, I. Searching for Fast Astronomical Transients in Archival Photographic Plates. 2026. Preprint at: https://arxiv.org/abs/2603.20407
[3] Cann, K. Geomagnetic storm suppression of photographic plate transient detections in the POSS-I archive: an independent physical variable strengthening the nuclear test correlation. 2026. Preprint at: https://arxiv.org/abs/2604.04950
[4] Cann, K. Plate sensitivity is invariant across geomagnetic storm intensity at two independent observatories: Ruling out the airglow artifact and confirming source-specific transient suppression. 2025. Preprint at: https://doi.org/10.22541/essoar.15002100/v1
[5] Doherty, B. Independent Replication of Nuclear Test-Transient Correlations and Earth Shadow Deficit in POSS-I Photographic Plates. 2026. Preprint at: https://arxiv.org/abs/2604.00056v1
[6] Hambly, N. C., Blair, A. On the nature of apparent transient sources on the National Geographic Society–Palomar Observatory Sky Survey glass copy plates. RAS Techniques and Instruments 2024; 3: 73-79
[7] Hastings, R. UFOs & nukes: extraordinary encounters at nuclear weapons sites (2nd Edition). 2017. Self-Published
[8] Solano, E. et al. A bright triple transient that vanished within 50 min. Monthly Notices Royal Astron Soc 2024; 527: 6312–6320
[9] Solano, E., Villarroel, B., & Rodrigo, C. Discovering vanishing objects in POSS I red images using the Virtual Observatory. Monthly Notices Royal Astron Soc 2022; 515: 1380–1391
[10] Spitzer, R. L., Williams, J. B. Having a dream. A research strategy for DSM-IV. Arch Gen Psychiatry 1988; 45: 871-874
[11] Villarroel, B. et al. The Vanishing and Appearing Sources during a Century of Observations Project. I. USNO Objects Missing in Modern Sky Surveys and Follow-up Observations of a “Missing Star”. Astronom J 2020; 159: 8
[12] Villarroel, B. et al. Exploring nine simultaneously occurring transients on April 12th 1950. Sci Rep 2021; 11: 12794
[13] Villarroel, B. et al. A glint in the eye: Photographic plate archive searches for non-terrestrial artefacts. Acta Astronautica 2022; 194: 106-113
[14] Villarroel, B. et al. Aligned, Multiple-Transient Events in the First Palomar Sky Survey. PASP 2025; 137: 104504
[15] Villarroel, B., Solano, E., & Marcy, G. W. On the Image Profiles of Transients in the Palomar Sky Survey. 2025. Preprint at: https://arxiv.org/abs/2507.15896
[16] Villarroel, B., Streblyanska, A., Bruehl, S., Geier, S. A Response to Watters et al. (2026)
[17] Preprint at: https://arxiv.org/pdf/2602.15171
[18] Watters, C. Z., Domine, L., Little, S., Pratt, C., Knuth, K. H. Critical Evaluation of Studies Alleging Evidence for Technosignatures in the POSS1-E Photographic Plates. 2026. Preprint at: https://arxiv.org/abs/2601.21946
[19] Widiger, T. A., Frances, A. J., Pincus, H. A., Davis, W. W., & First, M. B. Toward an empirical classification for the DSM-IV. J Abnorm Psychol 1991; 100: 280-288

Acknowledgments

B.V. is funded by the Swedish Research Council (Vetenskapsrådet, grant no. 2024-04708) and supported by a generous donor. A.S. is supported by the Athanatos Foundation. The authors would l...