
arxiv: 2604.04714 · v2 · submitted 2026-04-06 · 🌌 astro-ph.IM

Enhancing astrometric registration of Chinese historical Astronomical Digital Plates with deep learning

Pith reviewed 2026-05-10 19:10 UTC · model grok-4.3

classification 🌌 astro-ph.IM
keywords historical astronomical plates · astrometric registration · deep learning · Transformer model · source classification · digitized archives · Gaia catalog · time-domain astronomy

The pith

A Transformer model identifies reliable sources on degraded plates, enabling astrometric registration of 1353 of the 1883 historical plates that had previously failed.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Chinese astronomical plates collected since 1900 have been digitized but many resist standard astrometric registration because source extraction produces too many unreliable detections from storage damage and scanner artifacts. The authors trained a Transformer-based classifier on cutouts of sources from plates that already succeeded with conventional matching. The model uses multi-scale feature fusion to label which detections are trustworthy stellar sources. When run on 1883 plates that had previously failed, it supplied clean enough source lists to complete registration for 1353 of them against the Gaia catalog. This step directly increases the number of usable plates for long-term studies of stellar positions and variability.

Core claim

The central claim is that a Transformer-based classification model with multi-scale feature fusion, trained exclusively on successfully registered plates, can identify trustworthy stellar sources among SExtractor detections on degraded historical plates. When this classifier was applied to the source lists from 1883 plates that had failed prior astrometric matching with Astrometry.net and Gaia, it produced input catalogs that allowed successful registration for 1353 plates. The approach thereby converts a large fraction of previously unusable digitized plates into scientifically usable data for time-domain astronomy.
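The filtering step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the `Detection` record, the `star_prob` field, the 0.5 threshold, and the brightest-first ordering are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One extracted source (hypothetical record layout)."""
    x: float          # pixel column
    y: float          # pixel row
    flux: float       # brightness measure from the extraction step
    star_prob: float  # hypothetical classifier score in [0, 1]


def select_trustworthy(detections, threshold=0.5, max_sources=None):
    """Keep detections the classifier scores at or above `threshold`,
    ordered brightest first, optionally truncated to `max_sources`."""
    kept = [d for d in detections if d.star_prob >= threshold]
    kept.sort(key=lambda d: d.flux, reverse=True)
    return kept if max_sources is None else kept[:max_sources]


def to_xy_rows(detections):
    """Flatten to (x, y, flux) rows, a form a plate solver could ingest."""
    return [(d.x, d.y, d.flux) for d in detections]
```

The point of the sketch is only that the classifier acts as a filter between extraction and matching; the matching step itself is unchanged.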

What carries the argument

Transformer-based classification model with multi-scale feature fusion that labels cutouts of SExtractor-detected sources as trustworthy stellar objects suitable for Gaia matching.
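To make "cutouts of SExtractor-detected sources" concrete, here is a hedged sketch of how fixed-size stamps might be cut around detection coordinates. The 32-pixel stamp size, the median subtraction, and the peak normalisation are assumptions for illustration; the paper's actual preprocessing is not given in the text above.

```python
import numpy as np


def extract_cutouts(image, xs, ys, size=32):
    """Cut square stamps centred on each detection.

    xs, ys are 0-based pixel coordinates (x = column, y = row). SExtractor's
    X_IMAGE/Y_IMAGE follow the FITS convention (first pixel centred at 1),
    so the caller would subtract 1 before passing them in.
    Returns (stack of stamps, indices of detections that were kept).
    """
    half = size // 2
    cutouts, kept = [], []
    for i, (x, y) in enumerate(zip(xs, ys)):
        col, row = int(round(x)), int(round(y))
        r0, c0 = row - half, col - half
        r1, c1 = r0 + size, c0 + size
        # Skip detections whose stamp would cross the plate edge.
        if r0 < 0 or c0 < 0 or r1 > image.shape[0] or c1 > image.shape[1]:
            continue
        stamp = image[r0:r1, c0:c1].astype(np.float32)
        stamp = stamp - np.median(stamp)   # crude local background removal
        peak = np.abs(stamp).max()
        if peak > 0:
            stamp = stamp / peak           # scale to a peak magnitude of 1
        cutouts.append(stamp)
        kept.append(i)
    if not cutouts:
        return np.empty((0, size, size), dtype=np.float32), kept
    return np.stack(cutouts), kept
```

Returning the kept indices lets the classifier's labels be mapped back to rows of the original detection catalog.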

If this is right

  • A larger fraction of the Chinese historical plate collection becomes available for century-scale astrometric and photometric studies.
  • Automated source filtering reduces the fraction of plates requiring manual intervention in processing pipelines.
  • The same classification step can be inserted into other plate archives that use SExtractor and Astrometry.net.
  • Improved source lists raise the yield of successful Gaia-based solutions across the entire digitized set.
  • Longer observational baselines from the newly registered plates support better measurements of proper motions and long-period variables.
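The last point follows from standard two-epoch error propagation: for independent position errors, the proper-motion uncertainty is the quadrature sum of the two position errors divided by the time baseline. The sketch below uses purely hypothetical numbers (200 mas for a plate epoch, 1 mas for a modern epoch); none of them come from the paper.

```python
import math


def proper_motion_sigma(sigma_old_mas, sigma_new_mas, baseline_yr):
    """Two-epoch proper-motion uncertainty in mas/yr, assuming
    independent position errors at the two epochs."""
    return math.hypot(sigma_old_mas, sigma_new_mas) / baseline_yr


# Hypothetical values, chosen only to show the 1/baseline scaling.
sigma_20yr = proper_motion_sigma(200.0, 1.0, 20.0)
sigma_100yr = proper_motion_sigma(200.0, 1.0, 100.0)
print(f"20 yr baseline:  {sigma_20yr:.2f} mas/yr")
print(f"100 yr baseline: {sigma_100yr:.2f} mas/yr")
```

With the position errors held fixed, a baseline five times longer gives an uncertainty five times smaller.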

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same trained classifier could be tested on plate collections from other countries that share similar degradation and scanning issues.
  • Retraining or fine-tuning the model on a small set of newly verified plates from different eras might raise the success rate further.
  • Pairing the classifier with modern source-detection networks instead of SExtractor could reduce the initial failure rate before classification is even applied.
  • The recovered plates open the possibility of new searches for rare long-term transients or solar-system objects across more than a century of observations.

Load-bearing premise

The visual and feature distribution of sources on the previously failed plates is similar enough to the successful training plates that the classifier transfers without retraining or adaptation.
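One simple way to probe this premise, offered here as an illustration rather than anything the paper reports, is to compare the distribution of a per-source feature (for example a shape or brightness measure) between successful and failed plates with a two-sample Kolmogorov-Smirnov statistic. The choice of feature and any decision threshold are left to the reader.

```python
import numpy as np


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the empirical CDFs of samples a and b (0 = identical, 1 = disjoint)."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    grid = np.concatenate([a, b])
    # Fraction of each sample that is <= each grid value.
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))
```

A value near 0 for every feature checked would be consistent with the premise; a value near 1 for some feature would indicate that the failed plates occupy a region the training set barely covers.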

What would settle it

The claim would be undermined if, on a new set of failed plates, manually verified classifier selections yielded a registration success rate well below the roughly 72 percent reported here (1353 of 1883), or if the resulting solutions showed systematic position residuals larger than the Gaia uncertainties.
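The 72 percent figure is the ratio 1353/1883. To judge whether a rate measured on a new plate set is "well below" it, a confidence interval helps; the sketch below uses a Wilson score interval, which is a choice made for this example and not something the paper specifies.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion
    (z = 1.96 corresponds to roughly 95 percent coverage)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return centre - half, centre + half


rate = 1353 / 1883
low, high = wilson_interval(1353, 1883)
print(f"success rate {rate:.3f}, approx. 95% interval ({low:.3f}, {high:.3f})")
```

A success rate on new plates that falls clearly outside this interval on the low side would be evidence against transfer, subject to the new set being comparable in difficulty.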

Figures

Figures reproduced from arXiv: 2604.04714 by Hao Luo, Jianhai Zhao, Jing Yang, Meiting Yang, Quanfeng Xu, Shiyin Shen, Yong Yu, Zhenghong Tang, Zhengjun Shang.

Figure 1: An overview of the astrometric registration workflow for Chinese historical nighttime astronomical…
Figure 2: An overview of Swin Transformer methods includes the structural details.
Figure 3: Distribution of Plate Count (a) and Stellar Fraction (b) by Source Number Plates. (Boxes repre…
Figure 4: Representative defects in photographic plate data, categorized by storage (top row), and include…
Figure 5: Representative Examples of Unclassifiable Stars from Unmatched Plates.
Original abstract

China has systematically collected nighttime astronomical plates since 1900, creating a large historical dataset that has been digitized with optical scanners. For astrometric registration of these digitized plates, sources were first extracted using SExtractor, and then matched astrometrically with Astrometry.net and the Gaia catalog. However, suboptimal early storage conditions and subsequent environmental deterioration have impeded accurate source matching, resulting in processing failures for several thousand digitized plates. In this work, we introduce a Transformer-based classification model that takes cutouts of SExtractor-detected sources as input and leverages multi-scale feature fusion to identify trustworthy stellar sources on the plates. Trained on plates with successful astrometric calibration, our AI-based classifier was then applied to SExtractor detected sources of 1883 digitized plates, enabling us to complete the astrometric registration for 1353 of them. This AI-augmented pipeline streamlines the processing of historical plate archives and enhances their scientific value for long-term time-domain astronomical studies.
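As a generic illustration of what "multi-scale feature fusion" means, the sketch below summarises a cutout at several resolutions and concatenates the results into one feature vector. It is deliberately simple and is not the paper's Swin Transformer architecture; the pooling factors and the summary statistics are assumptions.

```python
import numpy as np


def avg_pool(stamp, factor):
    """Downsample a 2-D array by averaging non-overlapping factor x factor blocks."""
    h, w = stamp.shape
    h2, w2 = h // factor, w // factor
    trimmed = stamp[:h2 * factor, :w2 * factor]
    return trimmed.reshape(h2, factor, w2, factor).mean(axis=(1, 3))


def fuse_multiscale(stamp, factors=(1, 2, 4)):
    """Concatenate (mean, std, max) computed at each pooling scale."""
    feats = []
    for f in factors:
        pooled = avg_pool(stamp, f)
        feats.extend([pooled.mean(), pooled.std(), pooled.max()])
    return np.array(feats, dtype=np.float32)
```

Fine scales preserve the sharp core of a stellar profile while coarse scales capture extended structure such as scratches or emulsion damage, which is the intuition behind fusing them.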

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces a Transformer-based classifier with multi-scale feature fusion that takes cutouts of SExtractor-detected sources as input. The model is trained on plates with successful astrometric registration using Astrometry.net and Gaia, then applied to SExtractor sources from 1883 previously failed digitized Chinese historical plates, enabling successful registration for 1353 of them.

Significance. If the reported generalization holds, the work offers a practical method to recover astrometric data from deteriorated historical plates, increasing the value of large archival datasets for long-term time-domain studies. The approach combines established source extraction with modern deep learning and demonstrates application to a real, large-scale problem in astroinformatics.

major comments (2)
  1. [Abstract] Abstract: The central claim, that the classifier enabled the authors 'to complete the astrometric registration for 1353 of them', is presented without any quantitative performance metrics (precision, recall, F1, or success rate on the target plates), error analysis, or comparison to baselines such as simple magnitude cuts or non-Transformer classifiers. This leaves the reported count unsupported by evidence in the provided text.
  2. [Abstract] Abstract and methods description: The model is trained exclusively on cutouts from successfully registered plates and applied directly to sources from failed plates, yet no held-out validation on failed plates, cross-validation details, or domain-adaptation steps are described. The assumption that source-feature distributions (including deterioration effects) are sufficiently similar for reliable generalization is therefore untested and load-bearing for the reported completion count.
minor comments (1)
  1. [Abstract] Abstract: The phrase 'our AI-based classifier was then applied' would benefit from a brief statement of the input size (number of sources per plate) or training-set size to allow readers to assess scale.
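The metrics and the magnitude-cut baseline requested in the major comments can be sketched in a few lines. The labels and magnitudes below are toy values invented for the example; they are not data from the paper.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for boolean labels (True = trustworthy star)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def magnitude_cut(mags, limit):
    """Baseline: accept every source at least as bright as `limit`
    (smaller magnitude means brighter)."""
    return [m <= limit for m in mags]


# Toy data, invented for illustration only.
y_true = [True, True, False, False, True]
mags = [10.0, 12.0, 11.0, 15.0, 16.0]
baseline_pred = magnitude_cut(mags, limit=12.5)
print(precision_recall_f1(y_true, baseline_pred))
```

Reporting the same three numbers for the classifier and for the baseline on identical held-out sources would make the comparison the referee asks for.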

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and the recommendation for major revision. We will revise the manuscript to strengthen the abstract and methods sections with the requested quantitative metrics, validation details, and clarifications on generalization, while preserving the core contribution of the work.

  1. Referee: [Abstract] Abstract: The central claim, that the classifier enabled the authors 'to complete the astrometric registration for 1353 of them', is presented without any quantitative performance metrics (precision, recall, F1, or success rate on the target plates), error analysis, or comparison to baselines such as simple magnitude cuts or non-Transformer classifiers. This leaves the reported count unsupported by evidence in the provided text.

    Authors: We agree that the abstract would benefit from supporting metrics. In the revised version we will report cross-validation precision, recall, and F1 scores obtained on held-out successful plates, include a brief error analysis, and add a comparison against simple baselines (magnitude cuts and a non-Transformer CNN). We will also state the effective success rate (1353/1883 plates) as the primary outcome metric while making clear that these figures are derived from the downstream registration success after source filtering. revision: yes

  2. Referee: [Abstract] Abstract and methods description: The model is trained exclusively on cutouts from successfully registered plates and applied directly to sources from failed plates, yet no held-out validation on failed plates, cross-validation details, or domain-adaptation steps are described. The assumption that source-feature distributions (including deterioration effects) are sufficiently similar for reliable generalization is therefore untested and load-bearing for the reported completion count.

    Authors: We will expand the methods section to detail the cross-validation protocol used on the successful-plate training set. Direct held-out validation on the failed plates is not possible because ground-truth labels for trustworthy sources do not exist for those plates. We will therefore clarify that generalization is supported indirectly by the empirical outcome: after applying the classifier, Astrometry.net succeeded on 1353 of the 1883 plates. We will also discuss the multi-scale feature fusion as an implicit robustness mechanism and note the lack of explicit domain-adaptation techniques as a limitation to be addressed in future work. revision: partial
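The cross-validation protocol is not described in the text above. One reasonable design, shown here as an assumption rather than the authors' method, is to split by plate so that cutouts from the same plate never appear on both sides of a fold, which would otherwise inflate validation scores.

```python
from collections import defaultdict


def plate_grouped_folds(plate_ids, n_folds=5):
    """Yield (train_indices, val_indices) pairs in which every plate
    contributes to exactly one side of each fold.

    plate_ids[i] is the (hypothetical) plate identifier of cutout i.
    """
    by_plate = defaultdict(list)
    for idx, pid in enumerate(plate_ids):
        by_plate[pid].append(idx)
    plates = sorted(by_plate)
    if n_folds > len(plates):
        raise ValueError("need at least one plate per fold")
    for k in range(n_folds):
        val_plates = plates[k::n_folds]  # every n_folds-th plate goes to validation
        val_set = set(val_plates)
        val = [i for p in val_plates for i in by_plate[p]]
        train = [i for p in plates if p not in val_set for i in by_plate[p]]
        yield train, val
```

Splitting by plate still validates only on successful plates, so it addresses leakage but not the domain-shift concern raised by the referee.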

Circularity Check

0 steps flagged

No significant circularity detected


The paper presents an empirical ML application: a Transformer classifier is trained on source cutouts from successfully astrometrically registered plates and then applied to SExtractor detections on a separate set of 1883 failed plates, enabling registration for 1353 of them. No equations, parameter fits, self-definitions, or load-bearing self-citations are present that would reduce any claimed result to its own inputs by construction. The pipeline is self-contained against external benchmarks (successful vs. failed plates) with no renaming of known results or ansatz smuggling.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim depends on the generalization of a supervised classifier across plate quality domains and on the assumption that SExtractor detections contain sufficient distinguishing features for the model.

axioms (1)
  • domain assumption: The classifier trained on successful plates generalizes to the failed plates without significant domain shift.
    Required for the reported success on 1353 plates but not justified or tested in the abstract.

pith-pipeline@v0.9.0 · 5491 in / 1290 out tokens · 73415 ms · 2026-05-10T19:10:45.857374+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

    Relation between the paper passage and the cited Recognition theorem.

    We adopt the Swin Transformer backbone ... multi-scale feature fusion to identify trustworthy stellar sources ... Trained on plates with successful astrometric calibration, our AI-based classifier was then applied to SExtractor detected sources of 1883 digitized plates, enabling us to complete the astrometric registration for 1353 of them.

  • IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean reality_from_one_distinction unclear

    Relation between the paper passage and the cited Recognition theorem.

    The astrometric registration of the digitized plates consists of three main steps: source extraction, stellar source classification, and astrometric matching.

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

25 extracted references · 25 canonical work pages · 1 internal anchor

  4. [4]

    Bertin E., & Arnouts S., 1996, A&AS, 117, 393

  5. [5]

    Dosovitskiy A., Beyer L., Kolesnikov A., et al., 2021, in International Conference on Learning Representations (ICLR)

  6. [6]

    Enke H., Tuvikene T., Groote D., Edelmann H., & Heber U., 2024, A&A, 687, A165

  7. [7]

    Fortson L., Masters K., Nichol R., et al., 2012, Advances in machine learning and data mining for astronomy, 2012, 213

  8. [8]

    Gaia Collaboration, Brown A., Vallenari A., et al., 2018, A&A, 616, A1

  9. [9]

    Gaia Collaboration, Vallenari A., Brown A., et al., 2023, A&A, 674, A1

  10. [10]

    Grindlay J., Tang S., Los E., & Servillat M., 2011, Proceedings of the International Astronomical Union, 7, 29–34

  11. [11]

    Hambly N., MacGillivray H., Read M., et al., 2001, MNRAS, 326, 1279

  12. [12]

    Hambly N., Irwin M., & MacGillivray H., 2001b, MNRAS, 326, 1295

  13. [13]

    Hearst M. A., Dumais S. T., Osuna E., Platt J., & Scholkopf B., 1998, IEEE Intelligent Systems and their applications, 13, 18

  14. [14]

    Adam: A Method for Stochastic Optimization

    Kingma D. P., & Ba J., 2014, preprint (arXiv:1412.6980)

  15. [15]

    Lang D., Hogg D. W., Mierle K., Blanton M., & Roweis S., 2010, AJ, 139, 1782

  16. [16]

    Liu Z., Lin Y., Cao Y., et al., 2021, in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)

  17. [17]

    Ma M., Yuan H., Xiao K., et al., 2025, ApJS, 280, 18

  18. [18]

    Magnier E. A., Chambers K. C., Flewelling H. A., et al., 2020, ApJS, 251, 3

  19. [19]

    Shang Z., Yu Y., Wang L., et al., 2024, RAA, 24, 055010

  20. [20]

    Simcoe R., Grindlay J. E., Los E., et al., 2006, in Applications of Digital Image Processing XXIX. 338–349

  21. [21]

    Slater C. T., Ivezić Ž., & Lupton R. H., 2020, AJ, 159, 65

  22. [22]

    Walmsley M., Smith L., Lintott C., et al., 2020, MNRAS, 491, 1554

  23. [23]

    Xu Q., Shen S., de Souza R., et al., 2023, MNRAS, 526, 6391

  24. [24]

    Yu Y., Zhao J., Tang Z., & Shang Z., 2017, RAA, 17, 28

  25. [25]

    Ye R., Shen S., de Souza R., et al., 2025, MNRAS, 537, 640