Enhancing astrometric registration of Chinese historical Astronomical Digital Plates with deep learning

Hao Luo; Jianhai Zhao; Jing Yang; Meiting Yang; Quanfeng Xu; Shiyin Shen; Yong Yu; Zhenghong Tang; Zhengjun Shang

arxiv: 2604.04714 · v2 · submitted 2026-04-06 · 🌌 astro-ph.IM


Pith reviewed 2026-05-10 19:10 UTC · model grok-4.3

classification 🌌 astro-ph.IM

keywords historical astronomical plates · astrometric registration · deep learning · Transformer model · source classification · digitized archives · Gaia catalog · time-domain astronomy


The pith

A Transformer model classifies reliable sources on degraded plates, enabling astrometric registration of 1353 historical plates that had previously failed.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Chinese astronomical plates collected since 1900 have been digitized, but many resist standard astrometric registration because source extraction produces too many unreliable detections from storage damage and scanner artifacts. The authors trained a Transformer-based classifier on cutouts of sources from plates that had already been registered successfully with conventional matching. The model uses multi-scale feature fusion to label which detections are trustworthy stellar sources. When run on 1883 plates that had previously failed, it supplied source lists clean enough to complete registration against the Gaia catalog for 1353 of them (about 72 percent). This step directly increases the number of usable plates for long-term studies of stellar positions and variability.

Core claim

The central claim is that a Transformer-based classification model with multi-scale feature fusion, trained exclusively on successfully registered plates, can identify trustworthy stellar sources among SExtractor detections on degraded historical plates. When this classifier was applied to the source lists from 1883 plates that had failed prior astrometric matching with Astrometry.net and Gaia, it produced input catalogs that allowed successful registration for 1353 plates. The approach thereby converts a large fraction of previously unusable digitized plates into scientifically usable data for time-domain astronomy.

What carries the argument

Transformer-based classification model with multi-scale feature fusion that labels cutouts of SExtractor-detected sources as trustworthy stellar objects suitable for Gaia matching.
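A toy sketch can make "multi-scale feature fusion" more concrete. It is not the authors' Swin Transformer. It assumes a square cutout stored as a list of rows and computes a few hypothetical summary features (mean, peak, central contrast) at successively halved resolutions. It then concatenates them into one vector that a downstream classifier could consume. In the real model the per-scale features are presumably learned feature maps from different network stages, not hand-picked statistics.

```python
# Toy multi-scale feature fusion for a square source cutout (illustrative only).
# Assumption: this is NOT the authors' Swin-based model; the hand-picked features
# below stand in for learned feature maps.
from typing import List

Image = List[List[float]]


def avg_pool2(img: Image) -> Image:
    """Halve the resolution with 2x2 average pooling (an odd last row/column is dropped)."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [
        [
            (img[2 * i][2 * j] + img[2 * i][2 * j + 1]
             + img[2 * i + 1][2 * j] + img[2 * i + 1][2 * j + 1]) / 4.0
            for j in range(w)
        ]
        for i in range(h)
    ]


def scale_features(img: Image) -> List[float]:
    """Hypothetical per-scale summary: mean level, peak value, centre-minus-mean contrast."""
    flat = [v for row in img for v in row]
    mean = sum(flat) / len(flat)
    centre = img[len(img) // 2][len(img[0]) // 2]
    return [mean, max(flat), centre - mean]


def multiscale_fusion(img: Image, n_scales: int = 3) -> List[float]:
    """Concatenate ("fuse") per-scale features from successively halved resolutions."""
    fused: List[float] = []
    current = img
    for _ in range(n_scales):
        fused.extend(scale_features(current))
        if len(current) < 2 or len(current[0]) < 2:
            break  # cannot pool any further
        current = avg_pool2(current)
    return fused


# Usage: an 8x8 cutout with a single bright pixel near the centre.
cutout = [[0.0] * 8 for _ in range(8)]
cutout[4][4] = 16.0
print(multiscale_fusion(cutout))  # [0.25, 16.0, 15.75, 0.25, 4.0, 3.75, 0.25, 1.0, 0.75]
```

A point source keeps a high centre contrast at every scale, while a diffuse defect would not. Fused multi-scale features are meant to capture this kind of difference.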

If this is right

A larger fraction of the Chinese historical plate collection becomes available for century-scale astrometric and photometric studies.
Automated source filtering reduces the fraction of plates requiring manual intervention in processing pipelines.
The same classification step can be inserted into other plate archives that use SExtractor and Astrometry.net.
Improved source lists raise the yield of successful Gaia-based solutions across the entire digitized set.
Longer observational baselines from the newly registered plates support better measurements of proper motions and long-period variables.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same trained classifier could be tested on plate collections from other countries that share similar degradation and scanning issues.
Retraining or fine-tuning the model on a small set of newly verified plates from different eras might raise the success rate further.
Pairing the classifier with modern source-detection networks instead of SExtractor could reduce the initial failure rate before classification is even applied.
The recovered plates open the possibility of new searches for rare long-term transients or solar-system objects across more than a century of observations.

Load-bearing premise

The visual and feature distribution of sources on the previously failed plates is similar enough to the successful training plates that the classifier transfers without retraining or adaptation.
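This premise can be probed partially without labels on the failed plates. One way is to compare the distribution of a per-source quantity between successful and failed plates. The sketch below uses a scalar feature such as a SExtractor-measured FWHM; the choice of feature and the sample values are hypothetical. It computes the two-sample Kolmogorov-Smirnov statistic and its asymptotic 5 percent critical value. A large D signals a shift worth investigating. A small D on a single feature does not by itself show that the classifier transfers.

```python
# Two-sample Kolmogorov-Smirnov check for a shift in one per-source feature
# between successfully registered and previously failed plates.
# Assumption: the feature (e.g. FWHM in pixels) is a hypothetical choice.
from bisect import bisect_right
from typing import Sequence


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """D = max |F_a(x) - F_b(x)| over the two empirical CDFs."""
    sa, sb = sorted(a), sorted(b)
    d = 0.0
    # Both ECDFs are step functions that only change at sample points,
    # so evaluating at every pooled point finds the supremum.
    for x in sa + sb:
        fa = bisect_right(sa, x) / len(sa)
        fb = bisect_right(sb, x) / len(sb)
        d = max(d, abs(fa - fb))
    return d


def ks_critical(na: int, nb: int, c_alpha: float = 1.358) -> float:
    """Asymptotic critical value of D; c_alpha = 1.358 corresponds to alpha = 0.05.

    The approximation is rough for very small samples.
    """
    return c_alpha * ((na + nb) / (na * nb)) ** 0.5


# Usage with made-up FWHM values (pixels) for two small samples.
fwhm_good = [2.1, 2.3, 2.2, 2.4, 2.0, 2.2]
fwhm_failed = [2.9, 3.4, 2.2, 3.1, 3.8, 2.7]
d = ks_statistic(fwhm_good, fwhm_failed)
print(d, d > ks_critical(len(fwhm_good), len(fwhm_failed)))
```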

What would settle it

Manual verification of classifier-selected sources on a new set of failed plates, combined with a registration success rate well below 72 percent on that set; or systematic position residuals in the resulting solutions that exceed the Gaia uncertainties.
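Both criteria can be stated quantitatively. If each plate is treated as an independent success/failure trial (an assumption), the Wilson score interval for 1353/1883 comes out at roughly 0.70 to 0.74. That gives a concrete reference point for "well below 72 percent". The second function summarises residual/uncertainty ratios. A mean far from zero points to a systematic offset, and an RMS well above 1 points to scatter beyond the quoted errors. The inputs are hypothetical per-axis values. In practice the denominator would need to include the plate's own measurement error, which for historical plates usually far exceeds Gaia's.

```python
# Quantifying the two falsification criteria (illustrative assumptions):
# plates are treated as independent trials, and residuals/uncertainties are
# hypothetical per-axis values in milliarcseconds.
import math
from typing import Sequence, Tuple


def wilson_interval(k: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion k/n (z = 1.96 gives ~95%)."""
    p = k / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def normalised_residual_stats(
    residuals: Sequence[float], sigmas: Sequence[float]
) -> Tuple[float, float]:
    """Mean and RMS of residual/sigma.

    A mean far from 0 suggests a systematic offset; an RMS well above 1 suggests
    scatter larger than the quoted uncertainties.
    """
    ratios = [r / s for r, s in zip(residuals, sigmas)]
    mean = sum(ratios) / len(ratios)
    rms = math.sqrt(sum(x * x for x in ratios) / len(ratios))
    return mean, rms


lo, hi = wilson_interval(1353, 1883)
print(f"success rate {1353 / 1883:.3f}, 95% interval [{lo:.3f}, {hi:.3f}]")
```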

Figures

Figures reproduced from arXiv: 2604.04714 by Hao Luo, Jianhai Zhao, Jing Yang, Meiting Yang, Quanfeng Xu, Shiyin Shen, Yong Yu, Zhenghong Tang, Zhengjun Shang.

**Figure 1.** An overview of the astrometric registration workflow for Chinese historical nighttime astronomical …

**Figure 2.** An overview of the Swin Transformer method, including its structural details.

**Figure 3.** Distribution of Plate Count (a) and Stellar Fraction (b) by Source Number Plates. (Boxes repre…

**Figure 4.** Representative defects in photographic plate data, categorized by storage (top row), and include …

**Figure 5.** Representative examples of unclassifiable stars from unmatched plates.
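Figure 2 summarises the Swin Transformer backbone. Following Liu et al. (2021), its central mechanism restricts self-attention to non-overlapping M × M windows. It alternates this with a configuration in which the windows are shifted by half a window, so that information flows across window boundaries. The shift is implemented as a cyclic roll of the feature map. The sketch below shows only that bookkeeping, on a small integer grid standing in for patch tokens. The attention computation and the masking needed after the roll are omitted.

```python
# Window bookkeeping behind the Swin Transformer's (shifted) window attention,
# shown on a small integer grid that stands in for a map of patch tokens.
# The attention itself and the post-shift attention mask are omitted.
from typing import List

Grid = List[List[int]]


def cyclic_shift(grid: Grid, s: int) -> Grid:
    """Roll the grid up and to the left by s cells, wrapping around the edges.

    Equivalent to torch.roll(x, shifts=(-s, -s), dims=(0, 1)) on a 2D tensor.
    """
    h, w = len(grid), len(grid[0])
    return [[grid[(i + s) % h][(j + s) % w] for j in range(w)] for i in range(h)]


def window_partition(grid: Grid, m: int) -> List[Grid]:
    """Split an H x W grid into non-overlapping m x m windows in row-major order."""
    h, w = len(grid), len(grid[0])
    if h % m or w % m:
        raise ValueError("grid dimensions must be multiples of the window size")
    return [
        [row[c:c + m] for row in grid[r:r + m]]
        for r in range(0, h, m)
        for c in range(0, w, m)
    ]


tokens = [[4 * i + j for j in range(4)] for i in range(4)]
regular = window_partition(tokens, 2)                         # layer l: regular windows
shifted = window_partition(cyclic_shift(tokens, 2 // 2), 2)  # layer l+1: shift by M/2
print(regular[0], shifted[0])  # [[0, 1], [4, 5]] [[5, 6], [9, 10]]
```

After the shift, the first window mixes tokens 5, 6, 9 and 10, which sat in four different regular windows. This is how cross-window connections arise.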

Original abstract

China has systematically collected nighttime astronomical plates since 1900, creating a large historical dataset that has been digitized with optical scanners. For astrometric registration of these digitized plates, sources were first extracted using SExtractor, and then matched astrometrically with Astrometry.net and the Gaia catalog. However, suboptimal early storage conditions and subsequent environmental deterioration have impeded accurate source matching, resulting in processing failures for several thousand digitized plates. In this work, we introduce a Transformer-based classification model that takes cutouts of SExtractor-detected sources as input and leverages multi-scale feature fusion to identify trustworthy stellar sources on the plates. Trained on plates with successful astrometric calibration, our AI-based classifier was then applied to SExtractor detected sources of 1883 digitized plates, enabling us to complete the astrometric registration for 1353 of them. This AI-augmented pipeline streamlines the processing of historical plate archives and enhances their scientific value for long-term time-domain astronomical studies.
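The workflow in the abstract can be sketched as three pluggable stages. The callables `extract`, `is_star` and `match` are hypothetical placeholders for SExtractor, the Transformer classifier and Astrometry.net matching against Gaia; they are not the interfaces of those tools. The retry order is an assumption: conventional matching runs first, and classifier-filtered sources are used only after a failure. This order is consistent with the abstract's account of applying the classifier to plates that had failed. The 0.5 score threshold is illustrative.

```python
# Hypothetical three-stage registration flow: extraction -> classification -> matching.
# extract / is_star / match are placeholders, not the SExtractor, classifier or
# Astrometry.net interfaces.
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Sequence


@dataclass
class Source:
    x: float      # pixel position on the digitized plate
    y: float
    flux: float
    cutout: Any   # image stamp around the detection (format left open)


@dataclass
class Solution:
    plate_id: str
    n_sources_used: int


def register_plate(
    plate_id: str,
    extract: Callable[[str], List[Source]],
    is_star: Callable[[Source], float],
    match: Callable[[str, Sequence[Source]], Optional[Solution]],
    threshold: float = 0.5,
) -> Optional[Solution]:
    """Try matching on all detections; on failure, retry with classifier-trusted sources only."""
    sources = extract(plate_id)
    solution = match(plate_id, sources)
    if solution is not None:
        return solution
    trusted = [s for s in sources if is_star(s) >= threshold]
    return match(plate_id, trusted) if trusted else None


# Usage with toy stand-ins: matching fails whenever a scratch artefact is present.
def toy_extract(plate_id: str) -> List[Source]:
    return [Source(10.0, 12.0, 5000.0, "star"), Source(40.0, 7.0, 900.0, "scratch")]


def toy_is_star(src: Source) -> float:
    return 0.1 if src.cutout == "scratch" else 0.9


def toy_match(plate_id: str, srcs: Sequence[Source]) -> Optional[Solution]:
    if any(s.cutout == "scratch" for s in srcs):
        return None
    return Solution(plate_id, len(srcs))


print(register_plate("plate-001", toy_extract, toy_is_star, toy_match))
```

Passing the stages in as functions keeps the sketch independent of any particular extractor, model or solver.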

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper applies a Transformer classifier trained on good plates to clean sources on failed ones and claims to recover registrations for 1353 of 1883 plates, but offers no metrics or target-domain checks to show the approach actually works.


The main thing to know is that the authors trained a multi-scale Transformer on cutouts from successfully registered Chinese historical plates, then used it to filter SExtractor detections on 1883 failed plates and report completing astrometric solutions for 1353 of them. This is a direct attempt to salvage data from a large, deteriorated archive that standard tools could not handle due to storage damage and plate wear.

Referee Report

2 major / 1 minor

Summary. The paper introduces a Transformer-based classifier with multi-scale feature fusion that takes cutouts of SExtractor-detected sources as input. The model is trained on plates with successful astrometric registration using Astrometry.net and Gaia, then applied to SExtractor sources from 1883 previously failed digitized Chinese historical plates, enabling successful registration for 1353 of them.

Significance. If the reported generalization holds, the work offers a practical method to recover astrometric data from deteriorated historical plates, increasing the value of large archival datasets for long-term time-domain studies. The approach combines established source extraction with modern deep learning and demonstrates application to a real, large-scale problem in astroinformatics.

major comments (2)

[Abstract] Abstract: The central claim that applying the classifier succeeded in 'enabling us to complete the astrometric registration for 1353 of them' is presented without any quantitative performance metrics (precision, recall, F1, or success rate on the target plates), error analysis, or comparison to baselines such as simple magnitude cuts or non-Transformer classifiers. This leaves the numerical success unsupported by evidence in the provided text.
[Abstract] Abstract and methods description: The model is trained exclusively on cutouts from successfully registered plates and applied directly to sources from failed plates, yet no held-out validation on failed plates, cross-validation details, or domain-adaptation steps are described. The assumption that source-feature distributions (including deterioration effects) are sufficiently similar for reliable generalization is therefore untested and load-bearing for the reported completion count.
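The first major comment asks for precision, recall, F1 and a magnitude-cut baseline. These are simple to compute once labelled sources exist. One assumed labelling scheme treats Gaia-matched detections on successfully registered plates as label 1. The sketch computes the metrics for the "trustworthy star" class and implements the magnitude cut. The limiting magnitude is a hypothetical free parameter. Scratches and emulsion defects can also be bright, which is why such a cut is a weak but informative baseline.

```python
# Precision / recall / F1 for the "trustworthy star" class (label 1) and a
# magnitude-cut baseline. Labels are assumed to come from Gaia-matched sources
# on successfully registered plates; m_lim is a hypothetical free parameter.
from typing import Dict, List, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Precision, recall and F1 for label 1; empty denominators yield 0.0."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def magnitude_cut(mags: Sequence[float], m_lim: float) -> List[int]:
    """Baseline classifier: call a source a star if it is brighter (smaller magnitude) than m_lim."""
    return [1 if m < m_lim else 0 for m in mags]


labels = [1, 1, 0, 0, 1]
mags = [10.0, 12.0, 11.0, 15.0, 14.0]  # instrumental magnitudes (made up)
print(classification_metrics(labels, magnitude_cut(mags, 12.5)))
```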

minor comments (1)

[Abstract] Abstract: The phrase 'our AI-based classifier was then applied' would benefit from a brief statement of the input size (number of sources per plate) or training-set size to allow readers to assess scale.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and the recommendation for major revision. We will revise the manuscript to strengthen the abstract and methods sections with the requested quantitative metrics, validation details, and clarifications on generalization, while preserving the core contribution of the work.


Referee: [Abstract] Abstract: The central claim that applying the classifier succeeded in 'enabling us to complete the astrometric registration for 1353 of them' is presented without any quantitative performance metrics (precision, recall, F1, or success rate on the target plates), error analysis, or comparison to baselines such as simple magnitude cuts or non-Transformer classifiers. This leaves the numerical success unsupported by evidence in the provided text.

Authors: We agree that the abstract would benefit from supporting metrics. In the revised version we will report cross-validation precision, recall, and F1 scores obtained on held-out successful plates, include a brief error analysis, and add a comparison against simple baselines (magnitude cuts and a non-Transformer CNN). We will also state the effective success rate (1353/1883 plates) as the primary outcome metric while making clear that these figures are derived from the downstream registration success after source filtering. revision: yes
Referee: [Abstract] Abstract and methods description: The model is trained exclusively on cutouts from successfully registered plates and applied directly to sources from failed plates, yet no held-out validation on failed plates, cross-validation details, or domain-adaptation steps are described. The assumption that source-feature distributions (including deterioration effects) are sufficiently similar for reliable generalization is therefore untested and load-bearing for the reported completion count.

Authors: We will expand the methods section to detail the cross-validation protocol used on the successful-plate training set. Direct held-out validation on the failed plates is not possible because ground-truth labels for trustworthy sources do not exist for those plates. We will therefore clarify that generalization is supported indirectly by the empirical outcome: after applying the classifier, Astrometry.net succeeded on 1353 of the 1883 plates. We will also discuss the multi-scale feature fusion as an implicit robustness mechanism and note the lack of explicit domain-adaptation techniques as a limitation to be addressed in future work. revision: partial
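Any such cross-validation protocol should state how the folds are formed. Cutouts from the same plate share emulsion, scan and damage characteristics. Splitting them across training and test folds would therefore likely inflate the scores. Grouping by plate keeps each plate's sources together and better mimics deployment on unseen plates. The sketch below uses a hypothetical round-robin assignment; it is not the authors' protocol. It still says nothing directly about the failed plates.

```python
# Plate-grouped k-fold split: every cutout from a given plate lands in the same
# fold, so test scores reflect performance on plates the model has never seen.
# Round-robin assignment of sorted plate IDs is a simple, hypothetical choice.
from typing import Dict, List, Sequence, Tuple


def group_kfold_by_plate(
    plate_ids: Sequence[str], k: int
) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_indices, test_indices) pairs grouped by plate ID."""
    plates = sorted(set(plate_ids))
    if k < 2 or k > len(plates):
        raise ValueError("need 2 <= k <= number of distinct plates")
    fold_of: Dict[str, int] = {p: i % k for i, p in enumerate(plates)}
    folds = []
    for f in range(k):
        test = [i for i, p in enumerate(plate_ids) if fold_of[p] == f]
        train = [i for i, p in enumerate(plate_ids) if fold_of[p] != f]
        folds.append((train, test))
    return folds


# One entry per source cutout, giving the plate it came from (made-up IDs).
cutout_plates = ["A", "A", "B", "C", "C", "C", "D"]
for train_idx, test_idx in group_kfold_by_plate(cutout_plates, 2):
    print("train", train_idx, "test", test_idx)
```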

Circularity Check

0 steps flagged

No significant circularity detected


The paper presents an empirical ML application: a Transformer classifier is trained on source cutouts from successfully astrometrically registered plates and then applied to SExtractor detections on a separate set of 1883 failed plates, enabling registration for 1353 of them. No equations, parameter fits, self-definitions, or load-bearing self-citations are present that would reduce any claimed result to its own inputs by construction. The pipeline is self-contained against external benchmarks (successful vs. failed plates) with no renaming of known results or ansatz smuggling.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim depends on the generalization of a supervised classifier across plate quality domains and on the assumption that SExtractor detections contain sufficient distinguishing features for the model.

axioms (1)

Domain assumption: the classifier trained on successful plates generalizes to the failed plates without significant domain shift.
Required for the reported success on 1353 plates but not justified or tested in the abstract.



Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem.

We adopt the Swin Transformer backbone ... multi-scale feature fusion to identify trustworthy stellar sources ... Trained on plates with successful astrometric calibration, our AI-based classifier was then applied to SExtractor detected sources of 1883 digitized plates, enabling us to complete the astrometric registration for 1353 of them.
IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · reality_from_one_distinction · unclear
Relation between the paper passage and the cited Recognition theorem.

The astrometric registration of the digitized plates consists of three main steps: source extraction, stellar source classification, and astrometric matching.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

25 extracted references · 25 canonical work pages · 1 internal anchor

[1] (entry garbled in extraction)

[2] (entry garbled in extraction)

[3] (entry garbled in extraction) doi:10.1111/j.1365-2966.2001.04660.x, 1995

[4] Bertin E., & Arnouts S., 1996, A&AS, 117, 393

[5] Dosovitskiy A., Beyer L., Kolesnikov A., et al., 2021, in International Conference on Learning Representations (ICLR)

[6] Enke H., Tuvikene T., Groote D., Edelmann H., & Heber U., 2024, A&A, 687, A165

[7] Fortson L., Masters K., Nichol R., et al., 2012, Advances in Machine Learning and Data Mining for Astronomy, 213

[8] Gaia Collaboration, Brown A., Vallenari A., et al., 2018, A&A, 616, A1

[9] Gaia Collaboration, Vallenari A., Brown A., et al., 2023, A&A, 674, A1

[10] Grindlay J., Tang S., Los E., & Servillat M., 2011, Proceedings of the International Astronomical Union, 7, 29–34

[11] Hambly N., MacGillivray H., Read M., et al., 2001, MNRAS, 326, 1279

[12] Hambly N., Irwin M., & MacGillivray H., 2001b, MNRAS, 326, 1295

[13] Hearst M. A., Dumais S. T., Osuna E., Platt J., & Schölkopf B., 1998, IEEE Intelligent Systems and their Applications, 13, 18

[14] Kingma D. P., & Ba J., 2014, Adam: A Method for Stochastic Optimization, preprint (arXiv:1412.6980)

[15] Lang D., Hogg D. W., Mierle K., Blanton M., & Roweis S., 2010, AJ, 139, 1782

[16] Liu Z., Lin Y., Cao Y., et al., 2021, in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)

[17] Ma M., Yuan H., Xiao K., et al., 2025, ApJS, 280, 18

[18] Magnier E. A., Chambers K. C., Flewelling H. A., et al., 2020, ApJS, 251, 3

[19] Shang Z., Yu Y., Wang L., et al., 2024, RAA, 24, 055010

[20] Simcoe R., Grindlay J. E., Los E., et al., 2006, in Applications of Digital Image Processing XXIX, 338–349

[21] Slater C. T., Ivezić Ž., & Lupton R. H., 2020, AJ, 159, 65

[22] Walmsley M., Smith L., Lintott C., et al., 2020, MNRAS, 491, 1554

[23] Xu Q., Shen S., de Souza R., et al., 2023, MNRAS, 526, 6391

[24] Yu Y., Zhao J., Tang Z., & Shang Z., 2017, RAA, 17, 28

[25] Ye R., Shen S., de Souza R., et al., 2025, MNRAS, 537, 640
