astromorph: Self-supervised machine learning pipeline for astronomical morphology analysis
Pith reviewed 2026-05-16 05:01 UTC · model grok-4.3
The pith
Astromorph adapts self-supervised BYOL learning to generate embeddings that capture morphological patterns in unlabeled astronomical images and cubes.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Astromorph implements an adapted BYOL self-supervised framework that produces embeddings capturing morphological differences and similarities across large samples of astronomical data without labeled training examples. The pipeline accommodates data of varying dimensions and resolutions, including both single-channel FITS images and multi-channel spectral cubes, and is demonstrated on protoplanetary disks observed with ALMA and infrared dark clouds observed with Spitzer and Herschel. These embeddings support key tasks such as clustering, anomaly detection, and similarity-based exploration in a user-friendly manner with both streamlined scripts and deeper PyTorch customization options.
What carries the argument
An adapted BYOL (Bootstrap Your Own Latent) self-supervised training scheme, which learns representations by matching differently augmented views of the same input and so produces embeddings without any labels.
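As a rough illustration of the mechanism, the sketch below shows the two ingredients of the original BYOL method (Grill et al. 2020): a loss that compares L2-normalized vectors from two augmented views, and an exponential-moving-average update of the target network. It is not astromorph's adapted implementation. The function names, the plain-list vectors, and the value of `tau` are assumptions made for this example only.

```python
import math

def l2_normalize(vector):
    """Scale a non-zero vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

def byol_loss(online_prediction, target_projection):
    """Squared distance between unit vectors, equal to 2 - 2 * cosine similarity."""
    p = l2_normalize(online_prediction)
    z = l2_normalize(target_projection)
    return sum((a - b) ** 2 for a, b in zip(p, z))

def ema_update(target_weights, online_weights, tau=0.99):
    """Move the target weights toward the online weights without a gradient step."""
    return [tau * t + (1.0 - tau) * o
            for t, o in zip(target_weights, online_weights)]

# Views whose vectors point in the same direction give a loss near zero.
aligned = byol_loss([1.0, 2.0], [2.0, 4.0])
# Orthogonal vectors give a loss of 2; opposite vectors give the maximum, 4.
orthogonal = byol_loss([1.0, 0.0], [0.0, 1.0])
# Hypothetical two-parameter "networks", for illustration only.
new_target = ema_update([0.0, 1.0], [1.0, 1.0], tau=0.9)
print(aligned, orthogonal, new_target)
```

In the original method only the online network receives gradients, and the loss is symmetrized by swapping the two views. Whether astromorph keeps both details is not stated in the text reviewed here.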
If this is right
- Large unlabeled samples can be clustered by morphological similarity for statistical studies.
- Anomaly detection becomes possible by identifying outliers in the embedding space.
- Similarity searches can locate analogous objects across different surveys or wavelengths.
- The same pipeline applies directly to both image and spectral cube data without retraining from scratch.
- Broad applicability is suggested across imaging-rich contexts from varied telescopes.
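The similarity search and anomaly detection named in the list above can both be expressed as nearest-neighbour queries in embedding space. The sketch below is a minimal illustration under that assumption. The toy two-dimensional embeddings, the function names, and the choice of `k` are hypothetical and do not come from the astromorph package.

```python
import math

def nearest_neighbours(embeddings, query_index, k=2):
    """Return indices of the k embeddings closest to the query, excluding itself."""
    query = embeddings[query_index]
    others = [i for i in range(len(embeddings)) if i != query_index]
    others.sort(key=lambda i: math.dist(query, embeddings[i]))
    return others[:k]

def anomaly_scores(embeddings, k=2):
    """Score each object by its mean distance to its k nearest neighbours."""
    scores = []
    for i, embedding in enumerate(embeddings):
        neighbours = nearest_neighbours(embeddings, i, k)
        total = sum(math.dist(embedding, embeddings[j]) for j in neighbours)
        scores.append(total / len(neighbours))
    return scores

# Toy embeddings: three objects close together and one far away.
toy = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0]]
similar_to_first = nearest_neighbours(toy, 0, k=2)
scores = anomaly_scores(toy, k=2)
most_anomalous = max(range(len(toy)), key=lambda i: scores[i])
print(similar_to_first, most_anomalous)
```

A large score only means an object is isolated in embedding space. Whether that isolation reflects unusual morphology or an imaging artifact is the question raised under the load-bearing premise.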
Where Pith is reading between the lines
- The embeddings could serve as input features for hybrid models that combine self-supervised pretraining with limited supervised fine-tuning on physical parameters.
- Application to simulated datasets with injected known morphologies would provide a direct test of whether learned features align with physical structures.
- Integration with upcoming large-scale surveys could lower the cost of initial data exploration before targeted follow-up observations.
- Cross-domain transfer might allow embeddings trained on one telescope's data to initialize analysis for another instrument's images.
Load-bearing premise
The adapted BYOL training on astronomical images and cubes will produce embeddings that reflect physically relevant morphological properties rather than imaging artifacts or dataset-specific biases.
What would settle it
If expert visual classification of a held-out sample from the ALMA protoplanetary disk dataset shows no agreement with clusters formed from the learned embeddings, the claim of scientifically meaningful representations would be falsified.
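One way to make "agreement" concrete is the adjusted Rand index (ARI), which compares two partitions of the same objects and is close to 0 when they agree only at chance level. The sketch below is an illustration of that metric, not a procedure described in the paper. The expert labels and cluster assignments are made-up toy values.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same objects."""
    n = len(labels_a)
    pair_counts = Counter(zip(labels_a, labels_b))
    sum_pairs = sum(comb(c, 2) for c in pair_counts.values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = 0.5 * (sum_a + sum_b)
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions put everything in one group
    return (sum_pairs - expected) / (max_index - expected)

# Hypothetical expert classes and embedding-derived cluster ids.
expert = ["ring", "ring", "spiral", "spiral"]
matching_clusters = [0, 0, 1, 1]
crossing_clusters = [0, 1, 0, 1]
print(adjusted_rand_index(expert, matching_clusters))
print(adjusted_rand_index(expert, crossing_clusters))
```

The index ignores label names, so cluster ids never need to be matched to expert class names by hand.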
Original abstract
Modern telescopes generate increasingly large and diverse datasets, often consisting of complex and morphologically rich structures. Exploring such data efficiently requires automated methods that can extract and organize physically meaningful information, ideally without the need for extensive manual interaction. We aim to provide a user-friendly implementation of a self-supervised machine learning framework to explore morphological properties of large datasets, based on the BYOL (Bootstrap Your Own Latent) method. By generating meaningful image embeddings without manually labelled data, the framework supports key tasks such as clustering, anomaly detection, and similarity-based exploration. In contrast to existing BYOL implementations, astromorph accommodates data of varying dimensions and resolutions, including both single-channel FITS images and multi-channel spectral cubes. The package is built with usability in mind, offering streamlined pipeline scripts for ease of use as well as deeper customization options via PyTorch-based classes. To demonstrate the utility of astromorph, we apply it in two contrasting science cases representing different astronomical domains: images of protoplanetary disks observed with ALMA, and infrared dark clouds observed with Spitzer and Herschel. In both cases, we demonstrate how astromorph produces scientifically meaningful embeddings that capture morphological differences and similarities across large samples. astromorph enables users to apply a robust, label-free approach for uncovering morphological patterns in astronomical datasets. The successful application to two markedly different datasets suggests that the pipeline is broadly applicable across a wide range of imaging-rich astronomical contexts, providing a user-friendly tool for advancing discovery in observational astronomy.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents astromorph, a PyTorch-based Python package implementing an adapted BYOL self-supervised learning pipeline for generating image embeddings from astronomical data. It accommodates single-channel FITS images and multi-channel spectral cubes of varying dimensions and resolutions, and demonstrates the pipeline on two datasets: ALMA observations of protoplanetary disks and Spitzer/Herschel infrared dark clouds. The central claim is that the resulting embeddings are scientifically meaningful, capturing morphological differences and similarities in a label-free manner to support tasks such as clustering and anomaly detection.
Significance. If the embeddings prove robustly meaningful, the package would offer a practical, user-friendly tool for label-free morphological exploration of large astronomical imaging datasets across modalities, filling a gap between generic self-supervised methods and domain-specific needs in observational astronomy.
major comments (3)
- [§4] §4 (ALMA protoplanetary disks application): The claim that embeddings 'capture morphological differences and similarities' is supported only by qualitative description; no quantitative metrics (e.g., correlation with catalogued disk parameters such as radius or inclination, silhouette scores, or comparison against supervised baselines) are reported, leaving the scientific meaningfulness unverified.
- [§5] §5 (infrared dark clouds application): Similarly, the demonstration on Spitzer/Herschel data asserts meaningful capture of cloud morphology (e.g., filamentarity) without ablation studies on the variable-resolution handling, controls for SNR or beam effects, or any embedding quality metrics, making it impossible to rule out that clusters arise from imaging covariates rather than astrophysical morphology.
- [Methods] Methods section (BYOL adaptation): The modifications for handling variable-resolution and multi-channel data are described at a high level but lack explicit equations or pseudocode for the loss function, augmentation strategy, or resolution-normalization step, preventing assessment of whether the adaptation preserves the original BYOL invariance properties.
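Of the quantitative metrics requested in the first major comment, the silhouette score is the simplest to compute directly from embeddings and cluster labels. The sketch below is a minimal standard-library version for illustration. The toy embeddings and labels are invented, and nothing here reproduces numbers from the paper.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette coefficient; needs at least two distinct cluster labels."""
    clusters = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    def mean_distance(i, members):
        return sum(math.dist(points[i], points[j]) for j in members) / len(members)

    total = 0.0
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            continue  # singleton cluster: silhouette is defined as 0
        a = mean_distance(i, own)  # cohesion within the point's own cluster
        b = min(mean_distance(i, members)  # separation from the nearest other cluster
                for other, members in clusters.items() if other != label)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)

# Toy embeddings forming two tight, well-separated groups.
embeddings = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
well_separated = silhouette_score(embeddings, [0, 0, 1, 1])
mismatched = silhouette_score(embeddings, [0, 1, 0, 1])
print(well_separated, mismatched)
```

Values near 1 indicate compact, separated clusters, and negative values indicate points that sit closer to another cluster than to their own. A high score still says nothing about whether the separation is astrophysical, which is the concern in the second major comment.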
minor comments (3)
- The abstract and introduction would benefit from explicit statements of dataset sizes (number of images/cubes) and hyperparameter choices used in the demonstrations.
- Figure captions for embedding visualizations should include details on the dimensionality reduction method (e.g., t-SNE perplexity) and any color-coding scheme.
- The package documentation and installation instructions could be expanded with a minimal reproducible example script.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed review of our manuscript. We address each major comment point by point below, providing clarifications and indicating where revisions have been made to strengthen the paper.
Point-by-point responses
- Referee: [§4] §4 (ALMA protoplanetary disks application): The claim that embeddings 'capture morphological differences and similarities' is supported only by qualitative description; no quantitative metrics (e.g., correlation with catalogued disk parameters such as radius or inclination, silhouette scores, or comparison against supervised baselines) are reported, leaving the scientific meaningfulness unverified.
  Authors: We agree that quantitative support would strengthen the demonstration in §4. In the revised manuscript we have added silhouette scores for the reported clusters, Pearson correlations between embedding dimensions and catalogued parameters (radius, inclination, and total flux), and a brief comparison of clustering purity against a supervised baseline trained on the available morphological labels. These results are now included in the text of §4 and in an updated Figure 4.
  Revision: yes
- Referee: [§5] §5 (infrared dark clouds application): Similarly, the demonstration on Spitzer/Herschel data asserts meaningful capture of cloud morphology (e.g., filamentarity) without ablation studies on the variable-resolution handling, controls for SNR or beam effects, or any embedding quality metrics, making it impossible to rule out that clusters arise from imaging covariates rather than astrophysical morphology.
  Authors: We have added ablation experiments that isolate the effect of the resolution-normalization step and report quantitative embedding quality metrics (mean intra- versus inter-cluster Euclidean distances). However, exhaustive controls for SNR and beam convolution effects would require a dedicated suite of synthetic observations, which lies beyond the scope of the present pipeline-focused paper. We have expanded the discussion in §5 to explicitly acknowledge this limitation and to describe the steps taken to mitigate obvious imaging covariates.
  Revision: partial
- Referee: Methods section (BYOL adaptation): The modifications for handling variable-resolution and multi-channel data are described at a high level but lack explicit equations or pseudocode for the loss function, augmentation strategy, or resolution-normalization step, preventing assessment of whether the adaptation preserves the original BYOL invariance properties.
  Authors: We have revised the Methods section to include (i) the explicit loss-function equation after the resolution-normalization and channel-handling modifications, (ii) pseudocode for the full augmentation pipeline, and (iii) a step-by-step derivation of the resolution-normalization operator. These additions allow readers to verify that the core BYOL invariance properties are retained.
  Revision: yes
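The embedding quality metric named in the response to the §5 comment, mean intra- versus inter-cluster Euclidean distance, can be sketched as follows. This is an illustrative reading of that phrase, assuming all pairwise distances are averaged. The rebuttal does not specify the exact definition, and the toy data and function name are hypothetical.

```python
import math
from itertools import combinations

def intra_inter_distances(points, labels):
    """Return (mean intra-cluster, mean inter-cluster) pairwise Euclidean distance."""
    intra, inter = [], []
    for i, j in combinations(range(len(points)), 2):
        distance = math.dist(points[i], points[j])
        (intra if labels[i] == labels[j] else inter).append(distance)
    if not intra or not inter:
        raise ValueError("need at least one same-cluster and one cross-cluster pair")
    return sum(intra) / len(intra), sum(inter) / len(inter)

# Toy embeddings with two compact groups ten units apart.
embeddings = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
intra, inter = intra_inter_distances(embeddings, [0, 0, 1, 1])
print(intra, inter, intra / inter)
```

A ratio well below 1 indicates that clusters are tighter than the gaps between them. Comparing this ratio with and without the resolution-normalization step is one plausible form for the ablation the authors describe.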
Circularity Check
No circularity in derivation chain; software pipeline with qualitative demonstration
Full rationale
The paper describes a user-friendly software implementation adapting the established BYOL self-supervised method for astronomical images of varying sizes and channels. No mathematical derivation, parameter fitting, or prediction step is present that could reduce to its own inputs by construction. Claims of 'scientifically meaningful embeddings' rest on application to two example datasets rather than any self-referential logic or self-citation chain. The methodology cites the original BYOL work as an external foundation and focuses on usability and adaptation for FITS data, with no load-bearing uniqueness theorems or ansatzes imported from the authors' prior work. This is a standard tool-description paper whose central assertions are not forced by internal definitions or fitted inputs.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Self-supervised representation learning via BYOL (which, unlike contrastive methods, uses no negative pairs) can extract morphologically meaningful features from unlabeled astronomical images
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, reality_from_one_distinction (tag: unclear)
  unclear: Relation between the paper passage and the cited Recognition theorem.
  Paper passage: "We present astromorph, a Python package that implements the BYOL method... accommodates data of varying dimensions and resolutions, including both single-channel FITS images and multi-channel spectral cubes."
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] Abadi, M., Agarwal, A., Barham, P., et al. 2015, TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems, software available from tensorflow.org
- [2]
- [3] Grill, J.-B., Strub, F., Altché, F., et al. 2020, Bootstrap your own latent: A new approach to self-supervised Learning
- [4] Hacar, A., Clark, S. E., Heitsch, F., et al. 2023, in Astronomical Society of the Pacific Conference Series, Vol. 534, Protostars and Planets VII, ed. S. Inutsuka, Y. Aikawa, T. Muto, K. Tomida, & M. Tamura, 153
- [5] He, K., Zhang, X., Ren, S., & Sun, J. 2016, in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR, 1
- [6] Kainulainen, J., Beuther, H., Henning, T., & Plume, R. 2009, A&A, 508, L35
- [7] Koch, E. W., Rosolowsky, E. W., Boyden, R. D., et al. 2019, AJ, 158, 1
- [8] Larson, R. B. 1981, MNRAS, 194, 809
- [9] Lee, J. C., Sandstrom, K. M., Leroy, A. K., et al. 2023, ApJ, 944, L17
- Lenkić, L., Nally, C., Jones, O. C., et al. 2024, ApJ, 967, 110
- maintainers, T. & contributors. 2016, TorchVision: PyTorch’s Computer Vision library, https://github.com/pytorch/vision
- [10] Pontoppidan, K. M., Barrientes, J., Blome, C., et al. 2022, ApJ, 936, L14
- [11] Richemond, P. H., Tam, A., Tang, Y., et al. 2023, in Proceedings of the 40th International Conference on Machine Learning, ICML’23 (JMLR.org)
- [12] Schneider, N., Bontemps, S., Simon, R., et al. 2011, A&A, 529, A1
- van der Maaten, L. & Hinton, G. 2008, Journal of Machine Learning Research, 9, 2579
- [13] Vantyghem, A. N., Galvin, T. J., Sebastian, B., et al. 2024, Astronomy and Computing, 47, 100824
- [14] Williams, T. G., Lee, J. C., Larson, K. L., et al. 2024, ApJS, 273, 13