Variational views for self-supervised learning in radio astronomy

Anna M. M. Scaife; Johnny Joseph Alphonse

arxiv: 2602.18923 · v2 · pith:NHYP4NR6 · submitted 2026-02-21 · 🌌 astro-ph.IM · astro-ph.GA

Pith reviewed 2026-05-15 20:10 UTC · model grok-4.3

keywords: self-supervised learning · variational autoencoder · radio galaxy morphology · beta-VAE · generative augmentation · Fanaroff-Riley classification · radio astronomy

The pith

A beta-VAE supplies generative views that improve self-supervised pre-training for radio galaxy morphology when combined with standard augmentations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reports that training a beta-VAE on unlabeled radio galaxy images and using its reconstructions as additional views in a contrastive self-supervised pipeline raises accuracy on downstream classification tasks. This matters because large astronomical surveys produce far more images than can be labeled by hand, so scalable pre-training methods that extract structure without labels become essential. The beta-VAE is tuned with moderate regularization to balance reconstruction quality against disentanglement of factors such as source multiplicity and lobe asymmetry. Experiments indicate that these generative views add information beyond ordinary image transformations, with ablation studies isolating the contribution of each augmentation type.

Core claim

A beta-VAE trained at beta=2.3 on the Radio Galaxy Zoo dataset produces reconstructions that capture continuous morphological variations, including a smooth transition across Fanaroff-Riley class identity in the latent space. When these reconstructions are inserted as generative augmentations inside a view-based self-supervised model, the resulting representations yield higher accuracy on downstream radio galaxy classification than the same model trained with only standard augmentations. The work shows that the generative and contrastive approaches are complementary and that the added views remain useful after transfer to the classifier.

What carries the argument

beta-VAE reconstructions used as generative augmentations inside a view-based self-supervised pipeline, where the VAE is pre-trained to disentangle factors such as source multiplicity and lobe asymmetry.
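A minimal sketch of how a reconstruction might be turned into an extra view, assuming a Gaussian encoder that returns a mean and log-variance and a decoder that maps latents back to images. The random linear `encode`/`decode` maps are hypothetical stand-ins for trained networks, the rotation/flip set is an assumed example of "standard augmentations", and `p_generative` is a hypothetical mixing probability; the paper's exact sampling and mixing scheme is not described on this page.

```python
import numpy as np

rng = np.random.default_rng(0)
IMG = 16     # hypothetical image side length
LATENT = 8   # hypothetical latent dimensionality

# Hypothetical stand-ins for a trained beta-VAE encoder/decoder (random linear maps).
W_mu = rng.normal(scale=0.05, size=(LATENT, IMG * IMG))
W_lv = rng.normal(scale=0.05, size=(LATENT, IMG * IMG))
W_dec = rng.normal(scale=0.05, size=(IMG * IMG, LATENT))

def encode(x):
    """Return mean and log-variance of q(z|x) for a flattened image."""
    flat = x.reshape(-1)
    return W_mu @ flat, W_lv @ flat

def decode(z):
    """Map a latent vector back to an image; the sigmoid keeps pixels in (0, 1)."""
    return (1.0 / (1.0 + np.exp(-(W_dec @ z)))).reshape(IMG, IMG)

def generative_view(x, rng):
    """Sample z ~ q(z|x) with the reparameterisation trick, then decode it."""
    mu, logvar = encode(x)
    z = mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)
    return decode(z)

def standard_view(x, rng):
    """Assumed standard augmentation: random 90-degree rotation plus optional flip."""
    x = np.rot90(x, k=rng.integers(4))
    return np.fliplr(x) if rng.random() < 0.5 else x

def make_view(x, rng, p_generative=0.5):
    """With probability p_generative start from a VAE reconstruction,
    then apply the standard augmentations in either case."""
    base = generative_view(x, rng) if rng.random() < p_generative else x
    return standard_view(base, rng)

image = rng.random((IMG, IMG))
view_a, view_b = make_view(image, rng), make_view(image, rng)
print(view_a.shape, view_b.shape)  # (16, 16) (16, 16)
```

Applying the geometric transforms after the reconstruction means both view families share the same symmetry augmentations, so any extra diversity comes from the decoder's sampled variation.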

If this is right

Combining the generative reconstructions with standard augmentations produces higher downstream classification accuracy than standard augmentations alone.
Ablation studies separate the performance contribution of generative views from that of conventional image transformations.
Fanaroff-Riley class identity appears as a continuous transition across the latent space rather than a single discrete dimension.
Disentanglement-aware self-supervised learning offers a scalable route for handling complex morphological variation in upcoming large radio surveys.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same hybrid approach could be tested on optical or infrared survey images to check whether generative views add value outside the radio domain.
Because the latent space encodes morphology continuously, the representations might support regression tasks such as estimating lobe flux ratios in addition to discrete classification.
Jointly optimizing the beta-VAE with the contrastive loss might further improve the quality of the generated views without separate pre-training stages.
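The joint-optimization idea above could be written as a single loss. The sketch below uses the SimCLR-style NT-Xent loss as an assumed example of a contrastive objective (the paper's view-based pipeline may use a different loss), and the weight `lam` is a hypothetical hyperparameter; the beta-VAE loss is passed in as a precomputed scalar.

```python
import numpy as np

def nt_xent(z_a, z_b, temperature=0.5):
    """SimCLR-style NT-Xent loss for N positive pairs (z_a[i], z_b[i])."""
    z = np.concatenate([z_a, z_b], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # unit vectors -> cosine similarity
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)                     # exclude self-similarity
    n = z_a.shape[0]
    positives = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    # Numerically stable log-softmax over each row, evaluated at the positive partner.
    row_max = sim.max(axis=1, keepdims=True)
    log_den = np.log(np.exp(sim - row_max).sum(axis=1)) + row_max[:, 0]
    log_prob = sim[np.arange(2 * n), positives] - log_den
    return -log_prob.mean()

def joint_loss(z_a, z_b, vae_loss, lam=0.1):
    """Hypothetical joint objective: contrastive term plus weighted beta-VAE loss."""
    return nt_xent(z_a, z_b) + lam * vae_loss

aligned = nt_xent(np.eye(4), np.eye(4))
shuffled = nt_xent(np.eye(4), np.eye(4)[[1, 2, 3, 0]])
print(aligned < shuffled)  # True
```

Correctly paired views give a lower loss than mismatched ones, which is the property a joint objective would trade off against the beta-VAE term.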

Load-bearing premise

That the reconstructions from the beta-VAE supply morphological variations that standard image augmentations do not already provide, and that these extra variations remain useful after transfer to the downstream classifier.

What would settle it

Pre-train with and without the beta-VAE reconstructions, then train identical downstream classifiers on the same labeled training split and evaluate them on the same held-out test set; the central claim is falsified if classification accuracy shows no statistically significant gain when the generative views are included.
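One way to test the paired comparison on a single shared test set is an exact McNemar test on the two classifiers' per-example correctness. This is an assumed choice of test (the paper's own statistics are not described here), sketched with the standard library only; the input lists are hypothetical toy data, not results.

```python
from math import comb

def mcnemar_exact(correct_with, correct_without):
    """Two-sided exact McNemar test on paired per-example correctness.

    b: examples only the model WITH generative views gets right.
    c: examples only the model WITHOUT generative views gets right.
    Under H0 (no difference), b ~ Binomial(b + c, 0.5).
    """
    b = sum(1 for w, wo in zip(correct_with, correct_without) if w and not wo)
    c = sum(1 for w, wo in zip(correct_with, correct_without) if wo and not w)
    n = b + c
    if n == 0:
        return b, c, 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)

# Hypothetical toy data: 1 = test galaxy classified correctly.
with_gen = [1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1]
without_gen = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1]
print(mcnemar_exact(with_gen, without_gen))  # (6, 0, 0.03125)
```

Only discordant examples enter the test, which is why it suits two models scored on the same test galaxies; variation across pre-training seeds would need a separate treatment.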

Original abstract

Modern astronomical surveys are producing progressively larger and more complex datasets, making traditional supervised approaches that rely on extensive labelled catalogues increasingly difficult. Consequently, pre-training using self-supervised learning (SSL), which offers a scalable route by extracting structure directly from unlabelled images, is becoming attractive for many downstream applications. In this work we consider the use of coupled self-supervised representation learning approaches for radio galaxy morphology pre-training. In order to account for the more nuanced variations in radio galaxy morphology than are typically included in the augmented views of view-based SSL algorithms, we use a pre-trained Variational Autoencoder (VAE) to generate views for training a larger view-based self-supervised model. To do this, a $\beta$-VAE was trained on the Radio Galaxy Zoo (RGZ) dataset, where moderate regularization ($\beta = 2.3$) was found to provide a good balance between reconstruction quality and disentanglement of generative factors such as source multiplicity and lobe asymmetry. An analysis of the $\beta$-VAE reveals that Fanaroff-Riley class identity manifests as a continuous transition across the latent space, rather than being associated to a single discrete dimension. $\beta$-VAE reconstructions were then incorporated as generative augmentations within a view-based SSL pipeline. Our experiments show that combining these generative views with standard image augmentations improves downstream classification performance, and we present ablation studies clarifying the relative contribution of each augmentation type. These results indicate that generative and contrastive approaches are complementary, and point toward disentanglement-aware self-supervised learning as a promising direction for future radio astronomy surveys.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note · private letter to a colleague

The paper tries beta-VAE reconstructions as extra generative views inside a contrastive SSL pipeline for radio galaxies, but the abstract supplies no numbers, so the size of any gain remains unclear.

The core move is to train a beta-VAE on RGZ images with beta set to 2.3, then feed its reconstructions into a view-based SSL model as one of the augmentations alongside the usual flips and crops. The claim is that this mix lifts downstream classification accuracy and that ablations show the generative views contribute something distinct. The latent-space check that FR class appears as a smooth transition rather than a single axis is a concrete observation worth having on record. That part of the work is straightforward and domain-appropriate; radio morphology really does vary in ways that generic augmentations can miss, so looking for a generative source of those variations is reasonable. The paper also keeps the pipeline simple enough that others could replicate the steps without new machinery.

The main limitation is that the abstract gives no accuracy deltas, no baseline numbers, no error bars, and no description of the exact ablation controls. Without those, it is impossible to tell whether the reported improvement comes from the specific disentangling effect of the beta term or simply from adding one more diverse view. A control that swaps in a standard VAE or a deterministic autoencoder while holding reconstruction quality fixed would settle that question directly, and the current description does not confirm such a control exists.

If the full manuscript contains those numbers and the ablations are tight, the result is useful for anyone scaling SSL to unlabeled radio survey data. If the gains shrink once the controls are added, the paper still serves as a clear negative result on this particular hybrid. Either way the work is coherent on its own terms and the authors engage honestly with the components they combine. I would bring it to a reading group focused on astronomical machine learning to see the actual tables. It is worth sending to peer review so referees can check the experimental details and statistics.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes pre-training radio galaxy morphology representations via self-supervised learning by augmenting standard image views with generative reconstructions from a pre-trained β-VAE (β=2.3) trained on the Radio Galaxy Zoo dataset. The β-VAE is shown to disentangle factors such as source multiplicity and lobe asymmetry, with Fanaroff-Riley class identity manifesting as a continuous latent-space transition rather than a discrete dimension. These generative views are incorporated into a view-based SSL pipeline, with the central claim being that the combination yields improved downstream classification performance relative to standard augmentations alone, supported by ablation studies on the relative contributions of each view type.

Significance. If the reported improvements prove robust under proper controls and quantitative reporting, the work would demonstrate a practical way to inject morphology-aware generative diversity into contrastive SSL for radio astronomy. This could be useful for scaling to unlabeled data from next-generation surveys, highlighting complementarity between variational disentanglement and standard augmentations without requiring new labeled catalogs.

major comments (3)

[Abstract] The abstract states that experiments show improved downstream classification and that ablations clarify contributions, yet no numerical results (accuracies, deltas vs. baselines, error bars, dataset splits, or statistical tests) are provided. Without these, the magnitude and reliability of the claimed gain cannot be evaluated. (Abstract; Experiments section)
[Experiments] The ablation studies do not appear to include a control that replaces the β-VAE (β=2.3) with a standard VAE (β=1) or a deterministic autoencoder while matching reconstruction fidelity. This control is required to test whether the reported benefit arises from the specific disentangled morphological variations (e.g., lobe asymmetry) or merely from adding any additional view diversity or reconstruction noise. (Ablation studies subsection)
[Latent space analysis] The latent-space analysis claims a continuous FR-class transition, but the manuscript does not quantify how this continuity is measured or demonstrate that it directly drives the downstream SSL improvement (e.g., via targeted interventions on the relevant latent dimensions). (Latent space analysis section)
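The matched-fidelity control requested in the second comment could be set up by measuring held-out reconstruction error for each candidate control model and keeping only those within a tolerance of the β = 2.3 model. The checkpoint names, error values, and 5% tolerance below are hypothetical placeholders, not results from the paper.

```python
import numpy as np

def reconstruction_mse(images, reconstructions):
    """Mean squared error over a held-out set of images."""
    images, reconstructions = np.asarray(images), np.asarray(reconstructions)
    return float(np.mean((images - reconstructions) ** 2))

def matched_controls(reference_mse, candidate_mse, rel_tol=0.05):
    """Return candidates whose held-out MSE lies within rel_tol of the reference."""
    return sorted(
        name for name, mse in candidate_mse.items()
        if abs(mse - reference_mse) <= rel_tol * reference_mse
    )

# Hypothetical held-out MSEs: reference is the beta = 2.3 model.
ref = 0.0100
candidates = {"vae_beta1_ep20": 0.0098, "vae_beta1_ep5": 0.0140, "ae_ep10": 0.0103}
print(matched_controls(ref, candidates))  # ['ae_ep10', 'vae_beta1_ep20']
```

Matching on reconstruction error isolates the disentanglement question: if a fidelity-matched β = 1 VAE or autoencoder gives the same downstream gain, the benefit is more plausibly generic view diversity.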

minor comments (2)

[Methods] Notation for the β-VAE objective and the precise definition of the generative views (how reconstructions are sampled and combined with standard augmentations) should be stated explicitly in the methods to allow reproduction.
[Experiments] The manuscript should report the exact downstream classifier architecture, training protocol, and evaluation metrics (e.g., accuracy, F1, or AUC) used to measure the claimed improvement.
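A minimal sketch of two of the metrics named above for a binary FR I / FR II split. The label encoding (FR II = 1) is an assumption, and a multi-class RGZ setup would need per-class or macro-averaged F1 instead.

```python
import numpy as np

def accuracy_and_f1(y_true, y_pred, positive=1):
    """Accuracy, plus F1 for the positive class of a binary task."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == positive) & (y_true == positive))
    fp = np.sum((y_pred == positive) & (y_true != positive))
    fn = np.sum((y_pred != positive) & (y_true == positive))
    acc = float(np.mean(y_true == y_pred))
    # F1 = 2TP / (2TP + FP + FN); returns 0.0 when there are no positives at all.
    f1 = float(2 * tp / (2 * tp + fp + fn)) if (tp + fp + fn) else 0.0
    return acc, f1

print(accuracy_and_f1([0, 1, 1, 0, 1], [0, 1, 0, 0, 1]))  # (0.8, 0.8)
```

Reporting F1 alongside accuracy matters when FR classes are imbalanced, since accuracy alone can hide poor performance on the minority class.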

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their detailed and constructive feedback on our manuscript. We have carefully considered each major comment and provide point-by-point responses below. We will make revisions to address the concerns raised, particularly by including more quantitative details and additional controls.

Referee: [Abstract] The abstract states that experiments show improved downstream classification and that ablations clarify contributions, yet no numerical results (accuracies, deltas vs. baselines, error bars, dataset splits, or statistical tests) are provided. Without these, the magnitude and reliability of the claimed gain cannot be evaluated. (Abstract; Experiments section)

Authors: We agree that the abstract lacks specific numerical results, which would help readers evaluate the improvements immediately. In the revised version, we will incorporate key results including classification accuracies, performance deltas compared to baselines, error bars, and references to dataset splits and statistical significance where applicable. revision: yes
Referee: [Experiments] The ablation studies do not appear to include a control that replaces the β-VAE (β=2.3) with a standard VAE (β=1) or a deterministic autoencoder while matching reconstruction fidelity. This control is required to test whether the reported benefit arises from the specific disentangled morphological variations (e.g., lobe asymmetry) or merely from adding any additional view diversity or reconstruction noise. (Ablation studies subsection)

Authors: This is a valid concern. We chose β=2.3 to balance reconstruction quality against disentanglement, as evidenced by the separation of factors like source multiplicity. To address the concern directly, we will include an additional ablation study comparing the β-VAE generative views with those from a standard VAE (β=1) and a deterministic autoencoder, matching reconstruction quality where possible. This will help isolate the contribution of the disentangled representations. revision: yes
Referee: [Latent space analysis] The latent-space analysis claims a continuous FR-class transition, but the manuscript does not quantify how this continuity is measured or demonstrate that it directly drives the downstream SSL improvement (e.g., via targeted interventions on the relevant latent dimensions). (Latent space analysis section)

Authors: We will enhance the latent space analysis section to include quantitative measures of continuity, such as the results of linear interpolations in latent space and correlations between latent dimensions and morphological properties like FR class. While we do not perform explicit targeted interventions in the current experiments, the ablation studies demonstrate the overall benefit of these generative views, and we will add discussion linking the continuous transitions to the improved augmentation diversity. revision: partial
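The continuity measures promised in this response might look like the sketch below: a straight-line path between two latent codes (to be decoded and inspected), and a per-dimension point-biserial correlation between latent means and a binary FR I / FR II label. The toy latents and labels are hypothetical; a transition spread over several dimensions would show up as several moderate correlations rather than one dominant one.

```python
import numpy as np

def interpolate(z_start, z_end, steps=5):
    """Points on the straight line between two latent codes (endpoints included)."""
    t = np.linspace(0.0, 1.0, steps)[:, None]
    return (1.0 - t) * z_start + t * z_end

def point_biserial(latents, labels):
    """Pearson correlation of each latent dimension with a 0/1 class label
    (equivalent to the point-biserial coefficient)."""
    latents = np.asarray(latents, dtype=float)
    labels = np.asarray(labels, dtype=float)
    return np.array([np.corrcoef(latents[:, d], labels)[0, 1]
                     for d in range(latents.shape[1])])

rng = np.random.default_rng(1)
labels = np.repeat([0, 1], 50)       # hypothetical FR I (0) / FR II (1) labels
latents = rng.normal(size=(100, 4))  # hypothetical latent means
latents[:, 0] += 0.8 * labels        # class signal spread over two dimensions
latents[:, 1] += 0.8 * labels
print(np.round(point_biserial(latents, labels), 2))
print(interpolate(latents[0], latents[-1]).shape)  # (5, 4)
```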

Circularity Check

0 steps flagged

No circularity: derivation relies on standard VAE/SSL components and empirical evaluation

The paper trains a β-VAE (β=2.3 chosen for reconstruction-disentanglement balance) on RGZ data to produce generative views, then inserts those views into a standard view-based SSL pipeline and measures downstream classification gains via ablations. No equation or claim reduces a target quantity to a fitted input by construction, no self-citation supplies a load-bearing uniqueness theorem, and no ansatz is smuggled in. The central result is an empirical performance delta, not a mathematical identity that collapses to its own inputs.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The central claim rests on the standard VAE assumption that a regularized latent space captures disentangled generative factors of radio galaxy morphology and that these factors translate into useful augmentations for contrastive learning.

free parameters (1)

beta = 2.3
Regularization strength chosen to balance reconstruction quality and disentanglement of factors such as source multiplicity and lobe asymmetry.
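For reference, the standard β-VAE objective is the reconstruction loss plus β times KL(q(z|x) || N(0, I)); with a diagonal Gaussian encoder the KL term has a closed form. The sketch below assumes a Bernoulli (binary cross-entropy) reconstruction term, a common choice that this page does not confirm for the paper.

```python
import numpy as np

def beta_vae_loss(x, x_recon, mu, logvar, beta=2.3, eps=1e-7):
    """Per-image beta-VAE loss: reconstruction term + beta * KL(q(z|x) || N(0, I)).

    Assumes pixel values in [0, 1] and a Bernoulli likelihood (binary cross-entropy).
    """
    x_recon = np.clip(x_recon, eps, 1.0 - eps)
    recon = -np.sum(x * np.log(x_recon) + (1.0 - x) * np.log(1.0 - x_recon))
    # Closed-form KL between diagonal Gaussian N(mu, exp(logvar)) and N(0, I)
    kl = -0.5 * np.sum(1.0 + logvar - mu ** 2 - np.exp(logvar))
    return recon + beta * kl, recon, kl

x = np.full(4, 0.5)
total, recon, kl = beta_vae_loss(x, x, mu=np.array([1.0, 0.0]), logvar=np.zeros(2))
print(round(kl, 3), round(total - recon, 3))  # 0.5 1.15
```

Raising β above 1 penalizes latent usage more heavily, which tends to favour disentangled factors at the cost of blurrier reconstructions; β = 2.3 sits between the two extremes.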

axioms (2)

domain assumption: A beta-VAE latent space can represent continuous transitions in Fanaroff-Riley class identity rather than discrete dimensions.
Invoked when interpreting the learned representations for radio galaxy morphology.
domain assumption: Generative reconstructions from the VAE supply morphological variations complementary to standard geometric and photometric augmentations.
Central premise for combining the two augmentation families.

pith-pipeline@v0.9.0 · 5586 in / 1262 out tokens · 19132 ms · 2026-05-15T20:10:39.868009+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Relation between the paper passage and the cited Recognition theorem.

a β-VAE was trained on the Radio Galaxy Zoo (RGZ) dataset, where moderate regularization (β = 2.3) was found to provide a good balance between reconstruction quality and disentanglement of generative factors such as source multiplicity and lobe asymmetry
IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear

Relation between the paper passage and the cited Recognition theorem.

An analysis of the β-VAE reveals that Fanaroff-Riley class identity manifests as a continuous transition across the latent space, rather than being associated to a single discrete dimension

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.