arxiv: 2604.09787 · v1 · submitted 2026-04-10 · 🌌 astro-ph.IM · astro-ph.GA· cs.LG

Learning What's Real: Disentangling Signal and Measurement Artifacts in Multi-Sensor Data, with Applications to Astrophysics

Pablo Mercader-Perez , Carolina Cuesta-Lazaro , Daniel Muthukrishna , Jeroen Audenaert , V. Ashley Villar , David W. Hogg , Marc Huertas-Company , William T. Freeman This is my paper

Pith reviewed 2026-05-10 15:55 UTC · model grok-4.3

classification 🌌 astro-ph.IM astro-ph.GAcs.LG

keywords disentangled representationsmulti-sensor datacounterfactual generationastrophysicsgalaxy imagingself-supervised learninginstrument artifactsdual encoder

0 comments p. Extension

The pith

Overlapping observations from different instruments train a model to isolate intrinsic galaxy signals from sensor artifacts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a deep learning method that trains on pairs of images showing the same galaxies but captured by different telescopes. A dual-encoder network plus a counterfactual generation objective forces the shared representation to capture only the properties that remain unchanged across instruments while routing sensor-specific effects elsewhere. If the separation succeeds, downstream tasks such as estimating galaxy parameters or finding similar objects no longer mix measurement distortions with the underlying physics. The approach is shown on galaxy images from two large imaging surveys and framed as a general template for scientific self-supervised learning.

Core claim

A dual-encoder architecture trained with a counterfactual generation objective on overlapping multi-instrument observations produces representations that explicitly separate intrinsic signals from sensor-specific distortions and noise. These representations support generating counterfactual images as if observed by the alternate instrument, performing parameter inference unconfounded by measurement artifacts, and conducting instrument-independent similarity searches. The method treats sensor effects as augmentations and constructs training pairs directly from overlapping observations of the same physical objects.

What carries the argument

Dual-encoder architecture with counterfactual generation objective that treats sensor-specific effects as augmentations on overlapping observations of identical objects.

If this is right

Counterfactual images can be generated to show how the same galaxy would appear under a different sensor.
Parameter inference on galaxy properties can proceed without confounding from instrument-specific distortions.
Similarity searches for galaxies become independent of which instrument recorded the data.
The same training recipe applies to other scientific multi-modal settings by constructing pairs from overlapping observations and treating sensor differences as augmentations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same separation could be applied in other domains that collect overlapping multi-sensor measurements of the same targets, such as combining satellite and ground-based observations.
If the learned representations prove fully invariant, a single downstream model could be trained once and deployed on data from any future instrument without retraining.
Direct comparison of counterfactual generations against new overlapping observations provides an ongoing, label-free test of whether the separation remains reliable as surveys expand.

Load-bearing premise

Overlapping observations of the same physical objects across instruments contain enough shared signal for the model to isolate sensor artifacts through counterfactual training without explicit artifact labels.

What would settle it

After training, generate counterfactual images of held-out galaxies as they would appear under the second instrument and compare them quantitatively to the actual second-instrument observations; large systematic mismatches beyond noise levels would show the disentanglement is incomplete.

Figures

Figures reproduced from arXiv: 2604.09787 by Carolina Cuesta-Lazaro, Daniel Muthukrishna, David W. Hogg, Jeroen Audenaert, Marc Huertas-Company, Pablo Mercader-Perez, V. Ashley Villar, William T. Freeman.

**Figure 1.** Figure 1: Counterfactual Reconstruction Framework. The model learns to disentangle intrinsic galaxy properties from instrument systematics by reconstructing an anchor image via a dual-encoder architecture and conditional flow matching. Training uses data triplets consisting of: an anchor observation (signal s from instrument i), an instrument-augmented observation (same source s, a different instrument i ′ ), and a … view at source ↗

**Figure 2.** Figure 2: Multi-instrument Galaxy Reconstructions. Columns 1–2 (Input Conditioning): The target galaxy observed via an alternate instrument (input to the physics encoder) and a set of up to five different galaxies imaged by the target instrument (input to the instrument encoder). Column 3 (Ground Truth): The original anchor (target) image, withheld from the encoders. Columns 4–5 (Posterior Samples): Independent samp… view at source ↗

**Figure 3.** Figure 3: Left: Pixel-wise Z-score distribution of generated posterior samples relative to groundtruth target images, across all pixels and held-out galaxies. For each pixel, x is the true value for a given pixel, xˆ is the predicted one, E[ˆx] is the posterior sample mean over pixels and galaxies and std(ˆx) is the posterior sample standard deviation. Both HSC-anchored and Legacy-anchored reconstructions closely a… view at source ↗

**Figure 4.** Figure 4: Latent Space Disentanglement. UMAP projections of the physics (left) and instrument (right) latent spaces. Orange and blue points represent HSC and Legacy images, respectively. Matched markers (△, ×, □, ◦) denote cross-survey pairs of the same galaxy. In the physics space, the encoders produce overlapping distributions where cross-survey pairs are mapped to similar coordinates. In contrast, the instrumen… view at source ↗

**Figure 5.** Figure 5: Probing Latent Disentanglement via Downstream Regression. We report R2 scores for the prediction of physics-related properties (left) and instrumental properties (right) on four sets of representations: our physics encoder latents (blue), the instrument encoder latents (red), AION-1 embeddings which treat each survey as an independent modality (green), and a randomly initialized ResNet-18 of the same archi… view at source ↗

**Figure 6.** Figure 6: Instrument-Invariant Nearest-Neighbor Search. We evaluate the disentangled latent spaces by performing nearest neighbor retrieval using paired HSC and Legacy observations as queries. Physics Space Retrieval: For both queries, the corresponding pair from the alternate survey is identified as the top-1 neighbor (red). Additional top-ranked neighbors are physically similar galaxies from a mixture of both ins… view at source ↗

**Figure 7.** Figure 7: Spatial Structure Preservation. Power spectrum (left) and autocorrelation (right) per band for ground-truth and generated images. Each plot shows the true value (blue), the posterior mean from 32 generated samples (orange dashed line), and the 1σ posterior standard deviation (orange shaded region). Strong agreement across spatial frequencies and pixel separation (lag) distances indicates that the model cap… view at source ↗

read the original abstract

Data collected from the physical world is always a combination of multiple sources: an underlying signal from the physical process of interest and a signal from measurement-dependent artifacts from the sensor or instrument. This secondary signal acts as a confounding factor, limiting our ability to extract information about the physics underlying the phenomena we observe. Furthermore, it complicates the combination of observations in heterogeneous or multi-instrument settings. We propose a deep learning framework that leverages overlapping observations, a dual-encoder architecture, and a counterfactual generation objective to disentangle these factors of variation. The resulting representations explicitly separate intrinsic signals from sensor-specific distortions and noise, and can be used for counterfactual view generation, parameter inference unconfounded by measurement distortions, and instrument-independent similarity search. We demonstrate the effectiveness of our approach on astrophysical galaxy images from the DESI Legacy Imaging Survey (Legacy) and the Hyper Suprime-Cam (HSC) Survey as a representative multi-instrument setting. This framework provides a general recipe for scientific and multi-modal self-supervised pretraining: construct training pairs from overlapping observations of the same physical system, treat sensor- or modality-specific effects as augmentations, and learn invariant representations through counterfactual generation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper proposes a dual-encoder model with a counterfactual loss to separate galaxy signals from survey-specific artifacts using overlapping Legacy and HSC observations, but the abstract supplies no results or checks to show it works.

read the letter

The core idea is straightforward: train two encoders on paired galaxy images from different instruments so one latent captures the shared physical signal while the other absorbs sensor distortions, then use a counterfactual generation objective to make the separation explicit. This lets the model produce what a galaxy would look like under the other instrument and supports downstream tasks like bias-free inference or cross-survey matching. The approach is new in its direct application to multi-survey astrophysics imaging rather than generic domain adaptation, and the self-supervised recipe of treating overlaps as natural pairs is a clean way to avoid labeled artifact data. It does well at laying out a practical workflow that could extend to other multi-instrument settings where perfect calibration is impossible. The framing is honest about the confounding role of measurement effects and avoids overclaiming generality beyond the data-fusion use case. The main weakness is the complete absence of any quantitative evidence. The abstract asserts effectiveness on Legacy and HSC galaxies yet shows no metrics, ablations, or even basic reconstruction quality, so it is impossible to tell whether the separation is real or trivial. The stress-test concern lands: differences between overlapping fields include real variations in depth, seeing, and filter response that affect morphology and flux, not just artifacts. Without an explicit invariance term forcing the shared latent to reconstruct both views after artifact removal, the model can satisfy the loss by routing genuine signal differences into the sensor branch, which would break the claimed uses for unconfounded inference and instrument-independent search. The assumption that all observed differences are purely artifactual is therefore fragile and needs direct testing. This work is aimed at astronomers and machine-learning researchers who combine heterogeneous survey data and want self-supervised pretraining ideas. A reader looking for concrete recipes in multi-modal scientific settings could extract useful structure from the method description even if the experiments are missing. It deserves serious peer review because the problem is well-posed and the proposed architecture is a reasonable starting point, but only once the authors supply the missing validation and address the identifiability issue. I would send it out with a request for those checks rather than desk-rejecting it outright.

Referee Report

2 major / 0 minor

Summary. The paper proposes a self-supervised deep learning framework to disentangle intrinsic astrophysical signals from sensor-specific artifacts and noise in multi-instrument data. It employs overlapping observations of the same galaxies from the DESI Legacy Imaging Survey and Hyper Suprime-Cam Survey, a dual-encoder architecture with explicit signal and artifact branches, and a counterfactual generation objective. The resulting representations are claimed to support counterfactual view generation, parameter inference unconfounded by measurement distortions, and instrument-independent similarity search, and are positioned as a general recipe for scientific multi-modal pretraining by treating sensor effects as augmentations.

Significance. If the separation of intrinsic signal from artifacts can be achieved reliably without trivial solutions, the framework would be significant for astrophysics and other multi-sensor domains. It offers a practical way to leverage existing overlapping observations for artifact-robust representations, potentially improving cross-survey consistency, enabling more reliable downstream inference, and providing a template for self-supervised learning where paired views of the same physical system are available.

major comments (2)

[Method section] Method section (dual-encoder + counterfactual setup): The objective assumes any difference between paired Legacy/HSC observations of the same galaxy is purely sensor artifact. In practice, differences in depth, seeing, and filter transmission can alter observed morphology and flux distributions themselves. Without an explicit invariance penalty or reconstruction term forcing the shared latent to reconstruct both views after artifact removal, the optimization can satisfy the loss by routing real signal variations into sensor-specific branches, making the claimed separation non-unique and downstream uses (unconfounded inference, instrument-independent search) unreliable.
[Abstract and results] Abstract and results: The manuscript asserts effectiveness on Legacy and HSC galaxy images for the listed tasks but supplies no quantitative metrics, ablation studies, error analysis, baseline comparisons, or implementation details. This absence prevents evaluation of whether the disentanglement holds or whether the representations deliver the claimed benefits.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which highlight important aspects of our framework's robustness and evaluation. We address each major comment below and outline revisions to strengthen the manuscript.

read point-by-point responses

Referee: [Method section] Method section (dual-encoder + counterfactual setup): The objective assumes any difference between paired Legacy/HSC observations of the same galaxy is purely sensor artifact. In practice, differences in depth, seeing, and filter transmission can alter observed morphology and flux distributions themselves. Without an explicit invariance penalty or reconstruction term forcing the shared latent to reconstruct both views after artifact removal, the optimization can satisfy the loss by routing real signal variations into sensor-specific branches, making the claimed separation non-unique and downstream uses (unconfounded inference, instrument-independent search) unreliable.

Authors: We appreciate this observation on potential degeneracies in the optimization. Our dual-encoder design with explicit signal and artifact branches, combined with the counterfactual generation objective, is intended to isolate invariant intrinsic signals by training the shared branch to produce consistent representations across instruments. However, we acknowledge that an explicit reconstruction or invariance term would further constrain the solution space and reduce the risk of signal leakage into artifact branches. In the revised manuscript, we will incorporate an additional reconstruction loss requiring the shared latent (after artifact removal) to reconstruct both input views, along with an invariance penalty on the shared representations for paired observations. This will be detailed in an updated Method section with accompanying equations and ablation results demonstrating its impact. revision: yes
Referee: [Abstract and results] Abstract and results: The manuscript asserts effectiveness on Legacy and HSC galaxy images for the listed tasks but supplies no quantitative metrics, ablation studies, error analysis, baseline comparisons, or implementation details. This absence prevents evaluation of whether the disentanglement holds or whether the representations deliver the claimed benefits.

Authors: We agree that the current manuscript would benefit from more rigorous quantitative support to substantiate the claims. The provided abstract and results emphasize the conceptual framework and qualitative examples of counterfactual generation and similarity search. In the revision, we will expand the Results section to include quantitative metrics (e.g., accuracy and consistency scores for instrument-independent retrieval, mean squared error on unconfounded parameter inference tasks), ablation studies (removing the counterfactual objective, artifact branch, or shared encoder), error analysis (including variance across galaxy types and noise levels), baseline comparisons (e.g., against standard contrastive methods like SimCLR or autoencoders without disentanglement), and full implementation details (hyperparameters, training protocol, and code availability). These additions will allow direct assessment of the disentanglement quality and downstream utility. revision: yes

Circularity Check

0 steps flagged

No circularity: separation learned via external paired data and self-supervised objective

full rationale

The paper defines a dual-encoder architecture trained on overlapping multi-instrument observations (Legacy/HSC galaxy images) using a counterfactual generation objective to produce representations that separate intrinsic signals from sensor artifacts. This chain is self-contained: inputs are real paired observations of the same physical objects, the model is optimized end-to-end on a reconstruction-style loss that does not presuppose the target separation, and downstream uses (counterfactual generation, unconfounded inference) follow directly from the learned latents. No equation reduces the claimed disentanglement to a fitted parameter or self-citation that is itself defined by the same result; the method does not rename or smuggle in prior results by construction. The reader's score of 2.0 is consistent with minor self-citation risk at most, but none is load-bearing here.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 1 invented entities

The central claim rests on the existence of overlapping observations across instruments and the assumption that sensor effects behave like learnable augmentations separable by counterfactual objectives.

free parameters (1)

counterfactual loss weighting hyperparameters
The framework balances reconstruction, counterfactual, and invariance terms; specific weights are required but not detailed in the abstract.

axioms (1)

domain assumption Overlapping observations of identical physical objects exist between the two surveys
The training pairs are constructed from shared sky regions in DESI Legacy and HSC data.

invented entities (1)

dual-encoder architecture with explicit signal and artifact branches no independent evidence
purpose: To produce representations that isolate intrinsic signals from sensor distortions
Introduced as the core architectural choice of the framework.

pith-pipeline@v0.9.0 · 5555 in / 1410 out tokens · 52159 ms · 2026-05-10T15:55:20.714668+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

4 extracted references · 4 canonical work pages · 1 internal anchor

[1]

J., Koch, D., Basri, G., et al

doi: 10.1126/science.1185402. Yezhen Cong, Samar Khanna, Chenlin Meng, Patrick Liu, Erik Rozi, Yutong He, Marshall Burke, David Lobell, and Stefano Ermon. Satmae: Pre-training transformers for temporal and multi- spectral satellite imagery.Advances in Neural Information Processing Systems, 35:197–211,

work page doi:10.1126/science.1185402
[2]

11 Published at The 2nd Workshop on Foundation Models for Science at ICLR 2026 Remi Denton and Vighnesh Birodkar

URLhttps://arxiv.org/abs/2207.08051. 11 Published at The 2nd Workshop on Foundation Models for Science at ICLR 2026 Remi Denton and Vighnesh Birodkar. Unsupervised Learning of Disentangled Representations from Video, May 2017. URLhttp://arxiv.org/abs/1705.10915. arXiv:1705.10915 [cs]. Arjun Dey, David J. Schlegel, Dustin Lang, Robert Blum, Kaylan Burleigh...

work page arXiv 2026
[3]

J., Lang, D., et al

doi: 10.3847/1538-3881/ab089d. Arjun Dey, David J. Schlegel, Dustin Lang, Robert Blum, Kaylan Burleigh, Xiaohui Fan, Joseph R. Findlay, Doug Finkbeiner, David Herrera, St´ephanie Juneau, Martin Landriau, Michael Levi, Ian McGreer, Aaron Meisner, Adam D. Myers, John Moustakas, Peter Nugent, Anna Patej, Edward F. Schlafly, Alistair R. Walker, Francisco Vald...

work page doi:10.3847/1538-3881/ab089d 2026
[4]

Gravitational-Wave Parameter Estimation in non-Gaussian noise using Score-Based Likelihood Characterization

URLhttps://openreview.net/forum?id=Sy2fzU9gl. 13 Published at The 2nd Workshop on Foundation Models for Science at ICLR 2026 David G. Koch, William J. Borucki, Gibor Basri, Natalie M. Batalha, Timothy M. Brown, Dou- glas Caldwell, Jørgen Christensen-Dalsgaard, William D. Cochran, Edna DeV ore, Edward W. Dunham, III Gautier, Thomas N., John C. Geary, Ronal...

work page internal anchor Pith review Pith/arXiv arXiv doi:10.1088/2041-8205/713/2/l79 2026