arxiv: 2601.11689 · v2 · submitted 2026-01-16 · 📡 eess.IV · cs.CV

Bridging Modalities: Joint Synthesis and Registration Framework for Aligning Diffusion MRI with T1-Weighted Images

Xiaofan Wang, Junyi Wang, Yuqian Chen, Lauren J. O'Donnell, Fan Zhang

Pith reviewed 2026-05-16 13:42 UTC · model grok-4.3

keywords multimodal registration · diffusion MRI · image synthesis · unsupervised learning · deformation field · T1-weighted alignment · generative registration

The pith

A joint synthesis-registration network generates T1w-like images from diffusion MRI b0 volumes to convert cross-modal alignment into a standard unimodal registration task.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that generating synthetic images with T1-weighted contrast from diffusion data lets the registration network work entirely within a single contrast domain before applying the learned deformation back to the original diffusion space. This sidesteps the intensity mismatch that usually defeats direct multimodal methods. The network is trained unsupervised by maximizing both local structural similarity between the synthetic and real T1w images and a statistical dependency term that links the two modalities. Experiments on two separate datasets indicate higher accuracy than several existing multimodal registration approaches.
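The two-term objective described above can be illustrated with a small NumPy sketch. The summary does not say which similarity measures the paper uses, so local normalized cross-correlation and histogram mutual information are stand-ins chosen here, and the names `local_ncc`, `mutual_information`, `registration_loss`, and the weight `lam` are hypothetical. The sketch works on 2D arrays and is not differentiable, so it only shows how the terms might be evaluated, not how the network is trained.

```python
import numpy as np


def local_ncc(a, b, win=5, eps=1e-8):
    """Mean squared local normalized cross-correlation over win x win windows."""
    def box_mean(x):
        # Window means from a summed-area table (valid windows only).
        c = np.cumsum(np.cumsum(np.pad(x, ((1, 0), (1, 0))), axis=0), axis=1)
        s = c[win:, win:] - c[:-win, win:] - c[win:, :-win] + c[:-win, :-win]
        return s / (win * win)

    mu_a, mu_b = box_mean(a), box_mean(b)
    var_a = box_mean(a * a) - mu_a ** 2
    var_b = box_mean(b * b) - mu_b ** 2
    cov = box_mean(a * b) - mu_a * mu_b
    return float(np.mean(cov ** 2 / (var_a * var_b + eps)))


def mutual_information(a, b, bins=32):
    """Histogram estimate of mutual information, in nats."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    p_ab = joint / joint.sum()
    p_a = p_ab.sum(axis=1, keepdims=True)   # marginal of a, shape (bins, 1)
    p_b = p_ab.sum(axis=0, keepdims=True)   # marginal of b, shape (1, bins)
    nz = p_ab > 0
    return float(np.sum(p_ab[nz] * np.log(p_ab[nz] / (p_a @ p_b)[nz])))


def registration_loss(warped_synth, warped_b0, fixed_t1w, lam=1.0):
    """Negative similarity (lower is better); `lam` is an assumed weight."""
    structural = local_ncc(warped_synth, fixed_t1w)       # same-contrast term
    dependency = mutual_information(warped_b0, fixed_t1w)  # cross-modal term
    return -(structural + lam * dependency)
```

Pairing the structural term with the synthetic image and the dependency term with the native b0 is one reading of the summary; the paper may combine them differently.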

Core claim

The unsupervised generative registration network first produces a T1w-like image from the diffusion b0 volume, then estimates a deformation field that aligns this synthetic image to the fixed T1w volume; the same deformation is applied to the original diffusion data. Joint optimization of local structural similarity and cross-modal statistical dependency produces the final deformation estimate.
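The synthesize, register, then transfer sequence can be sketched as follows. This is a minimal 2D illustration under stated assumptions: the paper works on 3D volumes, `synthesize` and `estimate_field` are hypothetical callables standing in for the trained networks, and the field is assumed to be a voxel displacement applied with a pull-back (resampling) convention.

```python
import numpy as np


def warp_2d(image, disp):
    """Bilinearly resample `image` at (row + disp[0], col + disp[1]).

    `disp` has shape (2, H, W) in voxel units; sample positions are
    clamped to the image border.
    """
    h, w = image.shape
    rows, cols = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    r = np.clip(rows + disp[0], 0, h - 1)
    c = np.clip(cols + disp[1], 0, w - 1)
    r0 = np.floor(r).astype(int)
    c0 = np.floor(c).astype(int)
    r1 = np.minimum(r0 + 1, h - 1)
    c1 = np.minimum(c0 + 1, w - 1)
    fr, fc = r - r0, c - c0
    top = image[r0, c0] * (1 - fc) + image[r0, c1] * fc
    bottom = image[r1, c0] * (1 - fc) + image[r1, c1] * fc
    return top * (1 - fr) + bottom * fr


def align_b0(b0, t1w, synthesize, estimate_field):
    """Register in the T1w-like domain, then warp the original b0."""
    synth = synthesize(b0)               # hypothetical synthesis model
    disp = estimate_field(synth, t1w)    # hypothetical registration model
    return warp_2d(b0, disp), disp       # same field, native diffusion data
```

The point of the design is that the field is estimated from the synthetic image but applied to the untouched diffusion data, so any geometric drift introduced by synthesis transfers directly into the final alignment.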

What carries the argument

The generative registration network that jointly synthesizes a T1w-like image and learns the deformation field from it to the real T1w image.

If this is right

The learned deformation field can be applied directly to diffusion-derived maps (FA, MD, tractography) to place them in the T1w anatomical space without additional alignment steps.
Because the synthesis step is unsupervised, the framework requires no paired ground-truth deformations for training.
The same joint synthesis-registration pattern can be retrained on other diffusion contrasts or scanner vendors without changing the overall architecture.
Improved alignment accuracy should reduce errors when diffusion metrics are later used for surgical planning or longitudinal studies that also rely on T1w anatomy.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the synthesis step can be made fast enough at inference time, the method could be inserted into existing clinical diffusion pipelines with minimal extra compute.
The same idea might extend to aligning other modality pairs where one contrast is harder to register directly, such as CT to MRI or PET to structural MRI.
A failure mode would appear if the synthetic image introduces spurious structures that the registration network then locks onto, producing systematic bias in the deformation field.

Load-bearing premise

The synthesized T1w-like images preserve enough structural detail that registration errors measured in the synthetic domain correspond to accurate deformations when transferred back to the original diffusion volumes.

What would settle it

A head-to-head test on a new dataset in which a direct multimodal registration method achieves lower target registration error or higher overlap of anatomical landmarks than the proposed synthesis-plus-registration pipeline.
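The two criteria named in such a test are simple to compute. The sketch below uses the standard definitions of Dice overlap and landmark target registration error; the function names and the `spacing` default are assumptions made for illustration, not taken from the paper.

```python
import numpy as np


def dice(seg_a, seg_b, label):
    """Dice overlap of one label between two integer label maps."""
    a, b = seg_a == label, seg_b == label
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else float("nan")


def target_registration_error(moved_pts, fixed_pts, spacing=(1.0, 1.0, 1.0)):
    """Mean Euclidean distance between corresponding landmarks.

    Points are (N, 3) voxel coordinates; `spacing` converts them to mm.
    """
    diff = np.asarray(moved_pts, float) - np.asarray(fixed_pts, float)
    diff = diff * np.asarray(spacing, float)
    return float(np.linalg.norm(diff, axis=1).mean())
```

Both numbers would need to be computed on the warped native diffusion data, not on the synthetic image, for the comparison to address the load-bearing premise.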

Figures

Figures reproduced from arXiv: 2601.11689 by Fan Zhang, Junyi Wang, Lauren J. O'Donnell, Xiaofan Wang, Yuqian Chen.

**Figure 2.** Warped images (row 1, columns 2–6) and the corresponding instance deformation fields

**Figure 3.** Visual comparison of registration methods on two datasets. Each row shows the warped images alongside the …

Original abstract

Multimodal image registration between diffusion MRI (dMRI) and T1-weighted (T1w) MRI images is a critical step for aligning diffusion-weighted imaging (DWI) data with structural anatomical space. Traditional registration methods often struggle to ensure accuracy due to the large intensity differences between diffusion data and high-resolution anatomical structures. This paper proposes an unsupervised registration framework based on a generative registration network, which transforms the original multimodal registration problem between b0 and T1w images into a unimodal registration task between a generated image and the real T1w image. This effectively reduces the complexity of cross-modal registration. The framework first employs an image synthesis model to generate images with T1w-like contrast, and then learns a deformation field from the generated image to the fixed T1w image. The registration network jointly optimizes local structural similarity and cross-modal statistical dependency to improve deformation estimation accuracy. Experiments conducted on two independent datasets demonstrate that the proposed method outperforms several state-of-the-art approaches in multimodal registration tasks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Joint synthesis-registration is a reasonable framing but the abstract gives no numbers or direct checks that the learned warp actually works on the native dMRI data.

The paper's main move is to convert dMRI-to-T1w registration into a unimodal problem by synthesizing a T1-like image from the b0 volume inside the same network that learns the deformation field. They optimize the synthesis and the warp together using local structural similarity plus a cross-modal statistical term. That joint setup is the piece that differs from the usual separate synthesis-then-register pipelines, and it could cut down on error accumulation if it works as intended. On two datasets they claim better results than prior methods, which would be a practical win for pipelines that need reliable alignment for tractography and similar downstream analyses.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes an unsupervised generative registration framework for aligning diffusion MRI (dMRI) b0 volumes with T1-weighted (T1w) images. It first synthesizes T1w-like contrast images from the dMRI data, converts the multimodal problem into unimodal registration between the synthetic image and the real T1w volume, and learns a deformation field that is then applied back to the original dMRI. The registration network jointly optimizes local structural similarity and cross-modal statistical dependency. Experiments on two independent datasets are reported to show that the method outperforms several state-of-the-art multimodal registration approaches.

Significance. If the synthesis step faithfully preserves anatomical geometry and the learned deformations transfer without distortion, the method could simplify and improve accuracy in dMRI-T1w alignment tasks common in neuroimaging pipelines. The unsupervised joint-optimization design and reduction to unimodal registration are conceptually attractive strengths that, if substantiated, would represent a practical advance over intensity-based or mutual-information methods.

major comments (2)

[§4] (Experiments): The central performance claim that the method outperforms SOTA approaches on two datasets is load-bearing, yet the manuscript provides no isolated quantitative validation of synthesis fidelity (e.g., landmark target registration error or Dice overlap between synthesized T1w-like images and real T1w volumes). Without these metrics, it remains unclear whether registration errors measured in the synthetic domain correspond one-to-one with errors on the native dMRI data.
[§3.2] (Registration network): The joint optimization of local structural similarity and cross-modal statistical dependency is presented as key to accurate deformation estimation, but no ablation results isolate the contribution of each term or demonstrate that their combination is necessary for the reported gains over baselines.

minor comments (2)

[Abstract] The performance claim would be strengthened by including at least one key quantitative metric (with error bars or statistical test) rather than a qualitative statement of outperformance.
[§3] Notation: The deformation field φ is introduced without an explicit equation defining its composition with the synthesis operator; adding this would improve clarity when describing how φ is applied back to the original dMRI.
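One form the requested definition could take is sketched below. The symbols are placeholders introduced here, not the paper's notation: $G$ is the synthesis operator, $R$ the registration network, $I_{b0}$ and $I_{T1}$ the moving diffusion volume and fixed T1w volume, and a pull-back convention for $\phi$ is assumed.

```latex
\begin{aligned}
\hat{I}_{T1} &= G(I_{b0}), \\
\phi &= R\big(\hat{I}_{T1},\, I_{T1}\big), \\
I_{b0}^{\mathrm{warped}}(x) &= (I_{b0} \circ \phi)(x) = I_{b0}\big(\phi(x)\big).
\end{aligned}
```

Written this way, $\phi$ is estimated from $\hat{I}_{T1}$ but composed with $I_{b0}$, which makes explicit that synthesis only informs the field and never alters the data that is finally resampled.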

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major comment below, providing our response and indicating planned revisions to strengthen the manuscript.

Referee: [§4] (Experiments): The central performance claim that the method outperforms SOTA approaches on two datasets is load-bearing, yet the manuscript provides no isolated quantitative validation of synthesis fidelity (e.g., landmark target registration error or Dice overlap between synthesized T1w-like images and real T1w volumes). Without these metrics, it remains unclear whether registration errors measured in the synthetic domain correspond one-to-one with errors on the native dMRI data.

Authors: We agree that isolated quantitative validation of synthesis fidelity would provide valuable additional support for the claims. Our primary evaluation metrics focus on end-to-end registration accuracy (e.g., Dice scores on anatomical structures and target registration error where landmarks are available), as these directly measure the utility for dMRI-T1w alignment. However, to address the concern about correspondence between synthetic and native domains, we will add synthesis-specific metrics in the revised manuscript, including SSIM and PSNR computed between synthesized T1w-like images and real T1w volumes on held-out validation data from both datasets. Where anatomical segmentations are available, we will also report Dice overlap between labels derived from the synthesized images and those from real T1w images. These additions will help confirm geometric preservation in the synthesis step and clarify the relationship to registration performance.

revision: yes

Referee: [§3.2] (Registration network): The joint optimization of local structural similarity and cross-modal statistical dependency is presented as key to accurate deformation estimation, but no ablation results isolate the contribution of each term or demonstrate that their combination is necessary for the reported gains over baselines.

Authors: We acknowledge that ablation studies would better isolate the contributions of the individual loss terms and demonstrate the necessity of their joint optimization. The current manuscript emphasizes the overall framework and end-to-end results, but we agree this leaves the design rationale less substantiated. In the revised version, we will include new ablation experiments comparing three variants of the registration network: (1) using only the local structural similarity loss, (2) using only the cross-modal statistical dependency loss, and (3) the full joint optimization. These results will be reported alongside the baseline comparisons to show the incremental gains from each term and confirm that the combination is required to achieve the reported improvements.

revision: yes
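Of the synthesis-fidelity metrics promised in the first response, PSNR is the simplest to state. The sketch below uses the standard definition and assumes the synthesized and real T1w images are already on the same voxel grid; the function name and the default choice of `data_range` are illustrative assumptions. SSIM is more involved (it relies on local windowed statistics) and would normally come from an existing library such as scikit-image.

```python
import numpy as np


def psnr(reference, test, data_range=None):
    """Peak signal-to-noise ratio in dB; higher means closer to the reference."""
    reference = np.asarray(reference, dtype=float)
    test = np.asarray(test, dtype=float)
    if data_range is None:
        # Assumed default: intensity span of the reference image.
        data_range = reference.max() - reference.min()
    mse = np.mean((reference - test) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(data_range ** 2 / mse))
```

PSNR and SSIM measure intensity agreement, so a high score does not by itself establish that anatomical geometry is preserved; the label-based Dice check mentioned in the same response addresses that more directly.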

Circularity Check

0 steps flagged

No circularity: new synthesis-plus-registration pipeline validated empirically on independent data

The manuscript introduces a generative registration network that first synthesizes T1w-like contrast from dMRI b0 volumes and then estimates a deformation field between the synthetic image and the real T1w target; the resulting field is applied back to the original diffusion data. This pipeline is presented as an unsupervised architectural choice rather than a derivation from prior equations. No load-bearing step reduces by construction to a fitted parameter renamed as a prediction, a self-citation chain, or an ansatz smuggled through citation. The reported superiority on two independent datasets rests on direct experimental comparison, not on tautological re-expression of the input data or self-referential definitions. The framework therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are stated. The approach implicitly relies on standard assumptions of deep generative models (e.g., that adversarial or reconstruction losses produce anatomically plausible contrast) and diffeomorphic registration (smooth invertible deformations), but these are not enumerated.
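The smooth-invertibility assumption mentioned here can be checked numerically on any predicted field by looking at the Jacobian determinant of the mapping. The sketch below is a 2D illustration, assuming a voxel displacement field of shape (2, H, W); the function names are hypothetical and nothing in the reviewed material says the paper reports this diagnostic.

```python
import numpy as np


def jacobian_determinant_2d(disp):
    """Determinant of the Jacobian of x -> x + disp(x) on a 2D grid."""
    d0_dr, d0_dc = np.gradient(disp[0])   # derivatives of the row displacement
    d1_dr, d1_dc = np.gradient(disp[1])   # derivatives of the column displacement
    return (1.0 + d0_dr) * (1.0 + d1_dc) - d0_dc * d1_dr


def folding_fraction(disp):
    """Fraction of voxels with non-positive determinant (local folding)."""
    return float(np.mean(jacobian_determinant_2d(disp) <= 0))
```

A folding fraction of zero is consistent with a locally invertible field, while values above zero flag regions where the warp would tear or overlap tissue.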
