MRecover: A Conditional Generative Model for Recovering Motion-Corrupted MR images Using AI Generated Contrast


arxiv: 2605.21669 · v1 · pith:BU2DFSKInew · submitted 2026-05-20 · 💻 cs.CV · cs.AI


Jinghang Li, Tales Santini, Courtney Clark, Bruno de Almeida, Cong Chu, Salem Alkhateeb, Andrea Sajewski, Jacob Berardinelli, Hecheng Jin, Tobias Campos, Jeremy J. Berardo, Joseph Mettenburg, Ariel Gildengers, Howard J. Aizenstein, Minjie Wu, Tamer S. Ibrahim


Pith reviewed 2026-05-22 09:06 UTC · model grok-4.3

classification 💻 cs.CV cs.AI

keywords MRI image synthesis · motion artifact recovery · hippocampal subfield segmentation · conditional generative model · T1 to TSE synthesis · Alzheimer imaging · data recovery


The pith

A conditional generative model turns routine T1-weighted scans into high-resolution T2-weighted TSE images that can stand in for motion-corrupted acquisitions in hippocampal subfield analysis.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces MRecover, a model that synthesizes TSE images from T1w inputs and uses autoregressive slice conditioning to improve volumetric consistency. Trained on 7T data, it reaches high in-domain fidelity scores (SSIM 0.84, FSIM 0.94). On out-of-domain 3T scans, it produces subfield volumes that correlate strongly with those measured on acquired images. In the motion-affected ADNI3 dataset, this yields 31.8 percent more subjects passing quality control and larger effect sizes when diagnostic groups are compared on hippocampal atrophy.
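The methods excerpt quoted in the reference graph at the end of this page describes a flow-matching training objective with autoregressive conditioning, adapted from a MONAI diffusion model. The NumPy snippet below is a rough illustration only. It computes one Monte Carlo estimate of a conditional flow-matching loss. The straight-line path is an assumption, and so is the choice of conditioning inputs (the co-registered T1w slice plus an adjacent TSE slice). The `velocity_fn` network is hypothetical. None of this is the paper's published code.

```python
import numpy as np


def flow_matching_loss(velocity_fn, x1, t1w_slice, prev_tse_slice, rng):
    """Single-sample conditional flow-matching loss (illustrative sketch).

    velocity_fn:    hypothetical network v(x_t, t, cond) returning an array shaped like x1
    x1:             clean target TSE slice, 2-D array
    t1w_slice:      co-registered T1w slice used as conditioning (assumption)
    prev_tse_slice: adjacent TSE slice used for autoregressive conditioning (assumption)
    rng:            numpy Generator supplying the noise sample and the time t
    """
    x0 = rng.standard_normal(x1.shape)             # noisy source sample
    t = rng.uniform()                               # time drawn from [0, 1)
    xt = (1.0 - t) * x0 + t * x1                    # point on the straight path from x0 to x1
    target_velocity = x1 - x0                       # dx_t/dt along that path
    cond = np.stack([t1w_slice, prev_tse_slice])    # conditioning stacked as channels
    pred = velocity_fn(xt, t, cond)
    return float(np.mean((pred - target_velocity) ** 2))
```

You can sanity-check the wiring with an oracle. If a velocity function recovers the true velocity from the interpolated point as `(x1 - xt) / (1 - t)`, the loss drops to zero. A real network replaces that oracle.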

Core claim

MRecover is a conditional generative model that synthesizes T2-weighted turbo spin echo (TSE) images from T1-weighted inputs using autoregressive slice conditioning. Trained on 7T data (n=577), it attains SSIM of 0.84 and FSIM of 0.94 on in-domain validation data (n=148). On 416 out-of-domain 3T cases, subfield volumes from synthesized images correlate at r=0.87-0.97 with those from acquired images. In ADNI3 it recovers 593 analyzable subjects versus 450. Whole-hippocampus effect sizes (ε²) for diagnostic group differences rise from 0.086 and 0.062 to 0.121 and 0.100 (left and right hemispheres).
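The headline percentage and the number of recovered subjects follow directly from the reported post-QC counts. A short Python check:

```python
def relative_gain_percent(n_after, n_before):
    """Percentage increase from n_before to n_after."""
    return 100.0 * (n_after - n_before) / n_before


# Post-QC ADNI3 counts as reported: 593 with synthesized TSE, 450 with acquired TSE alone
extra_subjects = 593 - 450
gain = relative_gain_percent(593, 450)
print(extra_subjects, round(gain, 1))  # prints: 143 31.8
```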

What carries the argument

MRecover, a conditional generative model that uses autoregressive slice conditioning to synthesize TSE contrast from T1w images while preserving volumetric consistency across slices.
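At inference time, autoregressive slice conditioning can be pictured as a loop over slices. Each TSE slice is integrated from noise to image along the learned velocity field, conditioned on its T1w slice and on the slice just synthesized. The sketch below rests on several assumptions: explicit Euler integration, a zero image standing in as the "previous slice" for the first slice, and a hypothetical `velocity_fn`. This page does not state the paper's actual solver, step count, or conditioning layout.

```python
import numpy as np


def synthesize_volume(velocity_fn, t1w_volume, n_steps=20, seed=0):
    """Slice-by-slice synthesis with autoregressive conditioning (sketch).

    t1w_volume:  array of shape (n_slices, H, W)
    velocity_fn: hypothetical trained network v(x_t, t, cond)
    Each slice integrates dx/dt = v from Gaussian noise (t=0) to an image (t=1)
    with explicit Euler steps. It is conditioned on the matching T1w slice and
    on the previously synthesized TSE slice.
    """
    rng = np.random.default_rng(seed)
    n_slices, h, w = t1w_volume.shape
    out = np.zeros_like(t1w_volume, dtype=float)
    prev = np.zeros((h, w))                     # first slice has no predecessor (assumption)
    dt = 1.0 / n_steps
    for k in range(n_slices):
        cond = np.stack([t1w_volume[k], prev])
        x = rng.standard_normal((h, w))         # start from noise
        for step in range(n_steps):
            t = step * dt
            x = x + dt * velocity_fn(x, t, cond)
        out[k] = x
        prev = x                                # feed this slice to the next one
    return out
```

In this sketch, `prev` is the only coupling between slices. That argument is what separates it from independent 2-D synthesis, and `n_steps` trades speed for integration accuracy.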

If this is right

Motion-corrupted datasets such as ADNI3 retain 31.8 percent more subjects after quality control when synthesized images are used.
The larger sample retained after quality control is credited with larger effect sizes for detecting diagnostic group differences in hippocampal subfield atrophy.
The model generalizes from 7T training data to 3T clinical scans without retraining.
Volume measurements extracted from synthesized images match those from acquired images at correlations of 0.87 to 0.97.
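Agreement of this kind is usually summarized as a Pearson correlation between per-subject volumes from the two image types. A dependency-free sketch follows. The volumes in the example are made up for illustration. Pearson r does not change under a constant offset or a positive rescaling, so a high r alone does not rule out systematic volume bias.

```python
from math import sqrt


def pearson_r(a, b):
    """Pearson correlation between two equally long sequences of numbers."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


# Hypothetical subfield volumes (mm^3) for five subjects
acquired = [610.0, 655.0, 580.0, 702.0, 630.0]
biased = [1.2 * v + 50.0 for v in acquired]   # systematically inflated copy
print(round(pearson_r(acquired, biased), 6))  # prints: 1.0
```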

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If boundary fidelity holds beyond volume totals, the approach could support more precise longitudinal tracking of subfield atrophy rates.
Wider adoption might reduce repeat-scan rates for motion-sensitive sequences in memory-clinic workflows.
Similar synthesis pipelines could be tested on other motion-vulnerable contrasts to improve overall MRI data yield.

Load-bearing premise

That close agreement in measured subfield volumes between synthesized and acquired images guarantees that fine anatomical boundaries have been recovered accurately enough for reliable segmentation.

What would settle it

Expert manual segmentation of motion-free 3T TSE images versus the same subjects' synthesized versions, checking whether boundary placement errors exceed the precision needed to detect the reported atrophy differences.
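The standard per-label check for such a comparison is the Dice overlap. Below is a minimal NumPy sketch. It assumes both segmentations are integer label maps on the same voxel grid. The label IDs in the example are hypothetical and do not correspond to any particular atlas.

```python
import numpy as np


def dice_per_label(seg_ref, seg_test, labels):
    """Dice coefficient 2|A & B| / (|A| + |B|) for each label in two label maps."""
    seg_ref, seg_test = np.asarray(seg_ref), np.asarray(seg_test)
    if seg_ref.shape != seg_test.shape:
        raise ValueError("label maps must share a grid")
    scores = {}
    for lab in labels:
        a, b = seg_ref == lab, seg_test == lab
        denom = int(a.sum() + b.sum())
        # Dice is undefined when the label is absent from both maps
        scores[lab] = 2.0 * int(np.logical_and(a, b).sum()) / denom if denom else float("nan")
    return scores


ref = np.array([[0, 1, 1], [0, 2, 2]])   # hypothetical labels 1 and 2 stand in for two subfields
syn = np.array([[0, 1, 0], [2, 2, 2]])
print(dice_per_label(ref, syn, [1, 2]))
```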

Original abstract

Hippocampal subfield segmentation requires high-resolution T2w turbo spin echo (TSE) MRI, yet this sequence is susceptible to motion artifacts, leading to substantial data loss. We developed a conditional generative model (MRecover) that synthesizes TSE images from routinely acquired T1w images, using autoregressive slice conditioning for volumetric consistency. Trained on 7T MRI data (n=577), the model achieved high in-domain fidelity (n=148, SSIM=0.84, FSIM=0.94) and generalized well to out-of-domain 3T data. Subfield volumes from synthesized and as-acquired images closely matched (n=416, r=0.87-0.97). Synthesis also yielded 31.8% more analyzable subjects in the motion-affected ADNI3 dataset after quality control (593 vs 450). By increasing the sample size, the synthesized images also yielded larger effect sizes for diagnostic group differences in hippocampal subfield atrophy (whole hippocampus $\epsilon^2$ = 0.121-0.100 vs. 0.086-0.062, left-right hemispheres). Project page: https://jinghangli98.github.io/MRecover/
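For orientation, the SSIM formula combines luminance, contrast, and structure terms. The sketch below evaluates it once over whole images, a "global" SSIM. The standard index cited by the authors averages the same expression over local windows. This version will therefore not reproduce the reported 0.84; it only shows what is being measured. The constants follow the usual K1 = 0.01, K2 = 0.03 convention.

```python
import numpy as np


def global_ssim(x, y, data_range=1.0):
    """SSIM evaluated over the whole image instead of local windows (simplified)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    c1 = (0.01 * data_range) ** 2          # stabilizes the luminance term
    c2 = (0.03 * data_range) ** 2          # stabilizes the contrast/structure term
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(num / den)


rng = np.random.default_rng(0)
img = rng.uniform(size=(64, 64))
noisy = np.clip(img + 0.1 * rng.standard_normal(img.shape), 0.0, 1.0)
print(global_ssim(img, img), global_ssim(img, noisy))
```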

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

MRecover gives a practical way to synthesize TSE images from T1w scans and recover more subjects for hippocampal subfield analysis, but volume correlations on 3T data do not confirm accurate boundary recovery for segmentation.


The main point is that MRecover can synthesize TSE images from T1w scans to salvage motion-corrupted data. It boosts the number of analyzable subjects in datasets like ADNI3 by nearly a third and yields larger effect sizes for subfield atrophy differences.

The work does a solid job on the application side. Training on 7T data gives good in-domain fidelity scores, and generalization to 3T shows strong volume correlations between synthesized and real images. This translates into 143 more subjects after quality control. It also gives larger whole-hippocampus effect sizes: ε² is roughly 0.07 before versus 0.11 after, averaged across hemispheres. The autoregressive conditioning helps with consistency across slices, a practical touch for volumetric data. Credit where due: the metrics are concrete and the dataset gain is real-world.

The soft spot is the reliance on volume correlations for the out-of-domain test. Those r values of 0.87-0.97 are encouraging for overall volumes, but subfield segmentation depends on precise boundaries. Without Dice scores or similar on the 3T set, the model could get volumes right while distorting local anatomy in ways that matter for atrophy detection. The 7T SSIM and FSIM scores don't settle this either, since they are in-domain, image-level measures.

This is for people in clinical neuroimaging who need more power in hippocampal subfield studies and face motion issues with TSE sequences. A reader looking for methods to recover lost data would get something usable here. It deserves a serious referee: the core idea is grounded and the results point to a real benefit, though more boundary-specific validation would help. I would recommend sending it for peer review to get input on the generalization and boundary accuracy questions.

Referee Report

2 major / 2 minor

Summary. The paper introduces MRecover, a conditional generative model that synthesizes high-resolution T2-weighted TSE images from routinely acquired T1-weighted images with autoregressive slice conditioning to recover motion-corrupted data for hippocampal subfield segmentation. Trained on 7T data (n=577), it reports in-domain fidelity metrics (SSIM=0.84, FSIM=0.94 on n=148) and out-of-domain generalization to 3T data with subfield volume correlations (r=0.87-0.97 on n=416). Application to the motion-affected ADNI3 dataset increases the number of analyzable subjects from 450 to 593 after quality control and produces larger effect sizes for diagnostic differences in hippocampal subfield atrophy (whole hippocampus ε²=0.121-0.100 versus 0.086-0.062 for left-right hemispheres).
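This page does not say which epsilon-squared estimator the authors used. Assuming the one-way ANOVA form, for k groups the statistic is ε² = (SS_between - (k - 1)·MS_within) / (SS_total + MS_within). It can be computed as shown below. A rank-based (Kruskal-Wallis) ε² is another common choice and would give different values. The example groups are synthetic.

```python
def epsilon_squared(groups):
    """One-way ANOVA epsilon-squared effect size for a list of groups (assumed estimator)."""
    values = [v for g in groups for v in g]
    n, k = len(values), len(groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more observations than groups")
    grand_mean = sum(values) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    ms_within = (ss_total - ss_between) / (n - k)
    return (ss_between - (k - 1) * ms_within) / (ss_total + ms_within)


# Synthetic whole-hippocampus volumes (mm^3) for three hypothetical diagnostic groups
cn = [3510.0, 3620.0, 3480.0, 3555.0]
mci = [3400.0, 3450.0, 3390.0, 3470.0]
ad = [3150.0, 3240.0, 3200.0, 3100.0]
print(round(epsilon_squared([cn, mci, ad]), 3))
```

Unlike raw eta-squared, this estimator subtracts the between-group variance expected by chance. For that reason it can come out slightly negative when groups do not differ.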

Significance. If the synthesized images support accurate subfield segmentation, the method could meaningfully reduce data loss in motion-sensitive MRI protocols and increase statistical power for detecting atrophy patterns in Alzheimer's and related studies. The reported gains in sample size and effect sizes on a real-world dataset like ADNI3 indicate practical utility for neuroimaging pipelines that rely on TSE sequences.

major comments (2)

[Out-of-domain 3T evaluation] Out-of-domain 3T evaluation (n=416): subfield volume correlations (r=0.87-0.97) are reported between synthesized and acquired images, but no Dice scores, Hausdorff distances, or boundary-error statistics are provided for the subfield segmentations on this test set. Because the headline claims of +31.8% more analyzable subjects and larger effect sizes (ε²=0.121-0.100) rest on the assumption that local anatomical boundaries are recovered faithfully rather than merely matching coarse volumes, the absence of these metrics leaves the downstream segmentation reliability unverified.
[ADNI3 application] ADNI3 application results: the increase from 450 to 593 analyzable subjects and the reported improvement in group-difference effect sizes are presented as direct benefits of the synthesized images, yet these conclusions depend on the untested premise that subfield segmentations on the additional subjects reflect true anatomy rather than plausible but distorted boundaries that could inflate or attenuate the observed ε² values.
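The referee asks for boundary statistics. One such statistic is the symmetric Hausdorff distance, which takes the worst-case nearest-boundary distance in both directions. Below is a brute-force sketch over boundary point lists. It assumes the coordinates are already scaled to millimetres by the voxel spacing. Practical pipelines often report a 95th-percentile variant instead, which this sketch does not implement.

```python
from math import dist


def directed_hausdorff(points_a, points_b):
    """Largest distance from a point in points_a to its nearest point in points_b."""
    return max(min(dist(p, q) for q in points_b) for p in points_a)


def hausdorff(points_a, points_b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    if not points_a or not points_b:
        raise ValueError("both point sets must be non-empty")
    return max(directed_hausdorff(points_a, points_b), directed_hausdorff(points_b, points_a))


acquired_edge = [(0.0, 0.0), (1.0, 0.0)]       # hypothetical boundary points (mm)
synthesized_edge = [(0.0, 0.0), (3.0, 0.0)]
print(hausdorff(acquired_edge, synthesized_edge))  # prints: 2.0
```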

minor comments (2)

[Abstract] The title refers to 'AI Generated Contrast', while the abstract and method description speak of a conditional generative model with autoregressive slice conditioning; a brief clarification of terminology would improve consistency.
[Methods] Details on subject-level train/validation/test splits for the 7T training data (n=577) and any steps taken to avoid leakage across slices or subjects would strengthen reproducibility claims.
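One common safeguard against the leakage the referee mentions is to split at the subject level, so that all slices of a subject land in the same partition. A minimal sketch follows. The 70/15/15 fractions and the subject-ID format are placeholders, not the paper's protocol.

```python
import random


def split_by_subject(subject_ids, fractions=(0.7, 0.15, 0.15), seed=0):
    """Assign whole subjects, never individual slices, to train/val/test sets.

    fractions are placeholder values, not taken from the paper.
    """
    unique = sorted(set(subject_ids))
    random.Random(seed).shuffle(unique)            # reproducible shuffle
    n = len(unique)
    n_train = round(fractions[0] * n)
    n_val = round(fractions[1] * n)
    return {
        "train": set(unique[:n_train]),
        "val": set(unique[n_train:n_train + n_val]),
        "test": set(unique[n_train + n_val:]),      # remainder goes to test
    }


# Hypothetical slice table: 20 subjects with 5 slices each
slice_subjects = [f"sub-{i:03d}" for i in range(20) for _ in range(5)]
splits = split_by_subject(slice_subjects)
print({name: len(ids) for name, ids in splits.items()})  # prints: {'train': 14, 'val': 3, 'test': 3}
```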

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for their constructive comments, which have helped us improve the manuscript. We address the major comments point by point below.


Referee: [Out-of-domain 3T evaluation] Out-of-domain 3T evaluation (n=416): subfield volume correlations (r=0.87-0.97) are reported between synthesized and acquired images, but no Dice scores, Hausdorff distances, or boundary-error statistics are provided for the subfield segmentations on this test set. Because the headline claims of +31.8% more analyzable subjects and larger effect sizes (ε²=0.121-0.100) rest on the assumption that local anatomical boundaries are recovered faithfully rather than merely matching coarse volumes, the absence of these metrics leaves the downstream segmentation reliability unverified.

Authors: We thank the referee for highlighting this important point. Although volume correlations provide evidence of overall structural agreement, we agree that metrics assessing local boundary accuracy, such as Dice coefficients and Hausdorff distances for subfield segmentations, would more directly support the reliability of the synthesized images for downstream analysis. We will compute and include these additional metrics in the revised version of the manuscript to verify the boundary fidelity on the out-of-domain 3T set. revision: yes
Referee: [ADNI3 application] ADNI3 application results: the increase from 450 to 593 analyzable subjects and the reported improvement in group-difference effect sizes are presented as direct benefits of the synthesized images, yet these conclusions depend on the untested premise that subfield segmentations on the additional subjects reflect true anatomy rather than plausible but distorted boundaries that could inflate or attenuate the observed ε² values.

Authors: We acknowledge that the ADNI3 results rely on the generalization of the model from validated settings. Since the additional subjects in ADNI3 had motion corruption preventing acquisition of usable TSE images, direct comparison to ground truth is inherently not possible. Our approach is supported by the strong out-of-domain performance on 3T data where ground truth is available. To address this concern, we will expand the discussion section to explicitly state the assumptions underlying the ADNI3 analysis and discuss the potential impact of any boundary distortions on the effect sizes. revision: partial

standing simulated objections not resolved

Direct ground-truth validation of subfield segmentations is not possible for the motion-affected subjects in the ADNI3 application, as no usable TSE reference images are available for those cases.

Circularity Check

0 steps flagged

No significant circularity; results rely on independent held-out validation


The paper trains MRecover on 7T data (n=577) and reports in-domain fidelity. It also reports out-of-domain 3T generalization by directly comparing synthesized and acquired subfield volumes (r=0.87-0.97, n=416) on independently acquired images. These correlations are measured against external ground-truth acquisitions rather than reducing to fitted parameters or self-referential definitions. The downstream claims (larger effect sizes, +31.8% analyzable subjects) are likewise not circular. However, they rest on extrapolating that agreement to ADNI3 subjects for whom no usable TSE reference exists. No equations, self-citations, or ansatzes are shown to be load-bearing in a way that forces the headline results by construction; the evidence chain is anchored to external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on standard deep-learning assumptions about image distributions and the domain-specific premise that T1w-to-TSE synthesis can preserve subfield anatomy sufficiently for volumetric analysis.

axioms (1)

domain assumption: Synthesized TSE images preserve the anatomical boundaries required for accurate hippocampal subfield segmentation when volume correlations with acquired images exceed 0.87.
This premise is invoked to claim that the model recovers usable data for downstream analysis.
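A toy one-dimensional example shows why volume agreement alone cannot establish this premise. The example is entirely synthetic, not from the paper. Suppose each "synthesized" mask has exactly the same size as its reference but is shifted by half its length. Per-subject volumes then match perfectly, so their correlation across subjects is 1, yet the Dice overlap is only about 0.5.

```python
def dice(a, b):
    """Dice overlap between two sets of voxel indices."""
    return 2 * len(a & b) / (len(a) + len(b))


def shifted_pair(start, length, shift):
    """Reference mask and an equally sized copy displaced by `shift` voxels."""
    reference = set(range(start, start + length))
    synthesized = set(range(start + shift, start + shift + length))
    return reference, synthesized


lengths = [18, 20, 22, 25]   # hypothetical subfield extents for four subjects
pairs = [shifted_pair(10, n, n // 2) for n in lengths]
ref_volumes = [len(r) for r, _ in pairs]
syn_volumes = [len(s) for _, s in pairs]
overlaps = [round(dice(r, s), 2) for r, s in pairs]
print(ref_volumes == syn_volumes, overlaps)  # prints: True [0.5, 0.5, 0.5, 0.52]
```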

pith-pipeline@v0.9.0 · 5816 in / 1305 out tokens · 32599 ms · 2026-05-22T09:06:14.692910+00:00 · methodology


Reference graph

Works this paper leans on

4 extracted references · 4 canonical work pages

[1]

MRecover: A Conditional Generative Model for Recovering Motion-Corrupted MR images Using AI Generated Contrast. Jinghang Li#1, Tales Santini#1, Courtney Clark2, Bruno de Almeida1, Cong Chu1, Salem Alkhateeb1, Andrea Sajewski1, Jacob Berardinelli1, Hecheng Jin1, Tobias Campos1, Jeremy J. Berardo1, Joseph Mettenburg3, Ariel Gildengers4, Howard Aizenstein4, M...

[2]

On the in-domain 7T validation dataset (n=148), we quantified voxel-wise similarity between synthesized and as-acquired images using the structural similarity index (SSIM) [39] and feature similarity index (FSIM) [40]. The proposed autoregressive (AR) flow-matching model achieved an SSIM of 0.8422 ± 0.0802 and a FSIM of 0.9390 ± 0.0239, outperforming the UNet b...

[3]

We adapted the denoising diffusion model from MONAI and incorporated autoregressive conditioning for enhanced cross-slice consistency

Flow matching training objective with autoregressive conditioning. We implemented the flow matching training objective following [37]. We adapted the denoising diffusion model from MONAI and incorporated autoregressive conditioning for enhanced cross-slice consistency. Specifically, given a noisy source $x_0$ and a clean target image $x_1$ the linear interpolation ...

[4]

Venhancer: Generative space-time enhancement for video generation

PloS one 14, e0224030 (2019). 15 Mueller, S. G. et al. Subfield atrophy pattern in temporal lobe epilepsy with and without mesial sclerosis detected by high‐resolution MRI at 4 Tesla: Preliminary results. Epilepsia 50, 1474-1483 (2009). 16 Debona, R. et al. Hippocampal subfields volumes and affective symptoms of patients with mesial temporal lobe epilepsy...
