SIREM: Speech-Informed MRI Reconstruction with Learned Sampling

arxiv: 2605.18221 · v1 · pith:H7TEATE7new · submitted 2026-05-18 · 💻 cs.SD · cs.CL· cs.CV· cs.LG· physics.med-ph

SIREM: Speech-Informed MRI Reconstruction with Learned Sampling

Md Hasan , Nyvenn Castro , Daiqi Liu , Lukas Mulzer , Jana Hutter , Jonghye Woo , Moritz Zaiss , Andreas Maier

show 1 more author

Paula A. Perez-Toro

This is my paper

Pith reviewed 2026-05-19 23:59 UTC · model grok-4.3

classification 💻 cs.SD cs.CLcs.CVcs.LGphysics.med-ph

keywords rtMRIspeech productionmultimodal fusionlearned samplingvocal tract imagingcross-modal priorMRI reconstructionaudio informed reconstruction

0 comments p. Extension

pith:H7TEATE7 Add to your LaTeX paper

What is a Pith Number?

\usepackage{pith}
\pithnumber{H7TEATE7}

Prints a linked pith:H7TEATE7 badge after your title and writes the identifier into PDF metadata. Compiles on arXiv with no extra files. Learn more

The pith

Synchronized speech serves as a prior to reconstruct undersampled MRI of vocal-tract motion at higher speeds.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper develops SIREM to use audio from speech to help reconstruct real-time MRI images of the vocal tract. The method predicts some image content from the sound and combines it with data from the MRI scanner using a learned blend. This approach allows scanning with less data per frame, leading to faster or higher-resolution imaging while keeping the shapes of the tongue, lips, and other parts realistic. A reader would care because better real-time views of speech production could improve studies of language and help diagnose speech disorders without invasive procedures.

Core claim

The central claim is that vocal-tract configurations are sufficiently predictable from acoustics that an audio branch can supply plausible articulator structure, which is then fused with an MRI branch via a spatial weighting map to complete the reconstruction from undersampled measurements. A learnable weighting over spiral k-space arms further adapts the sampling to this multimodal setup.

What carries the argument

A fusion model that blends an audio-driven prediction of vocal-tract structure with MRI-driven reconstruction through a learned spatial weighting map, together with a differentiable soft weighting profile for k-space spiral sampling arms.

If this is right

Reconstruction operates in a substantially higher-throughput regime than iterative methods.
Anatomically plausible vocal-tract structure is preserved.
The framework combines audio-driven prediction, MRI reconstruction, and sampling adaptation in one formulation.
Learnable sampling allows studying how k-space usage interacts with the speech prior.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the audio-to-image prediction generalizes, similar cross-modal priors could speed up other dynamic medical imaging modalities.
Custom sampling trajectories optimized for speech content might become standard in clinical rtMRI setups.
Real-time speech therapy applications could use this for immediate visual feedback during sessions.

Load-bearing premise

Vocal-tract configurations during speech are sufficiently correlated with the produced acoustics to allow a neural network to predict useful image content from audio alone.

What would settle it

A scenario where the speaker makes sounds without the expected vocal-tract motion, such as in ventriloquism or silent articulation, would show whether the audio prediction adds value or introduces errors.

Figures

Figures reproduced from arXiv: 2605.18221 by Andreas Maier, Daiqi Liu, Jana Hutter, Jonghye Woo, Lukas Mulzer, Md Hasan, Moritz Zaiss, Nyvenn Castro, Paula A. Perez-Toro.

**Figure 2.** Figure 2: Qualitative comparison on five frames from the [PITH_FULL_IMAGE:figures/full_fig_p008_2.png] view at source ↗

**Figure 3.** Figure 3: Runtime analysis of reconstruction methods on the test set. Bars show mean time per frame [PITH_FULL_IMAGE:figures/full_fig_p009_3.png] view at source ↗

read the original abstract

Real-time magnetic resonance imaging (rtMRI) of speech production enables non-invasive visualization of dynamic vocal-tract motion and is valuable for speech science and clinical assessment. However, rtMRI is fundamentally constrained by trade-offs among spatial resolution, temporal resolution, and acquisition speed, often leading to undersampled k-space measurements and degraded reconstructions. We propose SIREM, a speech-informed MRI reconstruction framework that uses synchronized speech as a cross-modal prior. The central idea is that vocal-tract configurations during speech are correlated with the produced acoustics, making part of the image content predictable from audio. SIREM models each frame as a fusion of an audio-driven component and an MRI-driven component through a spatial weighting map. The audio branch predicts articulator-related structure from speech, while the MRI branch reconstructs complementary content from measured k-space data. We further introduce a learnable soft weighting profile over spiral arms, enabling a differentiable study of how k-space arm usage interacts with speech-informed fusion. This yields a unified multimodal formulation that combines audio-driven prediction, MRI reconstruction, and sampling adaptation. We evaluate SIREM on the USC speech rtMRI benchmark against standard baselines, including gridding, wavelet-based compressed sensing, and total variation. SIREM introduces a speech-informed reconstruction paradigm that operates in a substantially higher-throughput regime than iterative methods while preserving anatomically plausible vocal-tract structure. These results establish an initial benchmark for multimodal speech-informed rtMRI reconstruction and highlight the potential of synchronized speech as an auxiliary prior for fast reconstruction. The source code is available at https://github.com/mdhasanai/SIREM

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

SIREM tries to speed up vocal-tract rtMRI by feeding in audio as a structural prior, but the lack of numbers in the write-up makes the actual gains hard to judge.

read the letter

The paper's core move is to treat synchronized speech audio as a cross-modal prior that predicts part of the vocal-tract image, then blend it with the measured MRI data through a learned spatial weighting map. They also add a differentiable soft profile over spiral k-space arms so the sampling itself can adapt during training. That combination has not shown up in the rtMRI literature they cite, and releasing the code is a practical plus for anyone who wants to test it on their own data.

Referee Report

2 major / 1 minor

Summary. The paper proposes SIREM, a multimodal framework for real-time MRI reconstruction of speech that uses synchronized audio as a cross-modal prior. Each frame is modeled as a fusion of an audio-driven prediction of articulator structure and an MRI-driven reconstruction from undersampled k-space data, combined via a learned spatial weighting map. A differentiable soft weighting profile over spiral arms is introduced to adapt sampling. The method is evaluated on the USC speech rtMRI benchmark against gridding, wavelet CS, and total variation baselines, with the central claim being that it enables a substantially higher-throughput regime than iterative methods while preserving anatomically plausible vocal-tract structure.

Significance. If the quantitative results hold, this work would establish a new paradigm for accelerating rtMRI by exploiting audio-visual correlations in speech production, potentially allowing higher temporal resolution or reduced acquisition times without sacrificing anatomical fidelity. It provides an initial benchmark for speech-informed reconstruction and could benefit speech science and clinical applications. The public release of source code is a strength for reproducibility.

major comments (2)

[Abstract / Results] Abstract and Results: The evaluation is described only at a high level against gridding, wavelet CS, and TV, with no quantitative metrics, error bars, ablation studies, or specific acceleration factors reported. This directly undermines verification of the central claim that SIREM operates in a substantially higher-throughput regime while preserving structure.
[Methods] Methods: The spatial weighting map parameters and soft weighting profile over spiral arms are learned from the same data used for evaluation. This introduces a risk that performance gains reflect overfitting rather than generalization, which is load-bearing for claims of reliable multimodal fusion at high undersampling rates.

minor comments (1)

[Abstract] The abstract would be strengthened by briefly stating the specific acceleration factors or reconstruction quality metrics achieved.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment point by point below and indicate the revisions we will incorporate.

read point-by-point responses

Referee: [Abstract / Results] Abstract and Results: The evaluation is described only at a high level against gridding, wavelet CS, and TV, with no quantitative metrics, error bars, ablation studies, or specific acceleration factors reported. This directly undermines verification of the central claim that SIREM operates in a substantially higher-throughput regime while preserving structure.

Authors: We acknowledge that the current abstract and results presentation emphasizes qualitative anatomical plausibility and the conceptual advantage in throughput over iterative methods without providing numerical metrics. To strengthen verification of the central claim, we will expand the results section in the revision to include quantitative metrics such as PSNR and SSIM with error bars, ablation studies on the audio and MRI components, and explicit acceleration factors relative to the baselines. revision: yes
Referee: [Methods] Methods: The spatial weighting map parameters and soft weighting profile over spiral arms are learned from the same data used for evaluation. This introduces a risk that performance gains reflect overfitting rather than generalization, which is load-bearing for claims of reliable multimodal fusion at high undersampling rates.

Authors: We agree that explicit clarification of the data partitioning is necessary to support generalization claims. The current manuscript does not detail the train-evaluation split in the provided text. We will revise the Methods section to describe the subject-wise cross-validation protocol used on the USC benchmark and add corresponding held-out test results to demonstrate that performance is not due to overfitting. revision: yes

Circularity Check

0 steps flagged

No significant circularity; multimodal prior and learned components remain independent of evaluation inputs

full rationale

The paper defines SIREM as a fusion architecture in which an audio branch predicts articulator structure from speech acoustics, an MRI branch reconstructs from k-space, and a learnable spatial weighting map plus soft sampling profile combine them. This formulation rests on the external assumption that vocal-tract configurations correlate with produced acoustics—an assumption stated in the abstract and not derived from the model equations themselves. No equation or step is shown to reduce the final reconstruction to a fitted parameter by algebraic identity, nor is any central claim justified solely by self-citation. Evaluation occurs on the USC benchmark against external baselines (gridding, compressed sensing, total variation), which supplies an independent test of whether the learned prior enables higher throughput. The derivation chain is therefore self-contained against those benchmarks.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axioms · 0 invented entities

The framework rests on the domain assumption that audio and MRI are temporally synchronized and that vocal-tract geometry is predictable from acoustics; the neural network contains multiple learned parameters whose values are fitted to the USC benchmark data.

free parameters (2)

spatial weighting map parameters
Learned map that decides per-pixel contribution of audio prediction versus MRI data; fitted during training.
soft weighting profile over spiral arms
Differentiable parameters controlling k-space arm selection; optimized jointly with reconstruction loss.

axioms (1)

domain assumption Vocal-tract configurations are correlated with produced acoustics such that audio can predict image content
Invoked in the central idea paragraph of the abstract as the justification for the audio branch.

pith-pipeline@v0.9.0 · 5860 in / 1234 out tokens · 19251 ms · 2026-05-19T23:59:57.041745+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

SIREM models each frame as a fusion of an audio-driven component and an MRI-driven component through a spatial weighting map... xa_t = F(a_t), xm_t = F^{-1}(p ⊙ k_t), ˆx_t = w_EbA ⊙ xa_t + (1−w_EbA) ⊙ xm_t
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We evaluate SIREM on the USC speech rtMRI benchmark against standard baselines, including gridding, wavelet-based compressed sensing, and total variation.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

35 extracted references · 35 canonical work pages

[1]

Transportation Research Record: Journal of the Transportation Research Board , number=

Theoretical maximum capacity as benchmark for empty vehicle redistribution in personal rapid transit , author=. Transportation Research Record: Journal of the Transportation Research Board , number=. 2010 , publisher=

work page 2010
[2]

Scientific data , volume=

A multispeaker dataset of raw and reconstructed speech production real-time MRI video and 3D volumetric images , author=. Scientific data , volume=. 2021 , publisher=

work page 2021
[3]

Journal of Speech, Language, and Hearing Research , volume=

Accuracy of the NDI wave speech research system , author=. Journal of Speech, Language, and Hearing Research , volume=

work page
[4]

American Journal of Speech-Language Pathology , volume=

A multidimensional investigation of children's/r/productions: Perceptual, ultrasound, and acoustic measures , author=. American Journal of Speech-Language Pathology , volume=

work page
[5]

Journal of Magnetic Resonance Imaging , volume=

Real-time magnetic resonance imaging , author=. Journal of Magnetic Resonance Imaging , volume=. 2022 , publisher=

work page 2022
[6]

and Kumar, Prakash and Yagiz, Ecrin and Tian, Ye and Nayak, Krishna S

Le, Duc H. and Kumar, Prakash and Yagiz, Ecrin and Tian, Ye and Nayak, Krishna S. , urldate =. Online Spatiotemporally Constrained Reconstruction for Real-Time Interactive. doi:10.1002/mrm.70131 , abstract =

work page doi:10.1002/mrm.70131
[7]

The Current Status of

Haller, Sven and Hedderich, Dennis and Federau, Christian and Weisstanner, Christian and Edjlali, Myriam and Cauter, Sofie van and Zaharchuk, Greg , date =. The Current Status of. doi:10.1148/radiol.243819 , abstract =

work page doi:10.1148/radiol.243819
[8]

Radiology , volume=

The current status of AI-accelerated MRI techniques in clinical use , author=. Radiology , volume=. 2025 , publisher=

work page 2025
[9]

Computer Speech & Language , volume=

Analysis of speech production real-time MRI , author=. Computer Speech & Language , volume=. 2018 , publisher=

work page 2018
[10]

Journal of Speech, Language, and Hearing Research , volume=

Characterizing articulation in apraxic speech using real-time magnetic resonance imaging , author=. Journal of Speech, Language, and Hearing Research , volume=. 2017 , publisher=

work page 2017
[11]

75-Speaker Annot-16: A benchmark dataset for speech articulatory rt-MRI annotation with articulator contours and phonetic alignment , author=. Proc. Interspeech 2025 , pages=

work page 2025
[12]

IEEE transactions on medical imaging , volume=

MoDL: Model-based deep learning architecture for inverse problems , author=. IEEE transactions on medical imaging , volume=. 2018 , publisher=

work page 2018
[13]

Magnetic Resonance in Medicine , volume =

Learning a Variational Network for Reconstruction of Accelerated MRI Data , author =. Magnetic Resonance in Medicine , volume =. 2018 , doi =

work page 2018
[14]

international conference on information processing in medical imaging , pages=

Learning-based optimization of the under-sampling pattern in MRI , author=. international conference on information processing in medical imaging , pages=. 2019 , organization=

work page 2019
[15]

International conference on medical image computing and computer-assisted intervention , pages=

End-to-end variational networks for accelerated MRI reconstruction , author=. International conference on medical image computing and computer-assisted intervention , pages=. 2020 , organization=

work page 2020
[16]

Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

Reducing uncertainty in undersampled MRI reconstruction with active acquisition , author=. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

work page
[17]

Magnetic Resonance in Medicine , volume=

High-resolution dynamic speech imaging with joint low-rank and sparsity constraints , author=. Magnetic Resonance in Medicine , volume=. 2015 , publisher=

work page 2015
[18]

NMR in Biomedicine , volume=

Prospectively accelerated dynamic speech magnetic resonance imaging at 3 T using a self-navigated spiral-based manifold regularized scheme , author=. NMR in Biomedicine , volume=. 2024 , publisher=

work page 2024
[19]

Magnetic Resonance Imaging , volume=

Self-navigated subspace reconstruction for real-time MR imaging of the vocal tract , author=. Magnetic Resonance Imaging , volume=. 2025 , publisher=

work page 2025
[20]

ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) , pages=

Real-time mri video synthesis from time aligned phonemes with sequence-to-sequence networks , author=. ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) , pages=. 2023 , organization=

work page 2023
[21]

ICASSP 2025-2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) , pages=

Speech2rtMRI: Speech-Guided Diffusion Model for Real-time MRI Video of the Vocal Tract during Speech , author=. ICASSP 2025-2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) , pages=. 2025 , organization=

work page 2025
[22]

Medical Image Analysis , pages=

A speech-to-video synthesis approach using spatio-temporal diffusion for vocal tract MRI , author=. Medical Image Analysis , pages=. 2026 , publisher=

work page 2026
[23]

arXiv preprint arXiv:2509.13767 , year=

VocSegMRI: Multimodal Learning for Precise Vocal Tract Segmentation in Real-time MRI , author=. arXiv preprint arXiv:2509.13767 , year=

work page arXiv
[24]

Computer Speech & Language , pages=

Speech acoustics to rt-MRI articulatory dynamics inversion with video diffusion model , author=. Computer Speech & Language , pages=. 2025 , publisher=

work page 2025
[25]

arXiv preprint arXiv:2406.15754 , year=

Multimodal segmentation for vocal tract modeling , author=. arXiv preprint arXiv:2406.15754 , year=

work page arXiv
[26]

Magnetic Resonance in Medicine: An Official Journal of the International Society for Magnetic Resonance in Medicine , volume=

SENSE: sensitivity encoding for fast MRI , author=. Magnetic Resonance in Medicine: An Official Journal of the International Society for Magnetic Resonance in Medicine , volume=. 1999 , publisher=

work page 1999
[27]

Magnetic Resonance in Medicine: An Official Journal of the International Society for Magnetic Resonance in Medicine , volume=

Generalized autocalibrating partially parallel acquisitions (GRAPPA) , author=. Magnetic Resonance in Medicine: An Official Journal of the International Society for Magnetic Resonance in Medicine , volume=. 2002 , publisher=

work page 2002
[28]

Magnetic Resonance in Medicine: An Official Journal of the International Society for Magnetic Resonance in Medicine , volume=

Sparse MRI: The application of compressed sensing for rapid MR imaging , author=. Magnetic Resonance in Medicine: An Official Journal of the International Society for Magnetic Resonance in Medicine , volume=. 2007 , publisher=

work page 2007
[29]

IEEE/ACM Transactions on Audio, Speech, and Language Processing , volume=

HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units , author=. IEEE/ACM Transactions on Audio, Speech, and Language Processing , volume=. 2021 , doi=

work page 2021
[30]

ISMRM Workshop on Data Sampling and Image Reconstruction , year=

SigPy: A Python Package for High Performance Iterative Reconstruction , author=. ISMRM Workshop on Data Sampling and Image Reconstruction , year=

work page
[31]

Magnetic Resonance in Medicine , volume=

Adaptive Reconstruction of Phased Array MR Imagery , author=. Magnetic Resonance in Medicine , volume=. 2000 , doi=

work page 2000
[32]

IEEE Transactions on Image Processing , volume=

Image Quality Assessment: From Error Visibility to Structural Similarity , author=. IEEE Transactions on Image Processing , volume=. 2004 , doi=

work page 2004
[33]

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , pages=

The Unreasonable Effectiveness of Deep Features as a Perceptual Metric , author=. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , pages=

work page
[34]

IEEE Transactions on Image Processing , volume=

Image Information and Visual Quality , author=. IEEE Transactions on Image Processing , volume=. 2006 , doi=

work page 2006
[35]

Advances in Neural Information Processing Systems , volume=

GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium , author=. Advances in Neural Information Processing Systems , volume=

work page

[1] [1]

Transportation Research Record: Journal of the Transportation Research Board , number=

Theoretical maximum capacity as benchmark for empty vehicle redistribution in personal rapid transit , author=. Transportation Research Record: Journal of the Transportation Research Board , number=. 2010 , publisher=

work page 2010

[2] [2]

Scientific data , volume=

A multispeaker dataset of raw and reconstructed speech production real-time MRI video and 3D volumetric images , author=. Scientific data , volume=. 2021 , publisher=

work page 2021

[3] [3]

Journal of Speech, Language, and Hearing Research , volume=

Accuracy of the NDI wave speech research system , author=. Journal of Speech, Language, and Hearing Research , volume=

work page

[4] [4]

American Journal of Speech-Language Pathology , volume=

A multidimensional investigation of children's/r/productions: Perceptual, ultrasound, and acoustic measures , author=. American Journal of Speech-Language Pathology , volume=

work page

[5] [5]

Journal of Magnetic Resonance Imaging , volume=

Real-time magnetic resonance imaging , author=. Journal of Magnetic Resonance Imaging , volume=. 2022 , publisher=

work page 2022

[6] [6]

and Kumar, Prakash and Yagiz, Ecrin and Tian, Ye and Nayak, Krishna S

Le, Duc H. and Kumar, Prakash and Yagiz, Ecrin and Tian, Ye and Nayak, Krishna S. , urldate =. Online Spatiotemporally Constrained Reconstruction for Real-Time Interactive. doi:10.1002/mrm.70131 , abstract =

work page doi:10.1002/mrm.70131

[7] [7]

The Current Status of

Haller, Sven and Hedderich, Dennis and Federau, Christian and Weisstanner, Christian and Edjlali, Myriam and Cauter, Sofie van and Zaharchuk, Greg , date =. The Current Status of. doi:10.1148/radiol.243819 , abstract =

work page doi:10.1148/radiol.243819

[8] [8]

Radiology , volume=

The current status of AI-accelerated MRI techniques in clinical use , author=. Radiology , volume=. 2025 , publisher=

work page 2025

[9] [9]

Computer Speech & Language , volume=

Analysis of speech production real-time MRI , author=. Computer Speech & Language , volume=. 2018 , publisher=

work page 2018

[10] [10]

Journal of Speech, Language, and Hearing Research , volume=

Characterizing articulation in apraxic speech using real-time magnetic resonance imaging , author=. Journal of Speech, Language, and Hearing Research , volume=. 2017 , publisher=

work page 2017

[11] [11]

75-Speaker Annot-16: A benchmark dataset for speech articulatory rt-MRI annotation with articulator contours and phonetic alignment , author=. Proc. Interspeech 2025 , pages=

work page 2025

[12] [12]

IEEE transactions on medical imaging , volume=

MoDL: Model-based deep learning architecture for inverse problems , author=. IEEE transactions on medical imaging , volume=. 2018 , publisher=

work page 2018

[13] [13]

Magnetic Resonance in Medicine , volume =

Learning a Variational Network for Reconstruction of Accelerated MRI Data , author =. Magnetic Resonance in Medicine , volume =. 2018 , doi =

work page 2018

[14] [14]

international conference on information processing in medical imaging , pages=

Learning-based optimization of the under-sampling pattern in MRI , author=. international conference on information processing in medical imaging , pages=. 2019 , organization=

work page 2019

[15] [15]

International conference on medical image computing and computer-assisted intervention , pages=

End-to-end variational networks for accelerated MRI reconstruction , author=. International conference on medical image computing and computer-assisted intervention , pages=. 2020 , organization=

work page 2020

[16] [16]

Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

Reducing uncertainty in undersampled MRI reconstruction with active acquisition , author=. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

work page

[17] [17]

Magnetic Resonance in Medicine , volume=

High-resolution dynamic speech imaging with joint low-rank and sparsity constraints , author=. Magnetic Resonance in Medicine , volume=. 2015 , publisher=

work page 2015

[18] [18]

NMR in Biomedicine , volume=

Prospectively accelerated dynamic speech magnetic resonance imaging at 3 T using a self-navigated spiral-based manifold regularized scheme , author=. NMR in Biomedicine , volume=. 2024 , publisher=

work page 2024

[19] [19]

Magnetic Resonance Imaging , volume=

Self-navigated subspace reconstruction for real-time MR imaging of the vocal tract , author=. Magnetic Resonance Imaging , volume=. 2025 , publisher=

work page 2025

[20] [20]

ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) , pages=

Real-time mri video synthesis from time aligned phonemes with sequence-to-sequence networks , author=. ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) , pages=. 2023 , organization=

work page 2023

[21] [21]

ICASSP 2025-2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) , pages=

Speech2rtMRI: Speech-Guided Diffusion Model for Real-time MRI Video of the Vocal Tract during Speech , author=. ICASSP 2025-2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) , pages=. 2025 , organization=

work page 2025

[22] [22]

Medical Image Analysis , pages=

A speech-to-video synthesis approach using spatio-temporal diffusion for vocal tract MRI , author=. Medical Image Analysis , pages=. 2026 , publisher=

work page 2026

[23] [23]

arXiv preprint arXiv:2509.13767 , year=

VocSegMRI: Multimodal Learning for Precise Vocal Tract Segmentation in Real-time MRI , author=. arXiv preprint arXiv:2509.13767 , year=

work page arXiv

[24] [24]

Computer Speech & Language , pages=

Speech acoustics to rt-MRI articulatory dynamics inversion with video diffusion model , author=. Computer Speech & Language , pages=. 2025 , publisher=

work page 2025

[25] [25]

arXiv preprint arXiv:2406.15754 , year=

Multimodal segmentation for vocal tract modeling , author=. arXiv preprint arXiv:2406.15754 , year=

work page arXiv

[26] [26]

Magnetic Resonance in Medicine: An Official Journal of the International Society for Magnetic Resonance in Medicine , volume=

SENSE: sensitivity encoding for fast MRI , author=. Magnetic Resonance in Medicine: An Official Journal of the International Society for Magnetic Resonance in Medicine , volume=. 1999 , publisher=

work page 1999

[27] [27]

Magnetic Resonance in Medicine: An Official Journal of the International Society for Magnetic Resonance in Medicine , volume=

Generalized autocalibrating partially parallel acquisitions (GRAPPA) , author=. Magnetic Resonance in Medicine: An Official Journal of the International Society for Magnetic Resonance in Medicine , volume=. 2002 , publisher=

work page 2002

[28] [28]

Magnetic Resonance in Medicine: An Official Journal of the International Society for Magnetic Resonance in Medicine , volume=

Sparse MRI: The application of compressed sensing for rapid MR imaging , author=. Magnetic Resonance in Medicine: An Official Journal of the International Society for Magnetic Resonance in Medicine , volume=. 2007 , publisher=

work page 2007

[29] [29]

IEEE/ACM Transactions on Audio, Speech, and Language Processing , volume=

HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units , author=. IEEE/ACM Transactions on Audio, Speech, and Language Processing , volume=. 2021 , doi=

work page 2021

[30] [30]

ISMRM Workshop on Data Sampling and Image Reconstruction , year=

SigPy: A Python Package for High Performance Iterative Reconstruction , author=. ISMRM Workshop on Data Sampling and Image Reconstruction , year=

work page

[31] [31]

Magnetic Resonance in Medicine , volume=

Adaptive Reconstruction of Phased Array MR Imagery , author=. Magnetic Resonance in Medicine , volume=. 2000 , doi=

work page 2000

[32] [32]

IEEE Transactions on Image Processing , volume=

Image Quality Assessment: From Error Visibility to Structural Similarity , author=. IEEE Transactions on Image Processing , volume=. 2004 , doi=

work page 2004

[33] [33]

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , pages=

The Unreasonable Effectiveness of Deep Features as a Perceptual Metric , author=. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , pages=

work page

[34] [34]

IEEE Transactions on Image Processing , volume=

Image Information and Visual Quality , author=. IEEE Transactions on Image Processing , volume=. 2006 , doi=

work page 2006

[35] [35]

Advances in Neural Information Processing Systems , volume=

GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium , author=. Advances in Neural Information Processing Systems , volume=

work page