Leveraging Image Editing Foundation Models for Data-Efficient CT Metal Artifact Reduction

Ahmet Rasim Emirdagi; Burak Can Biner; Görkay Aydemir; M. Akın Yılmaz; Mısra Yavuz; Nasrin Rahimi; Süleyman Aslan; Yunus Bilge Kurt

arXiv: 2604.05934 · v1 · submitted 2026-04-07 · cs.CV · eess.IV


Ahmet Rasim Emirdagi, Süleyman Aslan, Mısra Yavuz, Görkay Aydemir, Yunus Bilge Kurt, Nasrin Rahimi, Burak Can Biner, M. Akın Yılmaz

Pith reviewed 2026-05-10 18:41 UTC · model grok-4.3

classification: cs.CV · eess.IV

keywords: CT metal artifact reduction · diffusion models · LoRA adaptation · data-efficient learning · in-context reasoning · medical image reconstruction · vision-language models · multi-reference conditioning


The pith

Adapting a vision-language diffusion model via LoRA with multi-reference conditioning suppresses CT metal artifacts using only 16 to 128 paired training examples.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Metal artifacts from implants in CT scans hide anatomy, and standard deep learning fixes demand thousands of paired clean and corrupted images that are costly to acquire. This work reframes the task as in-context reasoning and adapts a general-purpose vision-language diffusion foundation model with parameter-efficient LoRA fine-tuning. Clean anatomical images from unrelated subjects serve as reference context to guide restoration of the corrupted scan. The method reaches state-of-the-art scores on the AAPM CT-MAR benchmark for both perceptual quality and radiological features while cutting the required training pairs by roughly two orders of magnitude. Domain adaptation proves essential: without it, the model mistakes streak artifacts for unrelated natural objects.

Core claim

By treating metal artifact reduction as an in-context reasoning task and adapting a vision-language diffusion foundation model with LoRA plus multi-reference conditioning on clean exemplars from other subjects, the approach achieves effective artifact suppression and state-of-the-art performance on perceptual and radiological metrics using only 16 to 128 paired examples, two orders of magnitude fewer than conventional supervised methods.

What carries the argument

LoRA adaptation of a vision-language diffusion foundation model that receives the corrupted CT slice together with clean anatomical reference images from unrelated subjects to perform in-context restoration.
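The LoRA part of this machinery can be illustrated with a minimal sketch of the generic technique (Hu et al., ICLR 2022): a frozen weight matrix plus a trainable low-rank update. The rank, scaling factor, and layer size below are illustrative assumptions; the abstract does not state the paper's LoRA configuration or which layers of the foundation model it targets, and the class name `LoRALinear` is hypothetical.

```python
import numpy as np


class LoRALinear:
    """Frozen linear layer W0 plus a trainable low-rank update (alpha / r) * B @ A.

    Generic LoRA sketch; rank and alpha are illustrative assumptions,
    not the paper's configuration.
    """

    def __init__(self, w0: np.ndarray, rank: int = 4, alpha: float = 8.0, seed: int = 0):
        d_out, d_in = w0.shape
        rng = np.random.default_rng(seed)
        self.w0 = w0                                  # frozen pre-trained weight
        self.a = rng.normal(0.0, 0.01, (rank, d_in))  # trainable down-projection
        self.b = np.zeros((d_out, rank))              # trainable up-projection, zero-initialised
        self.scale = alpha / rank

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # x has shape (batch, d_in); the low-rank path is added to the frozen path.
        return x @ self.w0.T + self.scale * (x @ self.a.T) @ self.b.T

    def trainable_params(self) -> int:
        return self.a.size + self.b.size


rng = np.random.default_rng(1)
w0 = rng.normal(size=(64, 64))
layer = LoRALinear(w0, rank=4)
x = rng.normal(size=(2, 64))
# With B initialised to zero, the adapted layer starts out identical to the base layer.
print(np.allclose(layer(x), x @ w0.T))    # True
print(layer.trainable_params(), w0.size)  # 512 4096
```

Because only A and B are updated, the number of trainable parameters stays small relative to the frozen base, a property that matters when only 16 to 128 training pairs are available.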

If this is right

Artifact suppression reaches state-of-the-art levels on the AAPM CT-MAR benchmark for both perceptual and radiological-feature metrics.
Training data requirements drop from thousands to 16-128 paired examples.
Domain adaptation via LoRA prevents the foundation model from misinterpreting artifacts as unrelated natural objects.
Multi-reference conditioning with clean exemplars from other patients enables category-specific anatomical inference.
The adapted foundation model supplies an interpretable, data-efficient route to medical image reconstruction.
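The abstract does not say how the corrupted slice and the clean references are presented to the model. One hypothetical layout, for an image-editing model that takes a single canvas, is to window each slice from Hounsfield units to [0, 1] and tile the K references beside the corrupted input. The sketch assumes a soft-tissue window (center 40 HU, width 400 HU) and left-to-right tiling; `window_hu` and `build_context_canvas` are illustrative names, not the authors' code.

```python
import numpy as np


def window_hu(img_hu, center=40.0, width=400.0):
    """Map Hounsfield units to [0, 1] with an assumed soft-tissue window."""
    lo, hi = center - width / 2, center + width / 2
    return np.clip((img_hu - lo) / (hi - lo), 0.0, 1.0)


def build_context_canvas(corrupted_hu, references_hu, **window):
    """Tile K clean references to the left of the corrupted slice (hypothetical layout).

    Returns the canvas and the column slice holding the corrupted input, so the
    restored region can be cropped back out of the model's output.
    """
    tiles = [window_hu(r, **window) for r in references_hu]
    tiles.append(window_hu(corrupted_hu, **window))
    shapes = {t.shape for t in tiles}
    if len(shapes) != 1:
        raise ValueError(f"all slices must share one shape, got {shapes}")
    _, w = tiles[0].shape
    canvas = np.concatenate(tiles, axis=1)  # shape (h, (K + 1) * w)
    k = len(references_hu)
    target_cols = slice(k * w, (k + 1) * w)
    return canvas, target_cols


refs = [np.full((256, 256), 30.0) for _ in range(3)]
corrupted = np.full((256, 256), 30.0)
canvas, cols = build_context_canvas(corrupted, refs)
print(canvas.shape, cols)  # (256, 1024) slice(768, 1024, None)
```

Returning the column slice keeps the crop-back step explicit; if the chosen foundation model accepts several input images directly, the tiling would be unnecessary.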

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same in-context conditioning strategy could be tested on other limited-data medical reconstruction problems such as low-dose CT or MRI denoising.
Performance may depend on how closely the reference images match the anatomical category of the corrupted scan, suggesting a need for automated reference selection.
If the approach scales, large unlabeled medical image collections could serve as reference banks without requiring new paired acquisitions.
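Automated reference selection could start as nearest-neighbour retrieval from a bank of clean slices. This sketch assumes each slice has already been mapped to a feature vector by some encoder (left unspecified) and ranks bank entries by cosine similarity to the corrupted query. `select_references` is a hypothetical helper, and streak artifacts in the query could bias its embedding, which a real system would need to account for.

```python
import numpy as np


def select_references(query_feat, bank_feats, k=3):
    """Return indices and scores of the k bank entries most cosine-similar to the query.

    query_feat: (d,) embedding of the corrupted slice.
    bank_feats: (n, d) embeddings of clean slices from other subjects.
    """
    q = query_feat / np.linalg.norm(query_feat)
    b = bank_feats / np.linalg.norm(bank_feats, axis=1, keepdims=True)
    sims = b @ q                   # cosine similarity of each bank entry to the query
    order = np.argsort(-sims)[:k]  # indices of the k most similar entries
    return order, sims[order]


bank = np.array([[1.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0],
                 [0.9, 0.1, 0.0],
                 [0.0, 0.0, 1.0]])
query = np.array([1.0, 0.0, 0.0])
idx, scores = select_references(query, bank, k=2)
print(idx.tolist())  # [0, 2]
```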

Load-bearing premise

That clean anatomical exemplars from unrelated subjects supply sufficient category-specific context for the adapted model to correctly infer and restore the underlying anatomy without hallucinating new structures.

What would settle it

Running the model on the AAPM CT-MAR test set without the multi-reference conditioning or without LoRA domain adaptation and measuring whether streak artifacts are still interpreted as natural objects or whether quantitative and perceptual metrics fall below prior supervised baselines.
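The ablation described above amounts to scoring the same test slices under several conditions and comparing the averages. A minimal sketch, assuming restored slices are already on a common intensity scale with a known data range and using PSNR as the example metric; the condition names are hypothetical labels and the toy arrays are synthetic, not benchmark data.

```python
import numpy as np


def psnr(pred, target, data_range):
    """Peak signal-to-noise ratio in dB for one image pair."""
    mse = np.mean((pred.astype(np.float64) - target.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(data_range ** 2 / mse)


def compare_conditions(outputs_by_condition, targets, data_range=1.0):
    """Mean per-image PSNR for each ablation condition (e.g. full vs. no references)."""
    return {
        name: float(np.mean([psnr(o, t, data_range) for o, t in zip(outs, targets)]))
        for name, outs in outputs_by_condition.items()
    }


targets = [np.zeros((8, 8)) for _ in range(4)]
outputs = {
    "full": [t + 0.01 for t in targets],           # synthetic stand-in outputs
    "no_references": [t + 0.1 for t in targets],
}
scores = compare_conditions(outputs, targets)
print({k: round(v, 2) for k, v in scores.items()})  # {'full': 40.0, 'no_references': 20.0}
```

Averaging per-image PSNR (as here) and computing PSNR from the pooled MSE over all images give different numbers, so any comparison against prior baselines should fix one convention.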

Figures

Figures reproduced from arXiv: 2604.05934 by Ahmet Rasim Emirdagi, Burak Can Biner, Görkay Aydemir, M. Akın Yılmaz, Mısra Yavuz, Nasrin Rahimi, Süleyman Aslan, Yunus Bilge Kurt.

**Figure 2.** Visual comparison of artifact reduction methods. (a) Noisy inputs, (b) ADN, (c) Rise-MAR, (d) OSCNet+ and (e) Our recon

**Figure 3.** Visual demonstration of domain misalignment. Foun

Original abstract

Metal artifacts from high-attenuation implants severely degrade CT image quality, obscuring critical anatomical structures and posing a challenge for standard deep learning methods that require extensive paired training data. We propose a paradigm shift: reframing artifact reduction as an in-context reasoning task by adapting a general-purpose vision-language diffusion foundation model via parameter-efficient Low-Rank Adaptation (LoRA). By leveraging rich visual priors, our approach achieves effective artifact suppression with only 16 to 128 paired training examples, reducing data requirements by two orders of magnitude. Crucially, we demonstrate that domain adaptation is essential for hallucination mitigation; without it, foundation models interpret streak artifacts as erroneous natural objects (e.g., waffles or petri dishes). To ground the restoration, we propose a multi-reference conditioning strategy where clean anatomical exemplars from unrelated subjects are provided alongside the corrupted input, enabling the model to exploit category-specific context to infer uncorrupted anatomy. Extensive evaluation on the AAPM CT-MAR benchmark demonstrates that our method achieves state-of-the-art performance on perceptual and radiological-feature metrics. This work establishes that foundation models, when appropriately adapted, offer a scalable alternative for interpretable, data-efficient medical image reconstruction. Code is available at https://github.com/ahmetemirdagi/CT-EditMAR.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper adapts a vision-language diffusion model via LoRA plus multi-reference conditioning from unrelated clean scans to cut CT metal artifact training data to 16-128 pairs while claiming SOTA on AAPM, but the references may not supply reliable patient-specific anatomy.


The main point is that the authors adapt a vision-language diffusion foundation model using LoRA on a very small number of paired CT examples and add multi-reference conditioning with clean scans from other patients to reduce metal artifacts. They claim this achieves state-of-the-art results on the AAPM benchmark while using two orders of magnitude less data than typical methods.

They do a couple of things well. The observation that, without adaptation, the model mistakes streaks for everyday objects like waffles or petri dishes highlights a real failure mode of these models on medical images. Framing the task as in-context reasoning with references is a smart way to leverage the priors in the foundation model without needing large amounts of domain-specific training data.

The soft spots are more concerning. The central assumption is that clean exemplars from unrelated subjects provide enough category-specific context to recover the correct anatomy in the corrupted scan. But CT anatomy varies considerably between patients, so the references may not align well with the test subject's geometry. With only 16-128 adaptation examples, there is little room for the model to learn patient-specific corrections, which raises the risk of inconsistent or hallucinated restorations. This is where a stress test matters most, because the paper's data-efficiency claim depends on the reference strategy working reliably. The abstract mentions perceptual and radiological-feature metrics but gives no actual numbers, error bars, or ablation results on reference choice or training stability, which makes it hard to judge how solid the SOTA claim is.

This paper is for researchers working on data-efficient medical image tasks or adapting foundation models to new domains; anyone looking for ideas on low-data regimes in imaging would find it useful. It should go to peer review. The approach is novel enough and the problem important enough that referees can check the details and push for stronger validation of the reference strategy.

Referee Report

1 major / 3 minor

Summary. The paper proposes reframing CT metal artifact reduction (MAR) as an in-context reasoning task by adapting a vision-language diffusion foundation model via LoRA, combined with multi-reference conditioning using clean anatomical exemplars from unrelated subjects. It claims this enables effective artifact suppression and state-of-the-art performance on the AAPM CT-MAR benchmark using only 16 to 128 paired training examples (two orders of magnitude less data than standard methods), while showing that domain adaptation is essential to prevent the model from misinterpreting artifacts as natural objects.

Significance. If the central performance claims hold under rigorous validation, this work would be significant for demonstrating that foundation models can be adapted for data-scarce medical imaging tasks like CT reconstruction, potentially reducing reliance on large paired datasets. The public code release supports reproducibility and allows community verification of the empirical results.

major comments (1)

[§3.2] Multi-reference conditioning: The central data-efficiency claim depends on the assumption that clean exemplars from unrelated subjects supply usable category-specific priors for recovering patient-specific anatomy. Given substantial inter-patient variability in organ geometry, size, and position in CT, the manuscript provides no analysis, ablation, or patient-specific validation of reference selection and its impact to show that the model recovers true geometry rather than blending or hallucinating structures from the references. This is load-bearing for the 16–128 example regime, where LoRA adaptation has limited opportunity to learn the required mappings.

minor comments (3)

[Abstract] The abstract asserts state-of-the-art performance on 'perceptual and radiological-feature metrics' without naming the specific metrics, providing numerical values, or referencing the corresponding results table; these should be stated explicitly in the abstract or introduction.
[Method] The method section should specify the exact foundation model (e.g., which Stable Diffusion variant or vision-language model) and the precise LoRA configuration (rank, target modules) for reproducibility.
[Results] Figure captions in the results section could more explicitly describe the role of the provided references and highlight differences in artifact suppression across methods.
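On the first minor comment: perceptual metrics in this literature are often distribution-level scores such as FID (Heusel et al., 2017), the Fréchet distance between Gaussians fitted to deep features of reference and restored images. A radiological-feature variant can be obtained by swapping in features from a radiology-pretrained encoder; whether the paper computes its radiological-feature metric this way is an assumption. The sketch takes precomputed feature matrices (rows are images) and uses the fact that Tr√(Σ_a Σ_b) equals Tr√(Σ_a^{1/2} Σ_b Σ_a^{1/2}), so only symmetric eigendecompositions are needed.

```python
import numpy as np


def _sqrtm_psd(m):
    """Principal square root of a symmetric positive semi-definite matrix."""
    m = (m + m.T) / 2.0                    # remove round-off asymmetry
    eigvals, eigvecs = np.linalg.eigh(m)
    eigvals = np.clip(eigvals, 0.0, None)  # clip tiny negative eigenvalues
    return (eigvecs * np.sqrt(eigvals)) @ eigvecs.T


def frechet_distance(feats_a, feats_b):
    """Fréchet distance between Gaussians fitted to two feature sets (rows = images)."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    root_a = _sqrtm_psd(cov_a)
    # Tr sqrt(cov_a @ cov_b) equals Tr sqrt(root_a @ cov_b @ root_a), a symmetric PSD matrix.
    tr_covmean = np.trace(_sqrtm_psd(root_a @ cov_b @ root_a))
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_covmean)


rng = np.random.default_rng(0)
real = rng.normal(size=(500, 8))
# Same covariance, mean shifted by 1 in each of 8 dimensions, so the distance is about 8.
print(frechet_distance(real, real + 1.0))
```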

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their insightful review and for recognizing the potential significance of adapting foundation models to data-scarce medical imaging tasks. We address the major comment below and will revise the manuscript to incorporate additional analysis as suggested.


Referee: [§3.2] Multi-reference conditioning: The central data-efficiency claim depends on the assumption that clean exemplars from unrelated subjects supply usable category-specific priors for recovering patient-specific anatomy. Given substantial inter-patient variability in organ geometry, size, and position in CT, the manuscript provides no analysis, ablation, or patient-specific validation of reference selection and its impact to show that the model recovers true geometry rather than blending or hallucinating structures from the references. This is load-bearing for the 16–128 example regime, where LoRA adaptation has limited opportunity to learn the required mappings.

Authors: We thank the referee for this important observation. The multi-reference conditioning strategy is intended to exploit the foundation model's in-context reasoning by supplying clean anatomical exemplars as category-specific visual priors, enabling inference of uncorrupted patient anatomy even with minimal paired data. Our experiments indicate that this yields measurable gains over single-reference or no-reference baselines on the AAPM CT-MAR benchmark. Nevertheless, we agree that the current manuscript lacks a dedicated ablation and validation of reference selection and impact. In the revised version, we will add: (1) an ablation varying reference count and anatomical similarity (measured via feature-based metrics), (2) quantitative assessments of anatomical fidelity on non-artifact regions using available ground-truth structures, and (3) patient-specific case studies with visualizations to demonstrate recovery of true geometry rather than blending or hallucination. These additions will directly address inter-patient variability and strengthen the data-efficiency claims for the low-data regime.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain


The paper presents an empirical method: reframing metal artifact reduction as in-context reasoning via LoRA adaptation of a vision-language diffusion foundation model, combined with multi-reference conditioning using clean exemplars. Performance claims rest on AAPM CT-MAR benchmark evaluation and ablation studies showing data efficiency (16-128 examples) and the necessity of domain adaptation to avoid hallucinations. No equations, first-principles derivations, or predictions are offered that reduce by construction to fitted inputs, self-definitions, or self-citation chains. The central results are externally falsifiable via the public benchmark and code release, with no load-bearing self-citations or uniqueness theorems invoked.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Only the abstract is available, so the ledger is limited to assumptions stated or implied therein. The central claim rests on the pre-trained diffusion model possessing transferable visual priors that survive light adaptation and on the utility of unrelated clean exemplars for anatomical inference.

axioms (2)

Domain assumption: General-purpose vision-language diffusion models contain rich visual priors transferable to medical CT images after parameter-efficient adaptation.
Invoked by the claim that the foundation model can be adapted for artifact suppression.
Standard math: LoRA enables effective domain adaptation without catastrophic forgetting of the base model's capabilities.
Standard assumption underlying the use of Low-Rank Adaptation.
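The LoRA axiom can be made concrete with the standard parameterization from Hu et al. (2022). The layer width d = k = 3072 and rank r = 16 in the worked count are hypothetical values chosen for illustration, not the paper's settings.

```latex
W = W_0 + \Delta W = W_0 + \frac{\alpha}{r}\, B A,
\qquad B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k)

\#\text{trainable} = r(d + k) = 16 \cdot (3072 + 3072) = 98{,}304
\qquad \text{vs.} \qquad dk = 3072^2 = 9{,}437{,}184 \;(\approx 1.04\%)

B \leftarrow 0 \ \text{at initialisation} \;\Rightarrow\; W = W_0 \ \text{before the first update}
```

Zero-initialising B means the adapted model starts exactly at the base model, and because W_0 stays frozen the adapter can be removed to recover the base weights. That guarantees the base model is recoverable, not that the adapted model keeps its general capabilities, so the "without catastrophic forgetting" part of the axiom remains an empirical assumption.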

pith-pipeline@v0.9.0 · 5575 in / 1497 out tokens · 43849 ms · 2026-05-10T18:41:35.926341+00:00 · methodology



[1] [1]

Aapm ct metal artifact reduction grand challenge benchmark tool.https : / / github

AAPM CT-MAR Grand Challenge Team. Aapm ct metal artifact reduction grand challenge benchmark tool.https : / / github . com / xcist / example / tree / main / AAPM _ datachallenge/, 2025. 4

work page 2025

[2] [2]

Akın Yılmaz, A

Ahmet Bilican, M. Akın Yılmaz, A. Murat Tekalp, and R. G ¨okberk Cinbis ¸. Exploring sparsity for pa- rameter efficient fine tuning using wavelets.preprint arXiv:2505.12532, 2025. 2

work page arXiv 2025

[3] [3]

FouRA: Fourier low-rank adaptation

Shubhankar Borse, Shreya Kadambi, Nilesh Prasad Pandey, Kartikeya Bhardwaj, Viswanath Ganapathy, Sweta Priyadarshi, Risheek Garrepalli, Rafael Es- teves, Munawar Hayat, and Fatih Porikli. FouRA: Fourier low-rank adaptation. InThe Thirty-eighth An- nual Conference on Neural Information Processing Systems, 2024. 2

work page 2024

[4] [4]

Chang, H.-N

C.-H. Chang, H.-N. Wu, C.-H. Hsu, and H.-H. Lin. Virtual monochromatic imaging with projection-based material decomposition algorithm for metal artifacts reduction in photon-counting detector computed to- mography.PLoS ONE, 18(3):e0282900, 2023. 1

work page 2023

[5] [5]

Sam-med2d

Junlong Cheng, Jin Ye, Zhongying Deng, Jianpin Chen, Tianbin Li, Haoyu Wang, Yanzhou Su, Ziyan Huang, Jilong Chen, Lei Jiang, Hui Sun, Junjun He, Shaoting Zhang, Min Zhu, and Yu Qiao. Sam-med2d. arXiv preprint arXiv:2308.16184, 2023. 2

work page arXiv 2023

[6] [6]

An iterative maximum-likelihood polychromatic algo- rithm for ct.IEEE Transactions on Medical Imaging, 20(10):999–1008, 2001

Bruno De Man, Johan Nuyts, Patrick Dupont, et al. An iterative maximum-likelihood polychromatic algo- rithm for ct.IEEE Transactions on Medical Imaging, 20(10):999–1008, 2001. 1

work page 2001

[7] [7]

Scaling rectified flow transformers for high-resolution image synthe- sis

Patrick Esser, Sumith Kulal, Andreas Blattmann, Tim Dockhorn, Jonas M ¨uller, Harry Lorenz, Yam Zhang, Robin Caplette, and Robin Rombach. Scaling rectified flow transformers for high-resolution image synthe- sis. InInternational Conference on Machine Learn- ing, 2024. 3

work page 2024

[8] [8]

Parameter- efficient fine-tuning with discrete fourier transform

Ziqi Gao, Qichao Wang, Aochuan Chen, Zijing Liu, Bingzhe Wu, Liang Chen, and Jia Li. Parameter- efficient fine-tuning with discrete fourier transform. InForty-first International Conference on Machine Learning, 2024. 2

work page 2024

[9] [9]

Metal artifact reduction in ct: Where are we after four decades?IEEE Access, 4:5826–5849, 2016

Lars Gjesteby, Bruno De Man, Yannan Jin, et al. Metal artifact reduction in ct: Where are we after four decades?IEEE Access, 4:5826–5849, 2016. 1

work page 2016

[10] [10]

Multi-frequency electrical impedance tomography and neuroimaging data in stroke patients.Scientific Data, 5:180112, 2018

Nir Goren, James Avery, Thomas Dowrick, Eleanor Mackle, Anna Witkowska-Wrobel, and David Holder. Multi-frequency electrical impedance tomography and neuroimaging data in stroke patients.Scientific Data, 5:180112, 2018. 4

work page 2018

[11] [11]

Haneda, N

E. Haneda, N. Peters, J. Zhang, et al. Aapm ct metal artifact reduction grand challenge.Medical Physics, 52(10):e70050, 2025. 2, 4

work page 2025

[12] [12]

Gans trained by a two time-scale update rule converge to a local nash equilibrium

Martin Heusel, Hubert Ramsauer, Thomas Un- terthiner, Bernhard Nessler, and Sepp Hochreiter. Gans trained by a two time-scale update rule converge to a local nash equilibrium. InAdvances in Neural Information Processing Systems, 2017. 5

work page 2017

[13] Edward J Hu, Yelong Shen, Phillip Wallis, et al. LoRA: Low-rank adaptation of large language models. In ICLR, 2022.

[14] W. A. Kalender, R. Hebel, and J. Ebersberger. Reduction of CT artifacts caused by metallic implants. Radiology, 164(2):576–577, 1987.

[15] Onur Keleş, M. Akın Yılmaz, A. Murat Tekalp, Cansu Korkmaz, and Zafer Doğan. On the computation of PSNR for a set of images or video. In 2021 Picture Coding Symposium (PCS), pages 1–5, 2021.

[16] Dawid Jan Kopiczko, Tijmen Blankevoort, and Yuki M Asano. VeRA: Vector-based random matrix adaptation. In The Twelfth International Conference on Learning Representations, 2024.

[17] Haofu Liao, Wei-An Lin, S. Kevin Zhou, and Jiebo Luo. ADN: Artifact disentanglement network for unsupervised metal artifact reduction. IEEE Transactions on Medical Imaging, 39(3):634–643, 2020.

[18] Wei-An Lin, Haofu Liao, Cheng Peng, et al. DuDoNet: Dual domain network for CT metal artifact reduction. In CVPR, pages 10512–10521, 2019.

[19] Yaron Lipman, Ricky T. Q. Chen, Heli Ben-Hamu, Maximilian Nickel, and Matt Le. Flow matching for generative modeling. In International Conference on Learning Representations, 2023.

[20] Shih-Yang Liu, Chien-Yi Wang, Hongxu Yin, Pavlo Molchanov, Yu-Chiang Frank Wang, Kwang-Ting Cheng, and Min-Hung Chen. DoRA: Weight-decomposed low-rank adaptation. In ICML, 2024.

[21] Chenglong Ma, Zilong Li, Yuanlin Li, Jing Han, Junping Zhang, Yi Zhang, Jiannan Liu, and Hongming Shan. Radiologist-in-the-loop self-training for generalizable CT metal artifact reduction. IEEE Transactions on Medical Imaging, 44(6):2504–2514, 2025.

[22] Jun Ma, Yuting He, Feifei Li, Lin Han, Chenyu You, and Bo Wang. Segment anything in medical images. Nat. Commun., 15(1):654, 2024.

[23] Xueyan Mei, Zelong Liu, Philip M. Robson, Brett Marinelli, Mingqian Huang, Amish Doshi, Adam Jacobi, Chendi Cao, Katherine E. Link, Thomas Yang, et al. RadImageNet: An open radiologic deep learning research dataset for effective transfer learning. Radiology: Artificial Intelligence, 4(5):e210315, 2022.

[24] Esther Meyer, Rainer Raupach, Michael Lell, Bernhard Schmidt, and Marc Kachelrieß. Normalized metal artifact reduction (NMAR) in computed tomography. Medical Physics, 37(10):5482–5493, 2010.

[25] OpenAI. GPT-4 technical report. Technical report, OpenAI, 2023.

[26] N. Peters, E. Haneda, J. Zhang, et al. A hybrid training database and evaluation benchmark for assessing metal artifact reduction methods for x-ray CT imaging. Medical Physics, 52(10):e70020, 2025.

[27] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning, pages 8748–8763. PMLR, 2021.

[28] Nasrin Rahimi, Mısra Yavuz, Burak Can Biner, Yunus Bilge Kurt, Ahmet Rasim Emirdağı, Süleyman Aslan, Görkay Aydemir, M. Akın Yılmaz, and A. Murat Tekalp. Edit2Interp: Adapting image foundation models from spatial editing to video frame interpolation with few-shot learning. arXiv preprint arXiv:2603.15003, 2026.

[29] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10684–10695, 2022.

[30] Mark Selles, Jochen A C van Osch, Mario Maas, Martijn F Boomsma, and Ruud H H Wellenberg. Advances in metal artifact reduction in CT images: A review of traditional and novel metal artifact reduction techniques. Eur. J. Radiol., 170(111276):111276,

[31] Yang Song, Liyue Shen, Lei Xing, and Stefano Ermon. Solving inverse problems in medical imaging with score-based generative models. In ICLR, 2022.

[32] Alexandru Telea. An image inpainting technique based on the fast marching method. Journal of Graphics Tools, 9(1):23–34, 2004.

[33] Ge Wang, D. L. Snyder, J. A. O'Sullivan, and M. W. Vannier. Iterative deblurring for metal artifact reduction in CT. IEEE Transactions on Medical Imaging, 15(5):657–664, 1996.

[34] Hong Wang, Qi Xie, Yuexiang Li, Yawen Huang, Deyu Meng, and Yefeng Zheng. Orientation-shared convolution representation for CT metal artifact learning. In Lecture Notes in Computer Science, pages 665–675. Springer Nature Switzerland, Cham, 2022.

[35] Hong Wang, Yuexiang Li, Haimiao Zhang, Deyu Meng, and Yefeng Zheng. InDuDoNet+: A deep unfolding dual domain network for metal artifact reduction in CT images. Medical Image Analysis, 85:102729,

[36] Hong Wang, Qi Xie, Dong Zeng, Jianhua Ma, Deyu Meng, and Yefeng Zheng. OSCNet: Orientation-shared convolutional network for CT metal artifact learning. IEEE Transactions on Medical Imaging, 43(1):489–502, 2024.

[37] Jianing Wang, Yiyuan Zhao, Jack H Noble, and Benoit M Dawant. Conditional generative adversarial networks for metal artifact reduction in CT images of the ear. Med. Image Comput. Comput. Assist. Interv., 11070:3–11, 2018.

[38] Zhou Wang, Alan C. Bovik, Hamid R. Sheikh, and Eero P. Simoncelli. Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600–612, 2004.

[39] Tamar Willson. CT and SPECT/CT Artefacts, pages 1–4. Springer International Publishing, 2023.

[40] Chenfei Wu, Jiahao Li, Jingren Zhou, Junyang Lin, et al. Qwen-Image technical report. arXiv preprint arXiv:2508.02324, 2025.

[41] Mingye Wu, Paul FitzGerald, Jiayong Zhang, W Paul Segars, Hengyong Yu, Yongshun Xu, and Bruno De Man. XCIST—an open access x-ray/CT simulation toolkit. Physics in Medicine & Biology, 67(19):194002, 2022.

[42] Ke Yan, Xiaosong Wang, Le Lu, and Ronald M. Summers. DeepLesion: Automated mining of large-scale lesion annotations and universal lesion detection with deep learning. Journal of Medical Imaging, 5(3):036501, 2018.

[43] M. Akın Yılmaz, Ahmet Bilican, Burak Can Biner, and A. Murat Tekalp. Edit2Restore: Few-shot image restoration via parameter-efficient adaptation of pre-trained editing models. arXiv preprint arXiv:2601.03391, 2026.

[44] Richard Zhang, Phillip Isola, Alexei A. Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. In IEEE Conference on Computer Vision and Pattern Recognition, pages 586–595, 2018.

[45] Sheng Zhang, Yanbo Xu, Naoto Usuyama, Hanwen Xu, Jaspreet Bagga, Robert Tinn, Sam Preston, Rajesh Rao, Mu Wei, Naveen Valluri, Cliff Wong, Matthew P. Lungren, Tristan Naumann, and Hoifung Poon. A multimodal biomedical foundation model trained from fifteen million image–text pairs. NEJM AI, 2(1), 2024.