pith. sign in

arxiv: 2604.05934 · v1 · submitted 2026-04-07 · 💻 cs.CV · eess.IV

Leveraging Image Editing Foundation Models for Data-Efficient CT Metal Artifact Reduction

Pith reviewed 2026-05-10 18:41 UTC · model grok-4.3

classification 💻 cs.CV eess.IV
keywords CT metal artifact reductiondiffusion modelsLoRA adaptationdata-efficient learningin-context reasoningmedical image reconstructionvision-language modelsmulti-reference conditioning
0
0 comments X

The pith

Adapting a vision-language diffusion model via LoRA with multi-reference conditioning suppresses CT metal artifacts using only 16 to 128 paired training examples.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Metal artifacts from implants in CT scans hide anatomy and standard deep learning fixes demand thousands of paired clean and corrupted images that are costly to acquire. This work reframes the task as in-context reasoning and adapts a general-purpose vision-language diffusion foundation model with parameter-efficient LoRA fine-tuning. Clean anatomical images from unrelated subjects serve as reference context to guide restoration of the corrupted scan. The method reaches state-of-the-art scores on the AAPM CT-MAR benchmark for both perceptual quality and radiological features while cutting the required training pairs by roughly two orders of magnitude. Domain adaptation proves essential; without it the model mistakes streak artifacts for unrelated natural objects.

Core claim

By treating metal artifact reduction as an in-context reasoning task and adapting a vision-language diffusion foundation model with LoRA plus multi-reference conditioning on clean exemplars from other subjects, the approach achieves effective artifact suppression and state-of-the-art performance on perceptual and radiological metrics using only 16 to 128 paired examples, two orders of magnitude fewer than conventional supervised methods.

What carries the argument

LoRA adaptation of a vision-language diffusion foundation model that receives the corrupted CT slice together with clean anatomical reference images from unrelated subjects to perform in-context restoration.

If this is right

  • Artifact suppression reaches state-of-the-art levels on the AAPM CT-MAR benchmark for both perceptual and radiological-feature metrics.
  • Training data requirements drop from thousands to 16-128 paired examples.
  • Domain adaptation via LoRA prevents the foundation model from misinterpreting artifacts as unrelated natural objects.
  • Multi-reference conditioning with clean exemplars from other patients enables category-specific anatomical inference.
  • The adapted foundation model supplies an interpretable, data-efficient route to medical image reconstruction.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same in-context conditioning strategy could be tested on other limited-data medical reconstruction problems such as low-dose CT or MRI denoising.
  • Performance may depend on how closely the reference images match the anatomical category of the corrupted scan, suggesting a need for automated reference selection.
  • If the approach scales, large unlabeled medical image collections could serve as reference banks without requiring new paired acquisitions.

Load-bearing premise

That clean anatomical exemplars from unrelated subjects supply sufficient category-specific context for the adapted model to correctly infer and restore the underlying anatomy without hallucinating new structures.

What would settle it

Running the model on the AAPM CT-MAR test set without the multi-reference conditioning or without LoRA domain adaptation and measuring whether streak artifacts are still interpreted as natural objects or whether quantitative and perceptual metrics fall below prior supervised baselines.

Figures

Figures reproduced from arXiv: 2604.05934 by Ahmet Rasim Emirdagi, Burak Can Biner, G\"orkay Aydemir, M. Ak{\i}n Y{\i}lmaz, M{\i}sra Yavuz, Nasrin Rahimi, S\"uleyman Aslan, Yunus Bilge Kurt.

Figure 1
Figure 1. Figure 1: Overview of our framework [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Visual comparison of artifact reduction methods. (a) Noisy inputs, (b) ADN, (c) Rise-MAR, (d) OSCNet+ and (e) Our recon [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Visual demonstration of domain misalignment. Foun [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗
read the original abstract

Metal artifacts from high-attenuation implants severely degrade CT image quality, obscuring critical anatomical structures and posing a challenge for standard deep learning methods that require extensive paired training data. We propose a paradigm shift: reframing artifact reduction as an in-context reasoning task by adapting a general-purpose vision-language diffusion foundation model via parameter-efficient Low-Rank Adaptation (LoRA). By leveraging rich visual priors, our approach achieves effective artifact suppression with only 16 to 128 paired training examples reducing data requirements by two orders of magnitude. Crucially, we demonstrate that domain adaptation is essential for hallucination mitigation; without it, foundation models interpret streak artifacts as erroneous natural objects (e.g., waffles or petri dishes). To ground the restoration, we propose a multi-reference conditioning strategy where clean anatomical exemplars from unrelated subjects are provided alongside the corrupted input, enabling the model to exploit category-specific context to infer uncorrupted anatomy. Extensive evaluation on the AAPM CT-MAR benchmark demonstrates that our method achieves state-of-the-art performance on perceptual and radiological-feature metrics . This work establishes that foundation models, when appropriately adapted, offer a scalable alternative for interpretable, data-efficient medical image reconstruction. Code is available at https://github.com/ahmetemirdagi/CT-EditMAR.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 3 minor

Summary. The paper proposes reframing CT metal artifact reduction (MAR) as an in-context reasoning task by adapting a vision-language diffusion foundation model via LoRA, combined with multi-reference conditioning using clean anatomical exemplars from unrelated subjects. It claims this enables effective artifact suppression and state-of-the-art performance on the AAPM CT-MAR benchmark using only 16 to 128 paired training examples (two orders of magnitude less data than standard methods), while showing that domain adaptation is essential to prevent the model from misinterpreting artifacts as natural objects.

Significance. If the central performance claims hold under rigorous validation, this work would be significant for demonstrating that foundation models can be adapted for data-scarce medical imaging tasks like CT reconstruction, potentially reducing reliance on large paired datasets. The public code release supports reproducibility and allows community verification of the empirical results.

major comments (1)
  1. [§3.2] §3.2 (multi-reference conditioning): The central data-efficiency claim depends on the assumption that clean exemplars from unrelated subjects supply usable category-specific priors for recovering patient-specific anatomy. Given substantial inter-patient variability in organ geometry, size, and position in CT, the manuscript provides no analysis, ablation, or patient-specific validation of reference selection/impact to show that the model recovers true geometry rather than blending or hallucinating structures from the references. This is load-bearing for the 16–128 example regime, where LoRA adaptation has limited opportunity to learn mappings.
minor comments (3)
  1. The abstract asserts state-of-the-art performance on 'perceptual and radiological-feature metrics' without naming the specific metrics, providing numerical values, or referencing the corresponding results table; this should be stated explicitly in the abstract or introduction.
  2. [Method] The method section should specify the exact foundation model (e.g., which Stable Diffusion variant or vision-language model) and the precise LoRA configuration (rank, target modules) for reproducibility.
  3. [Results] Figure captions in the results section could more explicitly describe the role of the provided references and highlight differences in artifact suppression across methods.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their insightful review and for recognizing the potential significance of adapting foundation models to data-scarce medical imaging tasks. We address the major comment below and will revise the manuscript to incorporate additional analysis as suggested.

read point-by-point responses
  1. Referee: [§3.2] §3.2 (multi-reference conditioning): The central data-efficiency claim depends on the assumption that clean exemplars from unrelated subjects supply usable category-specific priors for recovering patient-specific anatomy. Given substantial inter-patient variability in organ geometry, size, and position in CT, the manuscript provides no analysis, ablation, or patient-specific validation of reference selection/impact to show that the model recovers true geometry rather than blending or hallucinating structures from the references. This is load-bearing for the 16–128 example regime, where LoRA adaptation has limited opportunity to learn mappings.

    Authors: We thank the referee for this important observation. The multi-reference conditioning strategy is intended to exploit the foundation model's in-context reasoning by supplying clean anatomical exemplars as category-specific visual priors, enabling inference of uncorrupted patient anatomy even with minimal paired data. Our experiments indicate that this yields measurable gains over single-reference or no-reference baselines on the AAPM CT-MAR benchmark. Nevertheless, we agree that the current manuscript lacks a dedicated ablation and validation of reference selection and impact. In the revised version, we will add: (1) an ablation varying reference count and anatomical similarity (measured via feature-based metrics), (2) quantitative assessments of anatomical fidelity on non-artifact regions using available ground-truth structures, and (3) patient-specific case studies with visualizations to demonstrate recovery of true geometry rather than blending or hallucination. These additions will directly address inter-patient variability and strengthen the data-efficiency claims for the low-data regime. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper presents an empirical method: reframing metal artifact reduction as in-context reasoning via LoRA adaptation of a vision-language diffusion foundation model, combined with multi-reference conditioning using clean exemplars. Performance claims rest on AAPM CT-MAR benchmark evaluation and ablation studies showing data efficiency (16-128 examples) and the necessity of domain adaptation to avoid hallucinations. No equations, first-principles derivations, or predictions are offered that reduce by construction to fitted inputs, self-definitions, or self-citation chains. The central results are externally falsifiable via the public benchmark and code release, with no load-bearing self-citations or uniqueness theorems invoked.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Only the abstract is available, so the ledger is limited to assumptions stated or implied therein. The central claim rests on the pre-trained diffusion model possessing transferable visual priors that survive light adaptation and on the utility of unrelated clean exemplars for anatomical inference.

axioms (2)
  • domain assumption General-purpose vision-language diffusion models contain rich visual priors transferable to medical CT images after parameter-efficient adaptation
    Invoked by the claim that the foundation model can be adapted for artifact suppression.
  • standard math LoRA enables effective domain adaptation without catastrophic forgetting of the base model's capabilities
    Standard assumption underlying the use of Low-Rank Adaptation.

pith-pipeline@v0.9.0 · 5575 in / 1497 out tokens · 43849 ms · 2026-05-10T18:41:35.926341+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

45 extracted references · 45 canonical work pages · 1 internal anchor

  1. [1]

    Aapm ct metal artifact reduction grand challenge benchmark tool.https : / / github

    AAPM CT-MAR Grand Challenge Team. Aapm ct metal artifact reduction grand challenge benchmark tool.https : / / github . com / xcist / example / tree / main / AAPM _ datachallenge/, 2025. 4

  2. [2]

    Akın Yılmaz, A

    Ahmet Bilican, M. Akın Yılmaz, A. Murat Tekalp, and R. G ¨okberk Cinbis ¸. Exploring sparsity for pa- rameter efficient fine tuning using wavelets.preprint arXiv:2505.12532, 2025. 2

  3. [3]

    FouRA: Fourier low-rank adaptation

    Shubhankar Borse, Shreya Kadambi, Nilesh Prasad Pandey, Kartikeya Bhardwaj, Viswanath Ganapathy, Sweta Priyadarshi, Risheek Garrepalli, Rafael Es- teves, Munawar Hayat, and Fatih Porikli. FouRA: Fourier low-rank adaptation. InThe Thirty-eighth An- nual Conference on Neural Information Processing Systems, 2024. 2

  4. [4]

    Chang, H.-N

    C.-H. Chang, H.-N. Wu, C.-H. Hsu, and H.-H. Lin. Virtual monochromatic imaging with projection-based material decomposition algorithm for metal artifacts reduction in photon-counting detector computed to- mography.PLoS ONE, 18(3):e0282900, 2023. 1

  5. [5]

    Sam-med2d

    Junlong Cheng, Jin Ye, Zhongying Deng, Jianpin Chen, Tianbin Li, Haoyu Wang, Yanzhou Su, Ziyan Huang, Jilong Chen, Lei Jiang, Hui Sun, Junjun He, Shaoting Zhang, Min Zhu, and Yu Qiao. Sam-med2d. arXiv preprint arXiv:2308.16184, 2023. 2

  6. [6]

    An iterative maximum-likelihood polychromatic algo- rithm for ct.IEEE Transactions on Medical Imaging, 20(10):999–1008, 2001

    Bruno De Man, Johan Nuyts, Patrick Dupont, et al. An iterative maximum-likelihood polychromatic algo- rithm for ct.IEEE Transactions on Medical Imaging, 20(10):999–1008, 2001. 1

  7. [7]

    Scaling rectified flow transformers for high-resolution image synthe- sis

    Patrick Esser, Sumith Kulal, Andreas Blattmann, Tim Dockhorn, Jonas M ¨uller, Harry Lorenz, Yam Zhang, Robin Caplette, and Robin Rombach. Scaling rectified flow transformers for high-resolution image synthe- sis. InInternational Conference on Machine Learn- ing, 2024. 3

  8. [8]

    Parameter- efficient fine-tuning with discrete fourier transform

    Ziqi Gao, Qichao Wang, Aochuan Chen, Zijing Liu, Bingzhe Wu, Liang Chen, and Jia Li. Parameter- efficient fine-tuning with discrete fourier transform. InForty-first International Conference on Machine Learning, 2024. 2

  9. [9]

    Metal artifact reduction in ct: Where are we after four decades?IEEE Access, 4:5826–5849, 2016

    Lars Gjesteby, Bruno De Man, Yannan Jin, et al. Metal artifact reduction in ct: Where are we after four decades?IEEE Access, 4:5826–5849, 2016. 1

  10. [10]

    Multi-frequency electrical impedance tomography and neuroimaging data in stroke patients.Scientific Data, 5:180112, 2018

    Nir Goren, James Avery, Thomas Dowrick, Eleanor Mackle, Anna Witkowska-Wrobel, and David Holder. Multi-frequency electrical impedance tomography and neuroimaging data in stroke patients.Scientific Data, 5:180112, 2018. 4

  11. [11]

    Haneda, N

    E. Haneda, N. Peters, J. Zhang, et al. Aapm ct metal artifact reduction grand challenge.Medical Physics, 52(10):e70050, 2025. 2, 4

  12. [12]

    Gans trained by a two time-scale update rule converge to a local nash equilibrium

    Martin Heusel, Hubert Ramsauer, Thomas Un- terthiner, Bernhard Nessler, and Sepp Hochreiter. Gans trained by a two time-scale update rule converge to a local nash equilibrium. InAdvances in Neural Information Processing Systems, 2017. 5

  13. [13]

    Lora: Low-rank adaptation of large language models

    Edward J Hu, Yelong Shen, Phillip Wallis, et al. Lora: Low-rank adaptation of large language models. In ICLR, 2022. 2, 3

  14. [14]

    W. A. Kalender, R. Hebel, and J. Ebersberger. Reduc- tion of ct artifacts caused by metallic implants.Radi- ology, 164(2):576–577, 1987. 1

  15. [15]

    Akın Yılmaz, A

    Onur Keles ¸, M. Akın Yılmaz, A. Murat Tekalp, Cansu Korkmaz, and Zafer Do ˘gan. On the computation of psnr for a set of images or video. In2021 Picture Coding Symposium (PCS), pages 1–5, 2021. 5

  16. [16]

    VeRA: Vector-based random matrix adaptation

    Dawid Jan Kopiczko, Tijmen Blankevoort, and Yuki M Asano. VeRA: Vector-based random matrix adaptation. InThe Twelfth International Conference on Learning Representations, 2024. 2

  17. [17]

    Kevin Zhou, and Jiebo Luo

    Haofu Liao, Wei-An Lin, S. Kevin Zhou, and Jiebo Luo. Adn: Artifact disentanglement network for unsu- pervised metal artifact reduction.IEEE Transactions on Medical Imaging, 39(3):634–643, 2020. 2, 4, 5

  18. [18]

    Dudonet: Dual domain network for ct metal artifact reduction

    Wei-An Lin, Haofu Liao, Cheng Peng, et al. Dudonet: Dual domain network for ct metal artifact reduction. InCVPR, pages 10512–10521, 2019. 1

  19. [19]

    Yaron Lipman, Ricky T. Q. Chen, Heli Ben-Hamu, Maximilian Nickel, and Matt Le. Flow matching for generative modeling. InInternational Conference on Learning Representations, 2023. 3

  20. [20]

    Dora: Weight- decomposed low-rank adaptation

    Shih-Yang Liu, Chien-Yi Wang, Hongxu Yin, Pavlo Molchanov, Yu-Chiang Frank Wang, Kwang-Ting Cheng, and Min-Hung Chen. Dora: Weight- decomposed low-rank adaptation. InICML, 2024. 2

  21. [21]

    Radiologist-in-the-loop self-training for gener- alizable ct metal artifact reduction.IEEE Transactions on Medical Imaging, 44(6):2504–2514, 2025

    Chenglong Ma, Zilong Li, Yuanlin Li, Jing Han, Jun- ping Zhang, Yi Zhang, Jiannan Liu, and Hongming Shan. Radiologist-in-the-loop self-training for gener- alizable ct metal artifact reduction.IEEE Transactions on Medical Imaging, 44(6):2504–2514, 2025. 4, 8

  22. [22]

    Segment anything in medical images

    Jun Ma, Yuting He, Feifei Li, Lin Han, Chenyu You, and Bo Wang. Segment anything in medical images. Nat. Commun., 15(1):654, 2024. 2

  23. [23]

    Robson, Brett Marinelli, Mingqian Huang, Amish Doshi, Adam Ja- cobi, Chendi Cao, Katherine E

    Xueyan Mei, Zelong Liu, Philip M. Robson, Brett Marinelli, Mingqian Huang, Amish Doshi, Adam Ja- cobi, Chendi Cao, Katherine E. Link, Thomas Yang, et al. Radimagenet: An open radiologic deep learning research dataset for effective transfer learning.Radi- ology: Artificial Intelligence, 4(5):e210315, 2022. 5

  24. [24]

    Normalized metal artifact reduction (nmar) in computed tomogra- phy.Medical Physics, 37(10):5482–5493, 2010

    Esther Meyer, Rainer Raupach, Michael Lell, Bern- hard Schmidt, and Marc Kachelrieß. Normalized metal artifact reduction (nmar) in computed tomogra- phy.Medical Physics, 37(10):5482–5493, 2010. 1

  25. [25]

    GPT-4 technical report

    OpenAI. GPT-4 technical report. Technical report, OpenAI, 2023. 2

  26. [26]

    Peters, E

    N. Peters, E. Haneda, J. Zhang, et al. A hybrid train- ing database and evaluation benchmark for assessing metal artifact reduction methods for x-ray ct imaging. Medical Physics, 52(10):e70020, 2025. 4

  27. [27]

    Learning trans- ferable visual models from natural language supervi- sion

    Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sas- try, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. Learning trans- ferable visual models from natural language supervi- sion. InInternational Conference on Machine Learn- ing, pages 8748–8763. PMLR, 2021. 2

  28. [28]

    Akın Yılmaz, and A

    Nasrin Rahimi, Mısra Yavuz, Burak Can Biner, Yunus Bilge Kurt, Ahmet Rasim Emirda˘gı, S¨uleyman Aslan, G¨orkay Aydemir, M. Akın Yılmaz, and A. Mu- rat Tekalp. Edit2interp: Adapting image founda- tion models from spatial editing to video frame in- terpolation with few-shot learning.arXiv preprint arXiv:2603.15003, 2026. 2

  29. [29]

    High- resolution image synthesis with latent diffusion mod- els

    Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Bj ¨orn Ommer. High- resolution image synthesis with latent diffusion mod- els. InIEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10684–10695, 2022. 2

  30. [30]

    Ad- vances in metal artifact reduction in CT images: A review of traditional and novel metal artifact reduc- tion techniques.Eur

    Mark Selles, Jochen A C van Osch, Mario Maas, Mar- tijn F Boomsma, and Ruud H H Wellenberg. Ad- vances in metal artifact reduction in CT images: A review of traditional and novel metal artifact reduc- tion techniques.Eur. J. Radiol., 170(111276):111276,

  31. [31]

    Solving inverse problems in medical imaging with score-based generative models

    Yang Song, Liyue Shen, Lei Xing, and Stefano Er- mon. Solving inverse problems in medical imaging with score-based generative models. InICLR, 2022. 2

  32. [32]

    An image inpainting technique based on the fast marching method.Journal of Graph- ics Tools, 9(1):23–34, 2004

    Alexandru Telea. An image inpainting technique based on the fast marching method.Journal of Graph- ics Tools, 9(1):23–34, 2004. 4

  33. [33]

    Ge Wang, D. L. Snyder, J. A. O’Sullivan, and M. W. Vannier. Iterative deblurring for metal artifact reduc- tion in ct.IEEE Transactions on Medical Imaging, 15 (5):657–664, 1996. 1

  34. [34]

    Orientation-shared convolution representation for CT metal artifact learn- ing

    Hong Wang, Qi Xie, Yuexiang Li, Yawen Huang, Deyu Meng, and Yefeng Zheng. Orientation-shared convolution representation for CT metal artifact learn- ing. InLecture Notes in Computer Science, pages 665–675. Springer Nature Switzerland, Cham, 2022. 4, 5

  35. [35]

    Indudonet+: A deep un- folding dual domain network for metal artifact reduc- tion in ct images.Medical Image Analysis, 85:102729,

    Hong Wang, Yuexiang Li, Haimiao Zhang, Deyu Meng, and Yefeng Zheng. Indudonet+: A deep un- folding dual domain network for metal artifact reduc- tion in ct images.Medical Image Analysis, 85:102729,

  36. [36]

    Oscnet: Orientation-shared convolutional network for ct metal artifact learning

    Hong Wang, Qi Xie, Dong Zeng, Jianhua Ma, Deyu Meng, and Yefeng Zheng. Oscnet: Orientation-shared convolutional network for ct metal artifact learning. IEEE Transactions on Medical Imaging, 43(1):489– 502, 2024. 4, 5

  37. [37]

    Conditional generative adversarial networks for metal artifact reduction in CT images of the ear.Med

    Jianing Wang, Yiyuan Zhao, Jack H Noble, and Benoit M Dawant. Conditional generative adversarial networks for metal artifact reduction in CT images of the ear.Med. Image Comput. Comput. Assist. Interv., 11070:3–11, 2018. 2

  38. [38]

    Bovik, Hamid R

    Zhou Wang, Alan C. Bovik, Hamid R. Sheikh, and Eero P. Simoncelli. Image quality assessment: From error visibility to structural similarity.IEEE Transac- tions on Image Processing, 13(4):600–612, 2004. 5

  39. [39]

    Springer International Publishing, 2023

    Tamar Willson.CT and SPECT/CT Artefacts, page 1–4. Springer International Publishing, 2023. 1

  40. [40]

    Qwen-Image Technical Report

    Chenfei Wu, Jiahao Li, Jingren Zhou, Junyang Lin, et al. Qwen-image technical report.arXiv preprint arXiv:2508.02324, 2025. 2, 3

  41. [41]

    Xcist—an open access x-ray/ct simulation toolkit.Physics in Medicine &; Biology, 67(19): 194002, 2022

    Mingye Wu, Paul FitzGerald, Jiayong Zhang, W Paul Segars, Hengyong Yu, Yongshun Xu, and Bruno De Man. Xcist—an open access x-ray/ct simulation toolkit.Physics in Medicine &; Biology, 67(19): 194002, 2022. 4

  42. [42]

    Sum- mers

    Ke Yan, Xiaosong Wang, Le Lu, and Ronald M. Sum- mers. Deeplesion: Automated mining of large-scale lesion annotations and universal lesion detection with deep learning.Journal of Medical Imaging, 5(3): 036501, 2018. 4, 5

  43. [43]

    Akın Yılmaz, Ahmet Bilican, Burak Can Biner, and A

    M. Akın Yılmaz, Ahmet Bilican, Burak Can Biner, and A. Murat Tekalp. Edit2restore: Few-shot image restoration via parameter-efficient adaptation of pre-trained editing models.arXiv preprint arXiv:2601.03391, 2026. 2

  44. [44]

    Efros, Eli Shechtman, and Oliver Wang

    Richard Zhang, Phillip Isola, Alexei A. Efros, Eli Shechtman, and Oliver Wang. The unreasonable ef- fectiveness of deep features as a perceptual metric. InIEEE Conference on Computer Vision and Pattern Recognition, pages 586–595, 2018. 5

  45. [45]

    Lungren, Tristan Naumann, and Hoifung Poon

    Sheng Zhang, Yanbo Xu, Naoto Usuyama, Han- wen Xu, Jaspreet Bagga, Robert Tinn, Sam Preston, Rajesh Rao, Mu Wei, Naveen Valluri, Cliff Wong, Matthew P. Lungren, Tristan Naumann, and Hoifung Poon. A multimodal biomedical foundation model trained from fifteen million image–text pairs.NEJM AI, 2(1), 2024. 2