pith.

arxiv: 2604.08922 · v1 · submitted 2026-04-10 · 💻 cs.CV

Degradation-Robust Fusion: An Efficient Degradation-Aware Diffusion Framework for Multimodal Image Fusion in Arbitrary Degradation Scenarios

Pith reviewed 2026-05-10 17:24 UTC · model grok-4.3

classification 💻 cs.CV
keywords: image fusion, diffusion model, degradation aware, multimodal fusion, implicit denoising, observation model correction, arbitrary degradations

The pith

Direct regression of the fused image in a diffusion process with joint constraint correction enables robust multimodal fusion under arbitrary degradations using few sampling steps.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper presents a degradation-aware diffusion framework for fusing multimodal images affected by complex degradations such as noise, blur, and low resolution. Rather than predicting noise like standard diffusion models, the approach directly regresses the target fused image, performing denoising implicitly. This lets the model adapt flexibly to various fusion tasks with only a limited number of sampling steps. A joint observation model correction mechanism is introduced to enforce both the degradation constraints from the input sources and the fusion requirements during sampling. The work matters because existing end-to-end methods lack interpretability, while standard diffusion models are hard to apply to fusion, where no natural fused images exist to serve as a training target; the framework is proposed as a practical route around both problems for real-world image fusion.

Core claim

The paper establishes that performing implicit denoising by directly regressing the fused image within a diffusion-style process, combined with a joint observation model correction that imposes degradation and fusion constraints simultaneously during sampling, allows for efficient and accurate multimodal image fusion in arbitrary degradation scenarios without requiring explicit noise modeling or large numbers of sampling steps.
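The direct-regression idea can be illustrated with a minimal sketch of few-step sampling in which the network outputs an estimate of the clean fused image rather than the noise. The code assumes a DDPM-style linear noise schedule and a deterministic DDIM-style update; `make_alpha_bar`, `sample_x0_regression`, and the callable `predict_x0` are hypothetical names, and none of this is taken from the paper, whose exact sampler is not described here.

```python
import numpy as np


def make_alpha_bar(num_train_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_train_steps)
    return np.cumprod(1.0 - betas)


def sample_x0_regression(predict_x0, cond, shape, num_steps=5, seed=0):
    """Few-step deterministic (DDIM-style) sampling with an x0-regressing model.

    predict_x0(x_t, t, cond) is a hypothetical stand-in for the trained network:
    it returns an estimate of the clean fused image, not of the noise.
    """
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    alpha_bar = make_alpha_bar()
    # Evenly spaced subset of the training timesteps, noisiest first.
    timesteps = np.linspace(len(alpha_bar) - 1, 0, num_steps).round().astype(int)
    x_t = np.random.default_rng(seed).standard_normal(shape)
    for i, t in enumerate(timesteps):
        a_t = alpha_bar[t]
        x0_hat = predict_x0(x_t, t, cond)  # implicit denoising: regress x0 directly
        # Noise implied by the x0 estimate; no separate noise head is needed.
        eps_hat = (x_t - np.sqrt(a_t) * x0_hat) / np.sqrt(1.0 - a_t)
        if i + 1 < num_steps:
            a_next = alpha_bar[timesteps[i + 1]]
            x_t = np.sqrt(a_next) * x0_hat + np.sqrt(1.0 - a_next) * eps_hat
    return x0_hat
```

Because the noise term is recovered from the x0 estimate, each step costs one network call and the step count is a free choice; the paper's claim is that a small number of steps suffices, which this toy does not test.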

What carries the argument

The joint observation model correction mechanism, which simultaneously applies degradation and fusion constraints during the limited-step sampling process in a direct-regression diffusion framework.
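One way to make such a correction concrete is a range-null-space projection in the style of DDNM, applied to the network's x0 estimate after each sampling step. This is an assumption for illustration, not the paper's formulation: it models the degradation as 2x2 average pooling, builds a hypothetical degraded-domain fusion target from two low-resolution sources with a fixed weight, and ignores observation noise. All function names are hypothetical.

```python
import numpy as np


def avg_pool2(x):
    """Assumed degradation operator A: 2x2 average pooling (even height/width)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


def upsample_nn2(y):
    """Pseudo-inverse A^+ of 2x2 average pooling: nearest-neighbour upsampling."""
    return np.repeat(np.repeat(y, 2, axis=0), 2, axis=1)


def joint_correction(x0_hat, y_a, y_b, weight_a):
    """Project x0_hat so that its degraded version matches a fused observation.

    y_a, y_b: degraded low-resolution sources; weight_a: hypothetical fusion
    weight in [0, 1], scalar or per-pixel array (e.g. from a saliency rule).
    """
    y_fused = weight_a * y_a + (1.0 - weight_a) * y_b  # fusion constraint
    residual = y_fused - avg_pool2(x0_hat)             # degradation constraint
    # Equivalent to A^+ y_fused + (I - A^+ A) x0_hat.
    return x0_hat + upsample_nn2(residual)
```

The projection changes only the component of the estimate that the degradation operator can see; the null-space detail still comes from the network, which is what lets a step of this kind impose constraints without discarding the generative prior.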

If this is right

  • Fusion performance remains high even when input images suffer from combined degradations like noise and blur.
  • The framework supports multiple multimodal tasks such as infrared-visible fusion and medical image fusion under the same model structure.
  • High reconstruction accuracy is achieved with significantly fewer sampling steps compared to conventional diffusion approaches.
  • Complementary information from multiple degraded sources is captured effectively through the regression-based process.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • This regression strategy might extend to other multimodal inverse problems where ground-truth combined data is scarce.
  • Removing the need for natural fused training data could lower barriers for applying generative models in fusion research.
  • The interpretability gains from the structured sampling process may help diagnose failure cases in fusion outputs.

Load-bearing premise

Directly regressing the fused image effectively captures complementary information from multiple degraded inputs without explicit noise modeling, and the joint correction accurately enforces both types of constraints even with limited sampling steps.

What would settle it

An ablation study removing the joint observation model correction and measuring the resulting drop in fusion quality metrics on standard benchmarks with added combined degradations such as noise plus blur would test whether the mechanism is essential for the claimed accuracy.
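Such an ablation needs a reproducible way to add combined degradations to benchmark images. A minimal sketch, assuming the classical model y = D_s(k ∗ x) + n (Gaussian blur, then decimation by s, then additive Gaussian noise) on images scaled to [0, 1]; the function names and default parameters are illustrative, not the paper's settings.

```python
import numpy as np


def gaussian_kernel1d(sigma, radius=None):
    """Normalised 1-D Gaussian kernel; radius defaults to 3 * sigma."""
    radius = int(3 * sigma) if radius is None else radius
    x = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()


def blur(img, sigma):
    """Separable Gaussian blur with reflect padding; output has the input's size."""
    k = gaussian_kernel1d(sigma)
    r = len(k) // 2
    padded = np.pad(np.asarray(img, dtype=float), r, mode="reflect")
    # Filter rows, then columns; 'valid' mode trims the padding back off.
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)


def degrade(img, sigma=1.5, noise_std=0.05, scale=1, seed=0):
    """Combined degradation: Gaussian blur, then decimation by `scale`, then noise."""
    rng = np.random.default_rng(seed)
    out = blur(img, sigma) if sigma > 0 else np.array(img, dtype=float)
    out = out[::scale, ::scale]  # the blur doubles as the anti-aliasing filter
    return out + noise_std * rng.standard_normal(out.shape)
```

Running the fusion model on the degraded pair twice, once with the correction enabled and once with it disabled, and scoring both outputs with the same metrics gives the proposed test; fixing `seed` keeps the noise identical across the two runs.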

Figures

Figures reproduced from arXiv: 2604.08922 by Huafeng Li, Juan Cheng, Xun Chen, Yu Liu, Yu Shi, Zhong-Cheng Wu.

Figure 1: Comparison of fusion strategies under different degrada…
Figure 2: The proposed framework for multimodal image fusion under various degradation scenarios in this work.
Figure 3: Qualitative results of different fusion methods on M3FD dataset. For the comparison methods, we first use corresponding…
Figure 4: Qualitative comparison of different fusion methods on the Harvard dataset. For the comparison methods, we first apply the…
Figure 5: Comparison of time efficiency and model parameters
Figure 6: Ablation study results on two datasets. (a) represents the…
Figure 7: Results with and without the joint constraint correction mechanism under different degradation scenarios on M3FD dataset.
Figure 8: Results with and without the joint constraint correction mechanism under different degradation scenarios on PET-MRI dataset.
Figure 9: Qualitative detection results based on fusion images generated by different fusion methods.
Original abstract

Complex degradations like noise, blur, and low resolution are typical challenges in real-world image fusion tasks, limiting the performance and practicality of existing methods. End-to-end neural-network-based approaches are generally simple to design and highly efficient in inference, but their black-box nature leads to limited interpretability. Diffusion-based methods alleviate this to some extent by providing powerful generative priors and a more structured inference process. However, they are trained to learn a single-domain target distribution, whereas fusion lacks natural fused data and relies on modeling complementary information from multiple sources, making diffusion hard to apply directly in practice. To address these challenges, this paper proposes an efficient degradation-aware diffusion framework for image fusion under arbitrary degradation scenarios. Specifically, instead of explicitly predicting noise as in conventional diffusion models, our method performs implicit denoising by directly regressing the fused image, enabling flexible adaptation to diverse fusion tasks under complex degradations with limited steps. Moreover, we design a joint observation model correction mechanism that simultaneously imposes degradation and fusion constraints during sampling to ensure high reconstruction accuracy. Experiments on diverse fusion tasks and degradation configurations demonstrate the superiority of the proposed method under complex degradation scenarios.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes an efficient degradation-aware diffusion framework for multimodal image fusion under arbitrary real-world degradations (noise, blur, low resolution). It replaces the standard noise-prediction objective with direct regression of the fused image for implicit denoising, enabling adaptation to diverse tasks with limited sampling steps, and introduces a joint observation model correction mechanism that enforces both degradation consistency and fusion constraints during sampling. Experiments on various fusion tasks and degradation configurations are claimed to demonstrate superiority over existing methods.

Significance. If the central claims hold, the work would offer a practical advance in applying diffusion models to fusion problems that lack natural ground-truth fused data. The direct-regression approach combined with a joint correction could provide efficiency and some interpretability gains over black-box end-to-end networks while leveraging generative priors, with potential impact on applications such as infrared-visible or medical image fusion where degradations are common. The emphasis on limited-step sampling addresses a key practicality barrier for diffusion in this domain.

major comments (2)
  1. [Abstract and Method] The substitution of noise prediction with direct regression of the fused image is load-bearing for the efficiency and 'implicit denoising' claims, yet the description provides no derivation showing how this still yields a valid reverse process that aggregates complementary information across arbitrarily degraded sources. The concern that errors in the regressed estimate will compound, rather than be denoised away, over a limited number of steps therefore requires explicit analysis or a comparison with standard diffusion objectives.
  2. [Method] The joint observation model correction mechanism is presented as simultaneously imposing degradation and fusion constraints to ensure high accuracy, but without its mathematical formulation, derivation, or proof that it avoids posterior mismatch in the limited-step regime, it is impossible to verify whether the mechanism is robust or merely ad hoc; this is central to the reconstruction-accuracy claim.
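The derivation requested in major comment 1 can start from a standard identity relating the two parameterizations. Assuming the usual forward process with cumulative schedule \bar\alpha_t (the paper's own schedule and sampler are not given here), an x0-regressing network \hat x_\theta implies a noise estimate, the two training losses differ only by a per-timestep weight, and a DDIM-style deterministic step from t to an earlier s can be written entirely in terms of \hat x_\theta:

```latex
\begin{aligned}
x_t &= \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon,
  \qquad \epsilon \sim \mathcal{N}(0, I),\\
\hat\epsilon_\theta(x_t, t) &= \frac{x_t - \sqrt{\bar\alpha_t}\,\hat x_\theta(x_t, t)}{\sqrt{1-\bar\alpha_t}},\\
\bigl\|\epsilon - \hat\epsilon_\theta(x_t, t)\bigr\|^2
  &= \frac{\bar\alpha_t}{1-\bar\alpha_t}\,\bigl\|x_0 - \hat x_\theta(x_t, t)\bigr\|^2,\\
x_s &= \sqrt{\bar\alpha_s}\,\hat x_\theta(x_t, t) + \sqrt{1-\bar\alpha_s}\,\hat\epsilon_\theta(x_t, t),
  \qquad s < t .
\end{aligned}
```

So regressing x0 amounts to reweighting the noise-prediction objective by the signal-to-noise ratio \bar\alpha_t/(1-\bar\alpha_t), and it remains compatible with a standard reverse step. What the identity does not settle is the substance of the comment: with no natural fused x_0, the regression target must be defined from the sources, and whether errors in that target compound over few steps still has to be shown.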
minor comments (2)
  1. [Abstract] The phrase 'arbitrary degradation scenarios' is used without enumerating the specific degradation types or combinations tested; adding this would strengthen the claim of generality.
  2. [Experiments] While superiority is asserted, the abstract supplies no quantitative metrics, baselines, or ablation results; the full manuscript should tabulate these clearly, with statistical significance tests.
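A significance test suited to this setting compares two methods on the same test images. A minimal sketch of a paired percentile bootstrap for the mean per-image difference in any fusion metric, written with NumPy; the function name and defaults are illustrative, and other paired tests (for example the Wilcoxon signed-rank test) would serve as well.

```python
import numpy as np


def paired_bootstrap_ci(scores_a, scores_b, n_boot=10000, alpha=0.05, seed=0):
    """Mean of per-image differences (a - b) and its percentile bootstrap CI.

    scores_a[i] and scores_b[i] must be the same metric on the same image i.
    """
    diff = np.asarray(scores_a, dtype=float) - np.asarray(scores_b, dtype=float)
    rng = np.random.default_rng(seed)
    # Resample images with replacement, keeping each pair together.
    idx = rng.integers(0, len(diff), size=(n_boot, len(diff)))
    boot_means = diff[idx].mean(axis=1)
    lo, hi = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return diff.mean(), (lo, hi)
```

If the returned interval excludes zero, the difference is unlikely to be resampling noise at the chosen level; the test needs per-image scores, not dataset averages.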

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below and will revise the paper to strengthen the theoretical justification of our approach.

Point-by-point responses
  1. Referee: [Abstract and Method] The substitution of noise prediction with direct regression of the fused image is load-bearing for the efficiency and 'implicit denoising' claims, yet the description provides no derivation showing how this still yields a valid reverse process that aggregates complementary information across arbitrarily degraded sources. The concern that errors in the regressed estimate will compound, rather than be denoised away, over a limited number of steps therefore requires explicit analysis or a comparison with standard diffusion objectives.

    Authors: We appreciate the referee's concern that the substitution of noise prediction with direct regression of the fused image requires explicit justification to confirm it produces a valid reverse process. In the revised manuscript, we will add a detailed derivation of the reverse diffusion process under this objective, showing how the joint constraints enable aggregation of complementary information from arbitrarily degraded sources. We will also include a direct comparison to standard DDPM noise-prediction objectives and analysis demonstrating that mismatches are mitigated (rather than compounded) during limited-step sampling. revision: yes

  2. Referee: [Method] The joint observation model correction mechanism is presented as simultaneously imposing degradation and fusion constraints to ensure high accuracy, but without its mathematical formulation, derivation, or proof that it avoids posterior mismatch in the limited-step regime, it is impossible to verify whether the mechanism is robust or merely ad hoc; this is central to the reconstruction-accuracy claim.

    Authors: We acknowledge that the current manuscript does not provide sufficient mathematical detail on the joint observation model correction mechanism. In the revision, we will include the complete formulation, a step-by-step derivation, and analysis (including discussion of posterior mismatch) to demonstrate that the mechanism robustly enforces both degradation consistency and fusion constraints in the limited-step regime, rather than relying on ad-hoc choices. revision: yes

Circularity Check

0 steps flagged

No circularity: direct regression target and joint correction are explicit design choices, not reductions to inputs

full rationale

The paper explicitly replaces the standard noise-prediction objective of diffusion models with direct regression of the fused image and introduces a joint observation model correction applied during sampling. These are presented as methodological adaptations to address the absence of natural fused targets and arbitrary degradations, rather than any derivation that equates a prediction to its own fitted parameters or self-referential definitions. No equations reduce claimed performance to quantities defined by the model itself, no uniqueness theorems are imported from self-citations, and no ansatzes are smuggled via prior work. Claims rest on experimental validation across tasks rather than tautological construction. This is the common case of a self-contained proposal adapting existing priors with new components.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

Abstract-only review: beyond the domain assumption and the correction mechanism listed below, no explicit free parameters are identified, and details on any fitted components or background assumptions are absent.

axioms (1)
  • Domain assumption: Diffusion models can be effectively repurposed for fusion tasks by replacing explicit noise prediction with direct regression of the target fused image
    Core design choice stated in the abstract to enable limited-step adaptation
invented entities (1)
  • Joint observation model correction mechanism (no independent evidence)
    purpose: To simultaneously enforce degradation and fusion constraints during the sampling process
    New component introduced to ensure reconstruction accuracy under complex degradations



Reference graph

Works this paper leans on

65 extracted references · 65 canonical work pages

  1. [1]

    Ensemble of cnn for multi-focus image fusion.Information Fusion, 51:201–214, 2019

    Mostafa Amin-Naji, Ali Aghagolzadeh, and Mehdi Ezoji. Ensemble of cnn for multi-focus image fusion.Information Fusion, 51:201–214, 2019. 2

  2. [2]

    A novel state space model with local enhancement and state sharing for image fusion

    Zihan Cao, Xiao Wu, Liang-Jian Deng, and Yu Zhong. A novel state space model with local enhancement and state sharing for image fusion. InProceedings of the 32nd ACM International Conference on Multimedia, pages 1235–1244,

  3. [3]

    Invertible diffusion models for compressed sensing.IEEE Transactions on Pat- tern Analysis and Machine Intelligence, 2025

    Bin Chen, Zhenyu Zhang, Weiqi Li, Chen Zhao, Jiwen Yu, Shijie Zhao, Jie Chen, and Jian Zhang. Invertible diffusion models for compressed sensing.IEEE Transactions on Pat- tern Analysis and Machine Intelligence, 2025. 3, 4

  4. [4]

    Mdb- fusion: a visible and infrared image fusion framework ca- pable for motion deblurring

    Jun Chen, Wei Yu, Xin Tian, Jun Huang, and Jiayi Ma. Mdb- fusion: a visible and infrared image fusion framework ca- pable for motion deblurring. In2024 IEEE International Conference on Image Processing (ICIP), pages 1019–1025. IEEE, 2024. 1

  5. [5]

    Diffusion models in vision: A survey

    Florinel-Alin Croitoru, Vlad Hondru, Radu Tudor Ionescu, and Mubarak Shah. Diffusion models in vision: A survey. IEEE Transactions on Pattern Analysis and Machine Intelli- gence, 45(9):10850–10869, 2023. 2

  6. [6]

    Generative dif- fusion prior for unified image restoration and enhancement

    Ben Fei, Zhaoyang Lyu, Liang Pan, Junzhe Zhang, Weidong Yang, Tianyue Luo, Bo Zhang, and Bo Dai. Generative dif- fusion prior for unified image restoration and enhancement. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9935–9946, 2023. 3

  7. [7]

    Diffusion models in low-level vision: A survey.IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025

    Chunming He, Yuqi Shen, Chengyu Fang, Fengyang Xiao, Longxiang Tang, Yulun Zhang, Wangmeng Zuo, Zhenhua Guo, and Xiu Li. Diffusion models in low-level vision: A survey.IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025. 2

  8. [8]

    Denoising diffu- sion probabilistic models.Advances in Neural Information Processing Systems, 33:6840–6851, 2020

    Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffu- sion probabilistic models.Advances in Neural Information Processing Systems, 33:6840–6851, 2020. 2

  9. [9]

    Dednet: Infrared and visible image fusion with noise removal by decomposition-driven network

    Jingxue Huang, Xiaosong Li, Haishu Tan, Lemiao Yang, Gao Wang, and Peng Yi. Dednet: Infrared and visible image fusion with noise removal by decomposition-driven network. Measurement, 237:115092, 2024. 1

  10. [10]

    An infrared and visible image fusion method based on multi-scale transfor- mation and norm optimization.Information Fusion, 71:109– 129, 2021

    Guofa Li, Yongjie Lin, and Xingda Qu. An infrared and visible image fusion method based on multi-scale transfor- mation and norm optimization.Information Fusion, 71:109– 129, 2021. 2

  11. [11]

    Joint medical image fusion, denoising and enhancement via discriminative low-rank sparse dictionaries learning.Pattern Recognition, 79:130–146, 2018

    Huafeng Li, Xiaoge He, Dapeng Tao, Yuanyan Tang, and Ruxin Wang. Joint medical image fusion, denoising and enhancement via discriminative low-rank sparse dictionaries learning.Pattern Recognition, 79:130–146, 2018. 1

  12. [12]

    Huafeng Li, Yueliang Cen, Yu Liu, Xun Chen, and Zhengtao Yu. Different input resolutions and arbitrary output resolu- tion: A meta learning-based deep framework for infrared and visible image fusion.IEEE Transactions on Image Process- ing, 30:4070–4083, 2021. 1

  13. [13]

    Huafeng Li, Zengyi Yang, Yafei Zhang, Wei Jia, Zheng- tao Yu, and Yu Liu. Mulfs-cap: Multimodal fusion- supervised cross-modality alignment perception for unreg- istered infrared-visible image fusion.IEEE Transactions on Pattern Analysis and Machine Intelligence, 47(5):3673– 3690, 2025. 2

  14. [14]

    Infrared and visible image fusion via sparse representation and guided filtering in laplacian pyramid domain.Remote Sensing, 16 (20):3804, 2024

    Liangliang Li, Yan Shi, Ming Lv, Zhenhong Jia, Minqin Liu, Xiaobin Zhao, Xueyu Zhang, and Hongbing Ma. Infrared and visible image fusion via sparse representation and guided filtering in laplacian pyramid domain.Remote Sensing, 16 (20):3804, 2024. 2

  15. [15]

    Fusiondiff: Multi-focus image fusion using de- noising diffusion probabilistic models.Expert Systems with Applications, 238:121664, 2024

    Mining Li, Ronghao Pei, Tianyou Zheng, Yang Zhang, and Weiwei Fu. Fusiondiff: Multi-focus image fusion using de- noising diffusion probabilistic models.Expert Systems with Applications, 238:121664, 2024. 2

  16. [16]

    Joint image fu- sion and denoising via three-layer decomposition and sparse representation.Knowledge-Based Systems, 224:107087,

    Xiaosong Li, Fuqiang Zhou, and Haishu Tan. Joint image fu- sion and denoising via three-layer decomposition and sparse representation.Knowledge-Based Systems, 224:107087,

  17. [17]

    Contourlet residual for prompt learning enhanced infrared image super-resolution

    Xingyuan Li, Jinyuan Liu, Zhixin Chen, Yang Zou, Long Ma, Xin Fan, and Risheng Liu. Contourlet residual for prompt learning enhanced infrared image super-resolution. InEuropean Conference on Computer Vision, pages 270–

  18. [18]

    Difiisr: A diffu- sion model with gradient guidance for infrared image super- resolution

    Xingyuan Li, Zirui Wang, Yang Zou, Zhixin Chen, Jun Ma, Zhiying Jiang, Long Ma, and Jinyuan Liu. Difiisr: A diffu- sion model with gradient guidance for infrared image super- resolution. InProceedings of the Computer Vision and Pat- tern Recognition Conference, pages 7534–7544, 2025. 3

  19. [19]

    Target-aware dual adversarial learning and a multi-scenario multi-modality benchmark to fuse infrared and visible for object detection

    Jinyuan Liu, Xin Fan, Zhanbo Huang, Guanyao Wu, Risheng Liu, Wei Zhong, and Zhongxuan Luo. Target-aware dual adversarial learning and a multi-scenario multi-modality benchmark to fuse infrared and visible for object detection. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5802–5811, 2022. 5

  20. [20]

    A lightweight pixel-level unified image fusion net- work.IEEE Transactions on Neural Networks and Learning Systems, 2023

    Jinyang Liu, Shutao Li, Haibo Liu, Renwei Dian, and Xiao- hui Wei. A lightweight pixel-level unified image fusion net- work.IEEE Transactions on Neural Networks and Learning Systems, 2023. 2

  21. [21]

    Coconet: Coupled con- trastive learning network with multi-level feature ensemble for multi-modality image fusion.International Journal of Computer Vision, 132(5):1748–1775, 2024

    Jinyuan Liu, Runjia Lin, Guanyao Wu, Risheng Liu, Zhongxuan Luo, and Xin Fan. Coconet: Coupled con- trastive learning network with multi-level feature ensemble for multi-modality image fusion.International Journal of Computer Vision, 132(5):1748–1775, 2024. 2

  22. [22]

    A task-guided, implicitly-searched and meta-initialized deep model for image fusion.IEEE Transactions on Pat- tern Analysis and Machine Intelligence, 46(10):6594–6609,

    Risheng Liu, Zhu Liu, Jinyuan Liu, Xin Fan, and Zhongxuan Luo. A task-guided, implicitly-searched and meta-initialized deep model for image fusion.IEEE Transactions on Pat- tern Analysis and Machine Intelligence, 46(10):6594–6609,

  23. [23]

    Simultaneous image fusion and denoising with adaptive sparse representation.IET Image Processing, 9(5):347–357, 2015

    Yu Liu and Zengfu Wang. Simultaneous image fusion and denoising with adaptive sparse representation.IET Image Processing, 9(5):347–357, 2015. 2

  24. [24]

    A general frame- work for image fusion based on multi-scale transform and sparse representation.Information Fusion, 24:147–164,

    Yu Liu, Shuping Liu, and Zengfu Wang. A general frame- work for image fusion based on multi-scale transform and sparse representation.Information Fusion, 24:147–164,

  25. [25]

    Image fusion with convolutional sparse representation.IEEE Signal Processing Letters, 23(12):1882–1886, 2016

    Yu Liu, Xun Chen, Rabab K Ward, and Z Jane Wang. Image fusion with convolutional sparse representation.IEEE Signal Processing Letters, 23(12):1882–1886, 2016. 2

  26. [26]

    Multi-focus image fusion with a deep convolutional neural network.In- formation Fusion, 36:191–207, 2017

    Yu Liu, Xun Chen, Hu Peng, and Zengfu Wang. Multi-focus image fusion with a deep convolutional neural network.In- formation Fusion, 36:191–207, 2017. 2

  27. [27]

    Glioma segmentation-oriented multi-modal mr image fusion with adversarial learning.IEEE/CAA Journal of Automatica Sinica, 9(8):1528–1531, 2022

    Yu Liu, Yu Shi, Fuhao Mu, Juan Cheng, and Xun Chen. Glioma segmentation-oriented multi-modal mr image fusion with adversarial learning.IEEE/CAA Journal of Automatica Sinica, 9(8):1528–1531, 2022. 2

  28. [28]

    Mm-net: A mixformer-based multi-scale network for anatomical and functional image fusion.IEEE Transactions on Image Processing, 33:2197–2212, 2024

    Yu Liu, Chen Yu, Juan Cheng, Z Jane Wang, and Xun Chen. Mm-net: A mixformer-based multi-scale network for anatomical and functional image fusion.IEEE Transactions on Image Processing, 33:2197–2212, 2024. 2

  29. [29]

    Ddcgan: A dual-discriminator conditional gen- erative adversarial network for multi-resolution image fu- sion.IEEE Transactions on Image Processing, 29:4980– 4995, 2020

    Jiayi Ma, Han Xu, Junjun Jiang, Xiaoguang Mei, and Xiao- Ping Zhang. Ddcgan: A dual-discriminator conditional gen- erative adversarial network for multi-resolution image fu- sion.IEEE Transactions on Image Processing, 29:4980– 4995, 2020. 2

  30. [30]

    Dif-gan: A generative adversarial network with multi-scale attention and diffusion models for infrared-visible image fusion

    Chengyi Pan, Xiuliang Xi, Xin Jin, Huangqimei Zheng, Puming Wang, and Qiang Jiang. Dif-gan: A generative adversarial network with multi-scale attention and diffusion models for infrared-visible image fusion. In2024 IEEE In- ternational Symposium on Parallel and Distributed Process- ing with Applications (ISPA), pages 1960–1967. IEEE, 2024. 2

  31. [31]

    Vdmufusion: A versatile diffusion model-based unsuper- vised framework for image fusion.IEEE Transactions on Image Processing, 2024

    Yu Shi, Yu Liu, Juan Cheng, Z Jane Wang, and Xun Chen. Vdmufusion: A versatile diffusion model-based unsuper- vised framework for image fusion.IEEE Transactions on Image Processing, 2024. 2, 6

  32. [32]

    Yu Shi, Yu Liu, Juan Cheng, Huafeng Li, and Xun Chen. Semantic-guided diffusion sampling: A generalized strategy for enhancing object segmentation oriented multimodal im- age fusion.IEEE Journal of Selected Topics in Signal Pro- cessing, pages 1–13, 2025. 2

  33. [33]

    Drmf: Degradation-robust multi- modal image fusion via composable diffusion prior

    Linfeng Tang, Yuxin Deng, Xunpeng Yi, Qinglong Yan, Yix- uan Yuan, and Jiayi Ma. Drmf: Degradation-robust multi- modal image fusion via composable diffusion prior. InPro- ceedings of the 32nd ACM International Conference on Mul- timedia, pages 8546–8555, 2024. 2

  34. [34]

    Mask-difuser: A masked diffusion model for unified unsupervised image fu- sion.IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025

    Linfeng Tang, Chunyu Li, and Jiayi Ma. Mask-difuser: A masked diffusion model for unified unsupervised image fu- sion.IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025. 6

  35. [35]

    Lidia: Lightweight learned image denoising with instance adaptation

    Gregory Vaksman, Michael Elad, and Peyman Milanfar. Lidia: Lightweight learned image denoising with instance adaptation. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pages 524–525, 2020. 6

  36. [36]

    Rap-sr: restoration prior enhance- ment in diffusion models for realistic image super-resolution

    Jiangang Wang, Qingnan Fan, Jinwei Chen, Hong Gu, Feng Huang, and Wenqi Ren. Rap-sr: restoration prior enhance- ment in diffusion models for realistic image super-resolution. InProceedings of the AAAI Conference on Artificial Intelli- gence, pages 7727–7735, 2025. 3

  37. [37]

    Uud-fusion: An unsupervised universal image fusion approach via generative diffusion model.Com- puter Vision and Image Understanding, 249:104218, 2024

    Xiangxiang Wang, Lixing Fang, Junli Zhao, Zhenkuan Pan, Hui Li, and Yi Li. Uud-fusion: An unsupervised universal image fusion approach via generative diffusion model.Com- puter Vision and Image Understanding, 249:104218, 2024. 2

  38. [38]

    Zero-shot image restora- tion using denoising diffusion null-space model.arXiv preprint arXiv:2212.00490,

    Yinhuai Wang, Jiwen Yu, and Jian Zhang. Zero-shot im- age restoration using denoising diffusion null-space model. arXiv preprint arXiv:2212.00490, 2022. 3, 4, 5

  39. [39]

    Sinsr: diffusion-based image super- resolution in a single step

    Yufei Wang, Wenhan Yang, Xinyuan Chen, Yaohui Wang, Lanqing Guo, Lap-Pui Chau, Ziwei Liu, Yu Qiao, Alex C Kot, and Bihan Wen. Sinsr: diffusion-based image super- resolution in a single step. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 25796–25805, 2024. 3

  40. [40]

    Unfusion: A unified multi-scale densely connected network for infrared and visible image fusion

    Zhishe Wang, Junyao Wang, Yuanyuan Wu, Jiawei Xu, and Xiaoqin Zhang. Unfusion: A unified multi-scale densely connected network for infrared and visible image fusion. IEEE Transactions on Circuits and Systems for Video Tech- nology, 32(6):3360–3374, 2021. 2

  41. [41]

    Dr2: Diffusion-based robust degradation remover for blind face restoration

    Zhixin Wang, Ziying Zhang, Xiaoyun Zhang, Huangjie Zheng, Mingyuan Zhou, Ya Zhang, and Yanfeng Wang. Dr2: Diffusion-based robust degradation remover for blind face restoration. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1704– 1713, 2023. 3

  42. [42]

    Efficient rectified flow for image fusion.arXiv preprint arXiv:2509.16549, 2025

    Zirui Wang, Jiayi Zhang, Tianwei Guan, Yuhan Zhou, Xingyuan Li, Minjing Dong, and Jinyuan Liu. Efficient recti- fied flow for image fusion.arXiv preprint arXiv:2509.16549,

  43. [43]

    Diffir: Efficient diffusion model for image restoration

    Bin Xia, Yulun Zhang, Shiyin Wang, Yitong Wang, Xing- long Wu, Yapeng Tian, Wenming Yang, and Luc Van Gool. Diffir: Efficient diffusion model for image restoration. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 13095–13105, 2023. 3

  44. [44]

    Fusionmamba: Dynamic feature enhancement for mul- timodal image fusion with mamba.Visual Intelligence, 2(1): 37, 2024

    Xinyu Xie, Yawen Cui, Tao Tan, Xubin Zheng, and Zitong Yu. Fusionmamba: Dynamic feature enhancement for mul- timodal image fusion with mamba.Visual Intelligence, 2(1): 37, 2024. 2

  45. [45]

    Emfusion: An unsupervised enhanced medical image fusion network.Information Fusion, 76:177– 186, 2021

    Han Xu and Jiayi Ma. Emfusion: An unsupervised enhanced medical image fusion network.Information Fusion, 76:177– 186, 2021. 2

  46. [46]

    U2fusion: A unified unsupervised image fusion net- work.IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(1):502–518, 2020

    Han Xu, Jiayi Ma, Junjun Jiang, Xiaojie Guo, and Haibin Ling. U2fusion: A unified unsupervised image fusion net- work.IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(1):502–518, 2020. 6

  47. [47]

    Murf: Mutually re- inforcing multi-modal image registration and fusion.IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(10):12148–12166, 2023

    Han Xu, Jiteng Yuan, and Jiayi Ma. Murf: Mutually re- inforcing multi-modal image registration and fusion.IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(10):12148–12166, 2023. 6

  48. [48]

    Simultaneous tri-modal medical image fusion and super- resolution using conditional diffusion model

    Yushen Xu, Xiaosong Li, Yuchan Jie, and Haishu Tan. Simultaneous tri-modal medical image fusion and super- resolution using conditional diffusion model. InInter- national Conference on Medical Image Computing and Computer-Assisted Intervention, pages 635–645. Springer,

  49. [49]

    Flexid-fuse: Flexible number of inputs multi-modal medical image fusion based on diffu- sion model.Expert Systems with Applications, page 128895,

    Yushen Xu, Xiaosong Li, Yuchun Wang, Xiaoqi Cheng, Huafeng Li, and Haishu Tan. Flexid-fuse: Flexible number of inputs multi-modal medical image fusion based on diffu- sion model.Expert Systems with Applications, page 128895,

  50. [50]

    Bo Yang, Zhaohui Jiang, Dong Pan, Haoyang Yu, Gui Gui, and Weihua Gui. Lfdt-fusion: A latent feature-guided diffusion transformer model for general image fusion. Information Fusion, 113:102639, 2025. 2

  51. [51]

    Zengyi Yang, Yafei Zhang, Huafeng Li, and Yu Liu. Instruction-driven fusion of infrared–visible images: Tailoring for diverse downstream tasks. Information Fusion, 121:103148, 2025. 2

  52. [52]

    Xunpeng Yi, Linfeng Tang, Hao Zhang, Han Xu, and Jiayi Ma. Diff-if: Multi-modality image fusion via diffusion model with fusion knowledge prior. Information Fusion, 110:102450, 2024. 2

  53. [53]

    Xunpeng Yi, Han Xu, Hao Zhang, Linfeng Tang, and Jiayi Ma. Text-if: Leveraging semantic text guidance for degradation-aware and interactive image fusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 27026–27035, 2024. 2

  54. [54]

    Haitao Yin, Shutao Li, and Leyuan Fang. Simultaneous image fusion and super-resolution using sparse representation. Information Fusion, 14(3):229–240, 2013. 1

  55. [55]

    Jun Yue, Leyuan Fang, Shaobo Xia, Yue Deng, and Jiayi Ma. Dif-fusion: Toward high color fidelity in infrared and visible image fusion with diffusion models. IEEE Transactions on Image Processing, 32:5705–5720, 2023. 2

  56. [56]

    Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, and Ming-Hsuan Yang. Restormer: Efficient transformer for high-resolution image restoration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5728–5739, 2022. 6

  57. [57]

    Hao Zhang, Lei Cao, and Jiayi Ma. Text-difuse: An interactive multi-modal image fusion framework based on text-modulated diffusion model. Advances in Neural Information Processing Systems, 37:39552–39572, 2024. 6

  58. [58]

    Leheng Zhang, Yawei Li, Xingyu Zhou, Xiaorui Zhao, and Shuhang Gu. Transcending the limit of local window: Advanced super-resolution transformer with adaptive token dictionary. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2856–2865, 2024. 6

  59. [59]

    Yu Zhang, Yu Liu, Peng Sun, Han Yan, Xiaolin Zhao, and Li Zhang. Ifcnn: A general image fusion framework based on convolutional neural network. Information Fusion, 54:99–118, 2020. 6

  60. [60]

    Zixiang Zhao, Haowen Bai, Jiangshe Zhang, Yulun Zhang, Shuang Xu, Zudi Lin, Radu Timofte, and Luc Van Gool. Cddfuse: Correlation-driven dual-branch feature decomposition for multi-modality image fusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5906–5916, 2023. 2

  61. [61]

    Zixiang Zhao, Haowen Bai, Yuanzhi Zhu, Jiangshe Zhang, Shuang Xu, Yulun Zhang, Kai Zhang, Deyu Meng, Radu Timofte, and Luc Van Gool. Ddfm: denoising diffusion model for multi-modality image fusion. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 8082–8093, 2023. 2, 6

  62. [62]

    Zixiang Zhao, Haowen Bai, Jiangshe Zhang, Yulun Zhang, Kai Zhang, Shuang Xu, Dongdong Chen, Radu Timofte, and Luc Van Gool. Equivariant multi-modality image fusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 25912–25921, 2024. 2

  63. [63]

    Man Zhou, Jie Huang, Keyu Yan, Danfeng Hong, Xiuping Jia, Jocelyn Chanussot, and Chongyi Li. A general spatial-frequency learning framework for multimodal image fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024. 2

  64. [64]

    Chunyu Zhu, Shangqi Deng, Xuan Song, Yachao Li, and Qi Wang. Mamba collaborative implicit neural representation for hyperspectral and multispectral remote sensing image fusion. IEEE Transactions on Geoscience and Remote Sensing,

Degradation-Robust Fusion: An Efficient Degradation-Aware Diffusion Framework for Multimodal Image Fusion in Arbitrary Degradation Scenarios

Supplementary Material

A. Diffusion Models

A.1. Denoising Diffusion Probabilistic Models

Denoising Diffusion Probabilistic Models (DDPM) are a class of generative models that rely on a forward process of gradually ...
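As background for this section, the standard DDPM forward process (Ho et al., 2020) admits a closed form. With a variance schedule β_1, …, β_T, α_t = 1 − β_t and ᾱ_t = ∏_{s≤t} α_s, one can sample x_t ~ q(x_t | x_0) = N(√ᾱ_t x_0, (1 − ᾱ_t) I) directly. The following is a minimal NumPy sketch assuming the linear β schedule of the original DDPM paper; the function names `linear_beta_schedule`, `q_sample`, and `predict_x0_from_eps` are hypothetical, and this is not the authors' implementation. The last function is the identity that links noise prediction to direct regression of x_0, the parameterization the main paper adopts for the fused image.

```python
import numpy as np


def linear_beta_schedule(num_steps: int, beta_start: float = 1e-4,
                         beta_end: float = 0.02) -> np.ndarray:
    """Linearly spaced noise variances beta_1..beta_T (assumed DDPM-style schedule)."""
    return np.linspace(beta_start, beta_end, num_steps)


def q_sample(x0: np.ndarray, t: int, alpha_bars: np.ndarray,
             rng: np.random.Generator) -> tuple:
    """Draw x_t ~ N(sqrt(abar_t) * x0, (1 - abar_t) * I) in one step.

    t is a 0-based index into alpha_bars; returns (x_t, eps), where eps is
    the standard Gaussian noise that was added.
    """
    eps = rng.standard_normal(x0.shape)
    abar = alpha_bars[t]
    xt = np.sqrt(abar) * x0 + np.sqrt(1.0 - abar) * eps
    return xt, eps


def predict_x0_from_eps(xt: np.ndarray, t: int, eps: np.ndarray,
                        alpha_bars: np.ndarray) -> np.ndarray:
    """Invert the forward step: x0 = (x_t - sqrt(1 - abar_t) * eps) / sqrt(abar_t)."""
    abar = alpha_bars[t]
    return (xt - np.sqrt(1.0 - abar) * eps) / np.sqrt(abar)


# Cumulative products of alpha_t = 1 - beta_t give abar_t for every step.
betas = linear_beta_schedule(1000)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(1, 64, 64))  # stand-in image scaled to [-1, 1]
x_early, _ = q_sample(x0, 10, alpha_bars, rng)   # still close to x0
x_late, _ = q_sample(x0, 999, alpha_bars, rng)   # nearly pure Gaussian noise
```

Because ᾱ_t shrinks toward zero as t grows, early samples remain strongly correlated with x_0 while x_T is essentially pure noise. A network that predicts ε and one that regresses x_0 directly carry the same information at a given t, which is why the two parameterizations are interchangeable in principle.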