Uncertainty-aware Spatial-Frequency Registration and Fusion for Infrared and Visible Images

Haoyuan Xu; Jinyuan Liu; Jun Ma; Xingyuan Li; Xingyue Zhu; Yang Zou; Zhiying Jiang

arxiv: 2605.13049 · v1 · pith:6XCWGOOPnew · submitted 2026-05-13 · 💻 cs.CV

Xingyuan Li, Haoyuan Xu, Xingyue Zhu, Jun Ma, Yang Zou, Zhiying Jiang, Jinyuan Liu

Pith reviewed 2026-05-14 20:28 UTC · model grok-4.3

classification 💻 cs.CV

keywords infrared visible fusion · image registration · uncertainty estimation · frequency domain alignment · thermal radiation consistency · deformation field · multi-scale fusion

The pith

SFRF uses uncertainty estimates and thermal consistency to prevent cumulative errors when registering and fusing unregistered infrared and visible images.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents the Spatial-Frequency Registration and Fusion framework to solve misalignment problems that degrade infrared-visible image fusion. It builds a multi-scale iterative registration process that applies uncertainty estimates at each stage to limit error buildup and refine deformation fields dynamically. Frequency-domain supervision through infrared thermal radiation consistency enforces global alignment of thermal distributions, after which a dual-branch fusion module combines spatial geometry with frequency information to produce the final images.

Core claim

SFRF constructs a Multi-scale Iterative Registration framework that iteratively refines the deformation field across scales while using uncertainty estimation to mitigate error accumulation, employs thermal radiation distribution consistency as a frequency-domain supervisory signal for global alignment, and then applies a Dual-branch Spatial-Frequency Fusion module to reconstruct images from the aligned spatial geometric features and frequency distribution information.

What carries the argument

Multi-scale Iterative Registration (MIR) that refines deformation fields with per-stage uncertainty estimates, guided by thermal radiation distribution consistency in the frequency domain, followed by Dual-branch Spatial-Frequency Fusion (DSFF) that merges spatial and frequency cues.
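To make the coarse-to-fine idea concrete, here is a minimal Python sketch on 1-D signals. It is not the paper's MIR module: the deformation is reduced to a single circular shift, and the "uncertainty" is simply the residual left by the estimate propagated from the coarser scale, which widens the search window when it exceeds a threshold. The names `coarse_to_fine_shift`, `radius`, and `tol` are illustrative assumptions.

```python
def downsample(signal, factor):
    """Average non-overlapping blocks of `factor` samples."""
    return [sum(signal[i:i + factor]) / factor
            for i in range(0, len(signal) - factor + 1, factor)]


def residual(fixed, moving, shift):
    """Mean absolute difference after circularly shifting `moving` by `shift`."""
    n = len(fixed)
    return sum(abs(fixed[i] - moving[(i + shift) % n]) for i in range(n)) / n


def coarse_to_fine_shift(fixed, moving, factors=(4, 2, 1), radius=2, tol=0.05):
    """Estimate a circular shift scale by scale (toy stand-in, not the paper's MIR)."""
    shift, prev = 0, None
    history = []
    for factor in factors:
        fx, mv = downsample(fixed, factor), downsample(moving, factor)
        if prev is not None:
            shift *= prev // factor  # carry the coarse estimate to the finer grid
        # Assumed uncertainty proxy: residual of the propagated estimate.
        uncertainty = residual(fx, mv, shift)
        r = radius if uncertainty <= tol else 2 * radius  # search wider when unsure
        shift = min(range(shift - r, shift + r + 1),
                    key=lambda s: residual(fx, mv, s))
        history.append((factor, shift, residual(fx, mv, shift)))
        prev = factor
    return shift, history


# Synthetic pair: a triangular bump and a copy displaced by 8 samples.
fixed = [max(0, 10 - abs(i - 20)) for i in range(64)]
moving = [fixed[(j - 8) % 64] for j in range(64)]
shift, history = coarse_to_fine_shift(fixed, moving)
print(shift, history)
```

On this toy pair the estimate doubles cleanly from scale to scale (2, then 4, then 8). An error made at the coarsest scale would be doubled in the same way, which is the accumulation problem the paper targets.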

If this is right

Registration and fusion stages can operate jointly without separate coarse-to-fine steps that compound misalignment.
Frequency-domain consistency supervision improves global thermal feature alignment beyond purely spatial methods.
Fused outputs retain visual quality even when input images start with moderate geometric offsets.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The uncertainty-guided refinement pattern could extend to other multi-modal registration tasks such as medical CT-MRI fusion where error accumulation also occurs.
Performance on datasets with controlled initial misalignment levels would directly test how well the dynamic error mitigation works.
Real-time implementations might reduce preprocessing requirements in surveillance or night-vision systems.

Load-bearing premise

Uncertainty estimates generated at each registration stage can be trusted to block cumulative errors without creating new artifacts or demanding heavy parameter tuning.

What would settle it

Run the fusion pipeline on unregistered test pairs with the uncertainty branch disabled and measure whether alignment error and visual artifacts increase compared with the full model.

Figures

Figures reproduced from arXiv: 2605.13049 by Haoyuan Xu, Jinyuan Liu, Jun Ma, Xingyuan Li, Xingyue Zhu, Yang Zou, Zhiying Jiang.

**Figure 1.** Vanilla methods often register the image directly, failing to account for the accumulated errors at coarse scales and severely …

**Figure 2.** Overview of SFRF. Spatial-Frequency Registration and Fusion (SFRF) framework incorporates uncertainty estimation and infrared thermal radiation distribution consistency into a unified pipeline to handle the error accumulation for robust registration and fusion across both spatial and frequency domains.

**Figure 3.** Qualitative results of registration. The blue boxes are the difference maps, where darker colors indicate smaller differences

**Figure 4.** Qualitative results of IVIF

**Figure 5.** Qualitative ablation study of image registration.

Original abstract

Infrared and Visible Image Fusion (IVIF) has shown promise in visual tasks under challenging environments, but fusion under unregistered conditions faces inherent misalignments. Current studies to solve them either predict the deformation parameters coarse-to-fine (i.e., coarse registration and fine registration) or estimate the deformation fields in multi-scales for registration. Though straightforward, they overlook the cumulative errors in registration, which contaminate the fusion stage and severely deteriorate the resulting images. We introduce the Spatial-Frequency Registration and Fusion (SFRF) framework, which incorporates uncertainty estimation and infrared thermal radiation distribution consistency into a unified pipeline to handle the error accumulation for robust registration and fusion across both spatial and frequency domains. Specifically, SFRF constructs a Multi-scale Iterative Registration (MIR) framework that iteratively refines the deformation field across scales, leveraging uncertainty estimation at each stage to mitigate error accumulation and enhance alignment accuracy dynamically. To ensure the accurate alignment of infrared thermal distributions during registration, thermal radiation distribution consistency is employed as a frequency-domain supervisory signal, promoting global consistency in the frequency domain. Based on the spatial-frequency alignment, SFRF further adopts a Dual-branch Spatial-Frequency Fusion (DSFF) module, which incorporates spatial geometric features and frequency distribution information to reconstruct visually appealing images. SFRF achieves impressive performance across diverse datasets.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

SFRF adds uncertainty-weighted multi-scale registration and frequency-domain thermal consistency to unregistered IVIF, but the abstract gives no equations or ablations, so the error-mitigation claim stays unverified.

The paper's main move is to wrap multi-scale iterative registration in per-stage uncertainty estimates, add a frequency-domain consistency term that enforces thermal radiation distribution, and then feed the result into a dual-branch spatial-frequency fusion module. That combination is new as a single pipeline even if the pieces are familiar from prior IVIF work. It directly targets the practical problem that coarse-to-fine or multi-scale registration still lets early errors pollute the fusion stage, which matters for any downstream task that sees misaligned infrared and visible frames in the wild. The frequency supervision idea is sensible because thermal radiation has a fairly stable spectral signature that can act as a global anchor. If the full paper shows clean ablations and quantitative gains on standard unregistered datasets, this would be a useful engineering increment for people who actually deploy fusion in surveillance or robotics.

The soft spot is exactly where the stress-test note points: the abstract says uncertainty is leveraged at each stage to mitigate accumulation, yet supplies no description of how the uncertainty is computed, how it gates or weights the deformation updates, or whether the estimator stays unbiased on progressively misaligned inputs. Without that link, the frequency consistency signal could simply be applied to an already corrupted alignment. The claim of “impressive performance across diverse datasets” is stated but not supported by numbers, error bars, or comparisons in the abstract, so the soundness rating stays low until the full text is checked.

This is the kind of paper a fusion reading group should see once the equations and results are on the table. It is coherent on its own terms and engages the right literature, so a serious editor should send it to referees rather than desk-reject; the revisions will mainly need to make the uncertainty mechanism explicit and show that it does not introduce new artifacts.

Referee Report

3 major / 2 minor

Summary. The paper proposes the Spatial-Frequency Registration and Fusion (SFRF) framework to address misalignment in infrared-visible image fusion. It introduces a Multi-scale Iterative Registration (MIR) module that iteratively refines deformation fields while using per-stage uncertainty estimates to mitigate cumulative errors, employs infrared thermal radiation distribution consistency as a frequency-domain supervisory signal, and applies a Dual-branch Spatial-Frequency Fusion (DSFF) module to reconstruct the fused image from aligned spatial and frequency features. The abstract claims improved performance across diverse datasets.

Significance. If the central claims hold, the work offers a practical advance for unregistered IVIF by explicitly targeting error accumulation, a common failure mode in coarse-to-fine or multi-scale registration pipelines. The unified treatment of uncertainty, thermal-radiation consistency, and spatial-frequency fusion is a coherent design choice that could generalize to other multi-modal registration tasks. The absence of machine-checked proofs or fully parameter-free derivations is offset by the empirical focus, but reproducible code and detailed ablations would strengthen the contribution.

major comments (3)

[Section 3.2] MIR module: the claim that uncertainty estimation at each iteration mitigates error accumulation is load-bearing yet lacks an explicit integration rule. No equation is given for the uncertainty-weighted loss, masked gradient flow, or scale-specific gating; it is therefore unclear whether the estimator remains unbiased when fed misaligned inputs from the prior stage.
[Section 3.3] Thermal radiation consistency: the frequency-domain supervisory signal is introduced to enforce global alignment, but the manuscript provides no derivation or ablation showing that this term remains effective once registration errors have already propagated from earlier MIR stages.
[Section 4] Experimental validation: the reported performance gains are presented without error bars, statistical significance tests, or ablations that isolate the contribution of the uncertainty mechanism versus the thermal consistency term; this weakens the causal link between the proposed modules and the claimed robustness.
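For orientation, one common way to write an explicit uncertainty-weighted rule of the kind the first comment asks for comes from heteroscedastic regression. The form below is a generic sketch and is not taken from the paper; every symbol is an assumption introduced here: $k$ indexes the scale, $\phi^{(k)}$ is the deformation field at that scale, $F_{\mathrm{fix}}$ and $F_{\mathrm{mov}}$ are features of the fixed and moving images, $\sigma_i^{(k)}$ is the predicted per-pixel uncertainty, and $N$ is the number of pixels.

```latex
\mathcal{L}^{(k)}_{\mathrm{reg}}
  = \frac{1}{N} \sum_{i=1}^{N}
    \left[
      \frac{\bigl\lVert F_{\mathrm{fix}}(i) - \bigl(F_{\mathrm{mov}} \circ \phi^{(k)}\bigr)(i) \bigr\rVert^{2}}
           {2 \bigl(\sigma^{(k)}_{i}\bigr)^{2}}
      + \frac{1}{2} \log \bigl(\sigma^{(k)}_{i}\bigr)^{2}
    \right]
```

In a rule of this shape the first term shrinks the gradient contribution of pixels the estimator marks as unreliable, and the log term stops the network from inflating $\sigma$ everywhere. Whether the paper's own loss follows this pattern cannot be determined from the abstract.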

minor comments (2)

[Abstract] The phrase 'impressive performance across diverse datasets' is unsupported by any numbers or dataset identifiers.
[Section 3] Notation: the symbols for uncertainty maps and deformation fields are introduced without a consolidated table of definitions, making cross-section reading difficult.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments on our manuscript. We address each major comment point by point below and will incorporate the suggested clarifications and additional experiments in a major revision.

Referee: [Section 3.2] MIR module: the claim that uncertainty estimation at each iteration mitigates error accumulation is load-bearing yet lacks an explicit integration rule. No equation is given for the uncertainty-weighted loss, masked gradient flow, or scale-specific gating; it is therefore unclear whether the estimator remains unbiased when fed misaligned inputs from the prior stage.

Authors: We agree that the integration rule for uncertainty in the MIR module requires explicit formulation. In the revised manuscript we will add a new equation in Section 3.2 that defines the uncertainty-weighted registration loss at each scale, where the per-pixel uncertainty map modulates both the loss magnitude and the gradient flow to the deformation field update. We will also include a short analysis demonstrating that the estimator remains approximately unbiased under the coarse-to-fine schedule because uncertainty is recomputed from the current alignment residual at every iteration.

revision: yes

Referee: [Section 3.3] Thermal radiation consistency: the frequency-domain supervisory signal is introduced to enforce global alignment, but the manuscript provides no derivation or ablation showing that this term remains effective once registration errors have already propagated from earlier MIR stages.

Authors: We acknowledge the need for explicit validation of the thermal consistency term under residual misalignment. The revised version will contain a short derivation in Section 3.3 showing that the frequency-domain loss is computed on global spectral statistics and is therefore tolerant to small spatial residuals. We will also add an ablation study that injects controlled registration errors from the MIR stages and measures the incremental benefit of the consistency term on final fusion metrics.

revision: yes

Referee: [Section 4] Experimental validation: the reported performance gains are presented without error bars, statistical significance tests, or ablations that isolate the contribution of the uncertainty mechanism versus the thermal consistency term; this weakens the causal link between the proposed modules and the claimed robustness.

Authors: We agree that stronger statistical reporting is required. In the revision we will report mean and standard deviation (error bars) across five independent training runs for all quantitative metrics, perform paired t-tests to establish statistical significance of the reported gains, and provide expanded ablation tables that disable the uncertainty estimation, the thermal consistency term, and both components separately to isolate their individual contributions.

revision: yes
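The statistical reporting promised in the last response can be sketched with the Python standard library alone. The scores below are made-up placeholders in arbitrary units, not results from the paper; `paired_t` is a hypothetical helper, and 2.776 is the two-sided 5% critical value of Student's t with 4 degrees of freedom (five paired runs).

```python
import math
import statistics


def paired_t(a, b):
    """Return the paired t statistic and degrees of freedom for matched runs."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    return mean_diff / (sd_diff / math.sqrt(n)), n - 1


# Placeholder metric values for five seeds (NOT from the paper).
full_model = [5, 7, 6, 8, 7]
no_uncertainty = [4, 5, 3, 6, 5]

t_stat, dof = paired_t(full_model, no_uncertainty)
T_CRIT = 2.776  # two-sided 5% critical value for 4 degrees of freedom

print(f"mean±sd full: {statistics.mean(full_model):.2f}±{statistics.stdev(full_model):.2f}")
print(f"t = {t_stat:.3f}, dof = {dof}, significant = {abs(t_stat) > T_CRIT}")
```

Pairing by seed matters here: the test is run on per-seed differences, so run-to-run variation shared by both configurations cancels out.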

Circularity Check

0 steps flagged

No significant circularity in SFRF derivation chain

The paper introduces the SFRF framework with three distinct new modules: Multi-scale Iterative Registration (MIR) that applies uncertainty estimation to mitigate cumulative registration errors, thermal radiation distribution consistency as an external frequency-domain supervisory signal, and Dual-branch Spatial-Frequency Fusion (DSFF) that combines spatial and frequency features for reconstruction. None of these reduce to self-definition, fitted inputs renamed as predictions, or self-citation chains; the uncertainty estimator is presented as an added component rather than derived from the fusion output, and no equations or uniqueness theorems are shown to collapse by construction. The derivation remains additive and externally motivated by the problem of unregistered IVIF inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 2 invented entities

The abstract introduces new modules (MIR, DSFF) and supervisory signals without specifying their internal parameters or proving their independence from fitted quantities.

axioms (2)

domain assumption: Deformation fields can be iteratively refined across scales while uncertainty estimates remain meaningful at each scale
Invoked by the Multi-scale Iterative Registration framework
domain assumption: Infrared thermal radiation distributions provide a reliable global consistency signal in the frequency domain
Used as supervisory signal for alignment
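The second assumption can be probed with a small experiment. The paper passage quoted on this page mentions a KL divergence on normalized magnitude spectra; the sketch below applies that idea to 1-D signals with a naive DFT. The normalization, the `eps` smoothing, and all function names are assumptions, since the page does not give the paper's exact formulation.

```python
import cmath
import math


def magnitude_spectrum(signal):
    """Magnitudes of the discrete Fourier transform (naive O(n^2) version)."""
    n = len(signal)
    return [abs(sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n)]


def normalize(spectrum, eps=1e-9):
    """Turn a magnitude spectrum into a probability distribution."""
    total = sum(spectrum) + eps * len(spectrum)
    return [(s + eps) / total for s in spectrum]


def spectral_kl(a, b):
    """KL divergence between the normalized magnitude spectra of a and b."""
    p = normalize(magnitude_spectrum(a))
    q = normalize(magnitude_spectrum(b))
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


n = 32
reference = [max(0, 10 - abs(i - 12)) for i in range(n)]  # wide bump
shifted = [reference[(i - 5) % n] for i in range(n)]      # same bump, moved
narrow = [max(0, 4 - abs(i - 12)) for i in range(n)]      # different shape

kl_shifted = spectral_kl(reference, shifted)
kl_narrow = spectral_kl(reference, narrow)
print(kl_shifted, kl_narrow)
```

By the Fourier shift theorem a circular translation changes only the phase, so `kl_shifted` is zero up to rounding while `kl_narrow` is clearly positive. A magnitude-only term is therefore tolerant to translation residuals, and for the same reason it cannot detect a pure translation on its own, so it has to be paired with a spatial term.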

invented entities (2)

Uncertainty estimation at each registration stage · no independent evidence
purpose: To dynamically mitigate cumulative registration errors
New component introduced to address error accumulation
Dual-branch Spatial-Frequency Fusion module · no independent evidence
purpose: To reconstruct images from aligned spatial geometric and frequency distribution features
New fusion architecture proposed after registration

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem.
Paper passage: Multi-scale Iterative Registration (MIR) … leveraging uncertainty estimation at each stage … thermal radiation distribution consistency … KL divergence on normalized magnitude spectra

IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean · J_uniquely_calibrated_via_higher_derivative · unclear
Relation between the paper passage and the cited Recognition theorem.
Paper passage: uncertainty-weighted reconstruction term L_u1 … uncertainty regularization L_u2 … MSF … ω(i) = exp(−β σ_i ⊙ R(i,k))
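The weight formula in the quoted passage, ω(i) = exp(−β σ_i ⊙ R(i,k)), can be read as an element-wise rule. The sketch below evaluates it on plain lists under an assumed interpretation that the excerpt does not confirm: σ_i is a per-element uncertainty, R(i,k) is a non-negative per-element term at scale k, β is a positive temperature, and ⊙ is the element-wise product. The function name is hypothetical.

```python
import math


def uncertainty_weights(sigma, r, beta=1.0):
    """Element-wise omega = exp(-beta * sigma * r) under the assumed reading."""
    return [math.exp(-beta * s * x) for s, x in zip(sigma, r)]


# Illustrative inputs only: rising uncertainty with a constant second factor.
sigma = [0.0, 0.5, 2.0]
r = [1.0, 1.0, 1.0]
weights = uncertainty_weights(sigma, r, beta=1.0)
print(weights)
```

With non-negative inputs the weights stay in (0, 1] and fall monotonically as the product σ·R grows, so an element with zero uncertainty keeps full weight and uncertain elements are smoothly suppressed rather than hard-masked.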

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

43 extracted references · 43 canonical work pages

[1]

Unsupervised multi-modal im- age registration via geometry preserving image-to-image transla- tion

[Araret al., 2020 ] Moab Arar, Yiftach Ginger, Dov Danon, Amit H Bermano, and Daniel Cohen-Or. Unsupervised multi-modal im- age registration via geometry preserving image-to-image transla- tion. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 13410–13419,

work page 2020
[2]

Refusion: Learning image fusion from reconstruction with learnable loss via meta-learning.International Journal of Com- puter Vision, pages 1–21,

[Baiet al., 2024 ] Haowen Bai, Zixiang Zhao, Jiangshe Zhang, Yichen Wu, Lilun Deng, Yukun Cui, Baisong Jiang, and Shuang Xu. Refusion: Learning image fusion from reconstruction with learnable loss via meta-learning.International Journal of Com- puter Vision, pages 1–21,

work page 2024
[3]

Beauchemin and John L

[Beauchemin and Barron, 1995] Steven S. Beauchemin and John L. Barron. The computation of optical flow.ACM computing sur- veys (CSUR), 27(3):433–466,

work page 1995
[4]

Large displacement optical flow: descriptor matching in variational mo- tion estimation.IEEE transactions on pattern analysis and ma- chine intelligence, 33(3):500–513,

[Brox and Malik, 2010] Thomas Brox and Jitendra Malik. Large displacement optical flow: descriptor matching in variational mo- tion estimation.IEEE transactions on pattern analysis and ma- chine intelligence, 33(3):500–513,

work page 2010
[5]

Optical flow constraints on deformable models with applications to face tracking.International Journal of Computer Vision, 38(2):99–127,

[Decarlo and Metaxas, 2000] Douglas Decarlo and Dimitris Metaxas. Optical flow constraints on deformable models with applications to face tracking.International Journal of Computer Vision, 38(2):99–127,

work page 2000
[6]

Unireplknet: A universal perception large-kernel convnet for audio video point cloud time-series and image recognition

[Dinget al., 2024 ] Xiaohan Ding, Yiyuan Zhang, Yixiao Ge, Sijie Zhao, Lin Song, Xiangyu Yue, and Ying Shan. Unireplknet: A universal perception large-kernel convnet for audio video point cloud time-series and image recognition. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recog- nition, pages 5513–5524,

work page 2024
[7]

Domain adaptation guided infrared and visible image fusion

[Guanet al., 2026 ] Tianwei Guan, Haozhen Wei, Yuhan Zhou, Jun Ma, Zecheng Xu, Zhiying Jiang, Jinyuan Liu, and Xingyuan Li. Domain adaptation guided infrared and visible image fusion. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 40, pages 4376–4384,

work page 2026
[8]

Deep residual learning for image recognition

[Heet al., 2016 ] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. InPro- ceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778,

work page 2016
[9]

Vif-net: An unsupervised framework for infrared and visible image fu- sion.IEEE Transactions on Computational Imaging, 6:640–651,

[Houet al., 2020 ] Ruichao Hou, Dongming Zhou, Rencan Nie, Dong Liu, Lei Xiong, Yanbu Guo, and Chuanbo Yu. Vif-net: An unsupervised framework for infrared and visible image fu- sion.IEEE Transactions on Computational Imaging, 6:640–651,

work page 2020
[10]

Sfdfusion: An efficient spatial-frequency domain fusion network for infrared and visible image fusion

[Huet al., 2024 ] Kun Hu, Qingle Zhang, Maoxun Yuan, and Yitian Zhang. Sfdfusion: An efficient spatial-frequency domain fusion network for infrared and visible image fusion. InECAI 2024, pages 482–489. IOS Press,

work page 2024
[11]

Reconet: Recurrent cor- rection network for fast and efficient multi-modality image fu- sion

[Huanget al., 2022 ] Zhanbo Huang, Jinyuan Liu, Xin Fan, Risheng Liu, Wei Zhong, and Zhongxuan Luo. Reconet: Recurrent cor- rection network for fast and efficient multi-modality image fu- sion. InEuropean conference on computer Vision, pages 539–

work page 2022
[12]

Seg- ment anything

[Kirillovet al., 2023 ] Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C Berg, Wan-Yen Lo, et al. Seg- ment anything. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 4015–4026,

work page 2023
[13]

Attentionfgan: Infrared and visible image fusion us- ing attention-based generative adversarial networks.IEEE Trans- actions on Multimedia, 23:1383–1396,

[Liet al., 2020 ] Jing Li, Hongtao Huo, Chang Li, Renhua Wang, and Qi Feng. Attentionfgan: Infrared and visible image fusion us- ing attention-based generative adversarial networks.IEEE Trans- actions on Multimedia, 23:1383–1396,

work page 2020
[14]

From text to pix- els: A context-aware semantic synergy solution for infrared and visible image fusion.arXiv preprint arXiv:2401.00421,

[Liet al., 2023 ] Xingyuan Li, Yang Zou, Jinyuan Liu, Zhiying Jiang, Long Ma, Xin Fan, and Risheng Liu. From text to pix- els: A context-aware semantic synergy solution for infrared and visible image fusion.arXiv preprint arXiv:2401.00421,

work page arXiv 2023
[15]

Bsafusion: A bidirectional stepwise feature alignment network for unaligned medical image fusion.arXiv preprint arXiv:2412.08050,

[Liet al., 2024b ] Huafeng Li, Dayong Su, Qing Cai, and Yafei Zhang. Bsafusion: A bidirectional stepwise feature alignment network for unaligned medical image fusion.arXiv preprint arXiv:2412.08050,

work page arXiv
[16]

In-loop filtering via trained look-up tables

[Liet al., 2024c ] Zhuoyuan Li, Jiacheng Li, Yao Li, Li Li, Dong Liu, and Feng Wu. In-loop filtering via trained look-up tables. In 2024 IEEE International Conference on Visual Communications and Image Processing (VCIP), pages 1–5,

work page 2024
[17]

Difiisr: A diffusion model with gradient guidance for infrared image super- resolution

[Liet al., 2025 ] Xingyuan Li, Zirui Wang, Yang Zou, Zhixin Chen, Jun Ma, Zhiying Jiang, Long Ma, and Jinyuan Liu. Difiisr: A diffusion model with gradient guidance for infrared image super- resolution. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 7534–7544,

work page 2025
[18]

Unifusion: A unified image fusion framework with robust representation and source-aware preservation.arXiv preprint arXiv:2603.14214,

[Liet al., 2026 ] Xingyuan Li, Songcheng Du, Yang Zou, HaoYuan Xu, Zhiying Jiang, and Jinyuan Liu. Unifusion: A unified image fusion framework with robust representation and source-aware preservation.arXiv preprint arXiv:2603.14214,

work page arXiv 2026
[19]

Target-aware dual adversarial learning and a multi-scenario multi-modality benchmark to fuse infrared and visible for object detection

[Liuet al., 2022 ] Jinyuan Liu, Xin Fan, Zhanbo Huang, Guanyao Wu, Risheng Liu, Wei Zhong, and Zhongxuan Luo. Target-aware dual adversarial learning and a multi-scenario multi-modality benchmark to fuse infrared and visible for object detection. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5802–5811,

work page 2022
[20]

Multi- interactive feature learning and a full-time multi-modality bench- mark for image fusion and segmentation

[Liuet al., 2023 ] Jinyuan Liu, Zhu Liu, Guanyao Wu, Long Ma, Risheng Liu, Wei Zhong, Zhongxuan Luo, and Xin Fan. Multi- interactive feature learning and a full-time multi-modality bench- mark for image fusion and segmentation. InProceedings of the IEEE/CVF international conference on computer vision, pages 8115–8124,

work page 2023
[21]

Bridging human evaluation to infrared and visible image fusion.arXiv preprint arXiv:2603.03871,

[Liuet al., 2026 ] Jinyuan Liu, Xingyuan Li, Qingyun Mei, Haoyuan Xu, Zhiying Jiang, Long Ma, Risheng Liu, and Xin Fan. Bridging human evaluation to infrared and visible image fusion.arXiv preprint arXiv:2603.03871,

work page arXiv 2026
[22]

Fully convolutional networks for semantic segmenta- tion

[Longet al., 2015 ] Jonathan Long, Evan Shelhamer, and Trevor Darrell. Fully convolutional networks for semantic segmenta- tion. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 3431–3440,

work page 2015
[23]

Multi-focus image fusion using hosvd and edge intensity.Journal of Visual Communication and Image Representation, 45:46–61,

[Luoet al., 2017 ] Xiaoqing Luo, Zhancheng Zhang, Cuiying Zhang, and Xiaojun Wu. Multi-focus image fusion using hosvd and edge intensity.Journal of Visual Communication and Image Representation, 45:46–61,

work page 2017
[24]

Infrared and visible image fusion methods and applications: A survey.Infor- mation fusion, 45:153–178,

[Maet al., 2019 ] Jiayi Ma, Yong Ma, and Chang Li. Infrared and visible image fusion methods and applications: A survey.Infor- mation fusion, 45:153–178,

work page 2019
[25]

Infrared and visible image fusion based on varia- tional auto-encoder and infrared feature compensation.Infrared Physics & Technology, 117:103839,

[Renet al., 2021 ] Long Ren, Zhibin Pan, Jianzhong Cao, and Ji- awen Liao. Infrared and visible image fusion based on varia- tional auto-encoder and infrared feature compensation.Infrared Physics & Technology, 117:103839,

work page 2021
[26]

The tno multiband image data collec- tion.Data in brief, 15:249,

[Toet, 2017] Alexander Toet. The tno multiband image data collec- tion.Data in brief, 15:249,

work page 2017
[27]

Glu-net: Global-local universal network for dense flow and correspondences

[Truonget al., 2020 ] Prune Truong, Martin Danelljan, and Radu Timofte. Glu-net: Global-local universal network for dense flow and correspondences. InProceedings of the IEEE/CVF confer- ence on computer vision and pattern recognition, pages 6258– 6268,

work page 2020
[28]

Deep- flash: An efficient network for learning-based medical image reg- istration

[Wang and Zhang, 2020] Jian Wang and Miaomiao Zhang. Deep- flash: An efficient network for learning-based medical image reg- istration. InProceedings of the IEEE/CVF conference on com- puter vision and pattern recognition, pages 4444–4452,

work page 2020
[29]

Unsupervised misaligned infrared and visible image fu- sion via cross-modality image generation and registration.arXiv preprint arXiv:2205.11876,

[Wanget al., 2022 ] Di Wang, Jinyuan Liu, Xin Fan, and Risheng Liu. Unsupervised misaligned infrared and visible image fu- sion via cross-modality image generation and registration.arXiv preprint arXiv:2205.11876,

work page arXiv 2022
[30]

Improving misaligned multi-modality image fusion with one-stage progressive dense registration.IEEE Transactions on Circuits and Systems for Video Technology,

[Wanget al., 2024 ] Di Wang, Jinyuan Liu, Long Ma, Risheng Liu, and Xin Fan. Improving misaligned multi-modality image fusion with one-stage progressive dense registration.IEEE Transactions on Circuits and Systems for Video Technology,

work page 2024
[31]

Efficient rectified flow for image fusion.Advances in Neural Information Processing Systems,

[Wanget al., 2025 ] Zirui Wang, Jiayi Zhang, Tianwei Guan, Yuhan Zhou, Xingyuan Li, Minjing Dong, and Jinyuan Liu. Efficient rectified flow for image fusion.Advances in Neural Information Processing Systems,

work page 2025
[32]

Incorporat- ing degradation estimation in light field spatial super-resolution

[Xiao and Xiong, 2025] Zeyu Xiao and Zhiwei Xiong. Incorporat- ing degradation estimation in light field spatial super-resolution. Computer Vision and Image Understanding, 252:104295,

work page 2025
[33]

Occlusion-embedded hybrid transformer for light field super- resolution
