Customized Fusion: A Closed-Loop Dynamic Network for Adaptive Multi-Task-Aware Infrared-Visible Image Fusion

Huafeng Li; Juan Cheng; Yafei Zhang; Yu Liu; Zengyi Yang; Zhiqin Zhu

arXiv: 2604.08924 · v1 · submitted 2026-04-10 · 💻 cs.CV

Zengyi Yang, Yu Liu, Juan Cheng, Zhiqin Zhu, Yafei Zhang, Huafeng Li

Pith reviewed 2026-05-10 17:19 UTC · model grok-4.3

classification 💻 cs.CV

keywords: infrared-visible image fusion · multi-task adaptation · closed-loop optimization · semantic compensation · adaptive fusion network · computer vision · dynamic network

The pith

A closed-loop dynamic network customizes infrared-visible fusion for multiple downstream tasks by feeding back task performance to compensate semantics.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to overcome the limitation that standard fusion methods for infrared and visible images cannot adjust themselves when used for different tasks such as detection or segmentation. It proposes a network that creates an explicit loop: task results influence a compensation module which then modifies the fusion process on the fly. This module draws from a bank of basis vectors and injects task-specific adjustments into the network architecture. The adjustments are guided by a reward or penalty based on whether task accuracy improves or declines. As a result, the same fusion model can serve multiple tasks without being retrained from scratch for each one.
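
The loop described above can be illustrated with a deliberately tiny toy. It is not the paper's implementation: the scalar `compensation`, the stand-in `task_score` function (a placeholder for a real detector or segmenter metric), and the keep-or-flip update are all assumptions made for illustration.

```python
# Toy closed loop: a scalar "compensation" is nudged, and the nudge is kept
# only when a stand-in task score improves. All names are hypothetical.


def task_score(compensation: float, task_optimum: float) -> float:
    """Stand-in for a downstream metric; highest when compensation == task_optimum."""
    return 1.0 - (compensation - task_optimum) ** 2


def closed_loop(task_optimum: float, steps: int = 50, step_size: float = 0.05) -> float:
    """Run the feedback loop and return the compensation value it settles on."""
    compensation = 0.0
    best = task_score(compensation, task_optimum)
    direction = 1.0
    for _ in range(steps):
        candidate = compensation + direction * step_size
        score = task_score(candidate, task_optimum)
        if score > best:
            # Task performance rose: keep the adjustment.
            compensation, best = candidate, score
        else:
            # Task performance fell: discard the adjustment and try the other way.
            direction = -direction
    return compensation


if __name__ == "__main__":
    # The same loop settles on different compensation values for different tasks.
    print(round(closed_loop(task_optimum=0.5), 2))
    print(round(closed_loop(task_optimum=-0.3), 2))
```

The point of the toy is only that the fused output is never fixed in advance: whichever task supplies the score decides where the compensation ends up.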

Core claim

The central claim is that a closed-loop optimization mechanism, built around a Requirement-driven Semantic Compensation module, can transmit semantic needs from downstream tasks back to the fusion network. The module employs a Basis Vector Bank together with an Architecture-Adaptive Semantic Injection block to alter network behavior according to task requirements, so that the fused image actively supports whichever task is active without any retraining of the fusion weights.

What carries the argument

The Requirement-driven Semantic Compensation (RSC) module, which uses a Basis Vector Bank and Architecture-Adaptive Semantic Injection block to reshape the fusion network according to measured task performance.
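
One way to picture a basis vector bank feeding an injection step is sketched below. The text reviewed here does not say how the BVB combines its vectors or how the A2SI block applies them, so the softmax mixing, the multiplicative channel modulation, and every function name are assumptions rather than the authors' design.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def compensation_vector(bank: List[List[float]], task_logits: List[float]) -> List[float]:
    """Mix the basis vectors using task-dependent weights (one logit per basis vector)."""
    weights = softmax(task_logits)
    dim = len(bank[0])
    return [sum(w * basis[d] for w, basis in zip(weights, bank)) for d in range(dim)]


def inject(features: List[float], compensation: List[float]) -> List[float]:
    """Hypothetical injection: scale each feature channel by (1 + compensation)."""
    return [f * (1.0 + c) for f, c in zip(features, compensation)]


if __name__ == "__main__":
    bank = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]   # two basis vectors over three channels
    detection_logits = [4.0, 0.0]                # hypothetical: favours the first basis vector
    segmentation_logits = [0.0, 4.0]             # hypothetical: favours the second
    features = [1.0, 1.0, 1.0]
    print(inject(features, compensation_vector(bank, detection_logits)))
    print(inject(features, compensation_vector(bank, segmentation_logits)))
```

In this sketch the fusion weights themselves never change; only the mixing logits differ between tasks, which is the sense in which a shared bank could serve several tasks.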

If this is right

The fusion network maintains high visual quality on standard benchmarks while gaining the ability to serve multiple tasks.
Explicit feedback from task metrics drives semantic changes, removing the need to retrain the fusion model for each new task.
A reward-penalty rule based on task performance variations guides the compensation process.
The same trained model exhibits measurable adaptability across the M3FD, FMB, and VT5000 datasets.
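
A reward-penalty rule driven by performance variation could look like the sketch below. The dead-band `tolerance`, the clamped `strength` value, and both function names are hypothetical; the page does not give the paper's actual formulation.

```python
def reward_penalty(prev_metric: float, new_metric: float, tolerance: float = 1e-3) -> float:
    """Return +1.0 (reward), -1.0 (penalty), or 0.0 when the change is within tolerance."""
    delta = new_metric - prev_metric
    if delta > tolerance:
        return 1.0
    if delta < -tolerance:
        return -1.0
    return 0.0


def update_strength(strength: float, signal: float, rate: float = 0.1) -> float:
    """Raise compensation strength after a reward, lower it after a penalty, clamp to [0, 1]."""
    return min(1.0, max(0.0, strength + rate * signal))


if __name__ == "__main__":
    strength = 0.5
    # Hypothetical metric readings before and after one compensation step.
    signal = reward_penalty(prev_metric=0.60, new_metric=0.65)
    strength = update_strength(strength, signal)
    print(signal, strength)
```

The dead-band is a design choice worth noting: without it, small metric noise would flip the signal on every step, which is exactly the instability the load-bearing premise worries about.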

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The closed-loop idea could be applied to other multi-modal fusion settings where the best output depends on which task is running at the moment.
By avoiding separate fusion models for each task, the approach may lower overall storage and compute costs in systems that switch between tasks.
If the compensation remains stable over long sequences of changing tasks, the method might support continuous online adaptation in deployed vision systems.

Load-bearing premise

Measured changes in downstream task performance can be translated into stable, useful adjustments to the fusion network without causing instability or overfitting to individual tasks.

What would settle it

Running the method on a new task or dataset where the adapted fusion produces lower task accuracy than a fixed, non-adaptive fusion baseline would show that the closed-loop compensation is not providing the claimed benefit.
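
The test described above amounts to a per-task comparison against a fixed baseline. A minimal sketch follows, with a hypothetical helper name and with placeholder scores that are not results from the paper.

```python
from typing import Dict, List


def refuting_tasks(adaptive: Dict[str, float], fixed: Dict[str, float],
                   margin: float = 0.0) -> List[str]:
    """List tasks where the adapted fusion scores below the fixed baseline by more than margin."""
    return sorted(
        task for task in adaptive
        if task in fixed and adaptive[task] + margin < fixed[task]
    )


if __name__ == "__main__":
    # Placeholder numbers for illustration only; they are not measurements.
    adaptive_scores = {"detection": 0.70, "segmentation": 0.50}
    fixed_scores = {"detection": 0.65, "segmentation": 0.55}
    print(refuting_tasks(adaptive_scores, fixed_scores))
```

A non-empty list on a held-out task or dataset would be the kind of evidence that counts against the closed-loop claim; the `margin` argument leaves room for run-to-run noise.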

Figures

Figures reproduced from arXiv: 2604.08924 by Huafeng Li, Juan Cheng, Yafei Zhang, Yu Liu, Zengyi Yang, Zhiqin Zhu.

**Figure 2.** Overview of the adaptive multi-task-aware infrared-visible image fusion network. The network forms a semantic transmission …

**Figure 3.** Architecture of the A2SI block. The A2SI block com…

**Figure 4.** Qualitative comparison between the proposed method and the “task network retraining” methods.

**Figure 5.** Qualitative comparison between the proposed method …

**Figure 6.** Qualitative (a) and quantitative (b) comparison between …

**Figure 9.** Network architecture of VFN. The VFN (a) consists of a Feature Extraction Blocks (FEB) (b) and a Fusion Feature Reconstruction …

**Figure 10.** Qualitative comparison between the proposed method and existing state-of-the-art approaches. The first and second columns …

**Figure 11.** Qualitative comparison between the full model and the …

**Figure 12.** Training loss curves of the proposed method.

Original abstract

Infrared-visible image fusion aims to integrate complementary information for robust visual understanding, but existing fusion methods struggle with simultaneously adapting to multiple downstream tasks. To address this issue, we propose a Closed-Loop Dynamic Network (CLDyN) that can adaptively respond to the semantic requirements of diverse downstream tasks for task-customized image fusion. Specifically, CLDyN introduces a closed-loop optimization mechanism that establishes a semantic transmission chain to achieve explicit feedback from downstream tasks to the fusion network through a Requirement-driven Semantic Compensation (RSC) module. The RSC module leverages a Basis Vector Bank (BVB) and an Architecture-Adaptive Semantic Injection (A2SI) block to customize the network architecture according to task requirements, thereby enabling task-specific semantic compensation and allowing the fusion network to actively adapt to diverse tasks without retraining. To promote semantic compensation, a reward-penalty strategy is introduced to reward or penalize the RSC module based on task performance variations. Experiments on the M3FD, FMB, and VT5000 datasets demonstrate that CLDyN not only maintains high fusion quality but also exhibits strong multi-task adaptability. The code is available at https://github.com/YR0211/CLDyN.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

CLDyN adds closed-loop task feedback to infrared-visible fusion via the RSC module, the BVB, and a reward-penalty strategy, but the abstract supplies no metrics or ablations, so the adaptation claims cannot be assessed from the abstract alone.

The paper's main contribution is a fusion network that takes explicit feedback from downstream task performance and uses it to customize the output through a Requirement-driven Semantic Compensation module. This module draws from a Basis Vector Bank and an Architecture-Adaptive Semantic Injection block, then applies a reward-penalty update so the same network can shift behavior for different tasks without full retraining. The closed-loop idea directly targets the practical problem that one fused image often has to serve detection, segmentation, and other models at once.

Referee Report

2 major / 2 minor

Summary. The paper introduces a Closed-Loop Dynamic Network (CLDyN) for adaptive infrared-visible image fusion across multiple downstream tasks. It features a closed-loop optimization with a Requirement-driven Semantic Compensation (RSC) module that utilizes a Basis Vector Bank (BVB) and Architecture-Adaptive Semantic Injection (A2SI) block to customize the fusion network based on task semantics. A reward-penalty strategy guides the adaptation using variations in task performance, allowing the system to respond to diverse tasks without retraining. Validation is performed on the M3FD, FMB, and VT5000 datasets, asserting maintained fusion quality alongside multi-task adaptability.

Significance. Should the proposed closed-loop mechanism prove stable and effective in providing task-driven customization, this contribution would be significant for infrared-visible fusion research. It tackles the challenge of task-specific adaptation in fusion networks, which could streamline applications requiring robustness to varying semantic needs, such as in object detection or segmentation pipelines. The public code release aids in verifying and extending the work.

major comments (2)

[RSC module and reward-penalty strategy] The reward-penalty strategy employs downstream task performance to directly influence the RSC module's adjustments to the fusion network. This creates a potential circular dependency, where the performance metric serves both as the driver for modification and the evaluator of the output. To support the central claim of reliable adaptation without retraining, the paper must demonstrate the stability of this process, perhaps through convergence proofs or extensive empirical validation beyond the reported datasets.
[Experiments section] The experiments claim strong multi-task adaptability on M3FD, FMB, and VT5000, yet the provided description lacks specific quantitative metrics, ablation studies isolating the contributions of BVB and A2SI, and error analysis. This omission weakens the ability to assess whether the closed-loop truly enables the claimed customization or if results could be due to other factors.

minor comments (2)

Consider adding a table summarizing quantitative fusion metrics (e.g., PSNR, SSIM) and task performance improvements across datasets for clarity.
[Abstract] The abstract states 'strong multi-task adaptability' without supporting numbers; including one or two key results would strengthen the summary.
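
For readers unfamiliar with the metrics named in the first minor comment, PSNR follows the standard definition 10·log10(MAX² / MSE). The sketch below works on flat pixel lists. Which reference image a fusion benchmark should compare against (there is no ground-truth fused image) is a protocol choice, and the choice here is left to the caller as an assumption.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], test: Sequence[float], max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences."""
    if len(reference) != len(test) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    source = [10.0, 20.0, 30.0, 40.0]   # hypothetical source-image pixels
    fused = [12.0, 18.0, 33.0, 40.0]    # hypothetical fused-image pixels
    print(psnr(source, fused))
```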

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback on our manuscript. We address each major comment point by point below, providing clarifications and outlining planned revisions to improve the paper's rigor and clarity.

Referee: [RSC module and reward-penalty strategy] The reward-penalty strategy employs downstream task performance to directly influence the RSC module's adjustments to the fusion network. This creates a potential circular dependency, where the performance metric serves both as the driver for modification and the evaluator of the output. To support the central claim of reliable adaptation without retraining, the paper must demonstrate the stability of this process, perhaps through convergence proofs or extensive empirical validation beyond the reported datasets.

Authors: We acknowledge the valid concern about potential circular dependency in the closed-loop design. The reward-penalty mechanism uses performance variations as feedback to adjust the RSC module via the BVB and A2SI, but the downstream metrics (e.g., detection mAP or segmentation IoU) are computed independently on the fused output after each adaptation step, breaking direct circularity. While a formal convergence proof is not provided in the current manuscript due to the non-convex and dynamic nature of the architecture search, we will add extensive empirical validation in the revision, including convergence plots of task performance over adaptation iterations, stability analysis across random seeds, and results on additional task variations within the M3FD, FMB, and VT5000 datasets. These additions will support the claim of reliable adaptation without retraining.

revision: partial

Referee: [Experiments section] The experiments claim strong multi-task adaptability on M3FD, FMB, and VT5000, yet the provided description lacks specific quantitative metrics, ablation studies isolating the contributions of BVB and A2SI, and error analysis. This omission weakens the ability to assess whether the closed-loop truly enables the claimed customization or if results could be due to other factors.

Authors: We appreciate this observation and agree that more granular details are needed. The original manuscript reports quantitative fusion metrics (e.g., PSNR, SSIM, VIF) and downstream task results (e.g., mAP on detection), but we will expand the experiments section to include: (1) specific numerical tables with all metrics and standard deviations, (2) dedicated ablation studies isolating BVB and A2SI contributions (with and without each component), and (3) error analysis including per-task performance breakdowns, failure case discussions, and statistical significance tests. These revisions will better demonstrate the closed-loop's role in customization.

revision: yes
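
The rebuttal mentions convergence behaviour over adaptation iterations and stability across random seeds. A minimal sketch of how such a summary might be computed is given here; the window size, tolerance, function names, and the sample numbers are all hypothetical and not taken from the paper.

```python
from statistics import mean, stdev
from typing import List, Tuple


def has_converged(history: List[float], window: int = 5, tol: float = 1e-3) -> bool:
    """True when the last `window` metric values vary by no more than `tol`."""
    if len(history) < window:
        return False
    recent = history[-window:]
    return max(recent) - min(recent) <= tol


def seed_stability(final_scores: List[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of final task scores across seeds."""
    return mean(final_scores), stdev(final_scores)


if __name__ == "__main__":
    # Placeholder metric trace for one run and final scores for three seeds.
    trace = [0.50, 0.60, 0.65, 0.66, 0.66, 0.66, 0.66, 0.66]
    print(has_converged(trace))
    print(seed_stability([0.66, 0.65, 0.67]))
```

A small spread across seeds together with a flat tail in each trace would be the empirical substitute for the convergence proof the referee asks about.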

Circularity Check

0 steps flagged

No significant circularity detected

The paper's central mechanism (closed-loop optimization via RSC module with BVB, A2SI, and reward-penalty based on downstream task performance variations) is presented as an external feedback process from task metrics to network adaptation, not as a self-referential definition or a fitted parameter renamed as a prediction. No equations or steps in the abstract reduce the claimed semantic transmission chain to its own inputs by construction. No self-citations, uniqueness theorems, or ansatzes are invoked in the provided text to load-bear the architecture. The multi-dataset experiments are cited as empirical support for stability and adaptability, keeping the derivation self-contained against external benchmarks rather than tautological.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 3 invented entities

The central claim depends on two new invented modules (RSC with BVB and A2SI) whose effectiveness is asserted without external validation or parameter-free derivation; the reward-penalty loop is an ad-hoc training signal whose stability is assumed rather than proven.

axioms (1)

domain assumption: Downstream task performance provides a stable and informative signal for adjusting fusion parameters.
Invoked in the description of the reward-penalty strategy that drives the RSC module.

invented entities (3)

Requirement-driven Semantic Compensation (RSC) module · no independent evidence
purpose: To receive task feedback and customize fusion via semantic compensation.
New component introduced to close the loop between fusion and downstream tasks.

Basis Vector Bank (BVB) · no independent evidence
purpose: To provide basis vectors for architecture adaptation.
New data structure introduced inside the RSC module.

Architecture-Adaptive Semantic Injection (A2SI) block · no independent evidence
purpose: To inject task-specific semantics into the network.
New architectural block for dynamic customization.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Paper passage: "CLDyN introduces a closed-loop optimization mechanism that establishes a semantic transmission chain... through a Requirement-driven Semantic Compensation (RSC) module... Basis Vector Bank (BVB) and an Architecture-Adaptive Semantic Injection (A2SI) block... reward-penalty strategy"

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

Paper passage: "Experiments on the M3FD, FMB, and VT5000 datasets demonstrate that CLDyN not only maintains high fusion quality but also exhibits strong multi-task adaptability"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

61 extracted references · 61 canonical work pages · 1 internal anchor

[1]

Task- driven image fusion with learnable fusion loss

Haowen Bai, Jiangshe Zhang, Zixiang Zhao, Yichen Wu, Lilun Deng, Yukun Cui, Tao Feng, and Shuang Xu. Task- driven image fusion with learnable fusion loss. InProceed- ings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 7457–7468, 2025. 1, 2, 6

work page 2025
[2]

Deep unfolding multi-modal image fusion network via attri- bution analysis.IEEE Transactions on Circuits and Systems for Video Technology, 35(4):3498–3511, 2025

Haowen Bai, Zixiang Zhao, Jiangshe Zhang, Baisong Jiang, Lilun Deng, Yukun Cui, Shuang Xu, and Chunxia Zhang. Deep unfolding multi-modal image fusion network via attri- bution analysis.IEEE Transactions on Circuits and Systems for Video Technology, 35(4):3498–3511, 2025. 1, 2

work page 2025
[3]

Closed-loop visuomotor control with gen- erative expectation for robotic manipulation

Qingwen Bu, Jia Zeng, Li Chen, Yanchao Yang, Guyue Zhou, Junchi Yan, Ping Luo, Heming Cui, Yi Ma, and Hongyang Li. Closed-loop visuomotor control with gen- erative expectation for robotic manipulation. InAdvances in Neural Information Processing Systems (NeurIPS), pages 139002–139029, 2024. 2

work page 2024
[4]

Conditional controllable image fusion

Bing Cao, Xingxin Xu, Pengfei Zhu, Qilong Wang, and Qinghua Hu. Conditional controllable image fusion. InAd- vances in Neural Information Processing Systems (NeurIPS), pages 120311–120335, 2024. 1

work page 2024
[5]

End- to-end object detection with transformers

Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, and Sergey Zagoruyko. End- to-end object detection with transformers. InProceedings of the European Conference on Computer Vision (ECCV), pages 213–229, 2020. 7, 8

work page 2020
[6]

Varshney

Hao Chen and Pramod K. Varshney. A human perception inspired quality metric for image fusion based on regional information.Information Fusion, 8(2):193–207, 2007. 6

work page 2007
[7]

Sdsfusion: A semantic-aware infrared and visible image fusion network for degraded scenes.IEEE Transactions on Image Processing, 34:3139–3153, 2025

Jun Chen, Liling Yang, Wei Yu, Wenping Gong, Zhanchuan Cai, and Jiayi Ma. Sdsfusion: A semantic-aware infrared and visible image fusion network for degraded scenes.IEEE Transactions on Image Processing, 34:3139–3153, 2025. 2

work page 2025
[8]

Yin Chen and Rick S. Blum. A new automated quality as- sessment algorithm for image fusion.Image and Vision Com- puting, 27(10):1421–1432, 2009. 6

work page 2009
[9]

Dynamic convolution: Attention over convolution kernels

Yinpeng Chen, Xiyang Dai, Mengchen Liu, Dongdong Chen, Lu Yuan, and Zicheng Liu. Dynamic convolution: Attention over convolution kernels. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 11027–11036, 2020. 2

work page 2020
[10]

One model for all: Low-level task interaction is a key to task-agnostic image fu- sion

Chunyang Cheng, Tianyang Xu, Zhenhua Feng, Xiaojun Wu, Zhangyong Tang, Hui Li, Zeyang Zhang, Sara Atito, Muhammad Awais, and Josef Kittler. One model for all: Low-level task interaction is a key to task-agnostic image fu- sion. InProceedings of the IEEE/CVF Conference on Com- puter Vision and Pattern Recognition (CVPR), pages 28102– 28112, 2025. 1

work page 2025
[11]

Clever, Greg Turk, C

Zackory Erickson, Henry M. Clever, Greg Turk, C. Karen Liu, and Charles C. Kemp. Deep haptic model predictive control for robot-assisted dressing. In2018 IEEE Inter- national Conference on Robotics and Automation (ICRA), pages 4437–4444, 2018. 2

work page 2018
[12]

Sam-guided multi-level collaborative transformer for infrared and visible image fusion.Pattern Recognition, 162:111391, 2025

Lin Guo, Xiaoqing Luo, Yue Liu, Zhancheng Zhang, and Xi- aojun Wu. Sam-guided multi-level collaborative transformer for infrared and visible image fusion.Pattern Recognition, 162:111391, 2025. 2

work page 2025
[13]

Dynamic neural networks: A sur- vey.IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(11):7436–7456, 2022

Yizeng Han, Gao Huang, Shiji Song, Le Yang, Honghui Wang, and Yulin Wang. Dynamic neural networks: A sur- vey.IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(11):7436–7456, 2022. 2

work page 2022
[14]

Llvip: A visible-infrared paired dataset for low- light vision

Xinyu Jia, Chuang Zhu, Minzhen Li, Wenqi Tang, and Wenli Zhou. Llvip: A visible-infrared paired dataset for low- light vision. InProceedings of the IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), pages 3496–3504, 2021. 5

work page 2021
[15]

Kingma and Jimmy Ba

Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. InProceedings of International Conference on Learning Representations (ICLR), 2015. 6

work page 2015
[16]

Berg, Wan-Yen Lo, Piotr Dollar, and Ross Girshick

Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer White- head, Alexander C. Berg, Wan-Yen Lo, Piotr Dollar, and Ross Girshick. Segment anything. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 4015–4026, 2023. 2

work page 2023
[17]

Hui Li, Congcong Bian, Zeyang Zhang, Xiaoning Song, Xi Li, and XiaoJun Wu. Occo: Lvm-guided infrared and visible image fusion framework based on object-aware and contex- tual contrastive learning.International Journal of Computer Vision, 133(9):6611–6635, 2025. 6

work page 2025
[18]

Hui Li, Congcong Bian, Zeyang Zhang, Xiaoning Song, Xi Li, and Xiao-Jun Wu. Occo: Lvm-guided infrared and visi- ble image fusion framework based on object-aware and con- textual contrastive learning.International Journal of Com- puter Vision, 133(9):6611–6635, 2025. 1

work page 2025
[19]

Huafeng Li, Zengyi Yang, Yafei Zhang, Wei Jia, Zheng- tao Yu, and Yu Liu. Mulfs-cap: Multimodal fusion- supervised cross-modality alignment perception for unreg- istered infrared-visible image fusion.IEEE Transactions on Pattern Analysis and Machine Intelligence, 47(5):3673– 3690, 2025

work page 2025
[20]

From text to pix- els: A context-aware semantic synergy solution for infrared and visible image fusion.arXiv preprint arXiv:2401.00421,

Xingyuan Li, Yang Zou, Jinyuan Liu, Zhiying Jiang, Long Ma, Xin Fan, and Risheng Liu. From text to pixels: A context-aware semantic synergy solution for infrared and visible image fusion.arXiv preprint arXiv:2401.00421, 2023

work page arXiv 2023
[21]

Contourlet residual for prompt learning enhanced infrared image super-resolution

Xingyuan Li, Jinyuan Liu, Zhixin Chen, Yang Zou, Long Ma, Xin Fan, and Risheng Liu. Contourlet residual for prompt learning enhanced infrared image super-resolution. InEuropean Conference on Computer Vision, pages 270–

work page
[22]

Difiisr: A diffu- sion model with gradient guidance for infrared image super- resolution

Xingyuan Li, Zirui Wang, Yang Zou, Zhixin Chen, Jun Ma, Zhiying Jiang, Long Ma, and Jinyuan Liu. Difiisr: A diffu- sion model with gradient guidance for infrared image super- resolution. InProceedings of the Computer Vision and Pat- tern Recognition Conference, pages 7534–7544, 2025. 1

work page 2025
[23]

Fusion from decomposition: A self-supervised approach for image fusion and beyond.arXiv preprint arXiv: 2410.12274, 2024

Pengwei Liang, Junjun Jiang, Qing Ma, Xianming Liu, and Jiayi Ma. Fusion from decomposition: A self-supervised approach for image fusion and beyond.arXiv preprint arXiv: 2410.12274, 2024. 2

work page arXiv 2024
[24]

Conflict-averse gradient descent for multi-task learn- ing

Bo Liu, Xingchao Liu, Xiaojie Jin, Peter Stone, and Qiang Liu. Conflict-averse gradient descent for multi-task learn- ing. InAdvances in Neural Information Processing Systems (NeurIPS), pages 18878–18890, 2021. 4

work page 2021
[25]

Target-aware dual adversarial learning and a multi-scenario multi-modality benchmark to fuse infrared and visible for object detection

Jinyuan Liu, Xin Fan, Zhanbo Huang, Guanyao Wu, Risheng Liu, Wei Zhong, and Zhongxuan Luo. Target-aware dual adversarial learning and a multi-scenario multi-modality benchmark to fuse infrared and visible for object detection. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 5802–5811,

work page
[26]

Multi-interactive feature learning and a full-time multi-modality benchmark for image fusion and segmentation

Jinyuan Liu, Zhu Liu, Guanyao Wu, Long Ma, Risheng Liu, Wei Zhong, Zhongxuan Luo, and Xin Fan. Multi-interactive feature learning and a full-time multi-modality benchmark for image fusion and segmentation. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 8081–8090, 2023. 2, 5, 6, 7, 1, 3

work page 2023
[27]

Coconet: Coupled con- trastive learning network with multi-level feature ensemble for multi-modality image fusion.International Journal of Computer Vision, 132(5):1748–1775, 2024

Jinyuan Liu, Runjia Lin, Guanyao Wu, Risheng Liu, Zhongxuan Luo, and Xin Fan. Coconet: Coupled con- trastive learning network with multi-level feature ensemble for multi-modality image fusion.International Journal of Computer Vision, 132(5):1748–1775, 2024. 6, 1

work page 2024
[28]

Infrared and visible image fusion: From data compatibility to task adaption.IEEE Transactions on Pattern Analysis and Ma- chine Intelligence, 47(4):2349–2369, 2025

Jinyuan Liu, Guanyao Wu, Zhu Liu, Di Wang, Zhiying Jiang, Long Ma, Wei Zhong, Xin Fan, and Risheng Liu. Infrared and visible image fusion: From data compatibility to task adaption.IEEE Transactions on Pattern Analysis and Ma- chine Intelligence, 47(4):2349–2369, 2025. 6

work page 2025
[29]

Infrared and visible image fusion: From data compatibility to task adaption.IEEE Transactions on Pattern Analysis and Ma- chine Intelligence, 47(4):2349–2369, 2025

Jinyuan Liu, Guanyao Wu, Zhu Liu, Di Wang, Zhiying Jiang, Long Ma, Wei Zhong, Xin Fan, and Risheng Liu. Infrared and visible image fusion: From data compatibility to task adaption.IEEE Transactions on Pattern Analysis and Ma- chine Intelligence, 47(4):2349–2369, 2025. 5

work page 2025
[30]

A task-guided, implicitly-searched and meta-initialized deep model for image fusion.IEEE Transactions on Pat- tern Analysis and Machine Intelligence, 46(10):6594–6609,

Risheng Liu, Zhu Liu, Jinyuan Liu, Xin Fan, and Zhongxuan Luo. A task-guided, implicitly-searched and meta-initialized deep model for image fusion.IEEE Transactions on Pat- tern Analysis and Machine Intelligence, 46(10):6594–6609,

work page
[31]

Yu Liu, Zhengzheng Qi, Juan Cheng, and Xun Chen. Re- thinking the effectiveness of objective evaluation metrics in multi-focus image fusion: A statistic-based approach.IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(8):5806–5819, 2024. 6

work page 2024
[32]

Bi-level dynamic learning for jointly multi- modality image fusion and beyond

Zhu Liu, Jinyuan Liu, Guanyao Wu, Long Ma, Xin Fan, and Risheng Liu. Bi-level dynamic learning for jointly multi- modality image fusion and beyond. InProceedings of the Thirty-Second International Joint Conference on Artificial Intelligence (IJCAI), pages 1240–1248, 2023. 2

work page 2023
[33]

Paif: Perception-aware infrared-visible image fusion for attack-tolerant semantic segmentation

Zhu Liu, Jinyuan Liu, Benzhuang Zhang, Long Ma, Xin Fan, and Risheng Liu. Paif: Perception-aware infrared-visible image fusion for attack-tolerant semantic segmentation. In Proceedings of the 31st ACM International Conference on Multimedia (ACM MM), page 3706–3714, 2023. 2

work page 2023
[34]

Infrared and visible im- age fusion methods and applications: A survey.Information Fusion, 45:153–178, 2019

Jiayi Ma, Yong Ma, and Chang Li. Infrared and visible im- age fusion methods and applications: A survey.Information Fusion, 45:153–178, 2019. 6

work page 2019
[35]

Jane Wang, and Xun Chen

Yu Shi, Yu Liu, Juan Cheng, Z. Jane Wang, and Xun Chen. Vdmufusion: A versatile diffusion model-based unsuper- vised framework for image fusion.IEEE Transactions on Image Processing, 34:441–454, 2025. 1

work page 2025
[36]

Det- fusion: A detection-driven infrared and visible image fusion network

Yiming Sun, Bing Cao, Pengfei Zhu, and Qinghua Hu. Det- fusion: A detection-driven infrared and visible image fusion network. InProceedings of the 30th ACM International Con- ference on Multimedia (ACM MM), page 4003–4011, 2022. 1, 2

work page 2022
[37]

Task-gated multi- expert collaboration network for degraded multi-modal im- age fusion

Yiming Sun, Xin Li, Pengfei Zhu, Qinghua Hu, Dongwei Ren, Huiying Xu, and Xinzhong Zhu. Task-gated multi- expert collaboration network for degraded multi-modal im- age fusion. InProceedings of 42nd International Conference on Machine Learning (ICML), 2025. 1

work page 2025
[38]

Image fusion in the loop of high-level vision tasks: A semantic-aware real- time infrared and visible image fusion network.Information Fusion, 82:28–42, 2022

Linfeng Tang, Jiteng Yuan, and Jiayi Ma. Image fusion in the loop of high-level vision tasks: A semantic-aware real- time infrared and visible image fusion network.Information Fusion, 82:28–42, 2022. 1, 2

work page 2022
[39]

Piafusion: A progressive infrared and visible im- age fusion network based on illumination aware.Information Fusion, 83-84:79–92, 2022

Linfeng Tang, Jiteng Yuan, Hao Zhang, Xingyu Jiang, and Jiayi Ma. Piafusion: A progressive infrared and visible im- age fusion network based on illumination aware.Information Fusion, 83-84:79–92, 2022. 5

work page 2022
[40]

Linfeng Tang, Hao Zhang, Han Xu, and Jiayi Ma. Rethink- ing the necessity of image fusion in high-level vision tasks: A practical infrared and visible image fusion network based on progressive semantic injection and scene fidelity.Infor- mation Fusion, 99:101870, 2023. 2

work page 2023
[41]

C2rf: Bridging multi-modal image regis- tration and fusion via commonality mining and contrastive learning.International Journal of Computer Vision, 133(8): 5262–5280, 2025

Linfeng Tang, Qinglong Yan, Xinyu Xiang, Leyuan Fang, and Jiayi Ma. C2rf: Bridging multi-modal image regis- tration and fusion via commonality mining and contrastive learning.International Journal of Computer Vision, 133(8): 5262–5280, 2025. 1

work page 2025
[42]

LLaMA: Open and Efficient Foundation Language Models

Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timoth´ee Lacroix, Baptiste Rozi`ere, Naman Goyal, Eric Hambro, Faisal Azhar, Aure- lien Rodriguez, Armand Joulin, Edouard Grave, and Guil- laume Lample. Llama: Open and efficient foundation lan- guage models.arXiv preprint arXiv: 2302.13971, 2023. 7

[43] Zhengzheng Tu, Yan Ma, Zhun Li, Chenglong Li, Jieming Xu, and Yongtao Liu. Rgbt salient object detection: A large-scale dataset and benchmark. IEEE Transactions on Multimedia, 25:4163–4176, 2023.

[44] Di Wang, Jinyuan Liu, Risheng Liu, and Xin Fan. An interactively reinforced paradigm for joint infrared-visible image fusion and saliency object detection. Information Fusion, 98:101828, 2023.

[45] Di Wang, Xianghao Jiao, Jinyuan Liu, and Xin Fan. Robust one-stop multi-modality image registration-fusion-segmentation framework against misalignments and adversarial attacks. IEEE Transactions on Multimedia, 27:4531–4543, 2025.

[46] Guanyao Wu, Haoyu Liu, Hongming Fu, Yichuan Peng, Jinyuan Liu, Xin Fan, and Risheng Liu. Every sam drop counts: Embracing semantic priors for multi-modality image fusion and beyond. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 17882–17891, 2025.

[47] Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M Alvarez, and Ping Luo. Segformer: Simple and efficient design for semantic segmentation with transformers. In Advances in Neural Information Processing Systems (NeurIPS), 2021.

[48] Han Xu, Jiayi Ma, Zhuliang Le, Junjun Jiang, and Xiaojie Guo. Fusiondn: A unified densely connected network for image fusion. Proceedings of the AAAI Conference on Artificial Intelligence, 34(07):12484–12491, 2020.

[49] Han Xu, Jiayi Ma, Junjun Jiang, Xiaojie Guo, and Haibin Ling. U2fusion: A unified unsupervised image fusion network. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(1):502–518, 2022.

[50] Costas S Xydeas, Vladimir Petrovic, et al. Objective image fusion performance measure. Electronics Letters, 36(4):308–309, 2000.

[51] Zengyi Yang, Yafei Zhang, Huafeng Li, and Yu Liu. Instruction-driven fusion of infrared–visible images: Tailoring for diverse downstream tasks. Information Fusion, 121:103148, 2025.

[52] Xunpeng Yi, Han Xu, Hao Zhang, Linfeng Tang, and Jiayi Ma. Text-if: Leveraging semantic text guidance for degradation-aware and interactive image fusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 27016–27025, 2024.

[53] Hao Zhang, Xuhui Zuo, Jie Jiang, Chunchao Guo, and Jiayi Ma. Mrfs: Mutually reinforcing image fusion and segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 26964–26973, 2024.

[54] Hao Zhang, Lei Cao, Xuhui Zuo, Zhenfeng Shao, and Jiayi Ma. Omnifuse: Composite degradation-robust image fusion with language-driven semantics. IEEE Transactions on Pattern Analysis and Machine Intelligence, 47(9):7577–7595,

[55] Xingchen Zhang and Yiannis Demiris. Visible and infrared image fusion using deep learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(8):10535–10554,

[56] Xingchen Zhang, Ping Ye, and Gang Xiao. Vifb: A visible and infrared image fusion benchmark. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 468–478, 2020.

[57] Wenda Zhao, Shigeng Xie, Fan Zhao, You He, and Huchuan Lu. Metafusion: Infrared and visible image fusion via meta-feature embedding from object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 13955–13965, 2023.

[58] Wenda Zhao, Hengshuai Cui, Haipeng Wang, You He, and Huchuan Lu. Freefusion: Infrared and visible image fusion via cross reconstruction learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 47(9):8040–8056,

[59] Zhirui Zhao, Changqun Xia, Chenxi Xie, and Jia Li. Complementary trilateral decoder for fast and accurate salient object detection. In Proceedings of the 29th ACM International Conference on Multimedia (ACM MM), pages 4967–4975,

[60] Zixiang Zhao, Haowen Bai, Jiangshe Zhang, Yulun Zhang, Kai Zhang, Shuang Xu, Dongdong Chen, Radu Timofte, and Luc Van Gool. Equivariant multi-modality image fusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 25912–25921,

[61]

task network retraining

Customized Fusion: A Closed-Loop Dynamic Network for Adaptive Multi-Task-Aware Infrared-Visible Image Fusion

Supplementary Material

A. More Details of VFN

In the first stage, we train the VFN to focus on generating visually guided fused images. In the second stage, the VFN is frozen, while the RSC module assists in adapting the VFN to various downs...
