Customized Fusion: A Closed-Loop Dynamic Network for Adaptive Multi-Task-Aware Infrared-Visible Image Fusion
Pith reviewed 2026-05-10 17:19 UTC · model grok-4.3
The pith
A closed-loop dynamic network customizes infrared-visible fusion for multiple downstream tasks by feeding back task performance to compensate semantics.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that a closed-loop optimization mechanism, built around a Requirement-driven Semantic Compensation module, can transmit semantic needs from downstream tasks back to the fusion network. The module employs a Basis Vector Bank together with an Architecture-Adaptive Semantic Injection block to alter network behavior according to task requirements, so that the fused image actively supports whichever task is active without any retraining of the fusion weights.
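To make the closed-loop idea concrete, here is a toy sketch in which feedback from a task score adjusts only a compensation term while the fusion weight stays frozen. Everything in it is an illustrative assumption rather than the paper's method: `task_score`, `closed_loop`, the scalar "network", and the keep-or-reverse update rule are hypothetical stand-ins.

```python
def task_score(fusion_weight, compensation, target):
    """Toy stand-in for a downstream metric: highest when the output hits target."""
    fused = fusion_weight * (1.0 + compensation)
    return -(fused - target) ** 2


def closed_loop(fusion_weight, target, steps=20, step_size=0.05):
    """Adjust only the compensation term using feedback from the task score."""
    compensation = 0.0
    direction = 1.0
    best = task_score(fusion_weight, compensation, target)
    for _ in range(steps):
        candidate = compensation + direction * step_size
        score = task_score(fusion_weight, candidate, target)
        if score > best:
            # Improvement: keep the change and continue in this direction.
            compensation, best = candidate, score
        else:
            # Drop: discard the change and try the opposite direction.
            direction = -direction
    return compensation, best


FUSION_WEIGHT = 2.0  # frozen; never modified by the loop (hypothetical value)
compensation, score = closed_loop(FUSION_WEIGHT, target=2.5)
```

In this toy setting the loop settles near a compensation of 0.25, the value at which the frozen weight produces the target output. A real system would replace the scalar with feature maps and the score with a detection or segmentation metric.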
What carries the argument
The Requirement-driven Semantic Compensation (RSC) module, which uses a Basis Vector Bank and Architecture-Adaptive Semantic Injection block to reshape the fusion network according to measured task performance.
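One way to picture how a bank of basis vectors could reshape a frozen network is as a task-weighted mixture that yields per-channel adjustments. The sketch is an assumption made for illustration, not the paper's implementation: the names `basis_bank`, `mix_bases`, and `inject`, the softmax mixing, and the multiplicative form of the injection are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mix_bases(basis_bank, task_logits):
    """Combine basis vectors into one compensation vector (hypothetical form).

    basis_bank: list of K vectors, each of length C (one entry per channel).
    task_logits: K unnormalised scores expressing the active task's needs.
    """
    weights = softmax(task_logits)
    channels = len(basis_bank[0])
    return [
        sum(w * basis[c] for w, basis in zip(weights, basis_bank))
        for c in range(channels)
    ]


def inject(features, compensation):
    """Scale each channel of a frozen feature vector by (1 + compensation)."""
    return [f * (1.0 + g) for f, g in zip(features, compensation)]


# Toy example: two basis vectors over three channels.
basis_bank = [[0.5, 0.0, -0.5], [0.0, 1.0, 0.0]]
features = [1.0, 2.0, 3.0]

# Equal logits give an even mixture of the two bases.
compensation = mix_bases(basis_bank, [0.0, 0.0])
adapted = inject(features, compensation)
```

The design point the sketch tries to capture is that switching tasks only changes `task_logits`; the bank and the features produced by the fusion network are reused unchanged.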
If this is right
- The fusion network maintains high visual quality on standard benchmarks while gaining the ability to serve multiple tasks.
- Explicit feedback from task metrics drives semantic changes, removing the need to retrain the fusion model for each new task.
- A reward-penalty rule based on task performance variations guides the compensation process.
- The same trained model exhibits measurable adaptability across the M3FD, FMB, and VT5000 datasets.
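The reward-penalty rule mentioned in the list can be sketched as a signed signal derived from the change in a task metric between two adaptation steps. This is a minimal sketch under stated assumptions: the function names, the tolerance `eps`, the step size `lr`, and the idea of nudging compensation coefficients along the last update direction are hypothetical and not taken from the paper.

```python
def reward_signal(prev_metric, new_metric, eps=1e-6):
    """Return +1.0 for an improvement, -1.0 for a drop, 0.0 if unchanged within eps."""
    delta = new_metric - prev_metric
    if delta > eps:
        return 1.0
    if delta < -eps:
        return -1.0
    return 0.0


def update_coefficients(coeffs, last_step, reward, lr=0.1):
    """Reinforce the last step when rewarded, reverse it when penalised.

    Only the compensation coefficients change; fusion weights are untouched.
    """
    return [c + lr * reward * s for c, s in zip(coeffs, last_step)]


coeffs = [0.2, -0.1]
last_step = [1.0, -1.0]

# Task metric (for example mAP) rose from 0.60 to 0.63: reward the last step.
rewarded = update_coefficients(coeffs, last_step, reward_signal(0.60, 0.63))
# Task metric fell from 0.60 to 0.55: penalise the last step.
penalised = update_coefficients(coeffs, last_step, reward_signal(0.60, 0.55))
```

The metric values here are placeholders chosen for the example, not results reported by the authors.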
Where Pith is reading between the lines
- The closed-loop idea could be applied to other multi-modal fusion settings where the best output depends on which task is running at the moment.
- By avoiding separate fusion models for each task, the approach may lower overall storage and compute costs in systems that switch between tasks.
- If the compensation remains stable over long sequences of changing tasks, the method might support continuous online adaptation in deployed vision systems.
Load-bearing premise
Measured changes in downstream task performance can be translated into stable, useful adjustments to the fusion network without causing instability or overfitting to individual tasks.
What would settle it
Running the method on a new task or dataset where the adapted fusion produces lower task accuracy than a fixed, non-adaptive fusion baseline would show that the closed-loop compensation is not providing the claimed benefit.
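The test described here amounts to a per-task comparison between the adaptive model and a fixed baseline. A small sketch of that check follows; the function name `settles_against`, the `margin` parameter, and all numbers are hypothetical placeholders rather than anything reported in the paper.

```python
def settles_against(adaptive_scores, fixed_scores, margin=0.0):
    """Return the tasks where adaptive fusion scores below the fixed baseline.

    Both arguments map task name -> task accuracy (higher is better).
    """
    return sorted(
        task
        for task, score in adaptive_scores.items()
        if task in fixed_scores and score < fixed_scores[task] - margin
    )


# Placeholder numbers for illustration only; not results from the paper.
adaptive = {"detection": 0.71, "segmentation": 0.58, "saliency": 0.80}
fixed = {"detection": 0.69, "segmentation": 0.60, "saliency": 0.80}

failing_tasks = settles_against(adaptive, fixed)
```

A non-empty `failing_tasks` on a genuinely new task or dataset would be the kind of evidence that counts against the claimed benefit; a `margin` above zero could be used to ignore differences within measurement noise.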
Original abstract
Infrared-visible image fusion aims to integrate complementary information for robust visual understanding, but existing fusion methods struggle with simultaneously adapting to multiple downstream tasks. To address this issue, we propose a Closed-Loop Dynamic Network (CLDyN) that can adaptively respond to the semantic requirements of diverse downstream tasks for task-customized image fusion. Specifically, CLDyN introduces a closed-loop optimization mechanism that establishes a semantic transmission chain to achieve explicit feedback from downstream tasks to the fusion network through a Requirement-driven Semantic Compensation (RSC) module. The RSC module leverages a Basis Vector Bank (BVB) and an Architecture-Adaptive Semantic Injection (A2SI) block to customize the network architecture according to task requirements, thereby enabling task-specific semantic compensation and allowing the fusion network to actively adapt to diverse tasks without retraining. To promote semantic compensation, a reward-penalty strategy is introduced to reward or penalize the RSC module based on task performance variations. Experiments on the M3FD, FMB, and VT5000 datasets demonstrate that CLDyN not only maintains high fusion quality but also exhibits strong multi-task adaptability. The code is available at https://github.com/YR0211/CLDyN.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces a Closed-Loop Dynamic Network (CLDyN) for adaptive infrared-visible image fusion across multiple downstream tasks. It features a closed-loop optimization with a Requirement-driven Semantic Compensation (RSC) module that utilizes a Basis Vector Bank (BVB) and Architecture-Adaptive Semantic Injection (A2SI) block to customize the fusion network based on task semantics. A reward-penalty strategy guides the adaptation using variations in task performance, allowing the system to respond to diverse tasks without retraining. Validation is performed on the M3FD, FMB, and VT5000 datasets, asserting maintained fusion quality alongside multi-task adaptability.
Significance. Should the proposed closed-loop mechanism prove stable and effective in providing task-driven customization, this contribution would be significant for infrared-visible fusion research. It tackles the challenge of task-specific adaptation in fusion networks, which could streamline applications requiring robustness to varying semantic needs, such as in object detection or segmentation pipelines. The public code release aids in verifying and extending the work.
major comments (2)
- [RSC module and reward-penalty strategy] The reward-penalty strategy employs downstream task performance to directly influence the RSC module's adjustments to the fusion network. This creates a potential circular dependency, where the performance metric serves both as the driver for modification and the evaluator of the output. To support the central claim of reliable adaptation without retraining, the paper must demonstrate the stability of this process, perhaps through convergence proofs or extensive empirical validation beyond the reported datasets.
- [Experiments section] The experiments claim strong multi-task adaptability on M3FD, FMB, and VT5000, yet the provided description lacks specific quantitative metrics, ablation studies isolating the contributions of BVB and A2SI, and error analysis. This omission weakens the ability to assess whether the closed-loop truly enables the claimed customization or if results could be due to other factors.
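The ablation requested in the second comment can be laid out as a grid of on/off settings for the two components. The sketch below only enumerates the configurations to run; `ablation_grid` and `describe` are hypothetical helper names, and no results are implied.

```python
from itertools import product


def ablation_grid(components):
    """Enumerate every on/off combination of the named components."""
    return [
        dict(zip(components, flags))
        for flags in product([False, True], repeat=len(components))
    ]


def describe(config):
    """Build a readable label such as 'BVB+A2SI' or 'baseline'."""
    enabled = [name for name, on in config.items() if on]
    return "+".join(enabled) if enabled else "baseline"


configs = ablation_grid(["BVB", "A2SI"])
labels = [describe(c) for c in configs]
```

Running each of the four configurations on the same datasets and tasks would show how much of any gain is attributable to each component alone and to their combination.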
minor comments (2)
- Consider adding a table summarizing quantitative fusion metrics (e.g., PSNR, SSIM) and task performance improvements across datasets for clarity.
- [Abstract] The abstract states 'strong multi-task adaptability' without supporting numbers; including one or two key results would strengthen the summary.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive feedback on our manuscript. We address each major comment point by point below, providing clarifications and outlining planned revisions to improve the paper's rigor and clarity.
- Referee: [RSC module and reward-penalty strategy] The reward-penalty strategy employs downstream task performance to directly influence the RSC module's adjustments to the fusion network. This creates a potential circular dependency, where the performance metric serves both as the driver for modification and the evaluator of the output. To support the central claim of reliable adaptation without retraining, the paper must demonstrate the stability of this process, perhaps through convergence proofs or extensive empirical validation beyond the reported datasets.
  Authors: We acknowledge the valid concern about potential circular dependency in the closed-loop design. The reward-penalty mechanism uses performance variations as feedback to adjust the RSC module via the BVB and A2SI, but the downstream metrics (e.g., detection mAP or segmentation IoU) are computed independently on the fused output after each adaptation step, breaking direct circularity. While a formal convergence proof is not provided in the current manuscript due to the non-convex and dynamic nature of the architecture search, we will add extensive empirical validation in the revision, including convergence plots of task performance over adaptation iterations, stability analysis across random seeds, and results on additional task variations within the M3FD, FMB, and VT5000 datasets. These additions will support the claim of reliable adaptation without retraining.
  revision: partial
- Referee: [Experiments section] The experiments claim strong multi-task adaptability on M3FD, FMB, and VT5000, yet the provided description lacks specific quantitative metrics, ablation studies isolating the contributions of BVB and A2SI, and error analysis. This omission weakens the ability to assess whether the closed-loop truly enables the claimed customization or if results could be due to other factors.
  Authors: We appreciate this observation and agree that more granular details are needed. The original manuscript reports quantitative fusion metrics (e.g., PSNR, SSIM, VIF) and downstream task results (e.g., mAP on detection), but we will expand the experiments section to include: (1) specific numerical tables with all metrics and standard deviations, (2) dedicated ablation studies isolating BVB and A2SI contributions (with and without each component), and (3) error analysis including per-task performance breakdowns, failure case discussions, and statistical significance tests. These revisions will better demonstrate the closed-loop's role in customization.
  revision: yes
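The stability evidence promised in the responses (spread across random seeds, convergence over adaptation iterations) can be summarised with two small helpers. This is a sketch under assumptions: `summarize_seeds`, `has_converged`, the `window` and `tol` defaults, and every number are hypothetical and are not results from the paper.

```python
from statistics import mean, stdev


def summarize_seeds(final_scores):
    """Mean and sample standard deviation of a task metric across random seeds."""
    return mean(final_scores), stdev(final_scores)


def has_converged(history, window=3, tol=0.005):
    """True when the metric moved by at most tol over the last `window` steps."""
    if len(history) < window + 1:
        return False
    recent = history[-(window + 1):]
    return max(recent) - min(recent) <= tol


# Placeholder values for illustration only; not results from the paper.
per_seed_map = [0.70, 0.72, 0.71]
history = [0.50, 0.62, 0.68, 0.700, 0.702, 0.701, 0.703]

avg, spread = summarize_seeds(per_seed_map)
converged = has_converged(history)
```

Reporting `avg` with `spread` per task, alongside a convergence flag per run, would give readers a direct way to judge whether the feedback loop behaves consistently.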
Circularity Check
No significant circularity detected
The paper's central mechanism (closed-loop optimization via RSC module with BVB, A2SI, and reward-penalty based on downstream task performance variations) is presented as an external feedback process from task metrics to network adaptation, not as a self-referential definition or a fitted parameter renamed as a prediction. No equations or steps in the abstract reduce the claimed semantic transmission chain to its own inputs by construction. No self-citations, uniqueness theorems, or ansatzes are invoked in the provided text to load-bear the architecture. The multi-dataset experiments are cited as empirical support for stability and adaptability, keeping the derivation self-contained against external benchmarks rather than tautological.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Downstream task performance provides a stable and informative signal for adjusting fusion parameters
invented entities (3)
- Requirement-driven Semantic Compensation (RSC) module: no independent evidence
- Basis Vector Bank (BVB): no independent evidence
- Architecture-Adaptive Semantic Injection (A2SI) block: no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · tag: unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: CLDyN introduces a closed-loop optimization mechanism that establishes a semantic transmission chain... through a Requirement-driven Semantic Compensation (RSC) module... Basis Vector Bank (BVB) and an Architecture-Adaptive Semantic Injection (A2SI) block... reward-penalty strategy
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · tag: unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: Experiments on the M3FD, FMB, and VT5000 datasets demonstrate that CLDyN not only maintains high fusion quality but also exhibits strong multi-task adaptability
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Supplementary Material, A. More Details of VFN: In the first stage, we train the VFN to focus on generating visually guided fused images. In the second stage, the VFN is frozen, while the RSC module assists in adapting the VFN to various downs...