HATIR: Heat-Aware Diffusion for Turbulent Infrared Video Super-Resolution
Pith reviewed 2026-05-16 16:43 UTC · model grok-4.3
The pith
Heat-aware diffusion restores turbulent infrared video details by using consistent thermal phasor responses.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
HATIR injects heat-aware deformation priors into the diffusion sampling path to jointly model the inverse process of turbulent degradation and structural detail loss. Specifically, it constructs a Phasor-Guided Flow Estimator rooted in the physical principle that thermally active regions exhibit consistent phasor responses over time, enabling reliable turbulence-aware flow to guide the reverse diffusion process. A Turbulence-Aware Decoder is proposed to selectively suppress unstable temporal cues and enhance edge-aware feature aggregation via turbulence gating and structure-aware attention.
What carries the argument
Phasor-Guided Flow Estimator that uses consistent phasor responses in thermally active regions to generate turbulence-aware flow for guiding the reverse diffusion process.
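The summary above does not say how a "phasor response" is computed, so the following is only one plausible reading, not the paper's method. It assumes a phasor is the single-bin DFT coefficient of a pixel's intensity over a short temporal window, and that "consistency" means how tightly the phases of successive windows agree (a phase-locking value). The names `window_phasor` and `phasor_consistency`, the window length, and the frequency bin `k` are all hypothetical.

```python
import cmath
import math


def window_phasor(samples, k=1):
    """Single-bin DFT coefficient (a complex phasor) of one temporal window.

    Assumption: bin k of a plain DFT stands in for the paper's phasor response.
    """
    n = len(samples)
    return sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))


def phasor_consistency(series, window, k=1):
    """Phase agreement in [0, 1] across non-overlapping windows of one pixel's series.

    1.0 means every window has the same phase; values near 0 mean the phases scatter.
    """
    units = []
    for start in range(0, len(series) - window + 1, window):
        p = window_phasor(series[start:start + window], k)
        if abs(p) > 1e-12:  # skip windows with no energy at bin k
            units.append(p / abs(p))
    if not units:
        return 0.0
    return abs(sum(units) / len(units))


# Synthetic pixel with a stable periodic response (period 8 frames).
stable = [math.cos(2 * math.pi * t / 8) for t in range(32)]

# Synthetic pixel whose phase jumps by a quarter cycle in every window.
offsets = [0.0, math.pi / 2, math.pi, 3 * math.pi / 2]
jittered = [math.cos(2 * math.pi * t / 8 + offsets[t // 8]) for t in range(32)]

print(phasor_consistency(stable, 8), phasor_consistency(jittered, 8))
```

Under this reading, a flow estimator would trust pixels with high consistency and down-weight the rest. Whether HATIR does anything like this is not stated in the material reviewed here.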
If this is right
- Jointly modeling turbulence and super-resolution avoids error propagation from decoupled degradation handling.
- The Turbulence-Aware Decoder improves fidelity by suppressing unstable cues and enhancing edge features.
- The FLIR-IVSR dataset provides paired LR-HR sequences for benchmarking turbulent infrared VSR methods.
- Structural recovery is enhanced under nonuniform distortions and varying motion conditions.
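The decoder bullet above describes suppressing unstable temporal cues. As a rough illustration only, the sketch below replaces the learned turbulence gating with a hand-written rule: a per-pixel gate derived from temporal variance. The functions `turbulence_gate` and `gated_aggregate`, the exponential mapping, and the parameter `tau` are assumptions made for this example and do not come from the paper.

```python
import math


def turbulence_gate(pixel_series, tau=1.0):
    """Map a pixel's temporal variance to a gate in (0, 1].

    Stable pixels (low variance) get a gate near 1; unstable pixels get a gate near 0.
    Assumption: exp(-variance / tau) is a stand-in for a learned gating function.
    """
    n = len(pixel_series)
    mean = sum(pixel_series) / n
    variance = sum((x - mean) ** 2 for x in pixel_series) / n
    return math.exp(-variance / tau)


def gated_aggregate(reference_value, pixel_series, tau=1.0):
    """Blend the temporal mean with the reference-frame value using the gate.

    When the temporal cue is unstable, the output falls back to the reference frame.
    """
    gate = turbulence_gate(pixel_series, tau)
    temporal_mean = sum(pixel_series) / len(pixel_series)
    return gate * temporal_mean + (1.0 - gate) * reference_value


steady = gated_aggregate(3.0, [5.0, 5.0, 5.0, 5.0])      # temporal cue is trusted
flicker = gated_aggregate(3.0, [0.0, 10.0, 0.0, 10.0])   # temporal cue is suppressed
print(steady, flicker)
```

The point of the design is that the gate decides, per location, how much neighbouring frames may contribute. A learned gate would replace the variance heuristic but play the same role.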
Where Pith is reading between the lines
- If phasor consistency proves robust, the heat-aware prior could be adapted to other generative models for physical scene restoration.
- Similar domain-specific priors might help address combined degradations in visible-light video super-resolution under turbulence.
- Future tests could apply the framework to infrared sequences from different sensors to check generalizability beyond the FLIR camera.
Load-bearing premise
Thermally active regions exhibit consistent phasor responses over time that enable reliable turbulence-aware flow estimation to guide the reverse diffusion process.
What would settle it
Infrared video sequences where thermal objects exhibit rapidly varying phasor responses due to changing heat emissions, which would invalidate the flow estimation and lead to poor restoration quality.
Original abstract
Infrared video has been of great interest in visual tasks under challenging environments, but often suffers from severe atmospheric turbulence and compression degradation. Existing video super-resolution (VSR) methods either neglect the inherent modality gap between infrared and visible images or fail to restore turbulence-induced distortions. Directly cascading turbulence mitigation (TM) algorithms with VSR methods leads to error propagation and accumulation due to the decoupled modeling of degradation between turbulence and resolution. We introduce HATIR, a Heat-Aware Diffusion for Turbulent InfraRed Video Super-Resolution, which injects heat-aware deformation priors into the diffusion sampling path to jointly model the inverse process of turbulent degradation and structural detail loss. Specifically, HATIR constructs a Phasor-Guided Flow Estimator, rooted in the physical principle that thermally active regions exhibit consistent phasor responses over time, enabling reliable turbulence-aware flow to guide the reverse diffusion process. To ensure the fidelity of structural recovery under nonuniform distortions, a Turbulence-Aware Decoder is proposed to selectively suppress unstable temporal cues and enhance edge-aware feature aggregation via turbulence gating and structure-aware attention. We built FLIR-IVSR, the first dataset for turbulent infrared VSR, comprising paired LR-HR sequences from a FLIR T1050sc camera (1024 × 768) spanning 640 diverse scenes with varying camera and object motion conditions. This encourages future research in infrared VSR. Project page: https://github.com/JZ0606/HATIR
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces HATIR, a diffusion-based method for turbulent infrared video super-resolution that injects heat-aware deformation priors into the reverse diffusion process via a Phasor-Guided Flow Estimator (rooted in consistent phasor responses of thermally active regions) and a Turbulence-Aware Decoder (using turbulence gating and structure-aware attention). It also releases the FLIR-IVSR dataset of paired LR-HR sequences from 640 scenes captured with a FLIR T1050sc camera under varying motion conditions.
Significance. If the central claims hold, the work offers a principled joint modeling strategy that avoids error accumulation from cascaded turbulence mitigation and VSR pipelines, with potential to improve structural recovery in challenging IR modalities. The FLIR-IVSR dataset is a clear positive contribution as the first dedicated benchmark for this task, likely to enable future reproducible research.
major comments (2)
- [Phasor-Guided Flow Estimator description] The Phasor-Guided Flow Estimator is load-bearing for the joint modeling claim, yet the manuscript reports no direct validation (e.g., endpoint error, temporal consistency scores) confirming that phasor responses remain stable across the FLIR-IVSR dataset's varying turbulence and motion levels.
- [Experiments / Results] No ablation studies or quantitative comparisons are provided that isolate the phasor component's benefit relative to a standard optical-flow baseline, leaving the superiority of the heat-aware priors unsubstantiated.
minor comments (1)
- [Turbulence-Aware Decoder] The interaction between the Turbulence-Aware Decoder's gating mechanism and the diffusion sampling steps could be clarified with a diagram or pseudocode for better reproducibility.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and positive assessment of the work's significance and the FLIR-IVSR dataset contribution. We address each major comment below and will revise the manuscript accordingly.
- Referee: The Phasor-Guided Flow Estimator is load-bearing for the joint modeling claim, yet the manuscript reports no direct validation (e.g., endpoint error, temporal consistency scores) confirming that phasor responses remain stable across the FLIR-IVSR dataset's varying turbulence and motion levels.
  Authors: We agree that the manuscript does not include direct quantitative validation of the Phasor-Guided Flow Estimator, such as endpoint error or temporal consistency metrics across varying turbulence and motion conditions in FLIR-IVSR. The estimator is grounded in the physical principle of consistent phasor responses for thermally active regions, with its effectiveness shown through overall end-to-end results. To strengthen this, we will add explicit flow estimation evaluations (endpoint error and temporal consistency) on the dataset in the revised manuscript.
  Revision: yes
- Referee: No ablation studies or quantitative comparisons are provided that isolate the phasor component's benefit relative to a standard optical-flow baseline, leaving the superiority of the heat-aware priors unsubstantiated.
  Authors: We acknowledge that the current experiments lack targeted ablations isolating the phasor-guided flow estimator against a standard optical-flow baseline. While the full HATIR model is compared to existing VSR and turbulence mitigation methods, this specific isolation is absent. We will include such ablation studies in the revised manuscript to directly substantiate the benefit of the heat-aware priors.
  Revision: yes
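Both the referee and the authors name endpoint error as the missing flow validation. For reference, endpoint error is the mean Euclidean distance between predicted and ground-truth flow vectors. The minimal sketch below shows that computation on flat lists of (u, v) pairs. The function name `endpoint_error` and the list-of-tuples layout are choices made for this example, and it assumes ground-truth flow is available, which the material here does not confirm for FLIR-IVSR.

```python
import math


def endpoint_error(pred_flow, gt_flow):
    """Mean Euclidean distance between predicted and ground-truth (u, v) flow vectors."""
    if not pred_flow or len(pred_flow) != len(gt_flow):
        raise ValueError("flows must be non-empty and the same length")
    total = sum(
        math.hypot(pu - gu, pv - gv)
        for (pu, pv), (gu, gv) in zip(pred_flow, gt_flow)
    )
    return total / len(pred_flow)


# Two pixels: the first is off by a (3, 4) displacement, the second is exact.
predicted = [(3.0, 4.0), (0.0, 0.0)]
ground_truth = [(0.0, 0.0), (0.0, 0.0)]
print(endpoint_error(predicted, ground_truth))
```

Reporting this value separately for low, medium, and high turbulence levels would speak directly to the referee's question about stability across conditions.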
Circularity Check
No significant circularity in derivation chain
The paper grounds its central modules in an external physical principle (thermally active regions exhibit consistent phasor responses over time) rather than deriving that principle from its own fitted outputs or equations. The Phasor-Guided Flow Estimator and Turbulence-Aware Decoder are introduced as new architectural components that use this principle to guide diffusion sampling; no equation shows a parameter fitted to the target metric being relabeled as a prediction, and no self-citation chain is invoked to justify uniqueness or load-bearing assumptions. The FLIR-IVSR dataset construction and joint modeling of turbulence and super-resolution are presented as independent contributions without reducing the claimed performance gains to tautological reparameterization of inputs.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Thermally active regions exhibit consistent phasor responses over time.
invented entities (2)
- Phasor-Guided Flow Estimator: no independent evidence
- Turbulence-Aware Decoder: no independent evidence