ReMATF: Recurrent Motion-Adaptive Multi-scale Turbulence Mitigation for Dynamic Scenes
Pith reviewed 2026-05-21 05:04 UTC · model grok-4.3
The pith
ReMATF restores turbulence-degraded videos using only two frames at a time while preserving spatial detail and temporal stability.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
ReMATF restores videos through a recurrent architecture that takes only the previous output and current frame as input. A multi-scale encoder-decoder extracts features, temporal warping aligns the prior result to the current frame, and a motion-adaptive temporal fusion module performs per-pixel combination of the warped previous output and current prediction to reduce flicker and sharpen details. Experiments demonstrate consistent gains in objective and perceptual quality metrics alongside substantially faster inference than transformer baselines that require larger temporal windows.
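The control flow described above can be sketched as a simple loop that keeps only the previous output and the current frame in memory. This is a minimal illustration, not the authors' implementation: `restore`, `warp`, and `fuse` are hypothetical callables standing in for the multi-scale encoder-decoder, the temporal warping step, and the motion-adaptive fusion module, and the toy stand-ins at the bottom exist only to make the sketch runnable.

```python
import numpy as np

def restore_sequence(frames, restore, warp, fuse):
    """Two-frame recurrent restoration loop (illustrative sketch).

    restore(frame, prev_out) -> current prediction   (hypothetical)
    warp(prev_out, frame)    -> prev_out aligned to the current frame (hypothetical)
    fuse(aligned, current)   -> fused output          (hypothetical)
    """
    prev_out = None
    outputs = []
    for frame in frames:
        current = restore(frame, prev_out)
        if prev_out is None:
            out = current                 # first frame: nothing to fuse with
        else:
            aligned = warp(prev_out, frame)
            out = fuse(aligned, current)
        outputs.append(out)
        prev_out = out                    # only one past result is carried forward
    return outputs

# Toy stand-ins (assumptions for illustration only).
identity_restore = lambda frame, prev: frame
no_warp = lambda prev, frame: prev
average = lambda aligned, current: 0.5 * aligned + 0.5 * current

frames = [np.full((2, 2), v, dtype=np.float32) for v in (0.0, 1.0, 1.0)]
outs = restore_sequence(frames, identity_restore, no_warp, average)
```

Because the state is a single frame-sized array, memory use in this sketch does not grow with sequence length, which is the property the efficiency argument rests on.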
What carries the argument
A motion-adaptive temporal fusion module that performs per-pixel fusion between the warped previous output and the current prediction to enhance temporal coherence.
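A per-pixel fusion of this kind can be illustrated with a hand-crafted mask. The sketch below is an assumption-laden stand-in: the mask here comes from a sigmoid of the absolute difference between the two inputs, whereas the paper's module presumably produces its weights with a learned network. The function name and the `sharpness` and `threshold` parameters are hypothetical.

```python
import numpy as np

def motion_adaptive_fuse(warped_prev, current, sharpness=50.0, threshold=0.1):
    """Blend two frames with a per-pixel mask (hand-crafted stand-in).

    mask -> 1 where the inputs disagree (treated as motion): trust `current`.
    mask -> 0 where they agree (treated as static): trust `warped_prev`.
    """
    diff = np.abs(current - warped_prev)
    if diff.ndim == 3:                       # (H, W, C): share one mask across channels
        diff = diff.mean(axis=-1, keepdims=True)
    mask = 1.0 / (1.0 + np.exp(-sharpness * (diff - threshold)))
    fused = mask * current + (1.0 - mask) * warped_prev
    return fused, mask

warped_prev = np.zeros((2, 2))
current = np.array([[0.0, 1.0], [0.0, 1.0]])
fused, mask = motion_adaptive_fuse(warped_prev, current)
```

In this toy case the left column (no change) keeps the previous value and the right column (large change) takes the current prediction almost entirely.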
If this is right
- Supports real-time processing in resource-constrained environments due to reduced memory and compute demands compared to multi-frame methods.
- Maintains temporal stability across dynamic scenes by recurrently carrying information from one pair of frames to the next.
- Delivers measurable improvements in PSNR, SSIM, and perceptual quality on both synthetic and real turbulence datasets.
- Enables deployment where access to extended frame histories is limited or latency must remain low.
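For readers unfamiliar with the PSNR metric mentioned in the list above, the standard definition can be sketched as follows. The function name and the `max_value` default (images scaled to the range 0 to 1) are assumptions for illustration; the paper's exact evaluation protocol is not stated in this summary.

```python
import numpy as np

def psnr(reference, restored, max_value=1.0):
    """Peak signal-to-noise ratio in dB; higher means closer to the reference."""
    reference = np.asarray(reference, dtype=np.float64)
    restored = np.asarray(restored, dtype=np.float64)
    mse = np.mean((reference - restored) ** 2)
    if mse == 0:
        return float("inf")               # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)

reference = np.zeros((4, 4))
restored = np.full((4, 4), 0.1)           # uniform error of 0.1 gives MSE = 0.01
value = psnr(reference, restored)
```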
Where Pith is reading between the lines
- The two-frame recurrent pattern may extend to other video degradation tasks such as denoising where full temporal windows are costly.
- Per-pixel adaptive weighting could be tested in live streaming pipelines to check if flicker reduction holds under varying motion speeds.
- Efficiency gains might allow integration into portable imaging systems for field use without specialized hardware.
Load-bearing premise
That per-pixel motion-adaptive fusion between the warped previous output and current prediction can sufficiently enhance temporal coherence and reduce flicker without needing a larger temporal window or additional frames.
What would settle it
Observation of increased temporal flickering or worse (higher) LPIPS scores on long dynamic video sequences when the two-frame recurrent method is compared directly against a multi-frame transformer baseline under identical turbulence conditions.
Original abstract
Atmospheric turbulence severely degrades video quality by introducing distortions such as geometric warping, blur, and temporal flickering, posing significant challenges to both visual clarity and temporal consistency. Current state-of-the-art methods are based on transformer and 3D architectures and require multi-frame input, but their large computational cost and memory usage limit real-time deployment, especially in resource-constrained scenarios. In this work, we propose ReMATF, a lightweight recurrent framework that restores videos using only two frames at a time while preserving spatial detail and temporal stability. ReMATF combines a multi-scale encoder-decoder with temporal warping and a motion-adaptive temporal fusion module that performs per-pixel fusion between the warped previous output and the current prediction to enhance coherence without enlarging the temporal window. This design reduces flicker, sharpens details, and remains efficient. Experiments on synthetic and real turbulence datasets show consistent improvements in PSNR/SSIM and perceptual quality (LPIPS), along with substantially faster inference than multi-frame transformer baselines, making ReMATF suitable for turbulence mitigation in resource-constrained scenarios.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes ReMATF, a lightweight recurrent framework for atmospheric turbulence mitigation in dynamic video scenes. It restores videos using only two frames at a time via a multi-scale encoder-decoder architecture, temporal warping, and a motion-adaptive temporal fusion module that performs per-pixel blending of the warped previous output with the current prediction. Experiments on synthetic and real turbulence datasets are reported to show consistent gains in PSNR, SSIM, and LPIPS alongside substantially faster inference than multi-frame transformer baselines.
Significance. If validated, the recurrent two-frame design with motion-adaptive fusion would represent a practical efficiency advance for real-time turbulence mitigation on resource-constrained hardware, where current multi-frame transformer methods are limited by compute and memory demands. The approach directly targets the trade-off between temporal stability and speed in video restoration.
major comments (2)
- [§3] §3 (Method), motion-adaptive temporal fusion description: the central claim that per-pixel blending of the warped prior output with the current prediction is sufficient to enforce long-term temporal coherence without drift or a larger temporal window is load-bearing for the efficiency argument, yet the text provides no explicit motion estimation source, residual misalignment handling, or drift-correction mechanism despite turbulence distorting motion fields.
- [§4] §4 (Experiments): reported PSNR/SSIM/LPIPS gains and inference speedups are shown on standard short-clip evaluations, but no long-horizon consistency metrics (e.g., temporal flicker over sequences longer than typical test clips) or ablation on fusion error propagation are included, leaving the no-drift assumption without direct support.
minor comments (2)
- [§3] Add an equation formalizing the per-pixel fusion operation (e.g., weighting function) in the method section for reproducibility.
- [§4] Clarify dataset details and full experimental protocols (train/test splits, turbulence parameters) to strengthen the empirical claims.
Simulated Author's Rebuttal
We thank the referee for their constructive comments on our paper. We have carefully considered the points raised regarding the method description and experimental validation, and we provide detailed responses below. We have revised the manuscript to address these concerns.
-
Referee: [§3] §3 (Method), motion-adaptive temporal fusion description: the central claim that per-pixel blending of the warped prior output with the current prediction is sufficient to enforce long-term temporal coherence without drift or a larger temporal window is load-bearing for the efficiency argument, yet the text provides no explicit motion estimation source, residual misalignment handling, or drift-correction mechanism despite turbulence distorting motion fields.
Authors: We appreciate this detailed feedback on the method section. In the revised manuscript, we have expanded the description of the motion-adaptive temporal fusion module in §3. The motion estimation is performed by a dedicated lightweight optical flow estimation branch within the multi-scale encoder-decoder, which computes per-scale flow fields used for warping the previous output. Residual misalignments due to turbulence are handled by the fusion module, which generates adaptive blending weights based on both spatial features and the estimated motion confidence. This allows the network to reduce the influence of misaligned pixels. Regarding long-term coherence without drift, the per-pixel blending prioritizes the current prediction in regions of high turbulence, effectively mitigating error accumulation. We have included a diagram and additional equations to illustrate this process. We agree that an explicit drift-correction mechanism like keyframe resetting could be beneficial for extremely long sequences and have noted this as future work. revision: yes
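The warping step the authors describe (using an estimated flow field to align the previous output with the current frame) is commonly done by backward warping with bilinear sampling. The sketch below shows that generic technique in NumPy for a single-channel image; it is not the authors' flow branch, and the function name and the `(dx, dy)` flow convention are assumptions.

```python
import numpy as np

def backward_warp(prev, flow):
    """Warp `prev` (H, W) toward the current frame.

    flow[y, x] = (dx, dy) is assumed to point from pixel (x, y) in the current
    frame to its source location in `prev`. Samples are bilinear, with
    coordinates clamped to the image border.
    """
    h, w = prev.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    src_x = np.clip(xs + flow[..., 0], 0, w - 1)
    src_y = np.clip(ys + flow[..., 1], 0, h - 1)

    x0 = np.floor(src_x).astype(int)
    y0 = np.floor(src_y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = src_x - x0                        # fractional offsets in [0, 1)
    wy = src_y - y0

    top = (1 - wx) * prev[y0, x0] + wx * prev[y0, x1]
    bottom = (1 - wx) * prev[y1, x0] + wx * prev[y1, x1]
    return (1 - wy) * top + wy * bottom

prev = np.arange(12, dtype=np.float64).reshape(3, 4)
zero_flow = np.zeros((3, 4, 2))
shift_flow = np.zeros((3, 4, 2))
shift_flow[..., 0] = 1.0                   # every pixel samples one column to the right
same = backward_warp(prev, zero_flow)
shifted = backward_warp(prev, shift_flow)
```

Under turbulence the estimated flow is itself noisy, which is why the rebuttal pairs warping with confidence-dependent blending weights rather than trusting the warped result everywhere.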
-
Referee: [§4] §4 (Experiments): reported PSNR/SSIM/LPIPS gains and inference speedups are shown on standard short-clip evaluations, but no long-horizon consistency metrics (e.g., temporal flicker over sequences longer than typical test clips) or ablation on fusion error propagation are included, leaving the no-drift assumption without direct support.
Authors: We agree that demonstrating long-term temporal consistency is important for validating the recurrent design. In the revised paper, we have extended the experimental section to include evaluations on longer video sequences (up to 200 frames) from both synthetic and real datasets. We introduce a temporal flicker metric, defined as the standard deviation of temporal gradients in the restored video, to quantify consistency over extended horizons. Furthermore, we add an ablation study that simulates error propagation by varying the turbulence strength and measuring the degradation in output quality over time with and without the motion-adaptive fusion. The results show that our fusion module significantly reduces drift compared to naive recurrent baselines. These new results are presented in §4 and the supplementary material, providing direct support for the no-drift claim within practical sequence lengths. revision: yes
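The flicker metric described in the response (standard deviation of temporal gradients) can be sketched directly. This is one plausible reading of that one-line definition, assuming the gradient is a first-order frame difference and the standard deviation is taken over all pixels and frame pairs; the function name is hypothetical and the authors' exact normalisation is not given here.

```python
import numpy as np

def temporal_flicker(video):
    """Std of frame-to-frame differences for a video shaped (T, H, W[, C])."""
    video = np.asarray(video, dtype=np.float64)
    if video.shape[0] < 2:
        raise ValueError("need at least two frames")
    temporal_grad = np.diff(video, axis=0)   # first-order temporal gradient
    return float(temporal_grad.std())

steady = np.ones((4, 2, 2))
# Frames alternate between all-zeros and all-ones: strong flicker.
alternating = np.stack([np.full((2, 2), t % 2, dtype=np.float64) for t in range(4)])
steady_score = temporal_flicker(steady)
flicker_score = temporal_flicker(alternating)
```

Note that under this reading genuine scene motion also raises the score, and a uniform linear brightness ramp scores zero, so the metric is most informative when methods are compared on the same sequences.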
Circularity Check
No significant circularity detected in ReMATF architecture or claims
The paper presents ReMATF as an independent architectural proposal: a recurrent two-frame pipeline combining a multi-scale encoder-decoder, temporal warping, and a per-pixel motion-adaptive fusion module. These elements are described as design choices motivated by efficiency and stability needs, then validated through experiments on external synthetic and real turbulence datasets. No equations, predictions, or central claims reduce by construction to fitted parameters, self-definitions, or load-bearing self-citations. The method does not rename known results or smuggle ansatzes via prior self-work; empirical metrics (PSNR/SSIM/LPIPS) and runtime comparisons serve as external evidence rather than tautological outputs. This matches the default expectation of a self-contained engineering contribution.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "MATF performs pixel-wise estimation... static pixels place greater confidence in the warped previous output, whereas dynamic pixels place more trust in the current restoration"
- IndisputableMonolith/Foundation/ArrowOfTime.lean: forward_accumulates (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "recurrent formulation that propagates information through the previously restored frame... O_t = M Ô_t + (1-M) O_{t-1}"
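The recurrence quoted in the passage, O_t = M Ô_t + (1-M) O_{t-1}, can be unrolled to show how much each past prediction still contributes to the current output. The derivation below is a sketch under simplifying assumptions that are not from the source: warping is ignored, the mask is indexed by time as M_t, all products are element-wise, and the first output equals the first prediction.

```latex
% Assumptions (not from the source): no warping, per-pixel mask M_k at step k,
% element-wise products, and O_0 = \hat{O}_0 (equivalently M_0 := 1).
\begin{align}
O_t &= M_t \odot \hat{O}_t + (1 - M_t) \odot O_{t-1} \\
    &= \sum_{k=0}^{t} \Big[ M_k \odot \prod_{j=k+1}^{t} (1 - M_j) \Big] \odot \hat{O}_k ,
    \qquad M_0 := 1 .
\end{align}
```

The bracketed weights are non-negative and sum to one at every pixel. Where M stays small (static regions), old predictions keep substantial weight, so the effective temporal window is long even though only two frames are held; where M approaches one (moving regions), history is discarded quickly. The same expression shows why errors in early outputs can persist in static regions, which is the drift concern raised in the referee report.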
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.