A unified Benchmark for Multi-Frame Image Restoration under Severe Refractive Warping
Pith reviewed 2026-05-08 17:08 UTC · model grok-4.3
The pith
A unified benchmark systematically tests video restoration methods across mild to extreme refractive warping using real lab data and physics-modeled synthetics.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper establishes a benchmark dataset and protocol that spans the full range of refractive warping, from turbulence-like mild distortions to strong discontinuous deformations, by pairing laboratory-captured real sequences with synthetic sequences generated via physics-based light refraction modeling, then uses it to compare restoration performance across classical and modern methods with both accuracy and perceptual metrics.
What carries the argument
The physics-based light refraction modeling that generates synthetic video sequences at controlled distortion levels, combined with real laboratory captures, to produce comparable test cases for geometric restoration algorithms.
If this is right
- Restoration methods can now be ranked consistently across a continuous gradient of distortion severity instead of only mild turbulence.
- Diffusion-based approaches such as V-cache become testable specifically on high and extreme distortion regimes where classical registration fails.
- Perceptual metrics supplement pixel metrics to expose quality differences invisible to PSNR or SSIM alone.
- The benchmark supplies a common reference for training and validating new multi-frame restoration networks aimed at unstable optical environments.
Where Pith is reading between the lines
- The same modeling approach could be adapted to generate training data for other dynamic media such as heat shimmer or particulate scattering if the refraction physics generalize.
- Integrating the benchmark sequences into end-to-end training loops might improve generalization of learning-based restorers beyond the four predefined levels.
- The performance gap between classical and diffusion methods on extreme cases suggests that temporal consistency modeling will remain a bottleneck for real-time applications.
Load-bearing premise
The physics-based synthetic sequences reproduce the statistical patterns and temporal dynamics of real severe refractive warping seen in the laboratory data.
What would settle it
A direct side-by-side statistical comparison of displacement-field histograms or power spectra between the synthetic sequences and the real lab captures: close agreement would support the premise, while large, systematic mismatches would break it.
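A minimal sketch of such a check, assuming real and synthetic sequences are available as lists of grayscale uint8 frames; the Farneback flow estimator and the 1-D Wasserstein distance are stand-ins for whichever flow method and divergence the authors would actually choose, and all file handling is omitted.

```python
import cv2
import numpy as np
from scipy.stats import wasserstein_distance

def displacement_magnitudes(frames):
    """Per-pixel flow magnitudes pooled over all consecutive frame pairs."""
    mags = []
    for prev, curr in zip(frames[:-1], frames[1:]):
        flow = cv2.calcOpticalFlowFarneback(
            prev, curr, None,
            pyr_scale=0.5, levels=4, winsize=21,
            iterations=3, poly_n=7, poly_sigma=1.5, flags=0)
        mags.append(np.linalg.norm(flow, axis=2).ravel())
    return np.concatenate(mags)

def fidelity_gap(real_frames, synth_frames):
    """1-D Wasserstein distance between the two displacement-magnitude
    distributions; a large value would expose the systematic mismatch
    the premise above rules out."""
    return wasserstein_distance(
        displacement_magnitudes(real_frames),
        displacement_magnitudes(synth_frames))
```

The same pooled displacements could feed the power-spectrum variant by replacing the histogram comparison with periodograms of per-pixel displacement time series.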
Original abstract
Video sequences captured through refractive dynamic media, such as turbulent air or a water surface, often suffer from severe geometric distortions and temporal instability. While recent advances address mild atmospheric turbulence, no existing benchmark systematically evaluates restoration methods under strong and highly nonuniform refractive conditions. We present a comprehensive benchmark for geometric distortion removal in video, covering the range from turbulence-like mild warping to strong, discontinuous refractive deformations. The benchmark includes both laboratory-captured real data and synthetic sequences generated for static scenes via physics-based light refraction modeling across four distortion levels and multiple surface wave types. We evaluate a spectrum of methods, from simple baselines and classical registration algorithms to advanced learning-based approaches, including DATUM and our proposed diffusion-based V-cache for high and extreme distortion regimes. Evaluation uses both pixel-level (PSNR, SSIM) and perceptual (LPIPS, DINO, CLIP) metrics, providing the first large-scale analysis of geometric distortion removal. Our benchmark establishes a new foundation for developing and evaluating algorithms capable of reconstructing video from highly distorted optical environments. Our code and datasets are available at https://github.com/iafoss/refractive-mfir-benchmark.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces a unified benchmark for multi-frame image restoration under severe refractive warping caused by dynamic media such as turbulent air or water surfaces. It combines laboratory-captured real data with synthetic sequences generated via physics-based light refraction modeling for static scenes, spanning four distortion levels and multiple surface wave types. The benchmark evaluates a spectrum of methods ranging from simple baselines and classical registration algorithms to advanced learning-based approaches, including DATUM and the authors' proposed diffusion-based V-cache method targeted at high and extreme distortion regimes. Performance is assessed using pixel-level metrics (PSNR, SSIM) and perceptual metrics (LPIPS, DINO, CLIP), with the goal of providing the first large-scale analysis and establishing a foundation for algorithms reconstructing video from highly distorted optical environments. Code and datasets are released publicly.
Significance. If the synthetic data faithfully reproduces the statistics and temporal dynamics of real severe refractive warping, the benchmark would fill a notable gap in the field by enabling systematic evaluation of restoration methods beyond mild turbulence regimes. The inclusion of both real and synthetic data, multi-level distortions, and diverse metrics could support reproducible progress in applications such as underwater vision or atmospheric imaging. However, the significance is currently limited by the absence of explicit validation that the physics-based synthetics match real lab statistics, which is central to the benchmark's claimed coverage of 'highly distorted optical environments.'
Major comments (2)
- [Abstract / Data Generation] The central claim that the benchmark covers 'strong discontinuous refractive deformations' and 'highly distorted optical environments' rests on the assumption that physics-based synthetic sequences across four distortion levels and multiple wave types reproduce the statistics and temporal dynamics of the laboratory real data. No quantitative comparisons (e.g., optical-flow histograms, discontinuity frequency, or temporal correlation spectra) between real and synthetic extreme cases are reported, leaving the fidelity unverified and the benchmark's utility in the claimed regime at risk.
- [Evaluation] The paper reports method rankings and V-cache effectiveness on both real and synthetic data, yet the abstract provides no quantitative results, ablation details, or verification that the synthetic data matches real statistics. Without these, the cross-method comparisons and the claims about performance in severe regimes cannot be fully assessed for robustness.
Minor comments (2)
- [Abstract] The abstract mentions 'our proposed diffusion-based V-cache' but does not define its architecture or training details at a level that allows immediate reproduction; a dedicated methods subsection would improve clarity.
- [Evaluation] Metric choices (PSNR, SSIM, LPIPS, DINO, CLIP) are listed without justification for their suitability to geometric distortion removal; a brief rationale, or a reference to prior use in similar tasks, would strengthen the evaluation design.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive review. We address the major comments below and will incorporate revisions to strengthen the manuscript's claims regarding data fidelity and abstract clarity.
Point-by-point responses
Referee: [Abstract / Data Generation] The central claim that the benchmark covers 'strong discontinuous refractive deformations' rests on the assumption that physics-based synthetic sequences reproduce the statistics and temporal dynamics of the laboratory real data. No quantitative comparisons (e.g., optical-flow histograms, discontinuity frequency, or temporal correlation spectra) between real and synthetic extreme cases are reported, leaving the fidelity unverified.
Authors: We agree that explicit quantitative validation of synthetic fidelity to real data statistics is essential to support claims about coverage of severe regimes. The synthetic data is generated via physics-based refraction modeling for static scenes, but we did not report direct statistical matches (such as optical flow histograms or temporal spectra) in the original submission. In revision, we will add these comparisons for extreme distortion levels to verify alignment with lab-captured real data and bolster the benchmark's utility. revision: yes
Referee: [Evaluation] The paper reports method rankings and V-cache effectiveness on both real and synthetic data, yet the abstract provides no quantitative results, ablation details, or verification that synthetic data matches real statistics. Without these, the cross-method comparisons and claims about performance in severe regimes cannot be fully assessed for robustness.
Authors: We note that abstracts are inherently concise and typically omit detailed ablations or full quantitative tables, which appear in the evaluation section. However, to address the concern, we will revise the abstract to include key quantitative highlights (e.g., representative PSNR/SSIM gains for V-cache in high-distortion cases) and reference the added fidelity validations. This maintains standard abstract length while improving transparency on robustness. revision: partial
Circularity Check
No circularity: empirical benchmark with no derivations or self-referential predictions
Full rationale
The paper introduces a benchmark dataset and evaluates restoration methods on real lab-captured and physics-simulated refractive warping sequences using standard pixel and perceptual metrics. No equations, fitted parameters, or predictions are claimed; the central contribution is the release of data and code at the cited GitHub repository. No self-citations are invoked to justify uniqueness theorems or ansatzes, and the work contains no derivation chain that reduces to its own inputs by construction. The skeptic concern about synthetic-to-real distribution match is a validity question, not a circularity issue.
Reference graph
Works this paper leans on
- [1] Nantheera Anantrasirichai, Alin Achim, Nick G Kingsbury, and David R Bull. Atmospheric turbulence mitigation using complex wavelet-based fusion. IEEE Transactions on Image Processing, 22(6):2398–2408, 2013.
- [2] Nantheera Anantrasirichai, Alin Achim, and David Bull. Atmospheric turbulence mitigation for sequences with moving objects using recursive image fusion. In 2018 25th IEEE International Conference on Image Processing (ICIP), pages 2895–2899. IEEE, 2018.
- [3] Dehao Qin, Ripon Kumar Saha, Woojeh Chung, Suren Jayasuriya, Jinwei Ye, and Nianyi Li. Unsupervised moving object segmentation with atmospheric turbulence. In European Conference on Computer Vision, pages 18–37. Springer, 2024.
- [4] Ye Yuan, Wenhan Yang, Wenqi Ren, Jiaying Liu, Walter J Scheirer, and Zhangyang Wang. UG2+ Track 2: A collective benchmark ... advancing image understanding in poor visibility environments. arXiv preprint arXiv:1904.04474, 2019.
- [5] Nicholas M Law, Craig D Mackay, and John E Baldwin. Lucky imaging: high angular resolution imaging in the visible from the ground. Astronomy & Astrophysics, 446(2):739–745, 2006.
- [6] Xingguang Zhang, Nicholas Chimitt, Yiheng Chi, Zhiyuan Mao, and Stanley H Chan. Spatio-temporal turbulence mitigation: A translational perspective. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2889–2899, 2024.
- [7] Ajay Jaiswal, Xingguang Zhang, Stanley H Chan, and Zhangyang Wang. Physics-driven turbulence image restoration with stochastic refinement. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 12170–12181, 2023.
- [8] Simron Thapa, Nianyi Li, and Jinwei Ye. Dynamic fluid surface reconstruction using deep neural network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 21–30, 2020.
- [9] Jerin Geo James, Pranay Agrawal, and Ajit Rajwade. Restoration of non-rigidly distorted underwater images using a combination of compressive sensing and local polynomial image representations. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 7839–7848, 2019.
- [10] Yuandong Tian and Srinivasa G Narasimhan. Seeing through water: Image restoration using model-based tracking. In 2009 IEEE 12th International Conference on Computer Vision, pages 2303–2310. IEEE, 2009.
- [11] Maximilian Kromer, Panagiotis Agrafiotis, and Begüm Demir. Sea-undistort: A dataset for through-water image restoration in high resolution airborne bathymetric mapping. IEEE Geoscience and Remote Sensing Letters, 2025.
- [12] Bijian Jian, Chunbo Ma, Dejian Zhu, Yixiao Sun, and Jun Ao. Seeing through wavy water–air interface: A restoration model for instantaneous images distorted by surface waves. Future Internet, 14(8):236, 2022.
- [13] Bijian Jian, Chunbo Ma, Yixiao Sun, Dejian Zhu, Xu Tian, and Jun Ao. Reconstruction of the instantaneous images distorted by surface waves via Helmholtz–Hodge decomposition. Journal of Marine Science and Engineering, 11(1):164, 2023.
- [14] Yiming Qian, Yinqiang Zheng, Minglun Gong, and Yee-Hong Yang. Simultaneous 3D reconstruction for water surface and underwater scene. In Proceedings of the European Conference on Computer Vision (ECCV), pages 754–770, 2018.
- [15] Zhen Zhang and Xu Yang. Reconstruction of distorted underwater images using robust registration. Optics Express, 27(7):9996–10008, 2019.
- [16] Bijian Jian, Chunbo Ma, Dejian Zhu, Qihong Huang, and Jun Ao. Water-air interface imaging: recovering the images distorted by surface waves via an efficient registration algorithm. Entropy, 24(12):1765, 2022.
- [17] Nianyi Li, Simron Thapa, Cameron Whyte, Albert W Reed, Suren Jayasuriya, and Jinwei Ye. Unsupervised non-rigid image distortion removal via grid deformation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 2522–2532, 2021.
- [18] Xingguang Zhang, Nicholas Chimitt, Xijun Wang, Yu Yuan, and Stanley H. Chan. Learning phase distortion with selective state space models for video turbulence mitigation.
- [19] Simron Thapa, Nianyi Li, and Jinwei Ye. Learning to remove refractive distortions from underwater images. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 5007–5016, 2021.
- [20] Tengyue Li, Jiayi Song, Zhiyu Song, Arapat Ablimit, and Long Chen. Removing nonrigid refractive distortions for underwater images using an attention-based deep neural network. Intelligent Marine Technology and Systems, 2(1):25.
- [21] A unified benchmark for multi-frame image restoration under severe refractive warping: code and evaluation framework. https://github.com/iafoss/refractive-mfir-benchmark, 2026. GitHub repository.
- [22] Refractive MFIR benchmark dataset. https://zenodo.org/records/19390086, 2026.
- [23] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33:6840–6851, 2020.
- [24] Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, and Mark Chen. Hierarchical text-conditional image generation with CLIP latents. arXiv preprint arXiv:2204.06125, 2022.
- [25] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10684–10695, 2022.
- [26] Andreas Blattmann, Tim Dockhorn, Sumith Kulal, Daniel Mendelevitch, Maciej Kilian, Dominik Lorenz, Yam Levi, Zion English, Vikram Voleti, Adam Letts, et al. Stable video diffusion: Scaling latent video diffusion models to large datasets. arXiv preprint arXiv:2311.15127, 2023.
- [27] Yixin Liu, Kai Zhang, Yuan Li, Zhiling Yan, Chujie Gao, Ruoxi Chen, Zhengqing Yuan, Yue Huang, Hanchi Sun, Jianfeng Gao, et al. Sora: A review on background, technology, limitations, and opportunities of large vision models. arXiv preprint arXiv:2402.17177, 2024.
- [28] Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, et al. CogVideoX: Text-to-video diffusion models with an expert transformer. arXiv preprint arXiv:2408.06072, 2024.
- [29] William Peebles and Saining Xie. Scalable diffusion models with transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 4195–4205, 2023.
- [30] Lvmin Zhang, Anyi Rao, and Maneesh Agrawala. Adding conditional control to text-to-image diffusion models. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 3836–3847, 2023.
- [31] Hu Ye, Jun Zhang, Sibo Liu, Xiao Han, and Wei Yang. IP-Adapter: Text compatible image prompt adapter for text-to-image diffusion models. arXiv preprint arXiv:2308.06721, 2023.
- [32] Nataniel Ruiz, Yuanzhen Li, Varun Jampani, Yael Pritch, Michael Rubinstein, and Kfir Aberman. DreamBooth: Fine tuning text-to-image diffusion models for subject-driven generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 22500–22510, 2023.
- [33] Rinon Gal, Yuval Alaluf, Yuval Atzmon, Or Patashnik, Amit H Bermano, Gal Chechik, and Daniel Cohen-Or. An image is worth one word: Personalizing text-to-image generation using textual inversion. arXiv preprint arXiv:2208.01618, 2022.
- [34] Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, et al. DINOv2: Learning robust visual features without supervision. arXiv preprint arXiv:2304.07193, 2023.
- [35] Emanuele Aiello, Umberto Michieli, Diego Valsesia, Mete Ozay, and Enrico Magli. DreamCache: Finetuning-free lightweight personalized image generation via feature caching. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 12480–12489, 2025.
- [36] Li Hu, Xin Gao, Peng Zhang, Ke Sun, Bang Zhang, and Liefeng Bo. Animate Anyone: Consistent and controllable image-to-video synthesis for character animation, 2024.
- [37] Z Wang. Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600–612, 2004.
- [38] Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 586–595, 2018.
- [39] Abhijay Ghildyal, Nabajeet Barman, and Saman Zadtootaghaj. Foundation models boost low-level perceptual similarity metrics. In ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1–5. IEEE, 2025.
- [40] Jerry Tessendorf et al. Simulating ocean water. Simulating Nature: Realistic and Interactive Techniques, SIGGRAPH, 1(2):5, 2001.
- [41] Adrian Constantin and Joachim Escher. Wave breaking for nonlinear nonlocal shallow water equations. Acta Mathematica, 181:229–243, 1998.
Supplementary material (excerpts)
Figure 6 (caption): Data collection setup in the lab with water tank and wave generators.
LAB setup. Figure 6 shows the laboratory data collection setup: a large water tank (20 × 7 × 3 feet) filled with approximately 19 inches of water. A TV monitor was placed above the water to display a set of background images. The camera was set up below the water tank pointing towards the TV. During video recording, a wave generator...
Wave generation. Table 5 provides the parameters used to generate wave profiles. Details of specific wave types are provided below. Ocean wave: for our simulation, we compute the Fast Fourier Transform (FFT) of Gerstner's equations to represent the wave height as a random field over horizontal position and time. The height h(x, t) at the horizontal position x...
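As a rough illustration of this FFT-based synthesis, the sketch below generates one frame of a wave heightfield from a Phillips-like spectrum with the deep-water dispersion relation ω = √(g|k|); the spectrum shape, wind speed, and grid size are illustrative assumptions, not the parameters of the paper's Table 5.

```python
import numpy as np

def ocean_heightfield(n=256, length=10.0, wind_speed=5.0, t=0.0, g=9.81, seed=0):
    """One frame h(x, t) of an FFT-synthesized ocean heightfield."""
    rng = np.random.default_rng(seed)
    k = 2 * np.pi * np.fft.fftfreq(n, d=length / n)   # angular wavenumbers
    kx, ky = np.meshgrid(k, k)
    k_mag = np.hypot(kx, ky)
    k_mag[0, 0] = 1e-8                                # avoid division by zero
    L = wind_speed ** 2 / g                           # largest wave scale
    phillips = np.exp(-1.0 / (k_mag * L) ** 2) / k_mag ** 4
    phillips[0, 0] = 0.0                              # no DC component
    # Random complex amplitudes, phases advanced by deep-water dispersion.
    h0 = (rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))) \
        * np.sqrt(phillips / 2)
    omega = np.sqrt(g * k_mag)
    return np.real(np.fft.ifft2(h0 * np.exp(1j * omega * t)))
```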
Video generation. The distorted videos are generated by applying a 200-frame-long series of precomputed wave normals to a selected background resized to 512×512. We mimic the lab setup with the camera located underwater and assume a low field of view (parallel rays coming from the camera). The vector form of Snell's law (Eq. 9) is applied to these rays, v1, at t...
Evaluation on the synthetic data. Tables 5–8 provide a full summary of the evaluation on ocean, shallow-water, sine, and ripple waves at low, mid, high, and extreme levels of distortion. Pixel (PSNR, SSIM) and perceptual (LPIPS, DINO, CLIP) metrics are used. The entire-video setup refers to evaluating the metric for each frame in the video and then ...