A Benchmark for Heterogeneous Stereo Deblurring with Physically- and Epipolar-constrained Cross Attention
Pith reviewed 2026-06-25 20:21 UTC · model grok-4.3
The pith
Physically and epipolar constrained cross attention improves deblurring for heterogeneous stereo pairs from smartphones.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By introducing the heterogeneous stereo deblurring dataset and the physically- and epipolar-constrained cross attention module, the work shows that restricting feature matching to valid disparity ranges derived from optics allows more effective and efficient cross-view information use in deblurring networks of various types.
What carries the argument
Physically- and epipolar-constrained cross attention (PECA), a lightweight module that limits cross-view matching to an epipolar search window with an optics-derived disparity bound and applies confidence-weighted residual fusion to handle reliable and unreliable correspondences.
If this is right
- PECA improves CNN-, Transformer-, and NAFNet-based deblurring baselines.
- Models using PECA show better restoration performance with favorable efficiency on the HSD dataset.
- PECA naturally falls back to self-deblurring in occluded or unreliable regions via its confidence weighting.
- The module can be integrated into existing architectures without major changes.
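The fallback behavior described above can be expressed as a residual whose cross-view term is scaled by a per-pixel confidence. The sketch assumes one possible confidence, the normalized entropy of the attention weights over the search window (peaked weights give a confidence near 1, flat weights near 0). PECA's actual confidence may be learned and computed differently, and both function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def entropy_confidence(weights: List[float]) -> float:
    """Hypothetical confidence: 1 - H(w) / log(n). Peaked attention -> ~1,
    uniform attention (no clear match, e.g. occlusion) -> ~0."""
    n = len(weights)
    if n == 1:
        return 1.0  # convention: a single candidate has nothing to be confused with
    entropy = -sum(w * math.log(w) for w in weights if w > 0.0)
    return min(1.0, max(0.0, 1.0 - entropy / math.log(n)))


def residual_fusion(self_feat: Vector, cross_feat: Vector, confidence: float) -> Vector:
    """out = self + c * cross. With c = 0 the cross-view term vanishes and the
    block falls back to the single-view (self-deblurring) feature."""
    return [s + confidence * x for s, x in zip(self_feat, cross_feat)]
```

Because the self feature passes through unchanged, a zero confidence reduces the block to single-view processing, which is the fallback the summary attributes to occluded or unreliable regions. A learned gate could replace the entropy heuristic without changing the residual form.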
Where Pith is reading between the lines
- The dataset construction via multi-frame integration could be adapted for other real-world stereo deblurring scenarios beyond smartphones.
- Similar physical constraints might apply to deblurring in other multi-camera systems where calibration data is available.
- Improved deblurring could lead to better downstream tasks like depth estimation or 3D reconstruction from smartphone captures.
Load-bearing premise
The multi-frame integration process used to construct the HSD dataset from real smartphone captures accurately reproduces the asymmetric blur statistics of heterogeneous stereo pairs without introducing its own artifacts or biases.
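The review does not describe the HSD integration pipeline in detail. For orientation, the sketch follows the widely used recipe of averaging consecutive sharp frames in approximately linear intensity, with a gamma curve (γ = 2.2) standing in for the camera response as in the GoPro dataset of Nah et al. [18]. It further assumes, hypothetically, that heterogeneity is emulated by integrating a different number of frames per view. The function names, the gamma value, and the middle-frame target are illustrative assumptions, not details taken from the paper.

```python
from typing import List, Tuple

Image = List[List[float]]  # grayscale, values in [0, 1]

GAMMA = 2.2  # assumed camera-response approximation, not taken from the HSD paper


def integrate_frames(frames: List[Image], gamma: float = GAMMA) -> Image:
    """Synthesize blur by averaging sharp frames in approximately linear intensity:
    undo the gamma curve, average over time, then re-apply it."""
    n = len(frames)
    height, width = len(frames[0]), len(frames[0][0])
    return [[(sum(f[y][x] ** gamma for f in frames) / n) ** (1.0 / gamma)
             for x in range(width)]
            for y in range(height)]


def make_heterogeneous_pair(left_frames: List[Image], right_frames: List[Image],
                            n_left: int, n_right: int
                            ) -> Tuple[Tuple[Image, Image], Tuple[Image, Image]]:
    """Hypothetical: blur each view over its own number of frames (emulating unequal
    exposure) and use the middle frame of each window as the sharp target."""
    left = (integrate_frames(left_frames[:n_left]), left_frames[n_left // 2])
    right = (integrate_frames(right_frames[:n_right]), right_frames[n_right // 2])
    return left, right
```

Averaging discrete frames only approximates a continuous exposure, which is one reason the premise above needs direct validation.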
What would settle it
A direct comparison showing that PECA-enhanced models do not outperform standard baselines on the HSD test set, or evidence that the HSD dataset's blur patterns differ significantly from actual single-frame heterogeneous captures.
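One inexpensive check on the second condition is to compare a sharpness statistic across views, for HSD pairs and for genuine single-frame captures. The sketch uses the variance of the Laplacian, a common no-reference blur proxy, and reports the ratio between the sharper and the blurrier view. It is only a stand-in for the blur-kernel and edge-sharpness statistics mentioned in the rebuttal, whose exact form is not given, and the function names are hypothetical.

```python
from typing import List

Image = List[List[float]]  # grayscale, at least 3x3


def laplacian_variance(img: Image) -> float:
    """Hypothetical proxy: variance of the 4-neighbour Laplacian over interior
    pixels. Lower values indicate a blurrier image."""
    h, w = len(img), len(img[0])
    vals = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            lap = (img[y - 1][x] + img[y + 1][x] + img[y][x - 1] + img[y][x + 1]
                   - 4.0 * img[y][x])
            vals.append(lap)
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)


def blur_asymmetry(left: Image, right: Image, eps: float = 1e-12) -> float:
    """Ratio of sharper to blurrier view (>= 1 up to eps); 1 means equal sharpness."""
    a, b = laplacian_variance(left), laplacian_variance(right)
    return max(a, b) / (min(a, b) + eps)
```

Similar ratio distributions for HSD and single-frame captures would not prove equivalence, but a clear mismatch would be evidence for the failure case above.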
Original abstract
Modern stereo-capable smartphones enable immersive XR content capture. However, hardware heterogeneity across camera modules often causes severe asymmetric blur artifacts. Existing methods and benchmarks largely assume homogeneous stereo setups and therefore do not explicitly address such asymmetric degradation. To bridge this gap, we present a dedicated framework for heterogeneous stereo deblurring. First, we introduce the heterogeneous stereo deblurring (HSD) dataset, constructed from real smartphone stereo captures via multi-frame integration. Second, we propose physically- and epipolar-constrained cross attention (PECA), a lightweight module that restricts cross-view matching to an epipolar search window bounded by an optics-derived disparity upper bound. By enforcing physically valid disparity constraints, PECA enables efficient and reliable cross-view feature fusion. Moreover, our confidence-weighted attention with residual fusion emphasizes cross-guided deblurring when correspondences are reliable, while naturally falling back to self-deblurring in occluded or unreliable regions. PECA is architecture-agnostic and consistently improves CNN-, Transformer-, and NAFNet-based baselines. Extensive experiments on HSD show that PECA-enhanced models achieve improved restoration performance with favorable efficiency.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces the Heterogeneous Stereo Deblurring (HSD) dataset, built from real smartphone stereo captures via multi-frame integration, and proposes the Physically- and Epipolar-constrained Cross Attention (PECA) module. PECA restricts cross-view feature matching to an epipolar window bounded by an optics-derived disparity upper bound, employs confidence-weighted attention with residual fusion, and is claimed to be architecture-agnostic. It is reported to improve CNN-, Transformer-, and NAFNet-based baselines on HSD with favorable efficiency, targeting the asymmetric blur that hardware heterogeneity introduces in stereo XR capture.
Significance. If the dataset faithfully captures real asymmetric blur statistics and the reported gains hold under rigorous validation, the work fills a practical gap in deblurring for heterogeneous smartphone stereo systems. The physical and epipolar constraints in PECA offer a lightweight, interpretable mechanism for cross-view fusion that could generalize beyond the specific baselines tested.
major comments (2)
- [Dataset construction] Dataset construction section: the claim that multi-frame integration from real captures accurately reproduces hardware-induced asymmetric blur without introducing its own artifacts or biases is load-bearing for the central claim, yet no explicit validation (e.g., comparison to single-frame captures or simulated heterogeneous pairs) is referenced.
- [Experiments] Experiments section: the abstract states that PECA-enhanced models achieve improved restoration performance, but supplies no quantitative metrics, baselines, error bars, or ablation tables, preventing assessment of whether gains are statistically meaningful or merely reflect dataset artifacts.
minor comments (2)
- [PECA module description] Clarify how the optics-derived disparity upper bound is computed and whether it is fixed or per-image.
- [Method] Add a reference or brief derivation for the epipolar search window bounds to aid reproducibility.
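Under standard assumptions, the bound in question follows from rectified pinhole geometry: a point at depth Z produces disparity d = f·B/Z, where f is the focal length in pixels and B the baseline, so any lower limit Z ≥ Z_min on scene depth gives d ≤ d_max = f·B/Z_min. The sketch assumes both views are rectified to a common focal length and treats Z_min as a capture-setup parameter (for example, a minimum focusing distance). Whether the paper fixes the bound globally or per image is exactly what the first minor comment asks, so this is an illustration rather than the authors' derivation; all names and numbers are hypothetical.

```python
import math


def disparity_upper_bound(focal_mm: float, pixel_pitch_um: float, baseline_mm: float,
                          z_min_mm: float, downsample: int = 1) -> int:
    """d_max = f_px * B / Z_min for a rectified pinhole pair (hypothetical helper).
    `downsample` rescales the bound for feature maps at reduced resolution."""
    f_px = focal_mm * 1000.0 / pixel_pitch_um  # focal length in pixels (1 mm = 1000 um)
    d_max = f_px * baseline_mm / z_min_mm       # nearest allowed depth -> largest disparity
    return math.ceil(d_max / downsample)        # round up so no valid match is excluded


def search_window(x: int, d_max: int) -> range:
    """Right-view columns that a left-view pixel at column x may match on the same
    rectified row, assuming x_right = x_left - d with 0 <= d <= d_max."""
    return range(max(0, x - d_max), x + 1)
```

For instance, a hypothetical 6 mm lens on a 1.0 µm pixel pitch gives f = 6000 px; with a 10 mm baseline and Z_min = 300 mm the bound is 200 px, or 50 px on a 4× downsampled feature map.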
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address the two major comments below and will revise the manuscript accordingly to strengthen the validation of the dataset and the presentation of experimental results.
Point-by-point responses
- Referee: [Dataset construction] Dataset construction section: the claim that multi-frame integration from real captures accurately reproduces hardware-induced asymmetric blur without introducing its own artifacts or biases is load-bearing for the central claim, yet no explicit validation (e.g., comparison to single-frame captures or simulated heterogeneous pairs) is referenced.
Authors: We agree that explicit validation of the multi-frame integration process is important to support the claim that hardware-induced asymmetric blur is faithfully captured. The current manuscript describes the construction process but does not include side-by-side comparisons to single-frame captures or simulated pairs. In the revision we will add a dedicated validation subsection with both qualitative examples and quantitative metrics (e.g., blur kernel statistics and edge sharpness measures) demonstrating that the integration step does not introduce confounding artifacts. revision: yes
- Referee: [Experiments] Experiments section: the abstract states that PECA-enhanced models achieve improved restoration performance, but supplies no quantitative metrics, baselines, error bars, or ablation tables, preventing assessment of whether gains are statistically meaningful or merely reflect dataset artifacts.
Authors: The experiments section of the manuscript does contain quantitative comparisons (PSNR/SSIM on the HSD test set), baseline results, and ablation studies on the PECA module. However, we acknowledge that the abstract provides only a qualitative statement and that error bars or statistical significance tests are not explicitly highlighted. We will revise the abstract to report the key numerical gains and will add error bars together with a brief statistical analysis to the experiments section in the revised manuscript. revision: yes
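The rebuttal promises error bars and a statistical analysis. Because the baseline and PECA-enhanced models are evaluated on the same test images, one standard option is a paired percentile bootstrap over per-image PSNR gains. The sketch assumes flattened images with values in [0, peak] and uses hypothetical function names; it produces no results by itself, since the per-image gains would have to come from the actual HSD evaluation.

```python
import math
import random
from typing import List, Sequence, Tuple


def psnr(pred: Sequence[float], target: Sequence[float], peak: float = 1.0) -> float:
    """PSNR in dB between two flattened images with values in [0, peak]."""
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)
    if mse == 0.0:
        return math.inf
    return 10.0 * math.log10(peak * peak / mse)


def paired_bootstrap_ci(gains: List[float], n_boot: int = 2000, alpha: float = 0.05,
                        seed: int = 0) -> Tuple[float, float, float]:
    """Hypothetical helper: mean per-image gain (PECA minus baseline on the same
    images) with a percentile bootstrap (1 - alpha) confidence interval."""
    rng = random.Random(seed)
    n = len(gains)
    # Resample images with replacement and record the mean gain of each resample.
    means = sorted(sum(rng.choice(gains) for _ in range(n)) / n for _ in range(n_boot))
    lo = means[int(alpha / 2 * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return sum(gains) / n, lo, hi
```

An interval that excludes zero would indicate a consistent per-image gain; on its own, it would not rule out the dataset-artifact concern raised in the first major comment.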
Circularity Check
No circularity in derivation chain
The paper introduces the HSD dataset via multi-frame integration of real captures and proposes the PECA module with epipolar and physics-derived constraints. Performance claims are empirical evaluations on HSD showing improvements over baselines; no equations, fitted parameters, or predictions are described that reduce by construction to the inputs. No self-citations or uniqueness theorems are invoked in the provided text. The central claims rest on standard benchmark construction and module design rather than self-referential definitions or renamings.
Axiom & Free-Parameter Ledger
axioms (1)
- [standard math] Epipolar geometry and an optics-derived disparity upper bound correctly bound valid cross-view correspondences in stereo pairs.
invented entities (1)
- PECA module (no independent evidence)
Reference graph
Works this paper leans on
- [1] Chen, L., Chu, X., Zhang, X., Sun, J.: Simple baselines for image restoration. In: European conference on computer vision. pp. 17–33. Springer (2022)
- [2] Cho, S.J., Ji, S.W., Hong, J.P., Jung, S.W., Ko, S.J.: Rethinking coarse-to-fine approach in single image deblurring. In: Proceedings of the IEEE/CVF international conference on computer vision. pp. 4641–4650 (2021)
- [3] Chu, X., Chen, L., Yu, W.: Nafssr: Stereo image super-resolution using nafnet. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 1239–1248 (2022)
- [4] Delbracio, M., Kelly, D., Brown, M.S., Milanfar, P.: Mobile computational photography: A tour. Annual review of vision science 7(1), 571–604 (2021)
- [5] Goyal, P., Dollár, P., Girshick, R., Noordhuis, P., Wesolowski, L., Kyrola, A., Tulloch, A., Jia, Y., He, K.: Accurate, large minibatch sgd: Training imagenet in 1 hour. arXiv preprint arXiv:1706.02677 (2017)
- [6] Ji, S.W., Lee, J., Kim, S.W., Hong, J.P., Baek, S.J., Jung, S.W., Ko, S.J.: Xydeblur: Divide and conquer for single image deblurring. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 17421–17430 (2022)
- [7] Kang, J., Yao, R., Zhu, H., Sun, K., Li, X., Zhao, J., Zhou, Y.: Dual-lens super-resolution with semantic-enhanced feature matching and adaptive texture transfer. In: Journal of Physics: Conference Series. vol. 3108, p. 012021. IOP Publishing (2025)
- [8] Kim, S.: Generation of stereo images from the heterogeneous cameras. Journal homepage: http://iieta.org/journals/i2m 20(2), 73–78 (2021)
- [9] Kim, Y., Lim, J., Cho, H., Lee, M., Lee, D., Yoon, K.J., Choi, H.J.: Efficient reference-based video super-resolution (ervsr): Single reference image is all you need. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. pp. 1828–1837 (2023)
- [10] Kong, L., Dong, J., Ge, J., Li, M., Pan, J.: Efficient frequency domain-based transformers for high-quality image deblurring. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 5886–5895 (2023)
- [11] Lee, J., Lee, M., Cho, S., Lee, S.: Reference-based video super-resolution using multi-camera video triplets. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 17824–17833 (2022)
- [12] Lin, M., Zhang, C., He, C., Yu, L.: Learning parallax for stereo event-based motion deblurring. IEEE Transactions on Circuits and Systems for Video Technology (2025)
- [13] Liu, H., Li, B., Lu, M., Wu, Y.: Real-world image deblurring via unsupervised domain adaptation. In: International Symposium on Visual Computing. pp. 148–
- [14] Liu, H., Liu, C., Xu, J., Jiang, P., Lu, M.: Xyscannet: A state space model for single image deblurring. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 779–789 (2025)
- [15] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101 (2017)
- [16] Min, J., Jeon, Y., Kim, J., Choi, M.: S2m2: Scalable stereo matching model for reliable depth estimation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 26729–26739 (2025)
- [17] Nah, S., Baik, S., Hong, S., Moon, G., Son, S., Timofte, R., Mu Lee, K.: Ntire 2019 challenge on video deblurring and super-resolution: Dataset and study. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops. pp. 0–0 (2019)
- [18] Nah, S., Hyun Kim, T., Mu Lee, K.: Deep multi-scale convolutional neural network for dynamic scene deblurring. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 3883–3891 (2017)
- [19] Pan, L., Dai, Y., Liu, M., Porikli, F.: Simultaneous stereo video deblurring and scene flow estimation. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 4382–4391 (2017)
- [20] Pan, L., Dai, Y., Liu, M., Porikli, F., Pan, Q.: Joint stereo video deblurring, scene flow estimation and moving object segmentation. IEEE Transactions on Image Processing 29, 1748–1761 (2019)
- [21] Rim, J., Lee, H., Won, J., Cho, S.: Real-world blur dataset for learning and benchmarking deblurring algorithms. In: European conference on computer vision. pp. 184–201. Springer (2020)
- [22] Rim, J., Lee, J., Yang, H., Cho, S.: Deep hybrid camera deblurring for smartphone cameras. In: ACM SIGGRAPH 2024 Conference Papers. pp. 1–11 (2024)
- [23] Sellent, A., Rother, C., Roth, S.: Stereo video deblurring. In: European conference on computer vision. pp. 558–575. Springer (2016)
- [24] Shekarforoush, S., Walia, A., Brubaker, M.A., Derpanis, K.G., Levinshtein, A.: Dual-camera joint deblurring-denoising. arXiv preprint arXiv:2309.08826 (2023)
- [25] Shen, Z., Wang, W., Lu, X., Shen, J., Ling, H., Xu, T., Shao, L.: Human-aware motion deblurring. In: Proceedings of the IEEE/CVF international conference on computer vision. pp. 5572–5581 (2019)
- [26] Wang, L., Wang, Y., Liang, Z., Lin, Z., Yang, J., An, W., Guo, Y.: Learning parallax attention for stereo image super-resolution. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 12250–12259 (2019)
- [27] Wang, T., Xie, J., Sun, W., Yan, Q., Chen, Q.: Dual-camera super-resolution with aligned attention modules. In: Proceedings of the IEEE/CVF international conference on computer vision. pp. 2001–2010 (2021)
- [28] Wang, Y., Ying, X., Wang, L., Yang, J., An, W., Guo, Y.: Symmetric parallax attention for stereo image super-resolution. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 766–775 (2021)
- [29] Xiao, Z., Wang, X.: Asymmetric dual-lens video deblurring. In: The Thirty-ninth Annual Conference on Neural Information Processing Systems (2025)
- [30] Yue, H., Cui, Z., Li, K., Yang, J.: Kedusr: Real-world dual-lens super-resolution via kernel-free matching. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 38, pp. 6881–6889 (2024)
- [31] Zamir, S.W., Arora, A., Khan, S., Hayat, M., Khan, F.S., Yang, M.H.: Restormer: Efficient transformer for high-resolution image restoration. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 5728–5739 (2022)
- [32] Zhang, S., Yu, W., Jiang, F., Nie, L., Yao, H., Huang, Q., Tao, D.: Stereo image restoration via attention-guided correspondence learning. IEEE Transactions on Pattern Analysis and Machine Intelligence 46(7), 4850–4865 (2024)
- [33] Zhang, T., Lu, J., Jin, Q., Zeng, T.: A survey of single image blind motion deblurring from traditional to deep learning. ACM Computing Surveys (2025)
- [34] Zhou, S., Zhang, J., Zuo, W., Xie, H., Pan, J., Ren, J.S.: Davanet: Stereo deblurring with view aggregation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 10996–11005 (2019)
18 H. Shin et al
- [35] Zou, H., Suganuma, M., Okatani, T.: Refvsr++: Exploiting reference inputs for reference-based video super-resolution. In: 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). pp. 2756–2765. IEEE (2025)
- [36] Zou, W., Gao, H., Chen, L., Zhang, Y., Jiang, M., Yu, Z., Tan, M.: Cross-view hierarchy network for stereo image super-resolution. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 1396–1405 (2023)