arxiv: 2606.25962 · v1 · pith:SJCJNJXZnew · submitted 2026-06-24 · 💻 cs.CV

A Benchmark for Heterogeneous Stereo Deblurring with Physically- and Epipolar-constrained Cross Attention

Pith reviewed 2026-06-25 20:21 UTC · model grok-4.3

classification 💻 cs.CV
keywords heterogeneous stereo deblurring · epipolar constraint · cross attention · asymmetric blur · smartphone cameras · HSD dataset · image restoration

The pith

Physically- and epipolar-constrained cross attention improves deblurring for heterogeneous stereo pairs from smartphones.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper targets asymmetric blur in stereo images captured by heterogeneous camera modules on smartphones, which current deblurring methods overlook because they assume identical cameras. It builds the heterogeneous stereo deblurring (HSD) dataset from real phone captures using multi-frame integration and proposes the physically- and epipolar-constrained cross attention (PECA) module. PECA restricts cross-attention to physically plausible epipolar regions derived from optical limits, and weights the fusion by confidence so that cross-view guidance and self-deblurring are blended according to how reliable the correspondences are. A reader would care because this could enable sharper immersive content from everyday devices without requiring matched hardware.

Core claim

By introducing the heterogeneous stereo deblurring dataset and the physically- and epipolar-constrained cross attention module, the work shows that restricting feature matching to valid disparity ranges derived from optics allows more effective and efficient cross-view information use in deblurring networks of various types.

What carries the argument

Physically- and epipolar-constrained cross attention (PECA), a lightweight module that limits cross-view matching to an epipolar search window with an optics-derived disparity bound and applies confidence-weighted residual fusion to handle reliable and unreliable correspondences.
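
The windowing idea can be sketched for a single rectified scanline as follows. This is an illustration only, not the authors' implementation: the function names, the non-negative disparity direction, the cosine-similarity scoring, the default temperature, and the use of the peak attention weight as a confidence proxy are all assumptions.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-8)

def windowed_row_attention(query_row, ref_row, d_max, tau=0.1):
    """Attend from each query position x to reference positions x-d_max..x.

    Assumes rectified views (the epipolar line is the image row) and
    non-negative disparities; both are assumptions of this sketch.
    Returns (aggregated_features, confidence) with one entry per position.
    """
    out, conf = [], []
    for x, q in enumerate(query_row):
        lo = max(0, x - d_max)          # clip the window at the image border
        cands = ref_row[lo:x + 1]       # only candidates inside the disparity bound
        logits = [cosine(q, k) / tau for k in cands]
        m = max(logits)                 # subtract the max for numerical stability
        exps = [math.exp(l - m) for l in logits]
        z = sum(exps)
        w = [e / z for e in exps]
        agg = [sum(wi * k[c] for wi, k in zip(w, cands)) for c in range(len(q))]
        out.append(agg)
        conf.append(max(w))             # hypothetical confidence: peak attention weight
    return out, conf
```

With a small `d_max`, a matching reference feature that lies outside the window is simply never scored, which is the efficiency and reliability argument the description makes.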

If this is right

  • PECA improves CNN-, Transformer-, and NAFNet-based deblurring baselines.
  • Models using PECA show better restoration performance with favorable efficiency on the HSD dataset.
  • PECA naturally falls back to self-deblurring in occluded or unreliable regions via its confidence weighting.
  • The module can be integrated into existing architectures without major changes.
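
The fallback behaviour in the third bullet can be illustrated with a toy residual fusion. The exact fusion formula is not given on this page, so the form below (self feature plus confidence-scaled cross-view feature) and the name `fuse` are assumptions chosen only to show why a zero confidence reduces to self-deblurring.

```python
def fuse(self_feat, cross_feat, confidence):
    """Residual fusion sketch: out = self + confidence * cross.

    confidence is a scalar in [0, 1]; 0 means the correspondence is
    considered unreliable (for example, an occluded region).
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    return [s + confidence * c for s, c in zip(self_feat, cross_feat)]

# Unreliable match: the cross-view term vanishes and the self feature passes through.
print(fuse([1.0, 2.0], [10.0, 10.0], 0.0))
# Reliable match: the cross-view feature is added in full.
print(fuse([1.0, 2.0], [10.0, 10.0], 1.0))
```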

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The dataset construction via multi-frame integration could be adapted for other real-world stereo deblurring scenarios beyond smartphones.
  • Similar physical constraints might apply to deblurring in other multi-camera systems where calibration data is available.
  • Improved deblurring could lead to better downstream tasks like depth estimation or 3D reconstruction from smartphone captures.

Load-bearing premise

The multi-frame integration process used to construct the HSD dataset from real smartphone captures accurately reproduces the asymmetric blur statistics of heterogeneous stereo pairs without introducing its own artifacts or biases.
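
For readers unfamiliar with the term, multi-frame integration is commonly realised by averaging consecutive sharp frames to imitate a longer exposure. The sketch below shows that generic idea on tiny grayscale frames stored as nested lists. The HSD construction protocol itself is not described on this page, so the plain averaging, the choice of the centre frame as the sharp target, and the names `integrate_frames` and `make_sample` are assumptions.

```python
def integrate_frames(frames):
    """Average a burst of frames pixel-wise to synthesise a blurry frame."""
    if not frames:
        raise ValueError("need at least one frame")
    n = len(frames)
    h, w = len(frames[0]), len(frames[0][0])
    return [[sum(f[y][x] for f in frames) / n for x in range(w)]
            for y in range(h)]

def make_sample(ultrawide_burst, wide_burst):
    """Hypothetical pairing: blurry ultra-wide input, sharp ultra-wide target,
    and the temporally centred wide frame as a sharp reference."""
    mid_uw = len(ultrawide_burst) // 2
    mid_w = len(wide_burst) // 2
    return {
        "input": integrate_frames(ultrawide_burst),
        "target": ultrawide_burst[mid_uw],
        "reference": wide_burst[mid_w],
    }

# A bright dot moving one pixel per frame smears into a streak after integration.
burst = [[[1.0, 0.0, 0.0]], [[0.0, 1.0, 0.0]], [[0.0, 0.0, 1.0]]]
print(integrate_frames(burst))
```

Whether averaging of this kind matches the blur of a true single long exposure is exactly what the premise above leaves open.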

What would settle it

A direct comparison showing that PECA-enhanced models do not outperform standard baselines on the HSD test set, or evidence that the HSD dataset's blur patterns differ significantly from actual single-frame heterogeneous captures.

Figures

Figures reproduced from arXiv: 2606.25962 by Hoju Shin, Jiah Kim, Seowon Ji, Seung-Wook Kim.

Figure 1
Figure 1: Overview of PECA performance and visual quality on the HSD dataset. (a) Accuracy-efficiency trade-off across three representative backbones (XYDeblur, Restormer, and NAFNet). (b) Qualitative restoration results on a challenging scene.
Figure 2
Figure 2: Overview of the HSD benchmark construction from real heterogeneous stereo capture. (a) Real smartphone stereoscopic capture, where hardware-induced blur asymmetry naturally arises between wide and ultra-wide modules. (b) Construction protocol applied to real synchronized stereo sequences.
Figure 3
Figure 3: Overview of the proposed physically- and epipolar-constrained cross attention (PECA). The framework consists of dual encoders for the blurry ultra-wide input and the sharp wide reference, followed by the PECA module and residual feature fusion.
Figure 4
Figure 4: An illustration of the proposed PECA module. (a) The physically derived disparity bound Dmax restricts correspondence search along the epipolar line. (b) Progressive restriction of attention search space: global cross attention (full 2D), full-row cross attention (full 1D scanline), and PECA (constrained 1D disparity window).
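
To make the three search spaces in the Figure 4 caption concrete, the following sketch counts how many reference positions a single query pixel is compared against in each case. The feature-map size and the `d_max` value are hypothetical, and counting the window as `d_max + 1` positions (disparities 0 through `d_max`) is an assumed convention.

```python
def candidates_per_query(height, width, d_max):
    """Number of reference positions scored for one query pixel."""
    return {
        "global_2d": height * width,           # every pixel of the other view
        "full_row_1d": width,                  # the whole scanline
        "peca_window": min(width, d_max + 1),  # disparities 0..d_max only
    }

# Hypothetical 256x256 feature map with a 32-pixel disparity bound.
print(candidates_per_query(256, 256, 32))
# {'global_2d': 65536, 'full_row_1d': 256, 'peca_window': 33}
```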
Figure 5
Figure 5: Qualitative results on the HSD dataset for three backbones, evaluated based on the presence of PECA. Three zoomed regions focusing on large-scale blur, transparent vinyl, and fine-grained text are highlighted.
Figure 6
Figure 6: Qualitative comparison on real handheld stereo captures without ground truth.
Figure 7
Figure 7: PSNR and SSIM heatmaps over Dmax and τ with the XYDeblur backbone. Regarding τ, a small temperature yields a peaky attention distribution that better isolates reliable correspondences, whereas a large temperature (e.g., τ=1.0) produces relatively diffuse weights due to the bounded cosine similarity, which may weaken discrimination and degrade restoration quality. In this diffuse regime, expanding Dmax ca…
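
The remark about bounded cosine similarity can be checked with a small calculation. Because cosine scores lie in [-1, 1], the most favourable case for a single match is one candidate at +1 and all others at -1; the softmax weight of that match is then an upper bound on how peaked the attention can be at a given τ. The window size of 33 candidates below is a hypothetical value, and `peak_weight` is a name introduced here.

```python
import math

def peak_weight(n_candidates, tau):
    """Largest possible softmax weight when scores are cosines divided by tau.

    Best case: one candidate scores +1, the remaining n-1 score -1.
    """
    best = 1.0 / tau
    rest = -1.0 / tau
    # Divide numerator and denominator by exp(best) for numerical stability.
    others = (n_candidates - 1) * math.exp(rest - best)
    return 1.0 / (1.0 + others)

for tau in (1.0, 0.1):
    print(tau, round(peak_weight(33, tau), 4))
```

At τ=1.0 even a perfect match receives less than a fifth of the attention mass in this setting, while at τ=0.1 it receives essentially all of it, which is consistent with the diffuse-versus-peaky behaviour the caption describes.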
Original abstract

Modern stereo-capable smartphones enable immersive XR content capture. However, hardware heterogeneity across camera modules often causes severe asymmetric blur artifacts. Existing methods and benchmarks largely assume homogeneous stereo setups and therefore do not explicitly address such asymmetric degradation. To bridge this gap, we present a dedicated framework for heterogeneous stereo deblurring. First, we introduce the heterogeneous stereo deblurring (HSD) dataset, constructed from real smartphone stereo captures via multi-frame integration. Second, we propose physically- and epipolar-constrained cross attention (PECA), a lightweight module that restricts cross-view matching to an epipolar search window bounded by an optics-derived disparity upper bound. By enforcing physically valid disparity constraints, PECA enables efficient and reliable cross-view feature fusion. Moreover, our confidence-weighted attention with residual fusion emphasizes cross-guided deblurring when correspondences are reliable, while naturally falling back to self-deblurring in occluded or unreliable regions. PECA is architecture-agnostic and consistently improves CNN-, Transformer-, and NAFNet-based baselines. Extensive experiments on HSD show that PECA-enhanced models achieve improved restoration performance with favorable efficiency.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces the Heterogeneous Stereo Deblurring (HSD) dataset, built from real smartphone stereo captures via multi-frame integration, and proposes the Physically- and Epipolar-constrained Cross Attention (PECA) module. PECA restricts cross-view feature matching to an epipolar window bounded by an optics-derived disparity upper bound, employs confidence-weighted attention with residual fusion, and is claimed to be architecture-agnostic. It is reported to improve CNN-, Transformer-, and NAFNet-based baselines on HSD while maintaining favorable efficiency for handling asymmetric blur due to hardware heterogeneity in stereo XR capture.

Significance. If the dataset faithfully captures real asymmetric blur statistics and the reported gains hold under rigorous validation, the work fills a practical gap in deblurring for heterogeneous smartphone stereo systems. The physical and epipolar constraints in PECA offer a lightweight, interpretable mechanism for cross-view fusion that could generalize beyond the specific baselines tested.

major comments (2)
  1. [Dataset construction] Dataset construction section: the claim that multi-frame integration from real captures accurately reproduces hardware-induced asymmetric blur without introducing its own artifacts or biases is load-bearing for the central claim, yet no explicit validation (e.g., comparison to single-frame captures or simulated heterogeneous pairs) is referenced.
  2. [Experiments] Experiments section: the abstract states that PECA-enhanced models achieve improved restoration performance, but supplies no quantitative metrics, baselines, error bars, or ablation tables, preventing assessment of whether gains are statistically meaningful or merely reflect dataset artifacts.
minor comments (2)
  1. [PECA module description] Clarify how the optics-derived disparity upper bound is computed and whether it is fixed or per-image.
  2. [Method] Add a reference or brief derivation for the epipolar search window bounds to aid reproducibility.
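
As context for the first minor comment, one textbook way to obtain a disparity upper bound is the rectified pinhole stereo relation d = f · B / Z, evaluated at the nearest depth the system is expected to see. The page does not state how the paper derives its bound, so this relation, the parameter names, and the numbers below are assumptions for illustration only.

```python
def max_disparity_px(focal_px, baseline_m, min_depth_m):
    """Upper bound on disparity (in pixels) for rectified pinhole stereo.

    focal_px:    focal length in pixels after rectification
    baseline_m:  distance between the two optical centres, in metres
    min_depth_m: closest scene depth considered, in metres
    """
    if min(focal_px, baseline_m, min_depth_m) <= 0:
        raise ValueError("all arguments must be positive")
    return focal_px * baseline_m / min_depth_m

# Hypothetical phone-like values: 1500 px focal length, 12 mm baseline, 0.3 m nearest depth.
print(max_disparity_px(1500.0, 0.012, 0.3))
```

Under this relation the bound is fixed by calibration and a depth limit rather than by image content, which is one possible answer to the fixed-versus-per-image question; the authors' actual choice would need to be confirmed from the manuscript.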

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the two major comments below and will revise the manuscript accordingly to strengthen the validation of the dataset and the presentation of experimental results.

Point-by-point responses
  1. Referee: [Dataset construction] Dataset construction section: the claim that multi-frame integration from real captures accurately reproduces hardware-induced asymmetric blur without introducing its own artifacts or biases is load-bearing for the central claim, yet no explicit validation (e.g., comparison to single-frame captures or simulated heterogeneous pairs) is referenced.

    Authors: We agree that explicit validation of the multi-frame integration process is important to support the claim that hardware-induced asymmetric blur is faithfully captured. The current manuscript describes the construction process but does not include side-by-side comparisons to single-frame captures or simulated pairs. In the revision we will add a dedicated validation subsection with both qualitative examples and quantitative metrics (e.g., blur kernel statistics and edge sharpness measures) demonstrating that the integration step does not introduce confounding artifacts. revision: yes

  2. Referee: [Experiments] Experiments section: the abstract states that PECA-enhanced models achieve improved restoration performance, but supplies no quantitative metrics, baselines, error bars, or ablation tables, preventing assessment of whether gains are statistically meaningful or merely reflect dataset artifacts.

    Authors: The experiments section of the manuscript does contain quantitative comparisons (PSNR/SSIM on the HSD test set), baseline results, and ablation studies on the PECA module. However, we acknowledge that the abstract provides only a qualitative statement and that error bars or statistical significance tests are not explicitly highlighted. We will revise the abstract to report the key numerical gains and will add error bars together with a brief statistical analysis to the experiments section in the revised manuscript. revision: yes
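
The kind of error-bar reporting promised in this response could look like the sketch below: compute PSNR per image for the baseline and for the PECA variant, then summarise the paired gains with a mean and a standard error. The function names and the toy numbers are hypothetical and are not results from the paper.

```python
import math
import statistics

def psnr(reference, test, peak=1.0):
    """PSNR in dB between two equally sized flat lists of pixel values."""
    mse = sum((a - b) ** 2 for a, b in zip(reference, test)) / len(reference)
    return float("inf") if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)

def paired_gain_summary(baseline_psnr, peca_psnr):
    """Mean and standard error of per-image PSNR gains (PECA minus baseline)."""
    if len(baseline_psnr) != len(peca_psnr) or len(baseline_psnr) < 2:
        raise ValueError("need two equally sized lists with at least 2 images")
    gains = [p - b for b, p in zip(baseline_psnr, peca_psnr)]
    mean = statistics.mean(gains)
    sem = statistics.stdev(gains) / math.sqrt(len(gains))
    return mean, sem

# Toy per-image scores, for illustration only.
print(paired_gain_summary([30.0, 31.0, 32.0], [30.5, 31.9, 32.4]))
```

Pairing the scores per image, rather than comparing two dataset-level averages, removes scene difficulty from the variance and makes small but consistent gains easier to detect.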

Circularity Check

0 steps flagged

No circularity in derivation chain

Full rationale

The paper introduces the HSD dataset via multi-frame integration of real captures and proposes the PECA module with epipolar and physics-derived constraints. Performance claims are empirical evaluations on HSD showing improvements over baselines; no equations, fitted parameters, or predictions are described that reduce by construction to the inputs. No self-citations or uniqueness theorems are invoked in the provided text. The central claims rest on standard benchmark construction and module design rather than self-referential definitions or renamings.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The approach rests on standard stereo geometry and the assumption that the constructed dataset faithfully represents real heterogeneous blur; no free parameters or new entities with independent evidence are described.

axioms (1)
  • [standard math] Epipolar geometry and an optics-derived disparity upper bound correctly bound valid cross-view correspondences in stereo pairs.
    Invoked to restrict the search window in PECA.
invented entities (1)
  • PECA module [no independent evidence]
    purpose: Lightweight cross-attention block enforcing physical and epipolar constraints for heterogeneous deblurring.
    New component proposed by the authors.

pith-pipeline@v0.9.1-grok · 5737 in / 1173 out tokens · 16776 ms · 2026-06-25T20:21:39.943839+00:00 · methodology

Reference graph

Works this paper leans on

36 extracted references · 2 linked inside Pith

  1. Chen, L., Chu, X., Zhang, X., Sun, J.: Simple baselines for image restoration. In: European conference on computer vision. pp. 17–33. Springer (2022)
  2. Cho, S.J., Ji, S.W., Hong, J.P., Jung, S.W., Ko, S.J.: Rethinking coarse-to-fine approach in single image deblurring. In: Proceedings of the IEEE/CVF international conference on computer vision. pp. 4641–4650 (2021)
  3. Chu, X., Chen, L., Yu, W.: Nafssr: Stereo image super-resolution using nafnet. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 1239–1248 (2022)
  4. Delbracio, M., Kelly, D., Brown, M.S., Milanfar, P.: Mobile computational photography: A tour. Annual review of vision science 7(1), 571–604 (2021)
  5. Goyal, P., Dollár, P., Girshick, R., Noordhuis, P., Wesolowski, L., Kyrola, A., Tulloch, A., Jia, Y., He, K.: Accurate, large minibatch sgd: Training imagenet in 1 hour. arXiv preprint arXiv:1706.02677 (2017)
  6. Ji, S.W., Lee, J., Kim, S.W., Hong, J.P., Baek, S.J., Jung, S.W., Ko, S.J.: Xydeblur: Divide and conquer for single image deblurring. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 17421–17430 (2022)
  7. Kang, J., Yao, R., Zhu, H., Sun, K., Li, X., Zhao, J., Zhou, Y.: Dual-lens super-resolution with semantic-enhanced feature matching and adaptive texture transfer. In: Journal of Physics: Conference Series. vol. 3108, p. 012021. IOP Publishing (2025)
  8. Kim, S.: Generation of stereo images from the heterogeneous cameras. Journal homepage: http://iieta.org/journals/i2m 20(2), 73–78 (2021)
  9. Kim, Y., Lim, J., Cho, H., Lee, M., Lee, D., Yoon, K.J., Choi, H.J.: Efficient reference-based video super-resolution (ervsr): Single reference image is all you need. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. pp. 1828–1837 (2023)
  10. Kong, L., Dong, J., Ge, J., Li, M., Pan, J.: Efficient frequency domain-based transformers for high-quality image deblurring. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 5886–5895 (2023)
  11. Lee, J., Lee, M., Cho, S., Lee, S.: Reference-based video super-resolution using multi-camera video triplets. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 17824–17833 (2022)
  12. Lin, M., Zhang, C., He, C., Yu, L.: Learning parallax for stereo event-based motion deblurring. IEEE Transactions on Circuits and Systems for Video Technology (2025)
  13. Liu, H., Li, B., Lu, M., Wu, Y.: Real-world image deblurring via unsupervised domain adaptation. In: International Symposium on Visual Computing. pp. 148–
  14. Liu, H., Liu, C., Xu, J., Jiang, P., Lu, M.: Xyscannet: A state space model for single image deblurring. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 779–789 (2025)
  15. Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101 (2017)
  16. Min, J., Jeon, Y., Kim, J., Choi, M.: S2m2: Scalable stereo matching model for reliable depth estimation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 26729–26739 (2025)
  17. Nah, S., Baik, S., Hong, S., Moon, G., Son, S., Timofte, R., Mu Lee, K.: Ntire 2019 challenge on video deblurring and super-resolution: Dataset and study. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops. pp. 0–0 (2019)
  18. Nah, S., Hyun Kim, T., Mu Lee, K.: Deep multi-scale convolutional neural network for dynamic scene deblurring. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 3883–3891 (2017)
  19. Pan, L., Dai, Y., Liu, M., Porikli, F.: Simultaneous stereo video deblurring and scene flow estimation. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 4382–4391 (2017)
  20. Pan, L., Dai, Y., Liu, M., Porikli, F., Pan, Q.: Joint stereo video deblurring, scene flow estimation and moving object segmentation. IEEE Transactions on Image Processing 29, 1748–1761 (2019)
  21. Rim, J., Lee, H., Won, J., Cho, S.: Real-world blur dataset for learning and benchmarking deblurring algorithms. In: European conference on computer vision. pp. 184–201. Springer (2020)
  22. Rim, J., Lee, J., Yang, H., Cho, S.: Deep hybrid camera deblurring for smartphone cameras. In: ACM SIGGRAPH 2024 Conference Papers. pp. 1–11 (2024)
  23. Sellent, A., Rother, C., Roth, S.: Stereo video deblurring. In: European conference on computer vision. pp. 558–575. Springer (2016)
  24. Shekarforoush, S., Walia, A., Brubaker, M.A., Derpanis, K.G., Levinshtein, A.: Dual-camera joint deblurring-denoising. arXiv preprint arXiv:2309.08826 (2023)
  25. Shen, Z., Wang, W., Lu, X., Shen, J., Ling, H., Xu, T., Shao, L.: Human-aware motion deblurring. In: Proceedings of the IEEE/CVF international conference on computer vision. pp. 5572–5581 (2019)
  26. Wang, L., Wang, Y., Liang, Z., Lin, Z., Yang, J., An, W., Guo, Y.: Learning parallax attention for stereo image super-resolution. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 12250–12259 (2019)
  27. Wang, T., Xie, J., Sun, W., Yan, Q., Chen, Q.: Dual-camera super-resolution with aligned attention modules. In: Proceedings of the IEEE/CVF international conference on computer vision. pp. 2001–2010 (2021)
  28. Wang, Y., Ying, X., Wang, L., Yang, J., An, W., Guo, Y.: Symmetric parallax attention for stereo image super-resolution. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 766–775 (2021)
  29. Xiao, Z., Wang, X.: Asymmetric dual-lens video deblurring. In: The Thirty-ninth Annual Conference on Neural Information Processing Systems (2025)
  30. Yue, H., Cui, Z., Li, K., Yang, J.: Kedusr: Real-world dual-lens super-resolution via kernel-free matching. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 38, pp. 6881–6889 (2024)
  31. Zamir, S.W., Arora, A., Khan, S., Hayat, M., Khan, F.S., Yang, M.H.: Restormer: Efficient transformer for high-resolution image restoration. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 5728–5739 (2022)
  32. Zhang, S., Yu, W., Jiang, F., Nie, L., Yao, H., Huang, Q., Tao, D.: Stereo image restoration via attention-guided correspondence learning. IEEE Transactions on Pattern Analysis and Machine Intelligence 46(7), 4850–4865 (2024)
  33. Zhang, T., Lu, J., Jin, Q., Zeng, T.: A survey of single image blind motion deblurring from traditional to deep learning. ACM Computing Surveys (2025)
  34. Zhou, S., Zhang, J., Zuo, W., Xie, H., Pan, J., Ren, J.S.: Davanet: Stereo deblurring with view aggregation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 10996–11005 (2019)
  35. Zou, H., Suganuma, M., Okatani, T.: Refvsr++: Exploiting reference inputs for reference-based video super-resolution. In: 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). pp. 2756–2765. IEEE (2025)
  36. Zou, W., Gao, H., Chen, L., Zhang, Y., Jiang, M., Yu, Z., Tan, M.: Cross-view hierarchy network for stereo image super-resolution. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 1396–1405 (2023)