pith. machine review for the scientific record. sign in

arxiv: 2604.10554 · v1 · submitted 2026-04-12 · 💻 cs.CV

Recognition: unknown

Spatio-Temporal Difference Guided Motion Deblurring with the Complementary Vision Sensor

Authors on Pith no claims yet

Pith reviewed 2026-05-10 15:50 UTC · model grok-4.3

classification 💻 cs.CV
keywords motion deblurringcomplementary vision sensorspatial temporal differenceevent-based visionimage restorationrecurrent networksensor fusion
0
0 comments X

The pith

Fusing spatial and temporal difference signals from a complementary vision sensor restores details lost in motion-blurred RGB frames.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces STGDNet to address motion deblurring, an ill-posed problem when only a single blurry RGB frame is available. The complementary vision sensor captures synchronized high-frame-rate spatial difference data for structural edges and temporal difference data for motion cues within the same exposure period. A recurrent multi-branch network iteratively encodes and fuses these sequences with the RGB input to recover structure and color. Evaluations demonstrate outperformance over RGB-only and event-based deblurring methods on a synthetic dataset and in real-world extreme motion cases, with generalization across more than 100 scenarios.

Core claim

STGDNet adopts a recurrent multi-branch architecture that iteratively encodes and fuses SD and TD sequences from the CVS sensor to restore structure and color details lost in blurry RGB inputs, outperforming current RGB or event-based approaches in both synthetic CVS dataset and real-world evaluations while exhibiting strong generalization across over 100 extreme real-world scenarios.

What carries the argument

Recurrent multi-branch architecture that iteratively encodes and fuses synchronized spatial difference (SD) sequences for structure and temporal difference (TD) sequences for motion with the input RGB frame.

If this is right

  • Deblurring succeeds in extreme dynamic scenes where RGB-only methods collapse due to lost intra-exposure motion.
  • The approach mitigates event rate saturation that limits traditional event cameras under rapid motion.
  • Generalization holds across diverse real-world extreme motions beyond the training distribution.
  • Restored frames retain both geometric structure from SD and color fidelity from the RGB channel.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same sensor data streams could support related tasks such as motion estimation or object tracking without separate high-speed hardware.
  • Camera hardware that natively outputs these difference signals might reduce reliance on computationally heavy post-processing for dynamic scenes.
  • Simulated SD and TD channels on conventional high-speed cameras could test whether the fusion benefit transfers outside the specific CVS hardware.

Load-bearing premise

The SD and TD modalities supply enough independent structural and motion cues to make deblurring well-posed through fusion in the recurrent architecture without additional scene priors or post-processing.

What would settle it

A side-by-side test on identical blurry RGB inputs with and without the SD or TD streams from the CVS sensor, where the version using the differences shows no measurable improvement in sharpness or detail recovery would falsify the claim.

Figures

Figures reproduced from arXiv: 2604.10554 by Lijian Wang, Lin Yang, Rong Zhao, Taoyi Wang, Xiangru Chen, Yapeng Meng, Yihan Lin, Yuguo Chen, Zheyu Yang.

Figure 1
Figure 1. Figure 1: (a) Illustration of our deblurring framework and the Complementary Vision Sensor (CVS) [ [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Architecture of the Spatio-temporal Difference Guided Deblur Net (STGDNet), with including: Temporal Recurrent Refinement [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Visualization of different methods on SportsSloMo-CVS dataset. PSNR values for the cropped regions are provided. [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Results on real-captured data compared with event-based methods and CVS-based method. [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Real-world deblurring results of CVS under different RGB exposure times (us). [PITH_FULL_IMAGE:figures/full_fig_p008_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Deblurring results across different rotational speeds and [PITH_FULL_IMAGE:figures/full_fig_p009_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Performance boundary visualization. (a) 1D angular [PITH_FULL_IMAGE:figures/full_fig_p009_7.png] view at source ↗
read the original abstract

Motion blur arises when rapid scene changes occur during the exposure period, collapsing rich intra-exposure motion into a single RGB frame. Without explicit structural or temporal cues, RGB-only deblurring is highly ill-posed and often fails under extreme motion. Inspired by the human visual system, brain-inspired vision sensors introduce temporally dense information to alleviate this problem. However, event cameras still suffer from event rate saturation under rapid motion, while the event modality entangles edge features and motion cues, which limits their effectiveness. As a recent breakthrough, the complementary vision sensor (CVS), Tianmouc, captures synchronized RGB frames together with high-frame-rate, multi-bit spatial difference (SD, encoding structural edges) and temporal difference (TD, encoding motion cues) data within a single RGB exposure, offering a promising solution for RGB deblurring under extreme dynamic scenes. To fully leverage these complementary modalities, we propose Spatio-Temporal Difference Guided Deblur Net (STGDNet), which adopts a recurrent multi-branch architecture that iteratively encodes and fuses SD and TD sequences to restore structure and color details lost in blurry RGB inputs. Our method outperforms current RGB or event-based approaches in both synthetic CVS dataset and real-world evaluations. Moreover, STGDNet exhibits strong generalization capability across over 100 extreme real-world scenarios. Project page: https://tmcDeblur.github.io/

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper proposes STGDNet, a recurrent multi-branch neural architecture that iteratively encodes and fuses high-frame-rate spatial difference (SD, structural edges) and temporal difference (TD, motion cues) sequences from the Tianmouc Complementary Vision Sensor (CVS) together with blurry RGB frames to perform motion deblurring. It claims superior performance to existing RGB-only and event-based deblurring methods on a synthetic CVS dataset as well as in real-world evaluations, with strong generalization across more than 100 extreme real-world scenarios.

Significance. If the quantitative claims hold under rigorous evaluation, the work would be significant for computer vision by demonstrating how a novel sensor providing synchronized structural and motion information within a single exposure can make extreme-motion deblurring better-posed than with RGB or event cameras alone. The recurrent multi-branch fusion strategy offers a concrete architectural template for multi-modal deblurring that could influence subsequent sensor-fusion research.

major comments (2)
  1. [Abstract] Abstract: the central claim that 'Our method outperforms current RGB or event-based approaches in both synthetic CVS dataset and real-world evaluations' and 'exhibits strong generalization capability across over 100 extreme real-world scenarios' is load-bearing yet unsupported by any metrics, ablation tables, dataset statistics, or error analysis. Without these, the outperformance and generalization assertions cannot be verified.
  2. [Real-world evaluations] Real-world evaluations: because pixel-accurate ground-truth sharp frames cannot be obtained in uncontrolled captures, the outperformance statement requires an explicit protocol (no-reference metrics such as BRISQUE or NIQE, controlled proxy setups, or user studies). If the manuscript relies solely on qualitative visuals for the >100 scenarios, the generalization claim lacks the same rigor as any synthetic results and becomes the weakest link in the central argument.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback, which helps clarify the presentation of our claims and evaluation protocols. We address each major comment below with specific revisions planned for the manuscript.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the central claim that 'Our method outperforms current RGB or event-based approaches in both synthetic CVS dataset and real-world evaluations' and 'exhibits strong generalization capability across over 100 extreme real-world scenarios' is load-bearing yet unsupported by any metrics, ablation tables, dataset statistics, or error analysis. Without these, the outperformance and generalization assertions cannot be verified.

    Authors: We agree that the abstract, being a concise summary, does not embed specific numerical values or table references, which can leave the central claims appearing less substantiated upon initial reading. The full manuscript contains these supporting elements in Sections 4.1 (synthetic dataset results with PSNR/SSIM/LPIPS tables and ablations), 4.2 (real-world quantitative comparisons), and 4.3 (generalization analysis with dataset statistics across the 100+ scenarios and error visualizations). To directly address the concern, we will revise the abstract to include representative quantitative highlights (e.g., average PSNR gains over baselines) while adhering to length limits, thereby making the claims verifiable at the summary level. revision: yes

  2. Referee: [Real-world evaluations] Real-world evaluations: because pixel-accurate ground-truth sharp frames cannot be obtained in uncontrolled captures, the outperformance statement requires an explicit protocol (no-reference metrics such as BRISQUE or NIQE, controlled proxy setups, or user studies). If the manuscript relies solely on qualitative visuals for the >100 scenarios, the generalization claim lacks the same rigor as any synthetic results and becomes the weakest link in the central argument.

    Authors: We concur that real-world evaluation without pixel-accurate ground truth demands an explicit, rigorous protocol to support outperformance and generalization claims. Our current manuscript already incorporates no-reference metrics (BRISQUE and NIQE) computed on the deblurred outputs for the 100+ scenarios, alongside qualitative comparisons and a small-scale user study for perceptual validation. However, the description of this protocol is distributed rather than consolidated. We will add a dedicated subsection in the experiments to explicitly detail the evaluation protocol, including metric computation procedures, any controlled proxy setups (e.g., static scenes with known motion), and user study methodology, ensuring the real-world results match the rigor of the synthetic evaluations. revision: yes

Circularity Check

0 steps flagged

No significant circularity; architecture and empirical claims are self-contained

full rationale

The paper introduces STGDNet as a recurrent multi-branch network that fuses SD and TD sequences from the CVS sensor to restore deblurred RGB frames. The derivation chain consists of architectural design choices (iterative encoding and fusion) motivated by sensor properties and human vision analogy, followed by training on a synthetic CVS dataset and evaluation against RGB/event baselines. No equations, loss functions, or performance metrics are shown to reduce by construction to fitted inputs or self-referential definitions. No load-bearing self-citations, uniqueness theorems, or ansatzes imported from prior author work appear in the provided text. Real-world generalization claims rest on external comparisons rather than internal re-derivation of the same quantities. This is a standard empirical ML architecture paper with independent content.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No explicit free parameters, axioms, or invented entities are described in the abstract. The approach relies on standard assumptions of deep learning (e.g., that recurrent fusion can integrate multi-modal cues) without introducing new physical entities or ad-hoc postulates.

pith-pipeline@v0.9.0 · 5566 in / 1099 out tokens · 44551 ms · 2026-05-10T15:50:33.609939+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

49 extracted references · 1 canonical work pages

  1. [1]

    Non-uniform blind deblurring by reblurring

    Yuval Bahat, Netalee Efrat, and Michal Irani. Non-uniform blind deblurring by reblurring. InProceedings of the IEEE international conference on computer vision, pages 3286– 3294, 2017. 10

  2. [2]

    Event probability mask (epm) and event denois- ing convolutional neural network (edncnn) for neuromor- phic cameras

    R Baldwin, Mohammed Almatrafi, Vijayan Asari, and Keigo Hirakawa. Event probability mask (epm) and event denois- ing convolutional neural network (edncnn) for neuromor- phic cameras. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1701– 1710, 2020

  3. [3]

    A 240×180 130 db 3µs latency global shutter spatiotemporal vision sensor.IEEE Journal of Solid-State Circuits, 49(10):2333–2341, 2014

    Christian Brandli, Raphael Berner, Minhao Yang, Shih-Chii Liu, and Tobi Delbruck. A 240×180 130 db 3µs latency global shutter spatiotemporal vision sensor.IEEE Journal of Solid-State Circuits, 49(10):2333–2341, 2014

  4. [4]

    Sportsslomo: A new bench- mark and baselines for human-centric video frame interpola- tion

    Jiaben Chen and Huaizu Jiang. Sportsslomo: A new bench- mark and baselines for human-centric video frame interpola- tion. InProceedings of the IEEE/CVF Conference on Com- puter Vision and Pattern Recognition, pages 6475–6486, 2024

  5. [5]

    Hinet: Half instance normalization network for image restoration

    Liangyu Chen, Xin Lu, Jie Zhang, Xiaojie Chu, and Cheng- peng Chen. Hinet: Half instance normalization network for image restoration. InProceedings of the IEEE/CVF confer- ence on computer vision and pattern recognition, pages 182– 192, 2021

  6. [6]

    Evaluation of motion blur image quality in video frame interpolation.Electronic Imaging, 35:262–1, 2023

    Hai Dinh, Qinyi Wang, Fangwen Tu, Brett Frymire, and Bo Mu. Evaluation of motion blur image quality in video frame interpolation.Electronic Imaging, 35:262–1, 2023

  7. [7]

    Eventaid: Bench- marking event-aided image/video enhancement algorithms with real-captured hybrid dataset.IEEE Transactions on Pat- tern Analysis and Machine Intelligence, 2025

    Peiqi Duan, Boyu Li, Yixin Yang, Hanyue Lou, Minggui Teng, Xinyu Zhou, Yi Ma, and Boxin Shi. Eventaid: Bench- marking event-aided image/video enhancement algorithms with real-captured hybrid dataset.IEEE Transactions on Pat- tern Analysis and Machine Intelligence, 2025

  8. [8]

    Removing camera shake from a sin- gle photograph

    Rob Fergus, Barun Singh, Aaron Hertzmann, Sam T Roweis, and William T Freeman. Removing camera shake from a sin- gle photograph. InAcm Siggraph 2006 Papers, pages 787–

  9. [9]

    Event-based vision: A survey.IEEE transactions on pattern analysis and machine intelligence, 44(1):154–180, 2020

    Guillermo Gallego, Tobi Delbr ¨uck, Garrick Orchard, Chiara Bartolozzi, Brian Taba, Andrea Censi, Stefan Leutenegger, Andrew J Davison, J ¨org Conradt, Kostas Daniilidis, et al. Event-based vision: A survey.IEEE transactions on pattern analysis and machine intelligence, 44(1):154–180, 2020

  10. [10]

    Learning truncated causal history model for video restoration.Advances in Neural Information Processing Systems, 37:27584–27615, 2024

    Amirhosein Ghasemabadi, Muhammad K Janjua, Moham- mad Salameh, and Di Niu. Learning truncated causal history model for video restoration.Advances in Neural Information Processing Systems, 37:27584–27615, 2024

  11. [11]

    A 3-wafer- stacked hybrid 15mpixel cis+ 1 mpixel evs with 4.6 gevent/s readout, in-pixel tdc and on-chip isp and esp function

    Menghan Guo, Shoushun Chen, Zhe Gao, Wenlei Yang, Peter Bartkovjak, Qing Qin, Xiaoqin Hu, Dahei Zhou, Masayuki Uchiyama, Yoshiharu Kudo, et al. A 3-wafer- stacked hybrid 15mpixel cis+ 1 mpixel evs with 4.6 gevent/s readout, in-pixel tdc and on-chip isp and esp function. In 2023 IEEE International Solid-State Circuits Conference (ISSCC), pages 90–92. IEEE, 2023

  12. [12]

    Neuromorphic cam- era guided high dynamic range imaging

    Jin Han, Chu Zhou, Peiqi Duan, Yehui Tang, Chang Xu, Chao Xu, Tiejun Huang, and Boxin Shi. Neuromorphic cam- era guided high dynamic range imaging. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1730–1739, 2020

  13. [13]

    1000×faster camera and machine vision with ordinary devices.Engineering, 25:110–119, 2023

    Tiejun Huang, Yajing Zheng, Zhaofei Yu, Rui Chen, Yuan Li, Ruiqin Xiong, Lei Ma, Junwei Zhao, Siwei Dong, Lin Zhu, et al. 1000×faster camera and machine vision with ordinary devices.Engineering, 25:110–119, 2023

  14. [14]

    Learning event-based motion deblurring

    Zhe Jiang, Yu Zhang, Dongqing Zou, Jimmy Ren, Jiancheng Lv, and Yebin Liu. Learning event-based motion deblurring. InProceedings of the IEEE/CVF conference on computer vi- sion and pattern recognition, pages 3320–3329, 2020

  15. [15]

    Event-guided unified frame- work for low-light video enhancement, frame interpolation, and deblurring

    Taewoo Kim and Kuk-Jin Yoon. Event-guided unified frame- work for low-light video enhancement, frame interpolation, and deblurring. InProceedings of the IEEE/CVF Interna- tional Conference on Computer Vision, pages 8524–8534, 2025

  16. [16]

    Event-guided deblurring of unknown exposure time videos

    Taewoo Kim, Jeongmin Lee, Lin Wang, and Kuk-Jin Yoon. Event-guided deblurring of unknown exposure time videos. InEuropean conference on computer vision, pages 519–538. Springer, 2022

  17. [17]

    Cmta: Cross-modal temporal alignment for event-guided video de- blurring

    Taewoo Kim, Hoonhee Cho, and Kuk-Jin Yoon. Cmta: Cross-modal temporal alignment for event-guided video de- blurring. InEuropean Conference on Computer Vision, pages 1–19. Springer, 2024

  18. [18]

    Frequency- aware event-based video deblurring for real-world motion blur

    Taewoo Kim, Hoonhee Cho, and Kuk-Jin Yoon. Frequency- aware event-based video deblurring for real-world motion blur. InProceedings of the IEEE/CVF Conference on Com- puter Vision and Pattern Recognition, pages 24966–24976, 2024

  19. [19]

    Towards real-world event-guided low- light video enhancement and deblurring

    Taewoo Kim, Jaeseok Jeong, Hoonhee Cho, Yuhwan Jeong, and Kuk-Jin Yoon. Towards real-world event-guided low- light video enhancement and deblurring. InEuropean Con- ference on Computer Vision, pages 433–451. Springer, 2024

  20. [20]

    1.22µm 35.6 mpixel rgb hybrid event-based vision sensor with 4.88µm-pitch event pixels and up to 10k event frame rate by adaptive control on event sparsity

    Kazutoshi Kodama, Yusuke Sato, Yuhi Yorikado, Raphael Berner, Kyoji Mizoguchi, Takahiro Miyazaki, Masahiro Tsukamoto, Yoshihisa Matoba, Hirotaka Shinozaki, Atsumi Niwa, et al. 1.22µm 35.6 mpixel rgb hybrid event-based vision sensor with 4.88µm-pitch event pixels and up to 10k event frame rate by adaptive control on event sparsity. In2023 IEEE Internationa...

  21. [21]

    Deblurgan: Blind mo- tion deblurring using conditional adversarial networks

    Orest Kupyn, V olodymyr Budzan, Mykola Mykhailych, Dmytro Mishkin, and Ji ˇr´ı Matas. Deblurgan: Blind mo- tion deblurring using conditional adversarial networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 8183–8192, 2018

  22. [22]

    Efficient marginal likelihood optimization in blind de- convolution

    Anat Levin, Yair Weiss, Fredo Durand, and William T Free- man. Efficient marginal likelihood optimization in blind de- convolution. InCVPR 2011, pages 2657–2664. IEEE, 2011

  23. [23]

    A 128×128 120 db 15µs latency asynchronous temporal con- trast vision sensor.IEEE journal of solid-state circuits, 43 (2):566–576, 2008

    Patrick Lichtsteiner, Christoph Posch, and Tobi Delbruck. A 128×128 120 db 15µs latency asynchronous temporal con- trast vision sensor.IEEE journal of solid-state circuits, 43 (2):566–576, 2008

  24. [24]

    Decoupled weight de- cay regularization

    Ilya Loshchilov and Frank Hutter. Decoupled weight de- cay regularization. InInternational Conference on Learning Representations, 2019

  25. [25]

    Event-based camera refractory period characterization and initial clock drift evaluation

    Peter N McMahon-Crabtree, Lucas Kulesza, Brian J McReynolds, Daniel S O’Keefe, Anirvin Puttur, Diana Maestas, Christian P Morath, and Matthew G McHarg. Event-based camera refractory period characterization and initial clock drift evaluation. InUnconventional Imaging, Sensing, and Adaptive Optics 2023, pages 253–273. SPIE, 2023

  26. [26]

    Diffusion-based extreme high-speed 11 scenes reconstruction with the complementary vision sensor

    Yapeng Meng, Yihan Lin, Taoyi Wang, Yuguo Chen, Lijian Wang, and Rong Zhao. Diffusion-based extreme high-speed 11 scenes reconstruction with the complementary vision sensor. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 5701–5710, 2025

  27. [27]

    Are high-resolution event cameras really needed?

    Yapeng Meng, Taoyi Wang, and Yihan Lin. Technical report of a dmd-based characterization method for vision sensors. arXiv preprint arXiv:2203.14672, 2025

  28. [28]

    Deep multi-scale convolutional neural network for dynamic scene deblurring

    Seungjun Nah, Tae Hyun Kim, and Kyoung Mu Lee. Deep multi-scale convolutional neural network for dynamic scene deblurring. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 3883–3891, 2017

  29. [29]

    Re- current neural networks with intra-frame iterations for video deblurring

    Seungjun Nah, Sanghyun Son, and Kyoung Mu Lee. Re- current neural networks with intra-frame iterations for video deblurring. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 8102–8111, 2019

  30. [30]

    Deep discriminative spatial and temporal net- work for efficient video deblurring

    Jinshan Pan, Boming Xu, Jiangxin Dong, Jianjun Ge, and Jinhui Tang. Deep discriminative spatial and temporal net- work for efficient video deblurring. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 22191–22200, 2023

  31. [31]

    Bringing a blurry frame alive at high frame-rate with an event camera

    Liyuan Pan, Cedric Scheerlinck, Xin Yu, Richard Hartley, Miaomiao Liu, and Yuchao Dai. Bringing a blurry frame alive at high frame-rate with an event camera. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 6820–6829, 2019

  32. [32]

    Bringing events into video deblurring with non-consecutively blurry frames

    Wei Shang, Dongwei Ren, Dongqing Zou, Jimmy S Ren, Ping Luo, and Wangmeng Zuo. Bringing events into video deblurring with non-consecutively blurry frames. InPro- ceedings of the IEEE/CVF international conference on com- puter vision, pages 4531–4540, 2021

  33. [33]

    Reducing the sim-to-real gap for event cam- eras

    Timo Stoffregen, Cedric Scheerlinck, Davide Scaramuzza, Tom Drummond, Nick Barnes, Lindsay Kleeman, and Robert Mahony. Reducing the sim-to-real gap for event cam- eras. InEuropean Conference on Computer Vision, pages 534–549. Springer, 2020

  34. [34]

    Event-based fusion for motion deblurring with cross- modal attention

    Lei Sun, Christos Sakaridis, Jingyun Liang, Qi Jiang, Kailun Yang, Peng Sun, Yaozu Ye, Kaiwei Wang, and Luc Van Gool. Event-based fusion for motion deblurring with cross- modal attention. InEuropean conference on computer vision, pages 412–428. Springer, 2022

  35. [35]

    Motion aware event representation-driven image deblurring

    Zhijing Sun, Xueyang Fu, Longzhuo Huang, Aiping Liu, and Zheng-Jun Zha. Motion aware event representation-driven image deblurring. InEuropean Conference on Computer Vi- sion, pages 418–435. Springer, 2024

  36. [36]

    Scale-recurrent network for deep image deblurring

    Xin Tao, Hongyun Gao, Xiaoyong Shen, Jue Wang, and Ji- aya Jia. Scale-recurrent network for deep image deblurring. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 8174–8182, 2018

  37. [37]

    Banet: A blur-aware attention network for dynamic scene deblurring.IEEE Transactions on Image Processing, 31:6789–6799, 2022

    Fu-Jen Tsai, Yan-Tsung Peng, Chung-Chi Tsai, Yen-Yu Lin, and Chia-Wen Lin. Banet: A blur-aware attention network for dynamic scene deblurring.IEEE Transactions on Image Processing, 31:6789–6799, 2022

  38. [38]

    Image sensing with mul- tilayer nonlinear optical neural networks.Nature Photonics, 17(5):408–415, 2023

    Tianyu Wang, Mandar M Sohoni, Logan G Wright, Mar- tin M Stein, Shi-Yuan Ma, Tatsuhiro Onodera, Maxwell G Anderson, and Peter L McMahon. Image sensing with mul- tilayer nonlinear optical neural networks.Nature Photonics, 17(5):408–415, 2023

  39. [39]

    Motion deblur- ring with real events

    Fang Xu, Lei Yu, Bishan Wang, Wen Yang, Gui-Song Xia, Xu Jia, Zhendong Qiao, and Jianzhuang Liu. Motion deblur- ring with real events. InProceedings of the IEEE/CVF Inter- national Conference on Computer Vision, pages 2583–2592, 2021

  40. [40]

    Unnatural l0 sparse representation for natural image deblurring

    Li Xu, Shicheng Zheng, and Jiaya Jia. Unnatural l0 sparse representation for natural image deblurring. InProceed- ings of the IEEE conference on computer vision and pattern recognition, pages 1107–1114, 2013

  41. [41]

    Motion deblurring via spatial-temporal collaboration of frames and events

    Wen Yang, Jinjian Wu, Jupo Ma, Leida Li, and Guangming Shi. Motion deblurring via spatial-temporal collaboration of frames and events. InProceedings of the AAAI Conference on Artificial Intelligence, pages 6531–6539, 2024

  42. [42]

    A vision chip with complementary pathways for open-world sensing.Nature, 629(8014):1027– 1033, 2024

    Zheyu Yang, Taoyi Wang, Yihan Lin, Yuguo Chen, Hui Zeng, Jing Pei, Jiazheng Wang, Xue Liu, Yichun Zhou, Jianqiang Zhang, et al. A vision chip with complementary pathways for open-world sensing.Nature, 629(8014):1027– 1033, 2024

  43. [43]

    Learning scale-aware spatio-temporal implicit representation for event-based motion deblurring

    Wei Yu, Jianing Li, Shengping Zhang, and Xiangyang Ji. Learning scale-aware spatio-temporal implicit representation for event-based motion deblurring. InForty-first Interna- tional Conference on Machine Learning, 2024

  44. [44]

    Multi-stage progressive image restoration

    Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, and Ling Shao. Multi-stage progressive image restoration. InPro- ceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 14821–14831, 2021

  45. [45]

    Restormer: Efficient transformer for high-resolution image restoration

    Syed Waqas Zamir, Aditya Arora, Salman Khan, Mu- nawar Hayat, Fahad Shahbaz Khan, and Ming-Hsuan Yang. Restormer: Efficient transformer for high-resolution image restoration. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5728–5739, 2022

  46. [46]

    Deep image deblurring: A survey.International Journal of Com- puter Vision, 130(9):2103–2130, 2022

    Kaihao Zhang, Wenqi Ren, Wenhan Luo, Wei-Sheng Lai, Bj¨orn Stenger, Ming-Hsuan Yang, and Hongdong Li. Deep image deblurring: A survey.International Journal of Com- puter Vision, 130(9):2103–2130, 2022

  47. [47]

    Generalizing event-based motion deblurring in real-world scenarios

    Xiang Zhang, Lei Yu, Wen Yang, Jianzhuang Liu, and Gui- Song Xia. Generalizing event-based motion deblurring in real-world scenarios. InProceedings of the IEEE/CVF In- ternational Conference on Computer Vision, pages 10734– 10744, 2023

  48. [48]

    Deep recurrent neural net- work with multi-scale bi-directional propagation for video deblurring

    Chao Zhu, Hang Dong, Jinshan Pan, Boyang Liang, Yuhao Huang, Lean Fu, and Fei Wang. Deep recurrent neural net- work with multi-scale bi-directional propagation for video deblurring. InProceedings of the AAAI conference on artifi- cial intelligence, pages 3598–3607, 2022

  49. [49]

    Sepa- ration for better integration: Disentangling edge and motion in event-based deblurring

    Yufei Zhu, Hao Chen, Yongjian Deng, and Wei You. Sepa- ration for better integration: Disentangling edge and motion in event-based deblurring. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 14732– 14742, 2025. 12