arxiv: 2604.10554 · v1 · submitted 2026-04-12 · 💻 cs.CV

Recognition: unknown

Spatio-Temporal Difference Guided Motion Deblurring with the Complementary Vision Sensor

Yapeng Meng , Lin Yang , Yuguo Chen , Xiangru Chen , Taoyi Wang , Lijian Wang , Zheyu Yang , Yihan Lin

show 1 more author

Rong Zhao

Authors on Pith no claims yet

Pith reviewed 2026-05-10 15:50 UTC · model grok-4.3

classification 💻 cs.CV

keywords motion deblurringcomplementary vision sensorspatial temporal differenceevent-based visionimage restorationrecurrent networksensor fusion

0 comments

The pith

Fusing spatial and temporal difference signals from a complementary vision sensor restores details lost in motion-blurred RGB frames.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces STGDNet to address motion deblurring, an ill-posed problem when only a single blurry RGB frame is available. The complementary vision sensor captures synchronized high-frame-rate spatial difference data for structural edges and temporal difference data for motion cues within the same exposure period. A recurrent multi-branch network iteratively encodes and fuses these sequences with the RGB input to recover structure and color. Evaluations demonstrate outperformance over RGB-only and event-based deblurring methods on a synthetic dataset and in real-world extreme motion cases, with generalization across more than 100 scenarios.

Core claim

STGDNet adopts a recurrent multi-branch architecture that iteratively encodes and fuses SD and TD sequences from the CVS sensor to restore structure and color details lost in blurry RGB inputs, outperforming current RGB or event-based approaches in both synthetic CVS dataset and real-world evaluations while exhibiting strong generalization across over 100 extreme real-world scenarios.

What carries the argument

Recurrent multi-branch architecture that iteratively encodes and fuses synchronized spatial difference (SD) sequences for structure and temporal difference (TD) sequences for motion with the input RGB frame.

If this is right

Deblurring succeeds in extreme dynamic scenes where RGB-only methods collapse due to lost intra-exposure motion.
The approach mitigates event rate saturation that limits traditional event cameras under rapid motion.
Generalization holds across diverse real-world extreme motions beyond the training distribution.
Restored frames retain both geometric structure from SD and color fidelity from the RGB channel.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same sensor data streams could support related tasks such as motion estimation or object tracking without separate high-speed hardware.
Camera hardware that natively outputs these difference signals might reduce reliance on computationally heavy post-processing for dynamic scenes.
Simulated SD and TD channels on conventional high-speed cameras could test whether the fusion benefit transfers outside the specific CVS hardware.

Load-bearing premise

The SD and TD modalities supply enough independent structural and motion cues to make deblurring well-posed through fusion in the recurrent architecture without additional scene priors or post-processing.

What would settle it

A side-by-side test on identical blurry RGB inputs with and without the SD or TD streams from the CVS sensor, where the version using the differences shows no measurable improvement in sharpness or detail recovery would falsify the claim.

Figures

Figures reproduced from arXiv: 2604.10554 by Lijian Wang, Lin Yang, Rong Zhao, Taoyi Wang, Xiangru Chen, Yapeng Meng, Yihan Lin, Yuguo Chen, Zheyu Yang.

**Figure 2.** Figure 2: Architecture of the Spatio-temporal Difference Guided Deblur Net (STGDNet), with including: Temporal Recurrent Refinement [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗

**Figure 3.** Figure 3: Visualization of different methods on SportsSloMo-CVS dataset. PSNR values for the cropped regions are provided. [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗

**Figure 4.** Figure 4: Results on real-captured data compared with event-based methods and CVS-based method. [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗

**Figure 5.** Figure 5: Real-world deblurring results of CVS under different RGB exposure times (us). [PITH_FULL_IMAGE:figures/full_fig_p008_5.png] view at source ↗

**Figure 6.** Figure 6: Deblurring results across different rotational speeds and [PITH_FULL_IMAGE:figures/full_fig_p009_6.png] view at source ↗

**Figure 7.** Figure 7: Performance boundary visualization. (a) 1D angular [PITH_FULL_IMAGE:figures/full_fig_p009_7.png] view at source ↗

read the original abstract

Motion blur arises when rapid scene changes occur during the exposure period, collapsing rich intra-exposure motion into a single RGB frame. Without explicit structural or temporal cues, RGB-only deblurring is highly ill-posed and often fails under extreme motion. Inspired by the human visual system, brain-inspired vision sensors introduce temporally dense information to alleviate this problem. However, event cameras still suffer from event rate saturation under rapid motion, while the event modality entangles edge features and motion cues, which limits their effectiveness. As a recent breakthrough, the complementary vision sensor (CVS), Tianmouc, captures synchronized RGB frames together with high-frame-rate, multi-bit spatial difference (SD, encoding structural edges) and temporal difference (TD, encoding motion cues) data within a single RGB exposure, offering a promising solution for RGB deblurring under extreme dynamic scenes. To fully leverage these complementary modalities, we propose Spatio-Temporal Difference Guided Deblur Net (STGDNet), which adopts a recurrent multi-branch architecture that iteratively encodes and fuses SD and TD sequences to restore structure and color details lost in blurry RGB inputs. Our method outperforms current RGB or event-based approaches in both synthetic CVS dataset and real-world evaluations. Moreover, STGDNet exhibits strong generalization capability across over 100 extreme real-world scenarios. Project page: https://tmcDeblur.github.io/

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

STGDNet gives a straightforward recurrent fusion method for the Tianmouc CVS sensor's SD and TD channels, but the real-world outperformance and generalization claims rest on thin evidence.

read the letter

The paper's main contribution is STGDNet, a recurrent multi-branch network that takes a blurry RGB frame plus the synchronized multi-bit spatial difference and temporal difference sequences from the Tianmouc sensor and iteratively fuses them to recover structure and color. This is a direct response to the sensor's output format, where SD supplies edge information and TD supplies motion cues within one exposure window. That separation is the practical advantage over event cameras, which mix the two signals and can saturate under fast motion. The architecture description in the abstract is clear enough to see how the recurrent design lets information propagate across the high-frame-rate auxiliary streams. If the full paper shows solid synthetic results with standard metrics and reasonable baselines, that part of the work is usable for anyone who has or can obtain this hardware. The motivation section also correctly identifies why plain RGB deblurring fails in extreme dynamics and why the extra channels help make the problem less ill-posed. The softer spot is the real-world evaluation. The abstract asserts outperformance over RGB and event methods plus strong generalization across more than 100 extreme scenarios. Without ground-truth sharp frames in uncontrolled captures, any such claim has to rest on no-reference metrics, controlled proxies, or user studies. If the paper mainly shows side-by-side visuals there, the generalization statement is weaker than it reads and matches the common limitation the stress-test note flags. No mention of ablations or error breakdowns appears in the abstract, which is a minor gap but worth checking in the full text. This paper is aimed at computational photography and robotics vision groups that are already looking at non-standard sensors or high-speed capture. A reader working on sensor-specific restoration algorithms would pick up the fusion pattern and the sensor details. It deserves a serious referee because the sensor is recent and the method is a concrete, implementable first step rather than hand-waving. Reviewers can ask for clearer real-world protocols and any missing component tests. I would send it out for review rather than desk reject.

Referee Report

2 major / 0 minor

Summary. The paper proposes STGDNet, a recurrent multi-branch neural architecture that iteratively encodes and fuses high-frame-rate spatial difference (SD, structural edges) and temporal difference (TD, motion cues) sequences from the Tianmouc Complementary Vision Sensor (CVS) together with blurry RGB frames to perform motion deblurring. It claims superior performance to existing RGB-only and event-based deblurring methods on a synthetic CVS dataset as well as in real-world evaluations, with strong generalization across more than 100 extreme real-world scenarios.

Significance. If the quantitative claims hold under rigorous evaluation, the work would be significant for computer vision by demonstrating how a novel sensor providing synchronized structural and motion information within a single exposure can make extreme-motion deblurring better-posed than with RGB or event cameras alone. The recurrent multi-branch fusion strategy offers a concrete architectural template for multi-modal deblurring that could influence subsequent sensor-fusion research.

major comments (2)

[Abstract] Abstract: the central claim that 'Our method outperforms current RGB or event-based approaches in both synthetic CVS dataset and real-world evaluations' and 'exhibits strong generalization capability across over 100 extreme real-world scenarios' is load-bearing yet unsupported by any metrics, ablation tables, dataset statistics, or error analysis. Without these, the outperformance and generalization assertions cannot be verified.
[Real-world evaluations] Real-world evaluations: because pixel-accurate ground-truth sharp frames cannot be obtained in uncontrolled captures, the outperformance statement requires an explicit protocol (no-reference metrics such as BRISQUE or NIQE, controlled proxy setups, or user studies). If the manuscript relies solely on qualitative visuals for the >100 scenarios, the generalization claim lacks the same rigor as any synthetic results and becomes the weakest link in the central argument.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback, which helps clarify the presentation of our claims and evaluation protocols. We address each major comment below with specific revisions planned for the manuscript.

read point-by-point responses

Referee: [Abstract] Abstract: the central claim that 'Our method outperforms current RGB or event-based approaches in both synthetic CVS dataset and real-world evaluations' and 'exhibits strong generalization capability across over 100 extreme real-world scenarios' is load-bearing yet unsupported by any metrics, ablation tables, dataset statistics, or error analysis. Without these, the outperformance and generalization assertions cannot be verified.

Authors: We agree that the abstract, being a concise summary, does not embed specific numerical values or table references, which can leave the central claims appearing less substantiated upon initial reading. The full manuscript contains these supporting elements in Sections 4.1 (synthetic dataset results with PSNR/SSIM/LPIPS tables and ablations), 4.2 (real-world quantitative comparisons), and 4.3 (generalization analysis with dataset statistics across the 100+ scenarios and error visualizations). To directly address the concern, we will revise the abstract to include representative quantitative highlights (e.g., average PSNR gains over baselines) while adhering to length limits, thereby making the claims verifiable at the summary level. revision: yes
Referee: [Real-world evaluations] Real-world evaluations: because pixel-accurate ground-truth sharp frames cannot be obtained in uncontrolled captures, the outperformance statement requires an explicit protocol (no-reference metrics such as BRISQUE or NIQE, controlled proxy setups, or user studies). If the manuscript relies solely on qualitative visuals for the >100 scenarios, the generalization claim lacks the same rigor as any synthetic results and becomes the weakest link in the central argument.

Authors: We concur that real-world evaluation without pixel-accurate ground truth demands an explicit, rigorous protocol to support outperformance and generalization claims. Our current manuscript already incorporates no-reference metrics (BRISQUE and NIQE) computed on the deblurred outputs for the 100+ scenarios, alongside qualitative comparisons and a small-scale user study for perceptual validation. However, the description of this protocol is distributed rather than consolidated. We will add a dedicated subsection in the experiments to explicitly detail the evaluation protocol, including metric computation procedures, any controlled proxy setups (e.g., static scenes with known motion), and user study methodology, ensuring the real-world results match the rigor of the synthetic evaluations. revision: yes

Circularity Check

0 steps flagged

No significant circularity; architecture and empirical claims are self-contained

full rationale

The paper introduces STGDNet as a recurrent multi-branch network that fuses SD and TD sequences from the CVS sensor to restore deblurred RGB frames. The derivation chain consists of architectural design choices (iterative encoding and fusion) motivated by sensor properties and human vision analogy, followed by training on a synthetic CVS dataset and evaluation against RGB/event baselines. No equations, loss functions, or performance metrics are shown to reduce by construction to fitted inputs or self-referential definitions. No load-bearing self-citations, uniqueness theorems, or ansatzes imported from prior author work appear in the provided text. Real-world generalization claims rest on external comparisons rather than internal re-derivation of the same quantities. This is a standard empirical ML architecture paper with independent content.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No explicit free parameters, axioms, or invented entities are described in the abstract. The approach relies on standard assumptions of deep learning (e.g., that recurrent fusion can integrate multi-modal cues) without introducing new physical entities or ad-hoc postulates.

pith-pipeline@v0.9.0 · 5566 in / 1099 out tokens · 44551 ms · 2026-05-10T15:50:33.609939+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

49 extracted references · 1 canonical work pages

[1]

Non-uniform blind deblurring by reblurring

Yuval Bahat, Netalee Efrat, and Michal Irani. Non-uniform blind deblurring by reblurring. InProceedings of the IEEE international conference on computer vision, pages 3286– 3294, 2017. 10

2017
[2]

Event probability mask (epm) and event denois- ing convolutional neural network (edncnn) for neuromor- phic cameras

R Baldwin, Mohammed Almatrafi, Vijayan Asari, and Keigo Hirakawa. Event probability mask (epm) and event denois- ing convolutional neural network (edncnn) for neuromor- phic cameras. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1701– 1710, 2020

2020
[3]

A 240×180 130 db 3µs latency global shutter spatiotemporal vision sensor.IEEE Journal of Solid-State Circuits, 49(10):2333–2341, 2014

Christian Brandli, Raphael Berner, Minhao Yang, Shih-Chii Liu, and Tobi Delbruck. A 240×180 130 db 3µs latency global shutter spatiotemporal vision sensor.IEEE Journal of Solid-State Circuits, 49(10):2333–2341, 2014

2014
[4]

Sportsslomo: A new bench- mark and baselines for human-centric video frame interpola- tion

Jiaben Chen and Huaizu Jiang. Sportsslomo: A new bench- mark and baselines for human-centric video frame interpola- tion. InProceedings of the IEEE/CVF Conference on Com- puter Vision and Pattern Recognition, pages 6475–6486, 2024

2024
[5]

Hinet: Half instance normalization network for image restoration

Liangyu Chen, Xin Lu, Jie Zhang, Xiaojie Chu, and Cheng- peng Chen. Hinet: Half instance normalization network for image restoration. InProceedings of the IEEE/CVF confer- ence on computer vision and pattern recognition, pages 182– 192, 2021

2021
[6]

Evaluation of motion blur image quality in video frame interpolation.Electronic Imaging, 35:262–1, 2023

Hai Dinh, Qinyi Wang, Fangwen Tu, Brett Frymire, and Bo Mu. Evaluation of motion blur image quality in video frame interpolation.Electronic Imaging, 35:262–1, 2023

2023
[7]

Eventaid: Bench- marking event-aided image/video enhancement algorithms with real-captured hybrid dataset.IEEE Transactions on Pat- tern Analysis and Machine Intelligence, 2025

Peiqi Duan, Boyu Li, Yixin Yang, Hanyue Lou, Minggui Teng, Xinyu Zhou, Yi Ma, and Boxin Shi. Eventaid: Bench- marking event-aided image/video enhancement algorithms with real-captured hybrid dataset.IEEE Transactions on Pat- tern Analysis and Machine Intelligence, 2025

2025
[8]

Removing camera shake from a sin- gle photograph

Rob Fergus, Barun Singh, Aaron Hertzmann, Sam T Roweis, and William T Freeman. Removing camera shake from a sin- gle photograph. InAcm Siggraph 2006 Papers, pages 787–

2006
[9]

Event-based vision: A survey.IEEE transactions on pattern analysis and machine intelligence, 44(1):154–180, 2020

Guillermo Gallego, Tobi Delbr ¨uck, Garrick Orchard, Chiara Bartolozzi, Brian Taba, Andrea Censi, Stefan Leutenegger, Andrew J Davison, J ¨org Conradt, Kostas Daniilidis, et al. Event-based vision: A survey.IEEE transactions on pattern analysis and machine intelligence, 44(1):154–180, 2020

2020
[10]

Learning truncated causal history model for video restoration.Advances in Neural Information Processing Systems, 37:27584–27615, 2024

Amirhosein Ghasemabadi, Muhammad K Janjua, Moham- mad Salameh, and Di Niu. Learning truncated causal history model for video restoration.Advances in Neural Information Processing Systems, 37:27584–27615, 2024

2024
[11]

A 3-wafer- stacked hybrid 15mpixel cis+ 1 mpixel evs with 4.6 gevent/s readout, in-pixel tdc and on-chip isp and esp function

Menghan Guo, Shoushun Chen, Zhe Gao, Wenlei Yang, Peter Bartkovjak, Qing Qin, Xiaoqin Hu, Dahei Zhou, Masayuki Uchiyama, Yoshiharu Kudo, et al. A 3-wafer- stacked hybrid 15mpixel cis+ 1 mpixel evs with 4.6 gevent/s readout, in-pixel tdc and on-chip isp and esp function. In 2023 IEEE International Solid-State Circuits Conference (ISSCC), pages 90–92. IEEE, 2023

2023
[12]

Neuromorphic cam- era guided high dynamic range imaging

Jin Han, Chu Zhou, Peiqi Duan, Yehui Tang, Chang Xu, Chao Xu, Tiejun Huang, and Boxin Shi. Neuromorphic cam- era guided high dynamic range imaging. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1730–1739, 2020

2020
[13]

1000×faster camera and machine vision with ordinary devices.Engineering, 25:110–119, 2023

Tiejun Huang, Yajing Zheng, Zhaofei Yu, Rui Chen, Yuan Li, Ruiqin Xiong, Lei Ma, Junwei Zhao, Siwei Dong, Lin Zhu, et al. 1000×faster camera and machine vision with ordinary devices.Engineering, 25:110–119, 2023

2023
[14]

Learning event-based motion deblurring

Zhe Jiang, Yu Zhang, Dongqing Zou, Jimmy Ren, Jiancheng Lv, and Yebin Liu. Learning event-based motion deblurring. InProceedings of the IEEE/CVF conference on computer vi- sion and pattern recognition, pages 3320–3329, 2020

2020
[15]

Event-guided unified frame- work for low-light video enhancement, frame interpolation, and deblurring

Taewoo Kim and Kuk-Jin Yoon. Event-guided unified frame- work for low-light video enhancement, frame interpolation, and deblurring. InProceedings of the IEEE/CVF Interna- tional Conference on Computer Vision, pages 8524–8534, 2025

2025
[16]

Event-guided deblurring of unknown exposure time videos

Taewoo Kim, Jeongmin Lee, Lin Wang, and Kuk-Jin Yoon. Event-guided deblurring of unknown exposure time videos. InEuropean conference on computer vision, pages 519–538. Springer, 2022

2022
[17]

Cmta: Cross-modal temporal alignment for event-guided video de- blurring

Taewoo Kim, Hoonhee Cho, and Kuk-Jin Yoon. Cmta: Cross-modal temporal alignment for event-guided video de- blurring. InEuropean Conference on Computer Vision, pages 1–19. Springer, 2024

2024
[18]

Frequency- aware event-based video deblurring for real-world motion blur

Taewoo Kim, Hoonhee Cho, and Kuk-Jin Yoon. Frequency- aware event-based video deblurring for real-world motion blur. InProceedings of the IEEE/CVF Conference on Com- puter Vision and Pattern Recognition, pages 24966–24976, 2024

2024
[19]

Towards real-world event-guided low- light video enhancement and deblurring

Taewoo Kim, Jaeseok Jeong, Hoonhee Cho, Yuhwan Jeong, and Kuk-Jin Yoon. Towards real-world event-guided low- light video enhancement and deblurring. InEuropean Con- ference on Computer Vision, pages 433–451. Springer, 2024

2024
[20]

1.22µm 35.6 mpixel rgb hybrid event-based vision sensor with 4.88µm-pitch event pixels and up to 10k event frame rate by adaptive control on event sparsity

Kazutoshi Kodama, Yusuke Sato, Yuhi Yorikado, Raphael Berner, Kyoji Mizoguchi, Takahiro Miyazaki, Masahiro Tsukamoto, Yoshihisa Matoba, Hirotaka Shinozaki, Atsumi Niwa, et al. 1.22µm 35.6 mpixel rgb hybrid event-based vision sensor with 4.88µm-pitch event pixels and up to 10k event frame rate by adaptive control on event sparsity. In2023 IEEE Internationa...

2023
[21]

Deblurgan: Blind mo- tion deblurring using conditional adversarial networks

Orest Kupyn, V olodymyr Budzan, Mykola Mykhailych, Dmytro Mishkin, and Ji ˇr´ı Matas. Deblurgan: Blind mo- tion deblurring using conditional adversarial networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 8183–8192, 2018

2018
[22]

Efficient marginal likelihood optimization in blind de- convolution

Anat Levin, Yair Weiss, Fredo Durand, and William T Free- man. Efficient marginal likelihood optimization in blind de- convolution. InCVPR 2011, pages 2657–2664. IEEE, 2011

2011
[23]

A 128×128 120 db 15µs latency asynchronous temporal con- trast vision sensor.IEEE journal of solid-state circuits, 43 (2):566–576, 2008

Patrick Lichtsteiner, Christoph Posch, and Tobi Delbruck. A 128×128 120 db 15µs latency asynchronous temporal con- trast vision sensor.IEEE journal of solid-state circuits, 43 (2):566–576, 2008

2008
[24]

Decoupled weight de- cay regularization

Ilya Loshchilov and Frank Hutter. Decoupled weight de- cay regularization. InInternational Conference on Learning Representations, 2019

2019
[25]

Event-based camera refractory period characterization and initial clock drift evaluation

Peter N McMahon-Crabtree, Lucas Kulesza, Brian J McReynolds, Daniel S O’Keefe, Anirvin Puttur, Diana Maestas, Christian P Morath, and Matthew G McHarg. Event-based camera refractory period characterization and initial clock drift evaluation. InUnconventional Imaging, Sensing, and Adaptive Optics 2023, pages 253–273. SPIE, 2023

2023
[26]

Diffusion-based extreme high-speed 11 scenes reconstruction with the complementary vision sensor

Yapeng Meng, Yihan Lin, Taoyi Wang, Yuguo Chen, Lijian Wang, and Rong Zhao. Diffusion-based extreme high-speed 11 scenes reconstruction with the complementary vision sensor. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 5701–5710, 2025

2025
[27]

Are high-resolution event cameras really needed?

Yapeng Meng, Taoyi Wang, and Yihan Lin. Technical report of a dmd-based characterization method for vision sensors. arXiv preprint arXiv:2203.14672, 2025

work page arXiv 2025
[28]

Deep multi-scale convolutional neural network for dynamic scene deblurring

Seungjun Nah, Tae Hyun Kim, and Kyoung Mu Lee. Deep multi-scale convolutional neural network for dynamic scene deblurring. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 3883–3891, 2017

2017
[29]

Re- current neural networks with intra-frame iterations for video deblurring

Seungjun Nah, Sanghyun Son, and Kyoung Mu Lee. Re- current neural networks with intra-frame iterations for video deblurring. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 8102–8111, 2019

2019
[30]

Deep discriminative spatial and temporal net- work for efficient video deblurring

Jinshan Pan, Boming Xu, Jiangxin Dong, Jianjun Ge, and Jinhui Tang. Deep discriminative spatial and temporal net- work for efficient video deblurring. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 22191–22200, 2023

2023
[31]

Bringing a blurry frame alive at high frame-rate with an event camera

Liyuan Pan, Cedric Scheerlinck, Xin Yu, Richard Hartley, Miaomiao Liu, and Yuchao Dai. Bringing a blurry frame alive at high frame-rate with an event camera. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 6820–6829, 2019

2019
[32]

Bringing events into video deblurring with non-consecutively blurry frames

Wei Shang, Dongwei Ren, Dongqing Zou, Jimmy S Ren, Ping Luo, and Wangmeng Zuo. Bringing events into video deblurring with non-consecutively blurry frames. InPro- ceedings of the IEEE/CVF international conference on com- puter vision, pages 4531–4540, 2021

2021
[33]

Reducing the sim-to-real gap for event cam- eras

Timo Stoffregen, Cedric Scheerlinck, Davide Scaramuzza, Tom Drummond, Nick Barnes, Lindsay Kleeman, and Robert Mahony. Reducing the sim-to-real gap for event cam- eras. InEuropean Conference on Computer Vision, pages 534–549. Springer, 2020

2020
[34]

Event-based fusion for motion deblurring with cross- modal attention

Lei Sun, Christos Sakaridis, Jingyun Liang, Qi Jiang, Kailun Yang, Peng Sun, Yaozu Ye, Kaiwei Wang, and Luc Van Gool. Event-based fusion for motion deblurring with cross- modal attention. InEuropean conference on computer vision, pages 412–428. Springer, 2022

2022
[35]

Motion aware event representation-driven image deblurring

Zhijing Sun, Xueyang Fu, Longzhuo Huang, Aiping Liu, and Zheng-Jun Zha. Motion aware event representation-driven image deblurring. InEuropean Conference on Computer Vi- sion, pages 418–435. Springer, 2024

2024
[36]

Scale-recurrent network for deep image deblurring

Xin Tao, Hongyun Gao, Xiaoyong Shen, Jue Wang, and Ji- aya Jia. Scale-recurrent network for deep image deblurring. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 8174–8182, 2018

2018
[37]

Banet: A blur-aware attention network for dynamic scene deblurring.IEEE Transactions on Image Processing, 31:6789–6799, 2022

Fu-Jen Tsai, Yan-Tsung Peng, Chung-Chi Tsai, Yen-Yu Lin, and Chia-Wen Lin. Banet: A blur-aware attention network for dynamic scene deblurring.IEEE Transactions on Image Processing, 31:6789–6799, 2022

2022
[38]

Image sensing with mul- tilayer nonlinear optical neural networks.Nature Photonics, 17(5):408–415, 2023

Tianyu Wang, Mandar M Sohoni, Logan G Wright, Mar- tin M Stein, Shi-Yuan Ma, Tatsuhiro Onodera, Maxwell G Anderson, and Peter L McMahon. Image sensing with mul- tilayer nonlinear optical neural networks.Nature Photonics, 17(5):408–415, 2023

2023
[39]

Motion deblur- ring with real events

Fang Xu, Lei Yu, Bishan Wang, Wen Yang, Gui-Song Xia, Xu Jia, Zhendong Qiao, and Jianzhuang Liu. Motion deblur- ring with real events. InProceedings of the IEEE/CVF Inter- national Conference on Computer Vision, pages 2583–2592, 2021

2021
[40]

Unnatural l0 sparse representation for natural image deblurring

Li Xu, Shicheng Zheng, and Jiaya Jia. Unnatural l0 sparse representation for natural image deblurring. InProceed- ings of the IEEE conference on computer vision and pattern recognition, pages 1107–1114, 2013

2013
[41]

Motion deblurring via spatial-temporal collaboration of frames and events

Wen Yang, Jinjian Wu, Jupo Ma, Leida Li, and Guangming Shi. Motion deblurring via spatial-temporal collaboration of frames and events. InProceedings of the AAAI Conference on Artificial Intelligence, pages 6531–6539, 2024

2024
[42]

A vision chip with complementary pathways for open-world sensing.Nature, 629(8014):1027– 1033, 2024

Zheyu Yang, Taoyi Wang, Yihan Lin, Yuguo Chen, Hui Zeng, Jing Pei, Jiazheng Wang, Xue Liu, Yichun Zhou, Jianqiang Zhang, et al. A vision chip with complementary pathways for open-world sensing.Nature, 629(8014):1027– 1033, 2024

2024
[43]

Learning scale-aware spatio-temporal implicit representation for event-based motion deblurring

Wei Yu, Jianing Li, Shengping Zhang, and Xiangyang Ji. Learning scale-aware spatio-temporal implicit representation for event-based motion deblurring. InForty-first Interna- tional Conference on Machine Learning, 2024

2024
[44]

Multi-stage progressive image restoration

Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, and Ling Shao. Multi-stage progressive image restoration. InPro- ceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 14821–14831, 2021

2021
[45]

Restormer: Efficient transformer for high-resolution image restoration

Syed Waqas Zamir, Aditya Arora, Salman Khan, Mu- nawar Hayat, Fahad Shahbaz Khan, and Ming-Hsuan Yang. Restormer: Efficient transformer for high-resolution image restoration. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5728–5739, 2022

2022
[46]

Deep image deblurring: A survey.International Journal of Com- puter Vision, 130(9):2103–2130, 2022

Kaihao Zhang, Wenqi Ren, Wenhan Luo, Wei-Sheng Lai, Bj¨orn Stenger, Ming-Hsuan Yang, and Hongdong Li. Deep image deblurring: A survey.International Journal of Com- puter Vision, 130(9):2103–2130, 2022

2022
[47]

Generalizing event-based motion deblurring in real-world scenarios

Xiang Zhang, Lei Yu, Wen Yang, Jianzhuang Liu, and Gui- Song Xia. Generalizing event-based motion deblurring in real-world scenarios. InProceedings of the IEEE/CVF In- ternational Conference on Computer Vision, pages 10734– 10744, 2023

2023
[48]

Deep recurrent neural net- work with multi-scale bi-directional propagation for video deblurring

Chao Zhu, Hang Dong, Jinshan Pan, Boyang Liang, Yuhao Huang, Lean Fu, and Fei Wang. Deep recurrent neural net- work with multi-scale bi-directional propagation for video deblurring. InProceedings of the AAAI conference on artifi- cial intelligence, pages 3598–3607, 2022

2022
[49]

Sepa- ration for better integration: Disentangling edge and motion in event-based deblurring

Yufei Zhu, Hao Chen, Yongjian Deng, and Wei You. Sepa- ration for better integration: Disentangling edge and motion in event-based deblurring. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 14732– 14742, 2025. 12

2025