pith. machine review for the scientific record.

arxiv: 2602.19202 · v2 · submitted 2026-02-22 · 💻 cs.CV

Recognition: 2 theorem links · Lean Theorem

UniE2F: A Unified Diffusion Framework for Event-to-Frame Reconstruction with Video Foundation Models

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 20:33 UTC · model grok-4.3

classification 💻 cs.CV
keywords: event camera · video reconstruction · diffusion model · frame interpolation · zero-shot prediction · generative prior · event-to-frame

The pith

Pre-trained video diffusion models reconstruct high-fidelity frames from sparse event camera streams when guided by inter-frame residuals.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that event cameras lose absolute intensity and static texture because they record only changes, and that this loss can be recovered by conditioning a pre-trained video diffusion model on the event stream. It introduces an event-based inter-frame residual guidance term that exploits the physical correlation between events and frame differences to improve reconstruction accuracy. The same framework is then extended without retraining to video interpolation and prediction by modulating the diffusion sampling process. If these steps hold, event data can be turned into dense, high-quality video output that outperforms prior specialized methods on both real and synthetic benchmarks.

Core claim

A baseline that feeds event data directly as conditioning to a video diffusion model already produces usable frames; adding event-based inter-frame residual guidance, which injects the difference between consecutive reconstructed frames as an additional control signal, raises fidelity further. The same conditioned reverse process supports zero-shot frame interpolation and future-frame prediction simply by changing the number and timing of sampling steps.
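
As a concrete reading of that claim, the sketch below shows how such a sampler could be organized, in PyTorch-style Python; the function and argument names, the residual stand-in, and the linear re-noising rule are illustrative assumptions, not the paper's implementation.

import torch

def sample_frames(denoiser, event_cond, num_steps=50, guidance_scale=1.0):
    """Sketch of an event-conditioned reverse-diffusion sampler (hypothetical API).

    denoiser(x, t, cond) is assumed to return an estimate of the clean frames;
    event_cond holds encoded event voxels aligned to the requested timestamps,
    so reconstruction, interpolation, and future prediction differ only in
    which timestamps event_cond covers (the zero-shot claim above).
    """
    x = torch.randn_like(event_cond)  # every requested frame starts from noise
    for step in reversed(range(1, num_steps + 1)):
        t = torch.full((event_cond.shape[0],), step / num_steps)
        x0_hat = denoiser(x, t, event_cond)  # event-conditioned denoising estimate
        # Inter-frame residual guidance: pull the difference between consecutive
        # predicted frames toward the residual implied by the events.
        pred_residual = x0_hat[1:] - x0_hat[:-1]
        event_residual = event_cond[1:] - event_cond[:-1]  # stand-in for the event-derived residual
        corrected = x0_hat[1:] - guidance_scale * (pred_residual - event_residual)
        x0_hat = torch.cat([x0_hat[:1], corrected], dim=0)
        # Re-noise toward the next noise level; a real sampler would apply the
        # scheduler's update rule (e.g. DDIM or EDM) rather than this linear blend.
        sigma = (step - 1) / num_steps
        x = (1 - sigma) * x0_hat + sigma * torch.randn_like(x0_hat)
    return x

Under this reading, switching between reconstruction, interpolation, and prediction changes only which timestamps the conditioning tensor is built for, which is exactly what the core claim asserts.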

What carries the argument

Event-based inter-frame residual guidance: a conditioning signal derived from the physical difference between successive frames, injected into the diffusion reverse process alongside the event stream.
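
A hedged illustration of how that signal could be computed straight from the raw event stream, using the standard model in which each event marks one contrast-threshold step of log intensity; the function name, argument layout, and threshold value are illustrative, not taken from the paper.

import numpy as np

def event_interframe_residual(xs, ys, ts, ps, t0, t1, height, width, contrast=0.2):
    """Approximate log I(t1) - log I(t0) per pixel from events in (t0, t1].

    xs, ys: integer pixel coordinates; ts: timestamps; ps: polarities in {+1, -1}.
    contrast: assumed sensor contrast threshold C (illustrative value).
    """
    in_window = (ts > t0) & (ts <= t1)
    residual = np.zeros((height, width), dtype=np.float32)
    # Each event contributes one signed contrast step at its pixel.
    np.add.at(residual, (ys[in_window], xs[in_window]), contrast * ps[in_window])
    return residual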

If this is right

  • Event streams can be converted to dense video without task-specific training once a video diffusion prior is available.
  • The same model supports both reconstruction and temporal tasks (interpolation, prediction) by changing only the sampling schedule.
  • Quantitative gains appear on both synthetic and real event datasets, indicating the guidance term transfers across domains.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • If residual guidance proves robust, the method could reduce the need for paired event-frame training data in future event-vision pipelines.
  • Extending the same conditioning to longer sequences might allow consistent video generation from very sparse event input over many seconds.
  • The framework suggests that other sparse sensors (e.g., lidar point clouds) could be paired with video diffusion priors using analogous residual signals.

Load-bearing premise

The physical correlation between recorded events and actual frame intensity differences is strong enough that residual guidance computed from reconstructed frames remains accurate.
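
In the standard event-camera model this premise invokes (also summarized in the abstract and the circularity check), a pixel emits an event of polarity p whenever its log intensity changes by the contrast threshold C, so the polarity-weighted event sum over an interval approximates the inter-frame log-intensity residual; the notation below is generic rather than the paper's:

\log I(\mathbf{x}, t) - \log I(\mathbf{x}, t_{\mathrm{ref}}) = p\,C, \quad p \in \{+1, -1\},
\qquad \sum_{e_k \in (t,\, t + \Delta t]} p_k\, C \;\approx\; \log I_{t+\Delta t}(\mathbf{x}) - \log I_t(\mathbf{x}).

The premise is that this approximation stays tight enough, despite sensor noise and threshold mismatch, that a residual checked against reconstructed rather than ground-truth frames still steers the sampler in the right direction.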

What would settle it

A controlled ablation on a held-out real-world event dataset in which removing the residual guidance term produces a statistically significant drop in PSNR or perceptual metrics compared with the full method.
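
A sketch of the kind of check this describes, assuming per-sequence PSNR scores for the full method and the no-guidance ablation have already been computed on the held-out set; the helper name is hypothetical, and a Wilcoxon signed-rank test would be a reasonable non-parametric alternative to the paired t-test used here.

import numpy as np
from scipy import stats

def guidance_ablation_test(psnr_full, psnr_no_guidance, alpha=0.05):
    """Paired test over per-sequence PSNR: does removing residual guidance hurt?"""
    full = np.asarray(psnr_full, dtype=float)
    ablated = np.asarray(psnr_no_guidance, dtype=float)
    gain = full - ablated  # positive values mean guidance helps
    t_stat, p_value = stats.ttest_rel(full, ablated)
    return {
        "mean_gain_db": float(gain.mean()),
        "std_gain_db": float(gain.std(ddof=1)),
        "t_statistic": float(t_stat),
        "p_value": float(p_value),
        "significant_gain": bool(p_value < alpha and gain.mean() > 0),
    }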

read the original abstract

Event cameras excel at high-speed, low-power, and high-dynamic-range scene perception. However, as they fundamentally record only relative intensity changes rather than absolute intensity, the resulting data streams suffer from a significant loss of spatial information and static texture details. In this paper, we address this limitation by leveraging the generative prior of a pre-trained video diffusion model to reconstruct high-fidelity video frames from sparse event data. Specifically, we first establish a baseline model by directly applying event data as a condition to synthesize videos. Then, based on the physical correlation between the event stream and video frames, we further introduce the event-based inter-frame residual guidance to enhance the accuracy of video frame reconstruction. Furthermore, we extend our method to video frame interpolation and prediction in a zero-shot manner by modulating the reverse diffusion sampling process, thereby creating a unified event-to-frame reconstruction framework. Experimental results on real-world and synthetic datasets demonstrate that our method significantly outperforms previous approaches both quantitatively and qualitatively. We also refer the reviewers to the video demo contained in the supplementary material for video results. The code will be publicly available at https://github.com/CS-GangXu/UniE2F.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes UniE2F, a unified diffusion framework that reconstructs high-fidelity video frames from sparse event-camera data by conditioning a pre-trained video diffusion model on event streams. It first builds a baseline via direct event conditioning, then adds event-based inter-frame residual guidance derived from physical log-intensity change correlations, and extends the approach to zero-shot interpolation and prediction by modulating the reverse diffusion sampling process. Experiments on real-world and synthetic datasets are reported to show quantitative and qualitative gains over prior methods, with code promised to be released.

Significance. If the experimental claims hold under closer scrutiny, the work offers a practical way to leverage large-scale video generative priors for event-to-frame tasks, addressing the inherent loss of absolute intensity and texture in event data. The unified treatment of reconstruction, interpolation, and prediction within a single sampling framework is a clear strength, as is the explicit grounding in the standard event-camera model (events as log-intensity differences). Public code release would further support reproducibility.

major comments (2)
  1. [§4 Experiments] Quantitative results are presented without error bars, standard deviations across runs, or dataset statistics (e.g., event density distributions or frame counts). This omission makes it difficult to assess whether the reported gains over baselines are statistically reliable or sensitive to particular data splits.
  2. [§3.2 Residual Guidance and §4] The claim that the event-based inter-frame residual guidance produces accurate rather than merely plausible reconstructions rests on the physical correlation between events and intensity changes, yet no ablation isolates its contribution versus the baseline conditioning alone, nor are there direct comparisons against ground-truth intensity values beyond standard perceptual metrics.
minor comments (2)
  1. [Abstract] The statement that the method 'significantly outperforms previous approaches' would benefit from one or two concrete metric values (e.g., PSNR or LPIPS deltas) to give readers an immediate sense of scale.
  2. [Figure captions and §4] Several qualitative comparisons lack explicit indication of which rows/columns correspond to which method or dataset; adding this would improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive summary, the recommendation of minor revision, and the constructive comments. We address each major point below and will update the manuscript accordingly.

read point-by-point responses
  1. Referee: [§4 Experiments] Quantitative results are presented without error bars, standard deviations across runs, or dataset statistics (e.g., event density distributions or frame counts). This omission makes it difficult to assess whether the reported gains over baselines are statistically reliable or sensitive to particular data splits.

    Authors: We agree that error bars and additional dataset statistics would improve clarity. In the revised manuscript we will report standard deviations across runs with error bars in all quantitative tables and add dataset statistics including event density distributions and frame counts per sequence. revision: yes

  2. Referee: [§3.2 Residual Guidance and §4] The claim that the event-based inter-frame residual guidance produces accurate rather than merely plausible reconstructions rests on the physical correlation between events and intensity changes, yet no ablation isolates its contribution versus the baseline conditioning alone, nor are there direct comparisons against ground-truth intensity values beyond standard perceptual metrics.

    Authors: We agree that an explicit ablation would strengthen the evidence for the residual guidance. We will add an ablation study in Section 4 comparing the baseline event conditioning against the full model with inter-frame residual guidance. Note that our reported PSNR and SSIM metrics already constitute direct pixel-level comparisons to ground-truth intensity; we will clarify this point and retain the perceptual metrics as complementary. revision: yes
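
For context on that point, PSNR and SSIM are indeed pixel-level comparisons against ground truth; a minimal computation using scikit-image is sketched below, with the [0, 1] intensity range and trailing channel axis as assumptions rather than details from the paper.

import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def frame_metrics(ground_truth, reconstruction):
    """PSNR and SSIM for one reconstructed frame, assuming float images in [0, 1]."""
    gt = np.clip(ground_truth, 0.0, 1.0)
    rec = np.clip(reconstruction, 0.0, 1.0)
    psnr = peak_signal_noise_ratio(gt, rec, data_range=1.0)
    # channel_axis=-1 assumes color frames; drop it for grayscale inputs.
    ssim = structural_similarity(gt, rec, data_range=1.0, channel_axis=-1)
    return psnr, ssim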

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

full rationale

The paper conditions a pre-trained external video diffusion model on event data and augments it with inter-frame residual guidance derived from the standard event-camera physical model (events encode log-intensity differences). No equations reduce predictions to fitted parameters defined from the target data, no self-citation chains justify uniqueness or ansatzes, and no known results are merely renamed. The central claims rest on established diffusion conditioning and the well-known event-to-intensity mapping, making the derivation independent of its own outputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the generative capability of an external pre-trained video diffusion model and the existence of a usable physical correlation between event streams and absolute intensity frames; no free parameters are explicitly fitted in the abstract description.

axioms (1)
  • domain assumption: a physical correlation between the event stream and video frames exists and can be used as guidance
    Invoked to justify the inter-frame residual guidance step

pith-pipeline@v0.9.0 · 5508 in / 1246 out tokens · 33608 ms · 2026-05-15T20:33:10.646828+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

70 extracted references · 70 canonical work pages · 1 internal anchor
