Semantically Consistent Video Inpainting with Conditional Diffusion Models

Adam Scibior; Berend Zwartsenberg; Dylan Green; Frank Wood; Jonathan Lavington; Ke Zhang; Matthew Niedoba; Saeid Naderiparizi; Setareh Dabiri; Vasileios Lioutas

arxiv: 2405.00251 · v2 · pith:KYCWKNWKnew · submitted 2024-04-30 · 💻 cs.CV · cs.LG

Dylan Green, William Harvey, Saeid Naderiparizi, Matthew Niedoba, Yunpeng Liu, Xiaoxuan Liang, Jonathan Lavington, Ke Zhang, Vasileios Lioutas, Setareh Dabiri, Adam Scibior, Berend Zwartsenberg, Frank Wood

classification 💻 cs.CV cs.LG

keywords video, conditional, frames, inpainting, approaches, consistent, content, context

Current state-of-the-art methods for video inpainting typically rely on optical flow or attention-based approaches to inpaint masked regions by propagating visual information across frames. While such approaches have led to significant progress on standard benchmarks, they struggle with tasks that require the synthesis of novel content that is not present in other frames. In this paper, we reframe video inpainting as a conditional generative modeling problem and present a framework for solving such problems with conditional video diffusion models. We introduce inpainting-specific sampling schemes which capture crucial long-range dependencies in the context, and devise a novel method for conditioning on the known pixels in incomplete frames. We highlight the advantages of using a generative approach for this task, showing that our method is capable of generating diverse, high-quality inpaintings and synthesizing new content that is spatially, temporally, and semantically consistent with the provided context.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Controllable Video Object Insertion via Multiview Priors
cs.CV · 2026-04 · unverdicted · novelty 5.0

A multi-view prior-based framework for video object insertion that uses dual-path conditioning and an integration-aware consistency module to improve appearance stability and occlusion handling.