PROVE proposes RC metrics for perceptual removal coherence and releases PROVE-Bench to better align automatic scores with human judgments on object removal tasks.
arXiv preprint arXiv:2505.24873
7 Pith papers cite this work.
fields
cs.CV 7
representative citing papers
VideoCoF adds an explicit reasoning step using edit-region latents in video diffusion models to enable precise mask-free editing and motion alignment with only 50k training pairs.
SimInsert is a training-free video object insertion technique that decouples the task into single-frame editing and semantic motion description, using image-to-video diffusion models with non-invasive guidance to achieve spatio-temporal coherence.
CLEAR achieves end-to-end mask-free video subtitle removal via dual-encoder self-supervised orthogonality and LoRA-based generation feedback, delivering +6.77 dB PSNR gains and strong zero-shot multilingual performance.
Smart-Insertion-V is a dual-stream closed-loop framework with Dual-World-View RoPE and a Decoupled Guidance Module that inserts reference objects into videos while achieving stylistic harmony despite domain gaps.
The paper proposes SNIS and NGM to enable tuning-free instruction-based video editing, reporting improved visual quality and claiming state-of-the-art results.
citing papers explorer
- PROVE: A Perceptual RemOVal cohErence Benchmark for Visual Media
PROVE proposes RC metrics for perceptual removal coherence and releases PROVE-Bench to better align automatic scores with human judgments on object removal tasks.
- VideoCoF: Unified Video Editing with Temporal Reasoner
VideoCoF adds an explicit reasoning step using edit-region latents in video diffusion models to enable precise mask-free editing and motion alignment with only 50k training pairs.
- SimInsert: Seamless Video Object Insertion via Regional Sparse Attention Fusion
SimInsert is a training-free video object insertion technique that decouples the task into single-frame editing and semantic motion description, using image-to-video diffusion models with non-invasive guidance to achieve spatio-temporal coherence.
- CLEAR: Context-Aware Learning with End-to-End Mask-Free Inference for Adaptive Video Subtitle Removal
CLEAR achieves end-to-end mask-free video subtitle removal via dual-encoder self-supervised orthogonality and LoRA-based generation feedback, delivering +6.77 dB PSNR gains and strong zero-shot multilingual performance.
- Smart-Insertion-V: Photorealistic Video Insertion via a Closed-Loop Feedback Dual-Stream Framework
Smart-Insertion-V is a dual-stream closed-loop framework with Dual-World-View RoPE and a Decoupled Guidance Module that inserts reference objects into videos while achieving stylistic harmony despite domain gaps.
- Tuning-free Instruction-based Video Editing Via Structural Noise Initialization and Guidance
The paper proposes SNIS and NGM to enable tuning-free instruction-based video editing, reporting improved visual quality and claiming state-of-the-art results.
- Occlusion-Aware Physics-Semantic Keyframe Selection for Robust Video Editing