arXiv preprint arXiv:2505.24873

Zi, B · 2025 · arXiv 2505.24873

9 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

PROVE: A Perceptual RemOVal cohErence Benchmark for Visual Media

cs.CV · 2026-05-14 · unverdicted · novelty 7.0

PROVE proposes RC metrics for perceptual removal coherence and releases PROVE-Bench to better align automatic scores with human judgments on object removal tasks.

VideoCoF: Unified Video Editing with Temporal Reasoner

cs.CV · 2025-12-08 · unverdicted · novelty 7.0

VideoCoF adds an explicit reasoning step using edit-region latents in video diffusion models to enable precise mask-free editing and motion alignment with only 50k training pairs.

SimInsert: Seamless Video Object Insertion via Regional Sparse Attention Fusion

cs.CV · 2026-05-22 · unverdicted · novelty 6.0

SimInsert is a training-free video object insertion technique that decouples the task into single-frame editing and semantic motion description, using image-to-video diffusion models with non-invasive guidance to achieve spatio-temporal coherence.

LIVEditor-14B: Lightning Unified Video Editing via In-Context Sparse Attention

cs.CV · 2026-05-06 · unverdicted · novelty 6.0

LIVEditor-14B applies a new sparse attention method (ISA) that prunes context and uses query-sharpness routing to cut attention latency ~60% with no loss in editing quality on standard benchmarks.

CLEAR: Context-Aware Learning with End-to-End Mask-Free Inference for Adaptive Video Subtitle Removal

cs.CV · 2026-03-23 · unverdicted · novelty 6.0

CLEAR achieves end-to-end mask-free video subtitle removal via dual-encoder self-supervised orthogonality and LoRA-based generation feedback, delivering +6.77 dB PSNR gains and strong zero-shot multilingual performance.

GenEraser: Generalizable Video Object Removal via Balanced Text-Mask Guidance and Decoupled Locator-Preserver

cs.CV · 2026-05-28 · unverdicted · novelty 5.0

GenEraser proposes MC-MoE with bipartite text guidance, LD-CFG fusion, and a decoupled locator-preserver architecture for generalizable video object and effect removal, claiming 2.16 dB and 1.44 dB gains on ROSE and VOR-Eval benchmarks.

Smart-Insertion-V: Photorealistic Video Insertion via a Closed-Loop Feedback Dual-Stream Framework

cs.CV · 2026-05-22 · unverdicted · novelty 5.0

Smart-Insertion-V is a dual-stream closed-loop framework with Dual-World-View RoPE and a Decoupled Guidance Module that inserts reference objects into videos while achieving stylistic harmony despite domain gaps.

Occlusion-Aware Physics-Semantic Keyframe Selection for Robust Video Editing

cs.CV · 2026-05-22 · unverdicted · novelty 5.0

A new keyframe selection framework combines structural, tracking, and semantic criteria to select reliable anchor frames for diffusion-based video editing under occlusion.

Tuning-free Instruction-based Video Editing Via Structural Noise Initialization and Guidance

cs.CV · 2026-05-15 · unverdicted · novelty 5.0

The paper proposes SNIS and NGM to enable tuning-free instruction-based video editing, with improved visual quality and claimed state-of-the-art results.

citing papers explorer

Showing 8 of 8 citing papers after filters.

PROVE: A Perceptual RemOVal cohErence Benchmark for Visual Media

cs.CV · 2026-05-14 · unverdicted · none · ref 19

SimInsert: Seamless Video Object Insertion via Regional Sparse Attention Fusion

cs.CV · 2026-05-22 · unverdicted · none · ref 21

LIVEditor-14B: Lightning Unified Video Editing via In-Context Sparse Attention

cs.CV · 2026-05-06 · unverdicted · none · ref 59

CLEAR: Context-Aware Learning with End-to-End Mask-Free Inference for Adaptive Video Subtitle Removal

cs.CV · 2026-03-23 · unverdicted · none · ref 12

GenEraser: Generalizable Video Object Removal via Balanced Text-Mask Guidance and Decoupled Locator-Preserver

cs.CV · 2026-05-28 · unverdicted · none · ref 30

Smart-Insertion-V: Photorealistic Video Insertion via a Closed-Loop Feedback Dual-Stream Framework

cs.CV · 2026-05-22 · unverdicted · none · ref 38

Occlusion-Aware Physics-Semantic Keyframe Selection for Robust Video Editing

cs.CV · 2026-05-22 · unverdicted · none · ref 95

Tuning-free Instruction-based Video Editing Via Structural Noise Initialization and Guidance

cs.CV · 2026-05-15 · unverdicted · none · ref 19
