Videoanydoor: High-fidelity video object insertion with precise motion control.arXiv preprint arXiv:2501.01427

· 2025 · arXiv 2501.01427

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

read on arXiv browse 4 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Script-a-Video: Deep Structured Audio-visual Captions via Factorized Streams and Relational Grounding

cs.CV · 2026-04-13 · unverdicted · novelty 7.0

MTSS replaces monolithic video captions with factorized streams and relational grounding, yielding reported gains in understanding benchmarks and generation consistency.

Durian: Dual Reference Image-Guided Portrait Animation with Attribute Transfer

cs.CV · 2025-09-04 · conditional · novelty 7.0

Durian introduces a dual-reference diffusion model trained via self-reconstruction on video frames to enable cross-identity attribute transfer in portrait animations, supporting multi-attribute composition and interpolation.

SimInsert: Seamless Video Object Insertion via Regional Sparse Attention Fusion

cs.CV · 2026-05-22 · unverdicted · novelty 6.0

SimInsert is a training-free video object insertion technique that decouples the task into single-frame editing and semantic motion description, using image-to-video diffusion models with non-invasive guidance to achieve spatio-temporal coherence.

Bridging the Embodiment Gap: Disentangled Cross-Embodiment Video Editing

cs.RO · 2026-05-05 · unverdicted · novelty 6.0

A dual-contrastive disentanglement method factorizes videos into independent task and embodiment latents, then uses a parameter-efficient adapter on a frozen video diffusion model to synthesize robot executions from single human demonstrations without paired data.

citing papers explorer

Showing 4 of 4 citing papers.

Script-a-Video: Deep Structured Audio-visual Captions via Factorized Streams and Relational Grounding cs.CV · 2026-04-13 · unverdicted · none · ref 24
MTSS replaces monolithic video captions with factorized streams and relational grounding, yielding reported gains in understanding benchmarks and generation consistency.
Durian: Dual Reference Image-Guided Portrait Animation with Attribute Transfer cs.CV · 2025-09-04 · conditional · none · ref 15
Durian introduces a dual-reference diffusion model trained via self-reconstruction on video frames to enable cross-identity attribute transfer in portrait animations, supporting multi-attribute composition and interpolation.
SimInsert: Seamless Video Object Insertion via Regional Sparse Attention Fusion cs.CV · 2026-05-22 · unverdicted · none · ref 11
SimInsert is a training-free video object insertion technique that decouples the task into single-frame editing and semantic motion description, using image-to-video diffusion models with non-invasive guidance to achieve spatio-temporal coherence.
Bridging the Embodiment Gap: Disentangled Cross-Embodiment Video Editing cs.RO · 2026-05-05 · unverdicted · none · ref 19
A dual-contrastive disentanglement method factorizes videos into independent task and embodiment latents, then uses a parameter-efficient adapter on a frozen video diffusion model to synthesize robot executions from single human demonstrations without paired data.

Videoanydoor: High-fidelity video object insertion with precise motion control.arXiv preprint arXiv:2501.01427

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer