Flatten: optical flow-guided attention for consistent text-to-video editing. arXiv preprint arXiv:2310.05922

Yuren Cong, Mengmeng Xu, Christian Simon, Shoufa Chen, Jiawei Ren, Yanping Xie, Juan-Manuel Perez-Rua, Bodo Rosenhahn, Tao Xiang, Sen He · 2023 · arXiv 2310.05922

9 Pith papers cite this work. Polarity classification is still indexing.


citation-role summary

background 1 dataset 1

citation-polarity summary

background 1 use dataset 1

representative citing papers

TeDiO: Temporal Diagonal Optimization for Training-Free Coherent Video Diffusion

cs.CV · 2026-05-13 · unverdicted · novelty 7.0

TeDiO regularizes temporal diagonals in diffusion transformer attention maps to produce smoother video motion while keeping per-frame quality intact.

OphEdit: Training-Free Text-Guided Editing of Ophthalmic Surgical Videos

cs.CV · 2026-05-08 · unverdicted · novelty 7.0

OphEdit enables text-guided editing of eye surgery videos without training by injecting preserved attention value tensors into the diffusion denoising process to maintain anatomical structure.

FrameDiT: Diffusion Transformer with Matrix Attention for Efficient Video Generation

cs.CV · 2026-03-10 · unverdicted · novelty 7.0

FrameDiT proposes Matrix Attention for DiTs to achieve SOTA video generation with improved temporal coherence and efficiency comparable to local factorized attention.

ASTRA: Let Arbitrary Subjects Transform in Video Editing

cs.CV · 2025-10-01 · unverdicted · novelty 7.0

ASTRA is a plug-and-play training-free method for precise multi-subject video editing that uses prompt-guided multimodal alignment and prior-based mask retargeting to avoid attention dilution and boundary issues.

Semantic-Aware, Physics-Informed, Geometry-Grounded Weather Video Synthesis

cs.CV · 2026-06-27 · unverdicted · novelty 6.0

A new framework factorizes weather video synthesis into semantic appearance anchoring, physics-informed Gaussian particle simulation under gravity/wind/turbulence, and geometry-grounded alignment to produce diverse realistic weather effects.

Towards 3D-Aware Video Diffusion Models: Render-Free Human Motion Control with Mesh Tokenization

cs.CV · 2026-06-01 · unverdicted · novelty 6.0

Introduces mesh tokenization to condition DiT-based video diffusion models directly on 3D human meshes for motion control without 2D rendering.

DiT as Real-Time Rerenderer: Streaming Video Stylization with Autoregressive Diffusion Transformer

cs.CV · 2026-04-15 · unverdicted · novelty 6.0

RTR-DiT distills a bidirectional DiT teacher into an autoregressive few-step model using Self Forcing and Distribution Matching Distillation, plus a reference-preserving KV cache, to enable stable real-time text- and reference-guided video stylization.

InsEdit: Towards Instruction-based Visual Editing via Data-Efficient Video Diffusion Models Adaptation

cs.CV · 2026-04-09 · unverdicted · novelty 6.0

InsEdit adapts a video diffusion backbone for text-instruction video editing via Mutual Context Attention, achieving SOTA open-source results with O(100K) data while also supporting image editing.

Controllable Video Object Insertion via Multiview Priors

cs.CV · 2026-04-16 · unverdicted · novelty 5.0

A multi-view prior-based framework for video object insertion that uses dual-path conditioning and an integration-aware consistency module to improve appearance stability and occlusion handling.

citing papers explorer


TeDiO: Temporal Diagonal Optimization for Training-Free Coherent Video Diffusion

cs.CV · 2026-05-13 · unverdicted · none · ref 12

TeDiO regularizes temporal diagonals in diffusion transformer attention maps to produce smoother video motion while keeping per-frame quality intact.

OphEdit: Training-Free Text-Guided Editing of Ophthalmic Surgical Videos

cs.CV · 2026-05-08 · unverdicted · none · ref 2

OphEdit enables text-guided editing of eye surgery videos without training by injecting preserved attention value tensors into the diffusion denoising process to maintain anatomical structure.

FrameDiT: Diffusion Transformer with Matrix Attention for Efficient Video Generation

cs.CV · 2026-03-10 · unverdicted · none · ref 7

FrameDiT proposes Matrix Attention for DiTs to achieve SOTA video generation with improved temporal coherence and efficiency comparable to local factorized attention.

Semantic-Aware, Physics-Informed, Geometry-Grounded Weather Video Synthesis

cs.CV · 2026-06-27 · unverdicted · none · ref 11

A new framework factorizes weather video synthesis into semantic appearance anchoring, physics-informed Gaussian particle simulation under gravity/wind/turbulence, and geometry-grounded alignment to produce diverse realistic weather effects.

Towards 3D-Aware Video Diffusion Models: Render-Free Human Motion Control with Mesh Tokenization

cs.CV · 2026-06-01 · unverdicted · none · ref 6

Introduces mesh tokenization to condition DiT-based video diffusion models directly on 3D human meshes for motion control without 2D rendering.

DiT as Real-Time Rerenderer: Streaming Video Stylization with Autoregressive Diffusion Transformer

cs.CV · 2026-04-15 · unverdicted · none · ref 5

RTR-DiT distills a bidirectional DiT teacher into an autoregressive few-step model using Self Forcing and Distribution Matching Distillation, plus a reference-preserving KV cache, to enable stable real-time text- and reference-guided video stylization.

InsEdit: Towards Instruction-based Visual Editing via Data-Efficient Video Diffusion Models Adaptation

cs.CV · 2026-04-09 · unverdicted · none · ref 6

InsEdit adapts a video diffusion backbone for text-instruction video editing via Mutual Context Attention, achieving SOTA open-source results with O(100K) data while also supporting image editing.

Controllable Video Object Insertion via Multiview Priors

cs.CV · 2026-04-16 · unverdicted · none · ref 8

A multi-view prior-based framework for video object insertion that uses dual-path conditioning and an integration-aware consistency module to improve appearance stability and occlusion handling.
