hub

arXiv preprint arXiv:2510.14648

14 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background: 2 · baseline: 2

citation-polarity summary

fields

cs.CV 14

years

2026: 13 · 2025: 1

verdicts

UNVERDICTED 14

representative citing papers

VideoCoF: Unified Video Editing with Temporal Reasoner

cs.CV · 2025-12-08 · unverdicted · novelty 7.0

VideoCoF adds an explicit reasoning step using edit-region latents in video diffusion models to enable precise mask-free editing and motion alignment with only 50k training pairs.

SteerVTE: Seamless Video Text Editing with Style and Glyph Control

cs.CV · 2026-06-22 · unverdicted · novelty 6.0

SteerVTE adds lightweight style and dual-granularity glyph adapters to a frozen video diffusion model, introduces a glyph-aware loss and progressive training, and releases a 1M synthetic dataset to enable accurate video text editing.

SpongeBob: Sync-Aware Harmonious Audio-Visual Generative Editing

cs.CV · 2026-05-24 · unverdicted · novelty 6.0

SpongeBob introduces the first end-to-end audio-visual joint editing framework using sync-aware bidirectional attention and context-aware modules, plus a new dataset and benchmark, claiming 30% Sync-C and 12.5% Ctx-F1 gains over baselines.

Bernini: Latent Semantic Planning for Video Diffusion

cs.CV · 2026-05-21 · unverdicted · novelty 5.0

Bernini is a framework that uses an MLLM planner to output semantic representations for a DiT renderer to generate or edit videos, reporting SOTA benchmark performance.

citing papers explorer

Showing 14 of 14 citing papers after filters.