pith. sign in

Five: A fine-grained video editing benchmark for evaluating emerging diffusion and rectified flow models

6 Pith papers cite this work. Polarity classification is still indexing.

6 Pith papers citing it

citation-role summary

background 1 dataset 1

citation-polarity summary

fields

cs.CV 6

years

2026 4 2025 2

verdicts

UNVERDICTED 6

representative citing papers

VideoCoF: Unified Video Editing with Temporal Reasoner

cs.CV · 2025-12-08 · unverdicted · novelty 7.0

VideoCoF adds an explicit reasoning step using edit-region latents in video diffusion models to enable precise mask-free editing and motion alignment with only 50k training pairs.

Bernini: Latent Semantic Planning for Video Diffusion

cs.CV · 2026-05-21 · unverdicted · novelty 5.0

Bernini is a framework that uses an MLLM planner to output semantic representations for a DiT renderer to generate or edit videos, reporting SOTA benchmark performance.

Context Unrolling in Omni Models

cs.CV · 2026-04-23 · unverdicted · novelty 5.0

Omni is a multimodal model whose native training on diverse data types enables context unrolling, allowing explicit reasoning across modalities to better approximate shared knowledge and improve downstream performance.

citing papers explorer

Showing 6 of 6 citing papers.

  • LangDriveCTRL: Natural Language Controllable Driving Scene Editing with Multi-modal Agents cs.CV · 2025-12-19 · unverdicted · none · ref 23

    LangDriveCTRL decomposes driving videos into 3D scene graphs and uses an agentic pipeline with specialized multi-modal agents to perform language-controlled object and behavior edits, achieving nearly 2x higher instruction alignment than prior state-of-the-art methods.

  • VideoCoF: Unified Video Editing with Temporal Reasoner cs.CV · 2025-12-08 · unverdicted · none · ref 17

    VideoCoF adds an explicit reasoning step using edit-region latents in video diffusion models to enable precise mask-free editing and motion alignment with only 50k training pairs.

  • StreamGVE: Training-Free Video Editing via Few-Step Streaming Video Generation cs.CV · 2026-05-20 · unverdicted · none · ref 42

    StreamGVE enables high-quality training-free video editing by converting the task to noise-to-data streaming generation with dual-branch fast sampling, self-attention bridges, cross-attention grounding, source-oriented guidance, and visual prompting.

  • VEFX-Bench: A Holistic Benchmark for Generic Video Editing and Visual Effects cs.CV · 2026-04-17 · unverdicted · none · ref 14

    VEFX-Bench releases a large human-labeled video editing dataset, a multi-dimensional reward model, and a standardized benchmark that better matches human judgments than generic evaluators.

  • Bernini: Latent Semantic Planning for Video Diffusion cs.CV · 2026-05-21 · unverdicted · none · ref 37

    Bernini is a framework that uses an MLLM planner to output semantic representations for a DiT renderer to generate or edit videos, reporting SOTA benchmark performance.

  • Context Unrolling in Omni Models cs.CV · 2026-04-23 · unverdicted · none · ref 25

    Omni is a multimodal model whose native training on diverse data types enables context unrolling, allowing explicit reasoning across modalities to better approximate shared knowledge and improve downstream performance.