hub

Dreamix: Video diffusion models are general video editors

Molad, E · 2023 · arXiv 2302.01329

11 Pith papers cite this work. Polarity classification is still indexing.

11 Pith papers citing it

read on arXiv browse 11 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution

cs.CL · 2023-09-28 · unverdicted · novelty 8.0

Promptbreeder evolves both task prompts and the mutation prompts that improve them using LLMs, outperforming Chain-of-Thought and Plan-and-Solve on arithmetic and commonsense reasoning benchmarks.

InstructAV2AV: Instruction-Guided Audio-Video Joint Editing

cs.CV · 2026-05-18 · unverdicted · novelty 7.0

InstructAV2AV is an end-to-end instruction-guided audio-video joint editing model that adapts a pre-trained backbone with gated attention and two-stage training, outperforming prior methods on 11 metrics after building the InsAVE-80K dataset.

Functionalization via Structure Completion and Motion Rectification

cs.CV · 2026-05-18 · unverdicted · novelty 7.0

Object functionalization is cast as neural graph completion over a functional graph of parts, contacts, and motions, followed by geometry realization that also rectifies erroneous motions, demonstrated on furniture with a new paired dataset.

Learning Interactive Real-World Simulators

cs.AI · 2023-10-09 · conditional · novelty 7.0

UniSim learns a universal real-world simulator from orchestrated diverse datasets, enabling zero-shot deployment of policies trained purely in simulation.

When Numbers Speak: Aligning Textual Numerals and Visual Instances in Text-to-Video Diffusion Models

cs.CV · 2026-04-09 · unverdicted · novelty 6.0

NUMINA improves counting accuracy in text-to-video diffusion models by up to 7.4% via a training-free identify-then-guide framework on the new CountBench dataset.

Video models are zero-shot learners and reasoners

cs.LG · 2025-09-24 · unverdicted · novelty 6.0

Generative video models exhibit emergent zero-shot capabilities across perception, manipulation, and basic reasoning tasks.

We'll Fix it in Post: Improving Text-to-Video Generation with Neuro-Symbolic Feedback

cs.CV · 2025-04-24 · unverdicted · novelty 6.0

NeuS-E is a post-generation refinement method that uses neuro-symbolic analysis of a formal video representation to detect and correct semantic and temporal inconsistencies in text-to-video outputs, improving prompt alignment by nearly 40%.

VideoCrafter1: Open Diffusion Models for High-Quality Video Generation

cs.CV · 2023-10-30 · unverdicted · novelty 6.0

Open-source text-to-video and image-to-video diffusion models generate high-quality 1024x576 videos, with the I2V variant claimed as the first to strictly preserve reference image content.

Scaling Robot Learning with Semantically Imagined Experience

cs.RO · 2023-02-22 · unverdicted · novelty 6.0

Augmenting robot datasets via diffusion-based semantic inpainting enables manipulation policies to solve unseen tasks with new objects and improves robustness to novel distractors.

DiffMagicFace: Identity Consistent Facial Editing of Real Videos

cs.CV · 2026-04-15 · unverdicted · novelty 5.0

DiffMagicFace uses concurrent fine-tuned text and image diffusion models plus a rendered multi-view dataset to achieve identity-consistent text-conditioned editing of real facial videos.

Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models

cs.CV · 2024-02-27 · unverdicted · novelty 2.0

The paper reviews the background, technology, applications, limitations, and future directions of OpenAI's Sora text-to-video generative model based on public information.

citing papers explorer

Showing 11 of 11 citing papers.

Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution cs.CL · 2023-09-28 · unverdicted · none · ref 235
Promptbreeder evolves both task prompts and the mutation prompts that improve them using LLMs, outperforming Chain-of-Thought and Plan-and-Solve on arithmetic and commonsense reasoning benchmarks.
InstructAV2AV: Instruction-Guided Audio-Video Joint Editing cs.CV · 2026-05-18 · unverdicted · none · ref 20
InstructAV2AV is an end-to-end instruction-guided audio-video joint editing model that adapts a pre-trained backbone with gated attention and two-stage training, outperforming prior methods on 11 metrics after building the InsAVE-80K dataset.
Functionalization via Structure Completion and Motion Rectification cs.CV · 2026-05-18 · unverdicted · none · ref 99
Object functionalization is cast as neural graph completion over a functional graph of parts, contacts, and motions, followed by geometry realization that also rectifies erroneous motions, demonstrated on furniture with a new paired dataset.
Learning Interactive Real-World Simulators cs.AI · 2023-10-09 · conditional · none · ref 171
UniSim learns a universal real-world simulator from orchestrated diverse datasets, enabling zero-shot deployment of policies trained purely in simulation.
When Numbers Speak: Aligning Textual Numerals and Visual Instances in Text-to-Video Diffusion Models cs.CV · 2026-04-09 · unverdicted · none · ref 49
NUMINA improves counting accuracy in text-to-video diffusion models by up to 7.4% via a training-free identify-then-guide framework on the new CountBench dataset.
Video models are zero-shot learners and reasoners cs.LG · 2025-09-24 · unverdicted · none · ref 24
Generative video models exhibit emergent zero-shot capabilities across perception, manipulation, and basic reasoning tasks.
We'll Fix it in Post: Improving Text-to-Video Generation with Neuro-Symbolic Feedback cs.CV · 2025-04-24 · unverdicted · none · ref 66
NeuS-E is a post-generation refinement method that uses neuro-symbolic analysis of a formal video representation to detect and correct semantic and temporal inconsistencies in text-to-video outputs, improving prompt alignment by nearly 40%.
VideoCrafter1: Open Diffusion Models for High-Quality Video Generation cs.CV · 2023-10-30 · unverdicted · none · ref 35
Open-source text-to-video and image-to-video diffusion models generate high-quality 1024x576 videos, with the I2V variant claimed as the first to strictly preserve reference image content.
Scaling Robot Learning with Semantically Imagined Experience cs.RO · 2023-02-22 · unverdicted · none · ref 77
Augmenting robot datasets via diffusion-based semantic inpainting enables manipulation policies to solve unseen tasks with new objects and improves robustness to novel distractors.
DiffMagicFace: Identity Consistent Facial Editing of Real Videos cs.CV · 2026-04-15 · unverdicted · none · ref 14
DiffMagicFace uses concurrent fine-tuned text and image diffusion models plus a rendered multi-view dataset to achieve identity-consistent text-conditioned editing of real facial videos.
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models cs.CV · 2024-02-27 · unverdicted · none · ref 184
The paper reviews the background, technology, applications, limitations, and future directions of OpenAI's Sora text-to-video generative model based on public information.

Dreamix: Video diffusion models are general video editors

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer