Godiva: Generating open-domain videos from natural descriptions

17 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

baseline: 2 · background: 1

citation-polarity summary

representative citing papers

Learning Interactive Real-World Simulators

cs.AI · 2023-10-09 · conditional · novelty 7.0

UniSim learns a universal real-world simulator from orchestrated diverse datasets, enabling zero-shot deployment of policies trained purely in simulation.

StreamEdit: Training-Free Video Editing via Few-Step Streaming Video Generation

cs.CV · 2026-05-20 · unverdicted · novelty 6.0 · 2 refs

StreamEdit enables high-quality training-free video editing by adapting streaming video generation models with dual-branch fast sampling, self-attention bridge, cross-attention grounding, source-oriented guidance, and visual prompting, outperforming prior methods in few-step regimes.

Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets

cs.CV · 2023-11-25 · conditional · novelty 6.0

Stable Video Diffusion scales latent video diffusion models via text-to-image pretraining, video pretraining on curated data, and high-quality finetuning to produce competitive text-to-video and image-to-video results while enabling motion LoRA and multi-view 3D applications.

ModelScope Text-to-Video Technical Report

cs.CV · 2023-08-12 · unverdicted · novelty 4.0

ModelScopeT2V is a 1.7-billion-parameter text-to-video model built on Stable Diffusion that adds temporal modeling and outperforms prior methods on three evaluation metrics.

citing papers explorer

Showing 2 of 2 citing papers after filters.

  • Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets · cs.CV · 2023-11-25 · conditional · none · ref 103


  • ModelScope Text-to-Video Technical Report · cs.CV · 2023-08-12 · unverdicted · none · ref 61
