Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models

Canonical reference. 85% of citing Pith papers cite this work as background.

48 Pith papers citing it
Background 85% of classified citations
abstract

Sora is a text-to-video generative AI model, released by OpenAI in February 2024. The model is trained to generate videos of realistic or imaginative scenes from text instructions and shows potential in simulating the physical world. Based on public technical reports and reverse engineering, this paper presents a comprehensive review of the model's background, related technologies, applications, remaining challenges, and future directions of text-to-video AI models. We first trace Sora's development and investigate the underlying technologies used to build this "world simulator". Then, we describe in detail the applications and potential impact of Sora in multiple industries ranging from film-making and education to marketing. We discuss the main challenges and limitations that need to be addressed to widely deploy Sora, such as ensuring safe and unbiased video generation. Lastly, we discuss the future development of Sora and video generation models in general, and how advancements in the field could enable new ways of human-AI interaction, boosting productivity and creativity of video generation.

citation-role summary

background: 12 · baseline: 1

representative citing papers

DEVIS-GRPO: Unleashing GRPO on Dynamic Extreme View Synthesis

cs.CV · 2026-05-16 · unverdicted · novelty 7.0

DEVIS-GRPO applies online policy gradients with an accumulative small-to-large view sampling strategy and multi-level rewards to improve trajectory-controlled extreme view video generation, reporting gains on Kubric-4D, iPhone, and DL3DV datasets.

Immune2V: Image Immunization Against Dual-Stream Image-to-Video Generation

cs.CV · 2026-04-12 · unverdicted · novelty 7.0

Immune2V immunizes images against dual-stream I2V generation by enforcing temporally balanced latent divergence and aligning generative features to a precomputed collapse trajectory, yielding stronger persistent degradation than image-level baselines.

Controllable Generative Video Compression

cs.CV · 2026-04-08 · unverdicted · novelty 7.0

CGVC uses coded keyframes and per-frame priors to guide controllable generative reconstruction of video frames, outperforming prior perceptual compression methods in both signal fidelity and perceptual quality.

VDFP: Video Deflickering with Flicker-banding Priors

cs.CV · 2026-05-20 · unverdicted · novelty 6.0 · 2 refs

VDFP uses degradation field modeling based on rolling shutter and continuous prior perception with a flicker-aware loss to deflicker videos while preserving spatial-temporal details via zero-initialized pre-trained priors.

DiffATS: Diffusion in Aligned Tensor Space

cs.LG · 2026-05-10 · unverdicted · novelty 6.0

DiffATS trains diffusion models directly on aligned Tucker tensor primitives that are proven to be homeomorphisms, delivering efficient unconditional and conditional generation across images, videos, and PDE data with high compression.
