Sora detector: A unified hallucination detection for large text-to-video models

Chu, Z · 2024 · arXiv 2405.04180

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

representative citing papers

TransVLM: A Vision-Language Framework and Benchmark for Detecting Any Shot Transitions

cs.CV · 2026-04-30 · unverdicted · novelty 7.0

TransVLM formalizes Shot Transition Detection as identifying full temporal transition segments rather than single cut points and introduces a VLM that injects optical flow as a motion prior via simple feature fusion, plus a synthetic data engine and benchmark.

Enhancing Train-Free Infinite-Frame Generation for Consistent Long Videos

cs.CV · 2026-05-18 · unverdicted · novelty 6.0

MIGA introduces two-stage alignment to close train-inference gaps and dual consistency enhancement via self-reflection and long-range guidance to achieve SOTA temporal consistency in infinite-frame video generation on VBench and NarrLV.

We'll Fix it in Post: Improving Text-to-Video Generation with Neuro-Symbolic Feedback

cs.CV · 2025-04-24 · unverdicted · novelty 6.0

NeuS-E is a post-generation refinement method that uses neuro-symbolic analysis of a formal video representation to detect and correct semantic and temporal inconsistencies in text-to-video outputs, improving prompt alignment by nearly 40%.

citing papers explorer

Showing 3 of 3 citing papers.

TransVLM: A Vision-Language Framework and Benchmark for Detecting Any Shot Transitions cs.CV · 2026-04-30 · unverdicted · none · ref 12
TransVLM formalizes Shot Transition Detection as identifying full temporal transition segments rather than single cut points and introduces a VLM that injects optical flow as a motion prior via simple feature fusion, plus a synthetic data engine and benchmark.
Enhancing Train-Free Infinite-Frame Generation for Consistent Long Videos cs.CV · 2026-05-18 · unverdicted · none · ref 8
MIGA introduces two-stage alignment to close train-inference gaps and dual consistency enhancement via self-reflection and long-range guidance to achieve SOTA temporal consistency in infinite-frame video generation on VBench and NarrLV.
We'll Fix it in Post: Improving Text-to-Video Generation with Neuro-Symbolic Feedback cs.CV · 2025-04-24 · unverdicted · none · ref 16
NeuS-E is a post-generation refinement method that uses neuro-symbolic analysis of a formal video representation to detect and correct semantic and temporal inconsistencies in text-to-video outputs, improving prompt alignment by nearly 40%.

Sora detector: A unified hallucination detection for large text-to-video models

fields

years

verdicts

representative citing papers

citing papers explorer