T2v-compbench: A comprehen- sive benchmark for compositional text-to-video generation

Kaiyue Sun, Kaiyi Huang, Xian Liu, Yue Wu, Zihan Xu, Zhenguo Li, Xihui Liu · 2024 · arXiv 2407.14505

8 Pith papers cite this work. Polarity classification is still indexing.

8 Pith papers citing it

read on arXiv browse 8 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Self-Correcting Text-to-Video Generation with Misalignment Detection and Localized Refinement

cs.CV · 2024-11-22 · unverdicted · novelty 7.0

VideoRepair detects text-video misalignments via MLLM-generated questions and performs localized, region-preserving refinement to improve alignment in existing T2V diffusion models.

Quantitative Video World Model Evaluation for Geometric-Consistency

cs.CV · 2026-05-14 · unverdicted · novelty 6.0

PDI-Bench computes 3D projective residuals from segmented and tracked points to quantify geometric inconsistency in AI-generated videos.

PhyMotion: Structured 3D Motion Reward for Physics-Grounded Human Video Generation

cs.CV · 2026-05-14 · conditional · novelty 6.0

PhyMotion scores generated human videos by grounding recovered 3D poses in a physics simulator across kinematic, contact, and dynamic axes, yielding stronger human correlation and larger RL post-training gains than prior 2D rewards.

RAPO++: Cross-Stage Prompt Optimization for Text-to-Video Generation via Data Alignment and Test-Time Scaling

cs.CV · 2025-10-23 · unverdicted · novelty 6.0

RAPO++ is a three-stage prompt optimization framework combining retrieval-augmented refinement, closed-loop test-time scaling, and LLM fine-tuning to enhance text-to-video generation quality.

Genie Envisioner: A Unified World Foundation Platform for Robotic Manipulation

cs.RO · 2025-08-07 · unverdicted · novelty 6.0

Genie Envisioner unifies robotic policy learning, simulation, and evaluation inside one instruction-conditioned video diffusion framework using GE-Base, GE-Act, and GE-Sim.

We'll Fix it in Post: Improving Text-to-Video Generation with Neuro-Symbolic Feedback

cs.CV · 2025-04-24 · unverdicted · novelty 6.0

NeuS-E is a post-generation refinement method that uses neuro-symbolic analysis of a formal video representation to detect and correct semantic and temporal inconsistencies in text-to-video outputs, improving prompt alignment by nearly 40%.

VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness

cs.CV · 2025-03-27 · accept · novelty 6.0

VBench-2.0 is a benchmark suite that automatically evaluates video generative models on five dimensions of intrinsic faithfulness: Human Fidelity, Controllability, Creativity, Physics, and Commonsense using VLMs, LLMs, and anomaly detection methods.

Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation

cs.CV · 2024-10-07 · unverdicted · novelty 6.0

PhyGenBench supplies 160 prompts across 27 physical laws and an automated LLM/VLM evaluation pipeline to measure physical commonsense compliance in current text-to-video models.

citing papers explorer

Showing 8 of 8 citing papers.

Self-Correcting Text-to-Video Generation with Misalignment Detection and Localized Refinement cs.CV · 2024-11-22 · unverdicted · none · ref 45
VideoRepair detects text-video misalignments via MLLM-generated questions and performs localized, region-preserving refinement to improve alignment in existing T2V diffusion models.
Quantitative Video World Model Evaluation for Geometric-Consistency cs.CV · 2026-05-14 · unverdicted · none · ref 26
PDI-Bench computes 3D projective residuals from segmented and tracked points to quantify geometric inconsistency in AI-generated videos.
PhyMotion: Structured 3D Motion Reward for Physics-Grounded Human Video Generation cs.CV · 2026-05-14 · conditional · none · ref 14
PhyMotion scores generated human videos by grounding recovered 3D poses in a physics simulator across kinematic, contact, and dynamic axes, yielding stronger human correlation and larger RL post-training gains than prior 2D rewards.
RAPO++: Cross-Stage Prompt Optimization for Text-to-Video Generation via Data Alignment and Test-Time Scaling cs.CV · 2025-10-23 · unverdicted · none · ref 58
RAPO++ is a three-stage prompt optimization framework combining retrieval-augmented refinement, closed-loop test-time scaling, and LLM fine-tuning to enhance text-to-video generation quality.
Genie Envisioner: A Unified World Foundation Platform for Robotic Manipulation cs.RO · 2025-08-07 · unverdicted · none · ref 27
Genie Envisioner unifies robotic policy learning, simulation, and evaluation inside one instruction-conditioned video diffusion framework using GE-Base, GE-Act, and GE-Sim.
We'll Fix it in Post: Improving Text-to-Video Generation with Neuro-Symbolic Feedback cs.CV · 2025-04-24 · unverdicted · none · ref 79
NeuS-E is a post-generation refinement method that uses neuro-symbolic analysis of a formal video representation to detect and correct semantic and temporal inconsistencies in text-to-video outputs, improving prompt alignment by nearly 40%.
VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness cs.CV · 2025-03-27 · accept · none · ref 73
VBench-2.0 is a benchmark suite that automatically evaluates video generative models on five dimensions of intrinsic faithfulness: Human Fidelity, Controllability, Creativity, Physics, and Commonsense using VLMs, LLMs, and anomaly detection methods.
Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation cs.CV · 2024-10-07 · unverdicted · none · ref 29
PhyGenBench supplies 160 prompts across 27 physical laws and an automated LLM/VLM evaluation pipeline to measure physical commonsense compliance in current text-to-video models.

T2v-compbench: A comprehen- sive benchmark for compositional text-to-video generation

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer