hub Mixed citations

Huynh-Thu, Q

Xuan He, Dongfu Jiang, Ping Nie, Minghao Liu, Zhengxuan Jiang, Mingyi Su, Wentao Ma, Junru Lin, Chun Ye, Yi Lu, et al · 2025 · arXiv 2509.22799

Mixed citation behavior. Most common role is background (67%).

15 Pith papers citing it

Background 67% of classified citations

read on arXiv browse 15 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 4 baseline 2

citation-polarity summary

background 4 baseline 2

representative citing papers

RoboGaze: Evaluating Robot World Models via Structured Vision-Language Analysis

cs.RO · 2026-06-22 · unverdicted · novelty 7.0

RoboGaze presents a structured multi-agent VLM pipeline and robotics-specific error taxonomy that improves video evaluation metrics by up to 43 F1 points over zero-shot baselines on a 382-clip dataset.

CaC: Advancing Video Reward Models via Hierarchical Spatiotemporal Concentrating

cs.CV · 2026-05-12 · unverdicted · novelty 7.0 · 2 refs

CaC presents a new spatiotemporal concentrating reward model for video anomalies, built on a novel large-scale dataset and three-stage training with RL and IoU rewards, claiming 25.7% accuracy gains and 11.7% anomaly reduction.

PhyGround: Benchmarking Physical Reasoning in Generative World Models

cs.CV · 2026-05-11 · accept · novelty 7.0

PhyGround is a new benchmark with curated prompts, a 13-law taxonomy, large-scale human annotations, and an open physics-specialized VLM judge for evaluating physical reasoning in generative video models.

RewardHarness: Self-Evolving Agentic Post-Training

cs.AI · 2026-05-09 · unverdicted · novelty 7.0

RewardHarness self-evolves a tool-and-skill library from 100 preference examples to reach 47.4% accuracy on image-edit evaluation, beating GPT-5, and yields stronger RL-tuned models.

AnimationBench: Are Video Models Good at Character-Centric Animation?

cs.CV · 2026-04-16 · unverdicted · novelty 7.0

AnimationBench is the first benchmark that operationalizes the twelve basic principles of animation and IP preservation into scalable, VLM-assisted metrics for animation-style I2V generation.

SpecLoR: Spectral Lookahead Rectification for Motion-Coherent Text-to-Video Generation

cs.CV · 2026-06-10 · unverdicted · novelty 6.0

SpecLoR rectifies the amplitude spectrum of lookahead-estimated clean latents to natural-video priors during early ODE sampling steps, cutting physical artifacts with only four extra NFEs.

WorldReasonBench: Human-Aligned Stress Testing of Video Generators as Future World-State Predictors

cs.CV · 2026-05-11 · unverdicted · novelty 6.0

The paper presents WorldReasonBench, a benchmark that tests video generators on maintaining physical, social, logical, and informational consistency when predicting future states from initial conditions and actions.

Think, then Score: Decoupled Reasoning and Scoring for Video Reward Modeling

cs.CV · 2026-05-07 · unverdicted · novelty 6.0

DeScore decouples CoT reasoning from reward scoring in video reward models using a two-stage training process to improve generalization and avoid optimization bottlenecks of coupled generative RMs.

How Far Are Video Models from True Multimodal Reasoning?

cs.CV · 2026-04-21 · unverdicted · novelty 6.0

Current video models succeed on basic understanding but achieve under 25% success on logically grounded generation and near 0% on interactive generation, exposing gaps in multimodal reasoning.

Long-CODE: Isolating Pure Long-Context as an Orthogonal Dimension in Video Evaluation

cs.CV · 2026-04-19 · unverdicted · novelty 6.0

Long-CODE isolates long-context video evaluation with a new benchmark dataset and shot-dynamics metric that correlates better with human judgments on narrative richness and global consistency than short-video metrics.

MMPhysVideo: Scaling Physical Plausibility in Video Generation via Joint Multimodal Modeling

cs.CV · 2026-04-03 · unverdicted · novelty 6.0

MMPhysVideo improves physical plausibility in video diffusion models by jointly modeling RGB with perceptual cues in pseudo-RGB format via a bidirectional teacher-student architecture and a new data curation pipeline.

VisPhyWorld: Probing Physical Reasoning via Code-Driven Video Reconstruction

cs.CV · 2026-02-09 · unverdicted · novelty 6.0

VisPhyWorld evaluates MLLMs' physical reasoning via executable code generation for video reconstruction, with VisPhyBench showing strong semantics but weak parameter inference and dynamics simulation.

Plan-and-Verify Video Reward Reasoning with Spatio-Temporal Scene Graph Grounding

cs.CV · 2026-06-10 · unverdicted · novelty 5.0

SG-PVR introduces plan-and-verify reasoning grounded in spatio-temporal scene graphs to address verification gaps and implicit evidence in existing T2V reward models.

CineDance: Towards Next-Generation Multi-Shot Long-Form Cinematic Audio-Video Generation

cs.CV · 2026-06-08 · unverdicted · novelty 5.0

Introduces CineDance-1M dataset for multi-shot long-form text-to-audio-video generation along with CineBench and a model adaptation.

LoViF 2026 The First Challenge on Holistic Quality Assessment for 4D World Model (PhyScore)

cs.CV · 2026-05-06 · conditional · novelty 5.0

The PhyScore challenge creates the first benchmark requiring metrics to jointly score video quality, physical realism, condition alignment, and temporal consistency while localizing physical anomalies in 1554 videos from seven generative models across text-to-2D, image-to-4D, and video-to-4D tracks.

citing papers explorer

Showing 15 of 15 citing papers.

RoboGaze: Evaluating Robot World Models via Structured Vision-Language Analysis cs.RO · 2026-06-22 · unverdicted · none · ref 12
RoboGaze presents a structured multi-agent VLM pipeline and robotics-specific error taxonomy that improves video evaluation metrics by up to 43 F1 points over zero-shot baselines on a 382-clip dataset.
CaC: Advancing Video Reward Models via Hierarchical Spatiotemporal Concentrating cs.CV · 2026-05-12 · unverdicted · none · ref 16 · 2 links
CaC presents a new spatiotemporal concentrating reward model for video anomalies, built on a novel large-scale dataset and three-stage training with RL and IoU rewards, claiming 25.7% accuracy gains and 11.7% anomaly reduction.
PhyGround: Benchmarking Physical Reasoning in Generative World Models cs.CV · 2026-05-11 · accept · none · ref 15
PhyGround is a new benchmark with curated prompts, a 13-law taxonomy, large-scale human annotations, and an open physics-specialized VLM judge for evaluating physical reasoning in generative video models.
RewardHarness: Self-Evolving Agentic Post-Training cs.AI · 2026-05-09 · unverdicted · none · ref 8
RewardHarness self-evolves a tool-and-skill library from 100 preference examples to reach 47.4% accuracy on image-edit evaluation, beating GPT-5, and yields stronger RL-tuned models.
AnimationBench: Are Video Models Good at Character-Centric Animation? cs.CV · 2026-04-16 · unverdicted · none · ref 7
AnimationBench is the first benchmark that operationalizes the twelve basic principles of animation and IP preservation into scalable, VLM-assisted metrics for animation-style I2V generation.
SpecLoR: Spectral Lookahead Rectification for Motion-Coherent Text-to-Video Generation cs.CV · 2026-06-10 · unverdicted · none · ref 53
SpecLoR rectifies the amplitude spectrum of lookahead-estimated clean latents to natural-video priors during early ODE sampling steps, cutting physical artifacts with only four extra NFEs.
WorldReasonBench: Human-Aligned Stress Testing of Video Generators as Future World-State Predictors cs.CV · 2026-05-11 · unverdicted · none · ref 4
The paper presents WorldReasonBench, a benchmark that tests video generators on maintaining physical, social, logical, and informational consistency when predicting future states from initial conditions and actions.
Think, then Score: Decoupled Reasoning and Scoring for Video Reward Modeling cs.CV · 2026-05-07 · unverdicted · none · ref 9
DeScore decouples CoT reasoning from reward scoring in video reward models using a two-stage training process to improve generalization and avoid optimization bottlenecks of coupled generative RMs.
How Far Are Video Models from True Multimodal Reasoning? cs.CV · 2026-04-21 · unverdicted · none · ref 22
Current video models succeed on basic understanding but achieve under 25% success on logically grounded generation and near 0% on interactive generation, exposing gaps in multimodal reasoning.
Long-CODE: Isolating Pure Long-Context as an Orthogonal Dimension in Video Evaluation cs.CV · 2026-04-19 · unverdicted · none · ref 8
Long-CODE isolates long-context video evaluation with a new benchmark dataset and shot-dynamics metric that correlates better with human judgments on narrative richness and global consistency than short-video metrics.
MMPhysVideo: Scaling Physical Plausibility in Video Generation via Joint Multimodal Modeling cs.CV · 2026-04-03 · unverdicted · none · ref 1
MMPhysVideo improves physical plausibility in video diffusion models by jointly modeling RGB with perceptual cues in pseudo-RGB format via a bidirectional teacher-student architecture and a new data curation pipeline.
VisPhyWorld: Probing Physical Reasoning via Code-Driven Video Reconstruction cs.CV · 2026-02-09 · unverdicted · none · ref 19
VisPhyWorld evaluates MLLMs' physical reasoning via executable code generation for video reconstruction, with VisPhyBench showing strong semantics but weak parameter inference and dynamics simulation.
Plan-and-Verify Video Reward Reasoning with Spatio-Temporal Scene Graph Grounding cs.CV · 2026-06-10 · unverdicted · none · ref 2
SG-PVR introduces plan-and-verify reasoning grounded in spatio-temporal scene graphs to address verification gaps and implicit evidence in existing T2V reward models.
CineDance: Towards Next-Generation Multi-Shot Long-Form Cinematic Audio-Video Generation cs.CV · 2026-06-08 · unverdicted · none · ref 18
Introduces CineDance-1M dataset for multi-shot long-form text-to-audio-video generation along with CineBench and a model adaptation.
LoViF 2026 The First Challenge on Holistic Quality Assessment for 4D World Model (PhyScore) cs.CV · 2026-05-06 · conditional · none · ref 15
The PhyScore challenge creates the first benchmark requiring metrics to jointly score video quality, physical realism, condition alignment, and temporal consistency while localizing physical anomalies in 1554 videos from seven generative models across text-to-2D, image-to-4D, and video-to-4D tracks.

Huynh-Thu, Q

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer