Content-Rich AIGC Video Quality Assessment via Intricate Text Alignment and Motion-Aware Consistency
5 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary: background 1
citation-polarity summary: background 1
citing papers explorer
- PhyGround: Benchmarking Physical Reasoning in Generative World Models
  PhyGround is a new benchmark with curated prompts, a 13-law taxonomy, large-scale human annotations, and an open physics-specialized VLM judge for evaluating physical reasoning in generative video models.
- Comparison Drives Preference: Reference-Aware Modeling for AI-Generated Video Quality Assessment
  RefVQA uses a query-centered reference graph and graph-guided difference aggregation to improve AI-generated video quality assessment by incorporating inter-video comparisons.
- Navigating User Behavior toward Personalized Multimodal Generation
  NaviGen encodes user behavior via dual collaborative-textual identifiers and applies SFT+RL to produce personalized multimodal outputs and better instructions from interaction history.
- TailorMind: Towards Preference-Aligned Multimodal Content Generation
  TailorMind links hypergraph collaborative filtering and textual gradient descent with multimodal generation to produce user-tailored content, showing gains in novelty, aesthetics, and reranking recall on a new benchmark from three platforms.
- MASS: Motion-Aware Spatial-Temporal Grounding for Physics Reasoning and Comprehension in Vision-Language Models
  MASS adds spatiotemporal motion signals and 3D grounding to VLMs and releases MASS-Bench, yielding physics-reasoning performance within 2% of Gemini-2.5-Flash after reinforcement fine-tuning.