PhysInOne is a new dataset of 2 million videos across 153,810 dynamic 3D scenes covering 71 physical phenomena, shown to improve AI performance on physics-aware video generation, prediction, property estimation, and motion transfer.
Phyx: Does your model have the” wits” for physical reasoning?
9 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
representative citing papers
FeynmanBench is the first benchmark for evaluating multimodal LLMs on diagrammatic reasoning with Feynman diagrams, revealing systematic failures in enforcing physical constraints and global topology.
SeePhys Pro benchmark reveals multimodal models degrade on physics reasoning as information transfers from text to images, with blind training improvements often stemming from textual cues rather than visual evidence.
PolyReal benchmark shows leading MLLMs perform well on polymer knowledge reasoning but drop sharply on practical tasks like lab safety analysis and raw data extraction.
VLMs show answer inertia in CoT reasoning and remain influenced by misleading textual cues even with sufficient visual evidence, making CoT an incomplete window into modality reliance.
MLLMs exhibit a consistent recognition-reasoning inversion on discrete visual symbols across domains, underperforming on elementary perception while appearing competent on higher-level reasoning via linguistic compensation.
VisPhyWorld evaluates MLLMs' physical reasoning via executable code generation for video reconstruction, with VisPhyBench showing strong semantics but weak parameter inference and dynamics simulation.
Video foundation models infer dynamic physical properties such as elasticity, viscosity, and friction from videos at levels close to classical oracles while outperforming current MLLMs with suitable prompting.
citing papers explorer
-
PhysInOne: Visual Physics Learning and Reasoning in One Suite
PhysInOne is a new dataset of 2 million videos across 153,810 dynamic 3D scenes covering 71 physical phenomena, shown to improve AI performance on physics-aware video generation, prediction, property estimation, and motion transfer.
-
FeynmanBench: Benchmarking Multimodal LLMs on Diagrammatic Physics Reasoning
FeynmanBench is the first benchmark for evaluating multimodal LLMs on diagrammatic reasoning with Feynman diagrams, revealing systematic failures in enforcing physical constraints and global topology.
-
SeePhys Pro: Diagnosing Modality Transfer and Blind-Training Effects in Multimodal RLVR for Physics Reasoning
SeePhys Pro benchmark reveals multimodal models degrade on physics reasoning as information transfers from text to images, with blind training improvements often stemming from textual cues rather than visual evidence.
-
PolyReal: A Benchmark for Real-World Polymer Science Workflows
PolyReal benchmark shows leading MLLMs perform well on polymer knowledge reasoning but drop sharply on practical tasks like lab safety analysis and raw data extraction.
-
Reasoning Dynamics and the Limits of Monitoring Modality Reliance in Vision-Language Models
VLMs show answer inertia in CoT reasoning and remain influenced by misleading textual cues even with sufficient visual evidence, making CoT an incomplete window into modality reliance.
-
Cognitive Mismatch in Multimodal Large Language Models for Discrete Symbol Understanding
MLLMs exhibit a consistent recognition-reasoning inversion on discrete visual symbols across domains, underperforming on elementary perception while appearing competent on higher-level reasoning via linguistic compensation.
-
VisPhyWorld: Probing Physical Reasoning via Code-Driven Video Reconstruction
VisPhyWorld evaluates MLLMs' physical reasoning via executable code generation for video reconstruction, with VisPhyBench showing strong semantics but weak parameter inference and dynamics simulation.
-
Inferring Dynamic Physical Properties from Video Foundation Models
Video foundation models infer dynamic physical properties such as elasticity, viscosity, and friction from videos at levels close to classical oracles while outperforming current MLLMs with suitable prompting.
- Cognitive Pivot Points and Visual Anchoring: Unveiling and Rectifying Hallucinations in Multimodal Reasoning Models