{"record_type":"pith_number_record","schema_url":"https://pith.science/schemas/pith-number/v1.json","pith_number":"pith:2024:IWZ5ZRQFGNOUJ4M5HMBT663Y4O","short_pith_number":"pith:IWZ5ZRQF","schema_version":"1.0","canonical_sha256":"45b3dcc605335d44f19d3b033f7b78e39bcf13b6b77c4670fdfb2c06b3004c9c","source":{"kind":"arxiv","id":"2410.05363","version":1},"attestation_state":"computed","paper":{"title":"Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"","cross_cats":[],"primary_cat":"cs.CV","authors_text":"Dianqi Li, Fanqing Meng, Jiaqi Liao, Kaipeng Zhang, Ping Luo, Quanfeng Lu, Wenqi Shao, Xinyu Tan, Yu Cheng, Yu Qiao","submitted_at":"2024-10-07T17:56:04Z","abstract_excerpt":"Text-to-video (T2V) models like Sora have made significant strides in visualizing complex prompts, which is increasingly viewed as a promising path towards constructing the universal world simulator. Cognitive psychologists believe that the foundation for achieving this goal is the ability to understand intuitive physics. However, the capacity of these models to accurately represent intuitive physics remains largely unexplored. To bridge this gap, we introduce PhyGenBench, a comprehensive \\textbf{Phy}sics \\textbf{Gen}eration \\textbf{Ben}chmark designed to evaluate physical commonsense correctn"},"verification_status":{"content_addressed":true,"pith_receipt":true,"author_attested":false,"weak_author_claims":0,"strong_author_claims":0,"externally_anchored":false,"storage_verified":false,"citation_signatures":0,"replication_records":0,"graph_snapshot":true,"references_resolved":false,"formal_links_present":false},"canonical_record":{"source":{"id":"2410.05363","kind":"arxiv","version":1},"metadata":{"license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","primary_cat":"cs.CV","submitted_at":"2024-10-07T17:56:04Z","cross_cats_sorted":[],"title_canon_sha256":"6dbe6eba003100b64423c486c364856307e92e060b44088407a36cbde977c653","abstract_canon_sha256":"af1b19b17d70bec5f423adc0fd86d65269ac5b5c00ddf83354ee49c5a3ddc727"},"schema_version":"1.0"},"receipt":{"kind":"pith_receipt","key_id":"pith-v1-2026-05","algorithm":"ed25519","signed_at":"2026-05-18T14:34:27.341186Z","signature_b64":"lNbbeosh5WNp5TcYoHTLnCiNk+Eee4Xhan7LuVDms6gG5JB4ldVho3cGNo97XGImA4O7TLrRFeU0WfSxf4ZjAw==","signed_message":"canonical_sha256_bytes","builder_version":"pith-number-builder-2026-05-17-v1","receipt_version":"0.3","canonical_sha256":"45b3dcc605335d44f19d3b033f7b78e39bcf13b6b77c4670fdfb2c06b3004c9c","last_reissued_at":"2026-05-18T14:34:27.334997Z","signature_status":"signed_v1","first_computed_at":"2026-05-18T14:34:27.334997Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"graph_snapshot":{"paper":{"title":"Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"","cross_cats":[],"primary_cat":"cs.CV","authors_text":"Dianqi Li, Fanqing Meng, Jiaqi Liao, Kaipeng Zhang, Ping Luo, Quanfeng Lu, Wenqi Shao, Xinyu Tan, Yu Cheng, Yu Qiao","submitted_at":"2024-10-07T17:56:04Z","abstract_excerpt":"Text-to-video (T2V) models like Sora have made significant strides in visualizing complex prompts, which is increasingly viewed as a promising path towards constructing the universal world simulator. Cognitive psychologists believe that the foundation for achieving this goal is the ability to understand intuitive physics. However, the capacity of these models to accurately represent intuitive physics remains largely unexplored. To bridge this gap, we introduce PhyGenBench, a comprehensive \\textbf{Phy}sics \\textbf{Gen}eration \\textbf{Ben}chmark designed to evaluate physical commonsense correctn"},"claims":{"count":0,"items":[],"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"source":{"id":"2410.05363","kind":"arxiv","version":1},"verdict":{"id":null,"model_set":{},"created_at":null,"strongest_claim":"","one_line_summary":"","pipeline_version":null,"weakest_assumption":"","pith_extraction_headline":""},"references":{"count":0,"sample":[],"resolved_work":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57","internal_anchors":0},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"},"aliases":[{"alias_kind":"arxiv","alias_value":"2410.05363","created_at":"2026-05-18T14:34:27.335115+00:00"},{"alias_kind":"arxiv_version","alias_value":"2410.05363v1","created_at":"2026-05-18T14:34:27.335115+00:00"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.2410.05363","created_at":"2026-05-18T14:34:27.335115+00:00"},{"alias_kind":"pith_short_12","alias_value":"IWZ5ZRQFGNOU","created_at":"2026-05-18T14:34:27.335115+00:00"},{"alias_kind":"pith_short_16","alias_value":"IWZ5ZRQFGNOUJ4M5","created_at":"2026-05-18T14:34:27.335115+00:00"},{"alias_kind":"pith_short_8","alias_value":"IWZ5ZRQF","created_at":"2026-05-18T14:34:27.335115+00:00"}],"events":[],"event_summary":{},"paper_claims":[],"inbound_citations":{"count":24,"internal_anchor_count":24,"sample":[{"citing_arxiv_id":"2605.23699","citing_title":"CRONOS: Benchmarking Counterfactual Physical Consistency in Video Models","ref_index":36,"is_internal_anchor":true},{"citing_arxiv_id":"2602.13294","citing_title":"VisPhyWorld: Probing Physical Reasoning via Code-Driven Video Reconstruction","ref_index":31,"is_internal_anchor":true},{"citing_arxiv_id":"2603.21743","citing_title":"CellFluxRL: Biologically-Constrained Virtual Cell Modeling via Reinforcement Learning","ref_index":31,"is_internal_anchor":true},{"citing_arxiv_id":"2603.21002","citing_title":"SURF: Signature-Retained Fast Video Generation","ref_index":23,"is_internal_anchor":true},{"citing_arxiv_id":"2512.01843","citing_title":"PhyDetEx: Detecting and Explaining the Physical Plausibility of T2V Models","ref_index":31,"is_internal_anchor":true},{"citing_arxiv_id":"2605.18396","citing_title":"NEWTON: Agentic Planning for Physically Grounded Video Generation","ref_index":22,"is_internal_anchor":true},{"citing_arxiv_id":"2605.19242","citing_title":"PhyWorld: Physics-Faithful World Model for Video Generation","ref_index":51,"is_internal_anchor":true},{"citing_arxiv_id":"2509.24702","citing_title":"Enhancing Physical Plausibility in Video Generation by Reasoning the Implausibility","ref_index":17,"is_internal_anchor":true},{"citing_arxiv_id":"2510.20206","citing_title":"RAPO++: Cross-Stage Prompt Optimization for Text-to-Video Generation via Data Alignment and Test-Time Scaling","ref_index":31,"is_internal_anchor":true},{"citing_arxiv_id":"2512.13609","citing_title":"Do-Undo Bench: Reversibility for Action Understanding in Image Generation","ref_index":22,"is_internal_anchor":true},{"citing_arxiv_id":"2512.13281","citing_title":"VideoASMR-Bench: Can AI-Generated ASMR Videos Fool VLMs and Humans?","ref_index":32,"is_internal_anchor":true},{"citing_arxiv_id":"2508.05635","citing_title":"Genie Envisioner: A Unified World Foundation Platform for Robotic Manipulation","ref_index":23,"is_internal_anchor":true},{"citing_arxiv_id":"2605.15185","citing_title":"Quantitative Video World Model Evaluation for Geometric-Consistency","ref_index":21,"is_internal_anchor":true},{"citing_arxiv_id":"2605.14269","citing_title":"PhyMotion: Structured 3D Motion Reward for Physics-Grounded Human Video Generation","ref_index":12,"is_internal_anchor":true},{"citing_arxiv_id":"2603.21743","citing_title":"CellFluxRL: Biologically-Constrained Virtual Cell Modeling via Reinforcement Learning","ref_index":31,"is_internal_anchor":true},{"citing_arxiv_id":"2503.21755","citing_title":"VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness","ref_index":72,"is_internal_anchor":true},{"citing_arxiv_id":"2505.13211","citing_title":"MAGI-1: Autoregressive Video Generation at Scale","ref_index":32,"is_internal_anchor":true},{"citing_arxiv_id":"2605.12090","citing_title":"World Action Models: The Next Frontier in Embodied AI","ref_index":213,"is_internal_anchor":true},{"citing_arxiv_id":"2605.10434","citing_title":"WorldReasonBench: Human-Aligned Stress Testing of Video Generators as Future World-State Predictors","ref_index":16,"is_internal_anchor":true},{"citing_arxiv_id":"2605.10806","citing_title":"PhyGround: Benchmarking Physical Reasoning in Generative World Models","ref_index":26,"is_internal_anchor":true},{"citing_arxiv_id":"2604.19193","citing_title":"How Far Are Video Models from True Multimodal Reasoning?","ref_index":48,"is_internal_anchor":true},{"citing_arxiv_id":"2604.07990","citing_title":"SceneScribe-1M: A Large-Scale Video Dataset with Comprehensive Geometric and Semantic Annotations","ref_index":34,"is_internal_anchor":true},{"citing_arxiv_id":"2605.07061","citing_title":"Do Joint Audio-Video Generation Models Understand Physics?","ref_index":28,"is_internal_anchor":true},{"citing_arxiv_id":"2604.15299","citing_title":"AnimationBench: Are Video Models Good at Character-Centric Animation?","ref_index":10,"is_internal_anchor":true}]},"formal_canon":{"evidence_count":0,"sample":[],"anchors":[]},"links":{"html":"https://pith.science/pith/IWZ5ZRQFGNOUJ4M5HMBT663Y4O","json":"https://pith.science/pith/IWZ5ZRQFGNOUJ4M5HMBT663Y4O.json","graph_json":"https://pith.science/api/pith-number/IWZ5ZRQFGNOUJ4M5HMBT663Y4O/graph.json","events_json":"https://pith.science/api/pith-number/IWZ5ZRQFGNOUJ4M5HMBT663Y4O/events.json","paper":"https://pith.science/paper/IWZ5ZRQF"},"agent_actions":{"view_html":"https://pith.science/pith/IWZ5ZRQFGNOUJ4M5HMBT663Y4O","download_json":"https://pith.science/pith/IWZ5ZRQFGNOUJ4M5HMBT663Y4O.json","view_paper":"https://pith.science/paper/IWZ5ZRQF","resolve_alias":"https://pith.science/api/pith-number/resolve?arxiv=2410.05363&json=true","fetch_graph":"https://pith.science/api/pith-number/IWZ5ZRQFGNOUJ4M5HMBT663Y4O/graph.json","fetch_events":"https://pith.science/api/pith-number/IWZ5ZRQFGNOUJ4M5HMBT663Y4O/events.json","actions":{"anchor_timestamp":"https://pith.science/pith/IWZ5ZRQFGNOUJ4M5HMBT663Y4O/action/timestamp_anchor","attest_storage":"https://pith.science/pith/IWZ5ZRQFGNOUJ4M5HMBT663Y4O/action/storage_attestation","attest_author":"https://pith.science/pith/IWZ5ZRQFGNOUJ4M5HMBT663Y4O/action/author_attestation","sign_citation":"https://pith.science/pith/IWZ5ZRQFGNOUJ4M5HMBT663Y4O/action/citation_signature","submit_replication":"https://pith.science/pith/IWZ5ZRQFGNOUJ4M5HMBT663Y4O/action/replication_record"}},"created_at":"2026-05-18T14:34:27.335115+00:00","updated_at":"2026-05-18T14:34:27.335115+00:00"}