{"total":11,"items":[{"citing_arxiv_id":"2605.21372","ref_index":38,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Closed Loop Dynamic Driving Data Mixture for Real-Synthetic Co-Training","primary_cat":"cs.CV","submitted_at":"2026-05-20T16:36:26+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"AutoScale is a closed-loop data engine using Graph-RAE for scene representation and Cluster-GA for importance-based retrieval to improve real-synthetic co-training for autonomous driving.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.11596","ref_index":17,"ref_count":2,"confidence":0.9,"is_internal_anchor":false,"paper_title":"HorizonDrive: Self-Corrective Autoregressive World Model for Long-horizon Driving Simulation","primary_cat":"cs.CV","submitted_at":"2026-05-12T06:22:16+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"HorizonDrive is a new anti-drifting autoregressive training and distillation method that enables minute-scale stable driving video rollouts by making the teacher model rollout-capable via scheduled rollout recovery and teacher rollout DMD.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.06912","ref_index":29,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Advancing Reliable Synthetic Video Detection: Insights from the SAFE Challenge","primary_cat":"cs.CV","submitted_at":"2026-05-07T20:22:32+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"The SAFE challenge shows measurable progress in detecting synthetic videos across different generators but persistent weaknesses against post-processing 
operations.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.17147","ref_index":37,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"ScenarioControl: Vision-Language Controllable Vectorized Latent Scenario Generation","primary_cat":"cs.CV","submitted_at":"2026-04-18T21:00:26+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"ScenarioControl introduces the first vision-language controllable generator for realistic vectorized 3D driving scenarios with temporal consistency across actor views.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.16592","ref_index":138,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Human Cognition in Machines: A Unified Perspective of World Models","primary_cat":"cs.RO","submitted_at":"2026-04-17T17:51:46+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"The paper introduces a unified framework for world models that fully incorporates all cognitive functions from Cognitive Architecture Theory, highlights under-researched areas in motivation and meta-cognition, and proposes Epistemic World Models as a new category for scientific discovery agents.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"World Models can pair imagined roll outs with safety estimates to guide actor-critic optimization under additional constraints, such as VLM-based safety signals [137]. 
At larger scale, foundation video models generate zero-shot trajectory plans from internet-scale data, which can be converted into executable robot actions [29], while broader dreaming pipelines [121], including Cosmos-Drive-Dreams [138], GAIA-1, GAIA-2 [67,143], and Dream4Drive [213] use synthetic rollouts as training data. Benchmarks such as WorldLens evaluate the physical fidelity of these imagined trajectories [106], reinforcing imagination as a core mechanism for bridging perception, reasoning, and action in embodied settings. Motivation. Motivation in embodied World Models defines how agents evalu-"},{"citing_arxiv_id":"2603.28489","ref_index":183,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Video Generation Models as World Models: Efficient Paradigms, Architectures and Algorithms","primary_cat":"eess.IV","submitted_at":"2026-03-30T14:23:45+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Video generation models can function as world simulators if efficiency gaps in spatiotemporal modeling are bridged via organized paradigms, architectures, and algorithms.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"22 98.06 TABLE III APPLICATIONS OF VIDEO-BASED WORLD MODELS Application Data Synthesis Interactive Simulation Generative Planning Autonomous Driving GAIA [18], [171], [172], DriveDreamer4D [173], InfinityDrive [174], Glad [175], STAGE [176], UniScene [177], WorldSplat [178], EOT-WM [179], WoVoGen [180], Cosmos-Drive-Dreams [181] Drive-WM [182], Vista [183], MiLA [184], ADriver-I [185], [186], Drivedreamer [19], MagicDrive-V2 [40], DriveArena [187], MAD [188] Epona [189], GenAD [190], DriveLaW [191], DrivingGPT [192], VaVAM [193] Embodied AI Vidar [194], DreamGen [195], GenMimic [196], RBench [197], GigaWorld-0 [198], RIGVid [199], LuciBot [200], Gen2Act [201], Dreamitate [202] World-Env [203], EVAC 
[204],"},{"citing_arxiv_id":"2601.20540","ref_index":57,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Advancing Open-source World Models","primary_cat":"cs.CV","submitted_at":"2026-01-28T12:37:01+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"LingBot-World is presented as an open-source world model that delivers high-fidelity simulation, minute-level contextual consistency, and real-time interactivity under one second latency.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2512.23421","ref_index":59,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"DriveLaW:Unifying Planning and Video Generation in a Latent Driving World","primary_cat":"cs.CV","submitted_at":"2025-12-29T12:32:27+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"DriveLaW unifies video world modeling and trajectory planning by injecting video-generator latents into a diffusion planner, achieving SOTA video prediction and a new record on the NAVSIM planning benchmark.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2511.00062","ref_index":67,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"World Simulation with Video Foundation Models for Physical AI","primary_cat":"cs.CV","submitted_at":"2025-10-28T22:44:13+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Cosmos-Predict2.5 unifies text-to-world, image-to-world, and video-to-world generation in one model trained on 200M clips with RL post-training, delivering improved quality and control for physical AI.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Grounded sam: 
Assembling open-world models for diverse visual tasks. arXiv preprint arXiv:2401.14159, 2024. 22 [66] Xuanchi Ren, Yifan Lu, Tianshi Cao, Ruiyuan Gao, Shengyu Huang, Amirmojtaba Sabour, Tianchang Shen, Tobias Pfaff, Jay Zhangjie Wu, Runjian Chen, et al. Cosmos-drive-dreams: Scalable synthetic driving data generation with world foundation models. arXiv preprint arXiv:2506.09042, 2025. 3, 25, 28, 36 [67] Xuanchi Ren, Tianchang Shen, Jiahui Huang, Huan Ling, Yifan Lu, Merlin Nimier-David, Thomas Müller, Alexander Keller, Sanja Fidler, and Jun Gao. Gen3c: 3d-informed world-consistent video generation with precise camera control. In CVPR, 2025. 36 [68] Runway. Gen 3, 2024. URL https://runwayml.com/research/introducing-gen-3-alpha. 35 [69] Paul D Sampson."},{"citing_arxiv_id":"2510.10125","ref_index":38,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Ctrl-World: A Controllable Generative World Model for Robot Manipulation","primary_cat":"cs.RO","submitted_at":"2025-10-11T09:13:10+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A controllable world model trained on the DROID dataset generates consistent multi-view robot trajectories for over 20 seconds and improves generalist policy success rates by 44.7% via imagined trajectory fine-tuning.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2508.10934","ref_index":55,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"ViPE: Video Pose Engine for 3D Geometric Perception","primary_cat":"cs.CV","submitted_at":"2025-08-12T18:39:13+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"ViPE estimates camera intrinsics, motion, and dense near-metric depth from uncalibrated videos, outperforming baselines on TUM and KITTI while releasing annotations for 96M frames across real 
and generated videos.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}