arXiv preprint arXiv:2410.10563 , year=

· 2024 · arXiv 2410.10563

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

read on arXiv browse 2 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Qwen-RobotWorld Technical Report: Unifying Embodied World Modeling through Language-Conditioned Video Generation

cs.CV · 2026-06-15 · unverdicted · novelty 6.0

Qwen-RobotWorld is a language-conditioned video world model using Double-Stream MMDiT, an 8.6M-frame embodied corpus, and progressive curriculum training that ranks first on EWMBench and DreamGen Bench.

From Structure to Synergy: A Survey of Vision-Language Perception Paradigm Evolution in Multimodal Large Language Models

cs.CL · 2026-06-24 · unverdicted · novelty 5.0

The survey formalizes MLLM perception as a unified vision-language capability and traces its evolution via a new five-stage taxonomy while outlining future challenges.

citing papers explorer

Showing 1 of 1 citing paper after filters.

Qwen-RobotWorld Technical Report: Unifying Embodied World Modeling through Language-Conditioned Video Generation cs.CV · 2026-06-15 · unverdicted · none · ref 191
Qwen-RobotWorld is a language-conditioned video world model using Double-Stream MMDiT, an 8.6M-frame embodied corpus, and progressive curriculum training that ranks first on EWMBench and DreamGen Bench.

arXiv preprint arXiv:2410.10563 , year=

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer