GaussianPretrain: A simple unified 3D Gaussian representation for visual pre-training in autonomous driving
2 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary: background 1
citation-polarity summary
fields: cs.CV 2
verdicts: UNVERDICTED 2
roles: background 1
polarities: background 1
representative citing papers
ShelfGaussian achieves state-of-the-art zero-shot semantic occupancy prediction on Occ3D-nuScenes by jointly supervising Gaussian representations with vision foundation model features at 2D image and 3D scene levels.
citing papers explorer
- Xiaomi OneVL: One-Step Latent Reasoning and Planning with Vision-Language Explanation
  OneVL achieves superior accuracy to explicit chain-of-thought reasoning at answer-only latency by supervising latent tokens with a visual world model decoder that predicts future frames.
- ShelfGaussian: Shelf-Supervised Open-Vocabulary Gaussian-based 3D Scene Understanding
  ShelfGaussian achieves state-of-the-art zero-shot semantic occupancy prediction on Occ3D-nuScenes by jointly supervising Gaussian representations with vision foundation model features at 2D image and 3D scene levels.