DeVI enables zero-shot physically plausible dexterous control by imitating synthetic videos via a hybrid 3D-human plus 2D-object tracking reward.
Depth anything: Unleashing the power of large-scale unlabeled data
4 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
fields
cs.CV 4years
2026 4verdicts
UNVERDICTED 4roles
method 1polarities
use method 1representative citing papers
VGGT-Occ embeds geometric tokens via PA-DA and uses sequential coarse-to-fine gated fusion to reach 33.00% IoU and 21.08% mIoU on SurroundOcc-nuScenes while using only ~41M parameters in the occupancy head.
EponaV2 advances perception-free driving world models by forecasting comprehensive future 3D geometry and semantic representations, achieving SOTA planning performance on NAVSIM benchmarks.
Sword improves world model simulators for VLA policies by disentangling visual style from dynamics and bootstrapping latents for better consistency, outperforming baselines on LIBERO in generalization and RL post-training success.
citing papers explorer
-
DeVI: Physics-based Dexterous Human-Object Interaction via Synthetic Video Imitation
DeVI enables zero-shot physically plausible dexterous control by imitating synthetic videos via a hybrid 3D-human plus 2D-object tracking reward.
-
VGGT-Occ: Geometry-Grounded and Density-Aware Gated Fusion for 3D Occupancy Prediction
VGGT-Occ embeds geometric tokens via PA-DA and uses sequential coarse-to-fine gated fusion to reach 33.00% IoU and 21.08% mIoU on SurroundOcc-nuScenes while using only ~41M parameters in the occupancy head.
-
EponaV2: Driving World Model with Comprehensive Future Reasoning
EponaV2 advances perception-free driving world models by forecasting comprehensive future 3D geometry and semantic representations, achieving SOTA planning performance on NAVSIM benchmarks.
-
Sword: Style-Robust World Models as Simulators via Dynamic Latent Bootstrapping for VLA Policy Post-Training
Sword improves world model simulators for VLA policies by disentangling visual style from dynamics and bootstrapping latents for better consistency, outperforming baselines on LIBERO in generalization and RL post-training success.