Diffusion policy: Visuomotor policy learning via action diffusion. The International Journal of Robotics Research
4 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (4)
Verdicts: UNVERDICTED (4)
citing papers explorer
- Instance-level Visual Active Tracking with Occlusion-Aware Planning
  OA-VAT improves visual active tracking by combining instance-level prototype discrimination with occlusion-aware diffusion planning, reporting gains over prior SOTA on simulated and real drone benchmarks.
- CUBic: Coordinated Unified Bimanual Perception and Control Framework
  CUBic learns a shared tokenized representation for bimanual robot perception and control via unidirectional aggregation, bidirectional codebook coordination, and a unified diffusion policy, yielding higher coordination accuracy and task success on the RoboTwin benchmark.
- ConsisVLA-4D: Advancing Spatiotemporal Consistency in Efficient 3D-Perception and 4D-Reasoning for Robotic Manipulation
  ConsisVLA-4D adds cross-view semantic alignment, cross-object geometric fusion, and cross-scene dynamic reasoning to VLA models, delivering 21.6% and 41.5% gains plus 2.3x and 2.4x speedups on LIBERO and real-world tasks.
- Learning 3D Representations for Spatial Intelligence from Unposed Multi-View Images
  UniSplat learns consistent 3D geometry, appearance, and semantics from unposed images using dual masking, progressive Gaussian splatting, and recalibration to align predictions across tasks.