Cogact: A foundational vision-language-action model for synergizing cognition and action in robotic manipulation

Qixiu Li, Yaobo Liang, Zeyu Wang, Lin Luo, Xi Chen, Mozheng Liao, Fangyun Wei, Yu Deng, Sicheng Xu, Yizhong Zhang, Xiaofan Wang, Bei Liu, Jianlong Fu, Jianmin Bao, Dong Chen, Yuanchun Shi, Jiaolong Yang, Baining Guo · 2024

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

browse 2 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

VLAs are Confined yet Capable of Generalizing to Novel Instructions

cs.RO · 2025-05-06 · unverdicted · novelty 7.0

Averaging and temporally interpolating text latents in VLAs enables 83% success on novel task combinations in the libero-ood benchmark where SOTA models achieve under 15%.

UAV-Track VLA: Embodied Aerial Tracking via Vision-Language-Action Models

cs.CV · 2026-04-02 · conditional · novelty 6.0

UAV-Track VLA modifies the π0.5 VLA architecture with temporal compression and dual-branch decoding to reach 61.76% success and 269.65 average frames in long-distance pedestrian tracking on a new 890K-frame UAV dataset, while cutting inference latency by 33.4%.

citing papers explorer

Showing 2 of 2 citing papers.

VLAs are Confined yet Capable of Generalizing to Novel Instructions cs.RO · 2025-05-06 · unverdicted · none · ref 20
Averaging and temporally interpolating text latents in VLAs enables 83% success on novel task combinations in the libero-ood benchmark where SOTA models achieve under 15%.
UAV-Track VLA: Embodied Aerial Tracking via Vision-Language-Action Models cs.CV · 2026-04-02 · conditional · none · ref 33
UAV-Track VLA modifies the π0.5 VLA architecture with temporal compression and dual-branch decoding to reach 61.76% success and 269.65 average frames in long-distance pedestrian tracking on a new 890K-frame UAV dataset, while cutting inference latency by 33.4%.

Cogact: A foundational vision-language-action model for synergizing cognition and action in robotic manipulation

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer