hub Canonical reference

Hume: Introducing System-2 Thinking in Visual-Language-Action Model. arXiv preprint arXiv:2505.21432

Canonical reference. 100% of citing Pith papers cite this work as background.

14 Pith papers citing it
Background 100% of classified citations

citation-role summary

background 5

citation-polarity summary

years

2026: 10 · 2025: 4

polarities

background 5

representative citing papers

UAV-Track VLA: Embodied Aerial Tracking via Vision-Language-Action Models

cs.CV · 2026-04-02 · conditional · novelty 6.0

UAV-Track VLA modifies the π0.5 VLA architecture with temporal compression and dual-branch decoding to reach 61.76% success and 269.65 average frames in long-distance pedestrian tracking on a new 890K-frame UAV dataset, while cutting inference latency by 33.4%.
