Aero-World adapts a pretrained latent diffusion transformer for action-conditioned aerial video generation by injecting inertial action tokens and using a frozen latent-space Physics Probe for inertial consistency supervision during LoRA finetuning, with a new AeroBench benchmark showing improved AA
Serpiva et al
5 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
representative citing papers
The survey frames VLA models as pipelines that generate progressively grounded action tokens and classifies those tokens into eight types to guide future development.
An open-sourced Unified Autonomy Stack fuses LiDAR, radar, vision and inertial data with sampling-based planning and control barrier functions to deliver resilient autonomy on aerial and ground robots in challenging real-world settings.
A survey of UAV vision-and-language navigation that establishes a methodological taxonomy, reviews resources and challenges, and proposes a forward-looking research roadmap.
GRaD-Nav++ combines 3D Gaussian Splatting simulation and differentiable RL to train an onboard VLA policy that achieves 50-83% success on language-guided drone navigation tasks in simulation and real hardware.
citing papers explorer
-
Vision-and-Language Navigation for UAVs: Progress, Challenges, and a Research Roadmap
A survey of UAV vision-and-language navigation that establishes a methodological taxonomy, reviews resources and challenges, and proposes a forward-looking research roadmap.