Citynav: Language-goal aerial navigation dataset with geographic information

Jungdae Lee, Taiki Miyanishi, Shuhei Kurita, Koya Sakamoto, Daichi Azuma, Yutaka Matsuo, Nakamasa Inoue · 2025 · arXiv 2406.14240

7 Pith papers cite this work. Polarity classification is still indexing.

7 Pith papers citing it

read on arXiv browse 7 citing papers

citation-role summary

background 4

citation-polarity summary

background 4

representative citing papers

CosFly-Track: A Large-Scale Multi-Modal Dataset for UAV Visual Tracking via Multi-Constraint Trajectory Optimization

cs.RO · 2026-05-18 · conditional · novelty 7.0 · 2 refs

CosFlyTrack provides 12,000 expert UAV trajectories with aligned RGB, depth, segmentation, pose, target state, and bilingual instructions to train visual tracking agents, yielding 53-69 point gains in success rate after fine-tuning.

Beyond Isolation: A Unified Benchmark for General-Purpose Navigation

cs.RO · 2026-05-10 · unverdicted · novelty 7.0

OmniNavBench is a unified benchmark for general-purpose navigation featuring composite multi-skill instructions, support for humanoid, quadrupedal and wheeled robots, and 1779 human teleoperated trajectories across 170 environments.

ESARBench: A Benchmark for Agentic UAV Embodied Search and Rescue

cs.RO · 2026-05-02 · unverdicted · novelty 7.0

ESARBench is the first unified benchmark for MLLM-driven UAV agents that must explore, locate clues, and decide on victim positions in photorealistic simulated SAR environments.

LookasideVLN: Direction-Aware Aerial Vision-and-Language Navigation

cs.CV · 2026-04-19 · unverdicted · novelty 7.0

LookasideVLN improves aerial vision-and-language navigation by encoding directional cues from instructions into an egocentric graph and lightweight knowledge base, outperforming prior methods like CityNavAgent even with single-step lookahead.

How Far Are Large Multimodal Models from Human-Level Spatial Action? A Benchmark for Goal-Oriented Embodied Navigation in Urban Airspace

cs.AI · 2026-04-09 · unverdicted · novelty 7.0

Large multimodal models display emerging but limited spatial action capabilities in goal-oriented urban 3D navigation, remaining far from human-level performance with errors diverging rapidly after critical decision points.

FineCog-Nav: Integrating Fine-grained Cognitive Modules for Zero-shot Multimodal UAV Navigation

cs.CV · 2026-04-17 · unverdicted · novelty 6.0

FineCog-Nav uses fine-grained cognitive modules driven by foundation models to outperform zero-shot baselines in UAV navigation and introduces the AerialVLN-Fine benchmark with refined instructions.

HTNav: A Hybrid Navigation Framework with Tiered Structure for Urban Aerial Vision-and-Language Navigation

cs.RO · 2026-04-10 · unverdicted · novelty 6.0

HTNav combines imitation and reinforcement learning in a staged, tiered structure with map learning to reach state-of-the-art performance on the CityNav benchmark for urban aerial navigation.

citing papers explorer

Showing 7 of 7 citing papers.

CosFly-Track: A Large-Scale Multi-Modal Dataset for UAV Visual Tracking via Multi-Constraint Trajectory Optimization cs.RO · 2026-05-18 · conditional · none · ref 3 · 2 links
CosFlyTrack provides 12,000 expert UAV trajectories with aligned RGB, depth, segmentation, pose, target state, and bilingual instructions to train visual tracking agents, yielding 53-69 point gains in success rate after fine-tuning.
Beyond Isolation: A Unified Benchmark for General-Purpose Navigation cs.RO · 2026-05-10 · unverdicted · none · ref 17
OmniNavBench is a unified benchmark for general-purpose navigation featuring composite multi-skill instructions, support for humanoid, quadrupedal and wheeled robots, and 1779 human teleoperated trajectories across 170 environments.
ESARBench: A Benchmark for Agentic UAV Embodied Search and Rescue cs.RO · 2026-05-02 · unverdicted · none · ref 31
ESARBench is the first unified benchmark for MLLM-driven UAV agents that must explore, locate clues, and decide on victim positions in photorealistic simulated SAR environments.
LookasideVLN: Direction-Aware Aerial Vision-and-Language Navigation cs.CV · 2026-04-19 · unverdicted · none · ref 23
LookasideVLN improves aerial vision-and-language navigation by encoding directional cues from instructions into an egocentric graph and lightweight knowledge base, outperforming prior methods like CityNavAgent even with single-step lookahead.
How Far Are Large Multimodal Models from Human-Level Spatial Action? A Benchmark for Goal-Oriented Embodied Navigation in Urban Airspace cs.AI · 2026-04-09 · unverdicted · none · ref 22
Large multimodal models display emerging but limited spatial action capabilities in goal-oriented urban 3D navigation, remaining far from human-level performance with errors diverging rapidly after critical decision points.
FineCog-Nav: Integrating Fine-grained Cognitive Modules for Zero-shot Multimodal UAV Navigation cs.CV · 2026-04-17 · unverdicted · none · ref 18
FineCog-Nav uses fine-grained cognitive modules driven by foundation models to outperform zero-shot baselines in UAV navigation and introduces the AerialVLN-Fine benchmark with refined instructions.
HTNav: A Hybrid Navigation Framework with Tiered Structure for Urban Aerial Vision-and-Language Navigation cs.RO · 2026-04-10 · unverdicted · none · ref 19
HTNav combines imitation and reinforcement learning in a staged, tiered structure with map learning to reach state-of-the-art performance on the CityNav benchmark for urban aerial navigation.

Citynav: Language-goal aerial navigation dataset with geographic information

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer