arxiv: 1806.00047 · v1 · pith:YTUCDRSWnew · submitted 2018-05-31 · 💻 cs.AI · cs.CL · cs.CV · cs.LG · cs.RO

Following High-level Navigation Instructions on a Simulated Quadcopter with Imitation Learning

classification 💻 cs.AI · cs.CL · cs.CV · cs.LG · cs.RO
keywords explicit · gsmn · instructions · mapping · model · network · following · high-level
We introduce a method for following high-level navigation instructions by mapping directly from images, instructions, and pose estimates to continuous low-level velocity commands for real-time control. The Grounded Semantic Mapping Network (GSMN) is a fully differentiable neural network architecture that builds an explicit semantic map in the world reference frame by incorporating a pinhole camera projection model within the network. The information stored in the map is learned from experience, while the local-to-world transformation is computed explicitly. We train the model using DAggerFM, a variant of DAgger that trades tabular convergence guarantees for improved training speed and memory use. We test GSMN in virtual environments on a realistic quadcopter simulator and show that incorporating explicit mapping and grounding modules allows GSMN to outperform strong neural baselines and nearly match the performance of an expert policy. Finally, we analyze the learned map representations and show that using an explicit map leads to an interpretable instruction-following model.
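The explicitly computed local-to-world transformation mentioned in the abstract can be illustrated with ordinary pinhole geometry. The sketch below is not the GSMN implementation: it only shows, for a single pixel, how a camera ray can be mapped to a point on a flat ground plane given a pose estimate. The function name `pixel_to_ground`, the flat-ground assumption (world plane z = 0), and the camera convention (x right, y down, z along the optical axis) are all assumptions made for illustration.

```python
import numpy as np

def pixel_to_ground(u, v, K, R_wc, cam_pos):
    """Map pixel (u, v) to a world-frame point on the plane z = 0.

    K       : 3x3 camera intrinsics matrix (assumed known).
    R_wc    : 3x3 rotation taking camera-frame vectors to the world frame.
    cam_pos : camera position in the world frame, shape (3,).
    Returns None when the ray never reaches the ground plane.
    """
    cam_pos = np.asarray(cam_pos, dtype=float)
    # Back-project the pixel to a ray direction in the camera frame.
    ray_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])
    # Rotate the ray into the world frame using the pose estimate.
    ray_world = R_wc @ ray_cam
    if abs(ray_world[2]) < 1e-9:
        return None  # ray is parallel to the ground plane
    # Solve cam_pos.z + s * ray_world.z = 0 for the ray scale s.
    s = -cam_pos[2] / ray_world[2]
    if s <= 0:
        return None  # intersection lies behind the camera
    return cam_pos + s * ray_world
```

For a camera 2 m above the ground looking straight down, with the principal point at pixel (50, 50), the central pixel maps to the point directly beneath the camera. In a learned model, the same geometry would be applied to every location of a feature map rather than to a single pixel.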

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Towards Precise Intent-Aligned VLA Aerial Navigation via Expert-Guided GRPO

    cs.RO 2026-06 unverdicted novelty 4.0

    EG-GRPO augments VLA aerial navigation with expert-guided group relative policy optimization and a faster simulation pipeline, claiming 2.13x success rate and 60.9% better intent alignment versus SFT baseline.