arXiv: 1903.01292 · v1 · pith:OOOT3QQZnew · submitted 2019-03-04 · cs.AI · cs.CV · cs.RO

The StreetLearn Environment and Dataset

Keywords: navigation, environment, datasets, learning, StreetLearn, agent, code, dataset

Navigation is a rich and well-grounded problem domain that drives progress in many different areas of research: perception, planning, memory, exploration, and optimisation in particular. Historically, these challenges have been considered separately, and the solutions built for them rely on stationary datasets, for example recorded trajectories through an environment. Such datasets cannot be used for decision-making and reinforcement learning, however, and in general the view of navigation as an interactive learning task, in which a learning agent's actions and behaviours are learned simultaneously with perception and planning, is relatively unsupported. As a result, existing navigation benchmarks generally rely on static datasets (Geiger et al., 2013; Kendall et al., 2015) or simulators (Beattie et al., 2016; Shah et al., 2018). To support and validate research in end-to-end navigation, we present StreetLearn: an interactive, first-person, partially observed visual environment that uses Google Street View for its photographic content and broad coverage. We also give performance baselines for a challenging goal-driven navigation task. The environment code, baseline agent code, and dataset are available at http://streetlearn.cc

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Audio-Visual Camera Pose Estimation with Passive Scene Sounds and In-the-Wild Video

    cs.CV · 2025-12 · unverdicted · novelty 8.0

    Integrating direction-of-arrival spectra and binaural embeddings from passive audio into vision models improves relative camera pose estimation in in-the-wild videos and adds robustness to visual corruption.

  2. Navig-AI-tion: Navigation by Contextual AI and Spatial Audio

    cs.HC · 2026-03 · unverdicted · novelty 7.0

    A system combining VLM landmark instructions with real-time corrective spatial audio reduces route deviations in a small user study compared to VLM-only and Google Maps audio baselines.

  3. Vision-Language Navigation for Aerial Robots: Towards the Era of Large Language Models

    cs.RO · 2026-04 · unverdicted · novelty 4.0

    This survey organizes aerial vision-language navigation methods into five architectural categories, critically reviews evaluation infrastructure, and synthesizes seven open problems for LLM/VLM integration.