SEMNAV: Enhancing Visual Semantic Navigation in Robotics through Semantic Segmentation
Pith reviewed 2026-05-22 01:29 UTC · model grok-4.3
The pith
SEMNAV improves visual semantic navigation by using semantic segmentation maps instead of raw RGB images as input.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
SEMNAV demonstrates that replacing raw RGB observations with semantic segmentation labels as the primary visual representation allows a navigation policy to achieve higher success rates when locating target objects in unseen environments. The model is trained in simulation using the HM3D dataset inside Habitat 2.0 and is then deployed on real robotic platforms, where the semantic input reduces the performance drop caused by visual domain differences between rendered scenes and actual camera footage.
What carries the argument
The argument is carried by the SEMNAV model, which takes semantic segmentation maps as its main visual input for learning navigation policies toward target objects.
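To make "segmentation maps as input" concrete, here is a minimal sketch of one common way to hand a label map to a policy network: one binary mask per class. This is an illustration only. The review does not say how SEMNAV encodes its semantic input, so the `one_hot_encode` helper, the three-class vocabulary, and the tiny frame are all assumptions. Pure Python is used for portability; a real pipeline would more likely use NumPy or PyTorch tensors.

```python
from typing import List

LabelMap = List[List[int]]  # H x W grid of integer class ids


def one_hot_encode(label_map: LabelMap, num_classes: int) -> List[List[List[float]]]:
    """Turn an H x W label map into a C x H x W stack of binary masks."""
    height = len(label_map)
    width = len(label_map[0]) if height else 0
    channels = [[[0.0] * width for _ in range(height)] for _ in range(num_classes)]
    for row_idx, row in enumerate(label_map):
        for col_idx, class_id in enumerate(row):
            if not 0 <= class_id < num_classes:
                raise ValueError(f"class id {class_id} outside [0, {num_classes})")
            channels[class_id][row_idx][col_idx] = 1.0
    return channels


# Hypothetical 2 x 3 frame with three classes: 0 = floor, 1 = wall, 2 = chair.
frame = [[1, 1, 2],
         [0, 0, 2]]
encoded = one_hot_encode(frame, num_classes=3)
```

Because the masks carry only class identity, two frames with the same layout but different textures or lighting produce the same `encoded` value, which is the property the core claim relies on.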
If this is right
- Higher success rates when locating objects in previously unseen simulated rooms using the HM3D dataset inside Habitat 2.0.
- Narrowed performance gap between simulation training and real-robot execution because semantic labels are less affected by rendering differences than raw pixels.
- Improved ability to navigate toward specific objects in practical settings after training only in simulation.
- A new curated dataset that supports further work on navigation models that rely on semantic rather than pixel input.
Where Pith is reading between the lines
- The same semantic input strategy could be tested on other robot tasks such as opening doors or placing objects where consistent object identity matters more than exact appearance.
- Training with semantic maps might let teams collect less real-world data because policies transfer more readily from simulation.
- The approach could be extended to environments with moving people or changing furniture to check whether segmentation still provides stable guidance.
Load-bearing premise
Semantic segmentation labels produced by an external model stay accurate enough in real-world scenes whose lighting, textures, and layouts differ from the simulator used for training.
What would settle it
Deploy the trained SEMNAV policy on a real robot in a new room where the segmentation network mislabels doors, furniture, or floors at high rates and measure whether success rates fall to the level of standard RGB-based models.
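A simulated proxy for this test could inject label errors into ground-truth segmentation maps at a controlled rate before they reach the policy. The sketch below is one hypothetical way to do that. The `corrupt_labels` helper and the independent per-pixel noise model are assumptions; real segmentation errors tend to be spatially clustered, so this is a rough stress test rather than a model of an actual network's failures.

```python
import random
from typing import List

LabelMap = List[List[int]]  # H x W grid of integer class ids in [0, num_classes)


def corrupt_labels(label_map: LabelMap, num_classes: int,
                   error_rate: float, rng: random.Random) -> LabelMap:
    """Replace each pixel's label with a different class with probability error_rate."""
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must lie in [0, 1]")
    if num_classes < 2:
        raise ValueError("need at least two classes to mislabel a pixel")
    corrupted = []
    for row in label_map:
        new_row = []
        for class_id in row:
            if rng.random() < error_rate:
                # Draw uniformly from the other classes, skipping the true one.
                wrong = rng.randrange(num_classes - 1)
                if wrong >= class_id:
                    wrong += 1
                new_row.append(wrong)
            else:
                new_row.append(class_id)
        corrupted.append(new_row)
    return corrupted


# Hypothetical usage: sweep the error rate and feed each map to the policy.
clean = [[0, 1, 2],
         [2, 1, 0]]
noisy = corrupt_labels(clean, num_classes=3, error_rate=0.3, rng=random.Random(0))
```

Plotting success rate against `error_rate` would show how quickly the policy degrades and where it meets an RGB baseline.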
Original abstract
Visual Semantic Navigation (VSN) is a fundamental problem in robotics, where an agent must navigate toward a target object in an unknown environment, mainly using visual information. Most state-of-the-art VSN models are trained in simulation environments, where rendered scenes of the real world are used, at best. These approaches typically rely on raw RGB data from the virtual scenes, which limits their ability to generalize to real-world environments due to domain adaptation issues. To tackle this problem, in this work, we propose SEMNAV, a novel approach that leverages semantic segmentation as the main visual input representation of the environment to enhance the agent's perception and decision-making capabilities. By explicitly incorporating this type of high-level semantic information, our model learns robust navigation policies that improve generalization across unseen environments, both in simulated and real world settings. We also introduce the SEMNAV dataset, a newly curated dataset designed for training semantic segmentation-aware navigation models like SEMNAV. Our approach is evaluated extensively in both simulated environments and with real-world robotic platforms. Experimental results demonstrate that SEMNAV outperforms existing state-of-the-art VSN models, achieving higher success rates in the Habitat 2.0 simulation environment, using the HM3D dataset. Furthermore, our real-world experiments highlight the effectiveness of semantic segmentation in mitigating the sim-to-real gap, making our model a promising solution for practical VSN-based robotic applications. The code and datasets are accessible at https://github.com/gramuah/semnav
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes SEMNAV, a visual semantic navigation model that replaces raw RGB inputs with semantic segmentation maps to improve policy robustness and sim-to-real transfer. It introduces a curated SEMNAV dataset for training such models and reports superior success rates over prior VSN methods in Habitat 2.0 using HM3D, together with real-robot trials that attribute gains to the semantic representation.
Significance. If the performance gains prove robust and the segmentation assumption holds on real imagery, the work would offer a practical route to reducing domain shift in robotic navigation without heavy reliance on image-level adaptation techniques. The release of code, datasets, and a segmentation-aware benchmark would be a useful community resource for VSN research.
major comments (2)
- [Real-world Experiments] Real-world Experiments section: The claim that semantic segmentation mitigates the sim-to-real gap is load-bearing yet rests on an untested assumption. No mIoU, per-class accuracy, or other quantitative segmentation metrics are supplied for the external model’s output on the actual real-robot camera images; without these, observed success-rate improvements cannot be confidently attributed to the semantic input rather than to segmentation errors or other factors.
- [Experimental Results] Experimental Results (tables reporting success rate, SPL, etc.): The abstract states higher success rates than SOTA VSN models, but the manuscript supplies no error bars, ablation studies isolating the segmentation component, or statistical tests across random seeds. This weakens the central empirical claim that the approach outperforms existing methods under standard controls.
minor comments (2)
- [§3.1] The notation for the navigation policy input (segmentation map versus RGB) should be defined explicitly in §3.1 to avoid ambiguity when comparing to prior RGB-only baselines.
- [Figures] Figure captions for the real-robot setup could clarify the exact camera intrinsics and lighting conditions used, aiding reproducibility.
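The segmentation metrics requested in the first major comment can be computed from a confusion matrix once annotated frames exist. The following is a minimal sketch under that assumption. The helper names and the tiny example maps are illustrative, and the numbers are not results from the paper.

```python
from typing import List, Optional

LabelMap = List[List[int]]  # H x W grid of integer class ids


def confusion_matrix(pred: LabelMap, truth: LabelMap, num_classes: int) -> List[List[int]]:
    """matrix[t][p] counts pixels whose true class is t and predicted class is p."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            matrix[t][p] += 1
    return matrix


def per_class_iou(matrix: List[List[int]]) -> List[Optional[float]]:
    """IoU = TP / (TP + FP + FN); None for classes absent from both maps."""
    n = len(matrix)
    ious: List[Optional[float]] = []
    for c in range(n):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp
        fp = sum(matrix[r][c] for r in range(n)) - tp
        denom = tp + fp + fn
        ious.append(tp / denom if denom else None)
    return ious


def per_class_accuracy(matrix: List[List[int]]) -> List[Optional[float]]:
    """Fraction of each true class's pixels that were labeled correctly."""
    return [row[c] / sum(row) if sum(row) else None for c, row in enumerate(matrix)]


def mean_iou(matrix: List[List[int]]) -> float:
    """Average IoU over the classes that appear at least once."""
    valid = [v for v in per_class_iou(matrix) if v is not None]
    return sum(valid) / len(valid) if valid else float("nan")


# Illustrative 2 x 3 maps with three classes.
truth = [[0, 0, 1], [1, 2, 2]]
pred = [[0, 1, 1], [1, 2, 0]]
m = confusion_matrix(pred, truth, num_classes=3)
print(mean_iou(m), per_class_accuracy(m))
```

Even a small hand-labeled subset of real-robot frames would be enough to run this and report how segmentation quality changes between simulation and deployment.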
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment below, indicating where revisions will be made to strengthen the work.
- Referee: [Real-world Experiments] Real-world Experiments section: The claim that semantic segmentation mitigates the sim-to-real gap is load-bearing yet rests on an untested assumption. No mIoU, per-class accuracy, or other quantitative segmentation metrics are supplied for the external model’s output on the actual real-robot camera images; without these, observed success-rate improvements cannot be confidently attributed to the semantic input rather than to segmentation errors or other factors.
  Authors: We agree that quantitative segmentation metrics on real-robot images would strengthen attribution of the performance gains. However, ground-truth semantic annotations are not available for the real-world camera images used in our experiments, precluding computation of mIoU or per-class accuracy. In the revised manuscript we will add qualitative visualizations of segmentation outputs on representative real-robot images together with a discussion of observed segmentation quality and potential error sources. We believe the combination of these visualizations and the reported real-world success-rate improvements still supports the value of semantic inputs, while acknowledging the limitation noted by the referee.
  Revision: partial
- Referee: [Experimental Results] Experimental Results (tables reporting success rate, SPL, etc.): The abstract states higher success rates than SOTA VSN models, but the manuscript supplies no error bars, ablation studies isolating the segmentation component, or statistical tests across random seeds. This weakens the central empirical claim that the approach outperforms existing methods under standard controls.
  Authors: We acknowledge that error bars, ablations, and statistical tests would provide stronger empirical support. In the revision we will re-run the simulation experiments across multiple random seeds, add error bars to the success-rate and SPL tables, include an ablation comparing semantic-segmentation inputs against RGB inputs, and report statistical significance tests (e.g., paired t-tests) on the performance differences.
  Revision: yes
- Left unresolved: quantitative mIoU and per-class accuracy for the external segmentation model on real-robot images, because no ground-truth annotations exist for those images.
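The multi-seed comparison the authors commit to can be sketched with the standard library alone. The snippet reports mean and sample standard deviation per model and the paired t statistic on per-seed differences. It assumes both models are evaluated on the same seeds so that pairing is valid. The success rates are placeholder numbers chosen for illustration, not results from the paper, and turning the statistic into a p-value is left to a statistics package.

```python
import math
import statistics
from typing import List, Tuple


def mean_and_std(values: List[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation, suitable for an error bar."""
    return statistics.mean(values), statistics.stdev(values)


def paired_t_statistic(a: List[float], b: List[float]) -> Tuple[float, int]:
    """Paired t statistic and degrees of freedom for matched runs a[i], b[i]."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long samples with at least two pairs")
    diffs = [x - y for x, y in zip(a, b)]
    std_diff = statistics.stdev(diffs)
    if std_diff == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    t = statistics.mean(diffs) / (std_diff / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Placeholder success rates for three shared seeds (NOT data from the paper).
semantic_runs = [0.50, 0.60, 0.70]
rgb_runs = [0.40, 0.45, 0.50]

t_stat, dof = paired_t_statistic(semantic_runs, rgb_runs)
print(mean_and_std(semantic_runs), mean_and_std(rgb_runs), t_stat, dof)
```

With only a handful of seeds the test has little power, so reporting the per-seed differences alongside the statistic is usually more informative than the significance verdict alone.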
Circularity Check
No circularity: empirical performance claims on held-out tests
The paper describes an empirical ML system (SEMNAV) that replaces RGB input with semantic segmentation labels, trains a navigation policy on a curated dataset, and reports success rates on held-out Habitat 2.0 / HM3D episodes plus real-robot trials. These are measured outcomes from standard train/test splits and physical experiments, not quantities obtained by fitting a parameter to a subset and then relabeling the same quantity as a prediction, nor by self-defining a metric in terms of itself. No equations or uniqueness theorems are invoked that reduce the central claim to a self-citation chain or an ansatz smuggled from prior work by the same authors. The derivation chain is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Semantic segmentation labels produced by an off-the-shelf model are sufficiently accurate and domain-invariant for policy learning in both simulation and real environments.
invented entities (2)
- SEMNAV model: no independent evidence
- SEMNAV dataset: no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem.
  "our model learns robust navigation policies that improve generalization across unseen environments... by explicitly incorporating this type of high-level semantic information"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 1 Pith paper
- Designing Privacy-Preserving Visual Perception for Robot Navigation Based on User Privacy Preferences: User studies reveal preferences for visual abstractions and distance-dependent low-resolution capture, leading to a configurable privacy policy for robot navigation.
Fig. 7. Qualitative results of the robot successfully navigating in the real world toward a sofa, a television, and a chair.