
arxiv: 2605.09869 · v2 · pith:BQXXPF2Ynew · submitted 2026-05-11 · 💻 cs.RO · cs.CV

ConsistNav: Closing the Action Consistency Gap in Zero-Shot Object Navigation with Semantic Executive Control

Pith reviewed 2026-05-19 18:02 UTC · model grok-4.3

classification 💻 cs.RO cs.CV
keywords zero-shot object navigation · action consistency gap · semantic executive · persistent memory · robot navigation · embodied AI · finite-state control · training-free navigation

The pith

A semantic executive with three coordinated modules closes the action consistency gap by enforcing persistent commitment to target pursuit in zero-shot object navigation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper points out that even when zero-shot methods spot a plausible target, agents often switch back and forth between exploring and pursuing or quit near the goal because semantic evidence gets reinterpreted fresh at every step. ConsistNav adds a training-free semantic executive layer on top of existing detectors and planners to stage pursuit in guarded phases, keep stable object hypotheses across frames, and block wasteful actions such as spinning in place. A reader would care because this keeps the agent from abandoning a found object and raises success without any retraining or changes to the underlying perception and planning code. Experiments on HM3D and MP3D show the approach reaches state-of-the-art numbers among compared zero-shot methods and lifts success rate by 11.4 percent and SPL by 7.9 percent over a controlled baseline on MP3D. Ablations and real-robot tests confirm the executive modules are what drive the gains.

Core claim

The paper claims that the action consistency gap—repeated reinterpretation of semantic evidence without persistent commitment across the episode—explains why agents oscillate or abandon targets near success, and that this gap can be closed by a semantic executive composed of a Finite-State Executive Controller that stages guarded pursuit phases, a Persistent Candidate Memory that accumulates cross-frame target evidence into stable hypotheses, and Stability-Aware Action Control that suppresses rotational stagnation and unverified stopping, all without modifying the detector or low-level planner.
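
As a rough illustration of what accumulating cross-frame target evidence into stable hypotheses could look like, the sketch below merges per-frame detections by spatial proximity and keeps a running mean of their confidence. The class names, the 0.5 m merge radius, and the minimum-hit rule are assumptions made for this example; the paper's actual memory fields are not given in this summary.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    x: float
    y: float
    hits: int = 0
    mean_conf: float = 0.0


class CandidateMemory:
    """Hypothetical cross-frame memory (not the paper's implementation)."""

    def __init__(self, merge_radius_m: float = 0.5, min_hits: int = 3):
        self.merge_radius_m = merge_radius_m  # assumed association radius
        self.min_hits = min_hits              # assumed persistence requirement
        self.candidates = []

    def observe(self, x: float, y: float, conf: float) -> Candidate:
        # Merge the detection into an existing candidate if one is close enough.
        for c in self.candidates:
            if math.hypot(c.x - x, c.y - y) <= self.merge_radius_m:
                c.hits += 1
                c.mean_conf += (conf - c.mean_conf) / c.hits  # running mean
                return c
        c = Candidate(x, y, hits=1, mean_conf=conf)
        self.candidates.append(c)
        return c

    def stable(self, min_conf: float = 0.5) -> List[Candidate]:
        # Only candidates seen repeatedly with enough confidence may drive control.
        return [c for c in self.candidates
                if c.hits >= self.min_hits and c.mean_conf >= min_conf]


mem = CandidateMemory()
for conf in (0.6, 0.7, 0.8):
    mem.observe(1.0, 1.0, conf)   # same object seen in three frames
mem.observe(5.0, 5.0, 0.9)        # one-off detection elsewhere
stable = mem.stable()
```

In this toy run the single high-confidence detection at (5, 5) is stored but not yet treated as stable, which is the behavior the persistence requirement is meant to illustrate.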

What carries the argument

Semantic executive, a training-free coordinator that decides when semantic evidence should drive navigation and when it should be suppressed or revisited through its three modules.
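
To make the idea of guarded, staged pursuit concrete, here is a minimal finite-state sketch. It uses four hypothetical modes (the paper's controller has seven states, which this summary does not enumerate), and the guard names and the 1.0 m verification radius are assumptions for illustration only.

```python
from enum import Enum, auto


class Mode(Enum):
    SEARCH = auto()
    APPROACH = auto()
    VERIFY = auto()
    DONE = auto()


def step(mode, has_stable_candidate, distance_m, close_range_confirmed,
         verify_radius_m=1.0):
    """Return the next mode; every transition is gated by an explicit guard."""
    if mode is Mode.SEARCH:
        # Commit only when memory reports a stable hypothesis.
        return Mode.APPROACH if has_stable_candidate else Mode.SEARCH
    if mode is Mode.APPROACH:
        if not has_stable_candidate:
            return Mode.SEARCH  # hypothesis invalidated: return to search
        return Mode.VERIFY if distance_m <= verify_radius_m else Mode.APPROACH
    if mode is Mode.VERIFY:
        if close_range_confirmed:
            return Mode.DONE    # stopping is allowed only after verification
        return Mode.VERIFY if has_stable_candidate else Mode.SEARCH
    return Mode.DONE


mode = Mode.SEARCH
trace = []
for stable, dist, confirmed in [(True, 4.0, False), (True, 0.8, False),
                                (True, 0.8, False), (True, 0.8, True)]:
    mode = step(mode, stable, dist, confirmed)
    trace.append(mode)
```

Because the guards read from accumulated state rather than the current frame alone, a single weak detection does not by itself flip the agent out of pursuit in this sketch.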

If this is right

  • Agents maintain stable object hypotheses across multiple frames instead of reinterpreting evidence at each step.
  • Pursuit is staged through guarded semantic phases that prevent premature abandonment of detected targets.
  • Rotational stagnation and ineffective pursuit actions are suppressed while still allowing verified stopping.
  • The same detector and planner can be used with higher reliability simply by adding the executive layer.
  • The method transfers to real-world robot deployments without additional training.
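
The third bullet (suppressing rotational stagnation while still allowing verified stopping) can be sketched as a thin filter over the proposed action. The action names follow the LEFT, FORWARD, RIGHT, STOP action space mentioned in the Figure 1 caption; the turn limit and the choice of FORWARD as the override are assumptions made for this example, not the paper's rules.

```python
from collections import deque

TURNS = {"LEFT", "RIGHT"}


class ActionFilter:
    """Hypothetical stability filter applied after the planner proposes an action."""

    def __init__(self, max_consecutive_turns=12):
        self.max_turns = max_consecutive_turns
        self.recent = deque(maxlen=max_consecutive_turns)

    def filter(self, proposed, stop_verified=False):
        action = proposed
        if proposed == "STOP" and not stop_verified:
            # Unverified stop: keep moving instead of ending the episode.
            action = "FORWARD"
        elif (proposed in TURNS
              and len(self.recent) == self.max_turns
              and all(a in TURNS for a in self.recent)):
            # The whole recent window was turning in place: break the stagnation.
            action = "FORWARD"
        self.recent.append(action)
        return action


f = ActionFilter(max_consecutive_turns=3)
out = [f.filter(a) for a in ["LEFT", "LEFT", "LEFT", "LEFT", "LEFT"]]
blocked_stop = f.filter("STOP")
allowed_stop = f.filter("STOP", stop_verified=True)
```

A real system would also need a collision check before forcing FORWARD; that is left out here to keep the sketch short.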

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same executive structure could be tested on other embodied tasks where agents must commit to a detected goal over time, such as object manipulation sequences.
  • Because the modules act after detection, they might combine with newer open-vocabulary detectors without retraining the consistency logic.
  • If the gap is truly central, similar executive controls could be added to language-guided exploration methods to reduce backtracking.
  • The approach leaves open whether the same gains appear when the underlying planner itself is also improved.

Load-bearing premise

The action consistency gap is the dominant failure mode in current zero-shot object navigation and the three executive modules can close it without creating new exploration failures or requiring detector or planner changes.

What would settle it

Compare oscillation frequency and abandonment rate between the baseline and ConsistNav in identical MP3D episodes and check whether the executive modules produce a clear drop in switches between exploration and pursuit while success rate rises.
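
The comparison described above could be computed from per-step logs along these lines. The log format (a list of "explore"/"pursue" labels plus a distance-to-goal trace) and the 1.0 m radius are assumptions for this sketch, and the numbers are made up for illustration rather than taken from the paper.

```python
def count_switches(modes):
    """Number of times the agent changes between modes in one episode."""
    return sum(a != b for a, b in zip(modes, modes[1:]))


def count_near_goal_abandonments(modes, dist_to_goal_m, radius_m=1.0):
    """Pursue-to-explore transitions that happen while already close to the goal."""
    return sum(
        1
        for i in range(1, len(modes))
        if modes[i - 1] == "pursue"
        and modes[i] == "explore"
        and dist_to_goal_m[i - 1] <= radius_m
    )


# Illustrative episode log (hypothetical data).
modes = ["explore", "pursue", "explore", "pursue", "pursue", "explore"]
dists = [5.0, 3.0, 3.0, 1.5, 0.8, 0.9]

switches = count_switches(modes)
abandonments = count_near_goal_abandonments(modes, dists)
```

Averaging both counts over identical episodes for the baseline and for ConsistNav would give the oscillation and abandonment rates the test calls for.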

Figures

Figures reproduced from arXiv: 2605.09869 by Defeng Gu, Haosen Wang, Kai Li, Liaoyuan Fan, Lutao Jiang, Tingbang Liang, Wenjian Hou, Yibin Wen, Yinqiang Zhang, Yizhou Zhao, Zhenyang Li, Zongqi He.

Figure 1. ConsistNav pipeline. (1) Perception converts RGB-D and target cues through VLM scoring into value maps; (2A, 2B) planning maintains candidates and selects frontier/candidate subgoals; (3) execution outputs LEFT, FORWARD, RIGHT, and STOP actions through the FSE controller. Thus, C_t stores accumulated evidence, q_t gates planning, and a_t remains in the standard ObjectNav action space.
Figure 2. Candidate Memory and FSE Controller. Left: Candidate Memory builds/stores the semantic candidate map. Right: seven-state FSE transitions, with black/green for commitment/success, gray/yellow for invalidation/recovery, and blue for returning to search.
Figure 3. Simulation results on HM3Dv2. Qualitative comparison of ConsistNav, VLFM, and ApexNav. Each column shows one episode; green/blue paths denote reference/agent trajectories, and green/black frames denote success/failure.
Figure 4. Failure-cause comparison. Outcome statistics for the Non-executive method and ConsistNav on HM3Dv1, HM3Dv2, and MP3D, covering verified success and five residual failure modes.
Figure 5. Real-world deployment comparison. Visual comparison of the Non-executive baseline and ConsistNav on four target tasks using the AgileX LIMO platform. The results illustrate that ConsistNav maintains target hypotheses, verifies close-range evidence, and stops reliably under real sensor and timing conditions.
Abstract

Zero-shot object navigation has advanced rapidly with open-vocabulary detectors, image-text models, and language-guided exploration. However, even after current methods detect a plausible target hypothesis, the agent may still oscillate between exploration and pursuit, or abandon the object near success. We identify this failure mode as an action consistency gap: semantic evidence is repeatedly reinterpreted at each step without persistent commitment across the episode. We introduce ConsistNav, a training-free zero-shot ObjectNav framework built around a semantic executive composed of three coordinated modules: Finite-State Executive Controller stages target pursuit through guarded semantic phases; Persistent Candidate Memory accumulates cross-frame target evidence into stable object hypotheses; and Stability-Aware Action Control suppresses rotational stagnation, ineffective pursuit, and unverified stopping. This design changes neither the detector nor the low-level planner; instead, it controls when semantic evidence should influence navigation and when it should be suppressed or revisited. We conduct extensive experiments on HM3D and MP3D, where ConsistNav achieves state-of-the-art results among compared zero-shot ObjectNav methods and improves SR by 11.4% and SPL by 7.9% over the controlled baseline on MP3D. Ablation studies and real-world deployment experiments further demonstrate the effectiveness and robustness of the proposed executive mechanism.
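
For readers unfamiliar with the two metrics quoted in the abstract, the sketch below computes success rate (SR) and Success weighted by Path Length (SPL) using the standard definition, SPL = (1/N) Σ S_i · l_i / max(p_i, l_i), where S_i is the success indicator, l_i the shortest-path length, and p_i the path length the agent actually travelled. The episode values are invented for illustration and assume l_i > 0; they are not results from the paper.

```python
def success_rate(successes):
    """Fraction of episodes that ended in success."""
    return sum(1 for s in successes if s) / len(successes)


def spl(successes, shortest_m, taken_m):
    """Success weighted by (normalized inverse) path length."""
    total = 0.0
    for s, l, p in zip(successes, shortest_m, taken_m):
        if s:
            total += l / max(p, l)  # 1.0 for an optimal path, less for detours
    return total / len(successes)


# Three hypothetical episodes: a detour, an optimal path, and a failure.
successes = [True, True, False]
shortest_m = [5.0, 4.0, 6.0]
taken_m = [10.0, 4.0, 9.0]

sr_value = success_rate(successes)
spl_value = spl(successes, shortest_m, taken_m)
```

Because failed episodes contribute zero and detours are penalized, SPL can never exceed SR, so a method that reaches more targets by wandering longer can gain SR while gaining much less SPL.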

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper identifies an 'action consistency gap' in zero-shot object navigation, where agents repeatedly reinterpret semantic evidence without persistent commitment, leading to oscillation between exploration and pursuit or premature abandonment of targets. It introduces ConsistNav, a training-free framework with a semantic executive consisting of three modules: a Finite-State Executive Controller that stages target pursuit through guarded phases, Persistent Candidate Memory that accumulates cross-frame evidence into stable hypotheses, and Stability-Aware Action Control that suppresses rotational stagnation and unverified stopping. The approach leaves the detector and low-level planner unchanged. Experiments on HM3D and MP3D report state-of-the-art results among zero-shot methods, with 11.4% higher success rate (SR) and 7.9% higher SPL over a controlled baseline on MP3D, plus supporting ablations and real-world deployment.

Significance. If the central claim holds, the work would offer a modular, training-free method to improve consistency in open-vocabulary navigation without retraining core perception or planning components. The explicit separation of executive control from the detector/planner, combined with real-world validation, strengthens potential for broader adoption in embodied AI. The identification of a specific failure mode and the provision of ablations are positive elements.

major comments (2)
  1. [§4 (Experiments) and associated tables] The reported 11.4% SR and 7.9% SPL gains on MP3D are presented as evidence that the modules close the action consistency gap, but the manuscript provides no episode-level diagnostics such as counts of explore/pursue switches, abandoned hypotheses, or rotational stagnation events before versus after adding the executive. Without these, it remains possible that gains arise from auxiliary effects of memory accumulation and stability filtering rather than enforced cross-step commitment, weakening attribution to the identified gap.
  2. [§3 (Method), description of the three modules] The Finite-State Executive Controller and Persistent Candidate Memory are presented as directly addressing reinterpretation without commitment, yet the design inserts a higher-level policy layer. A direct comparison to simpler non-executive heuristics (e.g., fixed hysteresis thresholds on detection confidence) would be needed to establish that the full three-module coordination is necessary for the observed gains rather than replicable by lighter mechanisms.
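
The lighter mechanism named in major comment 2, a fixed hysteresis threshold on detection confidence, might look like the following. This is a sketch of the suggested baseline only, with assumed enter/exit thresholds of 0.6 and 0.4; it is not something the paper is stated to have evaluated.

```python
class HysteresisGate:
    """Hypothetical hysteresis-only baseline: two thresholds, no staged phases."""

    def __init__(self, enter_conf=0.6, exit_conf=0.4):
        if exit_conf > enter_conf:
            raise ValueError("exit_conf must not exceed enter_conf")
        self.enter_conf = enter_conf
        self.exit_conf = exit_conf
        self.pursuing = False

    def update(self, conf):
        # Start pursuing only above the upper threshold; give up only
        # when confidence falls below the lower one.
        if self.pursuing:
            if conf < self.exit_conf:
                self.pursuing = False
        elif conf >= self.enter_conf:
            self.pursuing = True
        return self.pursuing


gate = HysteresisGate()
confs = [0.3, 0.65, 0.5, 0.45, 0.35, 0.5, 0.7]
decisions = [gate.update(c) for c in confs]
```

The gate damps flicker around a single threshold, but it has no notion of verification before stopping or of rotational stagnation, which is the distinction the comparison would need to test.
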
minor comments (2)
  1. [Abstract and §1] The phrase 'guarded semantic phases' is introduced without a concise definition or diagram reference at first mention; a brief inline clarification or pointer to Figure 2 would improve readability.
  2. [§4.3 (Ablations)] The ablation table would benefit from explicit reporting of standard deviations or confidence intervals across the N runs, consistent with the main result tables.
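
The reporting requested in minor comment 2 is inexpensive to produce. The sketch below summarizes a metric across runs with the sample mean, sample standard deviation, and standard error; the three run values are placeholders, not numbers from the paper, and with so few runs a t-based interval would be more appropriate than a normal approximation.

```python
import math
import statistics


def summarize_runs(values):
    """Return (mean, sample std, standard error) for one metric across runs."""
    mean = statistics.mean(values)
    std = statistics.stdev(values) if len(values) > 1 else 0.0
    sem = std / math.sqrt(len(values))
    return mean, std, sem


# Hypothetical success rates from three runs of one ablation variant.
runs = [0.40, 0.42, 0.44]
mean, std, sem = summarize_runs(runs)
```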

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help clarify the attribution of our results to the action consistency gap. We respond to each major comment below and indicate where revisions will be made.

Point-by-point responses
  1. Referee: [§4 (Experiments) and associated tables] The reported 11.4% SR and 7.9% SPL gains on MP3D are presented as evidence that the modules close the action consistency gap, but the manuscript provides no episode-level diagnostics such as counts of explore/pursue switches, abandoned hypotheses, or rotational stagnation events before versus after adding the executive. Without these, it remains possible that gains arise from auxiliary effects of memory accumulation and stability filtering rather than enforced cross-step commitment, weakening attribution to the identified gap.

    Authors: We agree that explicit episode-level diagnostics would strengthen direct attribution to reduced oscillation and premature abandonment. The current ablations isolate module contributions and the overall SR/SPL gains align with fewer consistency failures, but without per-episode switch counts the link remains indirect. In the revised version we will add these diagnostics, reporting average explore/pursue transitions, abandoned hypotheses, and rotational stagnation events for the baseline versus ConsistNav on MP3D. revision: yes

  2. Referee: [§3 (Method), description of the three modules] The Finite-State Executive Controller and Persistent Candidate Memory are presented as directly addressing reinterpretation without commitment, yet the design inserts a higher-level policy layer. A direct comparison to simpler non-executive heuristics (e.g., fixed hysteresis thresholds on detection confidence) would be needed to establish that the full three-module coordination is necessary for the observed gains rather than replicable by lighter mechanisms.

    Authors: The three modules are coordinated: the finite-state controller stages commitment, memory accumulates evidence across frames, and stability control suppresses ineffective actions. A simple hysteresis threshold on confidence would address only part of the reinterpretation problem and would not stage pursuit phases or suppress rotational stagnation. Our module ablations already show that removing any component degrades performance. Nevertheless, to address the request we will add a controlled comparison against a hysteresis-only variant in the revised experiments. revision: yes

Circularity Check

0 steps flagged

No circularity: additive executive modules on unchanged base components

Full rationale

The paper identifies an action consistency gap as an observed failure mode in existing zero-shot ObjectNav pipelines and introduces three new modules (Finite-State Executive Controller, Persistent Candidate Memory, Stability-Aware Action Control) that act as a training-free semantic executive layer. No equations, fitted parameters, or predictions are defined in terms of themselves; the modules are explicitly additive and leave the detector and low-level planner unchanged. Results (11.4% SR / 7.9% SPL gains on MP3D) are reported as empirical measurements against a controlled baseline rather than derived quantities. No self-citation chains, uniqueness theorems, or ansatzes are invoked to force the architecture. The derivation chain is therefore self-contained: problem observation plus modular design plus benchmark evaluation.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 3 invented entities

The central claim depends on the premise that the newly introduced executive modules can enforce cross-step consistency using only existing semantic evidence; no numerical free parameters are stated, but three new control entities are postulated without external falsifiable handles beyond the reported experiments.

axioms (1)
  • Domain assumption: Semantic evidence from open-vocabulary detectors can be staged into guarded phases and accumulated across frames without losing necessary exploration coverage.
    Invoked by the Finite-State Executive Controller and Persistent Candidate Memory descriptions.
invented entities (3)
  • Finite-State Executive Controller (no independent evidence)
    purpose: Stages target pursuit through guarded semantic phases
    New component introduced to enforce consistency.
  • Persistent Candidate Memory (no independent evidence)
    purpose: Accumulates cross-frame target evidence into stable object hypotheses
    New memory structure for stable hypotheses.
  • Stability-Aware Action Control (no independent evidence)
    purpose: Suppresses rotational stagnation, ineffective pursuit, and unverified stopping
    New action filter to prevent oscillation and premature stops.

