arxiv: 2510.20685 · v4 · submitted 2025-10-23 · 💻 cs.RO

C-NAV: Towards Self-Evolving Continual Object Navigation in Open World

Ming-Ming Yu , Fei Zhu , Wenzhuo Liu , Yirong Yang , Qunbo Wang , Wenjun Wu , Jing Liu This is my paper

Pith reviewed 2026-05-18 04:29 UTC · model grok-4.3

classification 💻 cs.RO

keywords continual object navigationanti-forgetting mechanismfeature distillationfeature replayadaptive samplingembodied agentsvisual navigationopen world

0 comments

The pith

A dual-path anti-forgetting system lets navigation agents learn new objects in changing environments while retaining old skills and using far less memory than full-history methods.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to enable embodied agents to acquire navigation skills for new object categories in dynamic open-world settings without losing proficiency on previously learned categories. It introduces a dedicated continual object navigation benchmark to measure this ability and presents the C-Nav framework as a solution. C-Nav combines feature distillation to align multi-modal inputs into a stable representation space with feature replay to preserve policy behavior in the action decoder, plus an adaptive sampling step that chooses diverse experiences to limit storage. Experiments across model architectures show the approach surpasses prior methods and even exceeds baselines that retain complete past trajectories, all while cutting memory needs. If correct, this would allow agents to operate continuously in evolving real-world spaces without periodic full retraining or unbounded data accumulation.

Core claim

C-Nav integrates a dual-path anti-forgetting mechanism—feature distillation that aligns multi-modal inputs into a consistent representation space for representation consistency and feature replay that retains temporal features within the action decoder for policy consistency—together with an adaptive sampling strategy that selects diverse and informative experiences to reduce redundancy and memory overhead, yielding superior performance on the continual object navigation benchmark.

What carries the argument

Dual-path anti-forgetting mechanism of feature distillation for representation consistency and feature replay for policy consistency, combined with adaptive sampling for experience selection.

If this is right

Agents maintain high success rates on old object categories while gaining competence on new ones without retraining from scratch.
Memory footprint stays low because only selected experiences are retained rather than full trajectories.
The same framework delivers gains across different underlying model architectures.
Navigation systems can handle open-world evolution with less risk of performance collapse over time.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar dual-consistency paths could transfer to other embodied continual tasks such as manipulation or exploration.
Real-robot deployments with streaming sensor data would provide a direct test of the sampling strategy under non-stationary conditions.
Scaling the number of categories to dozens or hundreds would clarify where the consistency mechanisms begin to degrade.

Load-bearing premise

The dual-path mechanism of feature distillation and feature replay plus adaptive sampling will continue to block catastrophic forgetting when object categories and environmental shifts grow much larger than the tested benchmark scenarios.

What would settle it

An experiment that doubles or triples the number of object categories or introduces more frequent scene changes and then measures a sharp drop in success rate on previously learned navigation tasks.

Figures

Figures reproduced from arXiv: 2510.20685 by Fei Zhu, Jing Liu, Ming-Ming Yu, Qunbo Wang, Wenjun Wu, Wenzhuo Liu, Yirong Yang.

**Figure 2.** Figure 2: Different navigation model architectures. According to the action decoder, they can be [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗

**Figure 3.** Figure 3: Overview of the proposed C-Nav framework for continual object navigation. It consists of two key components: (1) adaptive experience selection, which identifies semantically meaningful keyframes via LOF in the representation space, reducing storage and redundancy; and (2) dual-path anti-forgetting, which mitigates representation and policy drift through a feature distillation path (bottom left) and a featu… view at source ↗

**Figure 4.** Figure 4: Results of SR and SPL on the MP3D and HM3D datasets. Solid lines represent SR, while [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗

**Figure 5.** Figure 5: Ablation on experience selection and retention ratio on [PITH_FULL_IMAGE:figures/full_fig_p009_5.png] view at source ↗

**Figure 6.** Figure 6: Comparison with the Zero-Shot Method. Stage 1 Stage 2 Stage 3 Stage 4 5 10 15 20 25 30 35 40 45 Percentage (%) Order1 (SR) Order1 (SPL) Order2 (SR) Order2 (SPL) Order3 (SR) Order3 (SPL) [PITH_FULL_IMAGE:figures/full_fig_p010_6.png] view at source ↗

**Figure 9.** Figure 9: Demonstration trajectory length distribution per stage in HM3D dataset. [PITH_FULL_IMAGE:figures/full_fig_p014_9.png] view at source ↗

**Figure 10.** Figure 10: Demonstration trajectory length distribution per stage in MP3D Dataset. [PITH_FULL_IMAGE:figures/full_fig_p015_10.png] view at source ↗

**Figure 11.** Figure 11: Ablation study: Freezing vs. unfreezing the multimodal encoder on MP3D Continual [PITH_FULL_IMAGE:figures/full_fig_p017_11.png] view at source ↗

**Figure 12.** Figure 12: Results for the dual-path anti-forgetting performance at each stage. [PITH_FULL_IMAGE:figures/full_fig_p018_12.png] view at source ↗

**Figure 13.** Figure 13: Average success rate for each component across four architectures. HM3D-SR-Avg [PITH_FULL_IMAGE:figures/full_fig_p018_13.png] view at source ↗

**Figure 14.** Figure 14: Results for the adaptive experience selection at each stage. [PITH_FULL_IMAGE:figures/full_fig_p019_14.png] view at source ↗

**Figure 15.** Figure 15: Average success rate for different sample methods across four architectures. [PITH_FULL_IMAGE:figures/full_fig_p019_15.png] view at source ↗

read the original abstract

Embodied agents are expected to perform object navigation in dynamic, open-world environments. However, existing approaches typically rely on static trajectories and a fixed set of object categories during training, overlooking the real-world requirement for continual adaptation to evolving scenarios. To facilitate related studies, we introduce the continual object navigation benchmark, which requires agents to acquire navigation skills for new object categories while avoiding catastrophic forgetting of previously learned knowledge. To tackle this challenge, we propose C-Nav, a continual visual navigation framework that integrates two key innovations: (1) A dual-path anti-forgetting mechanism, which comprises feature distillation that aligns multi-modal inputs into a consistent representation space to ensure representation consistency, and feature replay that retains temporal features within the action decoder to ensure policy consistency. (2) An adaptive sampling strategy that selects diverse and informative experiences, thereby reducing redundancy and minimizing memory overhead. Extensive experiments across multiple model architectures demonstrate that C-Nav consistently outperforms existing approaches, achieving superior performance even compared to baselines with full trajectory retention, while significantly lowering memory requirements. The code will be publicly available at https://bigtree765.github.io/C-Nav-project.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

C-Nav adds a continual navigation benchmark and a dual-path method that beats full-retention baselines on memory and performance, but the gains may trace more to adaptive sampling than the anti-forgetting paths.

read the letter

The paper's main contribution is a new benchmark for continual object navigation plus the C-Nav framework. It combines feature distillation for representation consistency, feature replay inside the action decoder for policy consistency, and adaptive sampling to cut redundancy. Experiments across model architectures show it outperforming prior methods and even full-trajectory baselines while using less memory. That combination addresses a practical gap: robots need to handle new objects in changing environments without retraining from scratch or storing everything.

Referee Report

2 major / 2 minor

Summary. The paper introduces a continual object navigation benchmark requiring agents to learn navigation for new object categories in dynamic open-world settings without catastrophic forgetting. It proposes the C-Nav framework featuring a dual-path anti-forgetting mechanism (feature distillation for multi-modal representation consistency and feature replay for policy consistency in the action decoder) together with an adaptive sampling strategy that selects diverse experiences to reduce memory overhead. Experiments across model architectures claim consistent outperformance over prior methods, including baselines that retain full trajectories, while using substantially less memory.

Significance. If the central claims hold after addressing experimental controls, the work would be significant for continual learning in embodied robotics: it provides a new benchmark for open-world adaptation and demonstrates a memory-efficient approach that could scale navigation agents beyond static training regimes. The dual-path design and adaptive sampling directly target representation and policy consistency, which are load-bearing for long-term deployment in evolving environments.

major comments (2)

[Abstract / Experimental Evaluation] Abstract and experimental evaluation: the claim of outperforming full-trajectory-retention baselines while using less memory is load-bearing for the central contribution. However, no ablation is reported that applies the identical adaptive sampling strategy to those full-retention baselines. Without this control, any observed performance edge cannot be unambiguously attributed to the dual-path anti-forgetting mechanism rather than improved data selection.
[Method] Method section on adaptive sampling and dual-path components: the paper does not specify whether the adaptive sampling criteria (diversity and informativeness) or the distillation/replay weights were tuned on the same benchmark splits used for final evaluation. This raises a moderate risk of circularity that could inflate the reported gains relative to untuned baselines.

minor comments (2)

[Figures and Tables] Figure captions and table headers should explicitly state the number of runs, random seeds, and statistical tests used to support the 'consistently outperforms' claim.
[Benchmark Description] The continual object navigation benchmark definition would benefit from an explicit description of the train/test split protocol and the schedule of new object category introductions to facilitate future comparisons.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below with clarifications and commit to revisions that strengthen the experimental controls and transparency of the manuscript.

read point-by-point responses

Referee: [Abstract / Experimental Evaluation] Abstract and experimental evaluation: the claim of outperforming full-trajectory-retention baselines while using less memory is load-bearing for the central contribution. However, no ablation is reported that applies the identical adaptive sampling strategy to those full-retention baselines. Without this control, any observed performance edge cannot be unambiguously attributed to the dual-path anti-forgetting mechanism rather than improved data selection.

Authors: We agree this control is valuable for unambiguous attribution. The adaptive sampling strategy is a core part of C-Nav for memory reduction, but to isolate the dual-path anti-forgetting contribution, we will add a new ablation in the revised manuscript. This ablation will apply the identical adaptive sampling criteria to the full-trajectory-retention baselines and report the resulting performance and memory metrics. We expect this to confirm that the observed gains are not solely due to data selection. revision: yes
Referee: [Method] Method section on adaptive sampling and dual-path components: the paper does not specify whether the adaptive sampling criteria (diversity and informativeness) or the distillation/replay weights were tuned on the same benchmark splits used for final evaluation. This raises a moderate risk of circularity that could inflate the reported gains relative to untuned baselines.

Authors: We thank the referee for noting this potential issue. The adaptive sampling criteria and distillation/replay weights were tuned on a held-out validation split that is disjoint from the test splits used for final reporting. We will revise the Method section to explicitly describe the tuning protocol, including the validation split construction, the search ranges, and the selection criteria, thereby removing any ambiguity about circularity. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical framework with independent experimental validation

full rationale

The paper proposes C-Nav as a practical continual navigation framework combining dual-path anti-forgetting (feature distillation and replay) with adaptive sampling, then validates it through experiments on a new benchmark. No mathematical derivation chain exists that reduces predictions or uniqueness claims to fitted parameters or self-citations by construction. The central performance claims rest on comparative results against baselines (including full-trajectory retention), which are externally falsifiable via the reported metrics and code release rather than being forced by internal definitions or prior author work. Adaptive sampling and distillation weights are implementation choices whose impact is measured, not smuggled in as predictions. This is the common case of a self-contained empirical contribution.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claim rests on the empirical effectiveness of the proposed mechanisms rather than on new mathematical axioms or invented physical entities. No free parameters are explicitly named in the abstract beyond standard training hyperparameters.

pith-pipeline@v0.9.0 · 5750 in / 1141 out tokens · 22386 ms · 2026-05-18T04:29:18.768233+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

dual-path anti-forgetting mechanism, which comprises feature distillation that aligns multi-modal inputs into a consistent representation space ... and feature replay that retains temporal features within the action decoder
IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean absolute_floor_iff_bare_distinguishability unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

adaptive experience selection module ... leverages the Local Outlier Factor (LOF) algorithm to automatically identify critical navigation moments

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

When Robots Do the Chores: A Benchmark and Agent for Long-Horizon Household Task Execution
cs.AI 2026-05 conditional novelty 6.0

LongAct benchmark reveals top VLMs reach only 59% goal completion and 16% full success on long-horizon household tasks, while HoloMind agent improves results via DAG planner, multimodal spatial memory, episodic memory...

Reference graph

Works this paper leans on

57 extracted references · 57 canonical work pages · cited by 1 Pith paper · 8 internal anchors

[1]

Object goal navigation using goal-oriented semantic exploration

Devendra Singh Chaplot, Dhiraj Prakashchand Gandhi, Abhinav Gupta, et al. Object goal navigation using goal-oriented semantic exploration. InAdvances in Neural Information Processing Systems, volume 33 of NeurIPS, pages 4247–4258, 2020

work page 2020
[2]

Objectnav revisited: On evaluation of embodied agents navigating to objects.arXiv preprint arXiv:2006.13171, 2020

Dhruv Batra, Aaron Gokaslan, Aniruddha Kembhavi, Oleksandr Maksymets, Roozbeh Mottaghi, Manolis Savva, Alexander Toshev, and Erik Wijmans. Objectnav revisited: On evaluation of embodied agents navigating to objects.arXiv preprint arXiv:2006.13171, 2020

work page arXiv 2006
[3]

V oronav: V oronoi-based zeroshot object navigation with large language model

Pengying Wu, Yao Mu, Bingxian Wu, Yi Hou, Ji Ma, Shanghang Zhang, and Chang Liu. V oronav: V oronoi-based zeroshot object navigation with large language model. InInternational Conference on Machine Learning (ICML), 2024

work page 2024
[4]

Vlfm: Vision- language frontier maps for zero-shot semantic navigation

Naoki Yokoyama, Sehoon Ha, Dhruv Batra, Jiuguang Wang, and Bernadette Bucher. Vlfm: Vision- language frontier maps for zero-shot semantic navigation. In2024 IEEE International Conference on Robotics and Automation (ICRA), pages 42–48. IEEE, 2024

work page 2024
[5]

Esc: Exploration with soft commonsense constraints for zero-shot object navigation

Kaiwen Zhou, Kaizhi Zheng, Connor Pryor, Yilin Shen, Hongxia Jin, Lise Getoor, and Xin Eric Wang. Esc: Exploration with soft commonsense constraints for zero-shot object navigation. InInternational Conference on Machine Learning, pages 42829–42842. PMLR, 2023

work page 2023
[6]

Ovrl-v2: A simple state-of-the-art baseline for imagenav and objectnav.arXiv preprint arXiv:2303.07798, 2023

Karmesh Yadav, Arjun Majumdar, Ram Ramrakhya, et al. Ovrl-v2: A simple state-of-the-art baseline for imagenav and objectnav.arXiv preprint arXiv:2303.07798, 2023. Version 2

work page arXiv 2023
[7]

Pirlnav: Pretraining with imitation and rl finetuning for objectnav

Ram Ramrakhya, Dhruv Batra, Erik Wijmans, et al. Pirlnav: Pretraining with imitation and rl finetuning for objectnav. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 17896–17906. IEEE/CVF, 2023

work page 2023
[8]

NaVid: Video-based VLM Plans the Next Step for Vision-and-Language Navigation

Jiazhao Zhang, Kunyu Wang, Rongtao Xu, Gengze Zhou, Yicong Hong, Xiaomeng Fang, Qi Wu, Zhizheng Zhang, and He Wang. Navid: Video-based vlm plans the next step for vision-and-language navigation. arXiv preprint arXiv:2402.15852, 2024

work page internal anchor Pith review arXiv 2024
[9]

Object goal navigation with recursive implicit maps

Siyuan Chen, Thomas Chabal, Ivan Laptev, et al. Object goal navigation with recursive implicit maps. In 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 7089–7096. IEEE, 2023

work page 2023
[10]

Habitat-web: Learning embodied object-search strategies from human demonstrations at scale

Ram Ramrakhya, Erik Undersander, Dhruv Batra, et al. Habitat-web: Learning embodied object-search strategies from human demonstrations at scale. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 5173–5183. IEEE/CVF, 2022

work page 2022
[11]

Radford, J

A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark, G. Krueger, and I. Sutskever. Learning transferable visual models from natural language supervision. InProceedings of the 38th International Conference on Machine Learning, volume 139, pages 8748–8763, feb 2021

work page 2021
[12]

Sigmoid loss for language image pre-training

Xiaohua Zhai, Basil Mustafa, Alexander Kolesnikov, et al. Sigmoid loss for language image pre-training. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 11975–11986. IEEE/CVF, 2023

work page 2023
[13]

Caron, H

M. Caron, H. Touvron, I. Misra, H. Jégou, J. Mairal, P. Bojanowski, and A. Joulin. Emerging properties in self-supervised vision transformers. InProceedings of the International Conference on Computer Vision (ICCV), 2021

work page 2021
[14]

A comprehensive survey on continual learning in generative models

Haiyang Guo, Fanhu Zeng, Fei Zhu, Jiayi Wang, Xukai Wang, Jingang Zhou, Hongbo Zhao, Wenzhuo Liu, Shijie Ma, Xu-Yao Zhang, et al. A comprehensive survey on continual learning in generative models. arXiv preprint arXiv:2506.13045, 2025

work page arXiv 2025
[15]

Open-world machine learning: A review and new outlooks.arXiv e-prints, pages arXiv–2403, 2024

Fei Zhu, Shijie Ma, Zhen Cheng, Xu-Yao Zhang, Zhaoxiang Zhang, and Cheng-Lin Liu. Open-world machine learning: A review and new outlooks.arXiv e-prints, pages arXiv–2403, 2024

work page 2024
[16]

Bagdanov, and Joost Van De Weijer

Marc Masana, Xialei Liu, Bartłomiej Twardowski, Mikel Menta, Andrew D. Bagdanov, and Joost Van De Weijer. Class-incremental learning: Survey and performance evaluation on image classification. volume 45, pages 5513–5533. IEEE, 2022. 11

work page 2022
[17]

Dkt: Diverse knowledge transfer transformer for class incremental learning

Xiangyu Gao, Yang He, Shanshan Dong, Yujia Chen, Lei Wang, Wei Zhang, Hao Li, and Mingyuan Zhou. Dkt: Diverse knowledge transfer transformer for class incremental learning. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 24236–24245. IEEE/CVF, 2023

work page 2023
[18]

Spoc: Imitating shortest paths in simulation enables effective navigation and manipulation in the real world.arXiv preprint arXiv:2312.02976, 2023

Kiana Ehsani, Tanmay Gupta, Ryan Hendrix, et al. Spoc: Imitating shortest paths in simulation enables effective navigation and manipulation in the real world.arXiv preprint arXiv:2312.02976, 2023

work page arXiv 2023
[19]

Sylvestre-Alvise Rebuffi, Alexander Kolesnikov, Georg Sperl, and Christoph H. Lampert. iCaRL: incre- mental classifier and representation learning. InCVPR, 2017

work page 2017
[20]

Ramakrishnan, Dhruv Batra, Yonatan Bisk, and Roozbeh Mottaghi

Karmesh Yadav, Ram Ramrakhya, Santhosh K. Ramakrishnan, Dhruv Batra, Yonatan Bisk, and Roozbeh Mottaghi. Habitat-matterport 3d semantics dataset. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 4927–4936. IEEE/CVF, 2023

work page 2023
[21]

Matterport3D: Learning from RGB-D Data in Indoor Environments

Angel Chang, Angela Dai, Thomas Funkhouser, and et al. Matterport3d: Learning from rgb-d data in indoor environments.arXiv preprint arXiv:1709.06158, September 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[22]

Habitat: A platform for embodied ai research

Manolis Savva, Abhishek Kadian, Oleksandr Maksymets, Yili Zhao, Erik Wijmans, Bhavana Jain, Julian Straub, Jia Liu, Vladlen Koltun, Jitendra Malik, et al. Habitat: A platform for embodied ai research. In Proceedings of the IEEE/CVF international conference on computer vision, pages 9339–9347, 2019

work page 2019
[23]

Simple but effective: Clip embeddings for embodied ai

Apoorv Khandelwal, Luca Weihs, Roozbeh Mottaghi, and Aniruddha Kembhavi. Simple but effective: Clip embeddings for embodied ai. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14829–14838, 2022

work page 2022
[24]

Hu, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, et al

Edward J. Hu, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, et al. LoRA: Low-rank adaptation of large language models. InAdvances in Neural Information Processing Systems, volume 36, 2023

work page 2023
[25]

Magmax: Leveraging model merging for seamless continual learning

Daniel Marczak, Bartłomiej Twardowski, Tomasz Trzci´nski, and Sebastian Cygert. Magmax: Leveraging model merging for seamless continual learning. InEuropean Conference on Computer Vision, pages 379–395. Springer, 2024

work page 2024
[26]

Learning without forgetting.IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(12):2935–2947, 2017

Zhizhong Li and Derek Hoiem. Learning without forgetting.IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(12):2935–2947, 2017

work page 2017
[27]

Lof: identifying density-based local outliers

Markus M Breunig, Hans-Peter Kriegel, Raymond T Ng, and Jörg Sander. Lof: identifying density-based local outliers. InProceedings of the 2000 ACM SIGMOD international conference on Management of data, pages 93–104, 2000

work page 2000
[28]

Openfm- nav: Towards open-set zero-shot object navigation via vision-language foundation models

Yuxuan Kuang, Hai Lin, and Meng Jiang. Openfmnav: Towards open-set zero-shot object navigation via vision-language foundation models.arXiv preprint arXiv:2402.10670, 2024

work page arXiv 2024
[29]

S. Y . Gadre, M. Wortsman, G. Ilharco, L. Schmidt, and S. Song. Cows on pasture: Baselines and benchmarks for language-driven zero-shot object navigation. InProceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, volume 2023-June, pages 23171–23181, mar 2023

work page 2023
[30]

Unigoal: Towards universal zero-shot goal-oriented navigation.arXiv preprint arXiv:2503.10630, 2025

Hang Yin, Xiuwei Xu, Lingqing Zhao, Ziwei Wang, Jie Zhou, and Jiwen Lu. Unigoal: Towards universal zero-shot goal-oriented navigation.arXiv preprint arXiv:2503.10630, 2025

work page arXiv 2025
[31]

H. Liu, C. Li, Q. Wu, and Y . J. Lee. Visual instruction tuning.Advances in Neural Information Processing Systems, 36:1–19, 2023

work page 2023
[32]

GPT-4 Technical Report

OpenAI. Gpt-4 technical report.arXiv preprint arXiv:2303.08774, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[33]

Qwen Technical Report

Jinze Bai, Shuai Bai, Yunfei Chu, Zeyu Cui, Kai Dang, Xiaodong Deng, Yang Fan, Wenbin Ge, Yu Han, Fei Huang, et al. Qwen technical report.arXiv preprint arXiv:2309.16609, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[34]

The Llama 3 Herd of Models

Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, et al. The llama 3 herd of models.arXiv preprint arXiv:2407.21783, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[35]

Gemini: A Family of Highly Capable Multimodal Models

Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M Dai, Anja Hauth, Katie Millican, et al. Gemini: a family of highly capable multimodal models.arXiv preprint arXiv:2312.11805, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[36]

L3mvn: Leveraging large language models for visual target navigation

Bangguo Yu, Hamidreza Kasaei, and Ming Cao. L3mvn: Leveraging large language models for visual target navigation. In2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 3554–3560. IEEE, 2023

work page 2023
[37]

An Embodied Generalist Agent in 3D World

Jiangyong Huang, Silong Yong, Xiaojian Ma, Xiongkun Linghu, Puhao Li, Yan Wang, Qing Li, Song- Chun Zhu, Baoxiong Jia, and Siyuan Huang. An embodied generalist agent in 3d world.arXiv preprint arXiv:2311.12871, 2023. 12

work page internal anchor Pith review Pith/arXiv arXiv 2023
[38]

Hierarchical consensus-based multi-agent reinforcement learning for multi-robot cooperation tasks

Pu Feng, Junkang Liang, Size Wang, Xin Yu, Xin Ji, Yiting Chen, Kui Zhang, Rongye Shi, and Wenjun Wu. Hierarchical consensus-based multi-agent reinforcement learning for multi-robot cooperation tasks. In2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 642–649. IEEE, 2024

work page 2024
[39]

Fast-slow test-time adaptation for online vision-and-language navigation

Junyu Gao, Xuan Yao, and Changsheng Xu. Fast-slow test-time adaptation for online vision-and-language navigation. 2024

work page 2024
[40]

Navmorph: A self-evolving world model for vision-and- language navigation in continuous environments

Junyu Gao Xuan Yao and Changsheng Xu. Navmorph: A self-evolving world model for vision-and- language navigation in continuous environments. InProceedings of the IEEE/CVF International Conference on Computer Vision, 2025

work page 2025
[41]

Overcoming catastrophic forgetting in neural networks.Proceedings of the National Academy of Sciences, 114(13):3521–3526, 2017

James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska-Barwinska, et al. Overcoming catastrophic forgetting in neural networks.Proceedings of the National Academy of Sciences, 114(13):3521–3526, 2017

work page 2017
[42]

Class incremental learning with self-supervised pre-training and prototype learning.Pattern Recognition, 157:110943, 2025

Wenzhuo Liu, Xin-Jian Wu, Fei Zhu, Ming-Ming Yu, Chuang Wang, and Cheng-Lin Liu. Class incremental learning with self-supervised pre-training and prototype learning.Pattern Recognition, 157:110943, 2025

work page 2025
[43]

End-to-end incremental learning

Francisco M Castro, Manuel J Marín-Jiménez, Nicolás Guil, Cordelia Schmid, and Karteek Alahari. End-to-end incremental learning. InProceedings of the European conference on computer vision (ECCV), pages 233–248, 2018

work page 2018
[44]

Learning a unified classifier incrementally via rebalancing

Saihui Hou, Xinyu Pan, Chen Change Loy, Zilei Wang, and Dahua Lin. Learning a unified classifier incrementally via rebalancing. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 831–839. IEEE/CVF, 2019

work page 2019
[45]

Prototype augmentation and self- supervision for incremental learning

Fei Zhu, Xu-Yao Zhang, Chuang Wang, Fei Yin, and Cheng-Lin Liu. Prototype augmentation and self- supervision for incremental learning. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5871–5880, 2021

work page 2021
[46]

Der: Dynamically expandable representation for class incremental learning

Shipeng Yan, Jiangwei Xie, and Xuming He. Der: Dynamically expandable representation for class incremental learning. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 3014–3023. IEEE/CVF, 2021

work page 2021
[47]

Llava-c: Continual improved visual instruction tuning.arXiv preprint arXiv:2506.08666, 2025

Wenzhuo Liu, Fei Zhu, Haiyang Guo, Longhui Wei, and Cheng-Lin Liu. Llava-c: Continual improved visual instruction tuning.arXiv preprint arXiv:2506.08666, 2025

work page arXiv 2025
[48]

Mllm-cl: Continual learning for multimodal large language models.arXiv preprint arXiv:2506.05453, 2025

Hongbo Zhao, Fei Zhu, Rundong Wang, Gaofeng Meng, and Zhaoxiang Zhang. Mllm-cl: Continual learning for multimodal large language models.arXiv preprint arXiv:2506.05453, 2025

work page arXiv 2025
[49]

Hide-llava: Hierarchical decoupling for continual instruction tuning of multimodal large language model

Haiyang Guo, Fanhu Zeng, Ziwei Xiang, Fei Zhu, Da-Han Wang, Xu-Yao Zhang, and Cheng-Lin Liu. Hide-llava: Hierarchical decoupling for continual instruction tuning of multimodal large language model. arXiv preprint arXiv:2503.12941, 2025

work page arXiv 2025
[50]

Modalprompt: Towards efficient multimodal continual instruction tuning with dual-modality guided prompt.arXiv preprint arXiv:2410.05849, 2024

Fanhu Zeng, Fei Zhu, Haiyang Guo, Xu-Yao Zhang, and Cheng-Lin Liu. Modalprompt: Towards efficient multimodal continual instruction tuning with dual-modality guided prompt.arXiv preprint arXiv:2410.05849, 2024

work page arXiv 2024
[51]

Lora: Low-rank adaptation of large language models.ICLR, 1(2):3, 2022

Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, et al. Lora: Low-rank adaptation of large language models.ICLR, 1(2):3, 2022

work page 2022
[52]

Pilora: Prototype guided incremental lora for federated class-incremental learning

Haiyang Guo, Fei Zhu, Wenzhuo Liu, Xu-Yao Zhang, and Cheng-Lin Liu. Pilora: Prototype guided incremental lora for federated class-incremental learning. InEuropean Conference on Computer Vision, pages 141–159. Springer, 2024

work page 2024
[53]

The Power of Scale for Parameter-Efficient Prompt Tuning

Brian Lester, Rami Al-Rfou, and Noah Constant. The power of scale for parameter-efficient prompt tuning. arXiv preprint arXiv:2104.08691, April 2021

work page internal anchor Pith review Pith/arXiv arXiv 2021
[54]

Ramrakhya, E

R. Ramrakhya, E. Undersander, D. Batra, and A. Das. Habitat-web: Learning embodied object-search strategies from human demonstrations at scale. InProceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2022

work page 2022
[55]

Dd-ppo: Learning near-perfect pointgoal navigators from 2.5 billion frames.arXiv preprint arXiv:1911.00357, 2019

Erik Wijmans, Abhishek Kadian, Ari Morcos, Stefan Lee, Irfan Essa, Devi Parikh, Manolis Savva, and Dhruv Batra. Dd-ppo: Learning near-perfect pointgoal navigators from 2.5 billion frames.arXiv preprint arXiv:1911.00357, 2019

work page arXiv 1911
[56]

Long short-term memory.Neural Computation, 9(8):1735–1780, 1997

Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory.Neural Computation, 9(8):1735–1780, 1997

work page 1997
[57]

Attention is all you need.Advances in neural information processing systems, 30, 2017

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need.Advances in neural information processing systems, 30, 2017. 13 A Dataset Details A.1 Category Splits for Continual ObjectNav HM3D Dataset.The object categories are split into four stages as follows: •Stage...

work page 2017