Vision Foundation Models for Domain Generalisable Cross-View Localisation in Planetary Ground-Aerial Robotic Teams

Alberto Candela; David Harvey; Feras Dayoub; Lachlan Holden; Tat-Jun Chin

arXiv: 2601.09107 · v1 · submitted 2026-01-14 · cs.CV · cs.RO

Pith reviewed 2026-05-16 15:12 UTC · model grok-4.3

keywords: cross-view localization, planetary robotics, vision foundation models, domain generalization, synthetic data, particle filters, rover localization, aerial mapping

The pith

Cross-view dual-encoder networks trained on synthetic image pairs and foundation-model semantic segmentation enable accurate rover localization in aerial maps using particle filters.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that rovers can determine their position inside a local aerial map from sequences of limited-field-of-view ground-view RGB images. It does this by training dual-encoder networks on large synthetic ground-aerial pairs and using vision foundation models to produce semantic segmentation that reduces the appearance gap between simulation and real planetary terrain. Particle filters then combine the network outputs over time to track both simple and complex trajectories. The approach matters because labeled real planetary data is scarce, so any method that generalizes from synthetic training directly supports larger-scale ground-aerial missions. The authors also release a new real-world rover trajectory dataset captured in a planetary analogue facility together with matching synthetic pairs.

Core claim

Dual-encoder cross-view networks that ingest ground-view images and predict their location inside an aerial map can be made to generalize from synthetic training data to real planetary images when the networks are supervised with semantic segmentation masks produced by vision foundation models; particle-filter state estimation on the network outputs then yields accurate position estimates along both simple and complex rover trajectories.

What carries the argument

Cross-view-localising dual-encoder deep neural networks that map ground-view images to positions inside an aerial map, guided by semantic segmentation from vision foundation models and trained on high-volume synthetic ground-aerial pairs.
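As a rough illustration of how dual-encoder matching can turn one ground-view image into a distribution over map locations, the sketch below compares a ground embedding against one embedding per aerial map cell using cosine similarity and a softmax. This is a generic pattern, not the paper's architecture: the page does not describe the network's output head, so the embeddings, the `temperature` parameter, and the function names are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_distribution(ground_embedding, aerial_embeddings, temperature=0.1):
    """Softmax over similarities between one ground embedding and every aerial cell.

    `temperature` is a hypothetical sharpening parameter, not a value from the paper.
    """
    sims = [cosine_similarity(ground_embedding, cell) for cell in aerial_embeddings]
    peak = max(sims)  # subtract the maximum for numerical stability
    exps = [math.exp((s - peak) / temperature) for s in sims]
    total = sum(exps)
    return [e / total for e in exps]


# Toy vectors standing in for the outputs of the two encoders (assumed, not real data).
aerial_cells = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.6, 0.8, 0.0]]
ground = [0.1, 0.9, 0.0]

probs = match_distribution(ground, aerial_cells)
best_cell = max(range(len(probs)), key=probs.__getitem__)
```

A per-cell distribution of this kind is one convenient form for a downstream state estimator to consume, since it can be read directly as a measurement likelihood over map positions.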

If this is right

Accurate position estimates are obtained over both simple and complex trajectories when particle filters combine successive cross-view network outputs.
Localization performance remains usable even when only monocular ground-view RGB images with limited field of view are available.
The same synthetic-plus-foundation-model pipeline produces usable cross-view matches on the new planetary-analogue real dataset.
Ground-aerial robotic teams can therefore perform local map-based localization without requiring large quantities of labeled real flight data.
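The way a particle filter combines successive measurements can be sketched in a few lines. The example below is a minimal 2D position-only filter under stated assumptions: the Gaussian `toy_likelihood` stands in for the cross-view network output, and the particle count, noise levels, and motion commands are hypothetical values chosen for the demonstration rather than settings from the paper.

```python
import math
import random


def predict(particles, motion, noise_std, rng):
    """Shift every particle by the commanded motion plus Gaussian noise."""
    dx, dy = motion
    return [
        (x + dx + rng.gauss(0.0, noise_std), y + dy + rng.gauss(0.0, noise_std))
        for x, y in particles
    ]


def update(particles, weights, likelihood):
    """Reweight particles by the measurement likelihood and renormalise."""
    raw = [w * likelihood(x, y) for w, (x, y) in zip(weights, particles)]
    total = sum(raw)
    if total <= 0.0:
        # Degenerate case: fall back to uniform weights.
        return [1.0 / len(particles)] * len(particles)
    return [w / total for w in raw]


def systematic_resample(particles, weights, rng):
    """Draw a new particle set with one random offset and evenly spaced pointers."""
    n = len(particles)
    step = 1.0 / n
    pointer = rng.random() * step
    resampled = []
    index = 0
    cumulative = weights[0]
    for _ in range(n):
        while pointer > cumulative and index < n - 1:
            index += 1
            cumulative += weights[index]
        resampled.append(particles[index])
        pointer += step
    return resampled


def weighted_mean(particles, weights):
    """Position estimate as the weighted mean of the particle set."""
    x = sum(w * p[0] for w, p in zip(weights, particles))
    y = sum(w * p[1] for w, p in zip(weights, particles))
    return x, y


def toy_likelihood(true_xy, sigma=0.5):
    """Stand-in for the network output: a Gaussian bump at the true position."""
    tx, ty = true_xy

    def likelihood(x, y):
        d2 = (x - tx) ** 2 + (y - ty) ** 2
        return math.exp(-d2 / (2.0 * sigma ** 2))

    return likelihood


rng = random.Random(0)
n_particles = 500
particles = [(rng.uniform(0.0, 10.0), rng.uniform(0.0, 10.0)) for _ in range(n_particles)]
weights = [1.0 / n_particles] * n_particles
true_xy = (2.0, 2.0)
motion = (0.5, 0.25)

for _ in range(10):
    true_xy = (true_xy[0] + motion[0], true_xy[1] + motion[1])
    particles = predict(particles, motion, 0.1, rng)
    weights = update(particles, weights, toy_likelihood(true_xy))
    estimate = weighted_mean(particles, weights)
    particles = systematic_resample(particles, weights, rng)
    weights = [1.0 / n_particles] * n_particles

error = math.hypot(estimate[0] - true_xy[0], estimate[1] - true_xy[1])
```

Starting from a uniform prior over the map, the particle cloud collapses toward the region the likelihood keeps favouring, which is why a sequence of individually ambiguous matches can still yield a sharp position estimate.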

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The method could be tested on orbital imagery of other bodies by swapping the aerial map source while keeping the same synthetic-training recipe.
If the semantic segmentation step is replaced by a different foundation model, the localization error on real trajectories would indicate how sensitive the pipeline is to the choice of segmentation prior.
Extending the particle filter to also estimate heading or velocity from the same ground-view sequence would turn the current position-only estimator into a full pose tracker without extra sensors.
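The pose-tracking extension suggested above could start from a prediction step like the one below, where each particle carries heading and speed alongside position and both evolve as random walks. This is a sketch of one possible motion model, not something the paper proposes: the state layout `(x, y, heading, speed)` and the noise parameters are assumptions. Particles whose heading or speed is wrong drift away from the positions the cross-view likelihood favours and lose weight in the update step, which is how heading and speed would be inferred without extra sensors.

```python
import math
import random


def predict_pose(particles, dt, heading_std, speed_std, rng):
    """Propagate (x, y, heading, speed) particles with a random-walk motion model."""
    moved = []
    for x, y, heading, speed in particles:
        heading = heading + rng.gauss(0.0, heading_std)
        # Wrap the heading back into [-pi, pi].
        heading = math.atan2(math.sin(heading), math.cos(heading))
        # Speed is clamped at zero, so particles never drive backwards in this sketch.
        speed = max(0.0, speed + rng.gauss(0.0, speed_std))
        moved.append(
            (
                x + speed * dt * math.cos(heading),
                y + speed * dt * math.sin(heading),
                heading,
                speed,
            )
        )
    return moved


rng = random.Random(1)

# With zero noise, a particle facing along +x at 1 unit/s moves 2 units in 2 s.
straight = predict_pose([(0.0, 0.0, 0.0, 1.0)], dt=2.0, heading_std=0.0, speed_std=0.0, rng=rng)

# A heading outside [-pi, pi] is wrapped back into range.
wrapped = predict_pose([(0.0, 0.0, 4.0, 0.0)], dt=1.0, heading_std=0.0, speed_std=0.0, rng=rng)
```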

Load-bearing premise

Semantic segmentation masks from vision foundation models plus large synthetic training sets are enough to make the learned cross-view matching reliable on real planetary images.

What would settle it

On the contributed real-world rover dataset, replace the synthetic-trained cross-view network with a network trained only on real images and measure whether particle-filter position error increases beyond the reported synthetic-trained performance.
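A comparison of this kind reduces to computing per-step position errors for each run against the same ground truth. The sketch below shows one way to summarise them; the `threshold` used for the success rate, the units, and the toy trajectories are hypothetical and are not values from the paper.

```python
import math


def position_errors(estimates, ground_truth):
    """Euclidean error between each estimated and true (x, y) position."""
    if len(estimates) != len(ground_truth):
        raise ValueError("estimates and ground truth must have the same length")
    return [
        math.hypot(ex - gx, ey - gy)
        for (ex, ey), (gx, gy) in zip(estimates, ground_truth)
    ]


def summarise(errors, threshold):
    """Mean error, worst-case error, and fraction of steps within `threshold`."""
    return {
        "mean_error": sum(errors) / len(errors),
        "max_error": max(errors),
        "success_rate": sum(1 for e in errors if e <= threshold) / len(errors),
    }


# Toy trajectory and estimates (illustrative numbers only).
truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
run = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]

errors = position_errors(run, truth)
summary = summarise(errors, threshold=1.0)
```

Running the same summary on the synthetic-trained and real-trained networks, over identical trajectories and filter settings, gives the error delta the test calls for.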

Figures

Figures reproduced from arXiv: 2601.09107 by Alberto Candela, David Harvey, Feras Dayoub, Lachlan Holden, Tat-Jun Chin.

**Figure 2.** Full rock segmentation pipeline using LLMDet and SAM 2.

**Figure 3.** The network follows a dual-encoder structure, with …

**Figure 3.** Dual-encoder cross-view localising network structure, …

**Figure 4.** Examples of ground view (bottom) and rectified aerial …

**Figure 6.** The six trajectories, A to F, used in the validation set …

**Figure 9.** Qualitative examples of particle filter runs A (left) …

**Figure 7.** Comparison of the distribution of particle filter …

**Figure 8.** Comparison of particle filter errors between automatic …

Abstract

Accurate localisation in planetary robotics enables the advanced autonomy required to support the increased scale and scope of future missions. The successes of the Ingenuity helicopter and multiple planetary orbiters lay the groundwork for future missions that use ground-aerial robotic teams. In this paper, we consider rovers using machine learning to localise themselves in a local aerial map using limited field-of-view monocular ground-view RGB images as input. A key consideration for machine learning methods is that real space data with ground-truth position labels suitable for training is scarce. In this work, we propose a novel method of localising rovers in an aerial map using cross-view-localising dual-encoder deep neural networks. We leverage semantic segmentation with vision foundation models and high volume synthetic data to bridge the domain gap to real images. We also contribute a new cross-view dataset of real-world rover trajectories with corresponding ground-truth localisation data captured in a planetary analogue facility, plus a high volume dataset of analogous synthetic image pairs. Using particle filters for state estimation with the cross-view networks allows accurate position estimation over simple and complex trajectories based on sequences of ground-view images.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper's real value is the new planetary analogue cross-view dataset, but the localization accuracy claims rest on an unproven assumption that VFM semantics close the domain gap.

The main thing to know is that this work adds a new real-world dataset of rover trajectories with ground-truth positions from a planetary analogue facility, along with matching synthetic image pairs. The authors train dual-encoder networks that use vision foundation model semantic segmentation to handle the shift from synthetic training data to real ground-view images, then feed the outputs into a particle filter for position estimation over sequences. The abstract says this delivers accurate results on both simple and complex trajectories.

That dataset is a concrete, usable contribution for anyone working on cross-view localization in space robotics, where labeled real data is scarce. The approach of layering existing VFM segmentation on top of synthetic data generation is a reasonable engineering move rather than a new framework. The particle filter step for handling sequences also makes sense for practical state estimation.

The soft spot is the lack of visible quantitative backing. The abstract asserts accurate estimation but shows no error metrics, baselines, ablation results, or real-versus-synthetic performance deltas. Planetary surfaces bring lighting, scale, and texture shifts that standard VFM pre-training may not cover well, so the claim that semantic outputs reliably bridge the gap needs direct evidence to be convincing. Without those numbers or controls, it is hard to judge how well the transfer actually works.

This paper is for CV and robotics researchers focused on planetary or field robotics applications. A reader looking for datasets or domain-adaptation examples in cross-view settings could get something out of it. It deserves peer review because the dataset is new and the problem is relevant, even if the experimental section will likely need strengthening on the quantitative side.

Referee Report

2 major / 2 minor

Summary. The paper proposes a cross-view localization method for planetary rovers that uses dual-encoder neural networks trained on synthetic image pairs. Vision foundation models provide semantic segmentation to help bridge the domain gap to real planetary images. New real-world rover trajectory datasets with ground-truth labels and corresponding high-volume synthetic datasets are contributed. Particle filters are applied for state estimation, with the claim that this yields accurate position estimates on both simple and complex trajectories from sequences of monocular ground-view RGB images.

Significance. If the domain-transfer claims hold, the approach would be valuable for planetary robotics, where labeled real data is scarce, by enabling localization against aerial maps without extensive real-world training. The release of new real and synthetic cross-view datasets is a clear positive contribution that could support future work. The combination of VFMs, synthetic data, and particle filtering is a reasonable empirical strategy for this setting.

major comments (2)

[Abstract] Abstract: the central claim of 'accurate position estimation' on new real datasets is asserted without any reported quantitative metrics (e.g., mean position error, success rate, or error distributions), baselines, or ablation results, so the soundness of the generalization claim cannot be evaluated from the manuscript text.
[Method/Experiments] Method and Experiments sections: the key assumption that VFM-derived semantic segmentation closes the domain gap for planetary surfaces (lighting, texture, and scale variations absent from typical VFM pre-training) is not supported by any quantitative evidence such as real-vs-synthetic error deltas, segmentation-quality ablations, or comparisons of localization performance with and without the VFM component.

minor comments (2)

[Abstract] Abstract: consider adding one or two concrete performance numbers or explicit references to result tables/figures to make the claims more informative.
[Method] Notation: clarify whether the dual-encoder architecture is symmetric or asymmetric and how the cross-view matching loss is formulated.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We agree that the abstract and experimental evidence require strengthening with explicit quantitative results to support the claims. We will revise the manuscript accordingly and address each point below.

Referee: [Abstract] Abstract: the central claim of 'accurate position estimation' on new real datasets is asserted without any reported quantitative metrics (e.g., mean position error, success rate, or error distributions), baselines, or ablation results, so the soundness of the generalization claim cannot be evaluated from the manuscript text.

Authors: We agree that the abstract should include quantitative metrics to substantiate the claim of accurate position estimation. The experiments section reports mean position errors, success rates, error distributions, and comparisons against baselines on the new real planetary analogue trajectories. We will revise the abstract to incorporate key quantitative results (e.g., average localization error and success rates) along with brief references to the baselines and ablations, enabling direct evaluation of the generalization claims.

Revision: yes

Referee: [Method/Experiments] Method and Experiments sections: the key assumption that VFM-derived semantic segmentation closes the domain gap for planetary surfaces (lighting, texture, and scale variations absent from typical VFM pre-training) is not supported by any quantitative evidence such as real-vs-synthetic error deltas, segmentation-quality ablations, or comparisons of localization performance with and without the VFM component.

Authors: The current manuscript includes direct comparisons of localization performance with and without the VFM semantic segmentation component, showing measurable improvements in domain transfer on both synthetic and real data. We acknowledge, however, that additional quantitative support such as segmentation-quality metrics on real images and explicit real-vs-synthetic error deltas would further strengthen the argument. We will add these ablations and metrics to the revised Experiments section.

Revision: yes

Circularity Check

0 steps flagged

No circularity: empirical method with new datasets and no self-referential derivations

The paper presents an empirical pipeline for cross-view localization: dual-encoder networks trained on synthetic image pairs (augmented by VFM semantic segmentation) and evaluated via particle filters on a newly contributed real-world planetary analogue dataset. No equations, derivations, uniqueness theorems, or predictions are claimed that reduce to fitted parameters or self-citations by construction. All load-bearing elements (domain-gap bridging, trajectory accuracy) rest on experimental results from the contributed datasets rather than definitional equivalences or self-citation chains. The work is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption that foundation-model semantic segmentation transfers effectively from general images to planetary scenes and that synthetic data can stand in for scarce real labeled trajectories.

axioms (1)

domain assumption: Vision foundation models produce semantic segmentations that generalize to real planetary images after training on synthetic data.
Invoked to justify bridging the domain gap without extensive real labeled data.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith.Cost.FunctionalEquation: washburn_uniqueness_aczel (relation to the paper passage: unclear)

Paper passage: "We leverage semantic segmentation with vision foundation models and high volume synthetic data to bridge the domain gap to real images... Using particle filters for state estimation with the cross-view networks allows accurate position estimation"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

27 extracted references · 27 canonical work pages

[1] W. Zhang, J. Li, H. Chen, and J. Wu, “MT-GEO: A Multi-Scale Feature Extraction Network for Cross-View Geo-Localization Between Street-View and Remote Sensing Imagery,” in IGARSS, 2024, pp. 6964–6968.
[2] M. Shugaev et al., “ArcGeo: Localizing Limited Field-of-View Images using Cross-view Matching,” in WACV, 2024, pp. 208–217.
[3] S. Zhu, M. Shah, and C. Chen, “TransGeo: Transformer Is All You Need for Cross-view Image Geo-localization,” in CVPR, 2022, pp. 1152–1161.
[4] Z. Chen, K. Li, H. Li, Z. Fu, H. Zhang, and Y. Guo, “Metric localization for lunar rovers via cross-view image matching,” Visual Intelligence, vol. 2, no. 1, p. 12, 2024.
[5] X. Zhao, L. Cui, X. Wei, C. Liu, and J. Yin, “Lunar Rover Cross-View Localization Through Integration of Rover and Orbital Images,” IEEE Transactions on Geoscience and Remote Sensing, vol. 62, pp. 1–14, 2024.
[6] Y. Kou et al., “Cross-Site Visual Localization of Zhurong Mars Rover Based on Self-Supervised Keypoint Extraction and Robust Matching,” IEEE Transactions on Geoscience and Remote Sensing, vol. 63, pp. 1–20, 2025.
[7] S. Andolfo, F. Petricca, and A. Genova, “Precise pose estimation of the NASA Mars 2020 Perseverance rover through a stereo-vision-based approach,” Journal of Field Robotics, vol. 40, no. 3, pp. 684–700, 2023.
[8] K. Ebadi et al., “Toward Autonomous Localization of Planetary Robotic Explorers by Relying on Semantic Mapping,” in AERO, 2022, pp. 1–10.
[9] K. Ebadi and A.-A. Agha-Mohammadi, “Rover Localization in Mars Helicopter Aerial Maps: Experimental Results in a Mars-Analogue Environment,” in Proceedings of the 2018 International Symposium on Experimental Robotics, J. Xiao, T. Kröger, and O. Khatib, Eds., Cham: Springer International Publishing, 2018, pp. 72–84.
[10] V. Verma et al., “Enabling Long & Precise Drives for The Perseverance Mars Rover via Onboard Global Localization,” in AERO, Big Sky, MT, USA: IEEE, 2024, pp. 1–18.
[11] J. V. Hook, R. Schwartz, K. Ebadi, K. Coble, and C. Padgett, “Topographical Landmarks for Ground-Level Terrain Relative Navigation on Mars,” in AERO, 2022, pp. 1–6.
[12] M. Dinsdale et al., “Absolute Localisation by Map Matching for Sample Fetch Rover,” 2022.
[13] V. Franchi and E. Ntagiou, “Planetary Rover Localisation via Surface and Orbital Image Matching,” in AERO, 2022, pp. 1–14.
[14] R. M. Swan et al., “AI4MARS: A Dataset for Terrain-Aware Autonomous Driving on Mars,” in CVPR Workshops, 2021, pp. 1982–1991.
[15] J. Zhang, L. Lin, Z. Fan, W. Wang, and J. Liu, “S5Mars: Semi-Supervised Learning for Mars Semantic Segmentation,” IEEE Transactions on Geoscience and Remote Sensing, vol. 62, pp. 1–15, 2024.
[16] A. M. Barrett et al., “NOAH-H, a deep-learning, terrain classification system for Mars: Results for the ExoMars Rover candidate landing sites,” Icarus, vol. 371, p. 114701, 2022.
[17] P. Wei, Z. Sun, and H. Tian, “Rocknet: Lightweight network for real-time segmentation of Martian rocks,” Journal of Real-Time Image Processing, vol. 22, no. 1, p. 41, 2025.
[18] H. Liu, M. Yao, X. Xiao, and Y. Xiong, “RockFormer: A U-Shaped Transformer Network for Martian Rock Segmentation,” IEEE Transactions on Geoscience and Remote Sensing, vol. 61, pp. 1–16, 2023.
[19] S. Fu et al., “LLMDet: Learning Strong Open-Vocabulary Object Detectors under the Supervision of Large Language Models,” in CVPR, 2025, pp. 14987–14997.
[20] N. Ravi et al., “SAM 2: Segment Anything in Images and Videos,” in The Thirteenth International Conference on Learning Representations, 2024.
[21] S. Hu, M. Feng, R. M. H. Nguyen, and G. H. Lee, “CVM-Net: Cross-View Matching Network for Image-Based Ground-to-Aerial Geo-Localization,” in CVPR, Salt Lake City, UT, USA: IEEE, 2018, pp. 7258–7267.
[22] S. Thrun, W. Burgard, and D. Fox, Probabilistic Robotics (Intelligent Robotics and Autonomous Agents). Cambridge, Mass: MIT Press, 2005, 647 pp.
[23] I. Martin and M. Dunstan, PANGU v6: Planet and Asteroid Natural Scene Generation Utility, 2021.
[24] Leo Rover. “Leo Rover - Outdoor Robotics Kit for research.” [Online]. Available: https://www.leorover.tech/the-rover
[25] OptiTrack. “Motion Capture Systems,” OptiTrack. [Online]. Available: http://www.optitrack.com/index.html
[26] J. Kwon, J. Kim, H. Park, and I. K. Choi, “ASAM: Adaptive Sharpness-Aware Minimization for Scale-Invariant Learning of Deep Neural Networks,” in ICML, 2021, pp. 5905–5914.
[27] M. A. Felix, W. S. Slater, D. C. Landauer, R. E. Pinson, and B. B. Rutherford, “Total Ionizing Dose Radiation Testing of NVIDIA Jetson Orin NX System on Module,” in IEEE Space Computing Conference, 2024, pp. 116–121.
