pith. machine review for the scientific record.

arxiv: 2605.06478 · v1 · submitted 2026-05-07 · 💻 cs.RO

Recognition: unknown

GA3T: A Ground-Aerial Terrain Traversability Dataset for Heterogeneous Robot Teams in Unstructured Environments

Authors on Pith: no claims yet

Pith reviewed 2026-05-08 08:53 UTC · model grok-4.3

classification 💻 cs.RO
keywords: dataset · multi-robot perception · traversability estimation · air-ground fusion · unstructured environments · collaborative robotics · UGV UAV · cross-view perception

The pith

GA3T supplies synchronized ground and aerial robot data across four real off-road sites to enable air-ground fusion and traversability research.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents GA3T, a new dataset gathered by a Clearpath Husky unmanned ground vehicle and an Autel EVO II unmanned aerial vehicle working together in unstructured settings such as forest trails, rocky paths, muddy terrain, snow piles, and grass fields. It supplies more than 13,000 synchronized frames from complementary sensors, including LiDAR, stereo cameras, thermal imagery, and GPS, along with SAM 3 zero-shot segmentation and more than 8,000 manually labeled images. The collection took place in early spring, when sparse tree canopies allow partial aerial views of the ground robot and the terrain below. The authors position the dataset as a resource specifically for studying cross-view perception, air-ground viewpoint fusion, traversability estimation, and joint scene understanding in real outdoor environments rather than in simulated or urban driving scenarios.
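As a concrete picture of what one synchronized, cross-view record could look like, here is a minimal Python sketch; the class and field names are illustrative assumptions, not the dataset's released schema.

```python
from dataclasses import dataclass
from typing import Optional, Tuple
import numpy as np

@dataclass
class GroundFrame:
    """One ground-robot (Husky) observation. Field names are hypothetical."""
    timestamp: float            # GPS-derived time, seconds
    lidar_points: np.ndarray    # (N, 4): x, y, z, intensity
    stereo_left: np.ndarray     # (H, W, 3) RGB
    stereo_right: np.ndarray    # (H, W, 3) RGB
    imu: np.ndarray             # (6,): accelerometer + gyroscope
    gps: Tuple[float, float, float]   # latitude, longitude, altitude

@dataclass
class AerialFrame:
    """One aerial (EVO II) observation from the overhead viewpoint."""
    timestamp: float
    rgb: np.ndarray             # (H, W, 3)
    thermal: np.ndarray         # (H, W) infrared
    gps: Tuple[float, float, float]

@dataclass
class PairedSample:
    """A cross-view pair: the aerial frame time-matched to a ground frame,
    with an optional segmentation label (SAM 3 zero-shot or manual)."""
    ground: GroundFrame
    aerial: AerialFrame
    segmentation: Optional[np.ndarray] = None   # (H, W) class IDs
```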

Core claim

The authors claim that GA3T provides the first real-world collection of overlapping multi-modal, multi-view observations from heterogeneous air-ground robot teams operating in diverse unstructured terrain, thereby directly supporting research on cross-view perception, air-ground viewpoint fusion, traversability estimation, and collaborative scene understanding.

What carries the argument

The GA3T dataset itself, built from synchronized streams of 3D LiDAR, stereo, IMU, GPS, RGB, and thermal/infrared data with SAM-3 zero-shot plus manual annotations collected across four early-spring sites.

If this is right

  • Algorithms for air-ground viewpoint fusion can be developed and evaluated using the paired overhead and ground perspectives on the same scenes.
  • Traversability estimation methods can be trained on real multi-modal observations of mud, snow, rocks, and trails.
  • Occlusion-aware perception research can use the partial aerial visibility through sparse canopies to study how ground and air views complement each other.
  • Collaborative scene understanding benchmarks can be created from the synchronized multi-robot, multi-view labeled frames.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The thermal channel may enable extensions to low-light or adverse-weather perception that the current RGB-heavy labels do not yet test.
  • Pairing GA3T with existing urban or simulated datasets could reveal how much domain adaptation is still required for general off-road deployment.
  • The early-spring timing suggests a natural follow-up collection in full-foliage summer to measure the impact of increased occlusion on fusion performance.

Load-bearing premise

Data from four early-spring environments with sparse canopies plus the mix of automated and manual labels are representative and high-quality enough to train models for broader unstructured-terrain tasks without large domain-shift problems.

What would settle it

Models trained only on GA3T data would be tested on traversability or fusion tasks in summer foliage, different geographic regions, or denser canopy conditions; clear failure to generalize would show the dataset's limits.
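A minimal sketch of that probe, assuming hypothetical helpers (train_traversability_model, run_inference, and load_domain are placeholders, not released tooling); only the mean-IoU metric itself is standard:

```python
import numpy as np

def mean_iou(preds: np.ndarray, labels: np.ndarray, num_classes: int) -> float:
    """Mean intersection-over-union, the usual metric for segmentation-style
    traversability labels. Classes absent from both maps are skipped."""
    ious = []
    for c in range(num_classes):
        pred_c, label_c = preds == c, labels == c
        union = np.logical_or(pred_c, label_c).sum()
        if union == 0:
            continue
        ious.append(np.logical_and(pred_c, label_c).sum() / union)
    return float(np.mean(ious))

# Hypothetical protocol: train only on GA3T, then test out of domain.
# model = train_traversability_model(ga3t_train_split)          # placeholder
# for domain in ("summer_foliage", "new_region", "dense_canopy"):
#     preds, labels = run_inference(model, load_domain(domain))  # placeholder
#     print(domain, mean_iou(preds, labels, num_classes=8))
# A large drop relative to the in-domain GA3T test split would expose
# the dataset's generalization limits.
```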

Figures

Figures reproduced from arXiv: 2605.06478 by Amir Kaidarov, Christian Ricks, David Han, Dhanush Parthasarathy, Knut Peterson, Lifeng Zhou, Neil Deshpande, Quan Tran, Siwei Cai, Sukaina Najm.

Figure 1: Multi-robot collaborative data collection. The heterogeneous aerial and …
Figure 2: The equipment used for data collection. (a) Autel Robotics EVO II Dual …
Figure 3: Example of RGB–thermal alignment from the UAV. The thermal image …
Figure 4: Examples of annotations for paired Husky and Drone images. Images are …
Original abstract

Heterogeneous air-ground robot teams combine complementary sensing modalities, mobility characteristics, and spatial viewpoints that can significantly enhance perception in complex outdoor environments. However, progress in multi-robot collaborative perception has been constrained by the lack of real-world datasets featuring overlapping multi-modal observations from platforms operating in unstructured terrain. We present GA3T (Ground-Aerial Team for Terrain Traversal), a real-world multi-robot collaborative perception dataset collected using a Clearpath Husky UGV and an Autel EVO II UAV across diverse unstructured environments, including forest trails, rocky paths, muddy terrain, snow piles, and grass-covered fields. The ground platform provides 3D LiDAR, stereo camera, IMU, and GPS data, while the aerial platform contributes RGB imagery, thermal/infrared observations, and GPS from a complementary overhead viewpoint, allowing for rich cross-modal and cross-view perception. The dataset is collected in 4 unique environments, with over 13,000 synchronized frames across approximately 29 minutes of operation, and includes both SAM 3-based zero-shot segmentation and over 8,000 manually labeled images. A unique aspect of the dataset is its early-spring collection period, during which sparse tree canopies allow the aerial robot to partially observe the ground robot and terrain through the trees, allowing for occlusion-aware collaborative perception. Unlike prior multi-robot datasets that focus on SLAM or simulated cooperative driving, GA3T is specifically designed to support research on cross-view perception, air-ground viewpoint fusion, traversability estimation, and collaborative scene understanding in real off-road environments.

Editorial analysis

A structured set of objections, weighed in public.

Referee report, simulated authors' rebuttal, circularity audit, and an axiom and free-parameter ledger. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript presents GA3T, a real-world multi-robot collaborative perception dataset collected with a Clearpath Husky UGV and Autel EVO II UAV across four early-spring unstructured environments (forest trails, rocky paths, muddy terrain, snow piles, grass fields). It supplies over 13,000 synchronized frames (~29 min) with ground-platform 3D LiDAR, stereo camera, IMU and GPS plus aerial RGB, thermal and GPS from complementary viewpoints, together with SAM-3 zero-shot segmentation and >8,000 manual labels, explicitly to enable cross-view perception, air-ground fusion, traversability estimation and collaborative scene understanding.

Significance. If the synchronization and labeling claims hold, the dataset would meaningfully advance heterogeneous robot-team research by supplying real off-road multi-modal, cross-view data with the distinctive sparse-canopy property that permits partial aerial observation of the ground robot and terrain. This fills a documented gap relative to prior SLAM-centric or simulated cooperative-driving collections. The sensor suite and labeling mix are well-matched to the stated use cases; credit is due for the focused real-world collection and the explicit tie between data characteristics and intended downstream tasks.

major comments (2)
  1. [Abstract] The statement that the data consist of 'synchronized frames' is load-bearing for all cross-view and fusion claims, yet no quantitative synchronization error, temporal offset statistics, or inter-platform calibration procedure is supplied. Without these numbers the utility for precise air-ground fusion cannot be evaluated.
  2. [Abstract] The combination of SAM-3 zero-shot segmentation and >8,000 manual labels is presented as supporting traversability and scene-understanding research, but no label-consistency metrics, inter-annotator agreement, or validation against ground truth are reported. This directly affects the claim that the labels are of sufficient quality to avoid significant domain-shift issues in downstream training.
minor comments (2)
  1. [Abstract] The five environment types are enumerated but lack even brief quantitative descriptors (e.g., approximate area, slope statistics, or canopy density) that would help readers judge diversity and representativeness.
  2. [Abstract] The total duration is given as 'approximately 29 minutes' without a per-environment or per-platform breakdown, making it difficult to assess data balance across the four sites.
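To make major comment 1 concrete, the sketch below shows one way the requested temporal offset statistics could be computed from GPS-stamped frame timestamps; nearest-neighbor matching is a common convention, not necessarily the authors' procedure.

```python
import numpy as np

def sync_offset_stats(ground_ts: np.ndarray, aerial_ts: np.ndarray) -> dict:
    """Match each ground-frame timestamp to the nearest aerial timestamp
    and summarize the offsets. Inputs: 1-D arrays of GPS times in seconds.
    The matching rule is an assumption, not GA3T's documented procedure."""
    aerial_sorted = np.sort(aerial_ts)
    idx = np.clip(np.searchsorted(aerial_sorted, ground_ts),
                  1, len(aerial_sorted) - 1)
    left, right = aerial_sorted[idx - 1], aerial_sorted[idx]
    nearest = np.where(ground_ts - left < right - ground_ts, left, right)
    offsets = ground_ts - nearest
    return {
        "mean_s": float(offsets.mean()),     # mean temporal offset
        "std_s": float(offsets.std()),       # spread of offsets
        "max_abs_s": float(np.abs(offsets).max()),  # worst-case mismatch
    }
```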

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive evaluation and the recommendation of minor revision. We address the two major comments below and will update the manuscript accordingly.

Point-by-point responses
  1. Referee: [Abstract] The statement that the data consist of 'synchronized frames' is load-bearing for all cross-view and fusion claims, yet no quantitative synchronization error, temporal offset statistics, or inter-platform calibration procedure is supplied. Without these numbers the utility for precise air-ground fusion cannot be evaluated.

    Authors: We acknowledge that the original manuscript does not supply quantitative synchronization error, temporal offset statistics, or a detailed inter-platform calibration procedure. The data were synchronized using GPS timestamps from both platforms together with hardware-triggered image capture on the ground robot. In the revised version we will add a dedicated subsection describing the synchronization and calibration procedure and will report measured temporal offset statistics (mean and standard deviation of time differences across the synchronized frames). This will directly address the concern and allow readers to assess suitability for precise fusion. revision: yes

  2. Referee: [Abstract] The combination of SAM-3 zero-shot segmentation and >8,000 manual labels is presented as supporting traversability and scene-understanding research, but no label-consistency metrics, inter-annotator agreement, or validation against ground truth are reported. This directly affects the claim that the labels are of sufficient quality to avoid significant domain-shift issues in downstream training.

    Authors: We agree that explicit label-quality metrics would strengthen the manuscript. The >8,000 manual labels were produced by following a standardized annotation protocol with SAM-3 zero-shot masks used only as an initial aid for refinement; however, the original submission does not include inter-annotator agreement, consistency metrics, or ground-truth validation. In revision we will expand the labeling section to describe the annotation workflow in detail, report any internal consistency checks that were performed, and explicitly discuss limitations with respect to domain shift. Because a full inter-annotator study was not conducted, this will be a partial revision focused on improved documentation rather than new quantitative metrics. revision: partial
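To make the label-quality request in major comment 2 concrete, Cohen's kappa over per-pixel class labels is one standard inter-annotator agreement measure; the sketch below is purely illustrative, since the paper reports no double-annotated subset.

```python
import numpy as np

def cohens_kappa(mask_a: np.ndarray, mask_b: np.ndarray,
                 num_classes: int) -> float:
    """Cohen's kappa between two annotators' per-pixel labels for the same
    image: observed agreement corrected for chance. 1.0 means perfect
    agreement, 0.0 chance level. A hypothetical check, not reported in
    the paper."""
    a, b = mask_a.ravel(), mask_b.ravel()
    p_o = float((a == b).mean())                 # observed agreement
    p_e = sum(float((a == c).mean()) * float((b == c).mean())
              for c in range(num_classes))       # chance agreement
    return 1.0 if p_e == 1.0 else (p_o - p_e) / (1.0 - p_e)
```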

Circularity Check

0 steps flagged

No significant circularity

Full rationale

This is a dataset paper whose central claim is the collection and release of synchronized multi-modal UGV/UAV data in four real environments. The abstract and description contain no equations, no fitted parameters, no predictions derived from prior results, and no self-citations used as load-bearing premises. All content is descriptive of hardware, collection procedure, labeling (SAM-3 plus manual), and intended downstream uses. No step reduces to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is an empirical dataset paper with no mathematical derivations, fitted parameters, background axioms, or new postulated entities.

pith-pipeline@v0.9.0 · 5618 in / 1242 out tokens · 79095 ms · 2026-05-08T08:53:34.374355+00:00 · methodology


Reference graph

Works this paper leans on

27 extracted references · 5 canonical work pages · 1 internal anchor

  1. Arnold, E., Dianati, M., de Temple, R., Fallah, S.: Cooperative perception for 3D object detection in driving scenarios using infrastructure sensors. IEEE Transactions on Intelligent Transportation Systems 23(3), 1852–1864 (2020)
  2. Aybakan, A., Haddeler, G., Akay, M.C., Ervan, O., Temeltas, H.: A 3D LiDAR dataset of ITU heterogeneous robot team. In: Proceedings of the 5th International Conference on Robotics and Artificial Intelligence. pp. 12–17 (2019)
  3. Carion, N., Gustafson, L., Hu, Y.T., Debnath, S., Hu, R., Suris, D., Ryali, C., Alwala, K.V., Khedr, H., Huang, A., et al.: SAM 3: Segment anything with concepts. arXiv preprint arXiv:2511.16719 (2025)
  4. Chang, Y., Ebadi, K., Denniston, C.E., Ginting, M.F., Rosinol, A., Reinke, A., Palieri, M., Shi, J., Chatterjee, A., Morrell, B., et al.: LAMP 2.0: A robust multi-robot SLAM system for operation in challenging large-scale underground environments. IEEE Robotics and Automation Letters 7(4), 9175–9182 (2022)
  5. Cristofalo, E., Leahy, K., Vasile, C.I., Montijano, E., Schwager, M., Belta, C.: Localization of a ground robot by aerial robots for GPS-deprived control with temporal logic constraints. In: International Symposium on Experimental Robotics. pp. 525–537. Springer (2016)
  6. Datar, A., Pokhrel, A., Nazeri, M., Rao, M.B., Rangwala, H., Pan, C., Zhang, Y., Harrison, A., Wigness, M., Osteen, P.R., et al.: M2P2: A multi-modal passive perception dataset for off-road mobility in extreme low-light conditions. In: 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). pp. 13690–13696. IEEE (2025)
  7. Feng, D., Qi, Y., Zhong, S., Chen, Z., Jiao, Y., Chen, Q., Jiang, T., Chen, H.: S3E: A large-scale multimodal dataset for collaborative SLAM. arXiv preprint arXiv:2210.13723 (2022)
  8. Jiang, P., Osteen, P., Wigness, M., Saripalli, S.: RELLIS-3D dataset: Data, benchmarks and analysis. In: 2021 IEEE International Conference on Robotics and Automation (ICRA). pp. 1110–1116. IEEE (2021)
  9. Li, Y., Ma, D., An, Z., Wang, Z., Zhong, Y., Chen, S., Feng, C.: V2X-Sim: Multi-agent collaborative perception dataset and benchmark for autonomous driving. IEEE Robotics and Automation Letters 7(4), 10914–10921 (2022)
  10. Liu, S., Zeng, Z., Ren, T., Li, F., Zhang, H., Yang, J., Jiang, Q., Li, C., Yang, J., Su, H., et al.: Grounding DINO: Marrying DINO with grounded pre-training for open-set object detection. In: European Conference on Computer Vision. pp. 38–55. Springer (2024)
  11. Mortimer, P., Hagmanns, R., Granero, M., Luettel, T., Petereit, J., Wuensche, H.J.: The GOOSE dataset for perception in unstructured environments. In: 2024 IEEE International Conference on Robotics and Automation (ICRA). pp. 14838–14844. IEEE (2024)
  12. Patel, M., Yang, F., Qiu, Y., Cadena, C., Scherer, S., Hutter, M., Wang, W.: TartanGround: A large-scale dataset for ground robot perception and navigation. In: 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). pp. 20524–20531. IEEE (2025)
  13. Peterson, K., Mayers, Z., Yousuf, A., Chowdhury, P., Zaczepinski, A., Arezoomandan, S., Maarefdoust, R., Han, D.: LRDDv3: High-resolution long-range drone detection dataset with range information and thermal data. In: 2026 IEEE International Conference on Robotics and Automation (ICRA) (2026)
  14. Pizer, S.M., Johnston, R.E., Ericksen, J.P., Yankaskas, B.C., Muller, K.E.: Contrast-limited adaptive histogram equalization: Speed and effectiveness. In: Proceedings of the First Conference on Visualization in Biomedical Computing, Atlanta, Georgia. vol. 337, p. 2 (1990)
  15. Rouhi, A., Patel, S., McCarthy, N., Khan, S., Khorsand, H., Lefkowitz, K., Han, D.: LRDDv2: Enhanced long-range drone detection dataset with range information and comprehensive real-world challenges. In: 2024 International Symposium of Robotics Research (ISRR) (2024)
  16. Rouhi, A., Umare, H., Patal, S., Kapoor, R., Deshpande, N., Arezoomandan, S., Shah, P., Han, D.: Long-range drone detection dataset. In: 2024 IEEE International Conference on Consumer Electronics (ICCE) (2024). https://doi.org/10.1109/ICCE59016.2024.10444135
  17. Sivaprakasam, M., Maheshwari, P., Castro, M.G., Triest, S., Nye, M., Willits, S., Saba, A., Wang, W., Scherer, S.: TartanDrive 2.0: More modalities and better infrastructure to further self-supervised learning research in off-road driving tasks. In: 2024 IEEE International Conference on Robotics and Automation (ICRA). pp. 12606–12606. IEEE (2024)
  18. Sobel, I., Feldman, G., et al.: A 3x3 isotropic gradient operator for image processing. A talk at the Stanford Artificial Project in 1968, 271–272 (1968)
  19. Tian, Y., Chang, Y., Quang, L., Schang, A., Nieto-Granda, C., How, J.P., Carlone, L.: Resilient and distributed multi-robot visual SLAM: Datasets, experiments, and lessons learned. arXiv preprint arXiv:2304.04362 (2023)
  20. Tranzatto, M., Dharmadhikari, M., Bernreiter, L., Camurri, M., Khattak, S., Mascarich, F., Pfreundschuh, P., Wisth, D., Zimmermann, S., Kulkarni, M., et al.: Team CERBERUS wins the DARPA Subterranean Challenge: Technical overview and lessons learned. arXiv preprint arXiv:2207.04914 (2022)
  21. Vizzo, I., Guadagnino, T., Mersch, B., Wiesmann, L., Behley, J., Stachniss, C.: KISS-ICP: In defense of point-to-point ICP – simple, accurate, and robust registration if done the right way. IEEE Robotics and Automation Letters 8(2), 1029–1036 (2023)
  22. Wigness, M., Eum, S., Rogers, J.G., Han, D., Kwon, H.: A RUGD dataset for autonomous navigation and visual perception in unstructured outdoor environments. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). pp. 5000–5007. IEEE (2019)
  23. Xu, R., Xiang, H., Xia, X., Han, X., Li, J., Ma, J.: OPV2V: An open benchmark dataset and fusion pipeline for perception with vehicle-to-vehicle communication. In: IEEE International Conference on Robotics and Automation (ICRA). pp. 2583–2589 (2022)
  24. Yu, H., Luo, Y., Shu, M., Huo, Y., Yang, Z., Shi, Y., Guo, Z., Li, H., Hu, X., Yuan, J., et al.: DAIR-V2X: A large-scale dataset for vehicle-infrastructure cooperative 3D object detection. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 21361–21370 (2022)
  25. Zhou, Y., Quang, L., Nieto-Granda, C., Loianno, G.: CoPeD – advancing multi-robot collaborative perception: A comprehensive dataset in real-world environments. IEEE Robotics and Automation Letters 9(7), 6416–6423 (2024)
  26. Zhu, P., Wen, L., Du, D., Bian, X., Fan, H., Hu, Q., Ling, H.: Detection and tracking meet drones challenge. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(11), 7380–7399 (2021)
  27. Zhu, Y., Kong, Y., Jie, Y., Xu, S., Cheng, H.: GRACO: A multimodal dataset for ground and aerial cooperative localization and mapping. IEEE Robotics and Automation Letters 8(2), 966–973 (2023)