Griffin: Aerial-Ground Cooperative Detection and Tracking Dataset and Benchmark

Chuang Zhang; Haibao Yu; Jiahao Wang; Jianqiang Wang; Jiaru Zhong; Lei He; Shaobing Xu; Xiangyu Cao; Yuner Zhang; Zeyu Han

arxiv: 2503.06983 · v2 · submitted 2025-03-10 · 💻 cs.CV · cs.RO


Jiahao Wang, Xiangyu Cao, Jiaru Zhong, Yuner Zhang, Zeyu Han, Haibao Yu, Chuang Zhang, Lei He, Shaobing Xu, Jianqiang Wang


Pith reviewed 2026-05-22 23:51 UTC · model grok-4.3

classification 💻 cs.CV cs.RO

keywords: aerial-ground cooperation · cooperative perception · 3D detection · multi-object tracking · dataset · benchmark · drone · vehicle perception


The pith

Griffin supplies a dataset of over 250 dynamic scenes to benchmark aerial-ground cooperative 3D detection and tracking.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents Griffin to address the lack of public datasets for aerial-ground cooperative perception, which pairs drones with ground vehicles as a lower-cost alternative to vehicle-to-vehicle systems. The dataset includes 37k+ frames across varied drone altitudes, weather conditions, and realistic dynamics from co-simulation, together with occlusion-aware 3D annotations. It ships with a unified benchmarking framework that measures cooperative methods on communication efficiency, altitude adaptability, and tolerance to latency, data loss, and localization noise. Experiments across cooperative paradigms then show both the gains and the remaining shortcomings of existing approaches.
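To make "communication efficiency" concrete, the sketch below estimates the per-frame message size for two common cooperative paradigms: sending a dense intermediate (BEV) feature map versus sending a list of detected boxes. It is a minimal illustration, not Griffin's metric; the feature-map shape, the float32 encoding, and the nine values per box are hypothetical choices, and real systems usually compress their messages.

```python
import math


def feature_map_bytes(channels: int, height: int, width: int, bytes_per_value: int = 4) -> int:
    """Uncompressed size of a dense BEV feature map (float32 by default)."""
    return channels * height * width * bytes_per_value


def box_list_bytes(num_boxes: int, values_per_box: int = 9, bytes_per_value: int = 4) -> int:
    """Uncompressed size of a late-fusion message.

    Assumes one row per box: (x, y, z, length, width, height, yaw, score, class_id).
    """
    return num_boxes * values_per_box * bytes_per_value


def log2_cost(n_bytes: int) -> float:
    """Log-scale cost, handy when message sizes span several orders of magnitude."""
    return math.log2(n_bytes)


if __name__ == "__main__":
    dense = feature_map_bytes(256, 200, 200)  # hypothetical intermediate-fusion message
    boxes = box_list_bytes(50)                # hypothetical late-fusion message
    print(dense, round(log2_cost(dense), 1))  # 40960000 25.3
    print(boxes, round(log2_cost(boxes), 1))  # 1800 10.8
```

The roughly 15-step gap on the log2 scale between the two hypothetical messages is why cooperative benchmarks often report communication cost on a log scale.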

Core claim

Griffin is a comprehensive AGC 3D perception dataset featuring over 250 dynamic scenes (37k+ frames) with varied drone altitudes (20-60m), diverse weather conditions, realistic drone dynamics via CARLA-AirSim co-simulation, and critical occlusion-aware 3D annotations, accompanied by a unified benchmarking framework for cooperative detection and tracking that evaluates communication efficiency, altitude adaptability, and robustness to communication latency, data loss and localization noise.

What carries the argument

The Griffin dataset together with its unified benchmarking framework that runs protocols for communication efficiency, altitude adaptability, and robustness to latency, data loss, and localization noise.
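The three robustness protocols can be pictured as perturbations applied to the drone-to-vehicle message stream before fusion. The sketch below illustrates that idea under stated assumptions and is not the benchmark's code: latency is modelled as a fixed frame offset, data loss as independent per-message drops, and localization noise as zero-mean Gaussian noise on a 2D drone pose. The names Pose2D, DroneMessage, and the sigma parameters are hypothetical.

```python
import random
from dataclasses import dataclass, replace
from typing import Dict, Optional


@dataclass(frozen=True)
class Pose2D:
    x: float    # metres
    y: float    # metres
    yaw: float  # radians


@dataclass(frozen=True)
class DroneMessage:
    frame: int            # frame index at which the drone captured its data
    sender_pose: Pose2D   # drone pose as reported to the vehicle


Stream = Dict[int, DroneMessage]  # frame index -> message sent at that frame


def receive_with_latency(stream: Stream, ego_frame: int, latency_frames: int) -> Optional[DroneMessage]:
    """Latency: at ego_frame the vehicle only holds the message from latency_frames earlier."""
    return stream.get(ego_frame - latency_frames)


def drop_messages(stream: Stream, drop_prob: float, rng: random.Random) -> Stream:
    """Data loss: each message is lost independently with probability drop_prob."""
    return {f: m for f, m in stream.items() if rng.random() >= drop_prob}


def perturb_pose(msg: DroneMessage, sigma_xy: float, sigma_yaw: float, rng: random.Random) -> DroneMessage:
    """Localization noise: zero-mean Gaussian noise added to the reported drone pose."""
    p = msg.sender_pose
    noisy = Pose2D(
        x=p.x + rng.gauss(0.0, sigma_xy),
        y=p.y + rng.gauss(0.0, sigma_xy),
        yaw=p.yaw + rng.gauss(0.0, sigma_yaw),
    )
    return replace(msg, sender_pose=noisy)


if __name__ == "__main__":
    rng = random.Random(0)
    stream = {f: DroneMessage(f, Pose2D(0.0, 0.0, 0.0)) for f in range(10)}
    print(receive_with_latency(stream, ego_frame=5, latency_frames=2).frame)  # 3
    kept = drop_messages(stream, drop_prob=0.3, rng=rng)
    noisy = perturb_pose(stream[5], sigma_xy=0.5, sigma_yaw=0.01, rng=rng)
    print(len(kept), noisy.sender_pose)
```

Keeping each perturbation a separate pure function lets a benchmark sweep one factor (for example, latency in frames) while holding the others fixed.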

If this is right

Cooperative detection and tracking methods can be directly compared on communication volume and latency tolerance using the supplied protocols.
Performance of existing methods can be measured across drone altitudes from 20 m to 60 m and under multiple weather conditions.
Limitations of current cooperative paradigms become visible when the benchmark introduces data loss or localization noise.
Future algorithm design receives concrete targets from the demonstrated gaps in altitude adaptability and robustness.
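Measuring altitude adaptability and robustness usually means re-running the same evaluator on subsets or on perturbed settings and then comparing the scores. A minimal sketch, assuming each frame carries a drone altitude in metres and that the evaluator itself (not shown) is run once per bin: frames are grouped into altitude bins, and a score's fractional drop is computed relative to a reference setting. The bin edges and function names are hypothetical and are not Griffin's protocol.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Bin = Tuple[float, float]


def bucket_frames_by_altitude(frames: Iterable[Tuple[str, float]], edges: List[float]) -> Dict[Bin, List[str]]:
    """Group (frame_id, altitude_m) pairs into [lo, hi) bins; the last bin also includes its upper edge.

    Frames outside [edges[0], edges[-1]] are ignored.
    """
    bins = list(zip(edges[:-1], edges[1:]))
    grouped: Dict[Bin, List[str]] = defaultdict(list)
    for frame_id, altitude in frames:
        for i, (lo, hi) in enumerate(bins):
            is_last = i == len(bins) - 1
            if lo <= altitude < hi or (is_last and altitude == hi):
                grouped[(lo, hi)].append(frame_id)
                break
    return dict(grouped)


def relative_drop(reference: float, value: float) -> float:
    """Fractional degradation of a score relative to a reference setting (e.g. clean vs. noisy)."""
    if reference == 0:
        raise ValueError("reference score must be non-zero")
    return (reference - value) / reference


if __name__ == "__main__":
    frames = [("f0", 22.0), ("f1", 38.5), ("f2", 47.0), ("f3", 60.0)]
    print(bucket_frames_by_altitude(frames, edges=[20.0, 40.0, 60.0]))
    # {(20.0, 40.0): ['f0', 'f1'], (40.0, 60.0): ['f2', 'f3']}
    print(round(relative_drop(0.50, 0.40), 3))  # 0.2
```

Binning frames before evaluation, rather than averaging per-frame scores, keeps set-level metrics such as average precision well defined within each altitude range.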

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the simulated occlusions and dynamics match field conditions, the benchmark rankings could guide hardware choices for real drone-vehicle teams.
The dataset structure could be extended by adding new sensor modalities without changing the evaluation protocols.
Insights on altitude effects might inform optimal drone flight policies that minimize communication while preserving detection accuracy.

Load-bearing premise

The CARLA-AirSim co-simulation and generated annotations sufficiently represent real-world aerial-ground sensor data, dynamics, and occlusion patterns.

What would settle it

Physical drone and vehicle tests that produce detection and tracking metrics differing substantially from those measured on the Griffin benchmark.
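One way to make "differing substantially" testable is to check whether the benchmark's method ranking survives the move to physical tests, alongside the raw metric gaps. The sketch below computes Spearman's rank correlation between simulated and real scores for the same set of methods. It assumes higher scores are better and that there are no ties; the method names and numbers in the usage are placeholders, not results.

```python
from typing import Dict


def ranks(scores: Dict[str, float]) -> Dict[str, int]:
    """Rank methods by score, 1 = best. Assumes higher is better and no ties."""
    ordered = sorted(scores, key=lambda name: scores[name], reverse=True)
    return {name: position + 1 for position, name in enumerate(ordered)}


def spearman_no_ties(sim: Dict[str, float], real: Dict[str, float]) -> float:
    """Spearman rank correlation between simulated and real scores of the same methods.

    Uses rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), which is valid when there are no ties.
    """
    if set(sim) != set(real) or len(sim) < 2:
        raise ValueError("need the same >= 2 methods in both score sets")
    n = len(sim)
    r_sim, r_real = ranks(sim), ranks(real)
    d_squared = sum((r_sim[m] - r_real[m]) ** 2 for m in sim)
    return 1 - 6 * d_squared / (n * (n * n - 1))


if __name__ == "__main__":
    # Placeholder scores for three hypothetical methods, not measured results.
    sim = {"method_a": 0.62, "method_b": 0.55, "method_c": 0.41}
    real = {"method_a": 0.48, "method_b": 0.51, "method_c": 0.30}
    print(spearman_no_ties(sim, real))  # 0.5
```

A value near 1 means the simulated ranking carries over to the physical tests; a value near -1 means it inverts.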

Original abstract

While cooperative perception can overcome the limitations of single-vehicle systems, the practical implementation of vehicle-to-vehicle and vehicle-to-infrastructure systems is often impeded by significant economic barriers. Aerial-ground cooperation (AGC), which pairs ground vehicles with drones, presents a more economically viable and rapidly deployable alternative. However, this emerging field has been held back by a critical lack of high-quality public datasets and benchmarks. To bridge this gap, we present Griffin, a comprehensive AGC 3D perception dataset, featuring over 250 dynamic scenes (37k+ frames). It incorporates varied drone altitudes (20-60m), diverse weather conditions, realistic drone dynamics via CARLA-AirSim co-simulation, and critical occlusion-aware 3D annotations. Accompanying the dataset is a unified benchmarking framework for cooperative detection and tracking, with protocols to evaluate communication efficiency, altitude adaptability, and robustness to communication latency, data loss and localization noise. By experiments through different cooperative paradigms, we demonstrate the effectiveness and limitations of current methods and provide crucial insights for future research. The dataset and codes are available at https://github.com/wang-jh18-SVM/Griffin.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Desk Editor's Note (private letter to a colleague)

Griffin supplies the first public AGC dataset, built in CARLA-AirSim with altitude and weather variation plus communication benchmarks, but its results rest on unvalidated simulation fidelity.


Griffin is the first public dataset aimed at aerial-ground cooperative 3D perception. It provides 250+ dynamic scenes (37k+ frames), drone altitudes from 20 to 60 m, weather variation, occlusion-aware 3D labels, and a benchmark that tests detection and tracking under latency, data loss, and localization noise. The release includes code and explicit evaluation protocols, which directly addresses the gap the abstract identifies: the lack of data for a cheaper alternative to V2V/V2I setups. That part is straightforward and useful for anyone who wants a standardized AGC testbed. The experiments run several cooperative paradigms and flag where current methods fall short on altitude adaptability and communication robustness.

The soft spot is simulation fidelity. Everything comes from CARLA-AirSim co-simulation, with no reported comparison against physical drone flights, real sensor noise, or real-world occlusion patterns. Without that check, the claims about effectiveness, limitations, and robustness hold only inside the simulator. This is a common limitation of simulation-only dataset papers, but it caps how far the benchmark insights can be taken until someone performs the cross-validation.

The work is for researchers building or testing cooperative perception methods who need an AGC starting point with public data and clear evaluation settings. It is coherent on its own terms and engages honestly with the stated problem. I would send it to peer review rather than desk-reject it; the data release and protocols are substantive enough to warrant referee input, even if the sim-to-real discussion needs revision.

Referee Report

1 major / 2 minor

Summary. The paper introduces Griffin, a dataset and benchmark for aerial-ground cooperative (AGC) 3D detection and tracking. It comprises over 250 dynamic scenes (37k+ frames) generated via CARLA-AirSim co-simulation, with varied drone altitudes (20-60 m), weather conditions, realistic drone dynamics, and occlusion-aware 3D annotations. A unified benchmarking framework provides protocols to assess cooperative detection and tracking on communication efficiency, altitude adaptability, and robustness to communication latency, data loss, and localization noise. Experiments across cooperative paradigms illustrate the effectiveness and limitations of existing methods.

Significance. As a public data release with accompanying code and explicit evaluation protocols, Griffin fills a noted gap in resources for the emerging AGC subfield. The scale, diversity of conditions, and standardized benchmark protocols could enable reproducible comparisons and targeted progress on cooperative perception, provided the simulated data characteristics prove representative. The work ships both the dataset and the code, supporting external use and extension.

major comments (1)

[Abstract] The claim that the experiments 'demonstrate the effectiveness and limitations of current methods and provide crucial insights for future research' is load-bearing for the paper's positioning of Griffin as a transferable benchmark resource. This claim rests on the unvalidated assumption that CARLA-AirSim co-simulation faithfully reproduces real-world aerial-ground sensor characteristics, drone dynamics at 20-60 m, weather effects, and occlusion patterns; no cross-validation against physical sensors, real drone flights, or real-world ground truth is reported.

minor comments (2)

[Abstract] The abstract references 'occlusion-aware 3D annotations' without detailing the annotation pipeline, quality assurance, or inter-annotator agreement; this should be expanded in the methods section for reproducibility.
Consider adding an explicit limitations subsection that directly addresses the sim-to-real gap and any known discrepancies in sensor modeling or dynamics.
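The referee asks how the occlusion-aware labels are produced, and this review does not describe the paper's exact pipeline. One common heuristic in LiDAR datasets is to treat an object as visible to a sensor only if enough of that sensor's points fall inside its 3D box. A minimal sketch of that heuristic, assuming boxes are given as (cx, cy, cz, length, width, height, yaw) with the center at the box's geometric center and yaw measured about the z axis; the min_points threshold is a hypothetical value.

```python
import math
from typing import Iterable, Tuple

Point = Tuple[float, float, float]
Box3D = Tuple[float, float, float, float, float, float, float]  # cx, cy, cz, length, width, height, yaw


def points_in_box(points: Iterable[Point], box: Box3D) -> int:
    """Count points inside an oriented 3D box whose yaw rotates it about the z axis."""
    cx, cy, cz, length, width, height, yaw = box
    c, s = math.cos(yaw), math.sin(yaw)
    count = 0
    for x, y, z in points:
        dx, dy, dz = x - cx, y - cy, z - cz
        # Express the offset in the box frame by applying the inverse yaw rotation.
        local_x = c * dx + s * dy
        local_y = -s * dx + c * dy
        if abs(local_x) <= length / 2 and abs(local_y) <= width / 2 and abs(dz) <= height / 2:
            count += 1
    return count


def is_visible(points: Iterable[Point], box: Box3D, min_points: int = 5) -> bool:
    """Heuristic visibility flag: the sensor sees the object if enough of its points hit the box."""
    return points_in_box(points, box) >= min_points


if __name__ == "__main__":
    box = (10.0, 0.0, 0.8, 4.5, 1.8, 1.6, 0.0)  # hypothetical car-sized box
    scan = [(10.0, 0.2, 0.5), (11.5, -0.5, 1.0), (20.0, 0.0, 0.0)]
    print(points_in_box(scan, box))              # 2
    print(is_visible(scan, box, min_points=5))   # False
```

In a cooperative setting the same test can be run separately on the drone's and the vehicle's point clouds, so each label records which agent actually observes the object.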

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the positive assessment of Griffin as a public dataset and benchmark, and for the constructive comment on the abstract. We address the point below.

Point-by-point responses

Referee: [Abstract] The claim that the experiments 'demonstrate the effectiveness and limitations of current methods and provide crucial insights for future research' is load-bearing for the paper's positioning of Griffin as a transferable benchmark resource. This claim rests on the unvalidated assumption that CARLA-AirSim co-simulation faithfully reproduces real-world aerial-ground sensor characteristics, drone dynamics at 20-60 m, weather effects, and occlusion patterns; no cross-validation against physical sensors, real drone flights, or real-world ground truth is reported.

Authors: We agree that the manuscript provides no cross-validation of the CARLA-AirSim simulation against real-world sensors or flights. The dataset and experiments are entirely simulated, and the claim in the abstract should not imply direct real-world transferability. We will revise the abstract to read: 'By experiments through different cooperative paradigms, we demonstrate the effectiveness and limitations of current methods within the simulated environment and provide insights for future research.' This change qualifies the scope without altering the paper's core contribution of releasing the dataset, benchmark protocols, and code. revision: yes

Circularity Check

0 steps flagged

No circularity: dataset release with no derivations or self-referential predictions

Full rationale

The paper is a data release and benchmark presentation. It contains no equations, fitted parameters, predictions, or derivation chains that could reduce to their inputs by construction. Its central claims concern the creation of the Griffin dataset (250+ scenes, CARLA-AirSim co-simulation, annotations) and the associated evaluation protocols; these are not derived from prior results via self-citation or ansatz. The simulation-fidelity assumption is an external modeling choice, not a load-bearing mathematical step. There is no load-bearing self-citation, no appeal to uniqueness theorems, and no renaming of known results. This is the expected non-finding for a dataset paper whose value is measured by external adoption rather than by the internal consistency of a derivation.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the representativeness of the CARLA-AirSim simulation for real AGC scenarios and on the accuracy of the occlusion-aware 3D annotations; no free parameters or invented entities are introduced.

axioms (1)

Domain assumption: CARLA-AirSim co-simulation produces sufficiently realistic drone dynamics, sensor readings, and environmental effects for benchmark purposes.
Whether the dataset and all reported experimental findings transfer to physical systems depends on this assumption holding.



Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Paper passage: "We present Griffin, a comprehensive AGC 3D perception dataset featuring over 250 dynamic scenes (37k+ frames) with varied drone altitudes, weather, CARLA-AirSim dynamics, occlusion-aware 3D annotations..."

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "Table 3: Model performance, communication cost, and computational efficiency... Early Fusion... V2X-ViT... CoopTrack..."

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.