Lightweight multimodal projector alignment transfers RGB VLMs to thermal drone imagery, achieving F1 scores of 0.915-0.968 for deer, rhino, and elephant recognition plus high enumeration accuracy and habitat context interpretation on a real drone dataset.
Ecosphere 10 (6), e02768
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CV (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
- Lightweight Multimodal Adaptation of Vision Language Models for Species Recognition and Habitat Context Interpretation in Drone Thermal Imagery