Introduces a multimodal UAV command dataset and shows image-augmented RNN language models outperform text-only versions despite imperfect training associations.
The dataset comprises three modalities: language (commands), audio (utterances), and vision (images).
Cited by 1 Pith paper (field: cs.SD; year: 2019; verdict: none yet):
Kite: Automatic speech recognition for unmanned aerial vehicles