Deep learning for image-to-text generation: A technical overview
1 Pith paper cites this work.
Fields: cs.SD (1)
Years: 2019 (1)
Verdicts: UNVERDICTED (1)

Representative citing papers
Crowdsourcing a Dataset of Audio Captions
Presents a three-step crowdsourcing framework for audio captioning datasets that reduces typographical errors and yields captions with average Jaccard similarity of 0.24.
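The summary above reports caption diversity as an average Jaccard similarity. The following is a minimal sketch of how such a figure can be computed. It assumes word-level sets built with lowercase whitespace tokenization and averages over all unordered pairs of captions. The tokenization, the pairing scheme, and the function names `jaccard` and `mean_pairwise_jaccard` are hypothetical illustrations, not the paper's evaluation code.

```python
from itertools import combinations
from typing import List


def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard similarity: |A ∩ B| / |A ∪ B|."""
    # Assumption: lowercase + whitespace split; punctuation is not stripped.
    set_a = set(a.lower().split())
    set_b = set(b.lower().split())
    union = set_a | set_b
    if not union:  # both captions empty: define similarity as 0.0
        return 0.0
    return len(set_a & set_b) / len(union)


def mean_pairwise_jaccard(captions: List[str]) -> float:
    """Average Jaccard similarity over all unordered pairs of captions."""
    pairs = list(combinations(captions, 2))
    if not pairs:  # fewer than two captions: no pairs to compare
        return 0.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Shared words {a, dog, barks} over union {a, dog, barks, loudly} -> 3/4
    print(jaccard("a dog barks loudly", "a dog barks"))  # 0.75
```

A low average (such as the 0.24 reported) means that captions share relatively few words, which is usually read as a sign of lexical diversity. Under this definition, captions that are identical as word sets score 1.0, and captions with no words in common score 0.0.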