Presents a three-step crowdsourcing framework for audio captioning datasets that reduces typographical errors and yields captions with average Jaccard similarity of 0.24.
It can be seen that the edited captions are less likely to contain any typographical errors than the initial captions.
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.SD (1)
Years: 2019 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
Crowdsourcing a Dataset of Audio Captions