Screening methods based on anchor ordering, rating span, traps, and gold standard questions improve the reliability of crowdsourced listening tests for speech codecs relative to conventional lab tests.
Screening Matters: A Comparative Study of Conventional and Crowdsourced Listening Tests
abstract
Subjective evaluation remains the most reliable way of testing speech and audio coding techniques. Crowdsourcing the listening task is a cost-efficient and fast way of conducting this evaluation, but the quality of the results tends to be inferior to that of conventional listening tests done in the controlled environment of a laboratory. In this paper, classical and neural speech codecs are evaluated to compare crowdsourced ITU-T P.808 tests against laboratory ITU-T P.800 degradation category rating (DCR) tests. A statistical analysis is conducted to investigate the effectiveness of selected screening methods. The analysis shows that the crowdsourced evaluation can be improved by employing post-screening methods based on anchor ordering and rating span, together with continuous screening methods such as traps and gold standard questions, thus giving more value to the ratings obtained for the codecs under test. Based on these outcomes, a set of suitable screenings is proposed for cost-effective, simplified, and bias-free enhancement of listening results.
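To make the two post-screening ideas named in the abstract concrete, the following is a minimal sketch and not the paper's actual procedure. It assumes a 5-point rating scale, a hypothetical per-listener session layout (`anchors` and `codecs` keys), a hypothetical minimum span threshold (`min_span`), and anchors listed from lowest to highest expected quality. The function names and thresholds are illustrative assumptions only; the abstract does not specify them.

```python
from statistics import mean


def passes_rating_span(ratings, min_span=2):
    """Return True if the listener used at least `min_span` points of the scale.

    `min_span` is an assumed threshold, not a value taken from the paper.
    """
    return max(ratings) - min(ratings) >= min_span


def passes_anchor_ordering(anchor_ratings, expected_order):
    """Return True if mean anchor ratings are non-decreasing along `expected_order`.

    `expected_order` lists anchor names from lowest to highest expected quality.
    """
    means = [mean(anchor_ratings[name]) for name in expected_order]
    return all(lower <= higher for lower, higher in zip(means, means[1:]))


def screen_listeners(sessions, expected_order, min_span=2):
    """Return the IDs of listeners who pass both post-screening checks."""
    kept = []
    for listener_id, session in sessions.items():
        # Pool anchor and codec ratings to measure how much of the scale was used.
        all_ratings = [r for rs in session["anchors"].values() for r in rs]
        all_ratings += session["codecs"]
        if passes_rating_span(all_ratings, min_span) and passes_anchor_ordering(
            session["anchors"], expected_order
        ):
            kept.append(listener_id)
    return kept


# Hypothetical example data: three listeners, three anchors, a few codec ratings.
example_order = ["low", "mid", "high"]
example_sessions = {
    "a": {"anchors": {"low": [1, 2], "mid": [3, 3], "high": [5, 4]}, "codecs": [2, 4, 3]},
    "b": {"anchors": {"low": [3, 3], "mid": [3, 3], "high": [3, 3]}, "codecs": [3, 3, 3]},
    "c": {"anchors": {"low": [5, 4], "mid": [3, 3], "high": [1, 2]}, "codecs": [2, 4]},
}

print(screen_listeners(example_sessions, example_order))
```

In this made-up data, listener "b" gives the same rating to everything and fails the span check, while listener "c" rates the anchors in reverse order and fails the ordering check. Continuous screening with traps and gold standard questions would happen during the test itself and is not covered by this sketch.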