An experiment shows humans detect fully synthetic speech below chance but exhibit implicit discrimination via quality ratings in a localization task with trust cue manipulations.
I Hear, Therefore I Trust: A Socio-Technical Investigation of Humans as Synthetic Speech Detectors
1 Pith paper cites this work. Polarity classification is still indexing.
abstract
Automatic deepfake detection has received considerable research attention, yet the socio-technical environment in which humans actually encounter synthetic speech remains poorly understood. We investigate voice deepfake detection as a perceptual and contextual process, presenting a localization task in which 47 participants marked suspected synthetic segments across authentic, fully synthetic, and partially synthetic utterances under three manipulated trust cues: instructional framing, affective priming, and provenance labeling. Participants provided quality ratings on mechanicalness, expressiveness, intelligibility, clarity, calmness, and confidence of evaluation. Utterance class was the primary determinant of detection accuracy and perceptual quality; trust cues produced no main effects but motivated detection behavior. Fully synthetic speech was detected at below-chance levels. Quality ratings tracked utterance type, indicating implicit discrimination where overt detection failed.
fields
eess.AS (1)
years
2026 (1)
verdicts
UNVERDICTED (1)
representative citing papers
I Hear, Therefore I Trust: A Socio-Technical Investigation of Humans as Synthetic Speech Detectors
An experiment shows humans detect fully synthetic speech below chance but exhibit implicit discrimination via quality ratings in a localization task with trust cue manipulations.