Benchmarking Cross-Domain Audio-Visual Deception Detection

Adams Wai-Kin Kong; Alex C. Kot; Bingquan Shen; Nithish Muthuchamy Selvaraj; Xiaobao Guo; Zitong Yu

arxiv: 2405.06995 · v4 · pith:7SWAJTPQnew · submitted 2024-05-11 · 💻 cs.SD · cs.CV · cs.MM · eess.AS

Xiaobao Guo, Zitong Yu, Nithish Muthuchamy Selvaraj, Bingquan Shen, Adams Wai-Kin Kong, Alex C. Kot

classification 💻 cs.SD · cs.CV · cs.MM · eess.AS

keywords deception · detection · audio-visual · cross-domain · domain · generalization · performance · audio

Automated deception detection is crucial for assisting humans in accurately assessing truthfulness and identifying deceptive behavior. Conventional contact-based techniques, such as polygraph devices, rely on physiological signals to determine the authenticity of an individual's statements. Recent developments in automated deception detection, however, have demonstrated that multimodal features derived from both audio and video modalities may outperform human observers on publicly available datasets. Despite these positive findings, the generalizability of existing audio-visual deception detection approaches across different scenarios remains largely unexplored. To close this gap, we present the first cross-domain audio-visual deception detection benchmark, which enables us to assess how well these methods generalize to real-world scenarios. We use widely adopted audio and visual features and different architectures for benchmarking, comparing single-to-single and multi-to-single domain generalization performance. To further explore the impact of using data from multiple source domains for training, we investigate three types of domain sampling strategies for multi-to-single domain generalization evaluation: domain-simultaneous, domain-alternating, and domain-by-domain. We also propose an algorithm, named "MM-IDGM", that enhances generalization performance by maximizing the gradient inner products between modality encoders. Furthermore, we propose the Attention-Mixer fusion method to improve performance. We believe that this new cross-domain benchmark will facilitate future research in audio-visual deception detection.
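The abstract names three domain sampling strategies but does not define them here. The sketch below shows one plausible reading of each name and is not the authors' implementation. Every identifier in it (`domain_by_domain`, `domain_alternating`, `domain_simultaneous`, the `Sample` tuple, the batch-size parameters) is hypothetical and introduced only for illustration; the paper itself should be consulted for the actual definitions.

```python
from itertools import zip_longest
from typing import Dict, Iterator, List, Sequence, Tuple

# Hypothetical stand-in for one audio-visual clip: (domain name, sample index).
Sample = Tuple[str, int]


def _chunks(items: Sequence[Sample], size: int) -> List[List[Sample]]:
    """Split one domain's samples into consecutive batches of at most `size`."""
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


def domain_by_domain(domains: Dict[str, List[Sample]],
                     batch_size: int) -> Iterator[List[Sample]]:
    """Assumed reading: exhaust all batches of one source domain, then the next."""
    for samples in domains.values():
        yield from _chunks(samples, batch_size)


def domain_alternating(domains: Dict[str, List[Sample]],
                       batch_size: int) -> Iterator[List[Sample]]:
    """Assumed reading: each batch is single-domain; batches rotate over domains."""
    per_domain = [_chunks(s, batch_size) for s in domains.values()]
    for round_ in zip_longest(*per_domain):
        for batch in round_:
            if batch is not None:  # a shorter domain has run out of batches
                yield batch


def domain_simultaneous(domains: Dict[str, List[Sample]],
                        per_domain_size: int) -> Iterator[List[Sample]]:
    """Assumed reading: every batch mixes samples drawn from all source domains."""
    per_domain = [_chunks(s, per_domain_size) for s in domains.values()]
    for round_ in zip_longest(*per_domain, fillvalue=[]):
        yield [sample for chunk in round_ for sample in chunk]
```

Under this reading, the three generators visit the same samples and differ only in how batches are composed and ordered, which is the variable a multi-to-single evaluation would isolate. A real pipeline would also shuffle within each domain and would likely rebalance domains of unequal size; both are omitted here.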

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

DecepGPT: Schema-Driven Deception Detection with Multicultural Datasets and Robust Multimodal Learning
cs.CV · 2026-03 · unverdicted · novelty 7.0

A new 1695-sample multicultural dataset plus two modules for stable multimodal fusion and modality consistency yield state-of-the-art deception detection with cross-cultural transfer.