pith. machine review for the scientific record.

arxiv: 2604.05748 · v1 · submitted 2026-04-07 · 💻 cs.CV

Recognition: no theorem link

SVC 2026: the Second Multimodal Deception Detection Challenge and the First Domain Generalized Remote Physiological Measurement Challenge

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 18:39 UTC · model grok-4.3

classification 💻 cs.CV
keywords subtle visual signals · deception detection · remote photoplethysmography · domain generalization · multimodal learning · computer vision challenge · rPPG estimation

The pith

The Subtle Visual Challenge 2026 organizes two tasks to build robust models for detecting weak visual signals in deception and physiological data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper announces the Subtle Visual Challenge to address limitations in current computer vision models when dealing with hard-to-perceive visual signals that carry key information. Existing approaches often lack robustness and the ability to generalize across real-world conditions and domains. The challenge introduces cross-domain multimodal deception detection and domain-generalized remote photoplethysmography estimation as concrete tasks to drive progress. By releasing baselines and attracting 22 participating teams, the effort aims to foster better representations that support applications from security to health monitoring.

Core claim

The Subtle Visual Challenge is established with two tasks, cross-domain multimodal deception detection and domain-generalized remote photoplethysmography (rPPG) estimation, to encourage the learning of robust representations for subtle visual signals that are difficult to perceive directly but reveal important hidden patterns.

What carries the argument

The Subtle Visual Challenge platform, which defines cross-domain multimodal deception detection and domain-generalized rPPG estimation tasks to target robustness and generalization gaps in subtle signal handling.

If this is right

  • Models will improve in handling subtle signals across different domains and modalities.
  • Research in computer vision and multimodal learning will advance through shared benchmarks and baselines.
  • Applications in biometric security, medical diagnosis, and affective computing will benefit from more reliable detection of weak visual cues.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Success here could enable more reliable non-contact vital sign monitoring in varied lighting and movement conditions.
  • The tasks may connect to broader problems like low-signal feature extraction in noisy real-world video data.
  • Future extensions could test whether challenge-derived representations transfer to related subtle-signal domains such as micro-expression analysis.

Load-bearing premise

That setting up this specific challenge with the stated tasks will successfully encourage the development of more robust and generalizable models for subtle visual understanding.

What would settle it

If post-challenge models show no measurable gains in accuracy or generalization on out-of-domain tests for deception detection or rPPG estimation compared to pre-challenge baselines, the premise that the challenge drives progress would be undermined.

Figures

Figures reproduced from arXiv: 2604.05748 by Albert Clapés, Bo Zhao, Chunmei Zhu, Dan Guo, Dongliang Zhu, Hui Ma, Jiajian Huang, Jiayu Zhang, Junzhe Cao, Rencheng Song, Sergio Escalera, Shuo Ye, Taorui Wang, Xun Lin, Yingjie Ma, Zhiyi Niu, Zitong Yu.

Figure 1
Figure 1. Examples of deceptive actions. Excerpt: "…scenarios, significant domain shifts often lead to performance degradation. Differences in acquisition conditions, behavioral expressions, and modality distributions across datasets make cross-domain generalization a critical bottleneck for practical applications. On the other hand, rPPG aims to recover physiological signals from videos by capturing extremely subtle color chan…"
Figure 2
Figure 2. Sample examples from the DOLOS, Bag-of-Lies, and …
Figure 3
Figure 3. Baseline model. Excerpt: "…where s denotes the predicted score. Participants are required to submit a prediction file containing the sample identifier and the corresponding predicted score. An example format is shown below: SJ_BOL_EP3_lie_4 0.14431. (3.3 Baseline model) We provide a baseline model for cross-domain audiovisual deception detection. The model first extracts frame-level facial features using ResNet18. Be…"
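The submission format quoted in the Figure 3 excerpt is one sample identifier and one predicted score per line. A minimal sketch of a writer and parser for that format follows; only the line "SJ_BOL_EP3_lie_4 0.14431" comes from the paper, and every other identifier or detail (file name, five-decimal precision) is an illustrative assumption, not part of the official protocol.

```python
# Hypothetical writer/reader for the "<sample_id> <score>" prediction
# format shown in the paper. The five-decimal formatting matches the
# quoted example line but is otherwise an assumption.

def write_predictions(path, predictions):
    """Write (sample_id, score) pairs, one per line."""
    with open(path, "w") as f:
        for sample_id, score in predictions:
            f.write(f"{sample_id} {score:.5f}\n")

def read_predictions(path):
    """Parse a prediction file back into {sample_id: float score}."""
    scores = {}
    with open(path) as f:
        for line in f:
            sample_id, score = line.split()
            scores[sample_id] = float(score)
    return scores
```

Round-tripping a file through these two helpers preserves the id-to-score mapping, which is the property a scoring server would rely on.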
Figure 4
Figure 4. Network framework of Team xkxkxk. Excerpt: "…aware feature adaptation framework. The architecture processes visual information through two distinct branches: a behavior branch utilizing OpenFace Affect to extract structured facial cues from video inputs, and a spatiotemporal branch employing a ResNet18-GRU backbone to encode raw sequential facial inputs. To effectively isolate domain-invariant deceptive signals fro…"
Figure 6
Figure 6. Network framework of Team ahrior. Excerpt: "(4. The First Domain Generalized Remote Physiological Measurement Challenge; 4.1 Challenge Corpora) The competition is conducted on five distinct datasets to rigorously evaluate physiological measurement and cross-domain generalization: UBFC-rPPG [2], PURE [12], BUAA-MIHR [19], MMPD [13], and PhysDrive [15]. Due to strict dataset permission restrictions, the official organi…"
Figure 5
Figure 5. Network framework of Team sqd. Excerpt: "(3.4.3 Team ahrior) TDAF-Net first integrates visual, audio, and textual cues to uncover subtle forgery traces that are invisible in single-modality analysis. Then, it deploys a Temporal Difference module to capture inter-frame discrepancies, such as inconsistent motions or abrupt scene changes. Meanwhile, it also deploys a multimodal Bi-LSTM to encode sentiment, facial expre…"
Figure 8
Figure 8. Network framework of Team GDMU ZZU. Excerpt: "(5. Conclusion) In this paper, we present the SVC 2026 challenge, which focuses on modeling subtle visual signals through two tasks: multimodal deception detection and domain generalized rPPG estimation. By providing a unified evaluation framework, standardized datasets, and reproducible protocols, the challenge offers a systematic benchmark for studying robustness and …"
Figure 7
Figure 7. Network framework of Team RPM-HFUT. Excerpt: "(4.3.2 Team GDMU ZZU) In rPPG estimation, pulse-related signals are typically weak and easily affected by background noise and non-physiological interference, while effective representation learning requires jointly modeling long-range periodic dependencies and short-term temporal dynamics. To address these challenges, Team GDMU ZZU proposes a dual-branch collaborative …"
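The rPPG task described in the figure excerpts recovers a quasi-periodic pulse from subtle color changes in facial video. As a rough, self-contained illustration (not the challenge's evaluation code, which lives on the MMDD2026 platform), the dominant frequency of a pulse trace can be picked off a discrete Fourier transform and converted to beats per minute:

```python
import cmath
import math

def estimate_heart_rate(trace, fps, lo_bpm=40.0, hi_bpm=180.0):
    """Return the bpm of the strongest DFT bin inside a physiologically
    plausible band. O(n^2) pure-Python DFT for clarity; a real rPPG
    pipeline would detrend, bandpass filter, average over skin regions,
    and use an FFT."""
    n = len(trace)
    mean = sum(trace) / n
    centered = [x - mean for x in trace]  # remove DC component
    best_bpm, best_power = 0.0, -1.0
    for k in range(1, n // 2):
        bpm = 60.0 * k * fps / n  # frequency of bin k, in beats/min
        if not lo_bpm <= bpm <= hi_bpm:
            continue
        coeff = sum(centered[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
        power = abs(coeff) ** 2
        if power > best_power:
            best_bpm, best_power = bpm, power
    return best_bpm

# Synthetic pulse: a clean 1.2 Hz sine sampled at 30 fps for 10 s.
fps = 30
trace = [math.sin(2 * math.pi * 1.2 * t / fps) for t in range(fps * 10)]
print(estimate_heart_rate(trace, fps))  # → 72.0
```

On a noise-free sine the peak bin gives the rate exactly; the domain-generalization difficulty the challenge targets is precisely that real traces are weak, noisy, and distribution-shifted across datasets.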
read the original abstract

Subtle visual signals, although difficult to perceive with the naked eye, contain important information that can reveal hidden patterns in visual data. These signals play a key role in many applications, including biometric security, multimedia forensics, medical diagnosis, industrial inspection, and affective computing. With the rapid development of computer vision and representation learning techniques, detecting and interpreting such subtle signals has become an emerging research direction. However, existing studies often focus on specific tasks or modalities, and models still face challenges in robustness, representation ability, and generalization when handling subtle and weak signals in real-world environments. To promote research in this area, we organize the Subtle visual Challenge, which aims to learn robust representations for subtle visual signals. The challenge includes two tasks: cross-domain multimodal deception detection and remote photoplethysmography (rPPG) estimation. We hope that this challenge will encourage the development of more robust and generalizable models for subtle visual understanding, and further advance research in computer vision and multimodal learning. A total of 22 teams submitted their final results to this workshop competition, and the corresponding baseline models have been released on the \href{https://sites.google.com/view/svc-cvpr26}{MMDD2026 platform}\footnote{https://sites.google.com/view/svc-cvpr26}

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript announces the SVC 2026 challenge (Subtle Visual Challenge), which consists of two tasks: cross-domain multimodal deception detection and domain-generalized remote photoplethysmography (rPPG) estimation. It reports participation from 22 teams that submitted final results and states that baseline models have been released on the MMDD2026 platform, with the goal of encouraging robust representations for subtle visual signals in computer vision and multimodal learning.

Significance. Challenge reports of this type can help standardize benchmarks and stimulate community interest in under-explored areas such as subtle visual cue detection for deception and physiological measurement. The reported participation of 22 teams and the release of baselines provide a modest foundation for future work, though the manuscript itself contains no new methods, empirical results, or analysis.

major comments (1)
  1. Abstract: The manuscript states the challenge organization and participation count but supplies no task definitions, dataset descriptions, evaluation metrics, baseline implementations, or results analysis. These elements are load-bearing for the central claim that the challenge will promote research in robust subtle-visual representations, as the community cannot engage with or build upon the tasks without them.
minor comments (2)
  1. Abstract: The title expands SVC as the Second Multimodal Deception Detection Challenge while the body text refers to the 'Subtle visual Challenge'; a single consistent expansion of the acronym would remove ambiguity.
  2. Abstract: The href link and the accompanying footnote both contain the identical URL; removing the redundant footnote would improve presentation.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their review and for highlighting the need for greater detail in the abstract. We agree that the current version is high-level and will revise the manuscript to better support the central claims.

read point-by-point responses
  1. Referee: Abstract: The manuscript states the challenge organization and participation count but supplies no task definitions, dataset descriptions, evaluation metrics, baseline implementations, or results analysis. These elements are load-bearing for the central claim that the challenge will promote research in robust subtle-visual representations, as the community cannot engage with or build upon the tasks without them.

    Authors: We acknowledge that the abstract, as presented, is concise and does not enumerate task definitions, dataset details, evaluation metrics, or baseline results. The manuscript is structured as a brief challenge announcement whose primary purpose is to report organization and participation; full specifications, data access, metrics, and baseline code are provided on the MMDD2026 platform referenced in the text. Nevertheless, we agree that this separation reduces self-containment. In the revised version we will expand the abstract to include concise statements of the two tasks, key dataset characteristics, the evaluation protocols, and a summary of baseline performance, while retaining the link to the platform for complete implementations and results. revision: yes

Circularity Check

0 steps flagged

No significant circularity: descriptive competition announcement with no derivations or fitted claims

full rationale

The manuscript is a workshop competition report announcing two tasks (cross-domain multimodal deception detection and domain-generalized rPPG estimation), reporting 22 participating teams, and releasing baselines. It contains no equations, no technical derivations, no parameter fitting, no predictions of model performance, and no load-bearing claims that could reduce to self-definition or self-citation. The aspirational statement that the challenge will encourage robust models is promotional and not a falsifiable technical result within the paper. No patterns from the enumerated circularity kinds are present; the work is self-contained as an organizational document.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This document is a competition announcement paper. It introduces no mathematical models, empirical claims, or theoretical constructs that require free parameters, axioms, or invented entities.

pith-pipeline@v0.9.0 · 5594 in / 990 out tokens · 34223 ms · 2026-05-10T18:39:19.776864+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

25 extracted references · 4 canonical work pages

  1. [1]

    Bag-of-lies: A multimodal dataset for deception detection

  2. [2]

    Unsupervised skin tissue segmentation for remote photoplethysmography

    Serge Bobbia, Richard Macwan, Yannick Benezeth, Alamin Mansouri, and Julien Dubois. Unsupervised skin tissue segmentation for remote photoplethysmography. Pattern Recognition Letters, 124:82–90, 2019.

  3. [3]

    Audio-visual deception detection: Dolos dataset and parameter-efficient crossmodal learning

    Xiaobao Guo, Nithish Muthuchamy Selvaraj, Zitong Yu, Adams Wai-Kin Kong, Bingquan Shen, and Alex Kot. Audio-visual deception detection: Dolos dataset and parameter-efficient crossmodal learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 22135–22145, 2023.

  4. [4]

    Channel-wise interactive learning for remote heart rate estimation from facial video

    Qi Li, Dan Guo, Wei Qian, Xilan Tian, Xiao Sun, Haifeng Zhao, and Meng Wang. Channel-wise interactive learning for remote heart rate estimation from facial video. IEEE Transactions on Circuits and Systems for Video Technology, 34(6):4542–4555, 2023.

  5. [5]

    Svc 2025: the first multimodal deception detection challenge

    Xun Lin, Xiaobao Guo, Taorui Wang, Yingjie Ma, Jiajian Huang, Jiayu Zhang, Junzhe Cao, and Zitong Yu. Svc 2025: the first multimodal deception detection challenge. In Proceedings of the 1st International Workshop & Challenge on Subtle Visual Computing, pages 59–64, 2025.

  6. [6]

    Miami university deception detection database

    E. Paige Lloyd, Jason C. Deska, Kurt Hugenberg, Allen R. McConnell, Brandon T. Humphrey, and Jonathan W. Kunstman. Miami university deception detection database. Behavior Research Methods, 51(1):429–439, 2019.

  7. [7]

    Deception detection using real-life trial data

    Verónica Pérez-Rosas, Mohamed Abouelenien, Rada Mihalcea, and Mihai Burzo. Deception detection using real-life trial data. In Proceedings of the 2015 ACM International Conference on Multimodal Interaction, pages 59–66, 2015.

  8. [8]

    Dual-path tokenlearner for remote photoplethysmography-based physiological measurement with facial videos

    Wei Qian, Dan Guo, Kun Li, Xiaowei Zhang, Xilan Tian, Xun Yang, and Meng Wang. Dual-path tokenlearner for remote photoplethysmography-based physiological measurement with facial videos. IEEE Transactions on Computational Social Systems, 11(3):4465–4477, 2024.

  9. [9]

    Cluster-phys: Facial clues clustering towards efficient remote physiological measurement

    Wei Qian, Kun Li, Dan Guo, Bin Hu, and Meng Wang. Cluster-phys: Facial clues clustering towards efficient remote physiological measurement. In Proceedings of the 32nd ACM International Conference on Multimedia, pages 330–339, 2024.

  10. [10]

    Physdiff: physiology-based dynamicity disentangled diffusion model for remote physiological measurement

    Wei Qian, Gaoji Su, Dan Guo, Jinxing Zhou, Xiaobai Li, Bin Hu, Shengeng Tang, and Meng Wang. Physdiff: physiology-based dynamicity disentangled diffusion model for remote physiological measurement. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 6568–6576, 2025.

  11. [11]

    Box of lies: Multimodal deception detection in dialogues

    Felix Soldner, Verónica Pérez-Rosas, and Rada Mihalcea. Box of lies: Multimodal deception detection in dialogues. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 1768–1777, 2019.

  12. [12]

    Non-contact video-based pulse rate measurement on a mobile service robot

    Ronny Stricker, Steffen Müller, and Horst-Michael Gross. Non-contact video-based pulse rate measurement on a mobile service robot. In The 23rd IEEE International Symposium on Robot and Human Interactive Communication, pages 1056–1062. IEEE, 2014.

  13. [13]

    Mmpd: Multi-domain mobile video physiology dataset

    Jiankai Tang, Kequan Chen, Yuntao Wang, Yuanchun Shi, Shwetak Patel, Daniel McDuff, and Xin Liu. Mmpd: Multi-domain mobile video physiology dataset. In 2023 45th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), pages 1–5, 2023.

  14. [14]

    Physedigan: A privacy-preserving method for editing physiological signals in facial videos

    Xiaoguang Tu, Zhiyi Niu, Juhang Yin, Yanyan Zhang, Ming Yang, Lin Wei, Yu Wang, Zhaoxin Fan, and Jian Zhao. Physedigan: A privacy-preserving method for editing physiological signals in facial videos. Pattern Recognition, 169:111966, 2026.

  15. [15]

    Physdrive: A multimodal remote physiological measurement dataset for in-vehicle driver monitoring

    Jiyao Wang, Xiao Yang, Qingyong Hu, Jiankai Tang, Can Liu, Dengbo He, Yuntao Wang, Yingcong Chen, and Kaishun Wu. Physdrive: A multimodal remote physiological measurement dataset for in-vehicle driver monitoring. arXiv preprint arXiv:2507.19172, 2025.

  16. [16]

    Micro-gesture recognition: A comprehensive survey of datasets, methods, and challenges

    Taorui Wang, Xun Lin, Yong Xu, Qilang Ye, Dan Guo, Sergio Escalera, Ghada Khoriba, and Zitong Yu. Micro-gesture recognition: A comprehensive survey of datasets, methods, and challenges. 23(2):308–331, 2026.

  17. [17]

    Deception detection in videos

    Zhe Wu, Bharat Singh, Larry Davis, and V. Subrahmanian. Deception detection in videos. In Proceedings of the AAAI Conference on Artificial Intelligence, 2018.

  18. [18]

    Cardiacmamba: A multimodal rgb-rf fusion framework with state space models for remote physiological measurement

    Zheng Wu, Yiping Xie, Bo Zhao, Jiguang He, Fei Luo, Ning Deng, and Zitong Yu. Cardiacmamba: A multimodal rgb-rf fusion framework with state space models for remote physiological measurement. arXiv preprint arXiv:2502.13624.

  19. [19]

    Image enhancement for remote photoplethysmography in a low-light environment

    Lin Xi, Weihai Chen, Changchen Zhao, Xingming Wu, and Jianhua Wang. Image enhancement for remote photoplethysmography in a low-light environment. In 2020 15th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2020), pages 1–7, 2020.

  20. [20]

    Fusionmamba: Dynamic feature enhancement for multimodal image fusion with mamba

    Xinyu Xie, Yawen Cui, Tao Tan, Xubin Zheng, and Zitong Yu. Fusionmamba: Dynamic feature enhancement for multimodal image fusion with mamba. Visual Intelligence, 2(1):37, 2024.

  21. [21]

    Physllm: Harnessing large language models for cross-modal remote physiological sensing

    Yiping Xie, Bo Zhao, Mingtong Dai, Jian-Ping Zhou, Yue Sun, Tao Tan, Weicheng Xie, Linlin Shen, and Zitong Yu. Physllm: Harnessing large language models for cross-modal remote physiological sensing. arXiv preprint arXiv:2505.03621, 2025.

  22. [22]

    Multimodal deception detection: A survey

    Jiayu Zhang, Xun Lin, Jiajian Huang, Shuo Ye, Xiaobao Guo, Dongliang Zhu, Ruimin Hu, Dan Guo, Yanyan Liang, Zitong Yu, and Xiaochun Cao. Multimodal deception detection: A survey. Machine Intelligence Research, 23(2):284–307, 2026.

  23. [23]

    Phase-net: Physics-grounded harmonic attention system for efficient remote photoplethysmography measurement

    Bo Zhao, Dan Guo, Junzhe Cao, Yong Xu, Tao Tan, Yue Sun, Bochao Zou, Jie Zhang, and Zitong Yu. Phase-net: Physics-grounded harmonic attention system for efficient remote photoplethysmography measurement. arXiv preprint arXiv:2509.24850, 2025.

  24. [24]

    Cross-illumination video anomaly detection benchmark

    Dongliang Zhu, Ruimin Hu, Shengli Song, Xiang Guo, Xixi Li, and Zheng Wang. Cross-illumination video anomaly detection benchmark. In Proceedings of the 31st ACM International Conference on Multimedia, pages 2516–2525, 2023.

  25. [25]

    Detecting deceptive behavior via learning relation-aware visual representations

    Dongliang Zhu, Chi Zhang, Ruimin Hu, Mei Wang, Liang Liao, and Mang Ye. Detecting deceptive behavior via learning relation-aware visual representations. IEEE Transactions on Information Forensics and Security, 2025.