Recognition: 2 theorem links
MuDD: A Multimodal Deception Detection Dataset and GSR-Guided Progressive Distillation for Non-Contact Deception Detection
Pith reviewed 2026-05-14 23:51 UTC · model grok-4.3
The pith
GSR-guided progressive distillation transfers stable cues from skin response to video and audio for non-contact deception detection.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that GPD's integration of progressive feature-level and digit-level distillation with dynamic routing transfers deception-related knowledge from GSR to non-contact modalities despite the large modality mismatch, yielding state-of-the-art performance on deception detection and concealed-digit identification on the MuDD dataset.
What carries the argument
GSR-guided Progressive Distillation (GPD): feature-level and digit-level distillation combined with dynamic routing, which adaptively transfers teacher knowledge from GSR signals and mitigates negative transfer in visual and auditory representation learning.
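The routing-weighted mix of feature-level and logit-level distillation described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, tensor shapes, two-way routing, and temperature value are all assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / np.sum(e, axis=axis, keepdims=True)

def gpd_style_loss(student_feat, teacher_feat,
                   student_logits, teacher_logits,
                   route_logits, temperature=4.0):
    """Mix a feature-level and a logit-level distillation term with
    learned routing weights (all names and shapes are assumptions)."""
    w = softmax(route_logits)  # routing weights over the two transfer paths
    # Feature-level term: match teacher (GSR) and student features.
    feat_loss = np.mean((student_feat - teacher_feat) ** 2)
    # Logit-level term: KL between temperature-softened distributions.
    p_t = softmax(teacher_logits / temperature)
    p_s = softmax(student_logits / temperature)
    kl = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=-1)
    logit_loss = np.mean(kl) * temperature ** 2
    return w[0] * feat_loss + w[1] * logit_loss
```

In this sketch the router's job is only to reweight the two distillation paths; the paper's "progressive" aspect (adjusting the mix over training) would correspond to updating `route_logits` as training proceeds.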
If this is right
- GPD achieves state-of-the-art performance on deception detection using only non-contact signals.
- The method also leads to superior results on concealed-digit identification.
- Progressive distillation with dynamic routing reduces the impact of cross-modal mismatch.
- The MuDD dataset enables further studies on multimodal deception including physiological and trait data.
- Non-contact detection becomes viable by borrowing stable cues from contact-based GSR.
Where Pith is reading between the lines
- The dynamic routing mechanism may prove useful in other cross-modal distillation scenarios with mismatched data sources.
- Future work could test GPD on live video streams for real-time deception screening applications.
- Combining this with personality trait analysis from the dataset might improve detection by accounting for individual differences.
- Similar distillation strategies could apply to transferring knowledge from other reliable sensors like EEG to visual domains in affective computing.
Load-bearing premise
The assumption that deception-related knowledge encoded in GSR remains stable and transferable to visual and auditory signals without being overwhelmed by negative transfer from modality differences.
What would settle it
If a baseline model using only visual and audio data from MuDD achieves equal or higher accuracy on deception detection and concealed-digit tasks compared to the GPD model, this would indicate that the GSR guidance does not provide the claimed benefit.
Original abstract
Non-contact automatic deception detection remains challenging because visual and auditory deception cues often lack stable cross-subject patterns. In contrast, galvanic skin response (GSR) provides more reliable physiological cues and has been widely used in contact-based deception detection. In this work, we leverage stable deception-related knowledge in GSR to guide representation learning in non-contact modalities through cross-modal knowledge distillation. A key obstacle, however, is the lack of a suitable dataset for this setting. To address this, we introduce MuDD, a large-scale Multimodal Deception Detection dataset containing recordings from 130 participants over 690 minutes. In addition to video, audio, and GSR, MuDD also provides Photoplethysmography, heart rate, and personality traits, supporting broader scientific studies of deception. Based on this dataset, we propose GSR-guided Progressive Distillation (GPD), a cross-modal distillation framework for mitigating the negative transfer caused by the large modality mismatch between GSR and non-contact signals. The core innovation of GPD is the integration of progressive feature-level and digit-level distillation with dynamic routing, which allows the model to adaptively determine how teacher knowledge should be transferred during training, leading to more stable cross-modal knowledge transfer. Extensive experiments and visualizations show that GPD outperforms existing methods and achieves state-of-the-art performance on both deception detection and concealed-digit identification.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces MuDD, a multimodal deception detection dataset with video, audio, GSR, PPG, heart rate, and personality trait recordings from 130 participants over 690 minutes. It proposes GSR-guided Progressive Distillation (GPD), a cross-modal framework that transfers knowledge from GSR to visual and auditory modalities via progressive feature-level and digit-level distillation combined with dynamic routing to adaptively mitigate negative transfer from modality mismatch. The authors claim that GPD achieves state-of-the-art performance on both deception detection and concealed-digit identification tasks, supported by extensive experiments and visualizations.
Significance. If the central claims hold after verification, the work would deliver a valuable large-scale benchmark dataset that enables systematic study of cross-modal transfer from contact-based physiological signals to non-contact modalities, with additional modalities supporting broader deception research. The GPD approach, by incorporating adaptive routing, offers a concrete mechanism for handling modality gaps that could generalize to other multimodal settings. Dataset scale and the explicit focus on isolating transfer effects represent clear strengths.
major comments (2)
- §5 (Experimental Results): The claim that GPD specifically drives the reported SOTA gains requires explicit ablation isolating the progressive feature-level + digit-level distillation and dynamic routing from the effects of the new MuDD dataset size and any architecture changes. Without these controls, the causal attribution to the proposed mechanism remains unverified, particularly given the large cross-modal mismatch highlighted in the abstract.
- §4.2 (GPD Framework): The dynamic routing is presented as adaptively determining transfer schedules, but the manuscript does not detail the optimization of routing parameters, their initialization, or empirical checks that the router avoids collapse or negative transfer; this is load-bearing for the stability claim.
minor comments (2)
- Abstract and §3: The dataset description mentions 690 minutes but does not clarify the train/validation/test split ratios or participant-level independence, which affects reproducibility of the reported results.
- Figure captions: Several visualizations are referenced but lack explicit axis labels or statistical significance markers, reducing clarity when comparing against baselines.
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which help clarify the contributions of our proposed GPD framework. We will revise the manuscript to address both major points by adding the requested ablations and expanding the description of the dynamic routing mechanism.
Point-by-point responses
Referee: §5 (Experimental Results): The claim that GPD specifically drives the reported SOTA gains requires explicit ablation isolating the progressive feature-level + digit-level distillation and dynamic routing from the effects of the new MuDD dataset size and any architecture changes. Without these controls, the causal attribution to the proposed mechanism remains unverified, particularly given the large cross-modal mismatch highlighted in the abstract.
Authors: We agree that isolating the contributions of the progressive distillation components and dynamic routing is essential. In the revised manuscript, we will add a dedicated ablation study in Section 5 that trains all variants (full GPD, GPD without feature-level distillation, GPD without digit-level distillation, and GPD without dynamic routing) on the identical MuDD dataset using the same base architecture. This will directly attribute performance differences to the proposed mechanisms rather than dataset scale or architectural differences.
Revision: yes
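The four-variant ablation the authors commit to could be organized as a simple configuration grid. The flag names below are hypothetical, chosen only to make the isolation logic explicit: every variant shares the same dataset and base architecture, so any performance gap is attributable to the toggled component.

```python
# Hypothetical ablation grid (flag names are illustrative, not the paper's).
ablation_variants = {
    "full_gpd":         {"feature_distill": True,  "digit_distill": True,  "dynamic_routing": True},
    "no_feature_level": {"feature_distill": False, "digit_distill": True,  "dynamic_routing": True},
    "no_digit_level":   {"feature_distill": True,  "digit_distill": False, "dynamic_routing": True},
    "no_routing":       {"feature_distill": True,  "digit_distill": True,  "dynamic_routing": False},
}

def components_removed(name):
    # Each ablation differs from the full model in exactly one flag,
    # which is what lets performance gaps be attributed causally.
    full = ablation_variants["full_gpd"]
    return [k for k, v in ablation_variants[name].items() if v != full[k]]
```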
Referee: §4.2 (GPD Framework): The dynamic routing is presented as adaptively determining transfer schedules, but the manuscript does not detail the optimization of routing parameters, their initialization, or empirical checks that the router avoids collapse or negative transfer; this is load-bearing for the stability claim.
Authors: We will expand Section 4.2 with the missing details. The routing parameters are optimized jointly via backpropagation using an auxiliary routing loss that encourages balanced modality selection; they are initialized from a uniform distribution followed by softmax normalization. We will also include new empirical analysis (e.g., routing weight trajectories over training epochs and comparisons with/without the routing loss) to demonstrate that the router does not collapse and mitigates negative transfer.
Revision: yes
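The initialization and anti-collapse check described in this response might look like the sketch below. The entropy-based balance penalty is one common choice for such an auxiliary loss; it is an assumption here, not a detail confirmed by the manuscript, and the initialization scale is likewise illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
# Near-uniform initialization of the routing parameters, later
# softmax-normalized as described (the 0.01 scale is an assumption).
route_logits = rng.uniform(-0.01, 0.01, size=2)

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / np.sum(e)

def routing_entropy(logits):
    # High entropy means both distillation paths stay in use.
    w = softmax(logits)
    return -np.sum(w * np.log(w + 1e-12))

def balance_penalty(logits):
    # Auxiliary anti-collapse term: zero when routing is uniform,
    # approaching log(n) as the router collapses onto one path.
    return np.log(len(logits)) - routing_entropy(logits)
```

Tracking `routing_entropy` over training epochs is one concrete way to produce the "routing weight trajectory" evidence the authors promise.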
Circularity Check
No circularity: empirical claims rest on new dataset and independent experiments
full rationale
The paper introduces the MuDD dataset (130 participants, 690 minutes of multimodal recordings) and the GPD framework (progressive feature-level plus digit-level distillation with dynamic routing). Its core claims are that GPD mitigates cross-modal negative transfer and reaches SOTA on deception detection and concealed-digit identification. These rest on empirical comparisons and visualizations rather than any derivation that reduces by construction to fitted parameters, self-citations, or renamed inputs. No equations, uniqueness theorems, or load-bearing self-citations appear in the provided text that would collapse the reported gains into tautological re-statements of the training data or prior author work.
Axiom & Free-Parameter Ledger
free parameters (1)
- distillation routing parameters
axioms (2)
- domain assumption: GSR signals contain stable deception-related physiological information that generalizes across subjects
- standard math: standard cross-entropy and distillation loss functions are appropriate for the task
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel (unclear)
Relation between the paper passage and the cited Recognition theorem is unclear. Cited passage: "GPD uses gap-aware dynamic routing to select suitable distillation configurations based on the evolving representational gap between teacher and student, and progressively adjusts the relative importance of feature-level and logit-level knowledge during training."
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction (unclear)
Relation between the paper passage and the cited Recognition theorem is unclear. Cited passage: "MuDD ... 130 participants ... GKT paradigm ... V+A+GSR+PPG+HR+Pers."
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] Triantafyllos Afouras, Joon Son Chung, and Andrew Zisserman. 2020. ASR is all you need: Cross-modal distillation for lip reading. In ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2143–2147.
- [2] Muhammad Haseeb Aslam, Muhammad Osama Zeeshan, Soufiane Belharbi, Marco Pedersoli, Alessandro Lameiras Koerich, Simon Bacon, and Eric Granger. 2024. Distilling privileged multimodal information for expression recognition using optimal transport. In 2024 IEEE 18th International Conference on Automatic Face and Gesture Recognition (FG). IEEE, 1–10.
- [4] Gershon Ben-Shakhar and Eitan Elaad. 2003. The validity of psychophysiological detection of information with the Guilty Knowledge Test: A meta-analytic review. Journal of Applied Psychology 88, 1 (2003), 131.
- [5] Charles F. Bond Jr. and Bella M. DePaulo. 2006. Accuracy of deception judgments. Personality and Social Psychology Review 10, 3 (2006), 214–234.
- [6] Cong Cai, Shan Liang, Xuefei Liu, Kang Zhu, Zhengqi Wen, Jianhua Tao, Heng Xie, Jizhou Cui, Yiming Ma, Zhenhua Cheng, Hanzhe Xu, Ruibo Fu, Bin Liu, and Yongwei Li. 2025. MDPE: A Multimodal Deception Dataset with Personality and Emotional Characteristics. In Proceedings of the 33rd ACM International Conference on Multimedia (Dublin, Ireland) (MM '25). …
- [7] Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, et al. 2022. WavLM: Large-scale self-supervised pre-training for full stack speech processing. IEEE Journal of Selected Topics in Signal Processing 16, 6 (2022), 1505–1518.
- [8] Ziqiang Cheng, Yang Yang, Shuo Jiang, Wenjie Hu, Zhangchi Ying, Ziwei Chai, and Chunping Wang. 2021. Time2Graph+: Bridging time series and graph representation learning via multiple attentions. IEEE Transactions on Knowledge and Data Engineering 35, 2 (2021), 2078–2090.
- [9] Bella M. DePaulo, Deborah A. Kashy, Susan E. Kirkendol, Melissa M. Wyer, and Jennifer A. Epstein. 1996. Lying in everyday life. Journal of Personality and Social Psychology 70, 5 (1996), 979.
- [10] Mingyu Ding, An Zhao, Zhiwu Lu, Tao Xiang, and Ji-Rong Wen. 2019. Face-focused cross-stream network for deception detection in videos. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 7802–7811.
- [11] Laslo Dinges, Marc-André Fiedler, Ayoub Al-Hamadi, Thorsten Hempel, Ahmed Abdelrahman, Joachim Weimann, Dmitri Bershadskyy, and Johann Steiner. 2024. Exploring facial cues: automated deception detection using artificial intelligence. Neural Computing and Applications 36, 24 (2024), 14857–14883.
- [12] Don C. Fowles, Margaret J. Christie, Robert Edelberg, William W. Grings, David T. Lykken, and Peter H. Venables. 1981. Publication recommendations for electrodermal measurements. Psychophysiology 18, 3 (1981), 232–239.
- [13] Jianping Gou, Baosheng Yu, Stephen J. Maybank, and Dacheng Tao. 2021. Knowledge distillation: A survey. International Journal of Computer Vision 129, 6 (2021), 1789–1819.
- [14] Xiaobao Guo, Nithish Muthuchamy Selvaraj, Zitong Yu, Adams Wai-Kin Kong, Bingquan Shen, and Alex Kot. 2023. Audio-visual deception detection: Dolos dataset and parameter-efficient crossmodal learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 22135–22145.
- [15] Saurabh Gupta, Judy Hoffman, and Jitendra Malik. 2016. Cross modal distillation for supervision transfer. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2827–2836.
- [16] Viresh Gupta, Mohit Agarwal, Manik Arora, Tanmoy Chakraborty, Richa Singh, and Mayank Vatsa. 2019. Bag-of-lies: A multimodal dataset for deception detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops.
- [17] Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. 2015. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531 (2015).
- [18] Julia Hirschberg, Stefan Benus, Jason M. Brenier, Frank Enos, Sarah Friedman, Sarah Gilman, Cynthia Girand, Martin Graciarena, Andreas Kathol, Laura Michaelis, et al. 2005. Distinguishing deceptive from non-deceptive speech. In Proc. Interspeech 2005. 1833–1836.
- [19] Fushuo Huo, Wenchao Xu, Jingcai Guo, Haozhao Wang, and Song Guo. 2024. C2KD: Bridging the modality gap for cross-modal knowledge distillation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 16006–16015.
- [20] Gargi Joshi, Vaibhav Tasgaonkar, Aditya Deshpande, Aditya Desai, Bhavya Shah, Akshay Kushawaha, Aadith Sukumar, Kermi Kotecha, Saumit Kunder, Yoginii Waykole, et al. 2025. Multimodal machine learning for deception detection using behavioral and physiological data. Scientific Reports 15, 1 (2025), 8943.
- [21] F. K. Lahri and A. K. Ganguly. 1978. An experimental study of the accuracy of polygraph technique in diagnosis of deception with volunteer and criminal subjects. Polygraph 7 (1978), 89–94.
- [22] Sarah I. Levitan, Guzhen An, Mandi Wang, Gideon Mendels, Julia Hirschberg, Michelle Levine, and Andrew Rosenberg. 2015. Cross-cultural production and detection of deception from speech. In Proceedings of the 2015 ACM Workshop on Multimodal Deception Detection. 1–8.
- [23] Sarah Ita Levitan, Angel Maredia, and Julia Hirschberg. 2018. Linguistic cues to deception and perceived deception in interview dialogues. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). 1941–1950.
- [24] Hui Li, Pengfei Yang, Juanyang Chen, Le Dong, Yanxin Chen, and Quan Wang. MST-Distill: Mixture of specialized teachers for cross-modal knowledge distillation. In Proceedings of the 33rd ACM International Conference on Multimedia. 1588–1597.
- [26] Mingcheng Li, Dingkang Yang, Xiao Zhao, Shuaibing Wang, Yan Wang, Kun Yang, Mingyang Sun, Dongliang Kou, Ziyun Qian, and Lihua Zhang. 2024. Correlation-decoupled knowledge distillation for multimodal sentiment analysis with incomplete modalities. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 12458–12468.
- [27] Hangyu Lin, Chen Liu, Chengming Xu, Zhengqi Gao, Yanwei Fu, and Yuan Yao. …
- [29] Yanfeng Liu and Lefei Zhang. 2025. Multimodal decomposed distillation with instance alignment and uncertainty compensation for thermal object detection. In Proceedings of the 33rd ACM International Conference on Multimedia. 2294–2303.
- [30] E. Paige Lloyd, Jason C. Deska, Kurt Hugenberg, Allen R. McConnell, Brandon T. Humphrey, and Jonathan W. Kunstman. 2019. Miami University deception detection database. Behavior Research Methods 51, 1 (2019), 429–439.
- [31] David T. Lykken. 1959. The GSR in the detection of guilt. Journal of Applied Psychology 43, 6 (1959), 385.
- [32] Merylin Monaro, Pasquale Capuozzo, Federica Ragucci, Antonio Maffei, Antonietta Curci, Cristina Scarpazza, Alessandro Angrilli, and Giuseppe Sartori. 2020. Using blink rate to detect deception: A study to validate an automatic blink detector and a new dataset of videos from liars and truth-tellers. In International Conference on Human-Computer Interaction. …
- [33] Jan Ondras and Hatice Gunes. 2018. Detecting deception and suspicion in dyadic game interactions. In Proceedings of the 20th ACM International Conference on Multimodal Interaction. 200–209.
- [34] Wonpyo Park, Dongju Kim, Yan Lu, and Minsu Cho. 2019. Relational knowledge distillation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 3967–3976.
- [35] Baoyun Peng, Xiao Jin, Jiaheng Liu, Dongsheng Li, Yichao Wu, Yu Liu, Shunfeng Zhou, and Zhaoning Zhang. 2019. Correlation congruence for knowledge distillation. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 5007–5016.
- [36] Verónica Pérez-Rosas, Mohamed Abouelenien, Rada Mihalcea, and Mihai Burzo. 2015. Deception detection using real-life trial data. In Proceedings of the 2015 ACM International Conference on Multimodal Interaction. 59–66.
- [38] Pritam Sarkar and Ali Etemad. 2024. XKD: Cross-modal knowledge distillation with domain alignment for video representation learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 38. 14875–14885.
- [39] Felix Soldner, Verónica Pérez-Rosas, and Rada Mihalcea. 2019. Box of Lies: Multimodal Deception Detection in Dialogues. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Jill Burstein, Christy Doran, and Thamar Solorio (Eds.). …
- [40] Shangquan Sun, Wenqi Ren, Jingzhi Li, Rui Wang, and Xiaochun Cao. 2024. Logit standardization in knowledge distillation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 15731–15740.
- [41] Teng Sun, Yinwei Wei, Juntong Ni, Zixin Liu, Xuemeng Song, Yaowei Wang, and Liqiang Nie. 2024. Multi-modal emotion recognition via hierarchical knowledge distillation. IEEE Transactions on Multimedia 26 (2024), 9036–9046.
- [42] John Synnott, David Dietzel, and Maria Ioannou. 2015. A review of the polygraph: history, methodology and current status. Crime Psychology Review 1, 1 (2015), 59–. doi:10.1080/23744006.2015.1060080
- [44] Frederick Tung and Greg Mori. 2019. Similarity-preserving knowledge distillation. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 1365–1374.
- [45] Martina Vicianova. 2015. Historical techniques of lie detection. Europe's Journal of Psychology 11, 3 (2015), 522.
- [46] Hu Wang, Congbo Ma, Jianpeng Zhang, Yuan Zhang, Jodie Avery, Louise Hull, and Gustavo Carneiro. 2023. Learnable cross-modal knowledge distillation for multimodal learning with missing modality. In International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 216–226.
- [47] Lin Wang and Kuk-Jin Yoon. 2021. Knowledge distillation and student-teacher learning for visual intelligence: A review and new outlooks. IEEE Transactions on Pattern Analysis and Machine Intelligence 44, 6 (2021), 3048–3068.
- [48] Riling Wei, Kelu Yao, Chuanguang Yang, Jin Wang, Zhuoyan Gao, and Chao Li. …
- [50] Shuang Wu, Heng Liang, Yong Zhang, Yanlin Chen, and Ziyu Jia. 2025. A cross-modal densely guided knowledge distillation based on modality rebalancing strategy for enhanced unimodal emotion recognition. In Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence, IJCAI 2025, Montreal, Canada, August 16–22, 2025. 4236–4244.
- [51] Xiaolin Xu, Wenming Zheng, Hailun Lian, Sunan Li, Jiateng Liu, Anbang Liu, Cheng Lu, Yuan Zong, and Zongbao Liang. 2025. Multimodal lie detection dataset based on Chinese dialogue. Journal of Image and Graphics 30, 8 (2025), 2729–2742. doi:10.11834/jig.240571
- [52] Zihui Xue, Zhengqi Gao, Sucheng Ren, and Hang Zhao. 2023. The Modality Focusing Hypothesis: Towards Understanding Crossmodal Knowledge Distillation. In ICLR.
- [53] Su Zhang, Chuangao Tang, and Cuntai Guan. 2022. Visual-to-EEG cross-modal knowledge distillation for continuous emotion recognition. Pattern Recognition 130 (2022), 108833. doi:10.1016/j.patcog.2022.108833
- [55] Borui Zhao, Quan Cui, Renjie Song, Yiyu Qiu, and Jiajun Liang. 2022. Decoupled knowledge distillation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 11953–11962.
discussion (0)