pith. machine review for the scientific record.

arxiv: 2605.09007 · v1 · submitted 2026-05-09 · 💻 cs.CY

Recognition: 2 theorem links · Lean Theorem

Detecting Deception, Not Deepfakes: Why Media Forensics Needs Social Theories

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 02:02 UTC · model grok-4.3

classification 💻 cs.CY
keywords deepfake detection · media forensics · deception detection · speech act theory · Grice cooperative principle · Cialdini influence · interactive deepfakes · communicative signals

The pith

Deepfake detection must analyze deceptive interactions using social theories rather than relying only on synthetic media artifacts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that artifact-based deepfake detectors will not solve the problem of interactive deepfakes, such as real-time impersonations in video or voice calls, because the harm comes from the act of deception rather than from detectable flaws in the media signal. As generators improve, the five assumptions underlying artifact detection erode, producing what the authors call the Generalization Illusion: lab accuracy that does not hold in the wild. To address this, the authors draw on Speech Act Theory, Grice's Cooperative Principle, and Cialdini's principles of influence to define forensic signals at the utterance, conversation, and listener-response levels. A sympathetic reader would care because this complementary layer targets the actual mechanism of harm in live settings where no prior reference media exists. The result is a unified framework that augments existing forensic tools.

Core claim

Current deepfake detection framed as media classification relies on five assumptions about signal traces that are eroding with better generators. For interactive deepfakes the relevant harm is the deceptive communicative act, not media realism. Detection therefore needs a complementary analytical layer drawn from Speech Act Theory at the utterance level, Grice's Cooperative Principle at the conversation level, and Cialdini's principles of influence at the listener-response level, producing a unified framework of forensic signals.

What carries the argument

A unified framework that applies Speech Act Theory, Grice's Cooperative Principle, and Cialdini's principles of influence to extract forensic signals at utterance, conversation, and listener-response levels.
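
A minimal late-fusion sketch of that three-level structure. The paper supplies no implementation, so every signal name, weight, and threshold below is invented purely for illustration:

```python
from dataclasses import dataclass

@dataclass
class InteractionSignals:
    """Per-level deception scores in [0, 1]; higher = more deception-like."""
    utterance: float     # speech-act inconsistencies (Speech Act Theory)
    conversation: float  # cooperative-norm violations (Grice)
    listener: float      # influence-pattern pressure (Cialdini)

def social_deception_score(s: InteractionSignals,
                           weights=(0.4, 0.35, 0.25)) -> float:
    """Weighted late fusion of the three levels; weights are placeholders."""
    w_u, w_c, w_l = weights
    return w_u * s.utterance + w_c * s.conversation + w_l * s.listener

def combined_verdict(artifact_score: float, s: InteractionSignals,
                     threshold: float = 0.5) -> bool:
    """Complement, not replace: either layer alone can raise the flag."""
    return max(artifact_score, social_deception_score(s)) >= threshold
```

Under this fusion, a call whose media signal looks clean (low artifact score) is still flagged when its social signals run high, which is exactly the behavior the framework asks for.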

If this is right

  • Detectors can flag violations of cooperative norms or speech-act inconsistencies even when the media signal appears realistic.
  • Analysis can extend to live two-way interactions where no clean reference clip is available.
  • Listener-response patterns can indicate whether a potential deception has succeeded or failed.
  • The approach complements rather than replaces existing low-level forensic methods.
  • It identifies open problems in operationalizing these signals for real-time systems.
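
How a cooperative-norm violation might actually be scored is left open by the paper. As one hedged sketch, a flouted Gricean maxim of Relation could be proxied by lexical overlap between a challenge question and its answer; a real detector would need semantic similarity and dialogue context, so treat this purely as an illustration:

```python
def relevance_violation(question: str, answer: str,
                        min_overlap: float = 0.1) -> bool:
    """Crude Gricean 'Relation' check: flag answers that share almost no
    content words with the question they respond to. Illustrative only."""
    stop = {"the", "a", "an", "is", "are", "was", "to", "of", "and", "do",
            "did", "what", "who", "we", "you", "i", "it", "this", "that"}
    q = {w.strip("?.,!").lower() for w in question.split()} - stop
    a = {w.strip("?.,!").lower() for w in answer.split()} - stop
    if not q:
        return False
    return len(q & a) / len(q) < min_overlap
```

An impersonator who deflects an identity-check question into an urgent payment demand trips the check; a responsive answer does not.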

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The framework could extend to detecting deception in text-based or avatar-mediated interactions beyond audio-video deepfakes.
  • New annotated datasets of full conversations would be required to train and evaluate the social-signal layer.
  • Policy on regulating synthetic media in calls might shift from content authenticity to documented intent or pattern of use.
  • Integration with real-time conversation monitoring tools could test whether social signals provide earlier warnings than post-hoc media checks.
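
The last extension, real-time monitoring, could be prototyped as a sliding window over per-turn fused scores. Nothing like this class exists in the paper; the window size and alarm threshold are arbitrary placeholders:

```python
from collections import deque

class TurnMonitor:
    """Early-warning monitor: averages social-signal scores over the most
    recent turns and fires mid-call rather than after post-hoc media checks."""
    def __init__(self, window: int = 5, alarm: float = 0.6):
        self.scores = deque(maxlen=window)
        self.alarm = alarm

    def observe(self, turn_score: float) -> bool:
        """Feed one turn's fused score in [0, 1]; True means raise an alarm."""
        self.scores.append(turn_score)
        return sum(self.scores) / len(self.scores) >= self.alarm
```

A call that starts innocuously and escalates triggers the alarm only once suspicious turns dominate the window, so a single odd turn does not fire it.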

Load-bearing premise

The three social frameworks can be translated into operational forensic signals at the utterance, conversation, and listener levels that meaningfully improve detection.
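
Whether that translation works is the open question. At the utterance level, even a toy illocutionary tagger shows the intended shape of such a signal; the cue patterns and risk weights below are invented, and the categories merely echo Searle's taxonomy:

```python
import re

# Toy cue patterns per illocutionary category; invented for illustration.
ACT_CUES = {
    "directive":  re.compile(r"\b(please|must|send|wire|transfer)\b", re.I),
    "commissive": re.compile(r"\b(i will|i promise|i'll)\b", re.I),
    "assertive":  re.compile(r"\b(is|are|was|were|confirmed)\b", re.I),
}

def tag_speech_acts(utterance: str) -> set:
    """Return every category whose cue pattern fires on the utterance."""
    return {act for act, pat in ACT_CUES.items() if pat.search(utterance)}

def utterance_risk(utterance: str) -> float:
    """Directives pressing for irreversible action score highest; the
    0.9 / 0.2 weighting is a placeholder, not a claim of the paper."""
    return 0.9 if "directive" in tag_speech_acts(utterance) else 0.2
```
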

What would settle it

A controlled test on interactive deepfake impersonation calls in which adding signals from the three social frameworks produces no measurable gain in detection accuracy over artifact-only baselines.
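
Operationally that experiment is an ablation: rank calls by artifact score alone, then by the fused score, and compare ranking quality. A stdlib-only sketch of the comparison; the max-fusion rule and any score inputs are placeholders, not results:

```python
def roc_auc(scores, labels):
    """Rank-based ROC-AUC (ties count half); labels are 1 = fake, 0 = real."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def ablation_gain(artifact, social, labels):
    """AUC gain of max-fused scores over the artifact-only baseline.
    The paper's claim survives only if this gain is measurably positive."""
    fused = [max(a, s) for a, s in zip(artifact, social)]
    return roc_auc(fused, labels) - roc_auc(artifact, labels)
```

If the artifact detector misses a fake that the social layer catches, the gain is positive; a zero gain on real interactive-deepfake calls would be the refuting outcome described above.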

Figures

Figures reproduced from arXiv: 2605.09007 by Jessee Ho, Shaina Raza, Shweta Khushu.

Figure 1. Global growth in deepfake-driven identity …
Figure 2. The interrogation analogy. Traditional detection focuses on surface-level media artifacts. An …
Figure 3. Five forensic premises underlying current deepfake detection. Each assumes that synthetic …
Figure 4. An incoming interaction is analyzed along two parallel paths: (1) …
Original abstract

For nearly a decade, deepfake detection has been framed as a classification task: given an audio or video clip, decide whether it is real or synthetic. Top detectors often report high accuracy on standard benchmarks; however, performance drops sharply on content from newer or unseen generators. We argue that better classifiers of synthetic media alone will not solve this problem, especially for interactive deepfakes such as impersonation in video and voice calls, where the harm lies not in the artifact (manipulated media signal) but in the act of deception. Deepfake detection therefore requires a complementary analytical layer focused on communicative interaction, not just media realism. We identify five assumptions that artifact-based detection (the forensic analysis of low-level signal traces) relies on and show that all five are eroding as generative models improve, producing what we call the Generalization Illusion. To address this, we draw on three well-established frameworks from philosophy of language and social psychology, namely, Speech Act Theory, Grice's Cooperative Principle, and Cialdini's principles of influence, to examine forensic signals at three levels: the utterance, the conversation, and the listener response. The result is a unified framework that complements existing forensic methods. We close with open problems for future work. https://jesseeho.github.io/deepfake-deception/

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper argues that deepfake detection framed as a binary classification task on media artifacts is insufficient, particularly for interactive impersonation scenarios where harm stems from deception rather than signal manipulation. It identifies five eroding assumptions underlying artifact-based forensics that produce a 'Generalization Illusion' as generative models advance. To address this, the authors propose a complementary analytical layer drawing on Speech Act Theory, Grice's Cooperative Principle, and Cialdini's principles of influence, applied at utterance, conversation, and listener-response levels to detect communicative deception signals.

Significance. If operationalized, the framework could meaningfully broaden media forensics beyond technical classifiers toward socio-technical analysis, potentially improving robustness for real-time interactive deepfakes. The explicit enumeration of the five assumptions provides a useful diagnostic tool for the field, and the integration of established external theories from philosophy and psychology is a clear strength that avoids ad-hoc invention.

major comments (2)
  1. [Abstract and unified framework section] The central claim that the three social frameworks will 'examine forensic signals at three levels' and thereby complement existing methods (Abstract; proposed unified framework) is load-bearing but unsupported: no operational definitions, scoring procedures, or extraction methods are supplied for any signal (e.g., how a flouted Gricean maxim is automatically detected from turn-taking logs, or how listener-response features are derived from video without new labeled data).
  2. [Abstract and closing section] The assertion that the proposed layer will mitigate the Generalization Illusion (Abstract; closing section on open problems) rests on an untested translation step. The manuscript provides no case studies, empirical validation, or even illustrative examples showing that utterance/conversation/listener signals improve detection accuracy or generalization.
minor comments (2)
  1. [Section identifying the five assumptions] The five assumptions are logically presented but would benefit from explicit cross-references to prior deepfake literature for each assumption to strengthen the diagnostic claim.
  2. [Framework introduction] Notation for the three analysis levels (utterance, conversation, listener) is introduced clearly but could be summarized in a table for quick reference.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive review and for recognizing the potential of integrating social theories into media forensics. We address each major comment below, clarifying the manuscript's scope as a conceptual framework while outlining targeted revisions to improve support for our claims.

point-by-point responses
  1. Referee: [Abstract and unified framework section] The central claim that the three social frameworks will 'examine forensic signals at three levels' and thereby complement existing methods (Abstract; proposed unified framework) is load-bearing but unsupported: no operational definitions, scoring procedures, or extraction methods are supplied for any signal (e.g., how a flouted Gricean maxim is automatically detected from turn-taking logs, or how listener-response features are derived from video without new labeled data).

    Authors: We agree that the manuscript presents a high-level theoretical framework rather than a fully operationalized detection pipeline. The three levels are structured directly from the cited theories (Speech Act Theory for utterances, Grice's Cooperative Principle for conversations, and Cialdini's principles for listener responses) to organize analysis of communicative deception. No automated scoring or extraction methods are claimed or provided, as the paper's aim is to diagnose limitations in artifact-based approaches and propose a complementary socio-technical direction. In revision, we will expand the unified framework section with concrete illustrative examples of potential signals at each level, drawn from existing pragmatics and deception-detection literature, to make the proposal more tangible without asserting current implementability. revision: partial

  2. Referee: [Abstract and closing section] The assertion that the proposed layer will mitigate the Generalization Illusion (Abstract; closing section on open problems) rests on an untested translation step. The manuscript provides no case studies, empirical validation, or even illustrative examples showing that utterance/conversation/listener signals improve detection accuracy or generalization.

    Authors: The manuscript is explicitly positioned as a theoretical argument that identifies five eroding assumptions and outlines a framework for future work, rather than an empirical demonstration. We do not claim that the proposed layer has been tested or that it will definitively mitigate the Generalization Illusion; the abstract and open problems section present this as a direction to be explored. To address the concern, we will add a set of illustrative scenarios in the revised manuscript showing how signals at the three levels could apply to interactive impersonation cases, grounded in documented real-world deception patterns. Full empirical validation, including new datasets, remains an open problem explicitly noted in the paper and is beyond the scope of this position piece. revision: partial

Circularity Check

0 steps flagged

No circularity; proposal invokes independent external theories without self-referential derivation

full rationale

The paper's central argument identifies five eroding assumptions in artifact-based deepfake detection and proposes a complementary framework drawing on Speech Act Theory, Grice's Cooperative Principle, and Cialdini's principles of influence, applied at utterance, conversation, and listener levels. These are presented as established external frameworks from philosophy and psychology, not derived from the paper's own data, equations, or prior self-citations. No load-bearing step reduces a result to a fitted parameter, self-definition, or self-citation chain; the work is a conceptual proposal for integration rather than a closed mathematical derivation. The absence of operational mappings or empirical validation is a separate feasibility concern, not evidence of circularity by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The claim rests on the premise that artifact-based assumptions are eroding and that the three named social theories provide applicable forensic signals. No free parameters are introduced, and the one invented entity carries no independent evidence.

axioms (1)
  • domain assumption Artifact-based detection relies on five specific assumptions that are eroding as generative models improve.
    Stated directly in the abstract as the basis for the Generalization Illusion.
invented entities (1)
  • Generalization Illusion (no independent evidence)
    purpose: To name the performance drop of detectors on newer generators.
    Conceptual framing term introduced to describe the problem.

pith-pipeline@v0.9.0 · 5538 in / 1231 out tokens · 31974 ms · 2026-05-12T02:02:40.795165+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

    Relation between the paper passage and the cited Recognition theorem.

    We draw on three well-established frameworks from philosophy of language and social psychology, namely, Speech Act Theory, Grice’s Cooperative Principle, and Cialdini’s principles of influence, to examine forensic signals at three levels: the utterance, the conversation, and the listener response.

  • IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

    Relation between the paper passage and the cited Recognition theorem.

    We propose a three-layer framework for analyzing deceptive interactions... Layer 1: Illocutionary analysis (utterance level)... Layer 2: Conversational norm analysis... Layer 3: Coercion pattern analysis

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

76 extracted references · 76 canonical work pages · 2 internal anchors

  1. [1]

    Deepfakes: What are they and why would I make one?

    BBC Bitesize. Deepfakes: What are they and why would I make one? https://www.bbc.co.uk/bitesize/articles/zfkwcqt, 2019. Accessed: 2026-05-01

  2. [2]

    FaceForensics++: Learning to detect manipulated facial images

    Andreas Rössler, Davide Cozzolino, Luisa Verdoliva, Christian Riess, Justus Thies, and Matthias Nießner. FaceForensics++: Learning to detect manipulated facial images. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 1–11. IEEE, 2019

  3. [3]

    Celeb-DF: A large-scale challenging dataset for deepfake forensics

    Yuezun Li, Xin Yang, Pu Sun, Honggang Qi, and Siwei Lyu. Celeb-DF: A large-scale challenging dataset for deepfake forensics. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3207–3216, 2020

  4. [4]

    DeeperForensics-1.0: A large-scale dataset for real-world face forgery detection

    Liming Jiang, Ren Li, Wayne Wu, Chen Qian, and Chen Change Loy. DeeperForensics-1.0: A large-scale dataset for real-world face forgery detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2889–2898. IEEE, 2020

  5. [5]

    Deepfake-Eval-2024: A multi-modal in-the-wild benchmark of deepfakes circulated in 2024

    Nuria Alina Chandra, Ryan Murtfeldt, Lin Qiu, Arnab Karmakar, Hannah Lee, Emmanuel Tanumihardja, Kevin Farhat, Ben Caffee, Sejin Paik, Changyeon Lee, Jongwook Choi, Aerin Kim, and Oren Etzioni. Deepfake-Eval-2024: A multi-modal in-the-wild benchmark of deepfakes circulated in 2024. arXiv preprint arXiv:2503.02857, 2025

  6. [6]

    SoK: Systematization and benchmarking of deepfake detectors in a unified framework

    Binh Minh Le, Jiwon Kim, Shahroz Tariq, Kristen Moore, Alsharif Abuadbba, and Simon S. Woo. SoK: Systematization and benchmarking of deepfake detectors in a unified framework. 2025 IEEE 10th European Symposium on Security and Privacy (EuroS&P), pages 883–902, 2024

  7. [7]

    Identity fraud report 2023

    Sum and Substance Ltd (UK). Identity fraud report 2023. https://sumsub.com/blog/guides-reports/identity-fraud-report-2023/, 2023. Accessed: 2026-05-01

  8. [8]

    Identity fraud report 2024

    Sum and Substance Ltd (UK). Identity fraud report 2024. https://sumsub.com/fraud-report-2024/, 2024. Accessed: 2026-04-17

  9. [9]

    Generative AI is expected to magnify the risk of deepfakes and other fraud in banking

    Deloitte Center for Financial Services. Generative AI is expected to magnify the risk of deepfakes and other fraud in banking. https://www.deloitte.com/us/en/insights/industry/financial-services/deepfake-banking-fraud-risk-on-the-rise.html, 2024. Accessed: 2026-04-17

  10. [10]

    Fraudsters used AI to mimic CEO's voice in unusual cybercrime case

    Catherine Stupp. Fraudsters used AI to mimic CEO's voice in unusual cybercrime case, 2019. Accessed: 2026-04-16

  11. [11]

    Arup lost $25mn in Hong Kong deepfake video conference scam

    Leng Cheng and Ho-him Chan. Arup lost $25mn in Hong Kong deepfake video conference scam, 2024. Accessed: 2026-04-16

  12. [12]

    'I need to identify you': How one question saved Ferrari from a deepfake scam

    Daniele Lepido. 'I need to identify you': How one question saved Ferrari from a deepfake scam, July 2024. Accessed: 2026-04-16

  13. [13]

    Large language models in digital forensics: capabilities, challenges and future directions

    Maxim Chernyshev, Zubair Baig, Naeem Syed, Robin Doss, and Malcolm Shore. Large language models in digital forensics: capabilities, challenges and future directions. Forensic Science International: Digital Investigation, 56:302043, 2026

  14. [14]

    Deepfake generation and detection: A benchmark and survey

    Gan Pei, Jiangning Zhang, Menghan Hu, Zhenyu Zhang, Chengjie Wang, Yunsheng Wu, Guangtao Zhai, Jian Yang, and Dacheng Tao. Deepfake generation and detection: A benchmark and survey. ACM Comput. Surv., 58(11), 2026

  15. [15]

    Expression and Meaning: Studies in the Theory of Speech Acts

    John R. Searle. Expression and Meaning: Studies in the Theory of Speech Acts. Cambridge University Press, 1979

  16. [16]

    Logic and conversation

    H. P. Grice. Logic and conversation. In Peter Cole and Jerry L. Morgan, editors, Syntax and Semantics 3: Speech Acts, pages 41–58. Academic Press, New York, 1975

  17. [17]

    Pre-Suasion: A Revolutionary Way to Influence and Persuade

    R. Cialdini. Pre-Suasion: A Revolutionary Way to Influence and Persuade. Simon & Schuster, 2016

  18. [18]

    Face X-Ray for more general face forgery detection

    Lingzhi Li, Jianmin Bao, Ting Zhang, Hao Yang, Dong Chen, Fang Wen, and Baining Guo. Face X-Ray for more general face forgery detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 5000–5009. IEEE, 2020

  19. [19]

    LAA-Net: Localized artifact attention network for quality-agnostic and generalizable deepfake detection

    Dat Nguyen, Nesryne Mejri, Inder Pal Singh, Polina Kuleshova, Marcella Astrid, Anis Kacem, Enjie Ghorbel, and Djamila Aouada. LAA-Net: Localized artifact attention network for quality-agnostic and generalizable deepfake detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 17395–17405. IEEE, 2024

  20. [20]

    High-resolution image synthesis with latent diffusion models

    Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 10684–10695, 2022

  21. [21]

    DF40: toward next-generation deepfake detection

    Zhiyuan Yan, Taiping Yao, Shen Chen, Yandan Zhao, Xinghe Fu, Junwei Zhu, Donghao Luo, Chengjie Wang, Shouhong Ding, Yunsheng Wu, and Li Yuan. DF40: toward next-generation deepfake detection. In Proceedings of the 38th International Conference on Neural Information Processing Systems, NIPS '24, pages 29387–29434. Curran Associates Inc., 2024

  22. [22]

    Thinking in frequency: Face forgery detection by mining frequency-aware clues

    Yuyang Qian, Guojun Yin, Lu Sheng, Zixuan Chen, and Jing Shao. Thinking in frequency: Face forgery detection by mining frequency-aware clues. In Computer Vision – ECCV 2020, pages 86–103. Springer International Publishing, 2020

  23. [23]

    Frequency-aware deepfake detection: Improving generalizability through frequency space domain learning

    Chuangchuang Tan, Yao Zhao, Shikui Wei, Guanghua Gu, Ping Liu, and Yunchao Wei. Frequency-aware deepfake detection: Improving generalizability through frequency space domain learning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 5052–5060. AAAI, 2024

  24. [24]

    Fe-clip: Frequency enhanced clip model for zero-shot anomaly detection and segmentation

    Tao Gong, Qi Chu, Bin Liu, Wei Zhou, and Nenghai Yu. Fe-clip: Frequency enhanced clip model for zero-shot anomaly detection and segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 21220–21230. IEEE, 2025

  25. [25]

    On the detection of synthetic images generated by diffusion models

    Riccardo Corvi, Davide Cozzolino, Giada Zingarini, Giovanni Poggi, Koki Nagano, and Luisa Verdoliva. On the detection of synthetic images generated by diffusion models. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1–5, 2023

  26. [26]

    AEROBLADE: Training-free detection of latent diffusion images using autoencoder reconstruction error

    Jonas Ricker, Denis Lukovnikov, and Asja Fischer. AEROBLADE: Training-free detection of latent diffusion images using autoencoder reconstruction error. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 9130–9140. IEEE, 2024

  27. [27]

    Exploring Temporal Coherence for More General Video Face Forgery Detection

    Yinglin Zheng, Jianmin Bao, Dong Chen, Ming Zeng, and Fang Wen. Exploring Temporal Coherence for More General Video Face Forgery Detection. In 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pages 15024–15034. IEEE, 2021

  28. [28]

    AltFreezing for more general video face forgery detection

    Zhendong Wang, Jianmin Bao, Wengang Zhou, Weilun Wang, and Houqiang Li. AltFreezing for more general video face forgery detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4129–4138. IEEE, 2023

  29. [29]

    MSVT: Multiple spatiotemporal views transformer for deepfake video detection

    Yang Yu, Rongrong Ni, Yao Zhao, Siyuan Yang, Fen Xia, Ning Jiang, and Guoqing Zhao. MSVT: Multiple spatiotemporal views transformer for deepfake video detection. IEEE Transactions on Circuits and Systems for Video Technology, 33(9):4462–4471, 2023

  30. [30]

    Analyzing temporal coherence for deepfake video detection

    Muhammad Amin, Yongjian Hu, and Jiankun Hu. Analyzing temporal coherence for deepfake video detection. Electronic Research Archive, 32:2621–2641, 01 2024

  31. [31]

    Align your latents: High-resolution video synthesis with latent diffusion models

    Andreas Blattmann, Robin Rombach, Huan Ling, Tim Dockhorn, Seung Wook Kim, Sanja Fidler, and Karsten Kreis. Align your latents: High-resolution video synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 22563–22575. IEEE, 2023

  32. [32]

    Spatio-temporal knowledge distilled video vision transformer (stkd-vvit) for multimodal deepfake detection

    Shaheen Usmani, Sunil Kumar, and Debanjan Sadhya. Spatio-temporal knowledge distilled video vision transformer (stkd-vvit) for multimodal deepfake detection. Neurocomputing, 620(C), 2025

  33. [33]

    In ictu oculi: Exposing AI created fake videos by detecting eye blinking

    Yuezun Li, Ming-Ching Chang, and Siwei Lyu. In ictu oculi: Exposing AI created fake videos by detecting eye blinking. In 2018 IEEE International Workshop on Information Forensics and Security (WIFS), pages 1–7. IEEE, 2018

  34. [34]

    DeepVision: Deepfakes detection using human eye blinking pattern

    TackHyun Jung, Sangwon Kim, and Keecheon Kim. DeepVision: Deepfakes detection using human eye blinking pattern. IEEE Access, 8:83144–83154, 2020

  35. [35]

    Where do deep fakes look? Synthetic face detection via gaze tracking

    Ilke Demir and Umur Aybars Ciftci. Where do deep fakes look? Synthetic face detection via gaze tracking. In ACM Symposium on Eye Tracking Research and Applications, New York, NY, USA, 2021. ACM

  36. [36]

    FakeCatcher: Detection of synthetic portrait videos using biological signals

    Umur Aybars Ciftci, Ilke Demir, and Lijun Yin. FakeCatcher: Detection of synthetic portrait videos using biological signals. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020

  37. [37]

    DeepRhythm: Exposing deepfakes with attentional visual heartbeat rhythms

    Hua Qi, Qing Guo, Felix Juefei-Xu, Xiaofei Xie, Lei Ma, Wei Feng, Yang Liu, and Jianjun Zhao. DeepRhythm: Exposing deepfakes with attentional visual heartbeat rhythms. In Proceedings of the 28th ACM International Conference on Multimedia, MM '20, pages 4318–4327. Association for Computing Machinery, 2020

  38. [38]

    High-quality deepfakes have a heart!

    Clemens Seibold, Eric L. Wisotzky, Arian Beckmann, Benjamin Kossack, Anna Hilsmann, and Peter Eisert. High-quality deepfakes have a heart! Frontiers in Imaging, 4, 2025

  39. [39]

    Adversarial deepfakes: Evaluating vulnerability of deepfake detectors to adversarial examples

    Shehzeen Hussain, Paarth Neekhara, Malhar Jere, Farinaz Koushanfar, and Julian McAuley. Adversarial deepfakes: Evaluating vulnerability of deepfake detectors to adversarial examples. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV), pages 3348–3357, 2021

  40. [40]

    Impact of video processing operations in deepfake detection

    Yuhang Lu and Touradj Ebrahimi. Impact of video processing operations in deepfake detection. 2023 24th International Conference on Digital Signal Processing (DSP), pages 1–5, 2023

  41. [41]

    Influence: The Psychology of Persuasion

    R. B. Cialdini. Influence: The Psychology of Persuasion. Collins Business Essentials. HarperCollins, 2009

  42. [42]

    How to Do Things with Words

    John Langshaw Austin. How to Do Things with Words. Oxford University Press, Oxford, 1962

  43. [43]

    Assessing the potential of LLM-assisted annotation for corpus-based pragmatics and discourse analysis: The case of apology

    Danni Yu, Luyang Li, Hang Su, and Matteo Fuoli. Assessing the potential of LLM-assisted annotation for corpus-based pragmatics and discourse analysis: The case of apology. International Journal of Corpus Linguistics, 29, 06 2024

  44. [44]

    An analysis of social engineering principles in effective phishing

    Ana Ferreira and Gabriele Lenzini. An analysis of social engineering principles in effective phishing. In Proceedings of the 2015 Workshop on Socio-Technical Aspects in Security and Trust, STAST '15, pages 9–16. IEEE, 2015

  45. [45]

    Algorithmic detection of misinformation and disinformation: Gricean perspectives

    Sille Obelitz Søe. Algorithmic detection of misinformation and disinformation: Gricean perspectives. Journal of Documentation, 74(2):309–332, 2017

  46. [46]

    Speech acts in social media fraud: Manipulative communication strategies on WhatsApp and Facebook

    Nur Lailiyah, Galuh Areni, Favorita Kurwidaria, Setyo Cahyono, Monika Surtikanti, and Farida Wijayanti. Speech acts in social media fraud: Manipulative communication strategies on WhatsApp and Facebook. Language Circle: Journal of Language and Literature, 20:16–28, 10 2025

  47. [47]

    Fine-grained analysis of propaganda in news articles

    Giovanni Da San Martino, Seunghak Yu, Alberto Barrón-Cedeño, Rostislav Petrov, and Preslav Nakov. Fine-grained analysis of propaganda in news articles. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 5636–5646. ACL, 2019

  48. [48]

    Deepfake influence tactics through the lens of Cialdini's principles: Case studies and the deep frame tool proposal

    Pawel Zegarow and Ewelina Bartuzi. Deepfake influence tactics through the lens of Cialdini's principles: Case studies and the deep frame tool proposal. Applied Cybersecurity & Internet Governance, 2024

  49. [49]

    Detecting deepfakes and false ads through analysis of text and social engineering techniques

    Alicja Martinek and Ewelina Bartuzi-Trokielewicz. Detecting deepfakes and false ads through analysis of text and social engineering techniques. In Proceedings of the 31st International Conference on Computational Linguistics, pages 8432–8448. ACL, 2025

  50. [50]

    Vishing: Detecting social engineering in spoken communication — a first survey & urgent roadmap to address an emerging societal challenge

    Andreas Triantafyllopoulos, Anika A. Spiesberger, Iosif Tsangko, Xin Jing, Verena Distler, Felix Dietz, Florian Alt, and Björn W. Schuller. Vishing: Detecting social engineering in spoken communication — a first survey & urgent roadmap to address an emerging societal challenge. Comput. Speech Lang., 94(C), 2025

  51. [51]

    SoK: the good, the bad, and the unbalanced: measuring structural limitations of deepfake media datasets

    Seth Layton, Tyler Tucker, Daniel Olszewski, Kevin Warren, Kevin Butler, and Patrick Traynor. SoK: the good, the bad, and the unbalanced: measuring structural limitations of deepfake media datasets. In Proceedings of the 33rd USENIX Conference on Security Symposium, SEC '24, USA, 2024. USENIX Association

  52. [52]

    An analysis of recent advances in deepfake image detection in an evolving threat landscape

    Sifat Muhammad Abdullah, Aravind Cheruvu, Shravya Kanchi, Taejoong Chung, Peng Gao, Murtuza Jadliwala, and Bimal Viswanath. An analysis of recent advances in deepfake image detection in an evolving threat landscape. In 2024 IEEE Symposium on Security and Privacy (SP), pages 91–109. IEEE, 2024

  53. [53]

    Unlocking the capabilities of large vision-language models for generalizable and explainable deepfake detection

    Peipeng Yu, Jianwei Fei, Hui Gao, Xuan Feng, Zhihua Xia, and Chip Hong Chang. Unlocking the capabilities of large vision-language models for generalizable and explainable deepfake detection. InProceedings of the 42nd International Conference on Machine Learning, volume 267, pages 72925–72943. PMLR, 2025

  54. [54]

    Diffusionfake: enhancing generalization in deepfake detection via guided stable diffusion

    Ke Sun, Shen Chen, Taiping Yao, Hong Liu, Xiaoshuai Sun, Shouhong Ding, and Rongrong Ji. Diffusionfake: enhancing generalization in deepfake detection via guided stable diffusion. In Proceedings of the 38th International Conference on Neural Information Processing Systems, pages 101474–101497. Curran Associates Inc., 2024

  55. [55]

    Few-shot learner generalizes across ai-generated image detection

    Shiyu Wu, Jing Liu, Jing Li, and Yequan Wang. Few-shot learner generalizes across ai-generated image detection. InProceedings of the 42nd International Conference on Machine Learning (ICML’25). JMLR.org, 2025

  56. [56]

    Seeing is believing: Exploring perceptual differences in deepfake videos

    Rashid Tahir, Brishna Batool, Hira Jamshed, Mahnoor Jameel, Mubashir Anwar, Faizan Ahmed, Muhammad Adeel Zaffar, and Muhammad Fareed Zaffar. Seeing is believing: Exploring perceptual differences in deepfake videos. InProceedings of the 2021 CHI Conference on Human Factors in Computing Systems, pages 1–16. ACM, 2021

  57. [57]

    MacDorman, Martin Teufel, and Alexander Bäuerle

    Alexander Diel, Tania Lalgi, Isabel Carolin Schröter, Karl F. MacDorman, Martin Teufel, and Alexander Bäuerle. Human performance in detecting deepfakes: A systematic review and meta-analysis of 56 papers.Computers in Human Behavior Reports, 16:100538, 2024

  58. [58]

    Miller, and Mary Holmes

    Klaire Somoray, Dan J. Miller, and Mary Holmes. Human performance in deepfake detection: A systematic review.Human Behavior and Emerging Technologies, 2025(1):1833228, 2025

  59. [59]

    Leibowicz, Sean McGregor, and Aviv Ovadya

    Claire R. Leibowicz, Sean McGregor, and Aviv Ovadya. The deepfake detection dilemma: A multistakeholder exploration of adversarial dynamics in synthetic media. InProceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society, page 736–744. ACM, 2021

  60. [60]

    Spang, and Sebastian Möller

    Vera Schmitt, Luis-Felipe Villa-Arenas, Nils Feldhus, Joachim Meyer, Robert P. Spang, and Sebastian Möller. The role of explainability in collaborative human-ai disinformation detection. InProceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency, page 2157–2174. ACM, 2024

  61. [61]

    Deepfake the menace: mitigating the negative impacts of ai-generated content

    Siwei Lyu. Deepfake the menace: mitigating the negative impacts of ai-generated content. Organizational Cybersecurity Journal: Practice, Process and People, 4:1–18, 06 2024

  62. [62]

    Deepfacelab: Integrated, flexible and extensible face-swapping framework.Pattern Recogn., 141(C), 2023

    Kunlin Liu, Ivan Perov, Daiheng Gao, Nikolay Chervoniy, Wenbo Zhou, and Weiming Zhang. Deepfacelab: Integrated, flexible and extensible face-swapping framework.Pattern Recogn., 141(C), 2023

  63. [63]

    Simswap: An efficient framework for high fidelity face swapping

    Renwang Chen, Xuanhong Chen, Bingbing Ni, and Yanhao Ge. Simswap: An efficient framework for high fidelity face swapping. InProceedings of the 28th ACM International Conference on Multimedia (MM’20), pages 2003–2011. ACM, 2020

  64. [64]

    Advancing high fidelity identity swapping for forgery detection

    Lingzhi Li, Jianmin Bao, Hao Yang, Dong Chen, and Fang Wen. Advancing high fidelity identity swapping for forgery detection. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 5073–5082, 2020. 13

  65. [65]

    Watch your up-convolution: CNN based generative deep neural networks are failing to reproduce spectral distributions

    Ricard Durall, Margret Keuper, and Janis Keuper. Watch your up-convolution: CNN based generative deep neural networks are failing to reproduce spectral distributions. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 7890–7899. IEEE, 2020

  66. [66]

    Xception: Deep learning with depthwise separable convolutions

    François Chollet. Xception: Deep learning with depthwise separable convolutions. InProceed- ings of the IEEE conference on computer vision and pattern recognition, 2017

  67. [67]

    Efficientnet

    Brett Koonce. Efficientnet. InConvolutional neural networks with swift for Tensorflow, pages 109–123. Springer, 2021

  68. [68]

    Denoising diffusion probabilistic models

    Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. InPro- ceedings of the 34th Conference on Neural Information Processing Systems. Curran Associates Inc., 2020

  69. [69]

    Sora: Creating video from text

    OpenAI. Sora: Creating video from text. Technical report, OpenAI, 2024. Updated April 2026

  70. [70]

    HunyuanVideo: A Systematic Framework For Large Video Generative Models

    Weijie Kong, Qi Tian, Zijian Zhang, Rox Min, Zuozhuo Dai, Jin Zhou, Jia-Liang Xiong, Xin Li, Bo Wu, Jianwei Zhang, et al. Hunyuanvideo: A systematic framework for large video generative models.arXiv preprint arXiv:2412.03603, 2024

  71. [71]

    Efficient region-aware neural radiance fields for high-fidelity talking portrait synthesis

    Jiahe Li, Jiawei Zhang, Xiao Bai, Jun Zhou, and Lin Gu. Efficient region-aware neural radiance fields for high-fidelity talking portrait synthesis. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 7568–7578. IEEE, 2023

  72. [72]

    3d gaussian splatting for real-time radiance field rendering.ACM Transactions on Graphics, 42(4):139–1, 2023

    Bernhard Kerbl, Georgios Kopanas, Thomas Leimkuehler, and George Drettakis. 3d gaussian splatting for real-time radiance field rendering.ACM Transactions on Graphics, 42(4):139–1, 2023

  73. [73]

    Hallo2: Long-duration and high-resolution audio-driven portrait image animation.International Conference on Learning Representations (ICLR), 2025

    Jiahao Cui, Hui Li, Yao Yao, Hao Zhu, Hanlin Shang, Kaihui Cheng, Hang Zhou, Siyu Zhu, and Jingdong Wang. Hallo2: Long-duration and high-resolution audio-driven portrait image animation.International Conference on Learning Representations (ICLR), 2025

  74. [74]

    Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen

    Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA: Low-rank adaptation of large language models. In International Conference on Learning Representations (ICLR), 2022

  75. [75]

    DreamBooth: Fine tuning text-to-image diffusion models for subject-driven generation

    Nataniel Ruiz, Yuanzhen Li, Varun Jampani, Yael Pritch, Michael Rubinstein, and Kfir Aberman. DreamBooth: Fine tuning text-to-image diffusion models for subject-driven generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 22500–22510, June 2023

  76. [76]

    IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models

A Appendix

A.1 Core Components of Social-Theoretic Frameworks

Table A.1: Core components of the three social-theoretic frameworks, their analytical level, and key conditions requi...