pith. machine review for the scientific record.

arxiv: 2605.09007 · v1 · submitted 2026-05-09 · 💻 cs.CY

Recognition: 2 theorem links · Lean Theorem

Detecting Deception, Not Deepfakes: Why Media Forensics Needs Social Theories

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 02:02 UTC · model grok-4.3

classification 💻 cs.CY
keywords deepfake detection · media forensics · deception detection · speech act theory · Grice cooperative principle · Cialdini influence · interactive deepfakes · communicative signals

The pith

Deepfake detection must analyze deceptive interactions using social theories rather than relying only on synthetic media artifacts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that artifact-based deepfake detectors will not solve the problem of interactive deepfakes, such as real-time impersonations in video or voice calls, because the harm comes from the act of deception rather than from detectable flaws in the media signal. As generators improve, the five assumptions underlying artifact detection erode, producing what the authors call the Generalization Illusion: lab accuracy that does not hold in the wild. To address this, the authors draw on Speech Act Theory, Grice's Cooperative Principle, and Cialdini's principles of influence to define forensic signals at the utterance, conversation, and listener-response levels. A sympathetic reader would care because this complementary layer targets the actual mechanism of harm in live settings where no prior reference media exists. The result is a unified framework that augments existing forensic tools.

Core claim

Current deepfake detection framed as media classification relies on five assumptions about signal traces that are eroding with better generators. For interactive deepfakes the relevant harm is the deceptive communicative act, not media realism. Detection therefore needs a complementary analytical layer drawn from Speech Act Theory at the utterance level, Grice's Cooperative Principle at the conversation level, and Cialdini's principles of influence at the listener-response level, producing a unified framework of forensic signals.

What carries the argument

A unified framework that applies Speech Act Theory, Grice's Cooperative Principle, and Cialdini's principles of influence to extract forensic signals at utterance, conversation, and listener-response levels.
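
A minimal late-fusion sketch of that three-level structure. The paper supplies no implementation, so every signal name, weight, and threshold below is invented purely for illustration:

```python
from dataclasses import dataclass

@dataclass
class InteractionSignals:
    """Per-level deception scores in [0, 1]; higher = more deception-like."""
    utterance: float     # speech-act inconsistencies (Speech Act Theory)
    conversation: float  # cooperative-norm violations (Grice)
    listener: float      # influence-pattern pressure (Cialdini)

def social_deception_score(s: InteractionSignals,
                           weights=(0.4, 0.35, 0.25)) -> float:
    """Weighted late fusion of the three levels; weights are placeholders."""
    w_u, w_c, w_l = weights
    return w_u * s.utterance + w_c * s.conversation + w_l * s.listener

def combined_verdict(artifact_score: float, s: InteractionSignals,
                     threshold: float = 0.5) -> bool:
    """Complement, not replace: either layer alone can raise the flag."""
    return max(artifact_score, social_deception_score(s)) >= threshold
```

Under this fusion, a call whose media signal looks clean (low artifact score) is still flagged when its social signals run high, which is exactly the behavior the framework asks for.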

If this is right

  • Detectors can flag violations of cooperative norms or speech-act inconsistencies even when the media signal appears realistic.
  • Analysis can extend to live two-way interactions where no clean reference clip is available.
  • Listener-response patterns can indicate whether a potential deception has succeeded or failed.
  • The approach complements rather than replaces existing low-level forensic methods.
  • It identifies open problems in operationalizing these signals for real-time systems.
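
How a cooperative-norm violation might actually be scored is left open by the paper. As one hedged sketch, a flouted Gricean maxim of Relation could be proxied by lexical overlap between a challenge question and its answer; a real detector would need semantic similarity and dialogue context, so treat this purely as an illustration:

```python
def relevance_violation(question: str, answer: str,
                        min_overlap: float = 0.1) -> bool:
    """Crude Gricean 'Relation' check: flag answers that share almost no
    content words with the question they respond to. Illustrative only."""
    stop = {"the", "a", "an", "is", "are", "was", "to", "of", "and", "do",
            "did", "what", "who", "we", "you", "i", "it", "this", "that"}
    q = {w.strip("?.,!").lower() for w in question.split()} - stop
    a = {w.strip("?.,!").lower() for w in answer.split()} - stop
    if not q:
        return False
    return len(q & a) / len(q) < min_overlap
```

An impersonator who deflects an identity-check question into an urgent payment demand trips the check; a responsive answer does not.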

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The framework could extend to detecting deception in text-based or avatar-mediated interactions beyond audio-video deepfakes.
  • New annotated datasets of full conversations would be required to train and evaluate the social-signal layer.
  • Policy on regulating synthetic media in calls might shift from content authenticity to documented intent or pattern of use.
  • Integration with real-time conversation monitoring tools could test whether social signals provide earlier warnings than post-hoc media checks.
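
The last extension, real-time monitoring, could be prototyped as a sliding window over per-turn fused scores. Nothing like this class exists in the paper; the window size and alarm threshold are arbitrary placeholders:

```python
from collections import deque

class TurnMonitor:
    """Early-warning monitor: averages social-signal scores over the most
    recent turns and fires mid-call rather than after post-hoc media checks."""
    def __init__(self, window: int = 5, alarm: float = 0.6):
        self.scores = deque(maxlen=window)
        self.alarm = alarm

    def observe(self, turn_score: float) -> bool:
        """Feed one turn's fused score in [0, 1]; True means raise an alarm."""
        self.scores.append(turn_score)
        return sum(self.scores) / len(self.scores) >= self.alarm
```

A call that starts innocuously and escalates triggers the alarm only once suspicious turns dominate the window, so a single odd turn does not fire it.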

Load-bearing premise

The three social frameworks can be translated into operational forensic signals at the utterance, conversation, and listener levels that meaningfully improve detection.
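
Whether that translation works is the open question. At the utterance level, even a toy illocutionary tagger shows the intended shape of such a signal; the cue patterns and risk weights below are invented, and the categories merely echo Searle's taxonomy:

```python
import re

# Toy cue patterns per illocutionary category; invented for illustration.
ACT_CUES = {
    "directive":  re.compile(r"\b(please|must|send|wire|transfer)\b", re.I),
    "commissive": re.compile(r"\b(i will|i promise|i'll)\b", re.I),
    "assertive":  re.compile(r"\b(is|are|was|were|confirmed)\b", re.I),
}

def tag_speech_acts(utterance: str) -> set:
    """Return every category whose cue pattern fires on the utterance."""
    return {act for act, pat in ACT_CUES.items() if pat.search(utterance)}

def utterance_risk(utterance: str) -> float:
    """Directives pressing for irreversible action score highest; the
    0.9 / 0.2 weighting is a placeholder, not a claim of the paper."""
    return 0.9 if "directive" in tag_speech_acts(utterance) else 0.2
```
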

What would settle it

A controlled test on interactive deepfake impersonation calls in which adding signals from the three social frameworks produces no measurable gain in detection accuracy over artifact-only baselines.
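
Operationally that experiment is an ablation: rank calls by artifact score alone, then by the fused score, and compare ranking quality. A stdlib-only sketch of the comparison; the max-fusion rule and any score inputs are placeholders, not results:

```python
def roc_auc(scores, labels):
    """Rank-based ROC-AUC (ties count half); labels are 1 = fake, 0 = real."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def ablation_gain(artifact, social, labels):
    """AUC gain of max-fused scores over the artifact-only baseline.
    The paper's claim survives only if this gain is measurably positive."""
    fused = [max(a, s) for a, s in zip(artifact, social)]
    return roc_auc(fused, labels) - roc_auc(artifact, labels)
```

If the artifact detector misses a fake that the social layer catches, the gain is positive; a zero gain on real interactive-deepfake calls would be the refuting outcome described above.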

Figures

Figures reproduced from arXiv: 2605.09007 by Jessee Ho, Shaina Raza, Shweta Khushu.

Figure 1. Global growth in deepfake-driven identity …
Figure 2. The interrogation analogy. Traditional detection focuses on surface-level media artifacts. An …
Figure 3. Five forensic premises underlying current deepfake detection. Each assumes that synthetic …
Figure 4. An incoming interaction is analyzed along two parallel paths: (1) …
Original abstract

For nearly a decade, deepfake detection has been framed as a classification task: given an audio or video clip, decide whether it is real or synthetic. Top detectors often report high accuracy on standard benchmarks; however, performance drops sharply on content from newer or unseen generators. We argue that better classifiers of synthetic media alone will not solve this problem, especially for interactive deepfakes such as impersonation in video and voice calls, where the harm lies not in the artifact (manipulated media signal) but in the act of deception. Deepfake detection therefore requires a complementary analytical layer focused on communicative interaction, not just media realism. We identify five assumptions that artifact-based detection (the forensic analysis of low-level signal traces) relies on and show that all five are eroding as generative models improve, producing what we call the Generalization Illusion. To address this, we draw on three well-established frameworks from philosophy of language and social psychology, namely, Speech Act Theory, Grice's Cooperative Principle, and Cialdini's principles of influence, to examine forensic signals at three levels: the utterance, the conversation, and the listener response. The result is a unified framework that complements existing forensic methods. We close with open problems for future work. https://jesseeho.github.io/deepfake-deception/

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper argues that deepfake detection framed as a binary classification task on media artifacts is insufficient, particularly for interactive impersonation scenarios where harm stems from deception rather than signal manipulation. It identifies five eroding assumptions underlying artifact-based forensics that produce a 'Generalization Illusion' as generative models advance. To address this, the authors propose a complementary analytical layer drawing on Speech Act Theory, Grice's Cooperative Principle, and Cialdini's principles of influence, applied at utterance, conversation, and listener-response levels to detect communicative deception signals.

Significance. If operationalized, the framework could meaningfully broaden media forensics beyond technical classifiers toward socio-technical analysis, potentially improving robustness for real-time interactive deepfakes. The explicit enumeration of the five assumptions provides a useful diagnostic tool for the field, and the integration of established external theories from philosophy and psychology is a clear strength that avoids ad-hoc invention.

major comments (2)
  1. [Abstract and unified framework section] The central claim that the three social frameworks will 'examine forensic signals at three levels' and thereby complement existing methods (Abstract; proposed unified framework) is load-bearing but unsupported: no operational definitions, scoring procedures, or extraction methods are supplied for any signal (e.g., how a flouted Gricean maxim is automatically detected from turn-taking logs, or how listener-response features are derived from video without new labeled data).
  2. [Abstract and closing section] The assertion that the proposed layer will mitigate the Generalization Illusion (Abstract; closing section on open problems) rests on an untested translation step. The manuscript provides no case studies, empirical validation, or even illustrative examples showing that utterance/conversation/listener signals improve detection accuracy or generalization.
minor comments (2)
  1. [Section identifying the five assumptions] The five assumptions are logically presented but would benefit from explicit cross-references to prior deepfake literature for each assumption to strengthen the diagnostic claim.
  2. [Framework introduction] Notation for the three analysis levels (utterance, conversation, listener) is introduced clearly but could be summarized in a table for quick reference.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive review and for recognizing the potential of integrating social theories into media forensics. We address each major comment below, clarifying the manuscript's scope as a conceptual framework while outlining targeted revisions to improve support for our claims.

point-by-point responses
  1. Referee: [Abstract and unified framework section] The central claim that the three social frameworks will 'examine forensic signals at three levels' and thereby complement existing methods (Abstract; proposed unified framework) is load-bearing but unsupported: no operational definitions, scoring procedures, or extraction methods are supplied for any signal (e.g., how a flouted Gricean maxim is automatically detected from turn-taking logs, or how listener-response features are derived from video without new labeled data).

    Authors: We agree that the manuscript presents a high-level theoretical framework rather than a fully operationalized detection pipeline. The three levels are structured directly from the cited theories (Speech Act Theory for utterances, Grice's Cooperative Principle for conversations, and Cialdini's principles for listener responses) to organize analysis of communicative deception. No automated scoring or extraction methods are claimed or provided, as the paper's aim is to diagnose limitations in artifact-based approaches and propose a complementary socio-technical direction. In revision, we will expand the unified framework section with concrete illustrative examples of potential signals at each level, drawn from existing pragmatics and deception-detection literature, to make the proposal more tangible without asserting current implementability. revision: partial

  2. Referee: [Abstract and closing section] The assertion that the proposed layer will mitigate the Generalization Illusion (Abstract; closing section on open problems) rests on an untested translation step. The manuscript provides no case studies, empirical validation, or even illustrative examples showing that utterance/conversation/listener signals improve detection accuracy or generalization.

    Authors: The manuscript is explicitly positioned as a theoretical argument that identifies five eroding assumptions and outlines a framework for future work, rather than an empirical demonstration. We do not claim that the proposed layer has been tested or that it will definitively mitigate the Generalization Illusion; the abstract and open problems section present this as a direction to be explored. To address the concern, we will add a set of illustrative scenarios in the revised manuscript showing how signals at the three levels could apply to interactive impersonation cases, grounded in documented real-world deception patterns. Full empirical validation, including new datasets, remains an open problem explicitly noted in the paper and is beyond the scope of this position piece. revision: partial

Circularity Check

0 steps flagged

No circularity; proposal invokes independent external theories without self-referential derivation

full rationale

The paper's central argument identifies five eroding assumptions in artifact-based deepfake detection and proposes a complementary framework drawing on Speech Act Theory, Grice's Cooperative Principle, and Cialdini's principles of influence, applied at utterance, conversation, and listener levels. These are presented as established external frameworks from philosophy and psychology, not derived from the paper's own data, equations, or prior self-citations. No load-bearing step reduces a result to a fitted parameter, self-definition, or self-citation chain; the work is a conceptual proposal for integration rather than a closed mathematical derivation. The absence of operational mappings or empirical validation is a separate feasibility concern, not evidence of circularity by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The claim rests on the premise that artifact-based assumptions are eroding and that the three named social theories provide applicable forensic signals. No free parameters are introduced, and the one invented entity carries no independent evidence.

axioms (1)
  • domain assumption Artifact-based detection relies on five specific assumptions that are eroding as generative models improve.
    Stated directly in the abstract as the basis for the Generalization Illusion.
invented entities (1)
  • Generalization Illusion (no independent evidence)
    purpose: To name the performance drop of detectors on newer generators.
    Conceptual framing term introduced to describe the problem.

pith-pipeline@v0.9.0 · 5538 in / 1231 out tokens · 31974 ms · 2026-05-12T02:02:40.795165+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

    Relation between the paper passage and the cited Recognition theorem.

    We draw on three well-established frameworks from philosophy of language and social psychology, namely, Speech Act Theory, Grice’s Cooperative Principle, and Cialdini’s principles of influence, to examine forensic signals at three levels: the utterance, the conversation, and the listener response.

  • IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

    Relation between the paper passage and the cited Recognition theorem.

    We propose a three-layer framework for analyzing deceptive interactions... Layer 1: Illocutionary analysis (utterance level)... Layer 2: Conversational norm analysis... Layer 3: Coercion pattern analysis

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

76 extracted references · 76 canonical work pages · 2 internal anchors

  1. [1]

    Deepfakes: What are they and why would I make one?

    BBC Bitesize. Deepfakes: What are they and why would I make one? https://www.bbc.co.uk/bitesize/articles/zfkwcqt, 2019. Accessed: 2026-05-01

  2. [2]

    FaceForensics++: Learning to detect manipulated facial images

    Andreas Rössler, Davide Cozzolino, Luisa Verdoliva, Christian Riess, Justus Thies, and Matthias Nießner. FaceForensics++: Learning to detect manipulated facial images. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 1–11. IEEE, 2019

  3. [3]

    Celeb-DF: A large-scale challenging dataset for deepfake forensics

    Yuezun Li, Xin Yang, Pu Sun, Honggang Qi, and Siwei Lyu. Celeb-DF: A large-scale challenging dataset for deepfake forensics. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3207–3216, 2020

  4. [4]

    DeeperForensics-1.0: A large-scale dataset for real-world face forgery detection

    Liming Jiang, Ren Li, Wayne Wu, Chen Qian, and Chen Change Loy. DeeperForensics-1.0: A large-scale dataset for real-world face forgery detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2889–2898. IEEE, 2020

  5. [5]

    Deepfake-Eval-2024: A multi-modal in-the-wild benchmark of deepfakes circulated in 2024

    Nuria Alina Chandra, Ryan Murtfeldt, Lin Qiu, Arnab Karmakar, Hannah Lee, Emmanuel Tanumihardja, Kevin Farhat, Ben Caffee, Sejin Paik, Changyeon Lee, Jongwook Choi, Aerin Kim, and Oren Etzioni. Deepfake-Eval-2024: A multi-modal in-the-wild benchmark of deepfakes circulated in 2024. arXiv preprint arXiv:2503.02857, 2025

  6. [6]

    SoK: Systematization and benchmarking of deepfake detectors in a unified framework

    Binh Minh Le, Jiwon Kim, Shahroz Tariq, Kristen Moore, Alsharif Abuadbba, and Simon S. Woo. SoK: Systematization and benchmarking of deepfake detectors in a unified framework. 2025 IEEE 10th European Symposium on Security and Privacy (EuroS&P), pages 883–902, 2024

  7. [7]

    Identity fraud report 2023

    Sum and Substance Ltd (UK). Identity fraud report 2023. https://sumsub.com/blog/guides-reports/identity-fraud-report-2023/, 2023. Accessed: 2026-05-01

  8. [8]

    Identity fraud report 2024

    Sum and Substance Ltd (UK). Identity fraud report 2024. https://sumsub.com/fraud-report-2024/, 2024. Accessed: 2026-04-17

  9. [9]

    Generative AI is expected to magnify the risk of deepfakes and other fraud in banking

    Deloitte Center for Financial Services. Generative AI is expected to magnify the risk of deepfakes and other fraud in banking. https://www.deloitte.com/us/en/insights/industry/financial-services/deepfake-banking-fraud-risk-on-the-rise.html, 2024. Accessed: 2026-04-17

  10. [10]

    Fraudsters used AI to mimic CEO's voice in unusual cybercrime case

    Catherine Stupp. Fraudsters used AI to mimic CEO's voice in unusual cybercrime case, 2019. Accessed: 2026-04-16

  11. [11]

    Arup lost $25mn in Hong Kong deepfake video conference scam

    Leng Cheng and Ho-him Chan. Arup lost $25mn in Hong Kong deepfake video conference scam, 2024. Accessed: 2026-04-16

  12. [12]

    'I need to identify you': How one question saved Ferrari from a deepfake scam

    Daniele Lepido. 'I need to identify you': How one question saved Ferrari from a deepfake scam, July 2024. Accessed: 2026-04-16

  13. [13]

    Large language models in digital forensics: capabilities, challenges and future directions

    Maxim Chernyshev, Zubair Baig, Naeem Syed, Robin Doss, and Malcolm Shore. Large language models in digital forensics: capabilities, challenges and future directions. Forensic Science International: Digital Investigation, 56:302043, 2026

  14. [14]

    Deepfake generation and detection: A benchmark and survey

    Gan Pei, Jiangning Zhang, Menghan Hu, Zhenyu Zhang, Chengjie Wang, Yunsheng Wu, Guangtao Zhai, Jian Yang, and Dacheng Tao. Deepfake generation and detection: A benchmark and survey. ACM Comput. Surv., 58(11), 2026

  15. [15]

    Expression and Meaning: Studies in the Theory of Speech Acts

    John R. Searle. Expression and Meaning: Studies in the Theory of Speech Acts. Cambridge University Press, 1979

  16. [16]

    Logic and conversation

    H. P. Grice. Logic and conversation. In Peter Cole and Jerry L. Morgan, editors, Syntax and Semantics 3: Speech Acts, pages 41–58. Academic Press, New York, 1975

  17. [17]

    Pre-Suasion: A Revolutionary Way to Influence and Persuade

    R. Cialdini. Pre-Suasion: A Revolutionary Way to Influence and Persuade. Simon & Schuster, 2016

  18. [18]

    Face X-Ray for more general face forgery detection

    Lingzhi Li, Jianmin Bao, Ting Zhang, Hao Yang, Dong Chen, Fang Wen, and Baining Guo. Face X-Ray for more general face forgery detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 5000–5009. IEEE, 2020

  19. [19]

    LAA-Net: Localized artifact attention network for quality-agnostic and generalizable deepfake detection

    Dat Nguyen, Nesryne Mejri, Inder Pal Singh, Polina Kuleshova, Marcella Astrid, Anis Kacem, Enjie Ghorbel, and Djamila Aouada. LAA-Net: Localized artifact attention network for quality-agnostic and generalizable deepfake detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 17395–17405. IEEE, 2024

  20. [20]

    High-resolution image synthesis with latent diffusion models

    Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 10684–10695, 2022

  21. [21]

    DF40: toward next-generation deepfake detection

    Zhiyuan Yan, Taiping Yao, Shen Chen, Yandan Zhao, Xinghe Fu, Junwei Zhu, Donghao Luo, Chengjie Wang, Shouhong Ding, Yunsheng Wu, and Li Yuan. DF40: toward next-generation deepfake detection. In Proceedings of the 38th International Conference on Neural Information Processing Systems, NIPS '24, pages 29387–29434. Curran Associates Inc., 2024

  22. [22]

    Thinking in frequency: Face forgery detection by mining frequency-aware clues

    Yuyang Qian, Guojun Yin, Lu Sheng, Zixuan Chen, and Jing Shao. Thinking in frequency: Face forgery detection by mining frequency-aware clues. In Computer Vision – ECCV 2020, pages 86–103. Springer International Publishing, 2020

  23. [23]

    Frequency-aware deepfake detection: Improving generalizability through frequency space domain learning

    Chuangchuang Tan, Yao Zhao, Shikui Wei, Guanghua Gu, Ping Liu, and Yunchao Wei. Frequency-aware deepfake detection: Improving generalizability through frequency space domain learning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 5052–5060. AAAI, 2024

  24. [24]

    Fe-clip: Frequency enhanced clip model for zero-shot anomaly detection and segmentation

    Tao Gong, Qi Chu, Bin Liu, Wei Zhou, and Nenghai Yu. Fe-clip: Frequency enhanced clip model for zero-shot anomaly detection and segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 21220–21230. IEEE, 2025

  25. [25]

    On the detection of synthetic images generated by diffusion models

    Riccardo Corvi, Davide Cozzolino, Giada Zingarini, Giovanni Poggi, Koki Nagano, and Luisa Verdoliva. On the detection of synthetic images generated by diffusion models. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1–5, 2023

  26. [26]

    AEROBLADE: Training-free detection of latent diffusion images using autoencoder reconstruction error

    Jonas Ricker, Denis Lukovnikov, and Asja Fischer. AEROBLADE: Training-free detection of latent diffusion images using autoencoder reconstruction error. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 9130–9140. IEEE, 2024

  27. [27]

    Exploring Temporal Coherence for More General Video Face Forgery Detection

    Yinglin Zheng, Jianmin Bao, Dong Chen, Ming Zeng, and Fang Wen. Exploring Temporal Coherence for More General Video Face Forgery Detection. In 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pages 15024–15034. IEEE, 2021

  28. [28]

    AltFreezing for more general video face forgery detection

    Zhendong Wang, Jianmin Bao, Wengang Zhou, Weilun Wang, and Houqiang Li. AltFreezing for more general video face forgery detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4129–4138. IEEE, 2023

  29. [29]

    MSVT: Multiple spatiotemporal views transformer for deepfake video detection

    Yang Yu, Rongrong Ni, Yao Zhao, Siyuan Yang, Fen Xia, Ning Jiang, and Guoqing Zhao. MSVT: Multiple spatiotemporal views transformer for deepfake video detection. IEEE Transactions on Circuits and Systems for Video Technology, 33(9):4462–4471, 2023

  30. [30]

    Analyzing temporal coherence for deepfake video detection

    Muhammad Amin, Yongjian Hu, and Jiankun Hu. Analyzing temporal coherence for deepfake video detection. Electronic Research Archive, 32:2621–2641, 01 2024

  31. [31]

    Align your latents: High-resolution video synthesis with latent diffusion models

    Andreas Blattmann, Robin Rombach, Huan Ling, Tim Dockhorn, Seung Wook Kim, Sanja Fidler, and Karsten Kreis. Align your latents: High-resolution video synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 22563–22575. IEEE, 2023

  32. [32]

    Spatio-temporal knowledge distilled video vision transformer (stkd-vvit) for multimodal deepfake detection

    Shaheen Usmani, Sunil Kumar, and Debanjan Sadhya. Spatio-temporal knowledge distilled video vision transformer (stkd-vvit) for multimodal deepfake detection. Neurocomputing, 620(C), 2025

  33. [33]

    In ictu oculi: Exposing AI created fake videos by detecting eye blinking

    Yuezun Li, Ming-Ching Chang, and Siwei Lyu. In ictu oculi: Exposing AI created fake videos by detecting eye blinking. In 2018 IEEE International Workshop on Information Forensics and Security (WIFS), pages 1–7. IEEE, 2018

  34. [34]

    DeepVision: Deepfakes detection using human eye blinking pattern

    TackHyun Jung, Sangwon Kim, and Keecheon Kim. DeepVision: Deepfakes detection using human eye blinking pattern. IEEE Access, 8:83144–83154, 2020

  35. [35]

    Where do deep fakes look? Synthetic face detection via gaze tracking

    Ilke Demir and Umur Aybars Ciftci. Where do deep fakes look? Synthetic face detection via gaze tracking. In ACM Symposium on Eye Tracking Research and Applications, New York, NY, USA, 2021. ACM

  36. [36]

    FakeCatcher: Detection of synthetic portrait videos using biological signals

    Umur Aybars Ciftci, Ilke Demir, and Lijun Yin. FakeCatcher: Detection of synthetic portrait videos using biological signals. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020

  37. [37]

    DeepRhythm: Exposing deepfakes with attentional visual heartbeat rhythms

    Hua Qi, Qing Guo, Felix Juefei-Xu, Xiaofei Xie, Lei Ma, Wei Feng, Yang Liu, and Jianjun Zhao. DeepRhythm: Exposing deepfakes with attentional visual heartbeat rhythms. In Proceedings of the 28th ACM International Conference on Multimedia, MM '20, pages 4318–4327. Association for Computing Machinery, 2020

  38. [38]

    High-quality deepfakes have a heart!

    Clemens Seibold, Eric L. Wisotzky, Arian Beckmann, Benjamin Kossack, Anna Hilsmann, and Peter Eisert. High-quality deepfakes have a heart! Frontiers in Imaging, 4, 2025

  39. [39]

    Adversarial deepfakes: Evaluating vulnerability of deepfake detectors to adversarial examples

    Shehzeen Hussain, Paarth Neekhara, Malhar Jere, Farinaz Koushanfar, and Julian McAuley. Adversarial deepfakes: Evaluating vulnerability of deepfake detectors to adversarial examples. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV), pages 3348–3357, 2021

  40. [40]

    Impact of video processing operations in deepfake detection

    Yuhang Lu and Touradj Ebrahimi. Impact of video processing operations in deepfake detection. 2023 24th International Conference on Digital Signal Processing (DSP), pages 1–5, 2023

  41. [41]

    Influence: The Psychology of Persuasion

    R. B. Cialdini. Influence: The Psychology of Persuasion. Collins Business Essentials. HarperCollins, 2009

  42. [42]

    How to Do Things with Words

    John Langshaw Austin. How to Do Things with Words. Oxford University Press, Oxford, 1962

  43. [43]

    Assessing the potential of LLM-assisted annotation for corpus-based pragmatics and discourse analysis: The case of apology

    Danni Yu, Luyang Li, Hang Su, and Matteo Fuoli. Assessing the potential of LLM-assisted annotation for corpus-based pragmatics and discourse analysis: The case of apology. International Journal of Corpus Linguistics, 29, 06 2024

  44. [44]

    An analysis of social engineering principles in effective phishing

    Ana Ferreira and Gabriele Lenzini. An analysis of social engineering principles in effective phishing. In Proceedings of the 2015 Workshop on Socio-Technical Aspects in Security and Trust, STAST '15, pages 9–16. IEEE, 2015

  45. [45]

    Algorithmic detection of misinformation and disinformation: Gricean perspectives

    Sille Obelitz Søe. Algorithmic detection of misinformation and disinformation: Gricean perspectives. Journal of Documentation, 74(2):309–332, 2017

  46. [46]

    Speech acts in social media fraud: Manipulative communication strategies on WhatsApp and Facebook

    Nur Lailiyah, Galuh Areni, Favorita Kurwidaria, Setyo Cahyono, Monika Surtikanti, and Farida Wijayanti. Speech acts in social media fraud: Manipulative communication strategies on WhatsApp and Facebook. Language Circle: Journal of Language and Literature, 20:16–28, 10 2025

  47. [47]

    Fine-grained analysis of propaganda in news articles

    Giovanni Da San Martino, Seunghak Yu, Alberto Barrón-Cedeño, Rostislav Petrov, and Preslav Nakov. Fine-grained analysis of propaganda in news articles. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 5636–5646. ACL, 2019

  48. [48]

    Deepfake influence tactics through the lens of Cialdini's principles: Case studies and the deep frame tool proposal

    Pawel Zegarow and Ewelina Bartuzi. Deepfake influence tactics through the lens of Cialdini's principles: Case studies and the deep frame tool proposal. Applied Cybersecurity & Internet Governance, 2024

  49. [49]

    Detecting deepfakes and false ads through analysis of text and social engineering techniques

    Alicja Martinek and Ewelina Bartuzi-Trokielewicz. Detecting deepfakes and false ads through analysis of text and social engineering techniques. In Proceedings of the 31st International Conference on Computational Linguistics, pages 8432–8448. ACL, 2025

  50. [50]

    Vishing: Detecting social engineering in spoken communication — a first survey & urgent roadmap to address an emerging societal challenge

    Andreas Triantafyllopoulos, Anika A. Spiesberger, Iosif Tsangko, Xin Jing, Verena Distler, Felix Dietz, Florian Alt, and Björn W. Schuller. Vishing: Detecting social engineering in spoken communication — a first survey & urgent roadmap to address an emerging societal challenge. Comput. Speech Lang., 94(C), 2025

  51. [51]

    SoK: the good, the bad, and the unbalanced: measuring structural limitations of deepfake media datasets

    Seth Layton, Tyler Tucker, Daniel Olszewski, Kevin Warren, Kevin Butler, and Patrick Traynor. SoK: the good, the bad, and the unbalanced: measuring structural limitations of deepfake media datasets. In Proceedings of the 33rd USENIX Conference on Security Symposium, SEC '24, USA, 2024. USENIX Association

  52. [52]

    An analysis of recent advances in deepfake image detection in an evolving threat landscape

    Sifat Muhammad Abdullah, Aravind Cheruvu, Shravya Kanchi, Taejoong Chung, Peng Gao, Murtuza Jadliwala, and Bimal Viswanath. An analysis of recent advances in deepfake image detection in an evolving threat landscape. In 2024 IEEE Symposium on Security and Privacy (SP), pages 91–109. IEEE, 2024

  53. [53]

    Unlocking the capabilities of large vision-language models for generalizable and explainable deepfake detection

    Peipeng Yu, Jianwei Fei, Hui Gao, Xuan Feng, Zhihua Xia, and Chip Hong Chang. Unlocking the capabilities of large vision-language models for generalizable and explainable deepfake detection. InProceedings of the 42nd International Conference on Machine Learning, volume 267, pages 72925–72943. PMLR, 2025

  54. [54]

    Diffusionfake: enhancing generalization in deepfake detection via guided stable diffusion

    Ke Sun, Shen Chen, Taiping Yao, Hong Liu, Xiaoshuai Sun, Shouhong Ding, and Rongrong Ji. Diffusionfake: enhancing generalization in deepfake detection via guided stable diffusion. In Proceedings of the 38th International Conference on Neural Information Processing Systems, pages 101474–101497. Curran Associates Inc., 2024

  55. [55]

    Few-shot learner generalizes across ai-generated image detection

    Shiyu Wu, Jing Liu, Jing Li, and Yequan Wang. Few-shot learner generalizes across ai-generated image detection. InProceedings of the 42nd International Conference on Machine Learning (ICML’25). JMLR.org, 2025

  56. [56]

    Seeing is believing: Exploring perceptual differences in deepfake videos

    Rashid Tahir, Brishna Batool, Hira Jamshed, Mahnoor Jameel, Mubashir Anwar, Faizan Ahmed, Muhammad Adeel Zaffar, and Muhammad Fareed Zaffar. Seeing is believing: Exploring perceptual differences in deepfake videos. InProceedings of the 2021 CHI Conference on Human Factors in Computing Systems, pages 1–16. ACM, 2021

  57. [57]

    MacDorman, Martin Teufel, and Alexander Bäuerle

    Alexander Diel, Tania Lalgi, Isabel Carolin Schröter, Karl F. MacDorman, Martin Teufel, and Alexander Bäuerle. Human performance in detecting deepfakes: A systematic review and meta-analysis of 56 papers.Computers in Human Behavior Reports, 16:100538, 2024

  58. [58]

    Miller, and Mary Holmes

    Klaire Somoray, Dan J. Miller, and Mary Holmes. Human performance in deepfake detection: A systematic review.Human Behavior and Emerging Technologies, 2025(1):1833228, 2025

  59. [59]

    Leibowicz, Sean McGregor, and Aviv Ovadya

    Claire R. Leibowicz, Sean McGregor, and Aviv Ovadya. The deepfake detection dilemma: A multistakeholder exploration of adversarial dynamics in synthetic media. InProceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society, page 736–744. ACM, 2021

  60. [60]

    Spang, and Sebastian Möller

    Vera Schmitt, Luis-Felipe Villa-Arenas, Nils Feldhus, Joachim Meyer, Robert P. Spang, and Sebastian Möller. The role of explainability in collaborative human-ai disinformation detection. InProceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency, page 2157–2174. ACM, 2024

  61. [61]

    Deepfake the menace: mitigating the negative impacts of ai-generated content

    Siwei Lyu. Deepfake the menace: mitigating the negative impacts of ai-generated content. Organizational Cybersecurity Journal: Practice, Process and People, 4:1–18, 06 2024

  62. [62]

    Deepfacelab: Integrated, flexible and extensible face-swapping framework.Pattern Recogn., 141(C), 2023

    Kunlin Liu, Ivan Perov, Daiheng Gao, Nikolay Chervoniy, Wenbo Zhou, and Weiming Zhang. Deepfacelab: Integrated, flexible and extensible face-swapping framework.Pattern Recogn., 141(C), 2023

  63. [63]

    Simswap: An efficient framework for high fidelity face swapping

    Renwang Chen, Xuanhong Chen, Bingbing Ni, and Yanhao Ge. Simswap: An efficient framework for high fidelity face swapping. InProceedings of the 28th ACM International Conference on Multimedia (MM’20), pages 2003–2011. ACM, 2020

  64. [64]

    Advancing high fidelity identity swapping for forgery detection

    Lingzhi Li, Jianmin Bao, Hao Yang, Dong Chen, and Fang Wen. Advancing high fidelity identity swapping for forgery detection. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 5073–5082, 2020. 13

  65. [65]

    Watch your up-convolution: CNN based generative deep neural networks are failing to reproduce spectral distributions

    Ricard Durall, Margret Keuper, and Janis Keuper. Watch your up-convolution: CNN based generative deep neural networks are failing to reproduce spectral distributions. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 7890–7899. IEEE, 2020

  66. [66]

    Xception: Deep learning with depthwise separable convolutions

    François Chollet. Xception: Deep learning with depthwise separable convolutions. InProceed- ings of the IEEE conference on computer vision and pattern recognition, 2017

  67. [67]

    Efficientnet

    Brett Koonce. Efficientnet. InConvolutional neural networks with swift for Tensorflow, pages 109–123. Springer, 2021

  68. [68]

    Denoising diffusion probabilistic models

    Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. InPro- ceedings of the 34th Conference on Neural Information Processing Systems. Curran Associates Inc., 2020

  69. [69]

    Sora: Creating video from text

    OpenAI. Sora: Creating video from text. Technical report, OpenAI, 2024. Updated April 2026

  70. [70]

    HunyuanVideo: A Systematic Framework For Large Video Generative Models

    Weijie Kong, Qi Tian, Zijian Zhang, Rox Min, Zuozhuo Dai, Jin Zhou, Jia-Liang Xiong, Xin Li, Bo Wu, Jianwei Zhang, et al. Hunyuanvideo: A systematic framework for large video generative models.arXiv preprint arXiv:2412.03603, 2024

  71. [71]

    Efficient region-aware neural radiance fields for high-fidelity talking portrait synthesis

    Jiahe Li, Jiawei Zhang, Xiao Bai, Jun Zhou, and Lin Gu. Efficient region-aware neural radiance fields for high-fidelity talking portrait synthesis. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 7568–7578. IEEE, 2023

  72. [72]

    3d gaussian splatting for real-time radiance field rendering.ACM Transactions on Graphics, 42(4):139–1, 2023

    Bernhard Kerbl, Georgios Kopanas, Thomas Leimkuehler, and George Drettakis. 3d gaussian splatting for real-time radiance field rendering.ACM Transactions on Graphics, 42(4):139–1, 2023

  73. [73]

    Hallo2: Long-duration and high-resolution audio-driven portrait image animation.International Conference on Learning Representations (ICLR), 2025

    Jiahao Cui, Hui Li, Yao Yao, Hao Zhu, Hanlin Shang, Kaihui Cheng, Hang Zhou, Siyu Zhu, and Jingdong Wang. Hallo2: Long-duration and high-resolution audio-driven portrait image animation.International Conference on Learning Representations (ICLR), 2025

  74. [74]

    Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen

    Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA: Low-rank adaptation of large language models. In International Conference on Learning Representations (ICLR), 2022

  75. [75]

    DreamBooth: Fine tuning text-to-image diffusion models for subject-driven generation

    Nataniel Ruiz, Yuanzhen Li, Varun Jampani, Yael Pritch, Michael Rubinstein, and Kfir Aberman. DreamBooth: Fine tuning text-to-image diffusion models for subject-driven generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 22500–22510, June 2023

  76. [76]

    IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models

A Appendix

A.1 Core Components of Social-Theoretic Frameworks

Table A.1: Core components of the three social-theoretic frameworks, their analytical level, and key conditions requi...