Recognition: 3 theorem links
· Lean Theorem
Detecting Deepfakes via Hamiltonian Dynamics
Pith reviewed 2026-05-08 17:58 UTC · model grok-4.3
The pith
Deepfakes can be distinguished from real images by releasing their latent states under Hamiltonian dynamics and measuring the greater trajectory instability they exhibit.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that Hamiltonian Action Anomaly Detection (HAAD) identifies deepfakes by treating the image latent manifold as a potential energy surface and probing it with Hamiltonian-inspired dynamics. Real images induce basin-like low-energy responses that keep trajectories bounded, whereas deepfakes produce high-potential gradients and larger trajectory excursions. These behaviors are summarized by two statistics—Hamiltonian action and energy dissipation—yielding a detector that outperforms baselines on challenging transfer settings.
What carries the argument
Hamiltonian dynamics applied to the latent manifold, modeled as a potential energy surface: a stability probe that releases latent states from rest, tracks their trajectory responses, and quantifies them via action and dissipation statistics.
If this is right
- Detectors can generalize across unseen generative models without periodic retraining on new artifacts.
- A stability prior supplements statistical pattern matching in digital forensics applications.
- The approach reduces reliance on dataset-specific calibration as generative techniques evolve.
- It provides a concrete way to test whether generative models fail to enforce geometric smoothness constraints present in natural images.
Where Pith is reading between the lines
- Improving generators to better approximate physical low-energy equilibria might reduce their detectability under this framework.
- The same dynamics probe could be tested on other synthetic media such as audio or video by extending the latent-space simulation.
- It links detection performance to the mismatch between statistical optimization and physical dissipation in generative training.
Load-bearing premise
Natural images settle near stable low-energy equilibria on the latent manifold while deepfakes occupy unstable high-energy states.
What would settle it
A direct counterexample would be deepfake images that produce bounded trajectories and low action values indistinguishable from those of real images under the same Hamiltonian simulation.
read the original abstract
Driven by the rapid development of generative AI models, deepfake detectors are compelled to undergo periodic recalibration to capture newly developed synthetic artifacts. To break this cycle, we propose a new perspective on deepfake detection: moving from static pattern recognition to dynamical stability analysis. Specifically, our approach is motivated by physics-inspired priors: we hypothesize that natural images, as products of dissipative physical processes, tend to settle near stable, low-energy equilibria. In contrast, generative models optimize for statistical similarity to real images but do not explicitly enforce structural constraints such as geometric smoothness, leaving deepfakes more likely to occupy unstable, high-energy states. To operationalize this, we introduce Hamiltonian Action Anomaly Detection (HAAD), comprising three contributions: i) We model the image latent manifold as a potential energy surface. Under this hypothesis, real images are expected to produce basin-like low-energy responses, whereas fake images are more likely to induce high-potential, high-gradient responses. ii) We employ Hamiltonian-inspired dynamics as a stability probe. By releasing latent states from rest, samples near stable regions remain bounded, while high-gradient samples produce larger trajectory responses. iii) We quantify these dynamic behaviors through two trajectory statistics, i.e., Hamiltonian action and energy dissipation. Extensive experiments show that HAAD outperforms evaluated state-of-the-art baselines on challenging cross-dataset transfer benchmarks, supporting a physics-inspired stability prior for digital forensics.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes Hamiltonian Action Anomaly Detection (HAAD) as a physics-inspired approach to deepfake detection. It models the image latent manifold as a potential energy surface, hypothesizing that natural images (from dissipative processes) occupy stable low-energy equilibria while deepfakes occupy unstable high-energy states. Hamiltonian dynamics are simulated by releasing latent states from rest to probe stability, with classification based on two trajectory statistics: Hamiltonian action and energy dissipation. The paper reports that HAAD outperforms state-of-the-art baselines on challenging cross-dataset transfer benchmarks.
Significance. If the central results hold after addressing validation gaps, the work offers a distinctive dynamical-stability prior for digital forensics that could improve generalization across generative models by moving beyond static artifact detection. The explicit use of Hamiltonian trajectories and derived statistics provides a falsifiable, physics-grounded framework that may inspire similar priors in other anomaly-detection domains.
major comments (3)
- [Abstract / §3] Abstract and §3 (Method): The claim that trajectory statistics serve as an independent probe of the stability prior is not supported by any direct, task-independent evidence. No plots, tables, or statistics are shown comparing potential-energy values, gradient magnitudes, or basin properties of real versus fake images on the constructed energy surface before classification; performance gains on cross-dataset benchmarks could therefore arise from generic anomaly features rather than confirming the hypothesized low-energy equilibria for natural images.
- [§4] §4 (Experiments): The cross-dataset transfer results are presented as supporting the physics-inspired prior, yet no ablation isolates whether the Hamiltonian action and dissipation statistics actually reflect energy-landscape differences or simply function as learned features. A control experiment (e.g., replacing the dynamics with random trajectories or a non-physics baseline while keeping the same statistics) is required to establish that the stability prior is load-bearing.
- [§2 / §3.1] §2 (Hypothesis) and §3.1 (Potential Energy Surface): The definition of the potential energy surface on the latent manifold is described at a high level but lacks an explicit functional form or derivation showing it is independent of the downstream classifier. Without this, it remains unclear whether the surface is constructed in a parameter-free manner or whether the reported separation is tautological with the fitting procedure.
minor comments (2)
- [Abstract] Abstract: The acronym HAAD is expanded once but subsequent references should consistently use the full name or acronym to avoid ambiguity for readers unfamiliar with the method.
- [Figures / §4] Figure captions and §4: Ensure all trajectory visualizations include axis labels, units, and clear real/fake color coding so that the bounded versus divergent behaviors can be directly inspected.
Simulated Author's Rebuttal
We thank the referee for the thorough and insightful review. The comments highlight important aspects that will help clarify the contributions of our work. We address each major comment point by point below, proposing specific revisions where appropriate.
read point-by-point responses
-
Referee: [Abstract / §3] Abstract and §3 (Method): The claim that trajectory statistics serve as an independent probe of the stability prior is not supported by any direct, task-independent evidence. No plots, tables, or statistics are shown comparing potential-energy values, gradient magnitudes, or basin properties of real versus fake images on the constructed energy surface before classification; performance gains on cross-dataset benchmarks could therefore arise from generic anomaly features rather than confirming the hypothesized low-energy equilibria for natural images.
Authors: We agree that providing direct, task-independent evidence of the energy landscape properties would better support our claims. In the revised manuscript, we will add visualizations (e.g., histograms and scatter plots) and quantitative statistics comparing potential energy values, gradient magnitudes, and basin properties between real and fake images on the latent manifold. This will demonstrate the separation prior to the application of Hamiltonian dynamics and classification.
revision: yes
-
Referee: [§4] §4 (Experiments): The cross-dataset transfer results are presented as supporting the physics-inspired prior, yet no ablation isolates whether the Hamiltonian action and dissipation statistics actually reflect energy-landscape differences or simply function as learned features. A control experiment (e.g., replacing the dynamics with random trajectories or a non-physics baseline while keeping the same statistics) is required to establish that the stability prior is load-bearing.
Authors: We acknowledge this valid point regarding the need to isolate the contribution of the physics-inspired dynamics. We will include an ablation study in the revised experiments section, where we compare our Hamiltonian-based statistics against controls such as random trajectories or non-dynamical feature extraction while using the same downstream classifier. This will help confirm that the performance improvements stem from the stability analysis rather than generic anomaly detection features.
revision: yes
-
Referee: [§2 / §3.1] §2 (Hypothesis) and §3.1 (Potential Energy Surface): The definition of the potential energy surface on the latent manifold is described at a high level but lacks an explicit functional form or derivation showing it is independent of the downstream classifier. Without this, it remains unclear whether the surface is constructed in a parameter-free manner or whether the reported separation is tautological with the fitting procedure.
Authors: The potential energy surface is constructed from the geometry of the latent manifold using a fixed, pre-defined function based on local curvature and deviation from equilibrium, which does not depend on the classifier parameters. We will expand §3.1 in the revision to include the explicit mathematical definition and a derivation demonstrating its independence from the downstream classification task, ensuring it is parameter-free with respect to the detector.
revision: yes
Circularity Check
No significant circularity; derivation is self-contained hypothesis plus independent validation
full rationale
The paper states a physics-motivated hypothesis (natural images near low-energy equilibria, deepfakes in high-energy states), operationalizes it by modeling the latent manifold as a potential surface and applying Hamiltonian dynamics to generate trajectory statistics (action and dissipation), then reports cross-dataset detection performance. No equations reduce the statistics to the hypothesis by construction, no parameters are fitted on a subset and relabeled as predictions, and no load-bearing self-citations or uniqueness theorems appear. The experiments constitute external validation on held-out benchmarks rather than tautological confirmation. The approach therefore remains non-circular under the stated criteria.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Natural images are products of dissipative physical processes and therefore tend to occupy stable, low-energy equilibria on the latent manifold.
invented entities (1)
-
Hamiltonian Action Anomaly Detection (HAAD)
no independent evidence
Lean theorems connected to this paper
-
IndisputableMonolith.Cost (Jcost, CostAlpha) and Foundation.AlphaCoordinateFixation · cost_alpha_one_eq_jcost / J_uniquely_calibrated_via_higher_derivative · unclear
unclear: Relation between the paper passage and the cited Recognition theorem.
H(q,p) = T(p) + V(q) = ½‖p‖² + V(q); dq/dt = p, dp/dt = −∇V(q). V(q) = λ_geo V_geo(q) + λ_photo V_photo(n,ρ,l), with V_geo a Dirichlet/graph-Laplacian energy and V_photo a Lambertian shading variance; potential is learned end-to-end.
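The dynamics quoted in this entry can be exercised on a toy system. The sketch below is illustrative only, not the paper's implementation: the quadratic potential `V`, the step size `dt`, and the step count are stand-ins for the learned potential V = λ_geo V_geo + λ_photo V_photo and the tuned hyperparameters the review describes. It releases a state from rest under H(q, p) = ½‖p‖² + V(q), integrates with symplectic Euler, and accumulates the two trajectory statistics named here: a discrete action integral and the net energy change.

```python
def grad(V, q, h=1e-5):
    """Central-difference gradient of a scalar potential V at point q."""
    g = []
    for i in range(len(q)):
        qp = q[:]; qp[i] += h
        qm = q[:]; qm[i] -= h
        g.append((V(qp) - V(qm)) / (2 * h))
    return g

def haad_statistics(V, q0, dt=0.004, n_steps=100):
    """Release q0 from rest (p = 0) and integrate Hamilton's equations
    dq/dt = p, dp/dt = -grad V with symplectic Euler. Returns an
    action-like integral, sum((kinetic - potential) * dt), and the net
    energy change H(start) - H(end) as a dissipation-style statistic."""
    q = list(q0)
    p = [0.0] * len(q0)
    H0 = 0.5 * sum(pi * pi for pi in p) + V(q)
    action = 0.0
    for _ in range(n_steps):
        g = grad(V, q)
        p = [pi - dt * gi for pi, gi in zip(p, g)]   # momentum kick
        q = [qi + dt * pi for qi, pi in zip(q, p)]   # position drift
        kinetic = 0.5 * sum(pi * pi for pi in p)
        action += (kinetic - V(q)) * dt
    H1 = 0.5 * sum(pi * pi for pi in p) + V(q)
    return action, H0 - H1

# Toy potential: quadratic basin with its minimum at the origin.
V = lambda q: 0.5 * sum(qi * qi for qi in q)
a_real, _ = haad_statistics(V, [0.0] * 4)   # "real": rests at equilibrium
a_fake, _ = haad_statistics(V, [3.0] * 4)   # "fake": high-potential slope
```

On this toy basin a state at the minimum yields near-zero action, while a state on the slope accumulates a much larger action magnitude: the kind of separation HAAD's trajectory statistics are meant to capture.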
-
IndisputableMonolith.Foundation.AlphaDerivationExplicit · alphaProvenanceCert (parameter-free derivation contrast) · unclear
unclear: Relation between the paper passage and the cited Recognition theorem.
The framework is operationalized via a learned diagonal mass matrix M(q), step size η, evolution length T=4, regularization weights λ, λ_geo, λ_photo, and a hinge margin γ — all tuned hyperparameters with no closed-form derivation.
-
IndisputableMonolith.Foundation.ArithmeticFromLogic and Constants (phi) · embed_strictMono_of_one_lt / phi_golden_ratio · unclear
unclear: Relation between the paper passage and the cited Recognition theorem.
Symplectic Euler is shown to have det(J)=1 (volume preservation); no use of golden ratio, ratio symmetry x↔1/x, or 8-tick periodicity anywhere.
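The det(J) = 1 observation noted here can be checked directly: one symplectic Euler step (momentum kick with −dV/dq, then position drift with the updated momentum) has a Jacobian with unit determinant for any potential. A minimal self-contained check, with a quadratic potential chosen only for illustration:

```python
def sym_euler_step(q, p, dVdq, dt=0.1):
    """One symplectic Euler step for a 1-D Hamiltonian system:
    kick the momentum first, then drift with the new momentum."""
    p1 = p - dt * dVdq(q)
    q1 = q + dt * p1
    return q1, p1

def jacobian_det(q, p, dVdq, dt=0.1, h=1e-6):
    """Finite-difference Jacobian determinant of the one-step map."""
    qa, pa = sym_euler_step(q + h, p, dVdq, dt)
    qb, pb = sym_euler_step(q - h, p, dVdq, dt)
    qc, pc = sym_euler_step(q, p + h, dVdq, dt)
    qd, pd = sym_euler_step(q, p - h, dVdq, dt)
    dq_dq = (qa - qb) / (2 * h); dq_dp = (qc - qd) / (2 * h)
    dp_dq = (pa - pb) / (2 * h); dp_dp = (pc - pd) / (2 * h)
    return dq_dq * dp_dp - dq_dp * dp_dq

dVdq = lambda q: 2.0 * q          # gradient of V(q) = q**2
det = jacobian_det(0.7, -0.3, dVdq)
```

For this quadratic case the map is linear, so the finite-difference determinant matches the analytic value 1 to rounding error; the phase-space volume preservation holds for any smooth V by the same kick-then-drift structure.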
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Generative adversarial nets,
I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. C. Courville, and Y. Bengio, “Generative adversarial nets,” in NIPS, 2014, pp. 2672–2680
2014
-
[2]
Scaling rectified flow transformers for high-resolution image synthesis,
P. Esser, S. Kulal, A. Blattmann, R. Entezari, J. Müller, H. Saini, Y. Levi, D. Lorenz, A. Sauer, F. Boesel, D. Podell, T. Dockhorn, Z. English, and R. Rombach, “Scaling rectified flow transformers for high-resolution image synthesis,” in ICML, 2024, pp. 1–13
2024
-
[3]
Face2face: Real-time face capture and reenactment of RGB videos,
J. Thies, M. Zollhöfer, M. Stamminger, C. Theobalt, and M. Nießner, “Face2face: Real-time face capture and reenactment of RGB videos,” in CVPR, 2016, pp. 2387–2395
2016
-
[4]
Can we leave deepfake data behind in training deepfake detector?
J. Cheng, Z. Yan, Y. Zhang, Y. Luo, Z. Wang, and C. Li, “Can we leave deepfake data behind in training deepfake detector?” in NeurIPS, 2024, pp. 21979–21998
2024
-
[5]
Contrastive learning for deepfake classification and localization via multi-label ranking,
C. Hong, Y. Hsu, and T. Liu, “Contrastive learning for deepfake classification and localization via multi-label ranking,” in CVPR, 2024, pp. 17627–17637
2024
-
[6]
Fair deepfake detectors can generalize,
H. Cheng, M.-H. Liu, Y. Guo, T. Wang, L. Nie, and M. Kankanhalli, “Fair deepfake detectors can generalize,” in NeurIPS, 2025
2025
-
[7]
From specificity to generality: Revisiting generalizable artifacts in detecting face deepfakes,
L. Ma, Z. Yan, J. Xu, Y. Chen, Q. Guo, Z. Bi, Y. Liao, and H. Lin, “From specificity to generality: Revisiting generalizable artifacts in detecting face deepfakes,” in NeurIPS, 2025
2025
-
[8]
Mmnet: multi-collaboration and multi-supervision network for sequential deepfake detection,
R. Xia, D. Liu, J. Li, L. Yuan, N. Wang, and X. Gao, “Mmnet: multi-collaboration and multi-supervision network for sequential deepfake detection,” IEEE TIFS, vol. 19, pp. 3409–3422, 2024
2024
-
[9]
Detecting deepfakes with self-blended images,
K. Shiohara and T. Yamasaki, “Detecting deepfakes with self-blended images,” in CVPR, 2022, pp. 18699–18708
2022
-
[10]
Rethinking the up-sampling operations in cnn-based generative network for generalizable deepfake detection,
C. Tan, H. Liu, Y. Zhao, S. Wei, G. Gu, P. Liu, and Y. Wei, “Rethinking the up-sampling operations in cnn-based generative network for generalizable deepfake detection,” in CVPR, 2024, pp. 28130–28139
2024
-
[11]
Towards more general video-based deepfake detection through facial component guided adaptation for foundation model,
Y.-H. Han, T.-M. Huang, K.-L. Hua, and J.-C. Chen, “Towards more general video-based deepfake detection through facial component guided adaptation for foundation model,” in CVPR, 2025, pp. 22995–23005
2025
-
[12]
Diffusion facial forgery detection,
H. Cheng, Y. Guo, T. Wang, L. Nie, and M. Kankanhalli, “Diffusion facial forgery detection,” in ACM MM, 2024, pp. 5939–5948
2024
-
[13]
Celeb-DF++: A large-scale challenging video DeepFake benchmark for generalizable forensics,
Y. Li, D. Zhu, X. Cui, and S. Lyu, “Celeb-DF++: A large-scale challenging video DeepFake benchmark for generalizable forensics,” arXiv preprint arXiv:2507.18015, 2025
2025
-
[14]
Deep Lagrangian Networks: Using physics as model prior for deep learning,
M. Lutter, C. Ritter, and J. Peters, “Deep Lagrangian Networks: Using physics as model prior for deep learning,” arXiv preprint arXiv:1907.04490, 2019
2019
-
[15]
Lagrangian neural networks,
M. Cranmer, S. Greydanus, S. Hoyer, P. Battaglia, D. Spergel, and S. Ho, “Lagrangian neural networks,” arXiv preprint arXiv:2003.04630, 2020
2020
-
[16]
Celeb-DF: A large-scale challenging dataset for DeepFake forensics,
Y. Li, X. Yang, P. Sun, H. Qi, and S. Lyu, “Celeb-DF: A large-scale challenging dataset for DeepFake forensics,” in CVPR, 2020, pp. 3204–3213
2020
-
[17]
The DeepFake Detection Challenge (DFDC) Dataset
B. Dolhansky, J. Bitton, B. Pflaum, J. Lu, R. Howes, M. Wang, and C. C. Ferrer, “The deepfake detection challenge (DFDC) dataset,” arXiv preprint arXiv:2006.07397, 2020
2020
-
[18]
Contributing Data to Deepfake Detection Research,
Google AI Blog, “Contributing Data to Deepfake Detection Research,” 2019. [Online]. Available: https://research.google/blog/contributing-data-to-deepfake-detection-research/
2019
-
[19]
DeepfakeBench: A comprehensive benchmark of deepfake detection,
Z. Yan, Y. Zhang, X. Yuan, S. Lyu, and B. Wu, “DeepfakeBench: A comprehensive benchmark of deepfake detection,” in NeurIPS, 2023, pp. 4534–4565
2023
-
[20]
Diffswap: High-fidelity and controllable face swapping via 3d-aware masked diffusion,
W. Zhao, Y. Rao, W. Shi, Z. Liu, J. Zhou, and J. Lu, “Diffswap: High-fidelity and controllable face swapping via 3d-aware masked diffusion,” in CVPR, 2023, pp. 8568–8577
2023
-
[21]
GenImage: A million-scale benchmark for detecting AI-generated image,
M. Zhu, H. Chen, Q. Yan, X. Huang, G. Lin, W. Li, Z. Tu, H. Hu, J. Hu, and Y. Wang, “GenImage: A million-scale benchmark for detecting AI-generated image,” in NeurIPS, 2023, pp. 77771–77782
2023
-
[22]
DIRE for diffusion-generated image detection,
Z. Wang, J. Bao, W. Zhou, W. Wang, H. Hu, H. Chen, and H. Li, “DIRE for diffusion-generated image detection,” in ICCV, 2023, pp. 22445–22455
2023
-
[23]
Advancing generalized deepfake detector with forgery perception guidance,
R. Xia, D. Zhou, D. Liu, L. Yuan, S. Wang, J. Li, N. Wang, and X. Gao, “Advancing generalized deepfake detector with forgery perception guidance,” in ACM MM, 2024, pp. 6676–6685
2024
-
[24]
Learning real facial concepts for independent deepfake detection,
M.-H. Liu, H. Cheng, T. Wang, X. Luo, and X.-S. Xu, “Learning real facial concepts for independent deepfake detection,” in IJCAI, 2025, pp. 1585–1593
2025
-
[25]
Exploring frequency adversarial attacks for face forgery detection,
S. Jia, C. Ma, T. Yao, B. Yin, S. Ding, and X. Yang, “Exploring frequency adversarial attacks for face forgery detection,” in CVPR, 2022, pp. 4093–4102
2022
-
[26]
Sstnet: Detecting manipulated faces through spatial, steganalysis and temporal features,
X. Wu, Z. Xie, Y. Gao, and Y. Xiao, “Sstnet: Detecting manipulated faces through spatial, steganalysis and temporal features,” in ICASSP, 2020, pp. 2952–2956
2020
-
[27]
Face x-ray for more general face forgery detection,
L. Li, J. Bao, T. Zhang, H. Yang, D. Chen, F. Wen, and B. Guo, “Face x-ray for more general face forgery detection,” in CVPR, 2020, pp. 5000–5009
2020
-
[28]
End-to-end reconstruction-classification learning for face forgery detection,
J. Cao, C. Ma, T. Yao, S. Chen, S. Ding, and X. Yang, “End-to-end reconstruction-classification learning for face forgery detection,” in CVPR, 2022, pp. 4103–4112
2022
-
[29]
Generalizable synthetic image detection via language-guided contrastive learning,
H. Wu, J. Zhou, and S. Zhang, “Generalizable synthetic image detection via language-guided contrastive learning,” arXiv preprint arXiv:2305.13800, 2023
-
[30]
Forgery-aware adaptive transformer for generalizable synthetic image detection,
H. Liu, Z. Tan, C. Tan, Y. Wei, J. Wang, and Y. Zhao, “Forgery-aware adaptive transformer for generalizable synthetic image detection,” in CVPR, 2024, pp. 10770–10780
2024
-
[31]
Towards universal fake image detectors that generalize across generative models,
U. Ojha, Y. Li, and Y. J. Lee, “Towards universal fake image detectors that generalize across generative models,” in CVPR, 2023, pp. 24480–24489
2023
-
[32]
Towards general visual-linguistic face forgery detection,
K. Sun, S. Chen, T. Yao, Z. Zhou, J. Ji, X. Sun, C.-W. Lin, and R. Ji, “Towards general visual-linguistic face forgery detection,” in CVPR, 2025, pp. 19576–19586
2025
-
[33]
Learning transferable visual models from natural language supervision,
A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark et al., “Learning transferable visual models from natural language supervision,” in ICML, 2021, pp. 8748–8763
2021
-
[34]
Orthogonal subspace decomposition for generalizable ai-generated image detection,
Z. Yan, J. Wang, P. Jin, K.-Y. Zhang, C. Liu, S. Chen, T. Yao, S. Ding, B. Wu, and L. Yuan, “Orthogonal subspace decomposition for generalizable ai-generated image detection,” in ICML, 2025, pp. 1–13
2025
-
[35]
FAMM: Facial muscle motions for detecting compressed deepfake videos over social networks,
X. Liao, Y. Wang, T. Wang, K. P. Chow, and S. Lyu, “FAMM: Facial muscle motions for detecting compressed deepfake videos over social networks,” IEEE TCSVT, vol. 33, no. 12, pp. 7236–7251, 2023
2023
-
[36]
Physg: Inverse rendering with spherical gaussians for physics-based material editing and relighting,
K. Zhang, F. Luan, Q. Wang, K. Bala, and N. Snavely, “Physg: Inverse rendering with spherical gaussians for physics-based material editing and relighting,” in CVPR, 2021, pp. 5453–5462
2021
-
[37]
Inverse rendering for complex indoor scenes: Shape, spatially-varying lighting and svbrdf from a single image,
Z. Li, M. Shafiei, R. Ramamoorthi, K. Sunkavalli, and M. Chandraker, “Inverse rendering for complex indoor scenes: Shape, spatially-varying lighting and svbrdf from a single image,” in CVPR, 2020, pp. 2475–2484
2020
-
[38]
Energy-based out-of-distribution detection,
W. Liu, X. Wang, J. Owens, and Y. Li, “Energy-based out-of-distribution detection,” in NeurIPS, 2020, pp. 21464–21475
2020
-
[39]
Hamiltonian neural networks,
S. Greydanus, M. Dzamba, and J. Yosinski, “Hamiltonian neural networks,” in NeurIPS, 2019, pp. 15353–15363
2019
-
[40]
Your classifier is secretly an energy based model and you should treat it like one,
D. Duvenaud, J. Wang, J. Jacobsen, K. Swersky, M. Norouzi, and W. Grathwohl, “Your classifier is secretly an energy based model and you should treat it like one,” in ICLR, 2020
2020
-
[41]
Symplectic ODE-Net: Learning Hamiltonian dynamics with control,
Y. D. Zhong, B. Dey, and A. Chakraborty, “Symplectic ODE-Net: Learning Hamiltonian dynamics with control,” arXiv preprint arXiv:1909.12077, 2019
2019
-
[42]
Neural networks and physical systems with emergent collective computational abilities
J. J. Hopfield, “Neural networks and physical systems with emergent collective computational abilities,” PNAS, vol. 79, no. 8, pp. 2554–2558, 1982
1982
-
[43]
Thinking in frequency: Face forgery detection by mining frequency-aware clues,
Y. Qian, G. Yin, L. Sheng, Z. Chen, and J. Shao, “Thinking in frequency: Face forgery detection by mining frequency-aware clues,” in ECCV, 2020, pp. 86–103
2020
-
[44]
Spatial-phase shallow learning: Rethinking face forgery detection in frequency domain,
H. Liu, X. Li, W. Zhou, Y. Chen, Y. He, H. Xue, W. Zhang, and N. Yu, “Spatial-phase shallow learning: Rethinking face forgery detection in frequency domain,” in CVPR, 2021, pp. 772–781
2021
-
[45]
Generalizing face forgery detection with high-frequency features,
Y. Luo, Y. Zhang, J. Yan, and W. Liu, “Generalizing face forgery detection with high-frequency features,” in CVPR, 2021, pp. 16317–16326
2021
-
[46]
Core: Consistent representation learning for face forgery detection,
Y. Ni, D. Meng, C. Yu, C. Quan, D. Ren, and Y. Zhao, “Core: Consistent representation learning for face forgery detection,” in CVPRW, 2022, pp. 12–21
2022
-
[47]
UCF: uncovering common features for generalizable deepfake detection,
Z. Yan, Y. Zhang, Y. Fan, and B. Wu, “UCF: uncovering common features for generalizable deepfake detection,” in ICCV, 2023, pp. 22355–22366
2023
-
[48]
Implicit identity leakage: The stumbling block to improving deepfake detection generalization,
S. Dong, J. Wang, R. Ji, J. Liang, H. Fan, and Z. Ge, “Implicit identity leakage: The stumbling block to improving deepfake detection generalization,” in CVPR, 2023, pp. 3994–4004
2023
-
[49]
Transcending forgery specificity with latent space augmentation for generalizable deepfake detection,
Z. Yan, Y. Luo, S. Lyu, Q. Liu, and B. Wu, “Transcending forgery specificity with latent space augmentation for generalizable deepfake detection,” in CVPR, 2024, pp. 8984–8994
2024
-
[50]
X2-dfd: A framework for explainable and extendable deepfake detection,
Y. Chen, Z. Yan, G. Cheng, K. Zhao, S. Lyu, and B. Wu, “X2-dfd: A framework for explainable and extendable deepfake detection,” in NeurIPS, 2025
2025
-
[51]
FaceForensics++: Learning to detect manipulated facial images,
A. Rössler, D. Cozzolino, L. Verdoliva, C. Riess, J. Thies, and M. Nießner, “FaceForensics++: Learning to detect manipulated facial images,” in ICCV, 2019, pp. 1–11
2019
-
[52]
Deeperforensics-1.0: A large-scale dataset for real-world face forgery detection,
L. Jiang, R. Li, W. Wu, C. Qian, and C. C. Loy, “Deeperforensics-1.0: A large-scale dataset for real-world face forgery detection,” in CVPR, 2020, pp. 2889–2898
2020
-
[53]
Wilddeepfake: A challenging real-world dataset for deepfake detection,
B. Zi, M. Chang, J. Chen, X. Ma, and Y. Jiang, “Wilddeepfake: A challenging real-world dataset for deepfake detection,” in ACM MM, 2020, pp. 2382–2390
2020
-
[54]
Face forensics in the wild,
T. Zhou, W. Wang, Z. Liang, and J. Shen, “Face forensics in the wild,” in CVPR, 2021, pp. 5778–5788
2021
-
[55]
DF40: Toward next-generation deepfake detection,
Z. Yan, T. Yao, S. Chen, Y. Zhao, X. Fu, J. Zhu, D. Luo, C. Wang, S. Ding, Y. Wu et al., “DF40: Toward next-generation deepfake detection,” in NeurIPS, 2024, pp. 29387–29434
2024
-
[56]
CNN-generated images are surprisingly easy to spot... for now,
S.-Y. Wang, O. Wang, R. Zhang, A. Owens, and A. A. Efros, “CNN-generated images are surprisingly easy to spot... for now,” in CVPR, 2020, pp. 8695–8704
2020
-
[57]
Frequency-aware deepfake detection: Improving generalizability through frequency space domain learning,
C. Tan, Y. Zhao, S. Wei, G. Gu, P. Liu, and Y. Wei, “Frequency-aware deepfake detection: Improving generalizability through frequency space domain learning,” in AAAI, 2024, pp. 5052–5060
2024
-
[58]
Drct: Diffusion reconstruction contrastive training towards universal detection of diffusion generated images,
B. Chen, J. Zeng, J. Yang, and R. Yang, “Drct: Diffusion reconstruction contrastive training towards universal detection of diffusion generated images,” in ICML, 2024, pp. 7621–7639
2024
-
[59]
Dual data alignment makes AI-generated image detector easier generalizable,
R. Chen, J. Xi, Z. Yan, K.-Y. Zhang, S. Wu, J. Xie, X. Chen, L. Xu, I. Guan, T. Yao et al., “Dual data alignment makes AI-generated image detector easier generalizable,” in NeurIPS, 2025
2025
-
[60]
Deep residual learning for image recognition,
K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in CVPR, 2016, pp. 770–778
2016
-
[61]
Simulating Hamiltonian Dynamics,
B. Leimkuhler and S. Reich, Simulating Hamiltonian Dynamics. Cambridge University Press, 2004, no. 14
2004
discussion (0)