pith. machine review for the scientific record.

arxiv: 2605.04405 · v1 · submitted 2026-05-06 · 💻 cs.CV · cs.AI

Recognition: 3 theorem links · Lean Theorem

Detecting Deepfakes via Hamiltonian Dynamics

Authors on Pith: no claims yet

Pith reviewed 2026-05-08 17:58 UTC · model grok-4.3

classification 💻 cs.CV cs.AI
keywords deepfake detection · Hamiltonian dynamics · latent manifold · stability analysis · anomaly detection · physics-inspired prior · cross-dataset generalization

The pith

Deepfakes can be distinguished from real images by releasing their latent states from rest under Hamiltonian dynamics and measuring the larger trajectory instability they exhibit.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a shift in deepfake detection from static artifact recognition to dynamical stability analysis on the latent manifold. Its premise is that real images arise from dissipative physical processes and therefore rest near low-energy stable equilibria, whereas generative models produce outputs that sit in higher-energy, high-gradient regions. The method models the latent space as a potential energy surface and probes it with Hamiltonian dynamics, releasing points from rest and observing how their trajectories evolve. Stable real samples produce bounded motion with low action and dissipation, while deepfakes yield larger deviations, quantified by two trajectory statistics. Experiments on cross-dataset benchmarks show improved generalization, indicating that a physics-inspired stability prior can reduce the need for repeated retraining on new generators.

Core claim

The central claim is that Hamiltonian Action Anomaly Detection (HAAD) identifies deepfakes by treating the image latent manifold as a potential energy surface and probing it with Hamiltonian-inspired dynamics. Real images induce basin-like low-energy responses that keep trajectories bounded, whereas deepfakes produce high-potential gradients and larger trajectory excursions. These behaviors are summarized by two statistics—Hamiltonian action and energy dissipation—yielding a detector that outperforms baselines on challenging transfer settings.

What carries the argument

Hamiltonian dynamics applied to the latent manifold modeled as a potential energy surface, acting as a stability probe that tracks trajectory responses from rest states and quantifies them via action and dissipation statistics.
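The probe itself is simple enough to sketch. The following toy reimplementation is illustrative only, not the paper's code: the one-dimensional quadratic potential, the step size, and the exact statistic definitions are assumptions standing in for the learned latent potential and HAAD's actual action and dissipation formulas.

```python
def probe(q0, steps=200, eta=0.1, m_inv=1.0, k=1.0):
    """Release a latent state from rest and integrate Symplectic Euler
    on a toy quadratic potential; return (action, dissipation)."""
    V = lambda q: 0.5 * k * q * q          # stand-in potential surface
    q, p = q0, 0.0                         # released from rest: p0 = 0
    h0 = 0.5 * m_inv * p * p + V(q)        # initial Hamiltonian
    action = 0.0
    for _ in range(steps):
        p -= eta * k * q                   # momentum step: -grad V
        q += eta * m_inv * p               # position step (new momentum)
        ke = 0.5 * m_inv * p * p
        action += eta * abs(ke - V(q))     # accumulated |Lagrangian|
    dissipation = abs(0.5 * m_inv * p * p + V(q) - h0)
    return action, dissipation

# A "real" sample near the basin minimum vs. a "fake" out on the slope.
a_real, d_real = probe(q0=0.05)
a_fake, d_fake = probe(q0=2.0)
```

A start point near the minimum oscillates with tiny amplitude, so both statistics stay small; a start point on the slope converts its large potential into motion and accumulates a far larger action.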

If this is right

  • Detectors can generalize across unseen generative models without periodic retraining on new artifacts.
  • A stability prior supplements statistical pattern matching in digital forensics applications.
  • The approach reduces reliance on dataset-specific calibration as generative techniques evolve.
  • It provides a concrete way to test whether generative models fail to enforce geometric smoothness constraints present in natural images.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Improving generators to better approximate physical low-energy equilibria might reduce their detectability under this framework.
  • The same dynamics probe could be tested on other synthetic media such as audio or video by extending the latent-space simulation.
  • The framework links detection performance to the mismatch between statistical optimization and physical dissipation in generative training.

Load-bearing premise

Natural images settle near stable low-energy equilibria on the latent manifold while deepfakes occupy unstable high-energy states.
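Written out, the premise and its dynamical consequence look as follows; this is reconstructed from the paper's appendix (Cases 1 and 2 of the Manifold Stability Hypothesis), with the constant-force displacement written out as the standard zeroth-order consequence:

```latex
\begin{aligned}
\text{Real (stable): } & \|\nabla V(q_{\mathrm{real}})\| \approx 0,\quad H(q_{\mathrm{real}}) \succeq 0
  &&\Rightarrow\ \text{released from rest } (p_0 = 0),\ q(t)\ \text{stays bounded in the basin;}\\
\text{Fake (unstable): } & g = \nabla V(q_{\mathrm{fake}}),\quad \|g\| = C \gg 0
  &&\Rightarrow\ \|q(t) - q_{\mathrm{fake}}\| \approx \tfrac{1}{2}\,\|M^{-1} g\|\, t^{2}.
\end{aligned}
```

The quadratic-in-time excursion for on-slope samples is what the action and dissipation statistics are meant to pick up.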

What would settle it

A direct counterexample would be deepfake images that produce bounded trajectories and low action values indistinguishable from those of real images under the same Hamiltonian simulation.

Figures

Figures reproduced from arXiv: 2605.04405 by Harry Cheng, Liqiang Nie, Ming-Hui Liu, Mohan Kankanhalli, Tianyi Wang, Weili Guan.

Figure 1. Visualization of microscopic structural irregularities, measured by […]
Figure 2. (a) Manifold Stability Hypothesis. Real images are expected to […]
Figure 3. Overview of the proposed HAAD. We cast deepfake detection as […]
Figure 5. (a) Energy evolution trajectories. Real images (blue) exhibit bounded […]
Figure 6. Potential landscape visualization. We visualize the learned energy […]
Figure 7. Sensitivity to step size η. Performance peaks around η = 0.4; excessively large steps lead to numerical instability.

B. Solver-induced vs. data-induced components of D: because Symplectic Euler conserves a modified (shadow) Hamiltonian with O(η²) error per step [61], its baseline drift for near-conservative trajectories (real samples) grows predictably with step size η. In contrast, fake-induced drift eme…
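The solver behavior described in the Figure 7 caption (bounded drift for Symplectic Euler, instability for overly large steps) can be reproduced on a toy oscillator. This is an illustrative sketch, not the paper's solver; the quadratic potential and constants are assumptions. It contrasts Symplectic Euler's bounded energy error with the unbounded drift of a non-symplectic explicit Euler step.

```python
def simulate(eta=0.1, steps=500, symplectic=True):
    # Harmonic oscillator with V(q) = q^2 / 2 (a stand-in for the learned
    # potential); returns the maximum energy drift |H(t) - H(0)|.
    q, p = 1.0, 0.0
    energy = lambda q, p: 0.5 * (p * p + q * q)
    h0, drift = energy(q, p), 0.0
    for _ in range(steps):
        if symplectic:
            p -= eta * q         # Symplectic Euler: update momentum first,
            q += eta * p         # then position with the NEW momentum.
        else:
            q_new = q + eta * p  # Explicit Euler: both updates use the
            p -= eta * q         # OLD state; energy grows without bound.
            q = q_new
        drift = max(drift, abs(energy(q, p) - h0))
    return drift

bounded = simulate(symplectic=True)   # drift stays O(eta)
growing = simulate(symplectic=False)  # drift grows exponentially
```

The bounded symplectic drift is the "solver-induced" baseline the caption distinguishes from data-induced drift.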
read the original abstract

Driven by the rapid development of generative AI models, deepfake detectors are compelled to undergo periodic recalibration to capture newly developed synthetic artifacts. To break this cycle, we propose a new perspective on deepfake detection: moving from static pattern recognition to dynamical stability analysis. Specifically, our approach is motivated by physics-inspired priors: we hypothesize that natural images, as products of dissipative physical processes, tend to settle near stable, low-energy equilibria. In contrast, generative models optimize for statistical similarity to real images but do not explicitly enforce structural constraints such as geometric smoothness, leaving deepfakes more likely to occupy unstable, high-energy states. To operationalize this, we introduce Hamiltonian Action Anomaly Detection (HAAD), comprising three contributions: i) We model the image latent manifold as a potential energy surface. Under this hypothesis, real images are expected to produce basin-like low-energy responses, whereas fake images are more likely to induce high-potential, high-gradient responses. ii) We employ Hamiltonian-inspired dynamics as a stability probe. By releasing latent states from rest, samples near stable regions remain bounded, while high-gradient samples produce larger trajectory responses. iii) We quantify these dynamic behaviors through two trajectory statistics, i.e., Hamiltonian action and energy dissipation. Extensive experiments show that HAAD outperforms evaluated state-of-the-art baselines on challenging cross-dataset transfer benchmarks, supporting a physics-inspired stability prior for digital forensics.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes Hamiltonian Action Anomaly Detection (HAAD) as a physics-inspired approach to deepfake detection. It models the image latent manifold as a potential energy surface, hypothesizing that natural images (from dissipative processes) occupy stable low-energy equilibria while deepfakes occupy unstable high-energy states. Hamiltonian dynamics are simulated by releasing latent states from rest to probe stability, with classification based on two trajectory statistics: Hamiltonian action and energy dissipation. The paper reports that HAAD outperforms state-of-the-art baselines on challenging cross-dataset transfer benchmarks.

Significance. If the central results hold after addressing validation gaps, the work offers a distinctive dynamical-stability prior for digital forensics that could improve generalization across generative models by moving beyond static artifact detection. The explicit use of Hamiltonian trajectories and derived statistics provides a falsifiable, physics-grounded framework that may inspire similar priors in other anomaly-detection domains.

major comments (3)
  1. [Abstract / §3] The claim that trajectory statistics serve as an independent probe of the stability prior is not supported by any direct, task-independent evidence. No plots, tables, or statistics are shown comparing potential-energy values, gradient magnitudes, or basin properties of real versus fake images on the constructed energy surface before classification; performance gains on cross-dataset benchmarks could therefore arise from generic anomaly features rather than confirming the hypothesized low-energy equilibria for natural images.
  2. [§4] The cross-dataset transfer results are presented as supporting the physics-inspired prior, yet no ablation isolates whether the Hamiltonian action and dissipation statistics actually reflect energy-landscape differences or simply function as learned features. A control experiment (e.g., replacing the dynamics with random trajectories or a non-physics baseline while keeping the same statistics) is required to establish that the stability prior is load-bearing.
  3. [§2 / §3.1] The definition of the potential energy surface on the latent manifold is described at a high level but lacks an explicit functional form or derivation showing it is independent of the downstream classifier. Without this, it remains unclear whether the surface is constructed in a parameter-free manner or whether the reported separation is tautological with the fitting procedure.
minor comments (2)
  1. [Abstract] The acronym HAAD is expanded once, but subsequent references should use the full name or the acronym consistently to avoid ambiguity for readers unfamiliar with the method.
  2. [Figures / §4] Ensure all trajectory visualizations include axis labels, units, and clear real/fake color coding so that bounded versus divergent behaviors can be directly inspected.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the thorough and insightful review. The comments highlight important aspects that will help clarify the contributions of our work. We address each major comment point by point below, proposing specific revisions where appropriate.

read point-by-point responses
  1. Referee: [Abstract / §3] The claim that trajectory statistics serve as an independent probe of the stability prior is not supported by any direct, task-independent evidence. No plots, tables, or statistics are shown comparing potential-energy values, gradient magnitudes, or basin properties of real versus fake images on the constructed energy surface before classification; performance gains on cross-dataset benchmarks could therefore arise from generic anomaly features rather than confirming the hypothesized low-energy equilibria for natural images.

    Authors: We agree that providing direct, task-independent evidence of the energy landscape properties would better support our claims. In the revised manuscript, we will add visualizations (e.g., histograms and scatter plots) and quantitative statistics comparing potential energy values, gradient magnitudes, and basin properties between real and fake images on the latent manifold. This will demonstrate the separation prior to the application of Hamiltonian dynamics and classification. revision: yes

  2. Referee: [§4] The cross-dataset transfer results are presented as supporting the physics-inspired prior, yet no ablation isolates whether the Hamiltonian action and dissipation statistics actually reflect energy-landscape differences or simply function as learned features. A control experiment (e.g., replacing the dynamics with random trajectories or a non-physics baseline while keeping the same statistics) is required to establish that the stability prior is load-bearing.

    Authors: We acknowledge this valid point regarding the need to isolate the contribution of the physics-inspired dynamics. We will include an ablation study in the revised experiments section, where we compare our Hamiltonian-based statistics against controls such as random trajectories or non-dynamical feature extraction while using the same downstream classifier. This will help confirm that the performance improvements stem from the stability analysis rather than generic anomaly detection features. revision: yes

  3. Referee: [§2 / §3.1] The definition of the potential energy surface on the latent manifold is described at a high level but lacks an explicit functional form or derivation showing it is independent of the downstream classifier. Without this, it remains unclear whether the surface is constructed in a parameter-free manner or whether the reported separation is tautological with the fitting procedure.

    Authors: The potential energy surface is constructed from the geometry of the latent manifold using a fixed, pre-defined function based on local curvature and deviation from equilibrium, which does not depend on the classifier parameters. We will expand §3.1 in the revision to include the explicit mathematical definition and a derivation demonstrating its independence from the downstream classification task, ensuring it is parameter-free with respect to the detector. revision: yes
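The control experiment discussed above can be made concrete on a toy problem. The sketch below is hypothetical and schematic, not the paper's experiment: on a one-dimensional quadratic stand-in potential, an action-style statistic separates a near-minimum ("real") start from an on-slope ("fake") start far more strongly when trajectories follow the dynamics than when the same statistic is computed along a seeded random walk.

```python
import random

def action_stat(traj, eta=0.1):
    # Toy analogue of the action statistic: accumulate |kinetic - potential|
    # along a trajectory on the stand-in potential V(q) = q^2 / 2.
    total = 0.0
    for q_prev, q in zip(traj, traj[1:]):
        ke = 0.5 * ((q - q_prev) / eta) ** 2
        total += eta * abs(ke - 0.5 * q * q)
    return total

def dynamics_traj(q0, steps=100, eta=0.1):
    # Symplectic Euler released from rest (the probe, schematically).
    q, p, traj = q0, 0.0, [q0]
    for _ in range(steps):
        p -= eta * q
        q += eta * p
        traj.append(q)
    return traj

def random_traj(q0, steps=100, sigma=0.1, seed=0):
    # Control: same start and length, but physics-free random increments.
    rng = random.Random(seed)
    traj = [q0]
    for _ in range(steps):
        traj.append(traj[-1] + rng.gauss(0.0, sigma))
    return traj

real_q0, fake_q0 = 0.05, 2.0   # near-minimum vs. on-slope start points
sep_dyn = action_stat(dynamics_traj(fake_q0)) / action_stat(dynamics_traj(real_q0))
sep_rand = action_stat(random_traj(fake_q0)) / action_stat(random_traj(real_q0))
```

If the stability prior is load-bearing, the dynamics-driven separation should dwarf the random-trajectory control, as it does here.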

Circularity Check

0 steps flagged

No significant circularity: the derivation is a self-contained hypothesis plus independent validation.

full rationale

The paper states a physics-motivated hypothesis (natural images near low-energy equilibria, deepfakes in high-energy states), operationalizes it by modeling the latent manifold as a potential surface and applying Hamiltonian dynamics to generate trajectory statistics (action and dissipation), then reports cross-dataset detection performance. No equations reduce the statistics to the hypothesis by construction, no parameters are fitted on a subset and relabeled as predictions, and no load-bearing self-citations or uniqueness theorems appear. The experiments constitute external validation on held-out benchmarks rather than tautological confirmation. The approach therefore remains non-circular under the stated criteria.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on an untested domain assumption about the energy landscape of natural versus generated images; the abstract quantifies no free parameters, and the single invented entity is the HAAD framework itself.

axioms (1)
  • domain assumption Natural images are products of dissipative physical processes and therefore tend to occupy stable, low-energy equilibria on the latent manifold.
    This hypothesis is stated directly as the motivation for modeling the latent space as a potential energy surface.
invented entities (1)
  • Hamiltonian Action Anomaly Detection (HAAD) no independent evidence
    purpose: A detection framework that uses dynamical trajectory statistics to identify unstable latent states produced by generative models.
    New method name and three-component procedure introduced in the abstract.

pith-pipeline@v0.9.0 · 5565 in / 1305 out tokens · 117973 ms · 2026-05-08T17:58:51.977749+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches: the paper's claim is directly supported by a theorem in the formal canon.
  • supports: the theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: the paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: the paper appears to rely on the theorem as machinery.
  • contradicts: the paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

66 extracted references · 6 canonical work pages · 1 internal anchor

  1. [1] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. C. Courville, and Y. Bengio, "Generative adversarial nets," in NIPS, 2014, pp. 2672–2680.
  2. [2] P. Esser, S. Kulal, A. Blattmann, R. Entezari, J. Müller, H. Saini, Y. Levi, D. Lorenz, A. Sauer, F. Boesel, D. Podell, T. Dockhorn, Z. English, and R. Rombach, "Scaling rectified flow transformers for high-resolution image synthesis," in ICML, 2024, pp. 1–13.
  3. [3] J. Thies, M. Zollhöfer, M. Stamminger, C. Theobalt, and M. Nießner, "Face2face: Real-time face capture and reenactment of RGB videos," in CVPR, 2016, pp. 2387–2395.
  4. [4] J. Cheng, Z. Yan, Y. Zhang, Y. Luo, Z. Wang, and C. Li, "Can we leave deepfake data behind in training deepfake detector?" in NeurIPS, 2024, pp. 21979–21998.
  5. [5] C. Hong, Y. Hsu, and T. Liu, "Contrastive learning for deepfake classification and localization via multi-label ranking," in CVPR, 2024, pp. 17627–17637.
  6. [6] H. Cheng, M.-H. Liu, Y. Guo, T. Wang, L. Nie, and M. Kankanhalli, "Fair deepfake detectors can generalize," in NeurIPS, 2025.
  7. [7] L. Ma, Z. Yan, J. Xu, Y. Chen, Q. Guo, Z. Bi, Y. Liao, and H. Lin, "From specificity to generality: Revisiting generalizable artifacts in detecting face deepfakes," in NeurIPS, 2025.
  8. [8] R. Xia, D. Liu, J. Li, L. Yuan, N. Wang, and X. Gao, "MMNet: Multi-collaboration and multi-supervision network for sequential deepfake detection," IEEE TIFS, vol. 19, pp. 3409–3422, 2024.
  9. [9] K. Shiohara and T. Yamasaki, "Detecting deepfakes with self-blended images," in CVPR, 2022, pp. 18699–18708.
  10. [10] C. Tan, H. Liu, Y. Zhao, S. Wei, G. Gu, P. Liu, and Y. Wei, "Rethinking the up-sampling operations in CNN-based generative network for generalizable deepfake detection," in CVPR, 2024, pp. 28130–28139.
  11. [11] Y.-H. Han, T.-M. Huang, K.-L. Hua, and J.-C. Chen, "Towards more general video-based deepfake detection through facial component guided adaptation for foundation model," in CVPR, 2025, pp. 22995–23005.
  12. [12] H. Cheng, Y. Guo, T. Wang, L. Nie, and M. Kankanhalli, "Diffusion facial forgery detection," in ACM MM, 2024, pp. 5939–5948.
  13. [13] Y. Li, D. Zhu, X. Cui, and S. Lyu, "Celeb-DF++: A large-scale challenging video deepfake benchmark for generalizable forensics," arXiv preprint arXiv:2507.18015, 2025.
  14. [14] M. Lutter, C. Ritter, and J. Peters, "Deep Lagrangian Networks: Using physics as model prior for deep learning," arXiv preprint arXiv:1907.04490, 2019.
  15. [15] M. Cranmer, S. Greydanus, S. Hoyer, P. Battaglia, D. Spergel, and S. Ho, "Lagrangian neural networks," arXiv preprint arXiv:2003.04630, 2020.
  16. [16] Y. Li, X. Yang, P. Sun, H. Qi, and S. Lyu, "Celeb-DF: A large-scale challenging dataset for DeepFake forensics," in CVPR, 2020, pp. 3204–3213.
  17. [17] B. Dolhansky, J. Bitton, B. Pflaum, J. Lu, R. Howes, M. Wang, and C. C. Ferrer, "The deepfake detection challenge (DFDC) dataset," arXiv preprint arXiv:2006.07397, 2020.
  18. [18] Google AI Blog, "Contributing data to deepfake detection research," 2019. [Online]. Available: https://research.google/blog/contributing-data-to-deepfake-detection-research/
  19. [19] Z. Yan, Y. Zhang, X. Yuan, S. Lyu, and B. Wu, "DeepfakeBench: A comprehensive benchmark of deepfake detection," in NeurIPS, 2023, pp. 4534–4565.
  20. [20] W. Zhao, Y. Rao, W. Shi, Z. Liu, J. Zhou, and J. Lu, "DiffSwap: High-fidelity and controllable face swapping via 3D-aware masked diffusion," in CVPR, 2023, pp. 8568–8577.
  21. [21] M. Zhu, H. Chen, Q. Yan, X. Huang, G. Lin, W. Li, Z. Tu, H. Hu, J. Hu, and Y. Wang, "GenImage: A million-scale benchmark for detecting AI-generated image," in NeurIPS, 2023, pp. 77771–77782.
  22. [22] Z. Wang, J. Bao, W. Zhou, W. Wang, H. Hu, H. Chen, and H. Li, "DIRE for diffusion-generated image detection," in ICCV, 2023, pp. 22445–22455.
  23. [23] R. Xia, D. Zhou, D. Liu, L. Yuan, S. Wang, J. Li, N. Wang, and X. Gao, "Advancing generalized deepfake detector with forgery perception guidance," in ACM MM, 2024, pp. 6676–6685.
  24. [24] M.-H. Liu, H. Cheng, T. Wang, X. Luo, and X.-S. Xu, "Learning real facial concepts for independent deepfake detection," in IJCAI, 2025, pp. 1585–1593.
  25. [25] S. Jia, C. Ma, T. Yao, B. Yin, S. Ding, and X. Yang, "Exploring frequency adversarial attacks for face forgery detection," in CVPR, 2022, pp. 4093–4102.
  26. [26] X. Wu, Z. Xie, Y. Gao, and Y. Xiao, "SSTNet: Detecting manipulated faces through spatial, steganalysis and temporal features," in ICASSP, 2020, pp. 2952–2956.
  27. [27] L. Li, J. Bao, T. Zhang, H. Yang, D. Chen, F. Wen, and B. Guo, "Face X-ray for more general face forgery detection," in CVPR, 2020, pp. 5000–5009.
  28. [28] J. Cao, C. Ma, T. Yao, S. Chen, S. Ding, and X. Yang, "End-to-end reconstruction-classification learning for face forgery detection," in CVPR, 2022, pp. 4103–4112.
  29. [29] H. Wu, J. Zhou, and S. Zhang, "Generalizable synthetic image detection via language-guided contrastive learning," arXiv preprint arXiv:2305.13800, 2023.
  30. [30] H. Liu, Z. Tan, C. Tan, Y. Wei, J. Wang, and Y. Zhao, "Forgery-aware adaptive transformer for generalizable synthetic image detection," in CVPR, 2024, pp. 10770–10780.

  31. [31] U. Ojha, Y. Li, and Y. J. Lee, "Towards universal fake image detectors that generalize across generative models," in CVPR, 2023, pp. 24480–24489.
  32. [32] K. Sun, S. Chen, T. Yao, Z. Zhou, J. Ji, X. Sun, C.-W. Lin, and R. Ji, "Towards general visual-linguistic face forgery detection," in CVPR, 2025, pp. 19576–19586.
  33. [33] A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark et al., "Learning transferable visual models from natural language supervision," in ICML, 2021, pp. 8748–8763.
  34. [34] Z. Yan, J. Wang, P. Jin, K.-Y. Zhang, C. Liu, S. Chen, T. Yao, S. Ding, B. Wu, and L. Yuan, "Orthogonal subspace decomposition for generalizable AI-generated image detection," in ICML, 2025, pp. 1–13.
  35. [35] X. Liao, Y. Wang, T. Wang, K. P. Chow, and S. Lyu, "FAMM: Facial muscle motions for detecting compressed deepfake videos over social networks," IEEE TCSVT, vol. 33, no. 12, pp. 7236–7251, 2023.
  36. [36] K. Zhang, F. Luan, Q. Wang, K. Bala, and N. Snavely, "PhySG: Inverse rendering with spherical gaussians for physics-based material editing and relighting," in CVPR, 2021, pp. 5453–5462.
  37. [37] Z. Li, M. Shafiei, R. Ramamoorthi, K. Sunkavalli, and M. Chandraker, "Inverse rendering for complex indoor scenes: Shape, spatially-varying lighting and SVBRDF from a single image," in CVPR, 2020, pp. 2475–2484.
  38. [38] W. Liu, X. Wang, J. Owens, and Y. Li, "Energy-based out-of-distribution detection," in NeurIPS, 2020, pp. 21464–21475.
  39. [39] S. Greydanus, M. Dzamba, and J. Yosinski, "Hamiltonian neural networks," in NeurIPS, 2019, pp. 15353–15363.
  40. [40] D. Duvenaud, J. Wang, J. Jacobsen, K. Swersky, M. Norouzi, and W. Grathwohl, "Your classifier is secretly an energy based model and you should treat it like one," in ICLR, 2020.
  41. [41] Y. D. Zhong, B. Dey, and A. Chakraborty, "Symplectic ODE-Net: Learning Hamiltonian dynamics with control," arXiv preprint arXiv:1909.12077, 2019.
  42. [42] J. J. Hopfield, "Neural networks and physical systems with emergent collective computational abilities," PNAS, vol. 79, no. 8, pp. 2554–2558, 1982.
  43. [43] Y. Qian, G. Yin, L. Sheng, Z. Chen, and J. Shao, "Thinking in frequency: Face forgery detection by mining frequency-aware clues," in ECCV, 2020, pp. 86–103.
  44. [44] H. Liu, X. Li, W. Zhou, Y. Chen, Y. He, H. Xue, W. Zhang, and N. Yu, "Spatial-phase shallow learning: Rethinking face forgery detection in frequency domain," in CVPR, 2021, pp. 772–781.
  45. [45] Y. Luo, Y. Zhang, J. Yan, and W. Liu, "Generalizing face forgery detection with high-frequency features," in CVPR, 2021, pp. 16317–16326.
  46. [46] Y. Ni, D. Meng, C. Yu, C. Quan, D. Ren, and Y. Zhao, "CORE: Consistent representation learning for face forgery detection," in CVPRW, 2022, pp. 12–21.
  47. [47] Z. Yan, Y. Zhang, Y. Fan, and B. Wu, "UCF: Uncovering common features for generalizable deepfake detection," in ICCV, 2023, pp. 22355–22366.
  48. [48] S. Dong, J. Wang, R. Ji, J. Liang, H. Fan, and Z. Ge, "Implicit identity leakage: The stumbling block to improving deepfake detection generalization," in CVPR, 2023, pp. 3994–4004.
  49. [49] Z. Yan, Y. Luo, S. Lyu, Q. Liu, and B. Wu, "Transcending forgery specificity with latent space augmentation for generalizable deepfake detection," in CVPR, 2024, pp. 8984–8994.
  50. [50] Y. Chen, Z. Yan, G. Cheng, K. Zhao, S. Lyu, and B. Wu, "X2-DFD: A framework for explainable and extendable deepfake detection," in NeurIPS, 2025.
  51. [51] A. Rössler, D. Cozzolino, L. Verdoliva, C. Riess, J. Thies, and M. Nießner, "FaceForensics++: Learning to detect manipulated facial images," in ICCV, 2019, pp. 1–11.
  52. [52] L. Jiang, R. Li, W. Wu, C. Qian, and C. C. Loy, "DeeperForensics-1.0: A large-scale dataset for real-world face forgery detection," in CVPR, 2020, pp. 2889–2898.
  53. [53] B. Zi, M. Chang, J. Chen, X. Ma, and Y. Jiang, "WildDeepfake: A challenging real-world dataset for deepfake detection," in ACM MM, 2020, pp. 2382–2390.
  54. [54] T. Zhou, W. Wang, Z. Liang, and J. Shen, "Face forensics in the wild," in CVPR, 2021, pp. 5778–5788.
  55. [55] Z. Yan, T. Yao, S. Chen, Y. Zhao, X. Fu, J. Zhu, D. Luo, C. Wang, S. Ding, Y. Wu et al., "DF40: Toward next-generation deepfake detection," in NeurIPS, 2024, pp. 29387–29434.
  56. [56] S.-Y. Wang, O. Wang, R. Zhang, A. Owens, and A. A. Efros, "CNN-generated images are surprisingly easy to spot... for now," in CVPR, 2020, pp. 8695–8704.
  57. [57] C. Tan, Y. Zhao, S. Wei, G. Gu, P. Liu, and Y. Wei, "Frequency-aware deepfake detection: Improving generalizability through frequency space domain learning," in AAAI, 2024, pp. 5052–5060.
  58. [58] B. Chen, J. Zeng, J. Yang, and R. Yang, "DRCT: Diffusion reconstruction contrastive training towards universal detection of diffusion generated images," in ICML, 2024, pp. 7621–7639.
  59. [59] R. Chen, J. Xi, Z. Yan, K.-Y. Zhang, S. Wu, J. Xie, X. Chen, L. Xu, I. Guan, T. Yao et al., "Dual data alignment makes AI-generated image detector easier generalizable," in NeurIPS, 2025.
  60. [60] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in CVPR, 2016, pp. 770–778.

  61. [61] B. Leimkuhler and S. Reich, Simulating Hamiltonian Dynamics. Cambridge University Press, 2004, no. 14.

    Appendix A, Theoretical Motivation for the Manifold Stability Hypothesis: the appendix provides a theoretical motivation for the Manifold Stability Hypothesis proposed in Section III-A, connecting the Principle of Least Action (PLA) to the static geo…

  62. [62] Case 1, real samples (stable equilibrium): under the Manifold Stability Hypothesis (Assumption III.2), a real sample q_real is hypothesized to reside near a local minimum. This neighborhood implies two conditions: 1) vanishing gradient, ‖∇V(q_real)‖ ≈ 0; 2) positive semi-definite Hessian H(q_real) (convex basin). Substituting ∇V ≈ 0 in…

  63. [63] Case 2, fake samples (unstable state): the hypothesis posits that a deepfake sample q_fake is more likely to lie on a slope with a significant non-zero gradient in the local comparison of interest. Let g = ∇V(q_fake) be the gradient vector, with ‖g‖ = C ≫ 0. For a small time step t, the gradient force is approximately constant (zeroth-order a…

  64. [64] Physical state projection: to reduce computational complexity and enforce a bottleneck, the input features x are projected to a physical state q ∈ R^{N×D_phy} with D_phy = 64: q = Linear(D_in → D_phy)(x). (38)

  65. [65] Mass estimation network: the state-conditioned diagonal preconditioner M⁻¹(q) rescales the position update for each patch and feature dimension. It is estimated via a lightweight MLP with a Softplus activation to keep the scaling positive for numerical stability: M⁻¹(q) = Softplus(MLP(q)) + ε, where ε = 10⁻³. (39) The MLP architecture is: Layer 1: Linear(64→…

  66. [66] Potential parameterization heads: to compute the photometric potential V_photo, intrinsic physical properties are estimated using shallow linear heads: surface normal n, Linear(64→3) followed by L2 normalization; albedo ρ, Linear(64→1) followed by Sigmoid activation; global light l, Linear(64→3), averaged over all patches to obtain a global vector…
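The extracted component definitions (Eqs. 38 and 39) assemble into a runnable sketch. This is illustrative only: the random weights, the input dimension D_in = 128, and the single linear layer standing in for the MLP (whose full architecture is truncated in the extraction) are assumptions; only D_phy = 64 and ε = 10⁻³ come from the extracted text.

```python
import math, random

random.seed(0)
D_IN, D_PHY, EPS = 128, 64, 1e-3   # D_phy = 64 and eps = 1e-3 per Eq. (39)

def init(n_out, n_in, scale=0.05):
    # Small random weights for the illustrative layers (an assumption).
    w = [[random.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out

def linear(w, b, x):
    # y = W x + b with plain-list weights.
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]

def softplus(v):
    return [math.log1p(math.exp(min(u, 30.0))) for u in v]

W_proj, b_proj = init(D_PHY, D_IN)   # Eq. (38): q = Linear(D_in -> D_phy)(x)
W_mass, b_mass = init(D_PHY, D_PHY)  # single-layer stand-in for the MLP

def physical_state_and_mass(x):
    q = linear(W_proj, b_proj, x)                     # physical state, Eq. (38)
    # Eq. (39): M^{-1}(q) = Softplus(MLP(q)) + eps, kept strictly positive
    # so the diagonal preconditioner never flips or zeroes the update.
    m_inv = [u + EPS for u in softplus(linear(W_mass, b_mass, q))]
    return q, m_inv

x = [random.gauss(0.0, 1.0) for _ in range(D_IN)]
q, m_inv = physical_state_and_mass(x)
assert all(u > 0 for u in m_inv)   # positivity from Softplus + eps
```

The Softplus-plus-ε construction is what guarantees the positivity the appendix cites for numerical stability.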