pith. sign in

arxiv: 2508.05691 · v3 · submitted 2025-08-06 · 💻 cs.CR · cs.AI· cs.LG

SPRINT: Robust Model Attribution of Generated Images via Secret Pixel Reconstruction

Pith reviewed 2026-05-19 00:38 UTC · model grok-4.3

classification 💻 cs.CR cs.AIcs.LG
keywords model attributionAI-generated imagesfingerprintingadaptive attacksrobustnesssecret reconstructionimage forensicsaccountability
0
0 comments X

The pith

SPRINT attributes generated images to source models by defining secret reconstruction targets that stay private from attackers.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a technique for identifying which model produced an AI-generated image even when an attacker knows the overall approach and tries to erase or fake the evidence. Standard fingerprinting methods use patterns an attacker can discover and then disrupt, but SPRINT instead ties verification to secret targets that the attacker cannot see. This keeps the verification task itself hidden, so the attacker lacks the information needed to craft effective evasions. Experiments on the FFHQ dataset report high accuracy across both varied models and nearly identical checkpoints while holding adaptive attack success rates near zero.

Core claim

SPRINT creates model fingerprints by assigning each image a set of hidden reconstruction targets defined by a secret. Because the attacker does not know the verification task at attack time, the details required to remove or forge the fingerprint remain unavailable, yielding both high clean attribution accuracy and strong resistance to adaptive removal and forgery.

What carries the argument

Secret pixel reconstruction fingerprinting, which replaces public discoverable patterns with private reconstruction targets known only to the verifier.

Load-bearing premise

The secret that defines the reconstruction targets remains unknown to the attacker at the time of an adaptive attack.

What would settle it

A demonstration that an attacker who correctly learns or guesses the secret reconstruction targets can remove or forge the fingerprints with high success rate.

Figures

Figures reproduced from arXiv: 2508.05691 by Kai Yao, Marc Juarez.

Figure 1
Figure 1. Figure 1: Overview of the AUTHPRINT pipeline. In the certification phase, a verifier with black-box access to the claimed model selects secret pixel locations (image channels are simplified for illustration) as a fingerprint and trains a private reconstructor to predict them from generated images. After certification, the provider deploys the model via its black-box API, while the verifier hosts a separate verificat… view at source ↗
Figure 3
Figure 3. Figure 3: shows the results of the evaluation for SD models, where the target model is SD 2.1, and negatives are earlier versions (SD 1.5 to 1.1), introducing progressively larger distributional shifts due to architectural and dataset differences. The maximum fingerprint size is larger compared to Style￾GAN2 due to the larger output size of SD models. As seen in the figures, AUTHPRINT consistently achieves low detec… view at source ↗
Figure 4
Figure 4. Figure 4: Impact of the number of denoising steps (i.e. iterations of reverse [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Effect of reconstructor size (left) and reconstructor training set size [PITH_FULL_IMAGE:figures/full_fig_p008_5.png] view at source ↗
Figure 7
Figure 7. Figure 7: Effect of model modifications on detection performance for [PITH_FULL_IMAGE:figures/full_fig_p008_7.png] view at source ↗
Figure 6
Figure 6. Figure 6: Effect of ensemble detection on AUTHPRINT performance for SD models using the Japanese cafe´ prompt. Each reconstructor has 674M parameters, trained on 512k samples with fingerprint length 64. We evaluate this scheme on SD models using identical hyperparameters but distinct index sets per reconstructor. As shown in [PITH_FULL_IMAGE:figures/full_fig_p008_6.png] view at source ↗
Figure 8
Figure 8. Figure 8: Effect of prompt specificity on detection performance for SD. We [PITH_FULL_IMAGE:figures/full_fig_p009_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Analysis of reconstructor stability under pixel manipulation. Mean [PITH_FULL_IMAGE:figures/full_fig_p010_9.png] view at source ↗
read the original abstract

Detecting the source model of AI-generated images is a growing accountability problem. AI fingerprinting techniques address this by detecting imperceptible patterns in the images that are unique to each model, achieving high detection accuracy under ideal conditions. However, recent research has shown that image fingerprints are extremely brittle to adaptive attacks, where knowledge of the technique can be exploited to perturb the fingerprints and evade detection. We present SPRINT (Secret Pixel Reconstruction fingerprinting), a novel model attribution method specifically designed to provide robustness to adaptive attacks. As opposed to existing fingerprinting, which focuses on publicly discoverable patterns in the image, SPRINT relies on a secret to define hidden reconstruction targets, thus keeping the verification task itself private. As a result, the attacker can no longer see the task that the verifier solves at verification time, protecting the information exploited by the attacks. Our results show that SPRINT achieves high closed-world accuracy while remaining robust to adaptive attacks: on the FFHQ dataset, SPRINT reaches 99.17% clean accuracy on a diverse 12-model pool and 98.83% on a harder pool of 6 close checkpoints of the same model architecture, while reducing adaptive removal and forgery attack success rates to 1% or below. When the same pool of close model checkpoints is considered an open world, SPRINT maintains high accuracy with an AUROC of 99.30%. These findings show that the approach of privatizing the verification task can make adaptive evasion substantially harder while maintaining performance in the clean setting.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper introduces SPRINT, a model attribution technique for AI-generated images that employs a secret to define hidden pixel reconstruction targets, thereby privatizing the verification task and aiming to resist adaptive attacks that exploit knowledge of public fingerprints. On the FFHQ dataset, it reports 99.17% clean accuracy on a 12-model pool and 98.83% on 6 close checkpoints of the same architecture, with adaptive removal and forgery attack success rates reduced to 1% or below; it also achieves 99.30% AUROC in an open-world setting for close checkpoints.

Significance. If the secrecy of the reconstruction targets holds against adaptive adversaries and the empirical results are supported by rigorous controls, the privatization of the verification task offers a conceptually distinct approach to robust attribution that could address the brittleness of existing fingerprinting methods. The reported performance on both diverse and close-checkpoint pools, combined with low attack success, would represent a meaningful advance in the area if the underlying assumption is validated.

major comments (2)
  1. Abstract: The central robustness claim (adaptive removal and forgery success rates of 1% or below) rests on the assumption that 'the verification task itself [remains] private' because reconstruction targets are defined by a secret unknown to the attacker. The manuscript provides no details on how this secrecy is maintained or tested (e.g., whether adaptive attacks include attempts to infer targets via generator access, side-channel information, or statistical analysis of outputs), which directly undermines the reported attack-resistance numbers.
  2. Results and experimental sections: The abstract states specific accuracy figures (99.17%, 98.83%, AUROC 99.30%) and attack success rates without describing the experimental setup, number of trials, statistical significance testing, or precise implementation of the adaptive attacks (including whether attackers operated with or without knowledge of the secret task). These omissions make it impossible to assess whether the empirical claims support the central contribution.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback. We have revised the manuscript to address the concerns about insufficient details on secrecy assumptions and experimental procedures. Below we respond point by point to the major comments.

read point-by-point responses
  1. Referee: Abstract: The central robustness claim (adaptive removal and forgery success rates of 1% or below) rests on the assumption that 'the verification task itself [remains] private' because reconstruction targets are defined by a secret unknown to the attacker. The manuscript provides no details on how this secrecy is maintained or tested (e.g., whether adaptive attacks include attempts to infer targets via generator access, side-channel information, or statistical analysis of outputs), which directly undermines the reported attack-resistance numbers.

    Authors: We agree that the original manuscript did not sufficiently elaborate on the secrecy mechanism and threat model. The reconstruction targets are chosen uniformly at random from a large discrete space (on the order of 10^6 possible targets per image) and are treated as a shared secret between the model owner and verifier, transmitted via a secure channel outside the image generation pipeline. In the revised manuscript we have added a new paragraph in Section 3.2 explicitly stating this assumption and the threat model: the adaptive attacker is assumed to know the SPRINT algorithm and have white-box access to the generator but no knowledge of the specific secret targets. We have also added a short discussion of why statistical inference or side-channel attacks on the targets are considered outside the current scope, with a note that such attacks would require a different threat model not addressed in this work. The reported attack success rates therefore reflect the setting where the secret remains unknown to the attacker. revision: yes

  2. Referee: Results and experimental sections: The abstract states specific accuracy figures (99.17%, 98.83%, AUROC 99.30%) and attack success rates without describing the experimental setup, number of trials, statistical significance testing, or precise implementation of the adaptive attacks (including whether attackers operated with or without knowledge of the secret task). These omissions make it impossible to assess whether the empirical claims support the central contribution.

    Authors: We acknowledge that the experimental details were too brief. In the revised version we have expanded Section 4 to include a complete Experimental Setup subsection. This now specifies: evaluation on 10,000 images per model drawn from the FFHQ test split, results averaged over 5 independent random seeds with reported standard deviations, and use of 95% confidence intervals for the accuracy and AUROC figures. The adaptive removal and forgery attacks are described with pseudocode; each attack is run with full knowledge of the SPRINT algorithm but without access to the secret reconstruction targets. We have also added the exact hyperparameters used for the attacks and a statement that all experiments were performed under the closed-world and open-world settings described in the paper. revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical results stand on experimental validation

full rationale

The paper introduces SPRINT as a design that privatizes the verification task via secret reconstruction targets, then reports empirical accuracies (99.17% clean on 12-model FFHQ pool, 98.83% on close checkpoints, AUROC 99.30% open-world, adaptive attack success ≤1%). No equations, derivations, or parameter fits are shown that reduce these outcomes to the inputs by construction. The secrecy assumption is a stated design premise rather than a self-referential definition or fitted prediction. No load-bearing self-citations, uniqueness theorems, or ansatz smuggling appear in the abstract or described claims. The central performance numbers are framed as measured outcomes under the stated threat model, making the derivation chain self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The approach rests on the assumption that secrecy of the reconstruction targets can be maintained against determined attackers; no free parameters or invented physical entities are mentioned in the abstract.

axioms (1)
  • domain assumption The secret reconstruction targets remain unknown to potential attackers during adaptive attacks
    Robustness claim depends on this privacy property as described in the abstract.

pith-pipeline@v0.9.0 · 5802 in / 1202 out tokens · 37480 ms · 2026-05-19T00:38:27.426297+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

32 extracted references · 32 canonical work pages · 4 internal anchors

  1. [1]

    On the Opportunities and Risks of Foundation Models

    R. Bommasani, D. A. Hudson, E. Adeli, R. Altman, S. Arora, S. von Arx, M. S. Bernstein, J. Bohg, A. Bosselut, E. Brunskillet al., “On the opportunities and risks of foundation models,”arXiv preprint arXiv:2108.07258, 2021

  2. [2]

    European Parliament and Council of the European Union, “Regulation (EU) 2024/1689 of the European Parliament and of the Council of 13 June 2024 laying down harmonised rules on artificial intelligence (Artificial Intelligence Act) and amending certain Union legislative acts,” 2023, official Journal of the European Union, pp. 1–144

  3. [3]

    Notice of Violation to V olkswagen AG, Audi AG, and V olkswagen Group of America, Inc

    US EPA, “Notice of Violation to V olkswagen AG, Audi AG, and V olkswagen Group of America, Inc.” U.S. Environmental Protection Agency (EPA), Notice of Violation, September 2015

  4. [4]

    GPT-4 is getting worse over time, not bet- ter,

    S. L. Valdarrama, “GPT-4 is getting worse over time, not bet- ter,” Under Twitter handle @svpino: https://x.com/svpino/status/ 1681614284613099520, 2023, accessed: Jul 27, 2025

  5. [5]

    The responsibility gap: Ascribing responsibility for the actions of learning automata,

    A. Matthias, “The responsibility gap: Ascribing responsibility for the actions of learning automata,”Ethics and information technology, vol. 6, no. 3, pp. 175–183, 2004

  6. [6]

    Governing the AI Business Model: Plat- forms All the Way Down?

    M. Veale, “Governing the AI Business Model: Plat- forms All the Way Down?” https://efi.ed.ac.uk/event/ governing-the-ai-business-model-platforms-all-the-way-down/, 2023, accessed: Dec 15, 2023

  7. [7]

    Safetynets: Verifiable execution of deep neural networks on an untrusted cloud,

    Z. Ghodsi, T. Gu, and S. Garg, “Safetynets: Verifiable execution of deep neural networks on an untrusted cloud,”Advances in Neural Information Processing Systems, vol. 30, 2017

  8. [8]

    Efficient functional commit- ments: How to commit to a private function,

    D. Boneh, W. Nguyen, and A. Ozdemir, “Efficient functional commit- ments: How to commit to a private function,”Cryptology ePrint Archive, 2021

  9. [9]

    Scaling up trustless dnn inference with zero-knowledge proofs,

    D. Kang, T. Hashimoto, I. Stoica, and Y . Sun, “Scaling up trustless dnn inference with zero-knowledge proofs,” inAdvances in Neural Information Processing Systems, vol. 36. Curran Associates, Inc., 2023

  10. [10]

    vcnn: Verifiable convolutional neural network based on zk-snarks,

    S. Lee, H. Ko, J. Kim, and H. Oh, “vcnn: Verifiable convolutional neural network based on zk-snarks,”IEEE Transactions on Dependable and Secure Computing, 2024

  11. [11]

    zkllm: Zero knowledge proofs for large language models,

    H. Sun, J. Li, and H. Zhang, “zkllm: Zero knowledge proofs for large language models,” inProceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security, 2024, pp. 4405–4419

  12. [12]

    zkPyTorch: A hierarchical optimized compiler for zero-knowledge machine learning,

    T. Xie, T. Lu, Z. Fang, S. Wang, Z. Zhang, Y . Jia, D. Song, and J. Zhang, “zkPyTorch: A hierarchical optimized compiler for zero-knowledge machine learning,” Cryptology ePrint Archive, Paper 2025/535, 2025. [Online]. Available: https://eprint.iacr.org/2025/535

  13. [13]

    Zktorch: Compiling ml inference to zero-knowledge proofs via parallel proof accumulation,

    B.-J. Chen, L. Tang, and D. Kang, “Zktorch: Compiling ml inference to zero-knowledge proofs via parallel proof accumulation,”arXiv preprint arXiv:2507.07031, 2025

  14. [14]

    Slalom: Fast, Verifiable and Private Execution of Neural Networks in Trusted Hardware

    F. Tramer and D. Boneh, “Slalom: Fast, verifiable and private ex- ecution of neural networks in trusted hardware,”arXiv preprint arXiv:1806.03287, 2018

  15. [15]

    Attributing fake images to GANs: Learning and analyzing GAN fingerprints,

    N. Yu, L. S. Davis, and M. Fritz, “Attributing fake images to GANs: Learning and analyzing GAN fingerprints,” inProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2019, pp. 7555–7565

  16. [16]

    Fourier spectrum discrepancies in deep network generated images,

    T. Dzanic, K. Shah, and F. Witherden, “Fourier spectrum discrepancies in deep network generated images,” inAdvances in Neural Information Processing Systems (NeurIPS), vol. 33. Curran Associates, Inc., 2020, pp. 3022–3032, arXiv:1911.06465

  17. [17]

    Towards discovery and attribution of open-world gan generated images,

    S. Girish, S. Suri, S. Rambhatla, and A. Shrivastava, “Towards discovery and attribution of open-world gan generated images,” inProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). IEEE, 2021, pp. 14 094–14 103

  18. [18]

    ManiFPT: Defining and analyzing fingerprints of generative models,

    H. J. Song, M. Khayatkhoei, and W. AbdAlmageed, “ManiFPT: Defining and analyzing fingerprints of generative models,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 10 971–10 981

  19. [19]

    Riemannian-geometric fingerprints of generative models,

    H. J. Song and L. Itti, “Riemannian-geometric fingerprints of generative models,”arXiv preprint arXiv:2506.22802, 2025

  20. [20]

    Pixel recurrent neural networks,

    A. van den Oord, N. Kalchbrenner, and K. Kavukcuoglu, “Pixel recurrent neural networks,” inProceedings of the 33rd International Conference on Machine Learning. PMLR, 2016, pp. 1747–1756

  21. [21]

    Generative image modeling using spatial lstms,

    L. Theis and M. Bethge, “Generative image modeling using spatial lstms,” inAdvances in Neural Information Processing Systems, vol. 28, 2015

  22. [22]

    Training generative adversarial networks with limited data,

    T. Karras, M. Aittala, S. Laine, E. H ¨ark¨onen, J. Hellsten, J. Lehtinen, and T. Aila, “Training generative adversarial networks with limited data,” inAdvances in Neural Information Processing Systems (NeurIPS),

  23. [23]

    Available: https://arxiv.org/abs/2006.06676

    [Online]. Available: https://arxiv.org/abs/2006.06676

  24. [24]

    Improved consistency regularization for gans,

    Z. Zhao, S. Singh, H. Lee, Z. Zhang, A. Odena, and H. Zhang, “Improved consistency regularization for gans,”arXiv preprint arXiv:2002.04724, 2020. [Online]. Available: https://arxiv.org/abs/2002. 04724

  25. [25]

    Gpt-4o: Openai’s omnimodal model,

    OpenAI, “Gpt-4o: Openai’s omnimodal model,” https://openai.com/ index/hello-gpt-4o, 2024, accessed: 2025-09-23

  26. [26]

    Diffusiondb: A large-scale prompt gallery dataset for text-to-image generative models,

    Z. J. Wang, E. Montoya, D. Munechika, H. Yang, B. Hoover, and D. H. Chau, “Diffusiondb: A large-scale prompt gallery dataset for text-to-image generative models,” inProceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023). Association for Computational Linguistics, 2023, long paper, Best Paper Honorable Mention

  27. [27]

    Scaling Autoregressive Models for Content-Rich Text-to-Image Generation

    J. Yu, Y . Xu, J. Y . Koh, T. Luong, G. Baid, Z. Wang, V . Vasudevan, A. Ku, Y . Yang, B. Karagol Ayan, B. Hutchinson, W. Han, Z. Parekh, X. Li, H. Zhang, J. Baldridge, and Y . Wu, “Scaling autoregres- sive models for content-rich text-to-image generation,”arXiv preprint arXiv:2206.10789, 2022

  28. [28]

    Towards Deep Learning Models Resistant to Adversarial Attacks

    A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu, “Towards deep learning models resistant to adversarial attacks,”arXiv preprint arXiv:1706.06083, 2017

  29. [29]

    Deep residual learning for image recognition,

    K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” inProceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770–778

  30. [30]

    Do GANs leave artificial fingerprints?

    F. Marra, D. Gragnaniello, L. Verdoliva, and G. Poggi, “Do GANs leave artificial fingerprints?” inProceedings of the 2019 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR), 2019, pp. 506–511

  31. [31]

    Nonadaptive algorithms for threshold group testing,

    H. B. Chen and H. L. Fu, “Nonadaptive algorithms for threshold group testing,”Discrete Applied Mathematics, vol. 157, no. 8, pp. 1581–1585, 2009

  32. [32]

    Improved non-adaptive algorithms for threshold group testing with a gap,

    T. V . Bui, M. Cheraghchi, and I. Echizen, “Improved non-adaptive algorithms for threshold group testing with a gap,”IEEE Transactions on Information Theory, vol. 67, no. 11, pp. 7180–7196, 2021