
arxiv: 2505.03519 · v7 · submitted 2025-05-06 · 💻 cs.LG

Revisiting Model Inversion Evaluation: From Misleading Standards to Reliable Privacy Assessment

Pith reviewed 2026-05-22 15:44 UTC · model grok-4.3

classification 💻 cs.LG
keywords model inversion · privacy assessment · false positives · adversarial examples · evaluation framework · multimodal large language models · machine learning security

The pith

Many model inversion reconstructions counted as successful do not capture the target's visual identity.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines the standard way of judging model inversion attacks, where success is measured by a secondary model trained on the same private data and task as the target. It shows that many reconstructions accepted by this method are false positives that fail to recover the actual visual features of the target individual. These false positives meet the formal definition of Type I adversarial examples and transfer reliably across different models, which inflates attack accuracy numbers and overstates privacy leakage. The authors introduce an alternative evaluation method that uses multimodal large language models for assessment, relying on broad visual reasoning rather than narrow task-specific features. Re-testing 27 prior attack setups with the new method reveals high false-positive rates in the conventional results.
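The conventional metric described above reduces to a top-k accuracy of the secondary evaluation model on the reconstructions. The sketch below is a minimal illustration. It assumes the evaluation model's class scores are already available as plain Python lists. `attack_accuracy` and the demo scores are hypothetical and are not code or data from the paper. A "hit" requires only that the evaluator ranks the target label highly, and an MI false positive can meet exactly that condition.

```python
from typing import Sequence


def attack_accuracy(eval_logits: Sequence[Sequence[float]],
                    target_labels: Sequence[int],
                    k: int = 1) -> float:
    """Hypothetical helper: conventional-style AttAcc, i.e. the share of
    reconstructions whose target label is in the evaluator's top-k."""
    if not target_labels or len(eval_logits) != len(target_labels):
        raise ValueError("need one score vector per reconstruction")
    hits = 0
    for scores, y in zip(eval_logits, target_labels):
        # Rank class indices by descending score and keep the k best.
        top_k = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)[:k]
        # A hit only asks whether the evaluator outputs y; it says nothing
        # about whether a human would recognise the target identity.
        hits += int(y in top_k)
    return hits / len(target_labels)


# Hypothetical scores for three reconstructions over three identities.
demo_scores = [[0.1, 2.0, 0.3], [1.5, 0.2, 0.1], [0.2, 0.4, 0.9]]
demo_labels = [1, 1, 2]
print(attack_accuracy(demo_scores, demo_labels))       # 2 of 3 hits -> 0.6666666666666666
print(attack_accuracy(demo_scores, demo_labels, k=2))  # 3 of 3 hits -> 1.0
```

Top-1 and top-5 are the variants MI papers usually report. `k=2` is used here only because the toy example has three classes.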

Core claim

The dominant evaluation framework for model inversion attacks computes attack accuracy through a secondary evaluation model trained on the same private data and task design as the target model. Many reconstructions that satisfy this framework do not capture the visual identity of the target individual. These MI false positives satisfy the same formal conditions as Type I adversarial examples and show extremely high false-positive transferability, indicating that they likely contain Type I adversarial features. This transferability inflates reported attack accuracy and overstates privacy leakage. An MLLM-based evaluation framework, built on systematic design principles, avoids the shared-task weakness of the conventional evaluation model and reduces Type I adversarial transferability.
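One way to write down the mapping the core claim relies on is shown below, using our own notation rather than the paper's. $f$ is any classifier, $E$ is the secondary evaluation model, $O$ is an oracle such as a human judging identity, and $x^r_y$ is a reconstruction for target label $y$. A Type I adversarial example, in Tang et al.'s sense, changes the input enough that the oracle's label changes while the classifier's label does not. An MI false positive follows the same pattern with $E$ in the role of $f$. The last two lines show why such cases inflate the conventional accuracy.

```latex
\begin{aligned}
\text{Type I adversarial example:}\quad & f(x') = f(x),\quad O(x') \neq O(x) \\
\text{MI false positive:}\quad & E(x^r_y) = y,\quad O(x^r_y) \neq y \\
\text{Conventional accuracy:}\quad & \mathrm{AttAcc}_{\mathrm{curr}} = \Pr\big[E(x^r_y) = y\big] \\
& = \underbrace{\Pr\big[E(x^r_y) = y,\ O(x^r_y) = y\big]}_{\text{true positives}}
  + \underbrace{\Pr\big[E(x^r_y) = y,\ O(x^r_y) \neq y\big]}_{\text{false positives}}
\end{aligned}
```

Read this way, the MLLM framework is an attempt to approximate $O$ more closely than $E$ does, so that the second term is no longer counted as success.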

What carries the argument

MLLM-based evaluation framework, which applies multimodal large language models' general-purpose visual reasoning to judge whether a reconstruction matches a target's identity instead of relying on task-specific classification.
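The MLLM judge returns free text, so a normalisation step has to turn replies into verdicts before an accuracy can be computed. The sketch below assumes each query yields one reply string. It counts a match only when the reply starts with "yes". Replies that are neither yes nor no, such as the refusals shown in Figure 3, are tallied separately and excluded from the denominator. That exclusion is our assumption. Counting refusals as "No" would be the more conservative choice, and the paper's exact decision rule may differ. `mllm_attack_accuracy` is a hypothetical name.

```python
from collections import Counter
from typing import Iterable, Tuple


def mllm_attack_accuracy(replies: Iterable[str]) -> Tuple[float, Counter]:
    """Hypothetical helper: turn free-text MLLM replies to 'same individual?'
    queries into an attack accuracy. Replies that are neither yes nor no
    (e.g. refusals) are tallied as 'invalid' and left out of the denominator."""
    tally: Counter = Counter()
    for reply in replies:
        words = reply.strip().lower().split()
        # Look only at the first word, stripped of surrounding punctuation.
        first = words[0].strip('.,!:;"\'') if words else ""
        tally[first if first in ("yes", "no") else "invalid"] += 1
    valid = tally["yes"] + tally["no"]
    return (tally["yes"] / valid if valid else 0.0), tally


acc, counts = mllm_attack_accuracy(
    ["Yes", "No.", "yes, same person", "I cannot help identify people."])
print(acc, dict(counts))
```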

If this is right

  • Reported attack accuracies in existing model inversion studies are inflated by false positives.
  • Privacy leakage from model inversion has been overstated in prior research.
  • Reevaluation of 27 MI attack setups across datasets and models shows consistently high false-positive rates under the old method.
  • MLLM-based evaluation reduces Type I adversarial transferability and provides a more reliable privacy assessment standard.
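For each reassessed setup, the comparison comes down to paired verdicts on the same reconstructions. The sketch below uses hypothetical names and toy data. It reports both accuracies and the share of conventional hits that the MLLM rejects. It assumes, as the paper's load-bearing premise does, that the MLLM verdict is the better proxy for identity. Defining the false-positive rate relative to conventional hits is our choice and may not match the paper's exact definition.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class SetupReport:  # hypothetical container, not from the paper
    att_acc_curr: float         # accuracy under the conventional evaluation model
    att_acc_mllm: float         # accuracy under MLLM verdicts
    false_positive_rate: float  # share of conventional hits the MLLM rejects


def compare_setup(curr_hits: Sequence[bool], mllm_hits: Sequence[bool]) -> SetupReport:
    """Compare paired per-reconstruction verdicts for one attack setup."""
    if not curr_hits or len(curr_hits) != len(mllm_hits):
        raise ValueError("need paired, non-empty verdict lists")
    n = len(curr_hits)
    # MLLM verdicts on exactly the reconstructions the evaluation model accepted.
    accepted = [m for c, m in zip(curr_hits, mllm_hits) if c]
    rejected_by_mllm = sum(1 for m in accepted if not m)
    return SetupReport(
        att_acc_curr=sum(curr_hits) / n,
        att_acc_mllm=sum(mllm_hits) / n,
        false_positive_rate=rejected_by_mllm / len(accepted) if accepted else 0.0,
    )


report = compare_setup([True, True, True, False], [True, False, False, False])
print(report)
```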

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Evaluation problems of this kind could appear in other privacy attack studies that depend on task-specific models for success measurement.
  • Techniques developed to detect adversarial examples might be adapted to strengthen MI evaluation methods.
  • Future work could combine MLLM judgments with other checks to create even more robust privacy assessment protocols.

Load-bearing premise

Multimodal large language models have general-purpose visual reasoning that prevents them from sharing the task-specific vulnerabilities of conventional evaluation models.

What would settle it

If independent MLLM evaluations on reconstructions from prior MI papers find that a large share of those previously labeled successful do not match the target person's visual identity, the claim of widespread false positives would be supported.

Figures

Figures reproduced from arXiv: 2505.03519 by Alexander Binder, Koh Jun Hao, Ngai-Man Cheung, Ngoc-Bao Nguyen, Sy-Tuyen Ho.

Figure 1. We present the first in-depth study of Model Inversion (MI) evaluation. In particular, we investigate the most common MI evaluation framework, FCurr, used to measure MI Attack Accuracy (AttAcc). FCurr is introduced in [59] and is used to assess almost all recent MI attacks/defenses. However, we find that FCurr suffers from a significant number of false positives. These false positive MI reconstructed s…
Figure 2. An example of an evaluation query in our FMLLM. The task is to determine whether “Image A” depicts the same individual as those in “Image B”. We have two setups: (1) “Image A” and “Image B” both consist of private images, and (2) “Image A” is an MI-reconstructed image x^r_y of the target label y, while four real images of y are randomly selected as “Image B”. The MLLM is tasked with responding either “Yes” or “No” to in…
Figure 3. Examples of ChatGPT-5 refusing to evaluate MI-related queries.
Figure 4. Our detailed implementation of the MLLM-based MI evaluation framework FMLLM. Each reconstructed image is paired with a set of private training images to construct an evaluation query image. Each evaluation query image is then passed to Gemini with a textual prompt; details of the textual prompt can be found in Sec. 7.2. The final attack accuracy is computed from Gemini’s responses.
Figure 5. Additional visualization of false positives. These MI…
Figure 6. Additional visualization of false positives. These MI false…
Figure 8. Additional visualization of false positives. These MI…
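Figures 2 and 4 describe how a query is assembled. One candidate image (“Image A”) is paired with four randomly selected real images of the same label (“Image B”), and the MLLM answers “Yes” or “No”. A minimal sketch of that pairing step follows. It assumes private images are indexed by label as file paths. `build_query`, `EvalQuery`, and the prompt wording are hypothetical. Compositing the images into one query image and calling Gemini are left out, because their exact form is not given here. Excluding Image A from the reference pool matters for setup (1), where Image A is itself a private image, so the query never compares an image with itself.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical prompt wording; the paper gives its actual prompt in Sec. 7.2.
PROMPT = ('Does "Image A" show the same individual as the people in "Image B"? '
          'Answer only "Yes" or "No".')


@dataclass
class EvalQuery:  # hypothetical container
    image_a: str        # MI reconstruction x^r_y, or a private image in setup (1)
    image_b: List[str]  # real private images of label y
    prompt: str = PROMPT


def build_query(image_a: str, label: int, private_index: Dict[int, List[str]],
                n_refs: int = 4, rng: Optional[random.Random] = None) -> EvalQuery:
    """Pair one candidate image with n_refs randomly chosen private images of `label`."""
    # Fixed seed by default for repeatability; pass your own rng to vary draws.
    rng = rng if rng is not None else random.Random(0)
    # Never use Image A as its own reference (relevant for setup 1).
    pool = [p for p in private_index[label] if p != image_a]
    if len(pool) < n_refs:
        raise ValueError(f"label {label} has only {len(pool)} usable private images")
    return EvalQuery(image_a=image_a, image_b=rng.sample(pool, n_refs))


index = {7: [f"private/7/{i}.png" for i in range(10)]}
query = build_query("recon/7_0.png", 7, index)
print(query.image_b)
```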
Original abstract

Model Inversion (MI) attacks aim to reconstruct information from private training data by exploiting access to a target model. Nearly all recent MI studies evaluate attack success using a standard framework that computes attack accuracy through a secondary evaluation model trained on the same private data and task design as the target model. In this paper, we present the first in-depth analysis of this dominant evaluation framework and reveal a fundamental issue: many reconstructions deemed successful under the existing framework are in fact false positives that do not capture the visual identity of the target individual. We first show that these MI false positives satisfy the same formal conditions as Type I adversarial examples. Through controlled experiments, we demonstrate extremely high false-positive transferability, an empirical signature characteristic of adversarial behavior, indicating that many MI false positives likely contain Type I adversarial features. This adversarial transferability significantly inflates reported attack accuracy and leads to an overstatement of privacy leakage in existing MI work. To address this issue, as our second contribution, we introduce a new evaluation framework based on MLLMs, whose general-purpose visual reasoning avoids the shared-task vulnerability and reduces the Type I adversarial transferability of the current evaluation framework. We propose systematic design principles for MLLM-based evaluation. Using this framework, we reassess 27 MI attack setups across diverse datasets, target models, and priors, and find consistently high false-positive rates under the conventional approach. Our results call for a reevaluation of progress in MI research and establish MLLM-based evaluation as a more reliable standard for assessing privacy risks in machine learning systems. Code, data, and prompts are available at https://hosytuyen.github.io/projects/FMLLM
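The abstract uses "false-positive transferability" as its empirical signature. One way to read the term is this: take the reconstructions that one evaluation model accepts even though they do not show the target identity, and ask how many of them a second, independently trained evaluation model also accepts. A minimal sketch under that reading follows. The identity verdicts are assumed to come from an external judge, such as a human or an MLLM. `fp_transfer_rate` and the toy verdicts are hypothetical, and the paper's controlled-experiment protocol may define the rate differently.

```python
from typing import Sequence


def fp_transfer_rate(e1_hits: Sequence[bool], e2_hits: Sequence[bool],
                     identity_match: Sequence[bool]) -> float:
    """Hypothetical helper: among reconstructions that evaluator E1 accepts but
    that do NOT match the target identity (MI false positives), return the
    share that a second, independently trained evaluator E2 also accepts."""
    if not (len(e1_hits) == len(e2_hits) == len(identity_match)):
        raise ValueError("verdict lists must be aligned")
    e2_on_fps = [e2 for e1, e2, ok in zip(e1_hits, e2_hits, identity_match)
                 if e1 and not ok]
    if not e2_on_fps:
        raise ValueError("E1 produced no false positives")
    return sum(e2_on_fps) / len(e2_on_fps)


rate = fp_transfer_rate(
    e1_hits=[True, True, True, True, False],
    e2_hits=[True, True, False, True, True],
    identity_match=[False, False, False, True, False],
)
print(rate)
```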

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper argues that the dominant evaluation framework for model inversion (MI) attacks—using a secondary model trained on the same private data and task—produces many false positives that do not capture the target's visual identity. It shows these false positives exhibit high adversarial transferability akin to Type I adversarial examples, inflating reported attack accuracies and overstating privacy leakage. As a remedy, the authors introduce an MLLM-based evaluation framework with systematic prompt design principles that purportedly avoids shared-task vulnerabilities, then reassess 27 MI setups across datasets, models, and priors to report consistently high false-positive rates under conventional metrics.

Significance. If the central empirical findings hold, the work would have substantial impact on privacy research in machine learning by exposing a systemic flaw in how MI attack success is measured and by supplying a reproducible alternative protocol. The release of code, data, and prompts strengthens the contribution by enabling direct verification and adoption. The result challenges the validity of a large body of prior MI literature and could shift evaluation standards toward more general-purpose visual reasoning tools.

major comments (3)
  1. §4 (MLLM Evaluation Framework): The claim that MLLM-based evaluation reliably measures visual identity capture rests on the assumption that MLLM judgments correlate with human recognition of individuals, yet no human validation study or correlation analysis is reported. Without this, the reported false-positive rates and the conclusion that conventional metrics overstate privacy leakage remain dependent on an unverified proxy.
  2. §5.3 (Reassessment of 27 setups): The paper states that false-positive criteria are applied systematically, but the exact decision rules, prompt templates, and threshold definitions for declaring an MLLM output a false positive versus a true identity match are only sketched. This lack of operational detail makes it difficult to reproduce the key quantitative result that conventional accuracy is consistently inflated.
  3. §3.1 (Adversarial characterization): The formal mapping of MI false positives to Type I adversarial examples is presented as a key insight, but the controlled experiments demonstrating 'extremely high false-positive transferability' do not report the precise attack success thresholds or the number of transfer trials used to establish the empirical signature.
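The third major comment concerns statistical reporting: a transfer rate means little without the number of trials behind it. One standard way to attach that information is a Wilson score interval for a binomial proportion, sketched below. Wilson is our suggestion, not something the paper or the rebuttal specifies, and the 90-of-100 counts are hypothetical.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need trials > 0 and 0 <= successes <= trials")
    p = successes / trials
    denom = 1 + z ** 2 / trials
    centre = (p + z ** 2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z ** 2 / (4 * trials ** 2)) / denom
    # Clamp to [0, 1] to absorb floating-point error at the extremes.
    return max(0.0, centre - half), min(1.0, centre + half)


# Hypothetical: 90 of 100 false positives also fool a second evaluator.
low, high = wilson_interval(90, 100)
print(f"{low:.3f}-{high:.3f}")
```

Unlike the plain normal approximation, the Wilson interval stays inside [0, 1] and behaves sensibly when the observed rate is close to 0 or 1. This matters when transferability is reported as "extremely high".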
minor comments (2)
  1. Figure 2: The caption and axis labels should explicitly state the number of transfer trials and the exact success metric used to quantify transferability.
  2. §1: The abstract and §1 refer to '27 MI attack setups' without a summary table listing the specific attacks, datasets, and target models; adding such a table would improve clarity.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their detailed and insightful comments, which have helped us identify areas for improvement in clarity and reproducibility. We address each major comment below, indicating the revisions we plan to make in the updated manuscript.

Point-by-point responses
  1. Referee (§4, MLLM Evaluation Framework): The claim that MLLM-based evaluation reliably measures visual identity capture rests on the assumption that MLLM judgments correlate with human recognition of individuals, yet no human validation study or correlation analysis is reported. Without this, the reported false-positive rates and the conclusion that conventional metrics overstate privacy leakage remain dependent on an unverified proxy.

    Authors: We agree that a direct correlation analysis with human judgments would provide stronger validation for the MLLM framework. In the revised manuscript we will add a dedicated limitations subsection discussing this point and include results from a small-scale human study on a representative subset of reconstructions to report agreement rates between MLLM outputs and human assessments. revision: yes

  2. Referee (§5.3, Reassessment of 27 setups): The paper states that false-positive criteria are applied systematically, but the exact decision rules, prompt templates, and threshold definitions for declaring an MLLM output a false positive versus a true identity match are only sketched. This lack of operational detail makes it difficult to reproduce the key quantitative result that conventional accuracy is consistently inflated.

    Authors: We thank the referee for highlighting the need for greater operational detail. In the revision we will expand Section 5.3 and add a new appendix containing the complete prompt templates, exact decision rules, threshold values, and pseudocode for the full MLLM evaluation pipeline to ensure full reproducibility of the reported false-positive rates. revision: yes

  3. Referee (§3.1, Adversarial characterization): The formal mapping of MI false positives to Type I adversarial examples is presented as a key insight, but the controlled experiments demonstrating 'extremely high false-positive transferability' do not report the precise attack success thresholds or the number of transfer trials used to establish the empirical signature.

    Authors: We acknowledge that the experimental description in Section 3.1 lacks sufficient quantitative detail. In the revised version we will explicitly report the attack success thresholds, the exact number of transfer trials performed, and any statistical measures used to quantify the high false-positive transferability observed in the controlled experiments. revision: yes
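The human study promised in the first response would typically be summarised by raw agreement plus a chance-corrected statistic. The sketch below computes Cohen's kappa for paired yes/no identity verdicts from the MLLM and from human raters. Kappa is our choice of statistic, not one the rebuttal commits to. The verdict lists in the example are made up for illustration.

```python
import math
from typing import Sequence, Tuple


def agreement(mllm: Sequence[bool], human: Sequence[bool]) -> Tuple[float, float]:
    """Raw agreement and Cohen's kappa for paired yes/no identity verdicts."""
    n = len(mllm)
    if n == 0 or n != len(human):
        raise ValueError("need paired, non-empty verdict lists")
    observed = sum(a == b for a, b in zip(mllm, human)) / n
    p_m, p_h = sum(mllm) / n, sum(human) / n
    # Agreement expected by chance if both raters answered independently
    # with their observed "yes" rates.
    expected = p_m * p_h + (1 - p_m) * (1 - p_h)
    if expected == 1.0:
        # Both raters gave the same constant answer: kappa is undefined.
        return observed, math.nan
    return observed, (observed - expected) / (1 - expected)


obs, kappa = agreement([True, True, False, False, True, False],
                       [True, False, False, False, True, True])
print(round(obs, 3), round(kappa, 3))
```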

Circularity Check

0 steps flagged

No significant circularity in empirical critique or new evaluation protocol

Full rationale

The paper is an empirical critique of existing model inversion success metrics, demonstrating via controlled experiments that many reported successes are false positives with adversarial transferability properties, followed by introduction of an MLLM-based evaluation framework. No equations, derivations, or first-principles results are present that reduce claimed outcomes to fitted inputs or self-citations by construction. The analysis reassesses 27 prior setups using new measurements without definitional loops, self-referential uniqueness theorems, or renaming of known results as novel unifications. The work rests on direct experimental evidence against external benchmarks rather than on tautological reductions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the empirical observation that task-specific evaluation models share vulnerabilities with the attack, plus the assumption that MLLMs do not share those vulnerabilities. No free parameters or invented entities are introduced.

axioms (1)
  • domain assumption: Task-specific evaluation models trained on the same private data and task as the target model are vulnerable to Type I adversarial features in reconstructions.
    Invoked to explain why the conventional accuracy metric produces false positives.


Reference graph

Works this paper leans on


  [1] Shengwei An, Guanhong Tao, Qiuling Xu, Yingqi Liu, Guangyu Shen, Yuan Yao, Jingwei Xu, and Xiangyu Zhang. Mirror: Model inversion for deep learning network with high fidelity. In Proceedings of the 29th Network and Distributed System Security Symposium, 2022.
  [2] Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP), pages 39–57. IEEE, 2017.
  [3] Xuankai Chang, Wangyou Zhang, Yanmin Qian, Jonathan Le Roux, and Shinji Watanabe. End-to-end multi-speaker speech recognition with transformer. In ICASSP 2020, pages 6134–6138. IEEE, 2020.
  [4] Si Chen, Mostafa Kahla, Ruoxi Jia, and Guo-Jun Qi. Knowledge-enriched distributional model inversion attacks. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 16178–16187, 2021.
  [5] Google Cloud Community. Simplifying data preparation for Gen AI with Google's Gemini. Google Cloud Community. https://www.googlecloudcommunity.com/gc/Cloud-Product-Articles/Simplifying-Data-Preparation-for-Gen-AI-with-Google-s-Gemini/ta-p/841925
  [6] Ambra Demontis, Marco Melis, Maura Pintor, Matthew Jagielski, Battista Biggio, Alina Oprea, Cristina Nita-Rotaru, and Fabio Roli. Why do adversarial attacks transfer? Explaining transferability of evasion and poisoning attacks. In 28th USENIX Security Symposium (USENIX Security 19), pages 321–338, 2019.
  [7] Jonas Dippel, Steffen Vogler, and Johannes Höhne. Towards fine-grained visual representations by combining contrastive learning with image reconstruction and attention-weighted pooling. arXiv preprint arXiv:2104.04323, 2021.
  [8] Benoit Dufumier, Pietro Gori, Julie Victor, Antoine Grigis, Michele Wessa, Paolo Brambilla, Pauline Favre, Mircea Polosan, Colm Mcdonald, Camille Marie Piguet, et al. Contrastive learning with continuous proxy meta-data for 3D MRI classification. In MICCAI 2021. Springer, 2021.
  [9] Matthew Fredrikson, Eric Lantz, Somesh Jha, Simon Lin, David Page, and Thomas Ristenpart. Privacy in pharmacogenetics: An end-to-end case study of personalized warfarin dosing. In 23rd USENIX Security Symposium (USENIX Security 14), pages 17–32, 2014.
  [10] Matt Fredrikson, Somesh Jha, and Thomas Ristenpart. Model inversion attacks that exploit confidence information and basic countermeasures. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pages 1322–1333, 2015.
  [11] Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.
  [12] Chuan Guo, Geoff Pleiss, Yu Sun, and Kilian Q Weinberger. On calibration of modern neural networks. In ICML, pages 1321–1330. PMLR, 2017.
  [13] Jianzhu Guo, Xiangyu Zhu, Chenxu Zhao, Dong Cao, Zhen Lei, and Stan Z Li. Learning meta face recognition in unseen domains. In CVPR, pages 6163–6172, 2020.
  [14] Gyojin Han, Jaehyun Choi, Haeil Lee, and Junmo Kim. Reinforcement learning-based black-box model inversion attacks. In CVPR, pages 20504–20513, 2023.
  [15] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
  [16] Sy-Tuyen Ho, Koh Jun Hao, Keshigeyan Chandrasegaran, Ngoc-Bao Nguyen, and Ngai-Man Cheung. Model inversion robustness: Can transfer learning help? In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12183–12193, 2024.
  [17] Gao Huang, Zhuang Liu, Laurens Van Der Maaten, and Kilian Q Weinberger. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4700–4708, 2017.
  [18] Yuge Huang, Yuhan Wang, Ying Tai, Xiaoming Liu, Pengcheng Shen, Shaoxin Li, Jilin Li, and Feiyue Huang. CurricularFace: Adaptive curriculum learning loss for deep face recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5901–5910, 2020.
  [19] Andrew Ilyas, Shibani Santurkar, Dimitris Tsipras, Logan Engstrom, Brandon Tran, and Aleksander Madry. Adversarial examples are not bugs, they are features. 2019.
  [20] Mostafa Kahla, Si Chen, Hoang Anh Just, and Ruoxi Jia. Label-only model inversion attacks via boundary repulsion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 15045–15053, 2022.
  [21] Tero Karras, Samuli Laine, and Timo Aila. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4401–4410, 2019.
  [22] Tero Karras, Miika Aittala, Janne Hellsten, Samuli Laine, Jaakko Lehtinen, and Timo Aila. Training generative adversarial networks with limited data. Advances in Neural Information Processing Systems, 33:12104–12114, 2020.
  [23] Jun Hao Koh, Sy-Tuyen Ho, Ngoc-Bao Nguyen, and Ngai-Man Cheung. On the vulnerability of skip connections to model inversion attacks. In European Conference on Computer Vision, 2024.
  [24] Gautam Krishna, Co Tran, Jianguo Yu, and Ahmed H Tewfik. Speech recognition with no speech or with noisy speech. In ICASSP 2019, IEEE International Conference on Acoustics, Speech and Signal Processing, pages 1090–1094. IEEE, 2019.
  [25] Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial machine learning at scale. arXiv preprint arXiv:1611.01236, 2016.
  [26] Sang Lee and Anisha Patel. Gemini-assisted deep learning classification model for automated medical image annotation. Journal of Digital Imaging, 37:123–135, 2024.
  [27] Ziang Li, Hongguang Zhang, Juan Wang, Meihui Chen, Hongxin Hu, Wenzhe Yi, Xiaoyang Xu, Mengda Yang, and Chenjun Ma. From head to tail: Efficient black-box model inversion attack via long-tailed learning. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 29288–29298, 2025.
  [28] Kaizhao Liang, Jacky Y Zhang, Boxin Wang, Zhuolin Yang, Sanmi Koyejo, and Bo Li. Uncovering the connections between adversarial transferability and knowledge transferability. In International Conference on Machine Learning, pages 6577–6587. PMLR, 2021.
  [29] Ziwei Liu, Ping Luo, Xiaogang Wang, and Xiaoou Tang. Deep learning face attributes in the wild. In Proceedings of the IEEE International Conference on Computer Vision, pages 3730–3738, 2015.
  [30] Avery Ma, Amir-massoud Farahmand, Yangchen Pan, Philip Torr, and Jindong Gu. Improving adversarial transferability via model alignment. In European Conference on Computer Vision, pages 74–92. Springer, 2024.
  [31] Qiang Meng, Shichao Zhao, Zhida Huang, and Feng Zhou. MagFace: A universal representation for face recognition and quality assessment. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14225–14234, 2021.
  [32] Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. DeepFool: A simple and accurate method to fool deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2574–2582, 2016.
  [33] Hong-Wei Ng and Stefan Winkler. A data-driven approach to cleaning large face datasets. In 2014 IEEE International Conference on Image Processing (ICIP), pages 343–347. IEEE, 2014.
  [34] Anh Nguyen, Jason Yosinski, and Jeff Clune. Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 427–436, 2015.
  [35] Ngoc-Bao Nguyen, Keshigeyan Chandrasegaran, Milad Abdollahzadeh, and Ngai-Man Cheung. Re-thinking model inversion attacks against deep neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023.
  [36] Ngoc-Bao Nguyen, Keshigeyan Chandrasegaran, Milad Abdollahzadeh, and Ngai-Man Cheung. Label-only model inversion attacks via knowledge transfer. In Thirty-seventh Conference on Neural Information Processing Systems, 2023.
  [37] Nicolas Papernot, Patrick McDaniel, and Ian Goodfellow. Transferability in machine learning: From phenomena to black-box attacks using adversarial samples. arXiv preprint arXiv:1605.07277, 2016.
  [38] Nicolas Papernot, Patrick McDaniel, Somesh Jha, Matt Fredrikson, Z Berkay Celik, and Ananthram Swami. The limitations of deep learning in adversarial settings. In 2016 IEEE European Symposium on Security and Privacy (EuroS&P), pages 372–387. IEEE, 2016.
  [39] Xiong Peng, Feng Liu, Jingfeng Zhang, Long Lan, Junjie Ye, Tongliang Liu, and Bo Han. Bilateral dependency optimization: Defending against model-inversion attacks. In KDD, 2022.
  [40] Xiong Peng, Bo Han, Feng Liu, Tongliang Liu, and Mingyuan Zhou. Pseudo-private data guided model inversion attacks. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024.
  [41] Zeyu Qin, Yanbo Fan, Yi Liu, Li Shen, Yong Zhang, Jue Wang, and Baoyuan Wu. Boosting the transferability of adversarial attacks with reverse adversarial perturbation. Pages 29845–29858, 2022.
  [42] Yixiang Qiu, Hao Fang, Hongyao Yu, Bin Chen, MeiKang Qiu, and Shu-Tao Xia. A closer look at GAN priors: Exploiting intermediate features for enhanced model inversion attacks. In Proceedings of European Conference on Computer Vision, 2024.
  [43] Florian Schroff, Dmitry Kalenichenko, and James Philbin. FaceNet: A unified embedding for face recognition and clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 815–823, 2015.
  [44] Ali Shafahi, Mahyar Najibi, Mohammad Amin Ghiasi, Zheng Xu, John Dickerson, Christoph Studer, Larry S Davis, Gavin Taylor, and Tom Goldstein. Adversarial training for free! Advances in Neural Information Processing Systems, 32, 2019.
  [45] Laura Smith and Wei Chen. Capabilities of Gemini models in medicine. arXiv preprint arXiv:2404.18416, 2024.
  [46] Lukas Struppek, Dominik Hintersdorf, Antonio De Almeida Correira, Antonia Adler, and Kristian Kersting. Plug & play attacks: Towards robust and flexible model inversion attacks. In International Conference on Machine Learning, pages 20522–20545. PMLR, 2022.
  [47] Lukas Struppek, Dominik Hintersdorf, and Kristian Kersting. Be careful what you smooth for: Label smoothing can be a privacy shield but also a catalyst for model inversion attacks. In ICLR, 2024.
  [48] Lukas Struppek, Dominik Hintersdorf, and Kristian Kersting. Be careful what you smooth for: Label smoothing can be a privacy shield but also a catalyst for model inversion attacks. In The Twelfth International Conference on Learning Representations, 2024.
  [49] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.
  [50] Sanli Tang, Xiaolin Huang, Mingjian Chen, Chengjin Sun, and Jie Yang. Adversarial attack type I: Cheat classifiers by significant changes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(3):1100–1109, 2019.
  [51] Zhengzhong Tu, Hossein Talebi, Han Zhang, Feng Yang, Peyman Milanfar, Alan Bovik, and Yinxiao Li. MaxViT: Multi-axis vision transformer. In European Conference on Computer Vision, pages 459–479. Springer, 2022.
  [52] Kuan-Chieh Wang, Yan Fu, Ke Li, Ashish Khisti, Richard Zemel, and Alireza Makhzani. Variational model inversion attacks. Advances in Neural Information Processing Systems, 34:9706–9719, 2021.
  [53] Tianhao Wang, Yuheng Zhang, and Ruoxi Jia. Improving robustness to model inversion attacks via mutual information regularization. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11666–11673, 2021.
  [54] Hongxin Wei, Renchunzi Xie, Hao Cheng, Lei Feng, Bo An, and Yixuan Li. Mitigating neural network overconfidence with logit normalization. In International Conference on Machine Learning, pages 23631–23644. PMLR, 2022.
  [55] Dongxian Wu, Yisen Wang, Shu-Tao Xia, James Bailey, and Xingjun Ma. Skip connections matter: On the transferability of adversarial examples generated with ResNets. In International Conference on Learning Representations, 2020.
  [56] Jiawei Yang, Hanbo Chen, Jiangpeng Yan, Xiaoyu Chen, and Jianhua Yao. Towards better understanding and better generalization of few-shot classification in histology images with contrastive learning. 2022.
  [57] Ziqi Yang, Jiyi Zhang, Ee-Chien Chang, and Zhenkai Liang. Neural network inversion in adversarial setting via background knowledge alignment. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, pages 225–240, 2019.
  [58] Xiaojian Yuan, Kejiang Chen, Jie Zhang, Weiming Zhang, Nenghai Yu, and Yang Zhang. Pseudo label-guided model inversion attack via conditional generative adversarial network. Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI 23), 2023.
  [59] Yuheng Zhang, Ruoxi Jia, Hengzhi Pei, Wenxiao Wang, Bo Li, and Dawn Song. The secret revealer: Generative model-inversion attacks against deep neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 253–261, 2020.
  [60] Yechao Zhang, Shengshan Hu, Leo Yu Zhang, Junyu Shi, Minghui Li, Xiaogeng Liu, Wei Wan, and Hai Jin. Towards understanding adversarial transferability from surrogate training. IEEE Symposium on Security and Privacy (SP), 2024.
  [61] Yi Zhou, Cheng Li, and Manish Gupta. A semi-automated approach for crafting outputs with Gemini Pro. Advanced Engineering Informatics, 61:101432, 2024.