Revisiting Model Inversion Evaluation: From Misleading Standards to Reliable Privacy Assessment
Pith reviewed 2026-05-22 15:44 UTC · model grok-4.3
The pith
Many model inversion reconstructions counted as successful do not capture the target's visual identity.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The dominant evaluation framework for model inversion attacks computes attack accuracy through a secondary evaluation model trained on the same private data and task design as the target model. Many reconstructions that satisfy this framework do not capture the visual identity of the target individual. These MI false positives satisfy the same formal conditions as Type I adversarial examples and show extremely high false-positive transferability, indicating they contain Type I adversarial features. This transferability inflates reported attack accuracy and overstates privacy leakage. An MLLM-based evaluation framework, built on systematic design principles, avoids the shared-task weakness of
What carries the argument
MLLM-based evaluation framework, which applies multimodal large language models' general-purpose visual reasoning to judge whether a reconstruction matches a target's identity instead of relying on task-specific classification.
If this is right
- Reported attack accuracies in existing model inversion studies are inflated by false positives.
- Privacy leakage from model inversion has been overstated in prior research.
- Reevaluation of 27 MI attack setups across datasets and models shows consistently high false-positive rates under the old method.
- MLLM-based evaluation reduces Type I adversarial transferability and provides a more reliable privacy assessment standard.
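To make the gap between the two measurements concrete, here is a minimal sketch of how conventional attack accuracy and a false-positive rate could be computed side by side. The `Reconstruction` record, its field names, and the choice to define the false-positive rate over evaluator-approved reconstructions are assumptions for illustration; the paper's exact definitions may differ.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Reconstruction:
    target_label: int           # identity the attack tried to reconstruct
    evaluator_label: int        # label predicted by the conventional evaluation model
    judged_same_identity: bool  # hypothetical verdict of an independent identity judge


def attack_accuracy(recs: Sequence[Reconstruction]) -> float:
    """Conventional metric: share of reconstructions the evaluator assigns to the target label."""
    if not recs:
        raise ValueError("no reconstructions given")
    hits = sum(r.evaluator_label == r.target_label for r in recs)
    return hits / len(recs)


def false_positive_rate(recs: Sequence[Reconstruction]) -> float:
    """Share of evaluator-approved reconstructions that the identity judge rejects."""
    approved = [r for r in recs if r.evaluator_label == r.target_label]
    if not approved:
        return 0.0
    rejected = sum(not r.judged_same_identity for r in approved)
    return rejected / len(approved)


# Toy data, not taken from the paper.
recs = [
    Reconstruction(target_label=3, evaluator_label=3, judged_same_identity=True),
    Reconstruction(target_label=3, evaluator_label=3, judged_same_identity=False),
    Reconstruction(target_label=5, evaluator_label=5, judged_same_identity=False),
    Reconstruction(target_label=5, evaluator_label=2, judged_same_identity=False),
]
print(attack_accuracy(recs), false_positive_rate(recs))
```

In this toy data three of four reconstructions count as successes under the conventional metric, yet two of those three are rejected by the judge, which is the kind of inflation the summary describes.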
Where Pith is reading between the lines
- Evaluation problems of this kind could appear in other privacy attack studies that depend on task-specific models for success measurement.
- Techniques developed to detect adversarial examples might be adapted to strengthen MI evaluation methods.
- Future work could combine MLLM judgments with other checks to create even more robust privacy assessment protocols.
Load-bearing premise
Multimodal large language models have general-purpose visual reasoning that prevents them from sharing the task-specific vulnerabilities of conventional evaluation models.
What would settle it
If independent MLLM evaluations on reconstructions from prior MI papers find that a large share of those previously labeled successful do not match the target person's visual identity, the claim of widespread false positives would be supported.
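One way to quantify "a large share" is to report the observed fraction together with an interval estimate. The sketch below uses the standard Wilson score interval for a binomial proportion; the function name and the example counts are hypothetical and do not come from the paper.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")
    p = successes / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# Hypothetical counts: of 200 reconstructions previously labeled successful,
# an independent evaluation judges 120 as not matching the target identity.
low, high = wilson_interval(120, 200)
print(f"false-positive share: 0.60, interval: [{low:.3f}, {high:.3f}]")
```

An interval that stays well above zero across datasets and attacks would support the claim; an interval near zero would count against it.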
Original abstract
Model Inversion (MI) attacks aim to reconstruct information from private training data by exploiting access to a target model. Nearly all recent MI studies evaluate attack success using a standard framework that computes attack accuracy through a secondary evaluation model trained on the same private data and task design as the target model. In this paper, we present the first in-depth analysis of this dominant evaluation framework and reveal a fundamental issue: many reconstructions deemed successful under the existing framework are in fact false positives that do not capture the visual identity of the target individual. We first show that these MI false positives satisfy the same formal conditions as Type I adversarial examples. Through controlled experiments, we demonstrate extremely high false-positive transferability, an empirical signature characteristic of adversarial behavior, indicating that many MI false positives likely contain Type I adversarial features. This adversarial transferability significantly inflates reported attack accuracy and leads to an overstatement of privacy leakage in existing MI work. To address this issue, as our second contribution, we introduce a new evaluation framework based on MLLMs, whose general-purpose visual reasoning avoids the shared-task vulnerability and reduces the Type I adversarial transferability of the current evaluation framework. We propose systematic design principles for MLLM-based evaluation. Using this framework, we reassess 27 MI attack setups across diverse datasets, target models, and priors, and find consistently high false-positive rates under the conventional approach. Our results call for a reevaluation of progress in MI research and establish MLLM-based evaluation as a more reliable standard for assessing privacy risks in machine learning systems. Code, data, and prompts are available at https://hosytuyen.github.io/projects/FMLLM
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper argues that the dominant evaluation framework for model inversion (MI) attacks—using a secondary model trained on the same private data and task—produces many false positives that do not capture the target's visual identity. It shows these false positives exhibit high adversarial transferability akin to Type I adversarial examples, inflating reported attack accuracies and overstating privacy leakage. As a remedy, the authors introduce an MLLM-based evaluation framework with systematic prompt design principles that purportedly avoids shared-task vulnerabilities, then reassess 27 MI setups across datasets, models, and priors to report consistently high false-positive rates under conventional metrics.
Significance. If the central empirical findings hold, the work would have substantial impact on privacy research in machine learning by exposing a systemic flaw in how MI attack success is measured and by supplying a reproducible alternative protocol. The release of code, data, and prompts strengthens the contribution by enabling direct verification and adoption. The result challenges the validity of a large body of prior MI literature and could shift evaluation standards toward more general-purpose visual reasoning tools.
major comments (3)
- [§4] §4 (MLLM Evaluation Framework): The claim that MLLM-based evaluation reliably measures visual identity capture rests on the assumption that MLLM judgments correlate with human recognition of individuals, yet no human validation study or correlation analysis is reported. Without this, the reported false-positive rates and the conclusion that conventional metrics overstate privacy leakage remain dependent on an unverified proxy.
- [§5.3] §5.3 (Reassessment of 27 setups): The paper states that false-positive criteria are applied systematically, but the exact decision rules, prompt templates, and threshold definitions for declaring an MLLM output a false positive versus a true identity match are only sketched. This lack of operational detail makes it difficult to reproduce the key quantitative result that conventional accuracy is consistently inflated.
- [§3.1] §3.1 (Adversarial characterization): The formal mapping of MI false positives to Type I adversarial examples is presented as a key insight, but the controlled experiments demonstrating 'extremely high false-positive transferability' do not report the precise attack success thresholds or the number of transfer trials used to establish the empirical signature.
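The reporting the third comment asks for can be illustrated with a small sketch that returns both the number of transfer trials and the number that transferred. The `Trial` record and the definition used here (a false positive on the target model "transfers" when the evaluation model also predicts the target label) are assumptions made for illustration, not the paper's stated protocol.

```python
from typing import Iterable, NamedTuple, Tuple


class Trial(NamedTuple):
    target_label: int        # identity the attack aimed at
    target_model_label: int  # prediction of the attacked (target) model
    evaluator_label: int     # prediction of the separate evaluation model
    is_identity_match: bool  # hypothetical ground-truth identity judgment


def false_positive_transfer(trials: Iterable[Trial]) -> Tuple[int, int]:
    """Return (transferred, total) counted over false positives of the target model."""
    total = 0
    transferred = 0
    for t in trials:
        # A false positive fools the target model without matching the identity.
        fooled_target = t.target_model_label == t.target_label and not t.is_identity_match
        if not fooled_target:
            continue
        total += 1
        if t.evaluator_label == t.target_label:
            transferred += 1
    return transferred, total


# Toy data, not taken from the paper.
trials = [
    Trial(7, 7, 7, False),  # false positive that transfers
    Trial(7, 7, 4, False),  # false positive that does not transfer
    Trial(7, 7, 7, True),   # true positive, excluded from the count
    Trial(9, 1, 9, False),  # target model not fooled, excluded
]
transferred, total = false_positive_transfer(trials)
print(f"{transferred}/{total} false positives transferred")
```

Reporting the raw pair rather than only the ratio lets readers see how many trials support the "extremely high" transferability claim.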
minor comments (2)
- [Figure 2] Figure 2 caption and axis labels should explicitly state the number of transfer trials and the exact success metric used to quantify transferability.
- [§1] The abstract and §1 refer to '27 MI attack setups' without a summary table listing the specific attacks, datasets, and target models; adding such a table would improve clarity.
Simulated Author's Rebuttal
We thank the referee for their detailed and insightful comments, which have helped us identify areas for improvement in clarity and reproducibility. We address each major comment below, indicating the revisions we plan to make in the updated manuscript.
-
Referee: [§4] §4 (MLLM Evaluation Framework): The claim that MLLM-based evaluation reliably measures visual identity capture rests on the assumption that MLLM judgments correlate with human recognition of individuals, yet no human validation study or correlation analysis is reported. Without this, the reported false-positive rates and the conclusion that conventional metrics overstate privacy leakage remain dependent on an unverified proxy.
Authors: We agree that a direct correlation analysis with human judgments would provide stronger validation for the MLLM framework. In the revised manuscript we will add a dedicated limitations subsection discussing this point and include results from a small-scale human study on a representative subset of reconstructions to report agreement rates between MLLM outputs and human assessments. revision: yes
-
Referee: [§5.3] §5.3 (Reassessment of 27 setups): The paper states that false-positive criteria are applied systematically, but the exact decision rules, prompt templates, and threshold definitions for declaring an MLLM output a false positive versus a true identity match are only sketched. This lack of operational detail makes it difficult to reproduce the key quantitative result that conventional accuracy is consistently inflated.
Authors: We thank the referee for highlighting the need for greater operational detail. In the revision we will expand Section 5.3 and add a new appendix containing the complete prompt templates, exact decision rules, threshold values, and pseudocode for the full MLLM evaluation pipeline to ensure full reproducibility of the reported false-positive rates. revision: yes
-
Referee: [§3.1] §3.1 (Adversarial characterization): The formal mapping of MI false positives to Type I adversarial examples is presented as a key insight, but the controlled experiments demonstrating 'extremely high false-positive transferability' do not report the precise attack success thresholds or the number of transfer trials used to establish the empirical signature.
Authors: We acknowledge that the experimental description in Section 3.1 lacks sufficient quantitative detail. In the revised version we will explicitly report the attack success thresholds, the exact number of transfer trials performed, and any statistical measures used to quantify the high false-positive transferability observed in the controlled experiments. revision: yes
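For the human study promised in the first response, agreement between MLLM verdicts and human verdicts could be summarized with raw agreement and Cohen's kappa, which corrects for chance agreement. This is a minimal sketch assuming binary "same identity" labels; the function name and sample labels are hypothetical, and the authors may choose a different statistic.

```python
from typing import Sequence


def cohens_kappa(rater_a: Sequence[bool], rater_b: Sequence[bool]) -> float:
    """Cohen's kappa for two raters giving binary labels to the same items."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("need two non-empty label sequences of equal length")
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    p_a = sum(rater_a) / n  # share of positive labels from rater A
    p_b = sum(rater_b) / n  # share of positive labels from rater B
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)  # agreement expected by chance
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters give one constant label")
    return (observed - expected) / (1 - expected)


# Hypothetical verdicts on four reconstructions (True = same identity).
mllm = [True, True, False, False]
human = [True, False, False, False]
print(cohens_kappa(mllm, human))
```

Kappa near 1 would indicate that the MLLM is a reasonable proxy for human identity judgments, while kappa near 0 would mean the agreement is no better than chance.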
Circularity Check
No significant circularity in empirical critique or new evaluation protocol
The paper is an empirical critique of existing model inversion success metrics, demonstrating via controlled experiments that many reported successes are false positives with adversarial transferability properties, followed by introduction of an MLLM-based evaluation framework. No equations, derivations, or first-principles results are present that reduce claimed outcomes to fitted inputs or self-citations by construction. The analysis reassesses 27 prior setups using new measurements without definitional loops, self-referential uniqueness theorems, or renaming of known results as novel unifications. The work is self-contained against external benchmarks through direct experimental evidence rather than tautological reductions.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Task-specific evaluation models trained on the same private data and task as the target model are vulnerable to Type I adversarial features in reconstructions.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
We show that these MI false positives satisfy the same formal conditions as Type I adversarial examples... mathematical equivalence of MI false positives and Type I adversarial examples
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · absolute_floor_iff_bare_distinguishability · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
adversarial transferability... shared-task vulnerability
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.