AI Model Extraction Attacks: Bypassing Single-Client Assumptions in Defenses

Gustavo Sánchez; Johannes F. Loevenich; Laurin Holz; Maxime Schwarzer; Roberto Rigolin F. Lopes; Thies Möhlenhof; Tobias Hürten; Veit Hagenmeyer

arxiv: 2606.03381 · v1 · pith:TFXJMZOYnew · submitted 2026-06-02 · 💻 cs.CR · cs.AI

Maxime Schwarzer, Johannes F. Loevenich, Gustavo Sánchez, Laurin Holz, Thies Möhlenhof, Tobias Hürten, Roberto Rigolin F. Lopes, Veit Hagenmeyer

Pith reviewed 2026-06-28 09:57 UTC · model grok-4.3

keywords: model extraction attacks · single client assumption · PRADA · coordinated adversaries · advanced persistent threats · AI model security · defense bypass

The pith

The single-client assumption in model extraction defenses fails against coordinated attackers.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper argues that current defenses against model extraction attacks rest on the assumption that attacks come from single, isolated clients. It demonstrates that when adversaries coordinate, such as in advanced persistent threats, they can distribute queries to evade detection. Using a new framework to simulate these scenarios, it shows established defenses like PRADA lose effectiveness with simple round-robin strategies. The work calls for shifting to defenses that do not rely on client identity.

Core claim

The Single Client Assumption (SCA) is fundamentally invalid in the presence of coordinated threat actors such as APTs. Well-established defenses like PRADA can be bypassed by basic round-robin query distribution, leading to a significant reduction in detection performance. Even global aggregation approaches can be rendered useless through adaptive traffic mixing, which necessitates stateful, identity-independent defense architectures.
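
To make the round-robin bypass concrete, the following is a minimal Python sketch, not the paper's CerberusAI code. It assumes a hypothetical per-client detector that flags any identity exceeding a fixed query budget; PRADA's actual detection statistic is different and is not reproduced here. The names `round_robin` and `flagged_clients`, the budget of 300, and the client labels are all illustrative assumptions.

```python
from collections import defaultdict
from itertools import cycle


def round_robin(queries, client_ids):
    """Assign each query to the next client identity in turn."""
    assignment = defaultdict(list)
    for query, client in zip(queries, cycle(client_ids)):
        assignment[client].append(query)
    return assignment


def flagged_clients(assignment, per_client_budget):
    """Hypothetical per-client detector (assumption, not PRADA):
    flag any identity whose query count exceeds the budget."""
    return [c for c, qs in assignment.items() if len(qs) > per_client_budget]


queries = list(range(1000))  # stand-in for 1000 extraction queries
budget = 300                 # illustrative per-client budget

# Single identity: all 1000 queries come from one client.
single = round_robin(queries, ["attacker"])

# Coordinated identities: the same queries spread over 5 clients (200 each).
spread = round_robin(queries, [f"client-{i}" for i in range(5)])

print(flagged_clients(single, budget))  # → ['attacker']
print(flagged_clients(spread, budget))  # → []
```

The same 1000 queries reach the model in both cases; only the per-identity view changes, which is the view that a defense built on the SCA depends on.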

What carries the argument

The Single Client Assumption (SCA), the implicit premise that model extraction attacks originate from isolated client identities, which the paper shows does not hold for coordinated adversaries.

If this is right

PRADA and similar defenses show significantly reduced detection performance when queries are distributed round-robin across clients.
Global aggregation defenses become operationally useless against adaptive traffic mixing strategies.
Defense architectures must move toward stateful and identity-independent designs to counter coordinated attacks.
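
The dilution effect behind traffic mixing can be sketched in a similarly hedged way. The example below assumes a hypothetical global detector that alerts when the fraction of suspicious queries in an aggregated window reaches a threshold, and an attacker who pads each extraction query with a fixed number of benign-looking cover queries. The paper's adaptive mixing strategy is not described in this summary, so the fixed ratio, the `ALERT_THRESHOLD` value, and the oracle labels `"attack"` and `"cover"` are assumptions made purely for illustration.

```python
ALERT_THRESHOLD = 0.5  # hypothetical global alert threshold (assumption)


def mix(attack_count, cover_per_attack):
    """Build a traffic window in which every attack query is followed
    by a fixed number of benign-looking cover queries."""
    window = []
    for _ in range(attack_count):
        window.append("attack")
        window.extend(["cover"] * cover_per_attack)
    return window


def suspicious_fraction(window):
    """Share of the aggregated window that an idealized detector
    would label as extraction traffic."""
    return sum(1 for q in window if q == "attack") / len(window)


unmixed = mix(100, 0)  # 100 attack queries, no cover traffic
mixed = mix(100, 3)    # same 100 attack queries, 3 cover queries each

print(suspicious_fraction(unmixed) >= ALERT_THRESHOLD)  # → True
print(suspicious_fraction(mixed) >= ALERT_THRESHOLD)    # → False
```

Even with perfect labels, the aggregated statistic drops from 1.0 to 0.25 while the number of attack queries stays at 100, so a defender must either lower the threshold and accept more false alarms or track state that does not depend on such ratios.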

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Other identity-based defenses in AI security may similarly fail against distributed adversaries.
Testing defenses in simulated multi-client environments could reveal vulnerabilities not apparent in single-client tests.
Real-world deployment in critical infrastructure should account for APT coordination capabilities.

Load-bearing premise

The simulated distributed attack scenarios accurately reflect the query patterns and capabilities of real coordinated adversaries like APTs.

What would settle it

A real-world observation of an APT using distributed query strategies against a deployed model, together with a measurement of whether PRADA or similar systems detect it at the expected rates.

Figures

Figures reproduced from arXiv: 2606.03381 by Gustavo Sánchez, Johannes F. Loevenich, Laurin Holz, Maxime Schwarzer, Roberto Rigolin F. Lopes, Thies Möhlenhof, Tobias Hürten, Veit Hagenmeyer.

**Figure 2.** Abstract flow of an experiment in CerberusAI: From …

Original abstract

Ensuring the protection of Artificial Intelligence (AI) models deployed in military Command and Control (C2) systems and critical infrastructure is essential for maintaining information superiority. Model Extraction Attacks (MEAs) pose a significant threat, as they enable adversaries to replicate proprietary models, compromise protected information, and prepare offline adversarial attacks. However, current defense strategies predominantly rely on the Single Client Assumption (SCA), which is the implicit assumption that attacks originate from isolated identities. This work systematically demonstrates that the SCA is fundamentally invalid in the presence of coordinated threat actors, such as Advanced Persistent Threats (APTs). We introduce a modular, open-source framework called CerberusAI for reproducible model-stealing research, and use it to simulate distributed attack scenarios. Our empirical evaluation shows that well-established defense mechanisms, such as Protecting Against Deep Neural Network Model Stealing Attacks (PRADA), can be bypassed by basic round-robin query distribution strategies, resulting in a significant reduction in detection performance. Furthermore, we demonstrate that even global aggregation approaches can be rendered operationally useless through adaptive traffic mixing. These results highlight the need for a paradigm shift towards stateful, identity-independent defense architectures in the field of model extraction attacks. This paper was originally presented at the International Conference on Military Communication and Information Systems (ICMCIS), organized by the Information Systems Technology (IST) Scientific and Technical Committee, IST-224-RSY - the ICMCIS, held in Bath, United Kingdom, 12-13 May 2026 and won the best paper award.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The paper shows single-client defenses fail against distributed MEAs via simulation, but the evidence is thin on experimental specifics.

The main thing to know is that this work claims the single-client assumption in model extraction defenses is invalid once attackers coordinate, and they demonstrate it with a new open-source framework called CerberusAI plus bypass results against PRADA and global aggregation.

What the paper does is introduce a modular simulation tool for distributed attack scenarios and run empirical checks showing round-robin queries cut detection performance and adaptive mixing can sideline aggregation. That framework is a concrete addition for anyone wanting to test these ideas reproducibly, and the military C2 focus makes the stakes clear.

The soft spots sit in the evaluation. The results come from simulation only, with no visible details here on model architectures, query volumes, exact metrics, baselines, or error bars, so the size of the claimed reductions is hard to judge. Whether the simulated patterns match what real coordinated actors like APTs would do remains an assumption rather than a tested claim. The work is a conference paper, so the analysis stays at the level of demonstration rather than exhaustive validation.

This is for researchers in AI security who care about practical defenses in sensitive deployments. A reader looking for tools to explore distributed threats would find it useful. It deserves a serious referee because the core challenge to the SCA is direct and the framework is open, even if more experimental grounding would strengthen it.

I'd send it to peer review and ask for the missing experimental details and any steps toward real-world checks.

Referee Report

2 major / 1 minor

Summary. The manuscript argues that the Single Client Assumption (SCA) in model extraction attack (MEA) defenses is fundamentally invalid for coordinated threat actors such as Advanced Persistent Threats (APTs). It introduces the CerberusAI open-source framework to simulate distributed attack scenarios and empirically demonstrates that defenses like PRADA can be bypassed using round-robin query distribution, leading to reduced detection performance, while global aggregation approaches can be rendered ineffective through adaptive traffic mixing. The work advocates a shift to stateful, identity-independent defense architectures.

Significance. If the results hold, the significance is high for the field of AI security, particularly for protecting models in military C2 systems and critical infrastructure. By providing a modular, open-source framework for reproducible model-stealing research, the paper enables the community to build upon the findings and explore coordinated attack vectors. This addresses a potential gap in current defenses and could influence the design of future MEA protections.

major comments (2)

[Abstract] The description of the empirical evaluation lacks specific quantitative results, such as the exact reduction in detection performance for PRADA or details on the simulation parameters (e.g., number of clients, query distribution), which is load-bearing for substantiating the bypass claims and the conclusion that the SCA is invalid.
[Abstract] The generalization that the SCA is 'fundamentally invalid' in the presence of coordinated actors rests on the simulation results; however, the manuscript does not provide evidence or discussion on how the CerberusAI scenarios map to real APT capabilities, posing a risk to the central claim's applicability.

minor comments (1)

The abstract includes details about the conference presentation and best paper award, which might be better placed in the acknowledgments section rather than the abstract.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help clarify the presentation of our results. We address each major comment below.

Referee: [Abstract] The description of the empirical evaluation lacks specific quantitative results, such as the exact reduction in detection performance for PRADA or details on the simulation parameters (e.g., number of clients, query distribution), which is load-bearing for substantiating the bypass claims and the conclusion that the SCA is invalid.

Authors: We agree that the abstract would be strengthened by including concrete quantitative results and simulation parameters. The current abstract prioritizes brevity, but the full manuscript contains these details from the CerberusAI experiments. In the revised version we will incorporate specific figures (e.g., detection-rate reductions under round-robin distribution and the exact client counts and query schedules used) directly into the abstract. Revision: yes.

Referee: [Abstract] The generalization that the SCA is 'fundamentally invalid' in the presence of coordinated actors rests on the simulation results; however, the manuscript does not provide evidence or discussion on how the CerberusAI scenarios map to real APT capabilities, posing a risk to the central claim's applicability.

Authors: The central claim is that the SCA is invalid once coordinated, identity-independent attacks are admitted; the simulations demonstrate that such attacks are feasible and effective against existing defenses. We do not claim the scenarios are calibrated to any specific real-world APT campaign, as no public dataset of that form exists. In revision we will add an explicit limitations paragraph that states the modeling assumptions of CerberusAI and notes that the results establish a lower bound on vulnerability rather than a calibrated threat model. Revision: partial.

Circularity Check

0 steps flagged

Empirical simulation with no derivation chain

The paper's central contribution is an empirical demonstration: it introduces the CerberusAI simulation framework and reports that round-robin query distribution bypasses PRADA and global aggregation. No equations, fitted parameters, or mathematical derivations are present in the abstract or described content. The claims rest on simulation outcomes rather than any reduction of a result to its own inputs by construction, self-citation, or ansatz smuggling. The reader's assessment of score 1.0 is consistent with the absence of any load-bearing derivation step.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

Abstract-only review provides no equations, fitted parameters, or explicit axioms. The framework CerberusAI is introduced as a new tool but without details on its internal parameters or assumptions.

invented entities (1)

CerberusAI framework · no independent evidence
Purpose: modular open-source tool to simulate distributed model extraction attacks.
Introduced in the abstract as the vehicle for the empirical evaluation; no independent evidence of its correctness or completeness is provided.

