
arxiv: 2508.01107 · v2 · submitted 2025-08-01 · 💻 cs.CR · cs.DC

Variational Autoencoder-Based Black-Box Adversarial Attack on Collaborative DNN Inference

Pith reviewed 2026-05-19 00:55 UTC · model grok-4.3

classification 💻 cs.CR cs.DC
keywords adversarial attack · variational autoencoder · collaborative DNN inference · black-box attack · model partitioning · IoT security · misclassification · CIFAR-100

The pith

AdVAR-DNN launches black-box adversarial attacks on collaborative DNN inference by detecting partitioning exchanges and using a VAE to generate misclassifying samples.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces AdVAR-DNN as a method that first uses classifiers to extract dynamic DNN partitioning details from unsecured network traffic and then trains a variational autoencoder to produce adversarial input samples that cause the collaborative inference to misclassify. This setup targets IoT environments where devices offload parts of DNN computation to remote servers for real-time tasks such as object classification. A sympathetic reader would care because the attack requires no prior knowledge of the model architecture or partitioning scheme, yet claims high success rates with little detection on standard datasets. If correct, the result shows that the information exchanged to enable collaborative inference itself creates an exploitable vulnerability.

Core claim

AdVAR-DNN is a black-box adversarial attack that leverages classifiers to detect model partitioning information from network exchanges and a variational autoencoder to generate untraceable manipulated samples, compromising collaborative DNN inference with high attack success rate and little to no probability of detection on popular object classification DNNs evaluated with the CIFAR-100 dataset.

What carries the argument

AdVAR-DNN, which combines partitioning-information classifiers with a variational autoencoder to produce adversarial samples that exploit unsecured exchanges in collaborative inference.

If this is right

  • The attack succeeds without prior knowledge of the DNN model or its partitioning.
  • It produces high misclassification rates on popular object classification DNNs.
  • Detection probability stays low or zero during evaluation on CIFAR-100.
  • The approach applies to dynamic partitioning scenarios common in IoT collaborative inference.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Protecting the channel that carries partitioning data would directly block the first stage of the attack.
  • The same detection-plus-VAE pattern could be tested against other distributed inference or edge AI setups.
  • Measuring attack performance under realistic network latency and packet loss would show how well the method survives real IoT conditions.
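As one illustration of the channel-protection point above, the sketch below authenticates an intermediate feature payload with an HMAC so that a tampered payload is rejected by the receiver. It is a minimal sketch under stated assumptions, not something the paper proposes: the helper names (`pack_features`, `sign`, `verify`), the pre-shared key, and the split-layer index 20 are all hypothetical. An HMAC provides integrity only; hiding the partitioning information from an eavesdropper would additionally need encryption, which is not shown.

```python
import hashlib
import hmac
import os
import struct


def pack_features(values):
    """Serialize a flat list of floats as little-endian float32 bytes."""
    return struct.pack(f"<{len(values)}f", *values)


def sign(key, split_layer, payload):
    """Return an HMAC-SHA256 tag over the split-layer index and the payload."""
    header = struct.pack("<I", split_layer)
    return hmac.new(key, header + payload, hashlib.sha256).digest()


def verify(key, split_layer, payload, tag):
    """Constant-time check that neither the header nor the payload changed."""
    return hmac.compare_digest(sign(key, split_layer, payload), tag)


# Hypothetical pre-shared key between the IoT device and the server.
key = os.urandom(32)

# Toy stand-in for an intermediate feature tensor, flattened to four floats.
payload = pack_features([0.12, -1.5, 3.25, 0.0])
tag = sign(key, 20, payload)

# A payload modified in transit no longer matches the tag.
tampered = pack_features([0.12, -1.5, 9.99, 0.0])
genuine_ok = verify(key, 20, payload, tag)
tampered_ok = verify(key, 20, tampered, tag)
print(genuine_ok, tampered_ok)
```

Binding the split-layer index into the tag means a message replayed under a different claimed partition point also fails verification.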

Load-bearing premise

Partitioning information can be reliably detected from unsecured network exchanges and a VAE trained on that information can generate effective adversarial samples without any prior knowledge of the DNN model.

What would settle it

Running AdVAR-DNN against an actual collaborative inference system on CIFAR-100 and measuring both the misclassification rate and any detection rate by the participants.

Figures

Figures reproduced from arXiv: 2508.01107 by Motahare Mounesan, Saptarshi Debroy, Shima Yousefi.

Figure 1: (a) Clustering of features from VGG19 and MobileNet; (b) Clustering of features from two layers of VGG19; (c) Clustering of features from two layers of MobileNet.
TABLE I: Layer specifications for the benchmarking experiments.
DNN Model | Layer index | Layer name | Layer output size
VGG19 | 16 | block4 conv1 | (-, 14, 14, 512)
VGG19 | 20 | block5 conv4 | (-, 14, 14, 512)
MobileNet | 40 | conv pw 6 | (-, 14, 14, 512)
MobileNet | 63 | conv dw 10 rel…
Figure 2: K-means clustering results for class intermediate outputs from (a) VGG19 and (b) MobileNet.
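To make the clustering in Figure 2 concrete, here is a minimal sketch of Lloyd's k-means algorithm on toy data. It is an illustration only: the 2-D points stand in for flattened intermediate feature maps, and the function name `kmeans`, the initial centroids, and the choice of two clusters are assumptions rather than the paper's configuration.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Plain Lloyd's algorithm: assign to nearest centroid, then recompute means."""
    labels = []
    for _ in range(iterations):
        # Assignment step: index of the closest centroid for every point.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if not members:
                new_centroids.append(centroids[k])  # keep an empty cluster in place
                continue
            dim = len(members[0])
            new_centroids.append(
                tuple(sum(m[d] for m in members) / len(members) for d in range(dim))
            )
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids


# Toy "features": two well-separated groups, as if from two different sources.
features = [(0.1, 0.2), (0.0, 0.3), (0.2, 0.1), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
labels, centroids = kmeans(features, [features[0], features[3]])
print(labels)
```

When features from different models or layers form separable groups like these, cluster membership alone carries information about where a feature vector came from.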
Figure 4: The overall workflow of the AdVAR-DNN attack. The adversary perturbs the extracted latent features before feeding them into the decoder, leading to the generation of adversarial samples that deviate significantly from the original input data.
Adjacent text from the paper: "… the attacker aims to cause misclassification with high degree of confidence (by the DNN) in order to avoid triggering system anomaly detection. C. AdVAR-DNN Attack Me…"
Figure 5: Performance of AlexNet under adversarial attack vs. α (attack strength): (a) DNN model accuracy degradation, (b) DNN model confidence of prediction, (c) Attack Success Rate (ASR).
Adjacent text from the paper: "Attack Success Rate (ASR) evaluates how often the attacker successfully degrades the DNN models’ classification performance. It shows the percentage by which accuracy decreases in comparison to the baseline accuracy, which is th…"
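The ASR description above is cut off, so the sketch below rests on an assumption: it reads "the percentage by which accuracy decreases in comparison to the baseline accuracy" as a relative drop, that is, the accuracy lost divided by the baseline accuracy. The function names and toy predictions are hypothetical, and the paper's exact formula may differ.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the ground-truth labels."""
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


def attack_success_rate(baseline_acc, attacked_acc):
    """Relative accuracy drop in percent of the baseline (assumed definition)."""
    if baseline_acc <= 0:
        raise ValueError("baseline accuracy must be positive")
    return 100.0 * (baseline_acc - attacked_acc) / baseline_acc


# Toy ground truth and predictions for ten samples.
labels = [3, 7, 7, 1, 0, 4, 4, 9, 2, 5]
clean_preds = [3, 7, 7, 1, 0, 4, 4, 9, 2, 6]     # 9 of 10 correct
attacked_preds = [3, 2, 7, 8, 0, 1, 4, 9, 6, 6]  # 5 of 10 correct

baseline_acc = accuracy(clean_preds, labels)
attacked_acc = accuracy(attacked_preds, labels)
asr = attack_success_rate(baseline_acc, attacked_acc)
print(f"baseline={baseline_acc:.2f} attacked={attacked_acc:.2f} ASR={asr:.1f}%")
```

Under this reading, a drop from 90% to 50% accuracy gives an ASR of roughly 44%. If ASR were instead measured in absolute percentage points, the same drop would be 40.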
Figure 6: Performance of VGG under adversarial attack vs. α (attack strength): (a) DNN model accuracy degradation, (b) DNN model confidence of prediction, (c) Attack Success Rate (ASR).
Figure 7: Performance of MobileNet under adversarial attack vs. α (attack strength): (a) Classifier accuracy degradation, (b) Classifier confidence of prediction, (c) Attack Success Rate (ASR).
Figure 8: Impact of Intermediate Collected Samples on VAE Training and DNN model Deception (VGG19 Layer 20). Increasing the number of collected samples leads to higher DNN model confidence in misclassified predictions.
Adjacent text from the paper: "D. Attack Data Budget. The success of an adversarial attack depends on both the model architecture and the amount of data collected for training the VAE. To analyze this, we experiment with different d…"
Figure 9: Performance of VGG19 under adversarial attack using different interpolation techniques vs. α (attack strength): (a) Classifier confidence of prediction, (b) Attack Success Rate (ASR).
Adjacent text from the paper: "… sample size for an effective attack without making it dependent on oversampling with diminishing return, thus making the approach more realistic. E. Effect of Interpolation on AdVAR-DNN Attack. Next, we compare Spherical Linear…"
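Figure 9 compares interpolation techniques, and the truncated text names a "Spherical Linear" method. Assuming this means spherical linear interpolation (SLERP) as commonly used for latent-space sampling, the sketch below contrasts it with plain linear interpolation on toy latent vectors. Treating α as the interpolation weight is an assumption, and the function names are hypothetical.

```python
import math


def lerp(z0, z1, alpha):
    """Linear interpolation: straight line between the two vectors."""
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(z0, z1)]


def slerp(z0, z1, alpha):
    """Spherical linear interpolation: follows the arc between the two vectors."""
    norm0 = math.sqrt(sum(a * a for a in z0))
    norm1 = math.sqrt(sum(b * b for b in z1))
    cos_omega = sum(a * b for a, b in zip(z0, z1)) / (norm0 * norm1)
    cos_omega = max(-1.0, min(1.0, cos_omega))  # guard against rounding error
    omega = math.acos(cos_omega)
    sin_omega = math.sin(omega)
    if sin_omega < 1e-8:
        # Nearly parallel or antiparallel vectors: fall back to the linear form.
        return lerp(z0, z1, alpha)
    w0 = math.sin((1.0 - alpha) * omega) / sin_omega
    w1 = math.sin(alpha * omega) / sin_omega
    return [w0 * a + w1 * b for a, b in zip(z0, z1)]


def norm(z):
    return math.sqrt(sum(v * v for v in z))


# Two orthogonal unit-length toy latent vectors.
z0, z1 = [1.0, 0.0], [0.0, 1.0]
mid_lerp = lerp(z0, z1, 0.5)
mid_slerp = slerp(z0, z1, 0.5)
print(norm(mid_lerp), norm(mid_slerp))
```

The linear midpoint shrinks toward the origin (norm about 0.71), while the spherical midpoint keeps unit norm. This norm-preserving property is the usual argument for SLERP when latent codes follow a high-dimensional Gaussian prior.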
Original abstract

In recent years, Deep Neural Networks (DNNs) have become increasingly integral to IoT-based environments, enabling real-time visual computing. However, the limited computational capacity of these devices has motivated the adoption of collaborative DNN inference, where the IoT device offloads part of the inference-related computation to a remote server. Such offloading often requires dynamic DNN partitioning information to be exchanged among the participants over an unsecured network or via relays/hops, leading to novel privacy vulnerabilities. In this paper, we propose AdVAR-DNN, an adversarial variational autoencoder (VAE)-based misclassification attack, leveraging classifiers to detect model information and a VAE to generate untraceable manipulated samples, specifically designed to compromise the collaborative inference process. The AdVAR-DNN attack uses the sensitive information exchange vulnerability of collaborative DNN inference and is black-box in nature in terms of having no prior knowledge about the DNN model and how it is partitioned. Our evaluation using the most popular object classification DNNs on the CIFAR-100 dataset demonstrates the effectiveness of AdVAR-DNN in terms of high attack success rate with little to no probability of detection.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes AdVAR-DNN, a variational autoencoder (VAE)-based black-box adversarial attack targeting collaborative DNN inference in IoT settings. It exploits the exchange of dynamic DNN partitioning information over unsecured networks by employing classifiers to extract model details and a VAE to generate manipulated input samples that induce misclassification while remaining difficult to detect. The attack is presented as requiring no prior knowledge of the target DNN architecture or partitioning strategy. Evaluation is claimed to demonstrate high attack success rates with little to no detection probability when tested on popular object-classification DNNs using the CIFAR-100 dataset.

Significance. If the empirical results hold under rigorous scrutiny, the work would identify a concrete privacy and integrity vulnerability in collaborative inference pipelines that rely on dynamic partitioning over open networks. The combination of traffic-based model inference with VAE-generated adversarial examples could inform defenses for edge-cloud DNN deployments. The black-box framing, if substantiated with reproducible attack pipelines and cross-partition robustness tests, would strengthen the contribution beyond white-box assumptions common in adversarial ML literature.

major comments (2)
  1. [Abstract] The central claim of 'high attack success rate with little to no probability of detection' is asserted without any quantitative metrics, baseline comparisons, error bars, or description of how success rate and detection probability were measured. This absence prevents assessment of whether the reported effectiveness supports the black-box attack narrative.
  2. [Abstract] The black-box claim rests on the assumption that partitioning information can be reliably extracted from unsecured network exchanges without prior model knowledge. No details are supplied on classifier input features, training regime for the VAE, or robustness under realistic conditions (packet loss, timing jitter, concurrent inferences), which are load-bearing for both the attack success and low-detection results.
minor comments (1)
  1. Clarify the exact network traffic features used by the partitioning classifier and provide pseudocode or a diagram of the end-to-end attack pipeline.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive comments on our manuscript. We address each major comment in detail below, providing clarifications from the full paper and indicating where revisions will be made to strengthen the presentation.

Point-by-point responses
  1. Referee: [Abstract] The central claim of 'high attack success rate with little to no probability of detection' is asserted without any quantitative metrics, baseline comparisons, error bars, or description of how success rate and detection probability were measured. This absence prevents assessment of whether the reported effectiveness supports the black-box attack narrative.

    Authors: We agree that the abstract, being a concise overview, does not embed the specific numerical results. The full manuscript presents these details in the Evaluation section, including attack success rates, detection probabilities measured via standard classifiers, comparisons against adapted baseline attacks, and error bars derived from repeated trials. To address the concern directly, we will revise the abstract to incorporate key quantitative indicators and a brief statement on the measurement approach.
    Revision: yes

  2. Referee: [Abstract] The black-box claim rests on the assumption that partitioning information can be reliably extracted from unsecured network exchanges without prior model knowledge. No details are supplied on classifier input features, training regime for the VAE, or robustness under realistic conditions (packet loss, timing jitter, concurrent inferences), which are load-bearing for both the attack success and low-detection results.

    Authors: The Methodology section describes the classifier's input features drawn from observable network traffic metadata and the VAE training procedure using standard variational objectives on the CIFAR-100 dataset. The black-box property holds because the attack infers partitioning solely from traffic without accessing model weights or architecture. We acknowledge that the evaluation focused on controlled network conditions and did not extensively test robustness to packet loss, jitter, or concurrent flows; we will add an explicit discussion of these assumptions and their implications as a limitation in the revised manuscript.
    Revision: partial
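For readers unfamiliar with the "standard variational objectives" mentioned in the rebuttal, the usual VAE training loss is the negative evidence lower bound, shown below for reference. This is the textbook form with a standard normal prior and a diagonal Gaussian encoder; whether the paper uses exactly this loss, or adds any weighting between the two terms, is not stated in the material here. The symbols (encoder parameters φ, decoder parameters θ, input x, latent z of dimension d) are generic notation, not taken from the paper.

```latex
\mathcal{L}(\theta, \phi; x)
  = -\,\mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right]
  + D_{\mathrm{KL}}\!\left(q_\phi(z \mid x) \,\|\, p(z)\right),
\qquad
D_{\mathrm{KL}}
  = -\frac{1}{2} \sum_{j=1}^{d} \left(1 + \log \sigma_j^{2} - \mu_j^{2} - \sigma_j^{2}\right)
```

The first term rewards accurate reconstruction of the input, and the second keeps the encoder's output distribution close to the prior, which is what makes nearby latent codes decode to plausible samples.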

Circularity Check

0 steps flagged

Empirical attack proposal and evaluation contain no circular derivation

full rationale

The paper introduces AdVAR-DNN as a VAE-based black-box attack that detects partitioning via classifiers and generates adversarial samples, then reports empirical success rates on CIFAR-100 with standard DNNs. No derivation chain, equations, or first-principles claims are present that reduce by construction to fitted parameters, self-definitions, or self-citations; the central results are direct experimental outcomes rather than predictions forced by the method's own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

Report based solely on abstract; limited visibility into internal assumptions or parameters.

axioms (1)
  • domain assumption: Collaborative DNN inference requires exchanging dynamic partitioning information over unsecured networks or relays, creating detectable vulnerabilities.
    Invoked in the abstract to motivate the attack surface.
invented entities (1)
  • AdVAR-DNN (no independent evidence)
    purpose: Black-box adversarial attack framework combining classifiers and a VAE to compromise collaborative inference
    Proposed method name and structure introduced to address the identified vulnerability.



