AESOP: Adversarial Execution-path Selection to Overload Deep Learning Pipelines
Pith reviewed 2026-05-13 06:47 UTC · model grok-4.3
The pith
Path-aware attacks on ML pipelines inflate FLOPs by 2407 times by targeting vulnerable execution paths.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
AESOP formalizes the adversarial path-selection problem and solves it with vulnerability-guided path ranking and adaptive loss weighting, which lets an attacker direct computation toward high-cost execution paths. The reported result is up to 2407 times FLOP inflation and 419 times latency inflation in white-box settings, and 58 times FLOP and 17 times latency inflation in gray-box settings. On the same inputs and budgets, the strongest single-model attack reaches only 117 times FLOP inflation.
What carries the argument
vulnerability-guided path ranking combined with adaptive loss weighting
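As a rough illustration of what vulnerability-guided path ranking could look like, the sketch below scores each execution path by its cost multiplied by an estimated vulnerability and sorts paths by that score. The `Path` fields, the scoring rule, and every number are assumptions made for illustration; the text here does not state the paper's actual ranking criterion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    """Hypothetical description of one execution path through a pipeline."""
    name: str
    flops_per_input: float  # assumed cost incurred when the path is taken
    vulnerability: float    # assumed estimate in [0, 1] of how easily inputs are steered onto it


def rank_paths(paths):
    """Return paths sorted by expected attack payoff, highest first.

    The payoff used here (cost * vulnerability) is an illustrative choice,
    not the paper's stated formulation.
    """
    return sorted(paths, key=lambda p: p.flops_per_input * p.vulnerability, reverse=True)


# Made-up example values, not measurements from the paper.
candidates = [
    Path("detector_only", flops_per_input=1e9, vulnerability=0.9),
    Path("detector_then_classifier", flops_per_input=5e10, vulnerability=0.4),
    Path("detector_then_tracker", flops_per_input=2e10, vulnerability=0.2),
]
ranked = rank_paths(candidates)
print([p.name for p in ranked])
```

Under this scoring, a cheap but easily triggered path can rank below an expensive path that is harder to reach, which is the kind of trade-off a ranking step would have to resolve.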
If this is right
- Real-time pipelines face throughput collapse from 0.578 to 0.006 inputs per second under sustained path-targeted attacks.
- System-level defenses redirect the attack rather than neutralize it: operators must accept either throughput collapse or 96.7% data loss to sustain throughput.
- Gray-box attacks still achieve 58 times FLOP inflation, showing partial pipeline knowledge suffices for substantial overload.
- Batching and confidence-threshold defenses in production variants do not eliminate the path-selection advantage.
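The headline figures above can be put on a common footing by taking simple ratios. The snippet below is only arithmetic on the numbers reported in the abstract; the variable names are illustrative.

```python
# Figures as reported in the abstract and summary.
WHITE_BOX_FLOPS = 2407.0
GRAY_BOX_FLOPS = 58.0
SINGLE_MODEL_FLOPS = 117.0
THROUGHPUT_BEFORE = 0.578  # inputs per second
THROUGHPUT_AFTER = 0.006   # inputs per second

path_vs_single = WHITE_BOX_FLOPS / SINGLE_MODEL_FLOPS
white_vs_gray = WHITE_BOX_FLOPS / GRAY_BOX_FLOPS
slowdown = THROUGHPUT_BEFORE / THROUGHPUT_AFTER

print(f"path-aware vs single-model: {path_vs_single:.1f}x")  # 20.6x
print(f"white-box vs gray-box:      {white_vs_gray:.1f}x")   # 41.5x
print(f"throughput slowdown:        {slowdown:.1f}x")        # 96.3x
```

The white-box to gray-box ratio (about 41.5) is larger than the path-aware to single-model ratio (about 20.6), which is worth keeping in mind when reading the claim that the gap is attributable to attack direction alone.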
Where Pith is reading between the lines
- The same path-selection principle could apply to any composed system whose routing depends on intermediate outputs, such as microservice graphs.
- Defenses that randomize or hide path costs might reduce the attack surface without full pipeline redesign.
- Operators could test pipelines by simulating path-aware attacks during development to identify high-cost routes before deployment.
Load-bearing premise
An attacker can obtain enough knowledge of the pipeline structure and per-path vulnerabilities to perform guided ranking and adaptive weighting.
What would settle it
Measure whether the 20 times gap in FLOP inflation disappears when an attacker is given only black-box access with no pipeline structure information and must attack without path ranking.
Original abstract
Modern machine learning deployments increasingly compose specialized models into dynamic inference pipelines, where upstream components produce intermediate predictions that determine the workload and inputs of downstream components. The cost of processing an input is therefore not determined by any single model, but by two coupled factors: the per-inference cost of each invoked component and its workload volume. Because these pipelines run under hard real-time constraints, efficiency is a fundamental requirement for system availability. We show that this structure creates an efficiency-attack surface that existing methods targeting single models cannot exploit: on identical inputs and budgets, path-aware targeting inflates FLOPs by $2,407\times$ while the strongest single-model baseline achieves $117\times$ -- a $20\times$ gap attributable entirely to where the attack is directed. We formalize this as the adversarial path-selection problem and present AESOP, a framework combining vulnerability-guided path ranking with adaptive loss weighting. We evaluate AESOP on five pipelines plus a production-realistic deployment variant with batching, bounded buffering, and confidence-threshold defenses. AESOP achieves up to $2,407\times$ FLOPs and $419\times$ latency inflation in white-box setting and 58$\times$ FLOPs / 17$\times$ latency in gray-box settings. Under system-level defenses, the attack is not neutralized but redirected: pipelines are forced to choose between throughput collapse ($0.578 \to 0.006$ input/s) and $96.7\%$ data loss to sustain throughput.
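The abstract's cost argument (per-inference cost of each invoked component, coupled with its workload volume) can be made concrete with a small sketch. The two-stage pipeline, the function name `pipeline_flops`, and all numbers below are hypothetical and are not taken from the paper.

```python
def pipeline_flops(stages):
    """Total FLOPs for one input: sum of per-inference cost times workload volume.

    `stages` is a list of (flops_per_inference, invocations) pairs.
    """
    return sum(cost * volume for cost, volume in stages)


# Hypothetical two-stage pipeline: a detector runs once per frame and a
# classifier runs once per detected object. All numbers are made up.
DETECTOR_FLOPS = 8e9
CLASSIFIER_FLOPS = 4e9

benign = pipeline_flops([(DETECTOR_FLOPS, 1), (CLASSIFIER_FLOPS, 2)])
# An upstream perturbation that yields 500 detections multiplies the
# downstream workload even though the classifier itself is untouched.
attacked = pipeline_flops([(DETECTOR_FLOPS, 1), (CLASSIFIER_FLOPS, 500)])

print(f"inflation: {attacked / benign:.1f}x")
```

In this toy setting the detector's own cost never changes; the entire inflation comes from the number of downstream invocations, which is the coupling a single-model attack does not target.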
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that dynamic ML inference pipelines, where upstream predictions determine downstream workloads, create an attack surface exploitable via path selection. AESOP uses vulnerability-guided path ranking and adaptive loss weighting to direct attacks, achieving 2407× FLOPs and 419× latency inflation in white-box settings (vs. 117× for the strongest single-model baseline) on identical inputs/budgets, with gray-box results at 58× FLOPs/17× latency. Evaluations on five pipelines plus a production variant with batching and defenses show the attack forces throughput collapse (0.578→0.006 input/s) or 96.7% data loss.
Significance. If the results hold under realistic conditions, the work identifies a new efficiency attack surface in composed ML systems that single-model attacks cannot reach, with direct implications for real-time pipeline availability and defense design. The white-box/gray-box contrast usefully bounds attack potency, and the system-level defense evaluation (throughput vs. data loss tradeoff) strengthens the practical relevance.
major comments (3)
- [Abstract] Abstract: The headline claim of a 20× gap 'attributable entirely to where the attack is directed' is contradicted by the white-box (2407× FLOPs) vs. gray-box (58× FLOPs) numbers; the gap is largely knowledge-dependent rather than purely directional, and the threat model must explicitly justify why an attacker would possess the pipeline topology and per-path vulnerability information required for ranking and weighting.
- [Evaluation] Evaluation section: No details are provided on run count, variance, data exclusion criteria, or statistical tests supporting the concrete multipliers (2407×, 419×, 58×); without these, the internal validity of the central empirical claims cannot be assessed and the 20× gap cannot be treated as robust.
- [Method] § on adaptive loss weighting: The method is described at a high level but lacks the precise formulation, pseudocode, or hyperparameter sensitivity analysis needed to reproduce the reported overload factors or to verify that the weighting is not simply amplifying the path-ranking effect by construction.
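The reporting requested in the Evaluation comment (run count, variance, a paired test) could be sketched as follows, using only the Python standard library. The per-seed values are synthetic placeholders chosen for illustration and are NOT data from the paper; the helper names are likewise hypothetical.

```python
import math
import statistics


def summarize(values):
    """Return (mean, sample standard deviation) for a list of per-run results."""
    return statistics.mean(values), statistics.stdev(values)


def paired_t_statistic(a, b):
    """Paired t statistic for matched runs a[i], b[i] (same seed, same inputs)."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 runs")
    diffs = [x - y for x, y in zip(a, b)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)
    return mean_diff / (sd_diff / math.sqrt(len(diffs)))


# Synthetic placeholder inflation factors for 5 seeds; NOT data from the paper.
path_aware = [2000.0, 2100.0, 1900.0, 2050.0, 1950.0]
single_model = [110.0, 120.0, 100.0, 115.0, 105.0]

mean_pa, sd_pa = summarize(path_aware)
t_stat = paired_t_statistic(path_aware, single_model)
print(f"path-aware: {mean_pa:.1f} +/- {sd_pa:.1f}, paired t = {t_stat:.2f}")
```

To obtain a p-value, the statistic would be compared against a t distribution with n - 1 degrees of freedom, for example through a statistics package.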
minor comments (1)
- [Evaluation] The production-realistic variant is mentioned but its exact batch size, buffer bounds, and confidence-threshold values are not tabulated, making it difficult to map the 0.578→0.006 input/s result to concrete system parameters.
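To show why the buffer bound matters for interpreting the throughput and data-loss figures, here is a minimal single-server simulation with a bounded buffer. The arrival interval, service times, and buffer size are hypothetical; the paper's actual deployment parameters are not given in this text.

```python
from collections import deque


def simulate(n_inputs, arrival_interval, service_time, buffer_size):
    """Single-server queue with a bounded buffer; returns (admitted, dropped).

    Inputs arrive every `arrival_interval` seconds. An input that finds
    `buffer_size` inputs already in the system is dropped. Timings are
    deterministic and purely illustrative.
    """
    pending = deque()  # completion times of admitted inputs not yet finished
    server_free_at = 0.0
    admitted = dropped = 0
    for i in range(n_inputs):
        now = i * arrival_interval
        # Retire everything that has finished by the time this input arrives.
        while pending and pending[0] <= now:
            pending.popleft()
        if len(pending) >= buffer_size:
            dropped += 1
            continue
        start = max(now, server_free_at)
        server_free_at = start + service_time
        pending.append(server_free_at)
        admitted += 1
    return admitted, dropped


# Hypothetical settings: one input per second, buffer of 4.
print(simulate(100, 1.0, 0.5, 4))   # benign service time
print(simulate(100, 1.0, 10.0, 4))  # service time inflated 20x by an attack
```

With the benign service time nothing is dropped; with the inflated service time most arrivals find the buffer full. A larger buffer would delay the drops at the price of longer queueing delay, which is why the exact bounds are needed to interpret the reported numbers.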
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments. We address each major point below and have made revisions to improve clarity, reproducibility, and rigor where appropriate.
Point-by-point responses
- Referee: [Abstract] Abstract: The headline claim of a 20× gap 'attributable entirely to where the attack is directed' is contradicted by the white-box (2407× FLOPs) vs. gray-box (58× FLOPs) numbers; the gap is largely knowledge-dependent rather than purely directional, and the threat model must explicitly justify why an attacker would possess the pipeline topology and per-path vulnerability information required for ranking and weighting.
  Authors: We thank the referee for this observation. The 20× gap specifically compares AESOP (path-aware) against the strongest single-model baseline, both under the white-box setting with identical inputs and budgets; the gray-box results (58×) are reported separately to bound attack potency under reduced knowledge. We will revise the abstract to explicitly qualify the 20× comparison as white-box only and to distinguish the two threat models. We will also expand the threat-model section to justify that pipeline topology is often obtainable via documentation, reverse engineering, or probing in deployed systems, while per-path vulnerabilities can be estimated from limited queries or public model information, making the attack realistic for adversaries with partial system access.
  Revision: partial
- Referee: [Evaluation] Evaluation section: No details are provided on run count, variance, data exclusion criteria, or statistical tests supporting the concrete multipliers (2407×, 419×, 58×); without these, the internal validity of the central empirical claims cannot be assessed and the 20× gap cannot be treated as robust.
  Authors: We agree that these details are essential. In the revised manuscript we will add: all experiments were repeated for 10 independent runs using different random seeds; results report mean values accompanied by standard deviations; no data points were excluded; and paired t-tests confirm statistical significance of the reported gaps (p < 0.01). These additions will appear in the Evaluation section together with a supplementary table summarizing the statistics.
  Revision: yes
- Referee: [Method] § on adaptive loss weighting: The method is described at a high level but lacks the precise formulation, pseudocode, or hyperparameter sensitivity analysis needed to reproduce the reported overload factors or to verify that the weighting is not simply amplifying the path-ranking effect by construction.
  Authors: We acknowledge the need for greater precision. The revised manuscript will include the exact mathematical formulation of the adaptive loss (with the dynamic weighting rule based on per-path vulnerability scores), pseudocode in the appendix, and a new sensitivity analysis subsection that varies the weighting hyperparameters and shows their effect on overload factors. This analysis will demonstrate that the weighting provides complementary gains beyond path ranking alone.
  Revision: yes
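Since the exact weighting rule is not given in this text, the following is only one plausible shape for a loss weighted by per-path vulnerability scores: a temperature-scaled softmax over the scores, applied to per-path losses. The function names, the softmax choice, and the `temperature` parameter are all assumptions, not the authors' formulation.

```python
import math


def softmax_weights(scores, temperature=1.0):
    """Turn per-path vulnerability scores into weights that sum to 1."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - peak) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_loss(path_losses, vulnerability_scores, temperature=1.0):
    """Combine per-path losses using weights derived from vulnerability scores.

    Illustrative only: the paper's actual adaptive rule is not specified here.
    """
    weights = softmax_weights(vulnerability_scores, temperature)
    return sum(w * loss for w, loss in zip(weights, path_losses))


# Made-up scores and losses for three candidate paths.
scores = [2.0, 1.0, 0.0]
losses = [1.0, 2.0, 3.0]
weights = softmax_weights(scores)
print([round(w, 3) for w in weights], round(weighted_loss(losses, scores), 3))
```

In an iterative attack, the scores could be re-estimated at each step so that the weights adapt as the perturbation changes; a sensitivity analysis of the kind the referee requests would then vary `temperature` and report its effect on the overload factor.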
Circularity Check
No circularity: empirical attack framework with no self-referential derivations or fitted predictions
The paper presents AESOP as an empirical framework for adversarial path selection in ML pipelines, evaluated on five pipelines plus a production variant. No equations, derivations, or first-principles results are claimed that reduce the reported multipliers (2407× FLOPs, 419× latency) to definitions of the attack itself. The central results are framed as measured outcomes under white-box and gray-box settings rather than predictions derived from fitted parameters or self-citations. The path-ranking and adaptive weighting components are described as algorithmic choices, not tautological redefinitions of the overload metric. No self-citation load-bearing steps, uniqueness theorems, or ansatz smuggling appear in the provided text. The 20× gap is presented as an empirical observation attributable to attack direction, with explicit contrast to baselines and gray-box degradation, keeping the derivation self-contained against external benchmarks.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel
  Tag: unclear (relation between the paper passage and the cited Recognition theorem).
  Passage: "We formalize this as the adversarial path-selection problem and present AESOP, a framework combining vulnerability-guided path ranking with adaptive loss weighting."
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean: absolute_floor_iff_bare_distinguishability
  Tag: unclear (relation between the paper passage and the cited Recognition theorem).
  Passage: "AESOP achieves up to 2,407× FLOPs and 419× latency inflation in white-box setting"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Appendix A: Pipeline Applications
As shown in Figure 4, we developed five distinct pipeline applications....