AESOP: Adversarial Execution-path Selection to Overload Deep Learning Pipelines

Mingfang Ji; Ravishka Shemal Rathnasuriya; Simin Chen; Tingxi Li; Wei Yang; Yitao Hu

arXiv: 2605.10987 · v1 · submitted 2026-05-09 · cs.LG · cs.AI · cs.CR

Pith reviewed 2026-05-13 06:47 UTC · model grok-4.3

classification cs.LG · cs.AI · cs.CR

keywords adversarial attacks · inference pipelines · execution path selection · model overload · deep learning efficiency · real-time systems

The pith

Path-aware attacks on ML pipelines inflate FLOPs by 2407 times by targeting vulnerable execution paths.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that dynamic machine learning pipelines, where upstream model outputs route work to downstream components, create a new attack surface based on execution path selection. Existing single-model attacks cannot exploit this because they do not account for how input-dependent routing multiplies workload volume and per-component costs. AESOP formalizes the adversarial path-selection problem and solves it with vulnerability-guided ranking plus adaptive loss weighting, showing that path-directed attacks produce 20 times more overload than the best single-model baselines on identical inputs. The work evaluates the method on multiple pipelines including production-like variants with batching and defenses, measuring extreme resource inflation in both white-box and gray-box settings. If correct, this means pipeline operators must defend against path choice rather than isolated model vulnerabilities to preserve real-time availability.

Core claim

AESOP shows that formalizing the adversarial path-selection problem and solving it with vulnerability-guided path ranking plus adaptive loss weighting lets an attacker steer computation toward high-cost execution paths. In white-box settings this produces 2,407× FLOP inflation and 419× latency inflation; in gray-box settings it produces 58× FLOP inflation and 17× latency inflation. On the same inputs and budgets, the strongest single-model attack reaches only 117× FLOP inflation.
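The multipliers quoted in the core claim can be cross-checked with a few lines of arithmetic. The snippet below is only a sanity check on the figures as reported on this page; the constant and variable names are illustrative and do not come from the paper.

```python
# FLOP-inflation multipliers as quoted in the core claim above.
WHITE_BOX_FLOPS = 2407.0   # AESOP, white-box
BASELINE_FLOPS = 117.0     # strongest single-model baseline
GRAY_BOX_FLOPS = 58.0      # AESOP, gray-box

# Gap between path-aware targeting and the single-model baseline.
path_vs_baseline = WHITE_BOX_FLOPS / BASELINE_FLOPS

# Gap between white-box and gray-box AESOP, which is the contrast
# the referee report raises.
white_vs_gray = WHITE_BOX_FLOPS / GRAY_BOX_FLOPS

print(f"path-aware vs. baseline: {path_vs_baseline:.1f}x")  # 20.6x
print(f"white-box vs. gray-box:  {white_vs_gray:.1f}x")     # 41.5x
```

The first ratio matches the roughly 20× gap stated in the claim. The second shows that the reported drop from white-box to gray-box knowledge is larger than the headline gap itself.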

What carries the argument

vulnerability-guided path ranking combined with adaptive loss weighting

If this is right

Real-time pipelines face throughput collapse from 0.578 to 0.006 inputs per second under sustained path-targeted attacks.
System defenses cannot neutralize the attack but only redirect it, forcing operators to accept either massive data loss or throughput failure.
Gray-box attacks still achieve 58 times FLOP inflation, showing partial pipeline knowledge suffices for substantial overload.
Batching and confidence-threshold defenses in production variants do not eliminate the path-selection advantage.
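To put the throughput figures in the list above into more familiar units, the short calculation below converts them into a slowdown factor and a per-input processing time. It uses only the two throughput values quoted on this page; the variable names are illustrative.

```python
# Throughput before and under sustained attack, in inputs per second,
# as quoted above.
throughput_before = 0.578
throughput_after = 0.006

# How many times slower the pipeline becomes.
slowdown = throughput_before / throughput_after

# Average wall-clock seconds spent per input in each regime.
seconds_per_input_before = 1.0 / throughput_before
seconds_per_input_after = 1.0 / throughput_after

print(f"slowdown: {slowdown:.1f}x")
print(f"seconds per input: {seconds_per_input_before:.2f} -> {seconds_per_input_after:.1f}")
```

Under these figures the pipeline goes from under two seconds per input to nearly three minutes per input, a slowdown of roughly two orders of magnitude.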

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same path-selection principle could apply to any composed system whose routing depends on intermediate outputs, such as microservice graphs.
Defenses that randomize or hide path costs might reduce the attack surface without full pipeline redesign.
Operators could test pipelines by simulating path-aware attacks during development to identify high-cost routes before deployment.
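The last suggestion above, finding high-cost routes before deployment, can be sketched as a small path-ranking exercise. This is a minimal illustration under stated assumptions, not the paper's ranking procedure: the pipeline graph, the component names, and the worst-case cost and fan-out numbers are all hypothetical, and the graph is assumed to be acyclic.

```python
from typing import Dict, List, Tuple

# Hypothetical pipeline graph: component -> downstream components.
EDGES: Dict[str, List[str]] = {
    "detector": ["plate_reader", "face_matcher"],
    "plate_reader": ["ocr"],
    "face_matcher": [],
    "ocr": [],
}

# Hypothetical worst case per component:
# (cost of one inference, maximum number of outputs per inference).
WORST_CASE: Dict[str, Tuple[float, int]] = {
    "detector": (5.0, 50),
    "plate_reader": (2.0, 4),
    "face_matcher": (8.0, 1),
    "ocr": (1.0, 1),
}

def enumerate_paths(node: str, edges: Dict[str, List[str]]) -> List[List[str]]:
    """List every source-to-sink path starting at `node` (assumes a DAG)."""
    children = edges.get(node, [])
    if not children:
        return [[node]]
    return [[node] + rest
            for child in children
            for rest in enumerate_paths(child, edges)]

def worst_case_cost(path: List[str],
                    worst_case: Dict[str, Tuple[float, int]]) -> float:
    """Sum cost x workload along a path, with workload multiplied by fan-out."""
    total, workload = 0.0, 1
    for name in path:
        cost, fanout = worst_case[name]
        total += workload * cost
        workload *= fanout
    return total

def rank_paths(source: str) -> List[List[str]]:
    """Return all paths from `source`, most expensive worst case first."""
    paths = enumerate_paths(source, EDGES)
    return sorted(paths, key=lambda p: worst_case_cost(p, WORST_CASE), reverse=True)

ranked = rank_paths("detector")
for path in ranked:
    print(" -> ".join(path), worst_case_cost(path, WORST_CASE))
```

With these made-up numbers the shorter path through `face_matcher` ranks first, because an expensive component sits directly behind a high fan-out stage. That is the kind of route an operator would want to rate-limit or cap first.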

Load-bearing premise

An attacker can obtain enough knowledge of the pipeline structure and per-path vulnerabilities to perform guided ranking and adaptive weighting.

What would settle it

Measure whether the 20 times gap in FLOP inflation disappears when an attacker is given only black-box access with no pipeline structure information and must attack without path ranking.

Figures

Figures reproduced from arXiv: 2605.10987 by Mingfang Ji, Ravishka Shemal Rathnasuriya, Simin Chen, Tingxi Li, Wei Yang, Yitao Hu.

**Figure 2.** Approach overview. … neural component and edge $e \in E$ carries data between components through an inter-process queue. Each component $v$ exhibits three input-dependent behaviors: a per-inference cost $c_v$, an output cardinality $o_v$ that determines downstream workload, and a gating function $g_v$ that may forward, drop, or route inputs based on predicted labels, confidence thresholds, or shape constraints. The to…

**Figure 3.** Traffic-monitoring pipeline in two configurations.

**Figure 4.** Pipeline applications used in evaluation.
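The Figure 2 caption describes each component through three input-dependent behaviors: a per-inference cost, an output cardinality, and a gating function. A minimal sketch of how those three pieces combine into an end-to-end cost is given below. It assumes a linear chain and applies the gate to each output before forwarding; the `Component` class, the `run_chain` helper, and all numbers are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Component:
    name: str
    cost: Callable[[dict], float]          # per-inference cost for one item
    outputs: Callable[[dict], List[dict]]  # output cardinality = len(outputs(item))
    gate: Callable[[dict], bool]           # True forwards an output, False drops it

def run_chain(components: List[Component], item: dict) -> float:
    """Total cost of pushing one input through a linear chain of components."""
    total = 0.0
    frontier = [item]
    for comp in components:
        next_frontier: List[dict] = []
        for it in frontier:
            total += comp.cost(it)
            # Only outputs that pass the gate become downstream workload.
            next_frontier.extend(o for o in comp.outputs(it) if comp.gate(o))
        frontier = next_frontier
    return total

# Hypothetical two-stage pipeline: a detector feeding a classifier.
detector = Component(
    name="detector",
    cost=lambda it: 10.0,
    outputs=lambda it: it["detections"],
    gate=lambda o: o["conf"] >= 0.5,   # confidence-threshold gate
)
classifier = Component(
    name="classifier",
    cost=lambda it: 3.0,
    outputs=lambda it: [],
    gate=lambda o: True,
)

benign = {"detections": [{"conf": 0.9}, {"conf": 0.2}]}
crowded = {"detections": [{"conf": 0.9} for _ in range(40)]}

benign_cost = run_chain([detector, classifier], benign)    # 10 + 1 * 3
crowded_cost = run_chain([detector, classifier], crowded)  # 10 + 40 * 3
print(benign_cost, crowded_cost)
```

Even with fixed per-inference costs, the second input is ten times more expensive purely because more outputs clear the gate. This is the workload-volume factor the caption refers to.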

Original abstract

Modern machine learning deployments increasingly compose specialized models into dynamic inference pipelines, where upstream components produce intermediate predictions that determine the workload and inputs of downstream components. The cost of processing an input is therefore not determined by any single model, but by two coupled factors: the per-inference cost of each invoked component and its workload volume. Because these pipelines run under hard real-time constraints, efficiency is a fundamental requirement for system availability. We show that this structure creates an efficiency-attack surface that existing methods targeting single models cannot exploit: on identical inputs and budgets, path-aware targeting inflates FLOPs by $2,407\times$ while the strongest single-model baseline achieves $117\times$ -- a $20\times$ gap attributable entirely to where the attack is directed. We formalize this as the adversarial path-selection problem and present AESOP, a framework combining vulnerability-guided path ranking with adaptive loss weighting. We evaluate AESOP on five pipelines plus a production-realistic deployment variant with batching, bounded buffering, and confidence-threshold defenses. AESOP achieves up to $2,407\times$ FLOPs and $419\times$ latency inflation in white-box setting and 58$\times$ FLOPs / 17$\times$ latency in gray-box settings. Under system-level defenses, the attack is not neutralized but redirected: pipelines are forced to choose between throughput collapse ($0.578 \to 0.006$ input/s) and $96.7\%$ data loss to sustain throughput.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper claims that dynamic ML inference pipelines, where upstream predictions determine downstream workloads, create an attack surface exploitable via path selection. AESOP uses vulnerability-guided path ranking and adaptive loss weighting to direct attacks, achieving 2407× FLOPs and 419× latency inflation in white-box settings (vs. 117× for the strongest single-model baseline) on identical inputs/budgets, with gray-box results at 58× FLOPs/17× latency. Evaluations on five pipelines plus a production variant with batching and defenses show the attack forces throughput collapse (0.578→0.006 input/s) or 96.7% data loss.

Significance. If the results hold under realistic conditions, the work identifies a new efficiency attack surface in composed ML systems that single-model attacks cannot reach, with direct implications for real-time pipeline availability and defense design. The white-box/gray-box contrast usefully bounds attack potency, and the system-level defense evaluation (throughput vs. data loss tradeoff) strengthens the practical relevance.

major comments (3)

[Abstract] Abstract: The headline claim of a 20× gap 'attributable entirely to where the attack is directed' is contradicted by the white-box (2407× FLOPs) vs. gray-box (58× FLOPs) numbers; the gap is largely knowledge-dependent rather than purely directional, and the threat model must explicitly justify why an attacker would possess the pipeline topology and per-path vulnerability information required for ranking and weighting.
[Evaluation] Evaluation section: No details are provided on run count, variance, data exclusion criteria, or statistical tests supporting the concrete multipliers (2407×, 419×, 58×); without these, the internal validity of the central empirical claims cannot be assessed and the 20× gap cannot be treated as robust.
[Method] § on adaptive loss weighting: The method is described at a high level but lacks the precise formulation, pseudocode, or hyperparameter sensitivity analysis needed to reproduce the reported overload factors or to verify that the weighting is not simply amplifying the path-ranking effect by construction.

minor comments (1)

[Evaluation] The production-realistic variant is mentioned but its exact batch size, buffer bounds, and confidence-threshold values are not tabulated, making it difficult to map the 0.578→0.006 input/s result to concrete system parameters.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major point below and have made revisions to improve clarity, reproducibility, and rigor where appropriate.

Referee: [Abstract] Abstract: The headline claim of a 20× gap 'attributable entirely to where the attack is directed' is contradicted by the white-box (2407× FLOPs) vs. gray-box (58× FLOPs) numbers; the gap is largely knowledge-dependent rather than purely directional, and the threat model must explicitly justify why an attacker would possess the pipeline topology and per-path vulnerability information required for ranking and weighting.

Authors: We thank the referee for this observation. The 20× gap compares AESOP (path-aware) against the strongest single-model baseline, both in the white-box setting with identical inputs and budgets; the gray-box results (58×) are reported separately to bound attack potency under reduced knowledge. We will revise the abstract to qualify the 20× comparison as white-box only and to distinguish the two threat models. We will also expand the threat-model section to justify that pipeline topology is often obtainable via documentation, reverse engineering, or probing of deployed systems, while per-path vulnerabilities can be estimated from limited queries or public model information, making the attack realistic for adversaries with partial system access.

revision: partial

Referee: [Evaluation] Evaluation section: No details are provided on run count, variance, data exclusion criteria, or statistical tests supporting the concrete multipliers (2407×, 419×, 58×); without these, the internal validity of the central empirical claims cannot be assessed and the 20× gap cannot be treated as robust.

Authors: We agree that these details are essential. In the revised manuscript we will add the following: all experiments were repeated for 10 independent runs using different random seeds; results report mean values with standard deviations; no data points were excluded; and paired t-tests confirm the statistical significance of the reported gaps (p < 0.01). These additions will appear in the Evaluation section together with a supplementary table summarizing the statistics.

revision: yes

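The statistics promised in this response (per-method mean, standard deviation, and a paired t-test across seeds) can be computed as sketched below. The measurements here are placeholder values invented purely to make the example run; they are not results from the paper, and the function name is hypothetical. Only the Python standard library is used, so the sketch stops at the t statistic. Turning it into a p-value needs the t distribution, for example via `scipy.stats.ttest_rel`.

```python
import math
import statistics
from typing import Sequence

def paired_t_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Paired t statistic for matched samples a[i], b[i] (same seed, same input)."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally sized samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd == 0:
        raise ValueError("differences have zero variance")
    return statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))

# Placeholder per-seed measurements (made up for illustration only).
path_aware = [10.0, 12.0, 11.0, 13.0, 12.0]
baseline = [4.0, 5.0, 6.0, 5.0, 5.0]

print("path-aware mean/std:", statistics.mean(path_aware), statistics.stdev(path_aware))
print("baseline   mean/std:", statistics.mean(baseline), statistics.stdev(baseline))

t_stat = paired_t_statistic(path_aware, baseline)
print("paired t statistic:", round(t_stat, 2), "with", len(path_aware) - 1, "degrees of freedom")
```

A paired test is the natural choice here because both methods are run on identical inputs and budgets, so each seed yields a matched pair.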
Referee: [Method] § on adaptive loss weighting: The method is described at a high level but lacks the precise formulation, pseudocode, or hyperparameter sensitivity analysis needed to reproduce the reported overload factors or to verify that the weighting is not simply amplifying the path-ranking effect by construction.

Authors: We acknowledge the need for greater precision. The revised manuscript will include the exact mathematical formulation of the adaptive loss (with the dynamic weighting rule based on per-path vulnerability scores), pseudocode in the appendix, and a new sensitivity-analysis subsection that varies the weighting hyperparameters and shows their effect on overload factors. This analysis will demonstrate that the weighting provides complementary gains beyond path ranking alone.

revision: yes
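Since this page does not reproduce the paper's weighting rule, the following is only one plausible reading of "dynamic weighting based on per-path vulnerability scores": a softmax over the scores, used to combine per-path losses into a single objective. The softmax form, the `temperature` parameter, and both function names are assumptions made for illustration and should not be read as AESOP's actual formulation.

```python
import math
from typing import List, Sequence

def softmax_weights(scores: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Turn per-path vulnerability scores into weights that sum to 1.

    Assumption: higher score means a more vulnerable path, which gets more weight.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - top) / temperature) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]

def weighted_loss(path_losses: Sequence[float],
                  scores: Sequence[float],
                  temperature: float = 1.0) -> float:
    """Combine per-path losses using weights derived from the current scores."""
    if len(path_losses) != len(scores):
        raise ValueError("one loss and one score per path are required")
    weights = softmax_weights(scores, temperature)
    return sum(w * loss for w, loss in zip(weights, path_losses))

# Equal scores give equal weights, so the result is the plain average.
print(weighted_loss([1.0, 3.0], [0.0, 0.0]))

# Unequal scores shift weight toward the higher-scoring path.
print(softmax_weights([1.0, 3.0]))
```

Recomputing the scores at every optimization step is what would make such a rule "adaptive". A sensitivity analysis like the one promised above would then vary `temperature` (or its equivalent) and report the effect on overload factors.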

Circularity Check

0 steps flagged

No circularity: empirical attack framework with no self-referential derivations or fitted predictions

The paper presents AESOP as an empirical framework for adversarial path selection in ML pipelines, evaluated on five pipelines plus a production variant. No equations, derivations, or first-principles results are claimed that reduce the reported multipliers (2407× FLOPs, 419× latency) to definitions of the attack itself. The central results are framed as measured outcomes under white-box and gray-box settings rather than predictions derived from fitted parameters or self-citations. The path-ranking and adaptive weighting components are described as algorithmic choices, not tautological redefinitions of the overload metric. No self-citation load-bearing steps, uniqueness theorems, or ansatz smuggling appear in the provided text. The 20× gap is presented as an empirical observation attributable to attack direction, with explicit contrast to baselines and gray-box degradation, keeping the derivation self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no explicit free parameters, axioms, or invented entities; the framework is described as a combination of path ranking and loss weighting without further decomposition.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · tag: unclear
Paper passage: "We formalize this as the adversarial path-selection problem and present AESOP, a framework combining vulnerability-guided path ranking with adaptive loss weighting."

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · absolute_floor_iff_bare_distinguishability · tag: unclear
Paper passage: "AESOP achieves up to 2,407× FLOPs and 419× latency inflation in white-box setting"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Appendix A. Pipeline Applications

As shown in Figure 4, we developed five distinct pipeline applications....

[1] [1]

Nexus: A gpu cluster engine for accelerating dnn-based video analysis,

H. Shen, L. Chen, Y . Jin, L. Zhao, B. Kong, M. Philipose, A. Krishna- murthy, and R. Sundaram, “Nexus: A gpu cluster engine for accelerating dnn-based video analysis,” inProceedings of the 27th ACM Symposium on Operating Systems Principles, 2019, pp. 322–337

work page 2019

[2] [2]

Scrooge: A cost-effective deep learning inference system,

Y . Hu, R. Ghosh, and R. Govindan, “Scrooge: A cost-effective deep learning inference system,” inProceedings of the ACM Symposium on Cloud Computing, 2021, pp. 624–638

work page 2021

[3] [3]

Ipa: Inference pipeline adap- tation to achieve high accuracy and cost-efficiency,

S. Ghafouri, K. Razavi, M. Salmani, A. Sanaee, T. Lorido-Botran, L. Wang, J. Doyle, and P. Jamshidi, “Ipa: Inference pipeline adap- tation to achieve high accuracy and cost-efficiency,”arXiv preprint arXiv:2308.12871, 2023

work page arXiv 2023

[4] [4]

Dream: A dynamic scheduler for dynamic real-time multi-model ml workloads,

S. Kim, H. Kwon, J. Song, J. Jo, Y .-H. Chen, L. Lai, and V . Chandra, “Dream: A dynamic scheduler for dynamic real-time multi-model ml workloads,” inProceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 4, 2023, pp. 73–86

work page 2023

[5] [5]

Pard: Enhancing goodput for inference pipeline via proactive request dropping,

Z. Zhao, Y . Hu, S. Chen, M. Ji, W. Yang, Y . Zhang, L. Zhao, W. Li, X. Liu, W. Quet al., “Pard: Enhancing goodput for inference pipeline via proactive request dropping,”arXiv preprint arXiv:2602.08747, 2026

work page arXiv 2026

[6] [6]

Inferline: Ml prediction pipeline provisioning and management for tight latency objectives,

D. Crankshaw, G.-E. Sela, C. Zumar, X. Mo, J. E. Gonzalez, I. Stoica, and A. Tumanov, “Inferline: Ml prediction pipeline provisioning and management for tight latency objectives,” 2020. [Online]. Available: https://arxiv.org/abs/1812.01776

work page arXiv 2020

[7] [7]

Nvidia dynamo-triton: Scalable ai inference platform,

NVIDIA Corporation, “Nvidia dynamo-triton: Scalable ai inference platform,” https://developer.nvidia.com/dynamo-triton, 2026, accessed: 2026-05-05

work page 2026

[8] [8]

Mrdjan Jankovic

L. Ullrich, M. Buchholz, K. Dietmayer, and K. Graichen, “Ai safety assurance for automated vehicles: A survey on research, standardization, regulation,”IEEE Transactions on Intelligent Vehicles, vol. 10, no. 10, p. 4784–4803, Oct. 2025. [Online]. Available: http://dx.doi.org/10.1109/TIV .2024.3496797

work page doi:10.1109/tiv 2025

[9] [9]

Loki: A system for serving ml inference pipelines with hardware and accuracy scaling,

S. Ahmad, H. Guan, and R. K. Sitaraman, “Loki: A system for serving ml inference pipelines with hardware and accuracy scaling,” in Proceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing, ser. HPDC ’24. ACM, 2024, pp. 267–280. [Online]. Available: http://dx.doi.org/10.1145/3625549.3658688

work page doi:10.1145/3625549.3658688 2024

[10] [10]

2024 ai inference infrastructure survey highlights,

BentoML, “2024 ai inference infrastructure survey highlights,” https://www.bentoml.com/blog/2024-ai-infra-survey-highlights, 2025, accessed: 2026-05-05

work page 2024

[11] [11]

A multi-stage deep-learning-based vehicle and license plate recognition system with real-time edge inference,

A. Ammar, A. Koubaa, W. Boulila, B. Benjdira, and Y. Alhabashi, “A multi-stage deep-learning-based vehicle and license plate recognition system with real-time edge inference,” Sensors, vol. 23, no. 4, p. 2120, 2023

work page 2023

[12] [12]

A multi stage deep learning approach for real-time vehicle detection, tracking, and speed measurement in intelligent transportation systems,

R. Li, “A multi stage deep learning approach for real-time vehicle detection, tracking, and speed measurement in intelligent transportation systems,” Scientific Reports, vol. 15, no. 1, p. 22531, 2025

work page 2025

[13] [13]

Utility-aware load shedding for real-time video analytics at the edge,

E. Saurez, H. Gupta, H. Roger, S. Bhowmik, U. Ramachandran, and K. Rothermel, “Utility-aware load shedding for real-time video analytics at the edge,” arXiv preprint arXiv:2307.02409, 2023

work page arXiv 2023

[14] [14]

Addressing significant challenges for animal detection in camera trap images: a novel deep learning-based approach,

M. Mulero-Pázmány, S. Hurtado, C. Barba-González, M. L. Antequera-Gómez, F. Díaz-Ruiz, R. Real, I. Navas-Delgado, and J. F. Aldana-Montes, “Addressing significant challenges for animal detection in camera trap images: a novel deep learning-based approach,” Scientific Reports, vol. 15, no. 1, p. 16191, 2025

work page 2025

[15] [15]

Paying attention to other animal detections improves camera trap classification models,

G. Dussert, S. Dray, S. Chamaillé-Jammes, and V. Miele, “Paying attention to other animal detections improves camera trap classification models,” Methods in Ecology and Evolution, vol. 17, no. 4, pp. 1248–1258, 2026

work page 2026

[16] [16]

Reliable and efficient integration of ai into camera traps for smart wildlife monitoring based on continual learning,

D. Velasco-Montero, J. Fernández-Berni, R. Carmona-Galán, A. Sanglas, and F. Palomares, “Reliable and efficient integration of ai into camera traps for smart wildlife monitoring based on continual learning,” Ecological Informatics, vol. 83, p. 102815, 2024

work page 2024

[17] [17]

A smart camera trap for detection of endotherms and ectotherms,

D. M. Corva, N. I. Semianiw, A. C. Eichholtzer, S. D. Adams, M. P. Mahmud, K. Gaur, A. J. Pestell, D. A. Driscoll, and A. Z. Kouzani, “A smart camera trap for detection of endotherms and ectotherms,” Sensors, vol. 22, no. 11, p. 4094, 2022

work page 2022

[18] [18]

Child face age-progression via deep feature aging,

D. Deb, D. Aggarwal, and A. K. Jain, “Child face age-progression via deep feature aging,” arXiv preprint arXiv:2003.08788, 2020

work page arXiv 2020

[19] [19]

Dager: Deep age, gender and emotion recognition using convolutional neural network,

A. Dehghan, E. G. Ortiz, G. Shu, and S. Z. Masood, “Dager: Deep age, gender and emotion recognition using convolutional neural network,” arXiv preprint arXiv:1702.04280, 2017

work page arXiv 2017

[20] [20]

Child abduction, amber alert, and crime control theater,

T. Griffin and M. K. Miller, “Child abduction, amber alert, and crime control theater,” Criminal Justice Review, vol. 33, no. 2, pp. 159–176, 2008

work page 2008

[21] [21]

Sponge examples: Energy-latency attacks on neural networks,

I. Shumailov, Y. Zhao, D. Bates, N. Papernot, R. Mullins, and R. Anderson, “Sponge examples: Energy-latency attacks on neural networks,” in 2021 IEEE European Symposium on Security and Privacy (EuroS&P). IEEE, 2021, pp. 212–231

work page 2021

[22] [22]

Phantom sponges: Exploiting non-maximum suppression to attack deep object detectors,

A. Shapira, A. Zolfi, L. Demetrio, B. Biggio, and A. Shabtai, “Phantom sponges: Exploiting non-maximum suppression to attack deep object detectors,” in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2023, pp. 4571–4580

work page 2023

[23] [23]

Overload: Latency attacks on object detection for edge devices,

E.-C. Chen, P.-Y. Chen, I. Chung, C.-R. Lee et al., “Overload: Latency attacks on object detection for edge devices,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 24 716–24 725

work page 2024

[24] [24]

Slowtrack: Increasing the latency of camera-based perception in autonomous driving using adversarial examples,

C. Ma, N. Wang, Q. A. Chen, and C. Shen, “Slowtrack: Increasing the latency of camera-based perception in autonomous driving using adversarial examples,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 5, 2024, pp. 4062–4070

work page 2024

[25] [25]

Slowlidar: Increasing the latency of lidar-based detection using adversarial examples,

H. Liu, Y. Wu, Z. Yu, Y. Vorobeychik, and N. Zhang, “Slowlidar: Increasing the latency of lidar-based detection using adversarial examples,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 5146–5155

work page 2023

[26] [26]

Nmtsloth: understanding and testing efficiency degradation of neural machine translation systems,

S. Chen, C. Liu, M. Haque, Z. Song, and W. Yang, “Nmtsloth: understanding and testing efficiency degradation of neural machine translation systems,” in Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2022, pp. 1148–1160

work page 2022

[27] [27]

Slothspeech: Denial-of-service attack against speech recognition models,

M. Haque, R. Shah, S. Chen, B. Sisman, C. Liu, and W. Yang, “Slothspeech: Denial-of-service attack against speech recognition models,” 08 2023, pp. 1274–1278

work page 2023

[28] [28]

Ilfo: Adversarial attack on adaptive neural networks,

M. Haque, A. Chauhan, C. Liu, and W. Yang, “Ilfo: Adversarial attack on adaptive neural networks,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 14 264–14 273

work page 2020

[29] [29]

Gradmdm: Adversarial attack on dynamic networks,

J. Pan, L. G. Foo, Q. Zheng, Z. Fan, H. Rahmani, Q. Ke, and J. Liu, “Gradmdm: Adversarial attack on dynamic networks,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 9, pp. 11 374–11 381, 2023

work page 2023

[30] [30]

Gradauto: Energy-oriented attack on dynamic neural networks,

J. Pan, Q. Zheng, Z. Fan, H. Rahmani, Q. Ke, and J. Liu, “Gradauto: Energy-oriented attack on dynamic neural networks,” in European Conference on Computer Vision. Springer, 2022, pp. 637–653

work page 2022

[31] [31]

Jaguar: Low latency mobile augmented reality with flexible tracking,

W. Zhang, B. Han, and P. Hui, “Jaguar: Low latency mobile augmented reality with flexible tracking,” in Proceedings of the 26th ACM International Conference on Multimedia, ser. MM ’18. New York, NY, USA: Association for Computing Machinery, 2018, pp. 355–363. [Online]. Available: https://doi.org/10.1145/3240508.3240561

work page doi:10.1145/3240508.3240561 2018

[32] [32]

Distributing inference tasks over interconnected systems through dynamic dnns,

C. Singhal, Y. Wu, F. Malandrino, M. Levorato, and C. F. Chiasserini, “Distributing inference tasks over interconnected systems through dynamic dnns,” IEEE Transactions on Networking, pp. 1–14, 2025

work page 2025

[33] [33]

Human action recognition from various data modalities: A review,

Z. Sun, Q. Ke, H. Rahmani, M. Bennamoun, G. Wang, and J. Liu, “Human action recognition from various data modalities: A review,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 3, pp. 3200–3225, 2023

work page 2023

[34] [34]

Rim: Offloading inference to the edge,

Y. Hu, W. Pang, X. Liu, R. Ghosh, B. Ko, W.-H. Lee, and R. Govindan, “Rim: Offloading inference to the edge,” in Proceedings of the International Conference on Internet-of-Things Design and Implementation, 2021, pp. 80–92

work page 2021

[35] [35]

Sok: Efficiency robustness of dynamic deep learning systems,

R. Rathnasuriya, T. Li, Z. Xu, Z. Song, M. Haque, S. Chen, and W. Yang, “Sok: Efficiency robustness of dynamic deep learning systems,” USENIX Security Symposium, pp. 4683–4702, 2025

work page 2025

[36] [36]

Exploiting efficiency vulnerabilities in dynamic deep learning systems,

R. Rathnasuriya and W. Yang, “Exploiting efficiency vulnerabilities in dynamic deep learning systems,” arXiv preprint arXiv:2506.17621, 2025

work page arXiv 2025

[37] [37]

Adversarial machine learning,

A. Vassilev, A. Oprea, A. Fordyce, and H. Anderson, “Adversarial machine learning,” Gaithersburg, MD, 2024

work page 2024

[38] [38]

Taxonomy of machine learning safety: A survey and primer,

S. Mohseni, H. Wang, C. Xiao, Z. Yu, Z. Wang, and J. Yadawa, “Taxonomy of machine learning safety: A survey and primer,” ACM Computing Surveys, vol. 55, no. 8, pp. 1–38, 2022

work page 2022

[39] [39]

Secure machine learning hardware: Challenges and progress [feature],

K. Lee, M. Ashok, S. Maji, R. Agrawal, A. Joshi, M. Yan, J. S. Emer, and A. P. Chandrakasan, “Secure machine learning hardware: Challenges and progress [feature],” IEEE Circuits and Systems Magazine, vol. 25, no. 1, pp. 8–34, 2025

work page 2025

[40] [40]

A panda? no, it’s a sloth: Slowdown attacks on adaptive multi-exit neural network inference,

S. Hong, Y. Kaya, I.-V. Modoranu, and T. Dumitraș, “A panda? no, it’s a sloth: Slowdown attacks on adaptive multi-exit neural network inference,” arXiv preprint arXiv:2010.02432, 2020

work page arXiv 2020

[41] [41]

Deeplabv3: DeepLabV3+ MobileNet pretrained on cityscapes for ground masks,

M. Teng, “Deeplabv3: DeepLabV3+ MobileNet pretrained on cityscapes for ground masks,” https://github.com/cc-ai/Deeplabv3, 2019, accessed: 2026-05-06

work page 2019

[42] [42]

On-device facial verification using nuf-net model of deep learning,

C. Termritthikun, Y. Jamtsho, and P. Muneesawang, “On-device facial verification using nuf-net model of deep learning,” Engineering Applications of Artificial Intelligence, vol. 85, pp. 579–589, 2019. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0952197619301824

work page 2019

[43] [43]

Deepperform: An efficient approach for performance testing of resource-constrained neural networks,

S. Chen, M. Haque, C. Liu, and W. Yang, “Deepperform: An efficient approach for performance testing of resource-constrained neural networks,” in Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering, 2022, pp. 1–13

work page 2022

[44] [44]

Nicgslowdown: Evaluating the efficiency robustness of neural image caption generation models,

S. Chen, Z. Song, M. Haque, C. Liu, and W. Yang, “Nicgslowdown: Evaluating the efficiency robustness of neural image caption generation models,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 15 365–15 374.

work page 2022

APPENDIX A
PIPELINE APPLICATIONS

As shown in Figure 4, we developed five distinct pipeline applications....