NeuroTrace: Inference Provenance-Based Detection of Adversarial Examples
Pith reviewed 2026-05-10 12:26 UTC · model grok-4.3
The pith
Inference provenance graphs capture cross-layer dataflow to distinguish adversarial inputs from benign ones.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
NeuroTrace extracts Inference Provenance Graphs from instrumented model executions; each graph encodes activation values together with the parameter-driven dataflow paths that produced them. Detectors built on these graphs reliably flag adversarial examples under intra-attack, multi-attack, and cross-domain transfer conditions while improving on prior graph baselines. The framework supplies a reusable extraction engine, a standardized graph format, and a public benchmark spanning multiple attack families.
What carries the argument
Inference Provenance Graphs (IPGs), heterogeneous graphs that record both activation behavior and parameter-induced dataflow during the forward pass.
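As a toy illustration of what such a graph records, the sketch below runs a hand-written two-layer ReLU network on one input and logs each neuron's activation value plus every weighted edge that carried nonzero dataflow into a firing neuron. The node/edge schema (layer-index tuples, contribution-weighted edges) is hypothetical and chosen for illustration only; it is not the paper's actual IPG format.

```python
# Toy sketch of an inference provenance graph: record activation values
# (nodes) and the parameter-weighted contributions that actually reached
# a firing neuron (edges) during one forward pass of a tiny ReLU MLP.

def relu(x):
    return x if x > 0.0 else 0.0

def forward_with_provenance(x, layers):
    """layers: list of weight matrices (each row = one output neuron)."""
    graph = {"nodes": {}, "edges": []}
    acts = list(x)
    for j, v in enumerate(acts):
        graph["nodes"][(0, j)] = v          # input-layer nodes
    for li, W in enumerate(layers, start=1):
        nxt = []
        for o, row in enumerate(W):
            pre = sum(w * a for w, a in zip(row, acts))
            post = relu(pre)
            graph["nodes"][(li, o)] = post  # activation value per neuron
            if post > 0.0:                  # only dataflow that survived ReLU
                for i, (w, a) in enumerate(zip(row, acts)):
                    if w * a != 0.0:
                        graph["edges"].append(((li - 1, i), (li, o), w * a))
            nxt.append(post)
        acts = nxt
    return acts, graph

layers = [
    [[1.0, -1.0], [0.5, 0.5]],   # layer 1: 2 inputs -> 2 neurons
    [[1.0, 1.0]],                # layer 2: 2 neurons -> 1 output
]
out, ipg = forward_with_provenance([2.0, 1.0], layers)
```

A detector would then featurize or directly classify graphs of this shape; the paper's framework does this with heterogeneous GNNs over a richer node/edge vocabulary.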
If this is right
- IPG detectors maintain high performance when trained on one attack family and tested on others.
- The same graphs improve detection accuracy over earlier graph-based methods in both vision and malware tasks.
- Runtime and storage costs of provenance extraction can be measured and traded off against detection quality.
- Releasing the extraction pipeline and dataset enables repeatable study of inference-time information flow.
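A minimal sketch of the intra-attack vs. cross-attack evaluation implied by the bullets above, assuming (hypothetically) that each IPG is summarized by a single scalar statistic such as its edge count. A real detector would be a heterogeneous GNN; all numbers here are synthetic.

```python
# Fit a threshold detector on IPG statistics from benign inputs and
# attack family A, then evaluate it unchanged on held-out family B
# (the cross-attack transfer setting). Scores are synthetic.

def fit_threshold(benign, adversarial):
    """Midpoint between class means; a stand-in for a trained detector."""
    mb = sum(benign) / len(benign)
    ma = sum(adversarial) / len(adversarial)
    return (mb + ma) / 2.0, ma > mb   # threshold, flag-above direction

def accuracy(threshold, flag_above, benign, adversarial):
    correct = sum((s > threshold) == flag_above for s in adversarial)
    correct += sum((s > threshold) != flag_above for s in benign)
    return correct / (len(benign) + len(adversarial))

benign   = [10.0, 11.0, 9.5, 10.5]    # synthetic IPG edge counts
attack_a = [14.0, 15.5, 13.8, 14.9]   # training attack family
attack_b = [13.2, 14.1, 13.7, 15.0]   # held-out attack family

thr, flag_above = fit_threshold(benign, attack_a)
intra    = accuracy(thr, flag_above, benign, attack_a)
transfer = accuracy(thr, flag_above, benign, attack_b)
```

The gap between `intra` and `transfer`, measured alongside extraction runtime and graph storage size, is exactly the trade-off surface the benchmark is meant to expose.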
Where Pith is reading between the lines
- Provenance tracking could be combined with existing monitoring tools to create layered defenses that audit both inputs and internal execution.
- If IPGs generalize beyond adversarial examples, similar graphs might flag other runtime anomalies such as model poisoning or distribution shift.
- The open dataset allows direct comparison of provenance signals against activation-only or attribution-only baselines on the same inputs.
Load-bearing premise
That the cross-layer patterns captured in the graphs remain informative enough to separate adversarial from benign inputs even when the attack type or application domain changes.
What would settle it
A new attack family or domain where an IPG-based detector achieves no better than chance accuracy on held-out adversarial examples while layer-local detectors still succeed.
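That criterion can be made operational with ROC AUC over detector scores on held-out adversarial and benign inputs: AUC near 0.5 is chance-level. The scores below are synthetic placeholders, not results from the paper.

```python
# ROC AUC via the Mann-Whitney formulation: the probability that a
# random adversarial score exceeds a random benign score (ties count
# half). AUC ~ 0.5 would indicate a chance-level IPG detector.

def roc_auc(benign_scores, adv_scores):
    wins = 0.0
    for a in adv_scores:
        for b in benign_scores:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(adv_scores) * len(benign_scores))

informative = roc_auc([0.1, 0.2, 0.3], [0.7, 0.8, 0.9])  # separable classes
chance      = roc_auc([0.1, 0.5, 0.9], [0.1, 0.5, 0.9])  # identical classes
```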
read the original abstract
Deep neural networks (DNNs) remain largely opaque at inference time, limiting our ability to detect and diagnose malicious input manipulations such as adversarial examples. Existing detection methods predominantly rely on layer-local signals (e.g., activations or attribution scores), leaving cross-layer information flow and execution structure under-explored. We introduce NeuroTrace, a framework and open dataset for analyzing inference provenance through Inference Provenance Graphs (IPGs). IPGs are heterogeneous graphs that capture both activation behavior and parameter-induced dataflow during a model's forward pass, providing a structured representation of how information propagates through the network. NeuroTrace includes (i) a reproducible extraction engine that instruments model execution, (ii) a standardized graph representation compatible with heterogeneous GNNs, and (iii) a benchmark suite spanning multiple adversarial attack families across vision and malware domains. Using this framework, we evaluate IPG-based detectors for adversarial example detection under intra-attack, multi-attack, and cross-threat transfer settings. Our results show that inference provenance provides a strong and transferable signal for distinguishing adversarial and benign inputs, achieving consistently high detection performance and improving over prior graph-based baselines. We further analyze the conditions under which provenance-based detection generalizes across attack types, as well as the associated runtime and storage trade-offs. By releasing the dataset, extraction pipeline, and evaluation protocol, NeuroTrace enables systematic study of inference-time behavior and establishes inference provenance as a practical foundation for building more transparent and auditable machine learning systems.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces NeuroTrace, a framework for detecting adversarial examples via Inference Provenance Graphs (IPGs). IPGs are heterogeneous graphs that encode both activation behavior and parameter-induced dataflow during DNN forward passes. The work supplies a reproducible extraction engine, a standardized graph format for heterogeneous GNNs, and a benchmark suite spanning vision and malware domains. Evaluations are performed under intra-attack, multi-attack, and cross-threat transfer regimes, with the central claim that provenance yields a strong, transferable detection signal that improves over prior graph-based baselines. The dataset, pipeline, and protocol are released openly.
Significance. If the benchmark results hold, the contribution is significant because it moves adversarial detection beyond layer-local signals to structured cross-layer provenance information. The open release of the dataset, extraction pipeline, and evaluation protocol is a clear strength that supports reproducibility and systematic follow-on work in inference-time ML analysis. This positions inference provenance as a practical foundation for more transparent and auditable machine-learning systems.
minor comments (1)
- [Abstract] The summary asserts 'consistently high detection performance' and improvement over baselines without any numerical values, dataset sizes, or error bars. Adding one or two headline metrics would let readers assess the strength of the claim immediately.
Simulated Author's Rebuttal
We thank the referee for their positive assessment of NeuroTrace, the open release of the dataset and pipeline, and the recommendation for minor revision. No specific major comments were raised in the report.
Circularity Check
No significant circularity
full rationale
The paper describes an empirical framework for building Inference Provenance Graphs (IPGs) from instrumented DNN forward passes and training heterogeneous GNNs for binary classification of adversarial vs. benign inputs. No equations, derivations, or first-principles claims appear in the provided text; the central results rest on released datasets, extraction pipelines, and benchmark evaluations across attack families and domains. The methodology is evaluated against external benchmarks, with no self-definitional reductions, no fitted inputs renamed as predictions, and no load-bearing self-citations that would collapse the claimed signal into its own inputs.
Axiom & Free-Parameter Ledger
invented entities (1)
- Inference Provenance Graphs (IPGs): no independent evidence
Reference graph
Works this paper leans on
- [1] Abderrahmen Amich and Birhanu Eshete. 2021. Explanation-Guided Diagnosis of Machine Learning Evasion Attacks. In Security and Privacy in Communication Networks, Joaquin Garcia-Alfaro, Shujun Li, Radha Poovendran, Hervé Debar, and Moti Yung (Eds.). Springer International Publishing, Cham, 207–228.
- [2] H. S. Anderson and P. Roth. 2018. EMBER: An Open Dataset for Training Static PE Malware Machine Learning Models. ArXiv e-prints (2018). arXiv:1804.04637.
- [4] Alexander Binder, Grégoire Montavon, Sebastian Lapuschkin, Klaus-Robert Müller, and Wojciech Samek. 2016. Layer-Wise Relevance Propagation for Neural Networks with Local Renormalization Layers. In Artificial Neural Networks and Machine Learning - ICANN 2016 - 25th International Conference on Artificial Neural Networks, Barcelona, Spain, September 6-9, 20...
- [6] Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. 2015. Explaining and Harnessing Adversarial Examples. In 3rd International Conference on Learning Representations, ICLR.
- [7] Firas Ben Hmida, Abderrahmen Amich, Ata Kaboudi, and Birhanu Eshete. 2025. DeepProv: Behavioral Characterization and Repair of Neural Networks via Inference Provenance Graph Analysis. In IEEE Annual Computer Security Applications Conference, ACSAC 2025, Honolulu, HI, USA, December 8-12, 2025. IEEE, 922–938. doi:10.1109/ACSAC67867.2025.00077.
- [8] A. Kherchouche, S. A. Fezza, W. Hamidouche, and O. Deforges. 2020. Detection of Adversarial Examples in Deep Neural Networks with Natural Scene Statistics. In 2020 International Joint Conference on Neural Networks (IJCNN). 1–7.
- [9] Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. 2009. CIFAR-10 (Canadian Institute for Advanced Research). (2009). http://www.cs.toronto.edu/~kriz/cifar.html
- [10] S. Ma, Y. Liu, G. Tao, W.-C. Lee, and X. Zhang. 2019. NIC: Detecting Adversarial Samples with Neural Network Invariant Checking. In Proceedings 2019 Network and Distributed System Security Symposium.
- [12] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. 2017. Towards Deep Learning Models Resistant to Adversarial Attacks. CoRR abs/1706.06083 (2017).
- [13] A. Sotgiu, A. Demontis, M. Melis, B. Biggio, G. Fumera, X. Feng, and F. Roli. 2019. Deep Neural Rejection against Adversarial Examples. EURASIP Journal on Information Security 2020 (2019).
- [16] Xiaosen Wang, Zeliang Zhang, and Jianping Zhang. 2023. Structure Invariant Transformation for better Adversarial Transferability. In Proceedings of the IEEE/CVF International Conference on Computer Vision.
- [17] Fei Zhang, Zhe Li, Yahang Hu, and Yaohua Wang. 2024. CIGA: Detecting Adversarial Samples via Critical Inference Graph Analysis. In 2024 Annual Computer Security Applications Conference (ACSAC). 1231–1244. doi:10.1109/ACSAC63791.2024.00098.