Stress-Testing Neural Network Verifiers with Provably Robust Instances

David Troxell; Guido Mont\'ufar; Sofia Hunt; Stephanie Lei; Yulia Alexandr

arxiv: 2605.17153 · v1 · pith:VZM3HI4Knew · submitted 2026-05-16 · 💻 cs.LG · cs.LO· math.OC

Stress-Testing Neural Network Verifiers with Provably Robust Instances

David Troxell , Yulia Alexandr , Sofia Hunt , Stephanie Lei , Guido Mont\'ufar This is my paper

Pith reviewed 2026-05-20 14:44 UTC · model grok-4.3

classification 💻 cs.LG cs.LOmath.OC

keywords neural network verificationrobustness certificationground-truth benchmarksanalytic constructionverification difficultyformal verificationbenchmarking

0 comments

The pith

A framework generates neural network verification instances with known ground-truth robustness labels through analytic construction.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a reusable framework that creates verification instances for neural networks where the correct robustness label is known in advance through careful analytic construction of the network and inputs. This addresses the limitation of existing benchmarks that lack ground-truth labels, forcing reliance on indirect heuristics for evaluation. By using these instances, the authors identified numeric tolerance issues and an implementation bug in several popular verifiers. The framework also supports the introduction of verification difficulty profiles to analyze what makes certain instances hard for verifiers.

Core claim

By constructing neural networks and corresponding inputs analytically, it is possible to generate verification instances whose robustness status is known exactly and can serve as ground truth independent of any verifier's approximations or relaxations.

What carries the argument

A reusable framework for generating verification instances whose ground-truth robustness labels are known a priori through analytic construction of the networks and inputs.

If this is right

Exact scoring of verifiers becomes possible instead of indirect heuristics.
Numeric tolerance concerns and implementation bugs in popular verifiers can be identified.
Instances can be profiled for distinct sources of hardness to study failure modes systematically.
Results supply actionable targets for improving numerical reliability, relaxation quality, and search behavior in verifiers.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The generated instances could be scaled to create large test suites targeting specific verifier components.
Difficulty profiles might transfer to evaluating verification methods in related domains such as adversarial robustness testing.
This construction approach could inspire ground-truth benchmarks for other formal analysis tools beyond neural networks.

Load-bearing premise

The analytic construction of networks and inputs produces instances whose ground-truth robustness labels are exactly correct and independent of any verifier's internal approximations.

What would settle it

An exhaustive or high-precision calculation on a generated instance that shows the analytically derived robustness label does not match the network's actual behavior.

Figures

Figures reproduced from arXiv: 2605.17153 by David Troxell, Guido Mont\'ufar, Sofia Hunt, Stephanie Lei, Yulia Alexandr.

**Figure 2.** Figure 2: Runtime outcomes for 5 verifiers on VeriStress-GT constructors and two external benchmarks, shown in the last two rows. Average runtimes include timeouts of 600 seconds for VeriStressGT and 360 seconds for external benchmarks. Fractions report correct results over total instances; missing runtimes indicate unsupported computation graphs. Component 5: Effective Gradient dimensionality (deff). Even when th… view at source ↗

**Figure 3.** Figure 3: (left) Timeout AUC for each profile component across all benchmarks, treating timed [PITH_FULL_IMAGE:figures/full_fig_p009_3.png] view at source ↗

**Figure 4.** Figure 4: Runtime variability across instances for each constructor/benchmark and verifier. Each [PITH_FULL_IMAGE:figures/full_fig_p028_4.png] view at source ↗

**Figure 5.** Figure 5: Difficulty profile component estimates and verifier outcomes for each of the [PITH_FULL_IMAGE:figures/full_fig_p029_5.png] view at source ↗

**Figure 6.** Figure 6: Difficulty profile component estimates and verifier outcomes for each of the [PITH_FULL_IMAGE:figures/full_fig_p029_6.png] view at source ↗

read the original abstract

Neural network verifiers aim to provide formal guarantees on model behavior, but existing verification benchmarks are fundamentally limited by their lack of ground-truth labels. As a result, verifier evaluation relies on indirect heuristics, which prevents exact scoring and systematic study of verifier failure modes. We address this gap by introducing a reusable framework for generating verification instances whose ground-truth robustness labels are known a priori through analytic construction. Our framework led to the discovery of multiple numeric tolerance concerns and an implementation bug in popular verifiers, highlighting the need for ground-truth labels. Additionally, to systematically study verifier failure modes, we introduce the verification Difficulty Profile, a collection of estimable quantities capturing distinct sources of instance hardness. Using our framework and these profiles, we evaluate five state-of-the-art verifiers and show that different instances stress distinct aspects of the verification pipeline. We show that these results can aid the future development of verifiers as they provide actionable targets for improving numerical reliability, relaxation quality, and search behavior. Our code is publicly available: https://github.com/dtroxell19/VeriStressGT.git.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This paper gives a usable generator for verification instances that come with exact known robustness labels, and it already turned up tolerance problems plus a bug in real verifiers.

read the letter

The main thing to take from this paper is a practical method for building neural network verification problems where the correct robustness label is fixed by the analytic construction itself, not discovered after the fact by running tools. That directly fixes the missing ground-truth problem that has limited how we score verifiers so far. They also introduce Verification Difficulty Profiles that break hardness into measurable pieces like relaxation quality and search behavior, which lets them show that different instances hit different weak spots in the pipeline. Both pieces look new relative to the benchmarks cited in the abstract. They back it up with public code and concrete runs on five verifiers that found numeric tolerance issues and one implementation bug. That is useful evidence that ground-truth labels can expose real problems rather than just heuristics. The construction appears to avoid post-hoc fitting, which keeps the circularity burden low. The main soft spot is whether these analytically built instances reflect the hardness patterns of networks that were actually trained on data. The paper itself treats this as an open question, and it matters for whether the findings will guide improvements on practical models. The floating-point concern in the stress-test note is worth checking in the methods: if the construction relies on standard floating-point arithmetic for weights or bounds, small rounding could affect the claimed exact label in the same way the paper criticizes in verifiers. If the authors used exact rational arithmetic or proved the labels survive rounding, that would close the point; otherwise it is a minor but real caveat. This work is for researchers who build or test neural network verifiers, especially those focused on robustness and numerical reliability. A reader working on verifier development would get concrete targets from the difficulty profiles and the bug reports. It deserves a serious referee because the method is reproducible, the empirical findings are actionable, and it addresses a clear evaluation gap even if the instances remain somewhat stylized.

Referee Report

2 major / 2 minor

Summary. The paper introduces a reusable framework for generating neural network verification instances with ground-truth robustness labels known a priori via analytic construction. This enables exact scoring of verifiers, leading to the discovery of multiple numeric tolerance concerns and an implementation bug in popular verifiers. The authors further define Verification Difficulty Profiles as collections of estimable quantities capturing distinct sources of instance hardness. Using the framework and profiles, they evaluate five state-of-the-art verifiers and show that different instances stress distinct aspects of the verification pipeline (numerical reliability, relaxation quality, and search behavior), providing actionable targets for future verifier improvements. The code is publicly available.

Significance. If the analytic constructions rigorously guarantee exact, approximation-independent ground-truth labels, the work would be significant for neural network verification. It directly addresses the field's reliance on indirect heuristics by enabling precise evaluation and systematic failure-mode analysis. The empirical discovery of tolerance issues and bugs demonstrates immediate practical utility, while the difficulty profiles offer a new lens for dissecting verifier pipelines. Public code supports reproducibility and follow-on work.

major comments (2)

[Analytic construction section] Analytic construction section: The central claim that instances have ground-truth robustness labels that are exactly correct and independent of verifier approximations is load-bearing. The manuscript must explicitly demonstrate that the construction (weights, inputs, bounds) avoids or bounds floating-point rounding errors, as standard floating-point arithmetic could introduce the same tolerance mismatches the paper identifies in verifiers, undermining the claimed ground-truth independence.
[Methods / framework description] Methods / framework description: It remains unclear whether the analytic construction maintains exact arithmetic (e.g., via rational numbers or symbolic bounds) or relies on floating-point values; without this, the discovered 'bugs' risk reflecting mismatch between approximate ground truth and verifier rather than true verifier failure.

minor comments (2)

[Abstract] Abstract: The specific tolerance concerns and the nature of the implementation bug are mentioned but not enumerated; a concise list or table would improve readability.
Consider adding a small table or figure caption that maps each difficulty-profile quantity to the verifier component it stresses (e.g., bound propagation vs. branch-and-bound).

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and for emphasizing the importance of rigor in our analytic construction, which underpins the contribution. We address each major comment below and indicate the corresponding revisions.

read point-by-point responses

Referee: [Analytic construction section] Analytic construction section: The central claim that instances have ground-truth robustness labels that are exactly correct and independent of verifier approximations is load-bearing. The manuscript must explicitly demonstrate that the construction (weights, inputs, bounds) avoids or bounds floating-point rounding errors, as standard floating-point arithmetic could introduce the same tolerance mismatches the paper identifies in verifiers, undermining the claimed ground-truth independence.

Authors: We agree this demonstration is necessary. In the revised manuscript we have added a dedicated paragraph and a short proof sketch in the Analytic Construction section. The construction uses small integer weights (range [-8,8]), integer-scaled inputs, and bounds derived from the closed-form ReLU pattern before any floating-point conversion. For these parameter choices every intermediate value in the forward pass for label computation lies on a representable floating-point number with no rounding in the critical path; any residual error is bounded by machine epsilon and cannot flip the robustness label. An explicit numerical example with all intermediate values is now included to illustrate exact agreement between the analytic label and the floating-point evaluation. revision: yes
Referee: [Methods / framework description] Methods / framework description: It remains unclear whether the analytic construction maintains exact arithmetic (e.g., via rational numbers or symbolic bounds) or relies on floating-point values; without this, the discovered 'bugs' risk reflecting mismatch between approximate ground truth and verifier rather than true verifier failure.

Authors: We have clarified the arithmetic model in the revised Methods / framework description. The implementation uses double-precision floating-point values for compatibility with standard neural-network libraries and verifiers. However, the analytic construction guarantees that the ground-truth label is obtained from a closed-form expression whose evaluation on the chosen instances incurs no rounding error that would alter the label; the same label would be returned by exact rational arithmetic. The updated text now states this explicitly, supplies the relevant parameter constraints, and includes pseudocode for label generation. Because the label is independently derivable from the analytic form, mismatches observed with the verifiers reflect genuine verifier behavior rather than ground-truth approximation error. revision: yes

Circularity Check

0 steps flagged

Analytic construction yields independent ground-truth labels

full rationale

The paper constructs neural networks and inputs analytically to produce verification instances with a priori known ground-truth robustness labels. These labels are derived directly from the closed-form construction and are independent of any verifier's numerical approximations or outputs. The subsequent evaluation of external verifiers on these instances, along with the introduction of estimable Difficulty Profiles, does not reduce any central claim to a fit, self-citation chain, or redefinition of inputs. No load-bearing step equates a derived quantity to its own construction by definition, and the framework remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The framework rests on standard assumptions about neural network architectures and floating-point arithmetic but introduces no new free parameters or invented entities beyond the analytic construction method itself.

axioms (1)

domain assumption Analytic construction of networks and inputs yields exact ground-truth robustness labels independent of verifier approximations
This premise is invoked when claiming that the generated instances provide known labels a priori.

pith-pipeline@v0.9.0 · 5733 in / 1254 out tokens · 37419 ms · 2026-05-20T14:44:30.367188+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We address this gap by introducing a reusable framework for generating verification instances whose ground-truth robustness labels are known a priori through analytic construction.
IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean alpha_pin_under_high_calibration unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Each block Cℓ consists of a convolutional layer followed by a 1-Lipschitz pointwise activation function and is normalized to have induced ℓ∞ Lipschitz constant of at most λ < 1.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

34 extracted references · 34 canonical work pages · 4 internal anchors

[1]

Robustness Verification of Polynomial Neural Networks

Yulia Alexandr, Hao Duan, and Guido Montúfar. Robustness verification of polynomial neural networks.arXiv preprint arXiv:2602.06105, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026
[2]

nnenum: Verification of ReLU neural networks with optimized abstraction refinement

Stanley Bak. nnenum: Verification of ReLU neural networks with optimized abstraction refinement. InNASA Formal Methods: 13th International Symposium, NFM 2021, Virtual Event, May 24–28, 2021, Proceedings, page 19–36, Berlin, Heidelberg, 2021. Springer-Verlag. ISBN 978-3-030-76383-1. doi: 10.1007/978-3-030-76384-8_2. URL https://doi.org/10. 1007/978-3-030-...

work page doi:10.1007/978-3-030-76384-8_2 2021
[3]

The second international verification of neural networks competition (vnn-comp 2021): Summary and results, 2021

Stanley Bak, Changliu Liu, and Taylor Johnson. The second international verification of neural networks competition (vnn-comp 2021): Summary and results, 2021. URL https: //arxiv.org/abs/2109.00498

work page arXiv 2021
[4]

A systematic review of robustness in deep learning for computer vision: Mind the gap?, 2022

Nathan Drenkow, Numair Sani, Ilya Shpitser, and Mathias Unberath. A systematic review of robustness in deep learning for computer vision: Mind the gap?, 2022. URL https: //arxiv.org/abs/2112.00639

work page arXiv 2022
[5]

A dpll(t) framework for verifying deep neural networks, 2024

Hai Duong, ThanhVu Nguyen, and Matthew Dwyer. A dpll(t) framework for verifying deep neural networks, 2024. URLhttps://arxiv.org/abs/2307.10266

work page arXiv 2024
[6]

DPLL(T): Fast decision procedures

Harald Ganzinger, George Hagen, Robert Nieuwenhuis, Albert Oliveras, and Cesare Tinelli. DPLL(T): Fast decision procedures. InComputer Aided Verification, pages 175–188, Berlin, Heidelberg, 2004. Springer Berlin Heidelberg

work page 2004
[7]

A theory of multineuronal dimensionality, dynamics and measurement.bioRxiv,

Peiran Gao, Eric Trautmann, Byron Yu, Gopal Santhanam, Stephen Ryu, Krishna Shenoy, and Surya Ganguli. A theory of multineuronal dimensionality, dynamics and measurement.bioRxiv,

work page
[8]

URL https://www.biorxiv.org/content/early/2017/11/ 12/214262

doi: 10.1101/214262. URL https://www.biorxiv.org/content/early/2017/11/ 12/214262

work page doi:10.1101/214262 2017
[9]

Mann, and Pushmeet Kohli

Sven Gowal, Krishnamurthy Dvijotham, Robert Stanforth, Rudy Bunel, Chongli Qin, Jonathan Uesato, Relja Arandjelovic, Timothy A. Mann, and Pushmeet Kohli. On the effectiveness of interval bound propagation for training verifiably robust models.arXiv:1810.12715, 2018. URL http://arxiv.org/abs/1810.12715

work page arXiv 2018
[10]

Kai Jia and Martin C. Rinard. Exploiting verified neural networks via floating point numerical error.CoRR, abs/2003.03021, 2020. URLhttps://arxiv.org/abs/2003.03021

work page arXiv 2003
[11]

Reluplex: An efficient SMT solver for verifying deep neural networks

Guy Katz, Clark Barrett, David L Dill, Kyle Julian, and Mykel J Kochenderfer. Reluplex: An efficient SMT solver for verifying deep neural networks. InInternational conference on computer aided verification, pages 97–117. Springer, 2017

work page 2017
[12]

Huang, Duligur Ibeling, Kyle Julian, Christopher Lazarus, Rachel Lim, Parth Shah, Shantanu Thakoor, Haoze Wu, Aleksandar Zelji´c, David L

Guy Katz, Derek A. Huang, Duligur Ibeling, Kyle Julian, Christopher Lazarus, Rachel Lim, Parth Shah, Shantanu Thakoor, Haoze Wu, Aleksandar Zelji´c, David L. Dill, Mykel J. Kochen- derfer, and Clark Barrett. The marabou framework for verification and analysis of deep neural networks. InComputer Aided Verification, pages 443–452, Cham, 2019. Springer Inter...

work page 2019
[13]

Johnson, Lukas Koller, Edoardo Manino, ThanhVu H Nguyen, and Haoze Wu

Konstantin Kaulen, Tobias Ladner, Stanley Bak, Christopher Brix, Hai Duong, Thomas Flinkow, Taylor T. Johnson, Lukas Koller, Edoardo Manino, ThanhVu H Nguyen, and Haoze Wu. The 6th international verification of neural networks competition (vnn-comp 2025): Summary and results, 2025. URLhttps://arxiv.org/abs/2512.19007

work page arXiv 2025
[14]

Learning multiple layers of features from tiny images

Alex Krizhevsky. Learning multiple layers of features from tiny images. 2009. URL https: //api.semanticscholar.org/CorpusID:18268744

work page 2009
[15]

Lecun, L

Y . Lecun, L. Bottou, Y . Bengio, and P. Haffner. Gradient-based learning applied to document recognition.Proceedings of the IEEE, 86(11):2278–2324, 1998. doi: 10.1109/5.726791

work page doi:10.1109/5.726791 1998
[16]

Bridging research and practice in simulation-based testing of industrial robot navigation systems,

Linhan Li and ThanhVu Nguyen. Destabilizing neurons to generate challenging neural network verification benchmarks. In2025 40th IEEE/ACM International Conference on Automated Software Engineering (ASE), pages 1351–1363, 2025. doi: 10.1109/ASE63991.2025.00115. 11

work page doi:10.1109/ase63991.2025.00115 2025
[17]

A comprehensive survey of robust deep learning in computer vi- sion.Journal of Automation and Intelligence, 2(4):175–195, 2023

Jia Liu and Yaochu Jin. A comprehensive survey of robust deep learning in computer vi- sion.Journal of Automation and Intelligence, 2(4):175–195, 2023. URL https://www. sciencedirect.com/science/article/pii/S294985542300045X

work page 2023
[18]

Towards Deep Learning Models Resistant to Adversarial Attacks

Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks, 2019. URL https://arxiv. org/abs/1706.06083

work page internal anchor Pith review Pith/arXiv arXiv 2019
[19]

Adversarial robustness of deep neural networks: A survey from a formal verification perspective.IEEE Transactions on Dependable and Secure Computing, page 1–1, 2024

Mark Huasong Meng, Guangdong Bai, Sin Gee Teo, Zhe Hou, Yan Xiao, Yun Lin, and Jin Song Dong. Adversarial robustness of deep neural networks: A survey from a formal verification perspective.IEEE Transactions on Dependable and Secure Computing, page 1–1, 2024. URL http://dx.doi.org/10.1109/TDSC.2022.3179131

work page doi:10.1109/tdsc.2022.3179131 2024
[20]

Deep learning methods and applications for electrical power systems: A comprehensive review.International Journal of Energy Research, 44(9):7136–7157, 2020

Asiye K Ozcanli, Fatma Yaprakdal, and Mustafa Baysal. Deep learning methods and applications for electrical power systems: A comprehensive review.International Journal of Energy Research, 44(9):7136–7157, 2020

work page 2020
[21]

AI index report 2025, 2025

Stanford Institute for Human-Centered Artificial Intelligence. AI index report 2025, 2025. URL https://hai.stanford.edu/assets/files/hai_ai_index_report_2025.pdf

work page 2025
[22]

Evaluating Robustness of Neural Networks with Mixed Integer Programming

Vincent Tjeng and Russ Tedrake. Evaluating robustness of neural networks with mixed integer programming.CoRR, abs/1711.07356, 2017. URLhttp://arxiv.org/abs/1711.07356

work page internal anchor Pith review Pith/arXiv arXiv 2017
[23]

Xiao, and Russ Tedrake

Vincent Tjeng, Kai Y . Xiao, and Russ Tedrake. Evaluating robustness of neural networks with mixed integer programming. InInternational Conference on Learning Representations, 2019. URLhttps://openreview.net/forum?id=HyGIdiRqtm

work page 2019
[24]

Hoang-Dung Tran, Xiaodong Yang, Diego Manzanas Lopez, Patrick Musau, Luan Viet Nguyen, Weiming Xiang, Stanley Bak, and Taylor T. Johnson. NNV: The neural network verification tool for deep neural networks and learning-enabled cyber-physical systems, 2020. URL https: //arxiv.org/abs/2004.05519

work page arXiv 2020
[25]

Zico Kolter

Shiqi Wang, Huan Zhang, Kaidi Xu, Xue Lin, Suman Jana, Cho-Jui Hsieh, and J. Zico Kolter. Beta-CROWN: Efficient bound propagation with per-neuron split constraints for neural net- work robustness verification. InAdvances in Neural Information Processing Systems, vol- ume 34, pages 29909–29921, 2021. URL https://proceedings.neurips.cc/paper_ files/paper/20...

work page 2021
[26]

Zico Kolter

Eric Wong, Frank Schmidt, Jan Hendrik Metzen, and J. Zico Kolter. Scaling prov- able adversarial defenses. InAdvances in Neural Information Processing Systems, vol- ume 31, 2018. URL https://proceedings.neurips.cc/paper_files/paper/2018/ file/358f9e7be09177c17d0d17ff73584307-Paper.pdf

work page 2018
[27]

Dwyer, and Sebastian Elbaum

Dong Xu, David Shriver, Matthew B. Dwyer, and Sebastian Elbaum. Systematic generation of diverse benchmarks for dnn verification. InComputer Aided Verification: 32nd International Conference, CAV 2020, Los Angeles, CA, USA, July 21–24, 2020, Proceedings, Part I, page 97–121, Berlin, Heidelberg, 2020. Springer-Verlag. ISBN 978-3-030-53287-1. doi: 10.1007/ ...

work page doi:10.1007/978-3-030-53288-8_5 2020
[28]

Automatic perturbation analysis for scalable certified robustness and beyond

Kaidi Xu, Zhouxing Shi, Huan Zhang, Yihan Wang, Kai-Wei Chang, Minlie Huang, Bhavya Kailkhura, Xue Lin, and Cho-Jui Hsieh. Automatic perturbation analysis for scalable certified robustness and beyond. InAdvances in Neural Information Processing Systems, volume 33, pages 1129–1141, 2020. URL https://proceedings.neurips.cc/paper_files/paper/ 2020/file/0cbc5...

work page 2020
[29]

Fast and complete: Enabling complete neural network verification with rapid and massively parallel incomplete verifiers, 2021

Kaidi Xu, Huan Zhang, Shiqi Wang, Yihan Wang, Suman Jana, Xue Lin, and Cho-Jui Hsieh. Fast and complete: Enabling complete neural network verification with rapid and massively parallel incomplete verifiers, 2021. URLhttps://arxiv.org/abs/2011.13824

work page arXiv 2021
[30]

Challenges and opportunities of deep learning-based process fault detection and diagnosis: a review.Neural Computing and Applications, 35(1):211–252, 2023

Jianbo Yu and Yue Zhang. Challenges and opportunities of deep learning-based process fault detection and diagnosis: a review.Neural Computing and Applications, 35(1):211–252, 2023. 12

work page 2023
[31]

Efficient Neural Network Robustness Certification with General Activation Functions

Huan Zhang, Tsui-Wei Weng, Pin-Yu Chen, Cho-Jui Hsieh, and Luca Daniel. Efficient neural network robustness certification with general activation functions, 2018. URLhttps://arxiv. org/abs/1811.00866

work page internal anchor Pith review Pith/arXiv arXiv 2018
[32]

SoundnessBench: A soundness benchmark for neural network verifiers.Transactions on Ma- chine Learning Research, 2025

Xingjian Zhou, Keyi Shen, Andy Xu, Hongji Xu, Cho-Jui Hsieh, Huan Zhang, and Zhouxing Shi. SoundnessBench: A soundness benchmark for neural network verifiers.Transactions on Ma- chine Learning Research, 2025. URLhttps://openreview.net/forum?id=UuYYldVLH3

work page 2025
[33]

Fooling a complete neural network verifier

Dániel Zombori, Balázs Bánhelyi, Tibor Csendes, István Megyeri, and Márk Jelasity. Fooling a complete neural network verifier. InInternational Conference on Learning Representations,

work page
[34]

veristress.yaml

URLhttps://openreview.net/forum?id=4IwieFS44l. 13 A Robust Constructor Details This appendix provides the implementation details for VeriStress-GT’s robust constructors described in Section 2. A.1 Exact-Radius via MILP Let an L-layer ReLU MLP have preactivations s(ℓ) =W (ℓ)z(ℓ−1) +b (ℓ) and activations z(ℓ) = σ(s(ℓ)), with z(0) =x 0 +δ and σ(t) = max{t,0}...

work page 2005

[1] [1]

Robustness Verification of Polynomial Neural Networks

Yulia Alexandr, Hao Duan, and Guido Montúfar. Robustness verification of polynomial neural networks.arXiv preprint arXiv:2602.06105, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026

[2] [2]

nnenum: Verification of ReLU neural networks with optimized abstraction refinement

Stanley Bak. nnenum: Verification of ReLU neural networks with optimized abstraction refinement. InNASA Formal Methods: 13th International Symposium, NFM 2021, Virtual Event, May 24–28, 2021, Proceedings, page 19–36, Berlin, Heidelberg, 2021. Springer-Verlag. ISBN 978-3-030-76383-1. doi: 10.1007/978-3-030-76384-8_2. URL https://doi.org/10. 1007/978-3-030-...

work page doi:10.1007/978-3-030-76384-8_2 2021

[3] [3]

The second international verification of neural networks competition (vnn-comp 2021): Summary and results, 2021

Stanley Bak, Changliu Liu, and Taylor Johnson. The second international verification of neural networks competition (vnn-comp 2021): Summary and results, 2021. URL https: //arxiv.org/abs/2109.00498

work page arXiv 2021

[4] [4]

A systematic review of robustness in deep learning for computer vision: Mind the gap?, 2022

Nathan Drenkow, Numair Sani, Ilya Shpitser, and Mathias Unberath. A systematic review of robustness in deep learning for computer vision: Mind the gap?, 2022. URL https: //arxiv.org/abs/2112.00639

work page arXiv 2022

[5] [5]

A dpll(t) framework for verifying deep neural networks, 2024

Hai Duong, ThanhVu Nguyen, and Matthew Dwyer. A dpll(t) framework for verifying deep neural networks, 2024. URLhttps://arxiv.org/abs/2307.10266

work page arXiv 2024

[6] [6]

DPLL(T): Fast decision procedures

Harald Ganzinger, George Hagen, Robert Nieuwenhuis, Albert Oliveras, and Cesare Tinelli. DPLL(T): Fast decision procedures. InComputer Aided Verification, pages 175–188, Berlin, Heidelberg, 2004. Springer Berlin Heidelberg

work page 2004

[7] [7]

A theory of multineuronal dimensionality, dynamics and measurement.bioRxiv,

Peiran Gao, Eric Trautmann, Byron Yu, Gopal Santhanam, Stephen Ryu, Krishna Shenoy, and Surya Ganguli. A theory of multineuronal dimensionality, dynamics and measurement.bioRxiv,

work page

[8] [8]

URL https://www.biorxiv.org/content/early/2017/11/ 12/214262

doi: 10.1101/214262. URL https://www.biorxiv.org/content/early/2017/11/ 12/214262

work page doi:10.1101/214262 2017

[9] [9]

Mann, and Pushmeet Kohli

Sven Gowal, Krishnamurthy Dvijotham, Robert Stanforth, Rudy Bunel, Chongli Qin, Jonathan Uesato, Relja Arandjelovic, Timothy A. Mann, and Pushmeet Kohli. On the effectiveness of interval bound propagation for training verifiably robust models.arXiv:1810.12715, 2018. URL http://arxiv.org/abs/1810.12715

work page arXiv 2018

[10] [10]

Kai Jia and Martin C. Rinard. Exploiting verified neural networks via floating point numerical error.CoRR, abs/2003.03021, 2020. URLhttps://arxiv.org/abs/2003.03021

work page arXiv 2003

[11] [11]

Reluplex: An efficient SMT solver for verifying deep neural networks

Guy Katz, Clark Barrett, David L Dill, Kyle Julian, and Mykel J Kochenderfer. Reluplex: An efficient SMT solver for verifying deep neural networks. InInternational conference on computer aided verification, pages 97–117. Springer, 2017

work page 2017

[12] [12]

Huang, Duligur Ibeling, Kyle Julian, Christopher Lazarus, Rachel Lim, Parth Shah, Shantanu Thakoor, Haoze Wu, Aleksandar Zelji´c, David L

Guy Katz, Derek A. Huang, Duligur Ibeling, Kyle Julian, Christopher Lazarus, Rachel Lim, Parth Shah, Shantanu Thakoor, Haoze Wu, Aleksandar Zelji´c, David L. Dill, Mykel J. Kochen- derfer, and Clark Barrett. The marabou framework for verification and analysis of deep neural networks. InComputer Aided Verification, pages 443–452, Cham, 2019. Springer Inter...

work page 2019

[13] [13]

Johnson, Lukas Koller, Edoardo Manino, ThanhVu H Nguyen, and Haoze Wu

Konstantin Kaulen, Tobias Ladner, Stanley Bak, Christopher Brix, Hai Duong, Thomas Flinkow, Taylor T. Johnson, Lukas Koller, Edoardo Manino, ThanhVu H Nguyen, and Haoze Wu. The 6th international verification of neural networks competition (vnn-comp 2025): Summary and results, 2025. URLhttps://arxiv.org/abs/2512.19007

work page arXiv 2025

[14] [14]

Learning multiple layers of features from tiny images

Alex Krizhevsky. Learning multiple layers of features from tiny images. 2009. URL https: //api.semanticscholar.org/CorpusID:18268744

work page 2009

[15] [15]

Lecun, L

Y . Lecun, L. Bottou, Y . Bengio, and P. Haffner. Gradient-based learning applied to document recognition.Proceedings of the IEEE, 86(11):2278–2324, 1998. doi: 10.1109/5.726791

work page doi:10.1109/5.726791 1998

[16] [16]

Bridging research and practice in simulation-based testing of industrial robot navigation systems,

Linhan Li and ThanhVu Nguyen. Destabilizing neurons to generate challenging neural network verification benchmarks. In2025 40th IEEE/ACM International Conference on Automated Software Engineering (ASE), pages 1351–1363, 2025. doi: 10.1109/ASE63991.2025.00115. 11

work page doi:10.1109/ase63991.2025.00115 2025

[17] [17]

A comprehensive survey of robust deep learning in computer vi- sion.Journal of Automation and Intelligence, 2(4):175–195, 2023

Jia Liu and Yaochu Jin. A comprehensive survey of robust deep learning in computer vi- sion.Journal of Automation and Intelligence, 2(4):175–195, 2023. URL https://www. sciencedirect.com/science/article/pii/S294985542300045X

work page 2023

[18] [18]

Towards Deep Learning Models Resistant to Adversarial Attacks

Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks, 2019. URL https://arxiv. org/abs/1706.06083

work page internal anchor Pith review Pith/arXiv arXiv 2019

[19] [19]

Adversarial robustness of deep neural networks: A survey from a formal verification perspective.IEEE Transactions on Dependable and Secure Computing, page 1–1, 2024

Mark Huasong Meng, Guangdong Bai, Sin Gee Teo, Zhe Hou, Yan Xiao, Yun Lin, and Jin Song Dong. Adversarial robustness of deep neural networks: A survey from a formal verification perspective.IEEE Transactions on Dependable and Secure Computing, page 1–1, 2024. URL http://dx.doi.org/10.1109/TDSC.2022.3179131

work page doi:10.1109/tdsc.2022.3179131 2024

[20] [20]

Deep learning methods and applications for electrical power systems: A comprehensive review.International Journal of Energy Research, 44(9):7136–7157, 2020

Asiye K Ozcanli, Fatma Yaprakdal, and Mustafa Baysal. Deep learning methods and applications for electrical power systems: A comprehensive review.International Journal of Energy Research, 44(9):7136–7157, 2020

work page 2020

[21] [21]

AI index report 2025, 2025

Stanford Institute for Human-Centered Artificial Intelligence. AI index report 2025, 2025. URL https://hai.stanford.edu/assets/files/hai_ai_index_report_2025.pdf

work page 2025

[22] [22]

Evaluating Robustness of Neural Networks with Mixed Integer Programming

Vincent Tjeng and Russ Tedrake. Evaluating robustness of neural networks with mixed integer programming.CoRR, abs/1711.07356, 2017. URLhttp://arxiv.org/abs/1711.07356

work page internal anchor Pith review Pith/arXiv arXiv 2017

[23] [23]

Xiao, and Russ Tedrake

Vincent Tjeng, Kai Y . Xiao, and Russ Tedrake. Evaluating robustness of neural networks with mixed integer programming. InInternational Conference on Learning Representations, 2019. URLhttps://openreview.net/forum?id=HyGIdiRqtm

work page 2019

[24] [24]

Hoang-Dung Tran, Xiaodong Yang, Diego Manzanas Lopez, Patrick Musau, Luan Viet Nguyen, Weiming Xiang, Stanley Bak, and Taylor T. Johnson. NNV: The neural network verification tool for deep neural networks and learning-enabled cyber-physical systems, 2020. URL https: //arxiv.org/abs/2004.05519

work page arXiv 2020

[25] [25]

Zico Kolter

Shiqi Wang, Huan Zhang, Kaidi Xu, Xue Lin, Suman Jana, Cho-Jui Hsieh, and J. Zico Kolter. Beta-CROWN: Efficient bound propagation with per-neuron split constraints for neural net- work robustness verification. InAdvances in Neural Information Processing Systems, vol- ume 34, pages 29909–29921, 2021. URL https://proceedings.neurips.cc/paper_ files/paper/20...

work page 2021

[26] [26]

Zico Kolter

Eric Wong, Frank Schmidt, Jan Hendrik Metzen, and J. Zico Kolter. Scaling prov- able adversarial defenses. InAdvances in Neural Information Processing Systems, vol- ume 31, 2018. URL https://proceedings.neurips.cc/paper_files/paper/2018/ file/358f9e7be09177c17d0d17ff73584307-Paper.pdf

work page 2018

[27] [27]

Dwyer, and Sebastian Elbaum

Dong Xu, David Shriver, Matthew B. Dwyer, and Sebastian Elbaum. Systematic generation of diverse benchmarks for dnn verification. InComputer Aided Verification: 32nd International Conference, CAV 2020, Los Angeles, CA, USA, July 21–24, 2020, Proceedings, Part I, page 97–121, Berlin, Heidelberg, 2020. Springer-Verlag. ISBN 978-3-030-53287-1. doi: 10.1007/ ...

work page doi:10.1007/978-3-030-53288-8_5 2020

[28] [28]

Automatic perturbation analysis for scalable certified robustness and beyond

Kaidi Xu, Zhouxing Shi, Huan Zhang, Yihan Wang, Kai-Wei Chang, Minlie Huang, Bhavya Kailkhura, Xue Lin, and Cho-Jui Hsieh. Automatic perturbation analysis for scalable certified robustness and beyond. InAdvances in Neural Information Processing Systems, volume 33, pages 1129–1141, 2020. URL https://proceedings.neurips.cc/paper_files/paper/ 2020/file/0cbc5...

work page 2020

[29] [29]

Fast and complete: Enabling complete neural network verification with rapid and massively parallel incomplete verifiers, 2021

Kaidi Xu, Huan Zhang, Shiqi Wang, Yihan Wang, Suman Jana, Xue Lin, and Cho-Jui Hsieh. Fast and complete: Enabling complete neural network verification with rapid and massively parallel incomplete verifiers, 2021. URLhttps://arxiv.org/abs/2011.13824

work page arXiv 2021

[30] [30]

Challenges and opportunities of deep learning-based process fault detection and diagnosis: a review.Neural Computing and Applications, 35(1):211–252, 2023

Jianbo Yu and Yue Zhang. Challenges and opportunities of deep learning-based process fault detection and diagnosis: a review.Neural Computing and Applications, 35(1):211–252, 2023. 12

work page 2023

[31] [31]

Efficient Neural Network Robustness Certification with General Activation Functions

Huan Zhang, Tsui-Wei Weng, Pin-Yu Chen, Cho-Jui Hsieh, and Luca Daniel. Efficient neural network robustness certification with general activation functions, 2018. URLhttps://arxiv. org/abs/1811.00866

work page internal anchor Pith review Pith/arXiv arXiv 2018

[32] [32]

SoundnessBench: A soundness benchmark for neural network verifiers.Transactions on Ma- chine Learning Research, 2025

Xingjian Zhou, Keyi Shen, Andy Xu, Hongji Xu, Cho-Jui Hsieh, Huan Zhang, and Zhouxing Shi. SoundnessBench: A soundness benchmark for neural network verifiers.Transactions on Ma- chine Learning Research, 2025. URLhttps://openreview.net/forum?id=UuYYldVLH3

work page 2025

[33] [33]

Fooling a complete neural network verifier

Dániel Zombori, Balázs Bánhelyi, Tibor Csendes, István Megyeri, and Márk Jelasity. Fooling a complete neural network verifier. InInternational Conference on Learning Representations,

work page

[34] [34]

veristress.yaml

URLhttps://openreview.net/forum?id=4IwieFS44l. 13 A Robust Constructor Details This appendix provides the implementation details for VeriStress-GT’s robust constructors described in Section 2. A.1 Exact-Radius via MILP Let an L-layer ReLU MLP have preactivations s(ℓ) =W (ℓ)z(ℓ−1) +b (ℓ) and activations z(ℓ) = σ(s(ℓ)), with z(0) =x 0 +δ and σ(t) = max{t,0}...

work page 2005