Scaling Neural Network Verification with Tensor Parallelism and Fully Sharded Data Parallelism
Pith reviewed 2026-06-27 17:18 UTC · model grok-4.3
The pith
Fully sharded data parallelism on weight matrices produces bitwise-identical verification bounds while cutting baseline memory 80-90%.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
FSDP shards only weight matrices with a per-layer AllGather, producing bounds that are bitwise identical to the single-GPU baseline: baseline memory drops by 80-90%, peak memory by 34-39% on wide MLPs. A complete unsat result is obtained for CIFAR-100 ResNet-large under FSDP.
What carries the argument
The per-layer AllGather operation in Fully Sharded Data Parallelism that shards only weight matrices while leaving relaxation coefficients on each device.
If this is right
- FSDP integrates directly with complete verification using beta-CROWN plus branch-and-bound.
- BoundConv convolutional layers remain compatible under the FSDP sharding scheme.
- Tensor parallelism at two GPUs yields approximately 2x peak-memory reduction on MNIST-FC benchmarks.
- Per-neuron alpha tensors become the dominant memory consumer once weights are sharded.
Where Pith is reading between the lines
- Sharding the alpha tensors themselves would be the next logical step to remove the new bottleneck.
- The same FSDP pattern could be applied to other bound-propagation libraries without altering their core relaxation logic.
- Tensor parallelism tightness loss might be mitigated by keeping more intermediate bounds on every device rather than forcing IBP.
Load-bearing premise
The sharding and AllGather steps preserve exact numerical values and overall soundness of bound propagation without introducing drift.
What would settle it
Run the identical verification query once on a single GPU and once under FSDP on multiple GPUs; any bound that differs beyond floating-point error falsifies the claim of bitwise identity.
read the original abstract
Formal neural network verification -- proving that a network satisfies safety properties for *all* inputs in a specified domain -- is bounded in practice by GPU memory: standard implementations of bound-propagation algorithms (IBP, CROWN, $\alpha$-CROWN) require weight and relaxation-coefficient matrices to reside entirely on one accelerator. We adapt two parallelism techniques originally developed for large-scale model training to the auto_LiRPA / $\alpha,\beta$-CROWN verification framework. Tensor Parallelism (TP) shards both weight and $A$-matrices across GPUs, achieving ${\approx}2\times$ peak-memory reduction at $P{=}2$; soundness is confirmed on VNN-COMP 2022 MNIST-FC benchmarks, though bound tightness degrades with the number of sharded zones due to forced IBP substitution for intermediate bounds inside sharded zones. Fully Sharded Data Parallelism (FSDP) shards only weight matrices with a per-layer AllGather, producing bounds that are bitwise identical to the single-GPU baseline: baseline memory drops by 80--90%, peak memory by 34--39% on wide MLPs. FSDP integrates cleanly with complete verification ($\beta$-CROWN + Branch-and-Bound) and with convolutional layers (BoundConv); a complete unsat result is obtained for CIFAR-100 ResNet-large (VNN-COMP 2024) under FSDP. Across all experiments the memory bottleneck in $\alpha$-CROWN+BaB mode proves to be per-neuron alpha tensors, not weight matrices, pointing to the key direction for future work.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper adapts Tensor Parallelism (TP) and Fully Sharded Data Parallelism (FSDP) from training to the auto_LiRPA / α,β-CROWN verification framework. TP shards both weights and A-matrices for ≈2× peak-memory reduction at P=2, with soundness confirmed on VNN-COMP 2022 MNIST-FC benchmarks but with bound tightness degrading due to forced IBP substitutions in sharded zones. FSDP shards only weights via per-layer AllGather, claims bitwise-identical bounds to the single-GPU baseline, reports 80–90% baseline and 34–39% peak memory reductions on wide MLPs, integrates with β-CROWN + Branch-and-Bound and BoundConv, and obtains a complete unsat result on CIFAR-100 ResNet-large (VNN-COMP 2024). The work identifies per-neuron alpha tensors as the remaining memory bottleneck.
Significance. If the bitwise-identity claim for FSDP holds without numerical drift, the result is significant because it directly addresses the GPU-memory barrier that currently limits bound-propagation verification to modest-sized networks. The complete verification result on a 2024 ResNet benchmark and the clean integration with existing BoundConv and BaB machinery are concrete engineering advances. The identification of alpha tensors rather than weights as the next bottleneck supplies a clear direction for follow-on work. The experimental grounding on standard VNN-COMP suites strengthens the practical utility.
major comments (2)
- [Abstract / FSDP implementation] Abstract (FSDP paragraph): the bitwise-identity guarantee for bounds produced after per-layer AllGather rests on the unstated assumptions that (a) the gather occurs immediately before every matrix multiplication that consumes the sharded weights, (b) no kernel fusion or re-association changes floating-point evaluation order, and (c) alpha tensors and relaxation coefficients are materialized and used identically to the monolithic case. The manuscript must supply pseudocode or a precise description of the AllGather insertion point relative to the IBP/CROWN forward passes and confirm that the BoundConv and β-CROWN code paths are literally unchanged once the full weight matrix is assembled.
- [TP section] TP description: the claim that soundness is preserved under forced IBP substitution inside sharded zones is load-bearing for the overall contribution, yet the manuscript provides no quantitative characterization of how the substitution affects final bound values or the number of verified properties on the VNN-COMP 2022 suite when the number of sharded zones increases beyond P=2.
minor comments (1)
- [Abstract] The abstract refers to “wide MLPs” for the memory-reduction numbers but does not list the exact layer widths or depths used; a table or explicit architecture description would improve reproducibility.
Simulated Author's Rebuttal
We thank the referee for the careful and constructive review. The two major comments identify areas where additional detail and data would strengthen the manuscript. We address each point below and will revise accordingly.
read point-by-point responses
-
Referee: [Abstract / FSDP implementation] Abstract (FSDP paragraph): the bitwise-identity guarantee for bounds produced after per-layer AllGather rests on the unstated assumptions that (a) the gather occurs immediately before every matrix multiplication that consumes the sharded weights, (b) no kernel fusion or re-association changes floating-point evaluation order, and (c) alpha tensors and relaxation coefficients are materialized and used identically to the monolithic case. The manuscript must supply pseudocode or a precise description of the AllGather insertion point relative to the IBP/CROWN forward passes and confirm that the BoundConv and β-CROWN code paths are literally unchanged once the full weight matrix is assembled.
Authors: We agree that the bitwise-identity claim requires explicit justification of the listed assumptions. In the revised manuscript we will insert pseudocode that shows the per-layer AllGather occurring immediately before each matrix multiplication that consumes the reconstructed weight matrix, for both the IBP and CROWN forward passes. We will also state that the BoundConv and β-CROWN code paths remain literally unchanged once the full weight matrix has been assembled by the AllGather, and that alpha tensors and relaxation coefficients are materialized and used exactly as in the monolithic baseline. Our current implementation performs no kernel fusion or re-association that would alter floating-point order. revision: yes
-
Referee: [TP section] TP description: the claim that soundness is preserved under forced IBP substitution inside sharded zones is load-bearing for the overall contribution, yet the manuscript provides no quantitative characterization of how the substitution affects final bound values or the number of verified properties on the VNN-COMP 2022 suite when the number of sharded zones increases beyond P=2.
Authors: We acknowledge that a quantitative characterization for P>2 would strengthen the soundness claim. While the manuscript already notes degradation of bound tightness with increasing sharded zones and confirms soundness at P=2, the revised version will add experimental results for P=4 on the VNN-COMP 2022 MNIST-FC benchmarks, reporting the effect on final bound values and the number of verified properties. revision: yes
Circularity Check
No circularity; engineering adaptation validated by external baseline comparison
full rationale
The paper describes an engineering port of TP and FSDP into the auto_LiRPA / α,β-CROWN framework. All central claims (memory reductions, bitwise-identical bounds under FSDP, complete unsat on CIFAR-100 ResNet-large) rest on direct experimental comparison to the single-GPU baseline and on VNN-COMP benchmarks. No derivation, prediction, or first-principles result is presented that reduces by construction to fitted parameters, self-citations, or renamed inputs. The work contains no mathematical uniqueness theorems, ansatzes smuggled via prior work, or self-definitional quantities. The single minor self-citation risk (if any) is non-load-bearing and does not affect the reported results.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
URL https://www.semanticscholar.org/paper/Reluplex% 3A-An-Efficient-SMT-Solver-for-Verifying-Katz-Barrett/ b0dc598adda48acab590f95a5985fcc7abf2aca9
Reluplex: An efficient smt solver for verifying deep neural networks. URL https://www.semanticscholar.org/paper/Reluplex% 3A-An-Efficient-SMT-Solver-for-Verifying-Katz-Barrett/ b0dc598adda48acab590f95a5985fcc7abf2aca9
-
[2]
Evaluating Robustness of Neural Networks with Mixed Integer Programming
Evaluating robustness of neural networks with mixed integer programming, 2017. URL https://arxiv.org/abs/1711.07356
work page internal anchor Pith review Pith/arXiv arXiv 2017
-
[3]
URL https: //arxiv.org/abs/2212.08567
Optimized symbolic interval propagation for neural network verification, 2022. URL https: //arxiv.org/abs/2212.08567
-
[4]
URLhttps://arxiv.org/abs/2212.10376
The third international verification of neural networks competition (vnn-comp 2022): Summary and results, 2022. URLhttps://arxiv.org/abs/2212.10376
-
[5]
URLhttps://arxiv.org/abs/2312.16760
The fourth international verification of neural networks competition (vnn-comp 2023): Sum- mary and results, 2023. URLhttps://arxiv.org/abs/2312.16760
-
[6]
Marabou 2.0: A versatile formal analyzer of neural networks, 2024. URL https://arxiv. org/html/2401.14461v1
-
[7]
URLhttps://arxiv.org/abs/2412.19985
The fifth international verification of neural networks competition (vnn-comp 2024): Summary and results, 2024. URLhttps://arxiv.org/abs/2412.19985
-
[8]
nnenum: Verification of ReLU neural networks with optimized abstrac- tion refinement, 2021
Stanley Bak. nnenum: Verification of ReLU neural networks with optimized abstrac- tion refinement, 2021. URL https://www.researchgate.net/publication/ 351683889_nnenum_Verification_of_ReLU_Neural_Networks_with_ Optimized_Abstraction_Refinement. 13
2021
-
[9]
Training neural networks is ∃R-complete
Daniel Bertschinger, Christoph Hertrich, Paul Jungeblut, Till Miltzow, and Si- mon Weber. Training neural networks is ∃R-complete. InAdvances in Neural Information Processing Systems (NeurIPS), volume 34, 2021. URL https://proceedings.neurips.cc/paper_files/paper/2021/file/ 9813b270ed0288e7c0388f0fd4ec68f5-Paper.pdf
2021
-
[10]
VNN-LIB: The verification of neural networks library, 2021
Stefano Demarchi, Dario Guidotti, Luca Pulina, and Armando Tacchella. VNN-LIB: The verification of neural networks library, 2021. URLhttps://www.vnnlib.org/
2021
-
[11]
Supporting standardization of neural networks verification with vnn-lib and coconet
Stefano Demarchi, Dario Guidotti, Luca Pulina, and Armando Tacchella. Supporting standardization of neural networks verification with vnn-lib and coconet. InProceed- ings of the 6th Workshop on Formal Methods for ML-Enabled Autonomous Systems (FoM- LAS 2023), volume 16 ofKalpa Publications in Computing, pages 47–58, 2023. URL https://easychair.org/publica...
2023
-
[12]
NeuralSAT: A DPLL(T)-based framework for verifying deep neural networks, 2023
Hai Duong, ThanhVu Nguyen, and Matthew Dwyer. NeuralSAT: A DPLL(T)-based framework for verifying deep neural networks, 2023. URL https://github.com/dynaroars/ neuralsat
2023
-
[13]
Enhancing neural networks through formal verification
Dario Guidotti. Enhancing neural networks through formal verification. InProceedings of the AI*IA 2019 Doctoral Consortium (AI*IA 2019 DC), volume 2495 ofCEUR Workshop Proceed- ings, pages 107–112. CEUR-WS.org, 2019. URL https://ceur-ws.org/Vol-2495/ paper13.pdf
2019
-
[14]
Le, Yonghui Wu, and Zhifeng Chen
Yanping Huang, Youlong Cheng, Ankur Bapna, Orhan Firat, Dehao Chen, Mia Xu Chen, HyoukJoong Lee, Jiquan Ngiam, Quoc V . Le, Yonghui Wu, and Zhifeng Chen. Gpipe: Efficient training of giant neural networks using pipeline parallelism. InAdvances in Neural Information Processing Systems (NeurIPS), volume 32, 2019. URL https://arxiv.org/abs/1811. 06965
2019
-
[15]
Jiangchao Liu, Liqian Chen, Antoine Min´e, and Ji Wang. Input validation for neural networks via runtime local robustness verification.CoRR, abs/2002.03339, 2020. URL https:// arxiv.org/abs/2002.03339
-
[16]
Advancing neural network verifica- tion through hierarchical safety abstract interpretation, 2025
Luca Marzari, Isabella Mastroeni, and Alessandro Farinelli. Advancing neural network verifica- tion through hierarchical safety abstract interpretation, 2025. URL https://arxiv.org/ abs/2505.05235
-
[17]
Pierre-Jean Meyer, Alex Devonport, and Murat Arcak. Interval reachability analysis: Bounding trajectories of uncertain systems with boxes for control and verification, 2021. URL https: //www.researchgate.net/publication/348624119
-
[18]
Moore, R
Ramon E. Moore, R. Baker Kearfott, and Michael J. Cloud.Introduction to Interval Analysis. SIAM, 2009. ISBN 978-0-898716-69-6
2009
-
[19]
Scaling polyhedral neural network verification on GPUs, 2021
Christoph M ¨uller, Gagandeep Makarchuk, Gagandeep Singh, Markus P ¨uschel, and Martin Vechev. Scaling polyhedral neural network verification on GPUs, 2021. URL https:// files.sri.inf.ethz.ch/website/papers/mller2021neural.pdf. 14
2021
-
[20]
ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
Samyam Rajbhandari, Jeff Rasley, Olatunji Ruwase, and Yuxiong He. Zero: Memory optimiza- tions toward training trillion parameter models, 2020. URL https://arxiv.org/abs/ 1910.02054
work page internal anchor Pith review Pith/arXiv arXiv 2020
-
[21]
˜Zico Kolter, Suman Jana, Cho-Jui Hsieh, and Huan Zhang
Zhouxing Shi, Qirui Jin, J. ˜Zico Kolter, Suman Jana, Cho-Jui Hsieh, and Huan Zhang. Scalable neural network verification with branch-and-bound inferred cutting planes. InAdvances in Neural Information Processing Systems (NeurIPS), volume 37, 2024. URL https://proceedings.neurips.cc/paper_files/paper/2024/file/ 33d93e4dc57453e7667b20f62e7c0681-Paper-Confe...
2024
-
[22]
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper, and Bryan Catanzaro. Megatron-lm: Training multi-billion parameter language models using model parallelism, 2020. URLhttps://arxiv.org/abs/1909.08053
work page internal anchor Pith review Pith/arXiv arXiv 2020
-
[23]
An abstract domain for certifying neural networks, 2019
Gagandeep Singh, Timon Gehr, Markus P ¨uschel, and Martin Vechev. An abstract domain for certifying neural networks, 2019. URL https://ggndpsngh.github.io/files/ DeepPoly.pdf
2019
-
[24]
Zico Kolter
Shiqi Wang, Huan Zhang, Kaidi Xu, Xue Lin, Suman Jana, Cho-Jui Hsieh, and J. Zico Kolter. Beta-CROWN: Efficient bound propagation with per-neuron split constraints for neural network robustness verification. InAdvances in Neural Information Processing Systems (NeurIPS),
- [25]
-
[26]
Automatic perturbation analysis for scalable certified robustness and beyond, 2020
Kaidi Xu, Zhouxing Shi, Huan Zhang, Yihan Wang, Kai-Wei Chang, Minlie Huang, Bhavya Kailkhura, Xue Lin, and Cho-Jui Hsieh. Automatic perturbation analysis for scalable certified robustness and beyond, 2020. URL https://github.com/ Verified-Intelligence/auto_LiRPA
2020
-
[27]
Efficient Neural Network Robustness Certification with General Activation Functions
Huan Zhang, Tsui-Wei Weng, Pin-Yu Chen, Cho-Jui Hsieh, and Luca Daniel. Efficient neural network robustness certification with general activation functions, 2018. URL https: //arxiv.org/abs/1811.00866
work page internal anchor Pith review Pith/arXiv arXiv 2018
-
[28]
˜Zico Kolter
Huan Zhang, Shiqi Wang, Kaidi Xu, Linyi Li, Bo Li, Suman Jana, Cho-Jui Hsieh, and J. ˜Zico Kolter. alpha-beta-CROWN: A complete and efficient bound propagation based neural network verifier (vnn-comp 2021–2024), 2023. URL https://github.com/ Verified-Intelligence/alpha-beta-CROWN_vnncomp23. 15
2021
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.