pith. sign in

arxiv: 2606.09377 · v2 · pith:G33PDSREnew · submitted 2026-06-08 · 💻 cs.LG · cs.AI

Scaling Neural Network Verification with Tensor Parallelism and Fully Sharded Data Parallelism

Pith reviewed 2026-06-27 17:18 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords neural network verificationbound propagationtensor parallelismfully sharded data parallelismFSDPalpha-CROWNmemory reductioncomplete verification
0
0 comments X

The pith

Fully sharded data parallelism on weight matrices produces bitwise-identical verification bounds while cutting baseline memory 80-90%.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper adapts tensor parallelism and fully sharded data parallelism from training to bound-propagation verification frameworks such as auto_LiRPA and alpha-CROWN. Tensor parallelism shards both weights and A-matrices across GPUs for roughly 2x peak-memory reduction, though it requires forced IBP substitutions inside sharded zones that loosen bounds. Fully sharded data parallelism shards only weights via per-layer AllGather, keeps every bound bitwise identical to the single-GPU run, and integrates directly with beta-CROWN branch-and-bound and convolutional layers. This enables a complete unsat result on a CIFAR-100 ResNet-large instance from VNN-COMP 2024. The experiments identify per-neuron alpha tensors, not weights, as the remaining memory limit.

Core claim

FSDP shards only weight matrices with a per-layer AllGather, producing bounds that are bitwise identical to the single-GPU baseline: baseline memory drops by 80-90%, peak memory by 34-39% on wide MLPs. A complete unsat result is obtained for CIFAR-100 ResNet-large under FSDP.

What carries the argument

The per-layer AllGather operation in Fully Sharded Data Parallelism that shards only weight matrices while leaving relaxation coefficients on each device.

If this is right

  • FSDP integrates directly with complete verification using beta-CROWN plus branch-and-bound.
  • BoundConv convolutional layers remain compatible under the FSDP sharding scheme.
  • Tensor parallelism at two GPUs yields approximately 2x peak-memory reduction on MNIST-FC benchmarks.
  • Per-neuron alpha tensors become the dominant memory consumer once weights are sharded.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Sharding the alpha tensors themselves would be the next logical step to remove the new bottleneck.
  • The same FSDP pattern could be applied to other bound-propagation libraries without altering their core relaxation logic.
  • Tensor parallelism tightness loss might be mitigated by keeping more intermediate bounds on every device rather than forcing IBP.

Load-bearing premise

The sharding and AllGather steps preserve exact numerical values and overall soundness of bound propagation without introducing drift.

What would settle it

Run the identical verification query once on a single GPU and once under FSDP on multiple GPUs; any bound that differs beyond floating-point error falsifies the claim of bitwise identity.

read the original abstract

Formal neural network verification -- proving that a network satisfies safety properties for *all* inputs in a specified domain -- is bounded in practice by GPU memory: standard implementations of bound-propagation algorithms (IBP, CROWN, $\alpha$-CROWN) require weight and relaxation-coefficient matrices to reside entirely on one accelerator. We adapt two parallelism techniques originally developed for large-scale model training to the auto_LiRPA / $\alpha,\beta$-CROWN verification framework. Tensor Parallelism (TP) shards both weight and $A$-matrices across GPUs, achieving ${\approx}2\times$ peak-memory reduction at $P{=}2$; soundness is confirmed on VNN-COMP 2022 MNIST-FC benchmarks, though bound tightness degrades with the number of sharded zones due to forced IBP substitution for intermediate bounds inside sharded zones. Fully Sharded Data Parallelism (FSDP) shards only weight matrices with a per-layer AllGather, producing bounds that are bitwise identical to the single-GPU baseline: baseline memory drops by 80--90%, peak memory by 34--39% on wide MLPs. FSDP integrates cleanly with complete verification ($\beta$-CROWN + Branch-and-Bound) and with convolutional layers (BoundConv); a complete unsat result is obtained for CIFAR-100 ResNet-large (VNN-COMP 2024) under FSDP. Across all experiments the memory bottleneck in $\alpha$-CROWN+BaB mode proves to be per-neuron alpha tensors, not weight matrices, pointing to the key direction for future work.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper adapts Tensor Parallelism (TP) and Fully Sharded Data Parallelism (FSDP) from training to the auto_LiRPA / α,β-CROWN verification framework. TP shards both weights and A-matrices for ≈2× peak-memory reduction at P=2, with soundness confirmed on VNN-COMP 2022 MNIST-FC benchmarks but with bound tightness degrading due to forced IBP substitutions in sharded zones. FSDP shards only weights via per-layer AllGather, claims bitwise-identical bounds to the single-GPU baseline, reports 80–90% baseline and 34–39% peak memory reductions on wide MLPs, integrates with β-CROWN + Branch-and-Bound and BoundConv, and obtains a complete unsat result on CIFAR-100 ResNet-large (VNN-COMP 2024). The work identifies per-neuron alpha tensors as the remaining memory bottleneck.

Significance. If the bitwise-identity claim for FSDP holds without numerical drift, the result is significant because it directly addresses the GPU-memory barrier that currently limits bound-propagation verification to modest-sized networks. The complete verification result on a 2024 ResNet benchmark and the clean integration with existing BoundConv and BaB machinery are concrete engineering advances. The identification of alpha tensors rather than weights as the next bottleneck supplies a clear direction for follow-on work. The experimental grounding on standard VNN-COMP suites strengthens the practical utility.

major comments (2)
  1. [Abstract / FSDP implementation] Abstract (FSDP paragraph): the bitwise-identity guarantee for bounds produced after per-layer AllGather rests on the unstated assumptions that (a) the gather occurs immediately before every matrix multiplication that consumes the sharded weights, (b) no kernel fusion or re-association changes floating-point evaluation order, and (c) alpha tensors and relaxation coefficients are materialized and used identically to the monolithic case. The manuscript must supply pseudocode or a precise description of the AllGather insertion point relative to the IBP/CROWN forward passes and confirm that the BoundConv and β-CROWN code paths are literally unchanged once the full weight matrix is assembled.
  2. [TP section] TP description: the claim that soundness is preserved under forced IBP substitution inside sharded zones is load-bearing for the overall contribution, yet the manuscript provides no quantitative characterization of how the substitution affects final bound values or the number of verified properties on the VNN-COMP 2022 suite when the number of sharded zones increases beyond P=2.
minor comments (1)
  1. [Abstract] The abstract refers to “wide MLPs” for the memory-reduction numbers but does not list the exact layer widths or depths used; a table or explicit architecture description would improve reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful and constructive review. The two major comments identify areas where additional detail and data would strengthen the manuscript. We address each point below and will revise accordingly.

read point-by-point responses
  1. Referee: [Abstract / FSDP implementation] Abstract (FSDP paragraph): the bitwise-identity guarantee for bounds produced after per-layer AllGather rests on the unstated assumptions that (a) the gather occurs immediately before every matrix multiplication that consumes the sharded weights, (b) no kernel fusion or re-association changes floating-point evaluation order, and (c) alpha tensors and relaxation coefficients are materialized and used identically to the monolithic case. The manuscript must supply pseudocode or a precise description of the AllGather insertion point relative to the IBP/CROWN forward passes and confirm that the BoundConv and β-CROWN code paths are literally unchanged once the full weight matrix is assembled.

    Authors: We agree that the bitwise-identity claim requires explicit justification of the listed assumptions. In the revised manuscript we will insert pseudocode that shows the per-layer AllGather occurring immediately before each matrix multiplication that consumes the reconstructed weight matrix, for both the IBP and CROWN forward passes. We will also state that the BoundConv and β-CROWN code paths remain literally unchanged once the full weight matrix has been assembled by the AllGather, and that alpha tensors and relaxation coefficients are materialized and used exactly as in the monolithic baseline. Our current implementation performs no kernel fusion or re-association that would alter floating-point order. revision: yes

  2. Referee: [TP section] TP description: the claim that soundness is preserved under forced IBP substitution inside sharded zones is load-bearing for the overall contribution, yet the manuscript provides no quantitative characterization of how the substitution affects final bound values or the number of verified properties on the VNN-COMP 2022 suite when the number of sharded zones increases beyond P=2.

    Authors: We acknowledge that a quantitative characterization for P>2 would strengthen the soundness claim. While the manuscript already notes degradation of bound tightness with increasing sharded zones and confirms soundness at P=2, the revised version will add experimental results for P=4 on the VNN-COMP 2022 MNIST-FC benchmarks, reporting the effect on final bound values and the number of verified properties. revision: yes

Circularity Check

0 steps flagged

No circularity; engineering adaptation validated by external baseline comparison

full rationale

The paper describes an engineering port of TP and FSDP into the auto_LiRPA / α,β-CROWN framework. All central claims (memory reductions, bitwise-identical bounds under FSDP, complete unsat on CIFAR-100 ResNet-large) rest on direct experimental comparison to the single-GPU baseline and on VNN-COMP benchmarks. No derivation, prediction, or first-principles result is presented that reduces by construction to fitted parameters, self-citations, or renamed inputs. The work contains no mathematical uniqueness theorems, ansatzes smuggled via prior work, or self-definitional quantities. The single minor self-citation risk (if any) is non-load-bearing and does not affect the reported results.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no information on free parameters, axioms, or invented entities. No numbers are fitted inside the paper; the work reports empirical memory measurements against an external baseline.

pith-pipeline@v0.9.1-grok · 5822 in / 1277 out tokens · 21265 ms · 2026-06-27T17:18:35.299879+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

28 extracted references · 13 canonical work pages · 4 internal anchors

  1. [1]

    URL https://www.semanticscholar.org/paper/Reluplex% 3A-An-Efficient-SMT-Solver-for-Verifying-Katz-Barrett/ b0dc598adda48acab590f95a5985fcc7abf2aca9

    Reluplex: An efficient smt solver for verifying deep neural networks. URL https://www.semanticscholar.org/paper/Reluplex% 3A-An-Efficient-SMT-Solver-for-Verifying-Katz-Barrett/ b0dc598adda48acab590f95a5985fcc7abf2aca9

  2. [2]

    Evaluating Robustness of Neural Networks with Mixed Integer Programming

    Evaluating robustness of neural networks with mixed integer programming, 2017. URL https://arxiv.org/abs/1711.07356

  3. [3]

    URL https: //arxiv.org/abs/2212.08567

    Optimized symbolic interval propagation for neural network verification, 2022. URL https: //arxiv.org/abs/2212.08567

  4. [4]

    URLhttps://arxiv.org/abs/2212.10376

    The third international verification of neural networks competition (vnn-comp 2022): Summary and results, 2022. URLhttps://arxiv.org/abs/2212.10376

  5. [5]

    URLhttps://arxiv.org/abs/2312.16760

    The fourth international verification of neural networks competition (vnn-comp 2023): Sum- mary and results, 2023. URLhttps://arxiv.org/abs/2312.16760

  6. [6]

    URL https://arxiv

    Marabou 2.0: A versatile formal analyzer of neural networks, 2024. URL https://arxiv. org/html/2401.14461v1

  7. [7]

    URLhttps://arxiv.org/abs/2412.19985

    The fifth international verification of neural networks competition (vnn-comp 2024): Summary and results, 2024. URLhttps://arxiv.org/abs/2412.19985

  8. [8]

    nnenum: Verification of ReLU neural networks with optimized abstrac- tion refinement, 2021

    Stanley Bak. nnenum: Verification of ReLU neural networks with optimized abstrac- tion refinement, 2021. URL https://www.researchgate.net/publication/ 351683889_nnenum_Verification_of_ReLU_Neural_Networks_with_ Optimized_Abstraction_Refinement. 13

  9. [9]

    Training neural networks is ∃R-complete

    Daniel Bertschinger, Christoph Hertrich, Paul Jungeblut, Till Miltzow, and Si- mon Weber. Training neural networks is ∃R-complete. InAdvances in Neural Information Processing Systems (NeurIPS), volume 34, 2021. URL https://proceedings.neurips.cc/paper_files/paper/2021/file/ 9813b270ed0288e7c0388f0fd4ec68f5-Paper.pdf

  10. [10]

    VNN-LIB: The verification of neural networks library, 2021

    Stefano Demarchi, Dario Guidotti, Luca Pulina, and Armando Tacchella. VNN-LIB: The verification of neural networks library, 2021. URLhttps://www.vnnlib.org/

  11. [11]

    Supporting standardization of neural networks verification with vnn-lib and coconet

    Stefano Demarchi, Dario Guidotti, Luca Pulina, and Armando Tacchella. Supporting standardization of neural networks verification with vnn-lib and coconet. InProceed- ings of the 6th Workshop on Formal Methods for ML-Enabled Autonomous Systems (FoM- LAS 2023), volume 16 ofKalpa Publications in Computing, pages 47–58, 2023. URL https://easychair.org/publica...

  12. [12]

    NeuralSAT: A DPLL(T)-based framework for verifying deep neural networks, 2023

    Hai Duong, ThanhVu Nguyen, and Matthew Dwyer. NeuralSAT: A DPLL(T)-based framework for verifying deep neural networks, 2023. URL https://github.com/dynaroars/ neuralsat

  13. [13]

    Enhancing neural networks through formal verification

    Dario Guidotti. Enhancing neural networks through formal verification. InProceedings of the AI*IA 2019 Doctoral Consortium (AI*IA 2019 DC), volume 2495 ofCEUR Workshop Proceed- ings, pages 107–112. CEUR-WS.org, 2019. URL https://ceur-ws.org/Vol-2495/ paper13.pdf

  14. [14]

    Le, Yonghui Wu, and Zhifeng Chen

    Yanping Huang, Youlong Cheng, Ankur Bapna, Orhan Firat, Dehao Chen, Mia Xu Chen, HyoukJoong Lee, Jiquan Ngiam, Quoc V . Le, Yonghui Wu, and Zhifeng Chen. Gpipe: Efficient training of giant neural networks using pipeline parallelism. InAdvances in Neural Information Processing Systems (NeurIPS), volume 32, 2019. URL https://arxiv.org/abs/1811. 06965

  15. [15]

    Input validation for neural networks via runtime local robustness verification.CoRR, abs/2002.03339, 2020

    Jiangchao Liu, Liqian Chen, Antoine Min´e, and Ji Wang. Input validation for neural networks via runtime local robustness verification.CoRR, abs/2002.03339, 2020. URL https:// arxiv.org/abs/2002.03339

  16. [16]

    Advancing neural network verifica- tion through hierarchical safety abstract interpretation, 2025

    Luca Marzari, Isabella Mastroeni, and Alessandro Farinelli. Advancing neural network verifica- tion through hierarchical safety abstract interpretation, 2025. URL https://arxiv.org/ abs/2505.05235

  17. [17]

    Interval reachability analysis: Bounding trajectories of uncertain systems with boxes for control and verification, 2021

    Pierre-Jean Meyer, Alex Devonport, and Murat Arcak. Interval reachability analysis: Bounding trajectories of uncertain systems with boxes for control and verification, 2021. URL https: //www.researchgate.net/publication/348624119

  18. [18]

    Moore, R

    Ramon E. Moore, R. Baker Kearfott, and Michael J. Cloud.Introduction to Interval Analysis. SIAM, 2009. ISBN 978-0-898716-69-6

  19. [19]

    Scaling polyhedral neural network verification on GPUs, 2021

    Christoph M ¨uller, Gagandeep Makarchuk, Gagandeep Singh, Markus P ¨uschel, and Martin Vechev. Scaling polyhedral neural network verification on GPUs, 2021. URL https:// files.sri.inf.ethz.ch/website/papers/mller2021neural.pdf. 14

  20. [20]

    ZeRO: Memory Optimizations Toward Training Trillion Parameter Models

    Samyam Rajbhandari, Jeff Rasley, Olatunji Ruwase, and Yuxiong He. Zero: Memory optimiza- tions toward training trillion parameter models, 2020. URL https://arxiv.org/abs/ 1910.02054

  21. [21]

    ˜Zico Kolter, Suman Jana, Cho-Jui Hsieh, and Huan Zhang

    Zhouxing Shi, Qirui Jin, J. ˜Zico Kolter, Suman Jana, Cho-Jui Hsieh, and Huan Zhang. Scalable neural network verification with branch-and-bound inferred cutting planes. InAdvances in Neural Information Processing Systems (NeurIPS), volume 37, 2024. URL https://proceedings.neurips.cc/paper_files/paper/2024/file/ 33d93e4dc57453e7667b20f62e7c0681-Paper-Confe...

  22. [22]

    Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism

    Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper, and Bryan Catanzaro. Megatron-lm: Training multi-billion parameter language models using model parallelism, 2020. URLhttps://arxiv.org/abs/1909.08053

  23. [23]

    An abstract domain for certifying neural networks, 2019

    Gagandeep Singh, Timon Gehr, Markus P ¨uschel, and Martin Vechev. An abstract domain for certifying neural networks, 2019. URL https://ggndpsngh.github.io/files/ DeepPoly.pdf

  24. [24]

    Zico Kolter

    Shiqi Wang, Huan Zhang, Kaidi Xu, Xue Lin, Suman Jana, Cho-Jui Hsieh, and J. Zico Kolter. Beta-CROWN: Efficient bound propagation with per-neuron split constraints for neural network robustness verification. InAdvances in Neural Information Processing Systems (NeurIPS),

  25. [25]

    URLhttps://arxiv.org/abs/2103.06624

  26. [26]

    Automatic perturbation analysis for scalable certified robustness and beyond, 2020

    Kaidi Xu, Zhouxing Shi, Huan Zhang, Yihan Wang, Kai-Wei Chang, Minlie Huang, Bhavya Kailkhura, Xue Lin, and Cho-Jui Hsieh. Automatic perturbation analysis for scalable certified robustness and beyond, 2020. URL https://github.com/ Verified-Intelligence/auto_LiRPA

  27. [27]

    Efficient Neural Network Robustness Certification with General Activation Functions

    Huan Zhang, Tsui-Wei Weng, Pin-Yu Chen, Cho-Jui Hsieh, and Luca Daniel. Efficient neural network robustness certification with general activation functions, 2018. URL https: //arxiv.org/abs/1811.00866

  28. [28]

    ˜Zico Kolter

    Huan Zhang, Shiqi Wang, Kaidi Xu, Linyi Li, Bo Li, Suman Jana, Cho-Jui Hsieh, and J. ˜Zico Kolter. alpha-beta-CROWN: A complete and efficient bound propagation based neural network verifier (vnn-comp 2021–2024), 2023. URL https://github.com/ Verified-Intelligence/alpha-beta-CROWN_vnncomp23. 15