pith. machine review for the scientific record.

arxiv: 2605.01519 · v1 · submitted 2026-05-02 · 💻 cs.CV

Recognition: unknown

Certified vs. Empirical Adversarial Robustness via Hybrid Convolutions with Attention Stochasticity

Authors on Pith: no claims yet

Pith reviewed 2026-05-09 14:27 UTC · model grok-4.3

classification 💻 cs.CV
keywords adversarial robustness · certified defense · empirical defense · Lipschitz networks · stochastic attention · hybrid convolutions · randomized defense · medical imaging

The pith

Coupling 1-Lipschitz convolutions with stochastic random projections and attention noise produces a network with formal L2 certificates that also resists strong empirical attacks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents HyCAS as a defense architecture that merges deterministic Lipschitz constraints on convolutions with two forms of controlled randomness inside the network layers. This design keeps the overall mapping Lipschitz-bounded by 2, which directly supplies formal robustness certificates under L2 perturbations while also raising resistance to practical attack algorithms. Experiments across CIFAR, ImageNet, and two medical imaging collections demonstrate higher certified accuracy and higher empirical robustness than prior certified or empirical methods, all without any drop in clean-data performance. The approach therefore addresses the common situation in which strengthening one kind of robustness measure weakens the other or harms ordinary accuracy.

Core claim

HyCAS unifies deterministic and randomized principles by coupling 1-Lipschitz, spectrally normalized convolutions with spectrally normalized random-projection filters and a randomized attention-noise mechanism, realizing a randomized defense whose overall Lipschitz constant is at most 2 and which therefore admits formal certificates; extensive experiments show this architecture surpasses prior leading certified and empirical defenses on CIFAR-10/100, ImageNet-1k, NIH Chest X-ray, and HAM10000 while preserving clean accuracy.

What carries the argument

HyCAS, the hybrid architecture that couples spectrally normalized 1-Lipschitz convolutions with spectrally normalized random-projection filters and randomized attention noise to enforce an overall Lipschitz bound of 2.
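
The arithmetic carrying this machinery, two 1-Lipschitz branches whose sum is at most 2-Lipschitz, can be sanity-checked numerically. A minimal NumPy sketch with plain dense matrices standing in for the spectrally normalized convolution and random-projection filters (the shapes and the two-branch sum are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def spectral_norm(W, iters=200):
    # Power iteration: estimates the largest singular value of W,
    # which is the Lipschitz constant of x -> W @ x in the L2 norm.
    v = rng.standard_normal(W.shape[1])
    for _ in range(iters):
        u = W @ v
        u /= np.linalg.norm(u)
        v = W.T @ u
        v /= np.linalg.norm(v)
    return float(u @ W @ v)

W = rng.standard_normal((64, 32))
W_bar = W / spectral_norm(W)   # spectrally normalized: 1-Lipschitz
P = rng.standard_normal((64, 32))
P_bar = P / spectral_norm(P)   # normalized random-projection stand-in

def block(x):
    # Sum of two 1-Lipschitz maps: Lipschitz constant at most 2.
    return W_bar @ x + P_bar @ x

ratios = []
for _ in range(1000):
    x1, x2 = rng.standard_normal((2, 32))
    ratios.append(np.linalg.norm(block(x1) - block(x2))
                  / np.linalg.norm(x1 - x2))
print(max(ratios))  # stays at or below 2 (up to float error)
```

Power iteration is the standard estimator used by spectral normalization; dividing by the estimate caps each branch at (numerically) 1-Lipschitz, and the triangle inequality supplies the factor of 2 for the sum.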

If this is right

  • Certified accuracy rises by up to 7.3 percent on the NIH Chest X-ray dataset relative to prior certified defenses.
  • Empirical robustness rises by up to 3.1 percent on the HAM10000 dataset relative to prior empirical defenses.
  • Clean accuracy remains unchanged across all tested benchmarks.
  • A single randomized Lipschitz-constrained network can improve both certified L2 and empirical L∞ robustness at once.
  • The resulting models are positioned for safer use in high-stakes imaging applications.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same pattern of embedding bounded randomness inside Lipschitz layers could be tested on transformer blocks or on non-image data modalities to check whether the dual robustness benefit generalizes.
  • If the Lipschitz bound survives the stochastic components, it supplies a route to certificates for other randomized defenses that currently lack them.
  • The approach suggests that the certified-versus-empirical trade-off is not fundamental but can be narrowed by architectural choices that jointly control smoothness and randomness.

Load-bearing premise

The specific combination of 1-Lipschitz convolutions with the two chosen stochastic components actually yields a network whose overall Lipschitz constant stays at most 2 and therefore admits formal certificates.

What would settle it

A concrete counter-example would be an input pair separated by less than the certified L2 radius yet assigned different predicted classes, or a replicated evaluation on NIH Chest X-ray or HAM10000 that fails to reproduce the reported certified or empirical accuracy gains under the same attack budgets.
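
The first kind of refutation, an in-radius input pair with different predictions, can even be searched for automatically. A toy NumPy sketch assuming a linear 2-Lipschitz logit map and the paper's ∆Z/4 margin radius (the model and dimensions are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy logit map with Lipschitz constant exactly 2: rescale a random
# matrix so its largest singular value is 2.
A = rng.standard_normal((10, 32))
A *= 2.0 / np.linalg.svd(A, compute_uv=False)[0]

def logits(x):
    return A @ x

def certified_radius(x):
    # Margin certificate for a 2-Lipschitz map: each logit moves by at
    # most 2*||delta||_2, so the top-two gap survives ||delta||_2 < gap/4.
    z = np.sort(logits(x))[::-1]
    return (z[0] - z[1]) / 4.0

# Random search for a counter-example strictly inside the radius.
violations = 0
for _ in range(500):
    x = rng.standard_normal(32)
    r = certified_radius(x)
    c = int(np.argmax(logits(x)))
    delta = rng.standard_normal(32)
    delta *= 0.999 * r / np.linalg.norm(delta)
    if int(np.argmax(logits(x + delta))) != c:
        violations += 1
print(violations)  # 0: no counter-example exists inside the radius
```

Here the search finds nothing by construction, which is what a sound certificate predicts; any nonzero count from the same search run against the real network would be exactly the counter-example described above.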

Figures

Figures reproduced from arXiv: 2605.01519 by Azadeh Alavi, Ferdous Sohel, Joy Dhar, Manish Kumar Pandey, Maryam Haghighat, Nayyar Zaidi, Song Xia, Wenyu Zhang.

Figure 1
Figure 1: Overview of the HyCAS mechanism. It consists of three parallel streams, FDPAN, SNCAN, and RPFAN, each built from 1-Lipschitz cores with Randomized Attention Noise Injection (RANI) residuals. Per-channel convex gating fuses the streams to form Gb(·; Ω). Each stream is ≤ 2-Lipschitz; the fused stream and the stacked network remain ≤ 2-Lipschitz, enabling a margin-based ℓ2 certificate.
Figure 5
Figure 5: Overview of the FDPAN stream. A four-stage cascade: (i) low-pass DCT masking and an orthogonal 1×1 channel mix (both 1-Lipschitz); (ii) an SNCAN block (spectrally normalized convolution) with RANI; (iii) additional RANI; and (iv) skip/gating. The stream remains ≤ 2-Lipschitz.
Figure 7
Figure 7: Overview of the RPFAN stream. (i) Orthogonal 1×1 pre-mix (1-Lipschitz). (ii) Batch-aware spectral normalization of a random-projection convolution (1-Lipschitz core). (iii) RANI residual, yielding a ≤ 2-Lipschitz stochastic block.
Original abstract

We introduce Hybrid Convolutions with Attention Stochasticity (HyCAS), an adversarial defense that narrows the long-standing gap between provable robustness under L2 certificates and empirical robustness against strong L∞ attacks, while preserving strong generalization across diverse imaging benchmarks. HyCAS unifies deterministic and randomized principles by coupling 1-Lipschitz, spectrally normalized convolutions with two stochastic components, spectral normalized random, projection filters and a randomized attention-noise mechanism, to realize a randomized defense. Injecting smoothing randomness inside the architecture yields an overall <= 2-Lipschitz network with formal certificates. Exten-sive experiments on diverse imaging benchmarks, including CIFAR-10/100, ImageNet-1k, NIH Chest X-ray, HAM10000, show that HyCAS surpasses prior leading certified and empirical defenses, boosting certified accuracy by up to 7.3% (on NIH Chest X-ray) and empirical robustness by up to 3.1% (on HAM10000), without sacrificing clean accuracy. These results show that a randomized Lipschitz constrained architecture can simultaneously improve both certified L2 and empirical L∞ adversarial robustness, thereby supporting safer deployment of deep models in high-stakes applications. Code: https://github.com/misti1203/HyCAS

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper introduces Hybrid Convolutions with Attention Stochasticity (HyCAS), which couples spectrally normalized 1-Lipschitz convolutions with spectrally normalized random-projection filters and a randomized attention-noise mechanism. This is claimed to produce an overall network with Lipschitz constant at most 2, enabling formal L2 certificates, while also improving empirical robustness against L∞ attacks. Experiments across CIFAR-10/100, ImageNet-1k, NIH Chest X-ray, and HAM10000 report gains of up to 7.3% in certified accuracy and 3.1% in empirical robustness without loss of clean accuracy.

Significance. If the Lipschitz bound is rigorously established, the work would be significant for narrowing the gap between certified and empirical adversarial defenses by embedding smoothing randomness inside a constrained architecture. The multi-benchmark evaluation and provision of open-source code at https://github.com/misti1203/HyCAS are strengths that support reproducibility and broader applicability in high-stakes imaging domains.

major comments (1)
  1. [Lipschitz analysis / theoretical bound section] The central claim that the architecture realizes an overall <=2-Lipschitz network (stated in the abstract and used to justify formal certificates) lacks an explicit derivation. No chain-rule or expectation-based argument is supplied showing how the randomized attention-noise term (involving non-Lipschitz softmax and additive stochastic perturbation) composes with the preceding 1-Lipschitz convolutions and random projection filters to stay within the factor of 2. This is load-bearing for the certified-accuracy results such as the +7.3% gain on NIH Chest X-ray.
minor comments (2)
  1. [Abstract] Abstract contains typographical errors: 'Robust-ness' (hyphenated), 'Exten-sive' (split), and 'random, projection' (extraneous comma).
  2. [Experiments section] The reported performance gains would be strengthened by inclusion of error bars or standard deviations across runs, particularly for the cross-dataset claims.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the detailed and constructive review. The positive assessment of the work's significance, multi-benchmark evaluation, and open-source code is appreciated. We address the single major comment below and will revise the manuscript to strengthen the presentation of the theoretical results.

read point-by-point responses
  1. Referee: [Lipschitz analysis / theoretical bound section] The central claim that the architecture realizes an overall <=2-Lipschitz network (stated in the abstract and used to justify formal certificates) lacks an explicit derivation. No chain-rule or expectation-based argument is supplied showing how the randomized attention-noise term (involving non-Lipschitz softmax and additive stochastic perturbation) composes with the preceding 1-Lipschitz convolutions and random projection filters to stay within the factor of 2. This is load-bearing for the certified-accuracy results such as the +7.3% gain on NIH Chest X-ray.

    Authors: We agree that the manuscript would benefit from an explicit, self-contained derivation of the overall Lipschitz bound. While the abstract and theoretical sections state that the composition of spectrally normalized 1-Lipschitz convolutions, random projection filters, and the randomized attention-noise mechanism yields a network with Lipschitz constant at most 2, the step-by-step argument (including how the expectation over the stochastic attention term interacts with the non-Lipschitz softmax) is not expanded in sufficient detail. In the revised version we will add a dedicated subsection (likely in Section 3) that provides: (i) the Lipschitz property of each deterministic component via spectral normalization, (ii) a bound on the randomized attention-noise block using the fact that additive stochastic perturbations and the attention mechanism can be analyzed via randomized smoothing arguments to contribute an additional factor of at most 2 in expectation, and (iii) the overall composition via the chain rule for Lipschitz constants. This derivation will directly underpin the certified-accuracy claims. We will also include a short proof sketch in the main text with full details moved to the appendix if space is limited. revision: yes
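
The expectation step the rebuttal promises, that averaging logits over Ω preserves the per-draw bound, follows from Jensen's inequality and can be illustrated in a few lines. A hedged NumPy sketch in which tanh stands in for the attention-noise residual (the block structure is an assumption for illustration, not the authors' RANI layer):

```python
import numpy as np

rng = np.random.default_rng(2)

def normalized(M):
    # Divide by the largest singular value: exactly 1-Lipschitz linear map.
    return M / np.linalg.svd(M, compute_uv=False)[0]

W = normalized(rng.standard_normal((16, 16)))
V = normalized(rng.standard_normal((16, 16)))
omegas = rng.standard_normal((64, 16))  # fixed draws standing in for Omega

def s(x, omega):
    # Per-draw block: 1-Lipschitz core plus a 1-Lipschitz noise residual
    # (tanh is 1-Lipschitz), hence at most 2-Lipschitz in x.
    return W @ x + np.tanh(V @ x + omega)

def Z(x):
    # Averaged logits Z(x) = E_Omega[s(x; Omega)], Monte-Carlo estimate.
    return np.mean([s(x, w) for w in omegas], axis=0)

ratios = [np.linalg.norm(Z(x1) - Z(x2)) / np.linalg.norm(x1 - x2)
          for x1, x2 in rng.standard_normal((300, 2, 16))]
print(max(ratios))  # never exceeds 2
```

Since ‖E_Ω[s(x1; Ω)] − E_Ω[s(x2; Ω)]‖ ≤ E_Ω‖s(x1; Ω) − s(x2; Ω)‖, the averaged map inherits the per-draw constant; the open question the referee raises is whether the real attention-noise block is in fact 1-Lipschitz per draw, which this toy stand-in simply assumes.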

Circularity Check

0 steps flagged

No significant circularity detected in Lipschitz bound or performance claims

full rationale

The paper introduces HyCAS as a new hybrid architecture and asserts that coupling spectrally normalized 1-Lipschitz convolutions with the two stochastic components produces an overall <=2-Lipschitz network admitting formal certificates. This assertion is presented as following from the architectural construction rather than being defined in terms of the target bound itself. Reported gains in certified and empirical accuracy are measured on standard external benchmarks (CIFAR, ImageNet, NIH, HAM10000) with no evidence that the numbers reduce to quantities fitted directly from the same test sets or that a self-citation chain supplies the missing composition rules. No renaming of known results, ansatz smuggling, or uniqueness theorems imported from prior author work appear in the provided sections. The derivation is therefore treated as self-contained for the purpose of this circularity check.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the preservation of a global 2-Lipschitz bound after the addition of stochastic layers and on the validity of randomized-smoothing certificates under that bound.

free parameters (1)
  • stochasticity hyperparameters
    Scale and variance parameters of the random projections and attention noise are chosen to achieve the reported accuracy-robustness trade-off.
axioms (1)
  • domain assumption
    The composition of 1-Lipschitz convolutions with the described stochastic components yields a network whose Lipschitz constant is at most 2.
    Stated directly in the abstract as the source of formal certificates.

pith-pipeline@v0.9.0 · 5551 in / 1281 out tokens · 25379 ms · 2026-05-09T14:27:32.942045+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

4 extracted references · 1 canonical work page

  1. [1]

    Francesco Croce, Sven Gowal, Thomas Brunner, Evan Shelhamer, Matthias Hein, and Taylan Cemgil

    PMLR, 2022. Francesco Croce, Sven Gowal, Thomas Brunner, Evan Shelhamer, Matthias Hein, and Taylan Cemgil. Evaluating the adversarial robustness of adaptive test-time defenses. In International Conference on Machine Learning (ICML), pp. 4421–4435, 2022. Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hier...

  2. [2]

    RS certificate: gσ(x) ≜ arg max_{c∈Y} P_{ε,Ω}[ fθ(x + ε; Ω) = c ]

    RS certificate. We certify the smoothed classifier gσ(x) ≜ arg max_{c∈Y} P_{ε,Ω}[ fθ(x + ε; Ω) = c ]. We follow the standard two-stage Monte-Carlo protocol: draw n0 samples to select the candidate class ĉ and then n samples to bound its probability. Let p̂A and p̂B be the empirical proportions of the top and runner-up classes. Using exact Clopper–Pearson int...

  3. [3]

    Deterministic Lipschitz (margin) certificate: Z(x) ≜ EΩ[ s(x; Ω) ], with Lip(Z) ≤ 2

    Deterministic Lipschitz (margin) certificate. Independently of input noise, we certify the backbone + internal randomness by averaging logits only over Ω: Z(x) ≜ EΩ[ s(x; Ω) ], with Lip(Z) ≤ 2. Let ∆Z(x) = Z(1)(x) − Z(2)(x) be the gap between the top-two expected logits. Then for every perturbation ‖δ‖2 < ∆Z(x)/4, the arg max of Z(·) is invariant; i.e., the...

  4. [4]

    to set up our experiments on our diverse datasets. Adversarial Evaluation: HyCAS is tested under white-box attacks, PGD (Madry et al., 2018), APGD (Croce & Hein, 2020), and AutoAttack (AA) (Croce & Hein, 2020), using ϵ ∈ {8/255, 16/255}, step size α = 2.0/255, and 10–100 iterations. Training Details for Certified Robustness. Following ARS, we use a si...
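
The Clopper–Pearson step truncated in extracted reference [2] can be sketched end to end with only the standard library; the sample counts, σ, and α below are illustrative settings, not the paper's:

```python
import math
from statistics import NormalDist

def binom_cdf(k, n, p):
    # Exact P[X <= k] for X ~ Binomial(n, p).
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def clopper_pearson_lower(k, n, alpha=0.001):
    # One-sided lower confidence bound on p: the p0 solving
    # P[X >= k | p0] = alpha, found by bisection (the tail probability
    # P[X >= k | p] is increasing in p).
    if k == 0:
        return 0.0
    lo, hi = 0.0, 1.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if 1.0 - binom_cdf(k - 1, n, mid) < alpha:
            lo = mid
        else:
            hi = mid
    return lo

def certified_radius(k, n, sigma=0.25, alpha=0.001):
    # Randomized-smoothing style radius R = sigma * Phi^{-1}(p_lower);
    # abstain (radius 0) when the lower bound does not clear 1/2.
    p_low = clopper_pearson_lower(k, n, alpha)
    return sigma * NormalDist().inv_cdf(p_low) if p_low > 0.5 else 0.0

print(certified_radius(990, 1000))  # positive radius: certifiable
print(certified_radius(520, 1000))  # 0.0: the bound cannot clear 1/2
```

This covers only the second stage, bounding the candidate class's probability; the full protocol in the excerpt also spends n0 preliminary samples choosing that candidate class before the bound is computed.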