pith. machine review for the scientific record.

arxiv: 1312.6199 · v4 · submitted 2013-12-21 · 💻 cs.CV · cs.LG · cs.NE

Recognition: 3 theorem links


Intriguing properties of neural networks

Authors on Pith: no claims yet

Pith reviewed 2026-05-11 21:49 UTC · model grok-4.3

classification 💻 cs.CV · cs.LG · cs.NE
keywords deep neural networks · adversarial examples · interpretability · feature representations · robustness · discontinuous mappings · transferability
0 comments

The pith

Deep neural networks map inputs to outputs in ways that are discontinuous enough for tiny perturbations to cause misclassifications; these attacks transfer to other networks, and high-level semantics reside in linear combinations of units rather than in any individual unit.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper identifies two counter-intuitive properties of deep neural networks that achieve strong performance on vision tasks. Analysis of high layers shows that single units and random linear combinations of units are indistinguishable under standard visualization and activation methods, pointing to the overall feature space as the carrier of semantic content. Separately, small changes found by maximizing a network's prediction error on a given image prove sufficient to flip its classification, yet remain too subtle for human detection. The same perturbations also fool networks trained on different data splits, indicating the discontinuities are not mere artifacts of a particular training run.
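
To make the unit-analysis comparison concrete, here is a minimal sketch of the kind of probe involved: rank dataset images by how strongly their high-layer activations project onto a single natural-basis unit versus a random direction in the same space. The feature matrix, image names, layer width, and `k` below are stand-ins, not values from the paper.

```python
# Minimal sketch of the unit-analysis probe: rank images by activation along a
# single high-layer unit versus a random direction in the same activation space.
# The feature matrix, image names, and layer width are placeholders.
import numpy as np

rng = np.random.default_rng(0)
num_images, width = 10_000, 4096                  # hypothetical corpus size / layer width
feats = rng.standard_normal((num_images, width))  # stand-in for high-layer activations phi(x)
names = [f"img_{i:05d}.jpg" for i in range(num_images)]

def top_images(direction, k=8):
    """Return the k images whose activations project most strongly onto `direction`."""
    scores = feats @ direction
    return [names[i] for i in np.argsort(scores)[-k:][::-1]]

single_unit = np.zeros(width)
single_unit[123] = 1.0                            # natural-basis direction e_i (one unit)
random_dir = rng.standard_normal(width)
random_dir /= np.linalg.norm(random_dir)          # random direction v in the same layer

print("single unit:     ", top_images(single_unit))
print("random direction:", top_images(random_dir))
```

In the paper's experiments both retrievals look comparably coherent under visual inspection, which is the basis for reading semantic content into the span of units rather than into the natural basis.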

Core claim

According to various methods of unit analysis, there is no distinction between individual high-level units and random linear combinations of high-level units, suggesting that the space, rather than the individual units, contains the semantic information in the high layers. Deep neural networks also learn input-output mappings that are discontinuous to a significant extent: a network can be made to misclassify an image by an imperceptible perturbation found by maximizing its prediction error, and the same perturbation causes a different network, trained on a different subset of the data, to misclassify the same input.

What carries the argument

Error-maximization search for minimal input perturbations combined with unit analysis that compares single high-level activations against random linear combinations of them.
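
The paper's perturbation search is formulated as a box-constrained optimization that trades perturbation size against the network's loss; the sketch below is a deliberately simplified signed-gradient stand-in with an L∞ budget, assuming a PyTorch classifier `model`, an image batch `x` in [0,1], and integer labels `label`. The `eps` and `steps` values are illustrative, not the authors' settings.

```python
# Hedged sketch of an error-maximization search for a small misclassifying
# perturbation. The paper itself uses a box-constrained L-BFGS formulation;
# this simple gradient-ascent loop is only an illustrative stand-in.
import torch
import torch.nn.functional as F

def find_perturbation(model, x, label, eps=0.007, steps=40):
    """Search for r with ||r||_inf <= eps such that model(x + r) errs on `label`."""
    r = torch.zeros_like(x, requires_grad=True)
    step = eps / 10
    for _ in range(steps):
        loss = F.cross_entropy(model(x + r), label)     # the network's prediction error
        loss.backward()
        with torch.no_grad():
            r += step * r.grad.sign()                   # ascend the error
            r.clamp_(-eps, eps)                         # keep the perturbation tiny
            r.copy_((x + r).clamp(0.0, 1.0) - x)        # keep x + r a valid image
        r.grad.zero_()
    return r.detach()

# The attack counts as a success if model(x + r).argmax(dim=1) != label
# while ||r||_inf stays within eps.
```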

If this is right

  • Semantic information at higher layers lives in the collective linear span of units rather than in any one neuron.
  • The learned functions are sensitive to structured, low-magnitude changes that standard training does not prevent.
  • Adversarial perturbations generated against one model frequently affect independently trained models on the same task.
  • Accurate performance on clean test sets does not imply robustness to targeted, small-magnitude attacks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Transferability opens the possibility of black-box attacks that require no knowledge of a target model's weights or training data.
  • Regularization methods that penalize large output changes under small input shifts could reduce the observed discontinuities; a minimal sketch of one such penalty follows this list.
  • Visualization and interpretability work may need to shift focus from individual units to the subspaces they span.
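
As a concrete reading of the regularization bullet above, here is a hedged sketch of one possible penalty: a consistency term that punishes large movements of the output distribution under a small random input shift. This is an editorial illustration, not a method from the paper; `sigma`, `lambda_reg`, and the KL-divergence form are assumed choices.

```python
# Editorial sketch of a smoothness penalty: discourage the model's output
# distribution from moving much when the input is nudged by small noise.
# Not from the paper; sigma and the KL form are assumptions.
import torch
import torch.nn.functional as F

def smoothness_penalty(model, x, sigma=0.01):
    """KL(model(x) || model(x + noise)) for a small random input shift."""
    noise = sigma * torch.randn_like(x)
    log_p_clean = F.log_softmax(model(x), dim=-1)
    log_p_noisy = F.log_softmax(model(x + noise), dim=-1)
    return F.kl_div(log_p_noisy, log_p_clean, log_target=True, reduction="batchmean")

# Typical use during training (lambda_reg is a hypothetical weight):
#   total_loss = task_loss + lambda_reg * smoothness_penalty(model, images)
```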

Load-bearing premise

The chosen unit-analysis techniques accurately reflect semantic content, the generated perturbations count as imperceptible to humans, and the observed transferability of attacks extends beyond the tested networks and datasets.

What would settle it

A deep network in which no small perturbation obtained by maximizing prediction error produces a misclassification on natural images, or in which single high-level units exhibit semantic responses clearly distinct from those of random linear combinations.

read the original abstract

Deep neural networks are highly expressive models that have recently achieved state of the art performance on speech and visual recognition tasks. While their expressiveness is the reason they succeed, it also causes them to learn uninterpretable solutions that could have counter-intuitive properties. In this paper we report two such properties. First, we find that there is no distinction between individual high level units and random linear combinations of high level units, according to various methods of unit analysis. It suggests that it is the space, rather than the individual units, that contains of the semantic information in the high layers of neural networks. Second, we find that deep neural networks learn input-output mappings that are fairly discontinuous to a significant extend. We can cause the network to misclassify an image by applying a certain imperceptible perturbation, which is found by maximizing the network's prediction error. In addition, the specific nature of these perturbations is not a random artifact of learning: the same perturbation can cause a different network, that was trained on a different subset of the dataset, to misclassify the same input.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 3 minor

Summary. The paper reports two intriguing properties of deep neural networks. First, there is no distinction between individual high-level units and random linear combinations of high-level units according to various unit analysis methods, suggesting that semantic information resides in the feature space rather than in individual units. Second, deep neural networks learn fairly discontinuous input-output mappings, as demonstrated by the ability to cause misclassifications with imperceptible perturbations found by maximizing the network's prediction error, and these perturbations transfer across networks trained on different data subsets.

Significance. If the results hold, they are significant because they challenge assumptions about interpretability in DNNs by showing that high-level representations are distributed rather than localized in units, and they reveal a fundamental vulnerability in the robustness of these models to small adversarial changes. The empirical demonstrations on large-scale datasets like ImageNet provide concrete examples that have influenced subsequent research on adversarial examples and network interpretability.

major comments (3)
  1. [§3.1] §3.1 (unit analysis experiments): The claim that there is 'no distinction' between individual high-level units and random linear combinations rests on qualitative visualizations from activation maximization and related methods. No quantitative metric (such as a similarity score between generated images or a statistical test for indistinguishability) is reported, which weakens the load-bearing conclusion that semantics reside in the space rather than the basis directions.
  2. [§3.2] §3.2 (adversarial perturbation experiments): Imperceptibility is asserted via small L2/L∞ norms, but the manuscript provides no human psychophysical validation or comparison to perceptual thresholds. This is central to the claim that the mappings are discontinuous in a practically meaningful way and that the perturbations are not merely artifacts.
  3. [Transferability experiments] Transferability results (across networks trained on different subsets): While success rates are shown, the paper does not report the exact degree of training-data overlap, architecture differences, or statistical controls for chance-level transfer, limiting the generality of the 'not a random artifact' claim.
minor comments (3)
  1. [Abstract] Abstract: 'to a significant extend' should read 'to a significant extent'.
  2. [Abstract] Abstract: 'contains of the semantic information' is grammatically incorrect and should be 'contains the semantic information'.
  3. [Figures] Figure captions (throughout): Captions should explicitly state the norm used for perturbations and the quantitative success rates observed, rather than relying solely on visual inspection.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments. We respond to each major comment below and indicate where revisions will be made.

read point-by-point responses
  1. Referee: [§3.1] §3.1 (unit analysis experiments): The claim that there is 'no distinction' between individual high-level units and random linear combinations rests on qualitative visualizations from activation maximization and related methods. No quantitative metric (such as a similarity score between generated images or a statistical test for indistinguishability) is reported, which weakens the load-bearing conclusion that semantics reside in the space rather than the basis directions.

    Authors: We agree that the unit analysis in §3.1 is based on qualitative visualizations showing that random linear combinations of high-level units produce images with comparable semantic content to those from individual units. To address the lack of quantitative support, we will add a metric in the revision, such as average SSIM or cosine similarity on the generated images across units and combinations, to demonstrate statistical similarity. This will be included as supplementary analysis in §3.1. revision: yes

  2. Referee: [§3.2] §3.2 (adversarial perturbation experiments): Imperceptibility is asserted via small L2/L∞ norms, but the manuscript provides no human psychophysical validation or comparison to perceptual thresholds. This is central to the claim that the mappings are discontinuous in a practically meaningful way and that the perturbations are not merely artifacts.

    Authors: The perturbations have small norms (L∞ ≈ 0.007 on [0,1]-normalized images), which are below typical visual detection thresholds in standard viewing conditions. We did not conduct human psychophysical studies. In the revision we will add a discussion comparing the norms to established perceptual thresholds from the image processing literature and note the absence of direct human validation as a limitation. revision: partial

  3. Referee: Transferability results (across networks trained on different subsets): While success rates are shown, the paper does not report the exact degree of training-data overlap, architecture differences, or statistical controls for chance-level transfer, limiting the generality of the 'not a random artifact' claim.

    Authors: The networks used identical architectures and were trained on completely disjoint random subsets of the data (zero overlap). Reported transfer rates (often exceeding 70%) are orders of magnitude above chance level for 1000-class classification. We will revise the text to explicitly state the zero overlap, identical architectures, and subset sizes, and add a brief note that the rates far exceed random guessing. Formal multi-seed statistical controls were not performed and will be acknowledged as a limitation. revision: partial

Circularity Check

0 steps flagged

No circularity: purely empirical observations without derivational reduction

full rationale

The paper reports two empirical findings from direct experiments on trained neural networks: (1) unit analysis methods (activation maximization and related techniques) show no semantic distinction between single high-level units and random linear combinations of them, and (2) small perturbations obtained by maximizing prediction error are imperceptible and transfer across independently trained models. These are presented as experimental results rather than derivations from first principles, fitted parameters, or self-referential definitions. No equations, predictions, or uniqueness theorems are invoked that could reduce to the inputs by construction, and the text contains no self-citations used as load-bearing justification. The claims rest on observable outcomes of the described procedures, making the derivation chain self-contained and non-circular.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper presents empirical findings from trained networks rather than relying on mathematical axioms, free parameters, or new postulated entities.

pith-pipeline@v0.9.0 · 5508 in / 1124 out tokens · 37438 ms · 2026-05-11T21:49:35.576361+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • Foundation.DiscretenessForcing continuous_space_no_lockIn echoes

    ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

    we find that there is no distinction between individual high level units and random linear combinations of high level units, according to various methods of unit analysis. It suggests that it is the space, rather than the individual units, that contains the semantic information in the high layers of neural networks.

  • Foundation.DiscretenessForcing continuous_no_isolated_zero_defect echoes

    ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

    deep neural networks learn input-output mappings that are fairly discontinuous to a significant extend. We can cause the network to misclassify an image by applying a certain imperceptible perturbation, which is found by maximizing the network's prediction error.

  • Foundation.LedgerForcing conservation_from_balance unclear

    UNCLEAR: Pith found a possible connection, but the relation between this paper passage and the cited Recognition theorem is too broad or indirect to confirm that the theorem supports the claim.

    the same perturbation can cause a different network, that was trained on a different subset of the dataset, to misclassify the same input.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 48 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Quantitative Linear Logic

    cs.LO 2026-05 accept novelty 8.0

    pQLL calculi assign real-valued strength to proofs, generalize hypersequent and deep inference systems, prove cut elimination, and achieve completeness for soft residuated lattices, recovering MALL as p goes to infinity.

  2. Quantitative Linear Logic

    cs.LO 2026-05 unverdicted novelty 8.0

    pQLL calculi make proof validity and sequent provability real-valued quantities, generalizing hypersequent calculi and deep inference while proving cut-elimination and completeness for soft residuated lattices.

  3. Quantitative Linear Logic for Neuro-Symbolic Learning and Verification

    cs.LO 2026-05 unverdicted novelty 7.0

    QLL is a novel logic for neuro-symbolic learning that uses ML-native operations (sum, log-sum-exp) on logits to embed constraints, satisfying most linear logic properties and showing stronger correlation between empir...

  4. AuraMask: An Extensible Pipeline for Developing Aesthetic Anti-Facial Recognition Image Filters

    cs.CV 2026-05 conditional novelty 7.0

    AuraMask produces 40 aesthetic anti-facial recognition filters that match or exceed prior adversarial effectiveness and achieve significantly higher user acceptance in a 630-person study.

  5. Control Your View: High-Resolution Global Semantic Manipulation in Learned Image Compression

    cs.CV 2026-05 unverdicted novelty 7.0

    PGD²-GSM is the first method to stably achieve high-resolution global semantic manipulation in learned image compression via a Periodic Geometric Decay schedule that handles Lazying-Oscillating-Refining attack stages.

  6. TARO: Temporal Adversarial Rectification Optimization Using Diffusion Models as Purifiers

    cs.LG 2026-05 unverdicted novelty 7.0

    TARO builds a temporally guided score prior from high-noise and low-noise diffusion views to purify adversarial examples more robustly than uniform timestep methods.

  7. Empirical Evidence for Simply Connected Decision Regions in Image Classifiers

    cs.CV 2026-05 unverdicted novelty 7.0

    Empirical tests with quad-mesh filling indicate that decision regions in modern image classifiers are simply connected.

  8. Low Rank Adaptation for Adversarial Perturbation

    cs.LG 2026-04 unverdicted novelty 7.0

    Adversarial perturbations possess an inherently low-rank structure that enables more efficient and effective black-box adversarial attacks via subspace projection.

  9. A Systematic Survey of Security Threats and Defenses in LLM-Based AI Agents: A Layered Attack Surface Framework

    cs.CR 2026-04 unverdicted novelty 7.0

    A new 7x4 taxonomy organizes agentic AI security threats by architectural layer and persistence timescale, revealing under-explored upper layers and missing defenses after surveying 116 papers.

  10. Benign Overfitting in Adversarial Training for Vision Transformers

    cs.LG 2026-04 unverdicted novelty 7.0

    Adversarial training on simplified Vision Transformers achieves benign overfitting with near-zero robust loss and generalization error when signal-to-noise ratio and perturbation budget meet specific conditions.

  11. Physically-Induced Atmospheric Adversarial Perturbations: Enhancing Transferability and Robustness in Remote Sensing Image Classification

    cs.CV 2026-04 unverdicted novelty 7.0

    FogFool creates fog-based adversarial perturbations using Perlin noise optimization to achieve high black-box transferability (83.74% TASR) and robustness to defenses in remote sensing classification.

  12. Understanding and Improving Continuous Adversarial Training for LLMs via In-context Learning Theory

    cs.LG 2026-04 unverdicted novelty 7.0

    Continuous adversarial training in the embedding space produces a robust generalization bound for linear transformers that decreases with perturbation radius, tied to singular values of the embedding matrix, and motiv...

  13. Can Drift-Adaptive Malware Detectors Be Made Robust? Attacks and Defenses Under White-Box and Black-Box Threats

    cs.CR 2026-04 unverdicted novelty 7.0

    A fine-tuning framework reduces PGD attack success on AdvDA detectors from 100% to 3.2% and MalGuise from 13% to 5.1%, but optimal training strategies differ by threat model and robustness does not transfer across them.

  14. Hidden Reliability Risks in Large Language Models: Systematic Identification of Precision-Induced Output Disagreements

    cs.AI 2026-04 unverdicted novelty 7.0

    PrecisionDiff is a differential testing framework that uncovers widespread precision-induced behavioral disagreements in aligned LLMs, including safety-critical jailbreak divergences across precision formats.

  15. Diffusion Models Beat GANs on Image Synthesis

    cs.LG 2021-05 accept novelty 7.0

    Diffusion models with architecture improvements and classifier guidance achieve superior FID scores to GANs on unconditional and conditional ImageNet image synthesis.

  16. Understanding intermediate layers using linear classifier probes

    stat.ML 2016-10 accept novelty 7.0

    Linear probes demonstrate that feature separability for classification increases monotonically with network depth in Inception v3 and ResNet-50.

  17. Concrete Problems in AI Safety

    cs.AI 2016-06 accept novelty 7.0

    The paper categorizes five concrete AI safety problems arising from flawed objectives, costly evaluation, and learning dynamics.

  18. LSUN: Construction of a Large-scale Image Dataset using Deep Learning with Humans in the Loop

    cs.CV 2015-06 accept novelty 7.0

    LSUN dataset of one million images per category across 30 classes is constructed via iterative human-in-the-loop deep learning labeling.

  19. Quantitative Linear Logic for Neuro-Symbolic Learning and Verification

    cs.LO 2026-05 unverdicted novelty 6.0

    Quantitative Linear Logic interprets logical connectives via natural ML operations on logits to embed constraints in neural training while satisfying most linear logic laws and correlating performance with independent...

  20. Feature Visualization Recovers Known Cortical Selectivity from TRIBE v2

    q-bio.NC 2026-05 unverdicted novelty 6.0

    Feature visualization on TRIBE v2 brain encoders recovers the known ventral visual hierarchy from V1 to V4 and produces distinctive patterns for MT, FFA, and PPA, with optimized stimuli driving ~4x higher activation t...

  21. ASD-Bench: A Four-Axis Comprehensive Benchmark of AI Models for Autism Spectrum Disorder

    cs.LG 2026-05 unverdicted novelty 6.0

    ASD-Bench evaluates 17 ML and deep learning models on 4,068 AQ-10 records across child, adolescent, and adult cohorts, showing high adult performance, harder adolescent classification, shifting feature importance, and...

  22. Hierarchical End-to-End Taylor Bounds for Complete Neural Network Verification

    cs.LG 2026-05 unverdicted novelty 6.0

    HiTaB introduces a hierarchical Taylor bound framework for neural network reachability that systematically exploits second-order smoothness and curvature Lipschitz constants via layerwise propagation.

  23. The Propagation Field: A Geometric Substrate Theory of Deep Learning

    cs.LG 2026-05 unverdicted novelty 6.0

    Neural networks possess a propagation field of trajectories and Jacobians whose quality can be measured and optimized independently of endpoint loss, yielding better unseen-path generalization and reduced forgetting i...

  24. RELO: Reinforcement Learning to Localize for Visual Object Tracking

    cs.CV 2026-05 unverdicted novelty 6.0

    RELO replaces handcrafted spatial priors with a reinforcement learning policy for target localization in visual tracking and reports 57.5% AUC on LaSOText without template updates.

  25. When AI reviews science: Can we trust the referee?

    cs.AI 2026-04 unverdicted novelty 6.0

    AI peer review systems are vulnerable to prompt injections, prestige biases, assertion strength effects, and contextual poisoning, as demonstrated by a new attack taxonomy and causal experiments on real conference sub...

  26. Transferable Physical-World Adversarial Patches Against Pedestrian Detection Models

    cs.CV 2026-04 unverdicted novelty 6.0

    TriPatch generates transferable physical adversarial patches via multi-stage triplet loss, appearance consistency, and data augmentation to achieve higher attack success rates on pedestrian detectors than prior methods.

  27. Can AI Detect Life? Lessons from Artificial Life

    cs.LG 2026-04 unverdicted novelty 6.0

    Artificial life experiments demonstrate that machine learning models for extraterrestrial life detection produce near-100% false positives on out-of-distribution samples, rendering them unreliable.

  28. Quantum Patches: Enhancing Robustness of Quantum Machine Learning Models

    quant-ph 2026-04 unverdicted novelty 6.0

    Random quantum circuits used as adversarial training data reduce successful attack rates on QML models for CIFAR-10 from 89.8% to 68.45% and for CINIC-10 from 94.23% to 78.68%.

  29. Adversarial Label Invariant Graph Data Augmentations for Out-of-Distribution Generalization

    cs.LG 2026-04 unverdicted novelty 6.0

    RIA uses adversarial exploration of counterfactual graph environments via label-invariant augmentations to improve OoD generalization in graph classification tasks.

  30. Street-Legal Physical-World Adversarial Rim for License Plates

    cs.CV 2026-04 conditional novelty 6.0

    SPAR is a street-legal physical rim that cuts modern ALPR accuracy by 60% and reaches 18% targeted impersonation while costing under $100 and requiring no plate modification.

  31. Jailbreaking Black Box Large Language Models in Twenty Queries

    cs.LG 2023-10 conditional novelty 6.0

    PAIR uses an attacker LLM to iteratively craft effective jailbreak prompts for black-box target LLMs in fewer than 20 queries.

  32. SmoothLLM: Defending Large Language Models Against Jailbreaking Attacks

    cs.LG 2023-10 accept novelty 6.0

    SmoothLLM mitigates jailbreaking attacks on LLMs by randomly perturbing multiple copies of a prompt at the character level and aggregating the outputs to detect adversarial inputs.

  33. Baseline Defenses for Adversarial Attacks Against Aligned Language Models

    cs.LG 2023-09 conditional novelty 6.0

    Baseline defenses including perplexity-based detection, input preprocessing, and adversarial training offer partial robustness to text adversarial attacks on LLMs, with challenges arising from weak discrete optimizers.

  34. Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned

    cs.CL 2022-08 accept novelty 6.0

    RLHF-aligned language models show increasing resistance to red teaming with scale up to 52B parameters, unlike prompted or rejection-sampled models, supported by a released dataset of 38,961 attacks.

  35. Demystifying MMD GANs

    stat.ML 2018-01 accept novelty 6.0

    MMD GANs have unbiased critic gradients but biased generator gradients from sample-based learning, and the Kernel Inception Distance provides a practical new measure for GAN convergence and dynamic learning rate adaptation.

  36. Dual-axis attribution of zebrafish tectal microcircuits for energy-efficient and robust neurocomputing

    cs.NE 2026-05 conditional novelty 5.0

    Zebrafish tectal subcircuits are dissociated into spike-efficient information gating and feedback-like robustness stabilization, then transferred to improve ResNet efficiency and noise tolerance.

  37. Memory Efficient Full-gradient Attacks (MEFA) Framework for Adversarial Defense Evaluations

    cs.LG 2026-05 unverdicted novelty 5.0

    MEFA enables exact full-gradient white-box attacks on iterative stochastic purification defenses like diffusion and Langevin EBMs by trading recomputation for lower memory, revealing vulnerabilities missed by approxim...

  38. Laundering AI Authority with Adversarial Examples

    cs.CR 2026-05 unverdicted novelty 5.0

    Adversarial examples enable AI authority laundering by causing production VLMs to give authoritative but wrong responses on subtly perturbed images, with success rates of 22-100% using decade-old attack methods.

  39. Machine Learning Enhanced Laser Spectroscopy for Multi-Species Gas Detection in Complex and Harsh Environments

    physics.optics 2026-05 unverdicted novelty 5.0

    Machine learning methods including denoising autoencoders, unsupervised interference mitigation, blind source separation, and certifiable classification are developed and experimentally validated to improve multi-spec...

  40. Adversarial Flow Matching for Imperceptible Attacks on End-to-End Autonomous Driving

    cs.CV 2026-04 unverdicted novelty 5.0

    AFM is a novel gray-box adversarial attack using flow matching to create visually imperceptible perturbations that degrade performance of Vision-Language-Action and modular end-to-end autonomous driving models while s...

  41. Identity-Decoupled Anonymization for Visual Evidence in Multi-modal Retrieval-Augmented Generation

    cs.CV 2026-04 unverdicted novelty 5.0

    Proposes a three-part generative anonymization pipeline using disentangled variational encoding, manifold-aware identity replacement, and distilled latent diffusion to protect face identities in MRAG while preserving ...

  42. UniAda: Universal Adaptive Multi-objective Adversarial Attack for End-to-End Autonomous Driving Systems

    cs.SE 2026-04 unverdicted novelty 5.0

    UniAda introduces a white-box multi-objective attack using adaptive weighting to generate perturbations that jointly affect steering and speed in E2E ADS, outperforming benchmarks with average deviations of 3.54-29 de...

  43. NetworkNet: A Deep Neural Network Approach for Random Networks with Sparse Nodal Attributes and Complex Nodal Heterogeneity

    stat.ME 2026-04 unverdicted novelty 5.0

    NetworkNet uses a tailored deep neural network to estimate nodal expansiveness and popularity in random networks while performing data-driven selection of high-dimensional nodal attributes.

  44. QShield: Securing Neural Networks Against Adversarial Attacks using Quantum Circuits

    cs.CR 2026-04 unverdicted novelty 5.0

    Hybrid quantum-classical models using structured entanglement keep high accuracy on MNIST, OrganAMNIST and CIFAR-10 while lowering adversarial attack success rates and raising the computational cost of generating attacks.

  45. On the Properties of Feature Attribution for Supervised Contrastive Learning

    cs.LG 2026-04 unverdicted novelty 4.0

    Neural networks trained via supervised contrastive learning yield feature attributions that are more faithful, less complex, and more continuous than those from cross-entropy trained networks.

  46. Beyond Attack Success Rate: A Multi-Metric Evaluation of Adversarial Transferability in Medical Imaging Models

    cs.CV 2026-04 unverdicted novelty 4.0

    Perceptual quality metrics correlate strongly with each other but show minimal correlation with attack success rate across medical imaging models and datasets, making ASR alone inadequate for assessing adversarial robustness.

  47. SoK: A Comprehensive Analysis of the Current Status of Neural Tangent Generalization Attacks with Research Directions

    cs.LG 2026-05 accept novelty 3.0

    NTGA is the first clean-label generalization attack under black-box settings but is vulnerable to adversarial training and image transformations, with newer attacks outperforming it.

  48. Enhancing Adversarial Robustness in Network Intrusion Detection: A Layer-wise Adaptive Regularization Approach

    cs.CR 2026-05 unverdicted novelty 3.0

    LARAR enhances adversarial robustness in network intrusion detection by using layer-wise adaptive regularization and auxiliary classifiers, achieving 95.01% clean accuracy and improved defense against FGSM, PGD, and t...

Reference graph

Works this paper leans on

13 extracted references · 13 canonical work pages · cited by 46 Pith papers · 1 internal anchor

  1. [1]

    How to explain individual classification decisions

    David Baehrens, Timon Schroeter, Stefan Harmeling, Motoaki Kawanabe, Katja Hansen, and Klaus-Robert Müller. How to explain individual classification decisions. The Journal of Machine Learning Research, 99:1803–1831, 2010

  2. [2]

    Learning deep architectures for AI

    Yoshua Bengio. Learning deep architectures for AI. Foundations and Trends® in Machine Learning, 2(1):1–127, 2009

  3. [3]

    Imagenet: A large-scale hierarchical image database

    Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, pages 248–255. IEEE, 2009

  4. [4]

    Visualizing higher-layer features of a deep network

    Dumitru Erhan, Yoshua Bengio, Aaron Courville, and Pascal Vincent. Visualizing higher-layer features of a deep network. Technical Report 1341, University of Montreal, June 2009. Also presented at the ICML 2009 Workshop on Learning Feature Hierarchies, Montréal, Canada

  5. [5]

    A discriminatively trained, multiscale, deformable part model

    Pedro Felzenszwalb, David McAllester, and Deva Ramanan. A discriminatively trained, multiscale, deformable part model. In Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on, pages 1–8. IEEE, 2008

  6. [6]

    Rich feature hierarchies for accurate object detection and semantic segmentation

    Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. arXiv preprint arXiv:1311.2524, 2013

  7. [7]

    Measuring invariances in deep networks

    Ian Goodfellow, Quoc Le, Andrew Saxe, Honglak Lee, and Andrew Y Ng. Measuring invariances in deep networks. Advances in Neural Information Processing Systems, 22:646–654, 2009

  8. [8]

    Deep neural networks for acoustic modeling in speech recognition

    Geoffrey E. Hinton, Li Deng, Dong Yu, George E. Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, Tara N. Sainath, and Brian Kingsbury. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Process. Mag., 29(6):82–97, 2012

  9. [9]

    Imagenet classification with deep convolutional neural networks

    Alex Krizhevsky, Ilya Sutskever, and Geoff Hinton. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems 25, pages 1106–1114, 2012

  10. [10]

    Building high-level features using large scale unsupervised learning

    Quoc V Le, Marc’Aurelio Ranzato, Rajat Monga, Matthieu Devin, Kai Chen, Greg S Corrado, Jeff Dean, and Andrew Y Ng. Building high-level features using large scale unsupervised learning. arXiv preprint arXiv:1112.6209, 2011

  11. [11]

    The mnist database of handwritten digits, 1998

    Yann LeCun and Corinna Cortes. The mnist database of handwritten digits, 1998

  12. [12]

    Efficient Estimation of Word Representations in Vector Space

    Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013

  13. [13]

    Visualizing and understanding convolutional neural networks

    Matthew D Zeiler and Rob Fergus. Visualizing and understanding convolutional neural networks. arXiv preprint arXiv:1311.2901, 2013