Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps

Karen Simonyan; Andrea Vedaldi; Andrew Zisserman

arXiv: 1312.6034 · v2 · submitted 2013-12-20 · cs.CV

Pith reviewed 2026-05-11 18:45 UTC · model grok-4.3

classification cs.CV

keywords convolutional networks · saliency maps · visualization · weakly supervised segmentation · gradient methods · image classification

The pith

A convolutional network trained only for image classification can produce saliency maps from class-score gradients that support weakly supervised object segmentation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper presents two gradient-based methods for visualizing what convolutional networks learn during image classification. The first synthesizes an image that maximizes the score for a target class to show what the network associates with it. The second computes a saliency map for any input image by propagating the class score gradient back to the pixels. These saliency maps prove effective for segmenting the object of interest in the image using only a classification-trained network and no location labels.

Core claim

The central claim is that computing the gradient of the class score with respect to the input image pixels yields both class-representative visualizations through optimization and image-specific saliency maps, which in turn enable weakly supervised object segmentation.

What carries the argument

The class score gradient with respect to input image pixels, used to generate saliency maps and class visualizations.
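
The quantity named above can be written out. What follows is a sketch of the first-order reasoning, in notation assumed here rather than quoted from this page: $I_0$ is the given image, $w$ is the gradient of the class score $S_c$ at $I_0$, and $h(i,j,k)$ is the index of pixel $(i,j)$ in colour channel $k$. Near $I_0$ the score is approximated by a linear function of the image, and the saliency of a pixel is taken as the largest gradient magnitude across its colour channels.

```latex
\[
S_c(I) \approx w^{\top} I + b,
\qquad
w = \left. \frac{\partial S_c}{\partial I} \right|_{I_0}
\]
\[
M_{ij} = \max_{k} \bigl| w_{h(i,j,k)} \bigr|
\]
```

Read this way, a large $M_{ij}$ means that a small change to pixel $(i,j)$ moves the class score the most, which is the sense of "importance" that the load-bearing premise relies on.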

If this is right

Saliency maps from classification networks support object segmentation without supervised location data.
Maximizing class scores produces images that illustrate learned class concepts.
Gradient visualization methods connect directly to those employed in deconvolutional networks.
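
The class-score maximization in the second point can be sketched as gradient ascent on a regularized objective, S_c(I) minus an L2 penalty on the image. The block below is a minimal pure-Python illustration under stated assumptions: `toy_score` and `TOY_WEIGHTS` are hypothetical stand-ins for a trained ConvNet, finite differences stand in for back-propagation, and the values of `lam`, `lr` and `steps` are arbitrary choices, not settings from the paper.

```python
def numerical_gradient(f, x, eps=1e-5):
    """Central finite-difference gradient of a scalar function f at the flat list x."""
    grad = []
    for i in range(len(x)):
        up = x[:i] + [x[i] + eps] + x[i + 1:]
        down = x[:i] + [x[i] - eps] + x[i + 1:]
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


def maximise_class_score(score, n_pixels, lam=0.1, lr=0.5, steps=200):
    """Gradient ascent on score(I) - lam * ||I||^2, starting from a zero image."""
    image = [0.0] * n_pixels

    def objective(x):
        return score(x) - lam * sum(v * v for v in x)

    for _ in range(steps):
        grad = numerical_gradient(objective, image)
        image = [v + lr * g for v, g in zip(image, grad)]
    return image


# Hypothetical linear "class score" over a 4-pixel image (assumption, not the paper's model).
TOY_WEIGHTS = [1.0, -2.0, 0.5, 0.0]


def toy_score(x):
    return sum(w * v for w, v in zip(TOY_WEIGHTS, x))


class_image = maximise_class_score(toy_score, n_pixels=4)
print([round(v, 3) for v in class_image])
```

For this linear toy score the optimum is known in closed form (each pixel converges to its weight divided by `2 * lam`), which makes the sketch easy to check. With a real network the objective is non-convex and the result depends on initialization and step size.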

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same gradient technique might help identify biases or failure modes in network decisions by revealing attended regions.
These visualizations could be combined with other interpretability tools for deeper model analysis.

Load-bearing premise

The gradient of the class score with respect to input pixels provides a faithful measure of each pixel's importance to the classification decision.

What would settle it

Experiments showing that the resulting saliency maps do not align with object boundaries or fail to produce accurate segmentations in a weakly supervised setting would contradict the central claim.

Original abstract

This paper addresses the visualisation of image classification models, learnt using deep Convolutional Networks (ConvNets). We consider two visualisation techniques, based on computing the gradient of the class score with respect to the input image. The first one generates an image, which maximises the class score [Erhan et al., 2009], thus visualising the notion of the class, captured by a ConvNet. The second technique computes a class saliency map, specific to a given image and class. We show that such maps can be employed for weakly supervised object segmentation using classification ConvNets. Finally, we establish the connection between the gradient-based ConvNet visualisation methods and deconvolutional networks [Zeiler et al., 2013].

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Gradient-based saliency maps give a clean way to visualize ConvNet decisions and hint at weakly supervised localization, but the segmentation results stay qualitative with no metrics or extraction procedure.

The core contribution here is showing that the gradient of the class score with respect to input pixels produces usable saliency maps for a given image and class. The authors also revisit class-score maximization to generate prototype images and note the overlap with the deconvolutional networks of Zeiler et al. The suggestion that these maps can support object segmentation from a pure classifier is the part that stands out as new relative to cited prior work such as Erhan et al. (2009).

Referee Report

2 major / 2 minor

Summary. The manuscript introduces two gradient-based visualization techniques for ConvNets trained on image classification. The first synthesizes an input image that maximizes a target class score. The second computes a class-specific saliency map for a given image by back-propagating the class score gradient to the input pixels and taking the absolute value. The authors illustrate the use of these maps for weakly supervised object segmentation and establish a formal connection between the gradient-based visualizations and deconvolutional networks.
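
The saliency computation summarized above (gradient of the class score with respect to the pixels, then absolute value) can be sketched in a few lines. This is an illustration only: `toy_score` is a hypothetical differentiable score, not a ConvNet, and central finite differences stand in for the single back-propagation pass a real implementation would use. Taking the maximum over colour channels follows the usual convention for colour images and is stated here as an assumption.

```python
import math


def saliency_map(score, image, eps=1e-5):
    """Per-pixel saliency for an H x W x C nested-list image.

    Each entry is the largest absolute partial derivative of the score
    over the colour channels of that pixel.
    """
    height, width, channels = len(image), len(image[0]), len(image[0][0])
    saliency = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            for k in range(channels):
                original = image[i][j][k]
                image[i][j][k] = original + eps
                up = score(image)
                image[i][j][k] = original - eps
                down = score(image)
                image[i][j][k] = original  # restore the pixel
                grad = (up - down) / (2 * eps)
                saliency[i][j] = max(saliency[i][j], abs(grad))
    return saliency


def toy_score(img):
    # Hypothetical score: depends on pixel (0, 1) and, weakly, on pixel (1, 0).
    return math.tanh(2.0 * img[0][1][0] - 3.0 * img[0][1][2] + 0.5 * img[1][0][1])


zero_image = [[[0.0, 0.0, 0.0] for _ in range(2)] for _ in range(2)]
toy_saliency = saliency_map(toy_score, zero_image)
print(toy_saliency)
```

In this toy case pixel (0, 1) receives the highest saliency because of its negative weight of -3.0: the absolute value keeps pixels whose increase would lower the score as well as those whose increase would raise it.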

Significance. If the saliency maps reliably highlight object pixels, the work supplies practical tools for interpreting ConvNet decisions and enables segmentation from classification-only training data. The explicit link drawn to deconvolutional networks unifies two previously separate visualization approaches and is a clear strength of the manuscript.

major comments (2)

[Abstract / segmentation experiments] Abstract and the weakly-supervised segmentation section: the claim that saliency maps 'can be employed for weakly supervised object segmentation' rests only on a handful of qualitative visual examples. No automatic thresholding rule, connected-component procedure, or post-processing step is formalized, and no quantitative metrics (IoU, pixel accuracy, or similar) are reported against ground-truth masks on any dataset.
[Eq. (2)] Eq. (2): the saliency map is defined as the absolute value of the class-score gradient with respect to input pixels. The resulting maps are acknowledged to be noisy, yet the manuscript provides neither an analysis of how this noise propagates into the segmentation examples nor any error quantification that would support the central segmentation claim.
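
To make the first major comment concrete, the kind of formalized rule and metric it asks for could look like the sketch below. It is hypothetical throughout: the quantile threshold, the function names and the toy arrays are illustrative assumptions, and it is not a description of the procedure used in the manuscript.

```python
def threshold_mask(saliency, quantile=0.95):
    """Binary mask keeping pixels at or above the given saliency quantile."""
    values = sorted(v for row in saliency for v in row)
    cut = values[min(len(values) - 1, int(quantile * len(values)))]
    return [[v >= cut for v in row] for row in saliency]


def iou(pred, truth):
    """Intersection over union of two boolean masks of equal shape."""
    pairs = [(p, t) for pr, tr in zip(pred, truth) for p, t in zip(pr, tr)]
    intersection = sum(1 for p, t in pairs if p and t)
    union = sum(1 for p, t in pairs if p or t)
    return intersection / union if union else 1.0


# Toy 4 x 4 saliency map with a bright 2 x 2 blob (made-up numbers).
toy_saliency = [
    [0.0, 0.1, 0.1, 0.0],
    [0.1, 0.9, 0.8, 0.1],
    [0.0, 0.7, 0.6, 0.2],
    [0.0, 0.1, 0.0, 0.0],
]
# Made-up ground-truth mask: the blob plus the two pixels below it.
toy_truth = [
    [False, False, False, False],
    [False, True, True, False],
    [False, True, True, False],
    [False, True, True, False],
]

toy_mask = threshold_mask(toy_saliency, quantile=0.75)
print(iou(toy_mask, toy_truth))
```

A fixed quantile is the simplest possible rule; it always selects roughly the same fraction of pixels regardless of object size, which is one reason a reported metric such as IoU matters for judging it.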

minor comments (2)

[Figures 1-3] Figure captions for the synthesized class images and saliency maps should explicitly state the optimization parameters (learning rate, number of iterations, regularization) used to produce each example.
[§3] The notation distinguishing the class score S_c from the network output f_c could be introduced once at the beginning of §3 and used consistently thereafter.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and positive assessment of the work's significance and its connection to deconvolutional networks. We respond point-by-point to the major comments below.

Referee: [Abstract / segmentation experiments] Abstract and the weakly-supervised segmentation section: the claim that saliency maps 'can be employed for weakly supervised object segmentation' rests only on a handful of qualitative visual examples. No automatic thresholding rule, connected-component procedure, or post-processing step is formalized, and no quantitative metrics (IoU, pixel accuracy, or similar) are reported against ground-truth masks on any dataset.

Authors: We agree that the segmentation results are demonstrated through qualitative examples rather than a fully formalized pipeline with quantitative evaluation. The primary focus of the manuscript is the gradient-based visualization techniques themselves; the segmentation application is presented as an illustration of how the saliency maps might be used in a weakly-supervised setting. To address the concern, the revised manuscript will include an explicit description of the simple thresholding and connected-component post-processing applied to the examples, along with quantitative metrics (e.g., pixel accuracy and IoU) evaluated against ground-truth masks on a standard dataset such as PASCAL VOC.
revision: yes

Referee: [Eq. (2)] Eq. (2): the saliency map is defined as the absolute value of the class-score gradient with respect to input pixels. The resulting maps are acknowledged to be noisy, yet the manuscript provides neither an analysis of how this noise propagates into the segmentation examples nor any error quantification that would support the central segmentation claim.

Authors: The manuscript does observe that the resulting saliency maps can appear noisy. The absolute-value operation is applied to produce a non-negative map that emphasizes the pixels whose change would affect the class score most, whether that influence is positive or negative. We acknowledge the absence of a dedicated noise-propagation analysis or error quantification tied to the segmentation examples. In the revision we will add a short discussion of the noise characteristics of the raw gradients versus the absolute-value maps, supported by additional side-by-side visualizations that illustrate their effect on the downstream segmentation examples.
revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivations follow from standard back-propagation

The paper defines its core visualization techniques directly from the gradient of the class score with respect to input pixels (via standard back-propagation) and the class-score maximization problem. These are not fitted to target outputs, nor are they defined in terms of the quantities they are later used to produce. The weakly-supervised segmentation application is presented as a qualitative demonstration rather than a formal prediction derived from fitted parameters. No load-bearing self-citations or uniqueness theorems are invoked; the cited prior work (Erhan et al., Zeiler et al.) is external. The claimed connection to deconvolutional networks is shown by comparing the operations layer by layer (the two coincide for convolution and max-pooling layers and differ only in how the ReLU is handled), not by renaming or self-reference. The derivation chain is therefore self-contained against external benchmarks and does not reduce to its inputs by construction.
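
The relation to deconvolutional networks mentioned in the rationale is easiest to see at a single ReLU layer. The sketch below contrasts two backward rules on a toy vector: the gradient rule gates the backward signal by the sign of the forward input, while the DeconvNet-style rule gates it by the sign of the backward signal itself. The function names and numbers are illustrative assumptions, and the snippet covers only this one layer type.

```python
def relu_backprop(upstream, forward_input):
    """Gradient rule: pass the signal where the forward input was positive."""
    return [u if x > 0 else 0.0 for u, x in zip(upstream, forward_input)]


def relu_deconvnet(upstream):
    """DeconvNet-style rule: pass the signal where the backward signal is positive."""
    return [u if u > 0 else 0.0 for u in upstream]


# Made-up forward activations and backward signal for one ReLU layer.
forward_input = [1.5, -0.5, 2.0, -1.0]
upstream = [0.3, 0.7, -0.4, -0.2]

grad_rule = relu_backprop(upstream, forward_input)
deconv_rule = relu_deconvnet(upstream)
print(grad_rule)    # [0.3, 0.0, -0.4, 0.0]
print(deconv_rule)  # [0.3, 0.7, 0.0, 0.0]
```

The two outputs agree only where both the forward input and the backward signal are positive, so the resulting visualizations can differ even when every other layer is handled identically.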

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract introduces no new free parameters, axioms, or invented entities. It relies on the standard assumption that gradients can be computed through a trained ConvNet and that these gradients carry semantic meaning for visualization.

Forward citations

Cited by 60 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Measuring Cross-Modal Synergy: A Benchmark for VLM Explainability
cs.AI 2026-05 unverdicted novelty 7.0

Introduces Synergistic Faithfulness metric based on Shapley Interaction Index to evaluate cross-modal synergy in VLM explainers, revealing over-reliance on visual salience in existing methods.
I-SAFE: Wasserstein Coherence Metrics for Structural Auditing of Scientific AI Models
cs.LG 2026-05 unverdicted novelty 7.0

I-SAFE uses Wasserstein Coherence Metrics to audit distributional coherence of scientific AI models under structurally guided perturbations, revealing differences among DTI predictors that accuracy metrics miss.
Spectral Integrated Gradients for Coarse-to-Fine Feature Attribution
cs.CV 2026-05 unverdicted novelty 7.0

Spectral Integrated Gradients constructs SVD-based integration paths that activate singular components from largest to smallest, producing cleaner attribution maps and better quantitative scores than standard Integrat...
Toy Combinatorial Interpretability Models Reveal Lottery Tickets in Early Feature Space
cs.LG 2026-05 unverdicted novelty 7.0

In a combinatorial toy setting, winning lottery tickets preserve families of compatible feature locations in early feature space that balance proximity to final codes with low interference, rather than specific weight...
AIM: Adversarial Information Masking for Faithfulness Evaluation of Saliency Maps
cs.LG 2026-05 unverdicted novelty 7.0

AIM is a new saliency-guided adversarial feature replacement method to evaluate faithfulness of saliency maps and reliability of masking operators on image, audio, and EEG tasks.
$\alpha$-TCAV: A Unified Framework for Testing with Concept Activation Vectors
stat.ML 2026-05 unverdicted novelty 7.0

α-TCAV replaces TCAV's hard indicator with a tunable smooth function to create a unified probabilistic framework with lower variance and guidance for parameter choice or Bayes-optimal scoring.
How to Evaluate and Refine your CAM
cs.CV 2026-05 unverdicted novelty 7.0

Introduces synthetic ground-truth dataset for CAM evaluation, proposes ARCC composite metric, and RefineCAM method that aggregates layers for higher-resolution maps outperforming baselines.
From Mechanistic to Compositional Interpretability
cs.LG 2026-05 unverdicted novelty 7.0

Compositional interpretability defines explanations as commuting syntactic-semantic mapping pairs grounded in compositionality and minimum description length, with compressive refinement and a parsimony theorem guaran...
SeBA: Semi-supervised few-shot learning via Separated-at-Birth Alignment for tabular data
cs.LG 2026-05 unverdicted novelty 7.0

SeBA is a joint-embedding framework that separates tabular data into two complementary views and aligns one view's representations to the nearest-neighbor structure of the other, improving feature-label relationships ...
GRALIS: A Unified Canonical Framework for Linear Attribution Methods via Riesz Representation
cs.LG 2026-05 unverdicted novelty 7.0

GRALIS unifies linear XAI attribution methods via a Riesz Representation Theorem-derived canonical form (Q, w, Delta), delivering seven theorems on completeness, convergence, interactions, and multi-scale extensions.
Manifold-Aligned Guided Integrated Gradients for Reliable Feature Attribution
cs.LG 2026-05 unverdicted novelty 7.0

MA-GIG uses VAE latent space to align Integrated Gradients paths with the data manifold for more faithful feature attributions in deep neural networks.
ISAAC: Auditing Causal Reasoning in Deep Models for Drug-Target Interaction
cs.LG 2026-05 unverdicted novelty 7.0

ISAAC auditing applied to three DTI models on the Davis benchmark finds 25% relative differences in causal reasoning scores despite nearly identical AUROC values.
From Local to Global to Mechanistic: An iERF-Centered Unified Framework for Interpreting Vision Models
cs.CV 2026-05 unverdicted novelty 7.0

An iERF-centric framework unifies local, global, and mechanistic interpretability in vision models via SRD for saliency, CAFE for concept anchoring, and ICAT for interlayer attribution.
Mapping data sensitivities in global QCD analysis with linear response and influence functions
hep-ph 2026-04 unverdicted novelty 7.0

A framework based on linear response and influence functions maps data sensitivities in global QCD analyses to show how experiments determine central values, uncertainties, and correlations of non-perturbative functions.
Homogeneous Stellar Parameters from Heterogeneous Spectra with Deep Learning
astro-ph.GA 2026-04 unverdicted novelty 7.0

A single end-to-end Transformer model unifies stellar labels from heterogeneous spectroscopic surveys into a self-consistent scale without post-hoc recalibration.
Benchmarking bandgap prediction in semiconductors under experimental and realistic evaluation settings
cond-mat.mtrl-sci 2026-04 unverdicted novelty 7.0

Introduces the RealMat-BaG benchmark showing fundamental generalization limits of ML models when predicting experimental bandgaps from DFT-trained data.
Mamba-SSM with LLM Reasoning for Feature Selection: Faithfulness-Aware Biomarker Discovery
q-bio.QM 2026-04 unverdicted novelty 7.0

LLM chain-of-thought filtering of Mamba saliency features on TCGA-BRCA data produces a 17-gene set with AUC 0.927 that beats both the raw 50-gene saliency list and a 5000-gene baseline while using far fewer features, ...
Can Cross-Layer Transcoders Replace Vision Transformer Activations? An Interpretable Perspective on Vision
cs.CV 2026-04 unverdicted novelty 7.0

Cross-Layer Transcoders decompose ViT activations into sparse, depth-aware layer contributions that maintain zero-shot accuracy and enable faithful attribution of the final representation.
TRANSPORTER: Transferring Visual Semantics from VLM Manifolds
cs.CV 2025-11 unverdicted novelty 7.0

TRANSPORTER generates videos from VLM logits using optimal transport to interpret model predictions on object attributes, actions, and scenes.
Human-Centered Supervision for Sentiment Analysis in Telugu: A Systematic Inquiry Beyond Accuracy
cs.CL 2025-08 unverdicted novelty 7.0

Human rationales in supervision for Telugu sentiment analysis improve model alignment with human reasoning and often produce gains in predictive performance.
Scaling and evaluating sparse autoencoders
cs.LG 2024-06 unverdicted novelty 7.0

K-sparse autoencoders with dead-latent fixes produce clean scaling laws and better feature quality metrics that improve with size, shown by training a 16-million-latent model on GPT-4 activations.
Improving Dictionary Learning with Gated Sparse Autoencoders
cs.LG 2024-04 unverdicted novelty 7.0

Gated SAEs decouple which features to use from how large their activations should be, applying the L1 penalty only to selection and thereby eliminating shrinkage while halving the number of firing features needed for ...
Transcoders Trace Visual Grounding and Hallucinations in Vision-Language Models
cs.LG 2026-05 unverdicted novelty 6.0

Transcoders decompose MLP layers in Gemma 3-4B-IT to trace visual grounding more effectively than SAEs and predict hallucinations from circuit graph features at AUC 0.68.
ARC-STAR: Auditable Post-Hoc Correction for PDE Foundation Models
cs.LG 2026-05 unverdicted novelty 6.0

ARC-STAR is an auditable, budget-aware post-hoc correction method that reduces velocity rollout error by at least 36x over raw Poseidon across five flow benchmarks.
ARC-STAR: Auditable Post-Hoc Correction for PDE Foundation Models
cs.LG 2026-05 unverdicted novelty 6.0

ARC-STAR is a frozen, auditable post-hoc correction method that reduces velocity rollout error by at least 36x over raw Poseidon across five flow benchmarks using global and local stages with budget-aware triage.
Rethinking Visual Attribution for Chest X-ray Reasoning in Large Vision Language Models
cs.CV 2026-05 unverdicted novelty 6.0

Existing visual attribution methods often fail to identify the visual evidence used by LVLMs in chest X-ray reasoning, while MedFocus using unbalanced optimal transport and targeted interventions substantially outperf...
OCCAM: Open-set Causal Concept explAnation and Ontology induction for black-box vision Models
cs.AI 2026-05 unverdicted novelty 6.0

OCCAM discovers open-set visual concepts, estimates causal contributions via object-level interventions on black-box vision models, and induces a global concept ontology from aggregated dataset evidence.
GCE-MIL: Faithful and Recoverable Evidence for Multiple Instance Learning in Whole-Slide Imaging
cs.CV 2026-05 unverdicted novelty 6.0

GCE-MIL is a backbone-agnostic wrapper that directly optimizes MIL evidence for sufficiency, necessity, and recoverability, yielding modest gains in Macro-F1 and C-index plus more faithful patch selection across many ...
From Weight Perturbation to Feature Attribution for Explaining Fully Connected Neural Networks
cs.LG 2026-05 unverdicted novelty 6.0

XWP and XWP_c are novel attribution methods for FCNNs that estimate feature importance by perturbing attached weights to avoid added bias and out-of-distribution issues in occlusion approaches.
Feature Visualization Recovers Known Cortical Selectivity from TRIBE v2
q-bio.NC 2026-05 unverdicted novelty 6.0

Feature visualization on TRIBE v2 brain encoders recovers the known ventral visual hierarchy from V1 to V4 and produces distinctive patterns for MT, FFA, and PPA, with optimized stimuli driving ~4x higher activation t...
APEX: Audio Prototype EXplanations for Classification Tasks
cs.SD 2026-05 unverdicted novelty 6.0

APEX generates four types of prototype-based explanations for pre-trained audio classifiers that preserve output invariance and target acoustic properties better than gradient methods applied to spectrograms.
Scaling Vision Models Does Not Consistently Improve Localisation-Based Explanation Quality
cs.CV 2026-05 accept novelty 6.0

Scaling vision models by depth and parameter count does not consistently improve localisation-based explanation quality across architectures, datasets, and post-hoc methods; smaller models often perform comparably or better.
GRALIS: A Unified Canonical Framework for Linear Attribution Methods via Riesz Representation
cs.LG 2026-05 unverdicted novelty 6.0

GRALIS supplies a canonical representation (Q, w, Delta) for every additive linear continuous attribution functional on L^2 via the Riesz Representation Theorem, unifying SHAP, IG, LIME and linearized GradCAM while pr...
Manifold-Aligned Guided Integrated Gradients for Reliable Feature Attribution
cs.LG 2026-05 unverdicted novelty 6.0

MA-GIG improves Integrated Gradients by performing path integration in the latent space of a pre-trained VAE so that decoded points remain closer to the learned data manifold and reduce off-manifold gradient noise.
Evaluating the Alignment Between GeoAI Explanations and Domain Knowledge in Satellite-Based Flood Mapping
cs.CV 2026-04 unverdicted novelty 6.0

ADAGE uses Channel-Group SHAP to quantify alignment between GeoAI model explanations and domain knowledge references in satellite-based flood mapping.
H-Sets: Hessian-Guided Discovery of Set-Level Feature Interactions in Image Classifiers
cs.CV 2026-04 unverdicted novelty 6.0

H-Sets detects higher-order feature interactions in image classifiers via Hessian-guided pair merging and attributes them with IDG-Vis to generate more interpretable saliency maps than existing marginal or coarse methods.
On the Importance and Evaluation of Narrativity in Natural Language AI Explanations
cs.CL 2026-04 unverdicted novelty 6.0

XAI explanations should be narratives with continuous structure, cause-effect, fluency and diversity, and new metrics are needed to evaluate this better than standard NLP scores.
Contrastive Attribution in the Wild: An Interpretability Analysis of LLM Failures on Realistic Benchmarks
cs.AI 2026-04 conditional novelty 6.0

Token-level contrastive attribution yields informative signals for some LLM benchmark failures but is not universally applicable across datasets and models.
Potential of Gaia XP Spectra in Red Giant Star Asteroseismology: A Deep-Learning Approach
astro-ph.SR 2026-04 unverdicted novelty 6.0

Hybrid deep learning models recover large frequency separation, frequency of maximum power, and dipole period spacing from low-resolution Gaia XP spectra with accuracy comparable to moderate-resolution spectroscopy.
Towards Reliable Testing of Machine Unlearning
cs.LG 2026-04 unverdicted novelty 6.0

Causal fuzzing with budgeted interventions can detect residual direct and indirect influence of unlearned data that standard attribution methods miss due to proxies, cancellations, and masking.
Cross-Modal Knowledge Distillation for PET-Free Amyloid-Beta Detection from MRI
cs.CV 2026-04 unverdicted novelty 6.0

A PET-guided knowledge distillation approach achieves AUCs of 0.74 and 0.68 for amyloid-beta detection from MRI alone across two datasets without requiring PET or clinical covariates at test time.
Learn to Rank: Visual Attribution by Learning Importance Ranking
cs.CV 2026-04 unverdicted novelty 6.0

A new end-to-end training scheme for visual attribution maps that optimizes deletion and insertion metrics directly via differentiable ranking relaxation instead of surrogate objectives.
Hierarchical, Interpretable, Label-Free Concept Bottleneck Model
cs.CV 2026-04 unverdicted novelty 6.0

HIL-CBM is a hierarchical label-free concept bottleneck model that improves classification accuracy and explanation quality over prior single-level CBMs using a visual consistency loss and dual heads.
UNBOX: Unveiling Black-box visual models with Natural-language
cs.CV 2026-03 unverdicted novelty 6.0

UNBOX recovers interpretable text concepts that maximally activate classes in black-box vision models by recasting activation maximization as semantic search with LLMs and diffusion models.
MaskDiME: Adaptive Masked Diffusion for Precise and Efficient Visual Counterfactual Explanations
cs.CV 2026-02 unverdicted novelty 6.0

MaskDiME uses adaptive masked diffusion to produce 30x faster, localized, and semantically consistent visual counterfactual explanations without training, matching or exceeding prior performance on five datasets.
Faster Verified Explanations for Neural Networks
cs.LG 2025-11 unverdicted novelty 6.0

FaVeX accelerates verified explanations for neural networks via dynamic batch-sequential processing and query reuse while introducing verifier-optimal robust explanations that incorporate verifier incompleteness.
How to Use Deep Learning to Identify Sufficient Conditions: A Case Study on Stanley's $e$-Positivity
math.CO 2025-11 unverdicted novelty 6.0

Deep learning identifies co-triangle-free graphs as e-positive and proves e-positivity for claw-free claw-contractible-free graphs on 10 and 11 vertices, resolving an open conjecture.
AttnTrace: Contextual Attribution of Prompt Injection and Knowledge Corruption
cs.CL 2025-08 unverdicted novelty 6.0

AttnTrace is an attention-weight-based context traceback method for LLMs that claims higher accuracy and efficiency than prior art like TracLLM while aiding prompt injection detection.
Boosting Team Modeling through Tempo-Relational Representation Learning
cs.LG 2025-07 unverdicted novelty 6.0

A tempo-relational neural architecture jointly models temporal and relational aspects of team interactions to outperform prior approaches on team performance prediction and enable efficient multi-task prediction of te...
Why Do Class-Dependent Evaluation Effects Occur with Time Series Feature Attributions? A Synthetic Data Investigation
cs.LG 2025-06 unverdicted novelty 6.0

Synthetic experiments reveal that class-dependent effects appear in both perturbation-based and ground-truth evaluations of time series feature attributions, often producing contradictory rankings of attribution quali...
UntrustVul: An Automated Approach for Identifying Untrustworthy Alerts in Vulnerability Detection Models
cs.SE 2025-03 unverdicted novelty 6.0

UntrustVul identifies untrustworthy vulnerability predictions by marking lines that neither match historical vulnerability patterns nor influence vulnerable lines through dependencies, reporting AUC 70-88% and F1 82-9...
ExPath: Targeted Pathway Inference for Biological Knowledge Bases via Graph Learning and Explanation
cs.LG 2025-02 unverdicted novelty 6.0

ExPath is a subgraph inference framework that classifies bio-networks with experimental data and uses explanations to identify targeted pathways, reporting up to 4.5x higher Fidelity+ and 14x lower Fidelity- than base...
Explaining Object Detectors via Collective Contribution of Pixels
cs.CV 2024-12 unverdicted novelty 6.0

A Shapley-value method with interaction terms that explains object detector decisions by capturing collective pixel contributions for localization and classification.
SalUn: Empowering Machine Unlearning via Gradient-based Weight Saliency in Both Image Classification and Generation
cs.LG 2023-10 conditional novelty 6.0

SalUn uses gradient-based weight saliency to achieve effective machine unlearning of data, classes, or concepts in image classification and generation, narrowing the gap to exact retraining.
Multi-task Self-Supervised Learning for Human Activity Detection
cs.LG 2019-07 unverdicted novelty 6.0

A multi-task self-supervised approach trains a temporal CNN to detect transformations on sensory data, yielding features that match or exceed fully supervised performance in semi-supervised and transfer settings for s...
Interpretability Beyond Classification Output: Semantic Bottleneck Networks
cs.CV 2019-07 unverdicted novelty 6.0

Semantic Bottleneck Networks add interpretable semantic concept layers to deep networks, recovering SOTA segmentation performance with drastic channel reduction and enabling failure interpretation at over 99% accuracy...
Scalable Topological Data Analysis and Visualization for Evaluating Data-Driven Models in Scientific Applications
cs.LG 2019-07 unverdicted novelty 6.0

A scalable framework combining streaming graphs, topology computation, and topology-aware datacubes enables interactive analysis of high-dimensional functions in scientific ML applications.
Saliency-driven Word Alignment Interpretation for Neural Machine Translation
cs.CL 2019-06 unverdicted novelty 6.0

Saliency-driven interpretation methods reveal that NMT models learn word alignments of better quality than fast-align under force decoding and consistent with automatic tools under free decoding.
AttnGen: Attention-Guided Saliency Learning for Interpretable Genomic Sequence Classification
cs.LG 2026-05 unverdicted novelty 5.0

AttnGen embeds attention-based saliency into training via progressive masking to improve both accuracy and interpretability in classifying 200-nucleotide genomic sequences.
Biological Plausibility and Representational Alignment of Feedback Alignment in Convolutional Networks
cs.AI 2026-05 unverdicted novelty 5.0

Modified feedback alignment in convolutional networks produces representations geometrically aligned with backpropagation on CIFAR-10.

Reference graph

Works this paper leans on

13 extracted references · 13 canonical work pages · cited by 86 Pith papers

[1] D. Baehrens, T. Schroeter, S. Harmeling, M. Kawanabe, K. Hansen, and K.-R. Müller. How to explain individual classification decisions. JMLR, 11:1803–1831, 2010.

[2] A. Berg, J. Deng, and L. Fei-Fei. Large scale visual recognition challenge (ILSVRC), 2010. URL http://www.image-net.org/challenges/LSVRC/2010/

[3] Y. Boykov and M. P. Jolly. Interactive graph cuts for optimal boundary and region segmentation of objects in N-D images. In Proc. ICCV, volume 2, pages 105–112, 2001.

[4] D. C. Ciresan, U. Meier, and J. Schmidhuber. Multi-column deep neural networks for image classification. In Proc. CVPR, pages 3642–3649, 2012.

[5] D. Erhan, Y. Bengio, A. Courville, and P. Vincent. Visualizing higher-layer features of a deep network. Technical Report 1341, University of Montreal, Jun 2009.

[6] P. Felzenszwalb, D. McAllester, and D. Ramanan. A discriminatively trained, multiscale, deformable part model. In Proc. CVPR, 2008.

[7] G. E. Hinton, S. Osindero, and Y. W. Teh. A fast learning algorithm for deep belief nets. Neural Computation, 18(7):1527–1554, 2006.

[8] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, pages 1106–1114, 2012.

[9] Q. Le, M. Ranzato, R. Monga, M. Devin, K. Chen, G. Corrado, J. Dean, and A. Ng. Building high-level features using large scale unsupervised learning. In Proc. ICML, 2012.

[10] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.

[11] F. Perronnin, J. Sánchez, and T. Mensink. Improving the Fisher kernel for large-scale image classification. In Proc. ECCV, 2010.

[12] K. Simonyan, A. Vedaldi, and A. Zisserman. Deep Fisher networks and class saliency maps for object classification and localisation. In ILSVRC workshop, 2013. URL http://image-net.org/challenges/LSVRC/2013/slides/ILSVRC_az.pdf

[13] M. D. Zeiler and R. Fergus. Visualizing and understanding convolutional networks. CoRR, abs/1311.2901v3, 2013.
