Axiomatic Attribution for Deep Networks, June 2017

Mukund Sundararajan, Ankur Taly, Qiqi Yan · 2017 · cs.LG · arXiv 1703.01365

14 Pith papers cite this work. Polarity classification is still indexing.

abstract

We study the problem of attributing the prediction of a deep network to its input features, a problem previously studied by several other works. We identify two fundamental axioms, Sensitivity and Implementation Invariance, that attribution methods ought to satisfy. We show that they are not satisfied by most known attribution methods, which we consider to be a fundamental weakness of those methods. We use the axioms to guide the design of a new attribution method called Integrated Gradients. Our method requires no modification to the original network and is extremely simple to implement; it just needs a few calls to the standard gradient operator. We apply this method to a couple of image models, a couple of text models and a chemistry model, demonstrating its ability to debug networks, to extract rules from a network, and to enable users to engage with models better.
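
The abstract's remark that the method "just needs a few calls to the standard gradient operator" can be illustrated with a minimal sketch. Integrated Gradients attributes to feature i the quantity (x_i - x'_i) times the average gradient of the model output along the straight line from a baseline x' to the input x. The toy logistic model, its weights `W` and `B`, the all-zeros baseline, and the helper names below are hypothetical and chosen only for illustration; the integral is approximated with a midpoint Riemann sum rather than computed exactly.

```python
import numpy as np

# Hypothetical toy model (not from the paper): F(x) = sigmoid(W . x + B).
W = np.array([1.5, -2.0, 0.5])
B = 0.1

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def model(x):
    """Scalar model output for a single input vector x."""
    return sigmoid(x @ W + B)

def model_grad(x):
    """Analytic gradient of the toy model with respect to x."""
    p = model(x)
    return p * (1.0 - p) * W

def integrated_gradients(x, baseline, grad_fn, steps=200):
    """Approximate Integrated Gradients with a midpoint Riemann sum.

    The path is the straight line baseline + alpha * (x - baseline),
    sampled at `steps` midpoints of equal sub-intervals of [0, 1].
    """
    alphas = (np.arange(steps) + 0.5) / steps
    path = baseline + alphas[:, None] * (x - baseline)
    grads = np.stack([grad_fn(point) for point in path])
    # Average gradient along the path, scaled by the input difference.
    return (x - baseline) * grads.mean(axis=0)

x = np.array([1.0, -0.5, 2.0])
baseline = np.zeros_like(x)  # assumed baseline; the right choice is task dependent

attributions = integrated_gradients(x, baseline, model_grad)

# Completeness check: attributions should sum to F(x) - F(baseline),
# up to the numerical error of the Riemann approximation.
gap = abs(attributions.sum() - (model(x) - model(baseline)))
print("attributions:", attributions)
print("completeness gap:", gap)
```

With a real network, `model_grad` would be replaced by a call to the framework's automatic differentiation; the number of steps trades accuracy against the number of gradient evaluations, and the completeness gap gives a quick check on whether enough steps were used.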

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Extremal Contours: Gradient-driven contours for compact visual attribution

cs.CV · 2025-11-03 · unverdicted · novelty 7.0

A training-free method using Fourier-parameterized star-convex contours optimized via gradients to generate compact, faithful visual attributions for image classifiers on benchmarks like ImageNet.

Neurons Speak in Ranges: Breaking Free from Discrete Neuronal Attribution

cs.LG · 2025-02-04 · unverdicted · novelty 7.0

Neurons exhibit concept-conditioned activation ranges forming Gaussian-like distributions with minimal overlap, and range-based interventions via NeuronLens outperform neuron-level masking in targeted manipulation with reduced collateral effects.

Homogeneous Stellar Parameters from Heterogeneous Spectra with Deep Learning

astro-ph.GA · 2026-04-28 · unverdicted · novelty 7.0

A single end-to-end Transformer model unifies stellar labels from heterogeneous spectroscopic surveys into a self-consistent scale without post-hoc recalibration.

Quantifying Explanation Consistency: The C-Score Metric for CAM-Based Explainability in Medical Image Classification

cs.CV · 2026-04-09 · unverdicted · novelty 7.0

The C-Score quantifies intra-class explanation consistency for CAM methods via confidence-weighted pairwise soft IoU and detects AUC-consistency dissociation as an early warning for model instability on chest X-ray classification.

Rhamba: Region-Aware Hybrid Attention-Mamba Framework for Self-Supervised Learning in Resting-State fMRI

cs.LG · 2026-05-02 · unverdicted · novelty 6.0 · 2 refs

Rhamba uses region-aware masking strategies and hybrid Attention-Mamba models pretrained on ABIDE fMRI data to achieve top AUROC on schizophrenia and ADHD classification tasks while outperforming prior methods.

Compared to What? Baselines and Metrics for Counterfactual Prompting

cs.CL · 2026-05-01 · conditional · novelty 6.0

Counterfactual prompting effects on LLMs are often indistinguishable from those caused by meaning-preserving paraphrases, causing most previously reported demographic sensitivities to disappear under proper statistical comparison.

Validating the Clinical Utility of CineECG 3D Reconstructions through Cross-Modal Feature Attribution

eess.IV · 2026-04-29 · unverdicted · novelty 6.0

Cross-modal averaging maps ECG model attributions to CineECG 3D space, raising Dice overlap with expert annotations from 0.47 to 0.56 on 20 cases while filtering attribution noise.

TwinSpecNet: Extending APOGEE's chemical reach to low-S/N spectra via empirical paired learning

astro-ph.GA · 2026-04-29 · unverdicted · novelty 6.0

TwinSpecNet uses empirical paired learning on spectral twins to denoise low-S/N APOGEE spectra and predict stellar parameters and abundances with lower scatter than the standard pipeline.

What exactly did the Transformer learn from our physics data?

astro-ph.IM · 2025-05-27 · unverdicted · novelty 5.0

Transformers trained on cosmic ray simulations learn physically plausible features in positional encodings for symmetric air showers and in attention mechanisms for galaxy-origin particles.

AI-Generated Images: What Humans and Machines See When They Look at the Same Image

cs.CV · 2026-05-07 · unverdicted · novelty 5.0

Researchers train AI detectors on a large photorealistic fake image dataset, apply 16 XAI methods, and use human survey feedback to assess alignment between machine explanations and human perception of AI-generated images.

Uncertainty-Aware Transformers: Conformal Prediction for Language Models

cs.LG · 2026-04-10 · unverdicted · novelty 5.0

CONFIDE applies conformal prediction to transformer embeddings for valid prediction sets, improving accuracy up to 4.09% and efficiency over baselines on models like BERT-tiny.

ClinQueryAgent: A Conversational Agent for Population Health Management

cs.IR · 2026-04-13 · unverdicted · novelty 4.0

The paper introduces ClinQueryAgent, a conversational agent that converts natural language queries into database queries for population health management while keeping patient data secure, and reports its use by 128 staff across 15 NHS practices covering 148,319 patients.

A Giant-Step Baby-Step Classifier For Scalable and Real-Time Anomaly Detection In Industrial Control Systems and Water Treatment Systems

cs.CR · 2025-04-29 · unverdicted · novelty 4.0

A Giant-Step Baby-Step Classifier for real-time anomaly detection in ICS via linearization of sensor-actuator interactions, achieving millisecond response and 97.72% accuracy on a water treatment testbed with built-in explainability.

Architecture-Aware Explanation Auditing for Industrial Visual Inspection

cs.LG · 2026-05-14 · 2 refs

citing papers explorer

Showing 14 of 14 citing papers.

Extremal Contours: Gradient-driven contours for compact visual attribution
cs.CV · 2025-11-03 · unverdicted · none · ref 14 · internal anchor

Neurons Speak in Ranges: Breaking Free from Discrete Neuronal Attribution
cs.LG · 2025-02-04 · unverdicted · none · ref 30 · internal anchor

Homogeneous Stellar Parameters from Heterogeneous Spectra with Deep Learning
astro-ph.GA · 2026-04-28 · unverdicted · none · ref 70

Quantifying Explanation Consistency: The C-Score Metric for CAM-Based Explainability in Medical Image Classification
cs.CV · 2026-04-09 · unverdicted · none · ref 50

Rhamba: Region-Aware Hybrid Attention-Mamba Framework for Self-Supervised Learning in Resting-State fMRI
cs.LG · 2026-05-02 · unverdicted · none · ref 59 · 2 links

Compared to What? Baselines and Metrics for Counterfactual Prompting
cs.CL · 2026-05-01 · conditional · none · ref 29

Validating the Clinical Utility of CineECG 3D Reconstructions through Cross-Modal Feature Attribution
eess.IV · 2026-04-29 · unverdicted · none · ref 13

TwinSpecNet: Extending APOGEE's chemical reach to low-S/N spectra via empirical paired learning
astro-ph.GA · 2026-04-29 · unverdicted · none · ref 75

What exactly did the Transformer learn from our physics data?
astro-ph.IM · 2025-05-27 · unverdicted · none · ref 26 · internal anchor

AI-Generated Images: What Humans and Machines See When They Look at the Same Image
cs.CV · 2026-05-07 · unverdicted · none · ref 42

Uncertainty-Aware Transformers: Conformal Prediction for Language Models
cs.LG · 2026-04-10 · unverdicted · none · ref 22

ClinQueryAgent: A Conversational Agent for Population Health Management
cs.IR · 2026-04-13 · unverdicted · none · ref 72 · internal anchor

A Giant-Step Baby-Step Classifier For Scalable and Real-Time Anomaly Detection In Industrial Control Systems and Water Treatment Systems
cs.CR · 2025-04-29 · unverdicted · none · ref 38 · internal anchor

Architecture-Aware Explanation Auditing for Industrial Visual Inspection
cs.LG · 2026-05-14 · unreviewed · ref 11 · 2 links · internal anchor
