Axiomatic Attribution for Deep Networks, June 2017

14 Pith papers cite this work. Polarity classification is still indexing.

abstract

We study the problem of attributing the prediction of a deep network to its input features, a problem previously studied by several other works. We identify two fundamental axioms, Sensitivity and Implementation Invariance, that attribution methods ought to satisfy. We show that they are not satisfied by most known attribution methods, which we consider to be a fundamental weakness of those methods. We use the axioms to guide the design of a new attribution method called Integrated Gradients. Our method requires no modification to the original network and is extremely simple to implement; it just needs a few calls to the standard gradient operator. We apply this method to a couple of image models, a couple of text models and a chemistry model, demonstrating its ability to debug networks, to extract rules from a network, and to enable users to engage with models better.
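
To make "a few calls to the standard gradient operator" concrete, here is a minimal sketch of Integrated Gradients in plain Python. It is not the authors' implementation. It assumes the usual formulation, in which the attribution for feature i is (x_i - baseline_i) times the average gradient along the straight-line path from the baseline to the input, and it approximates that average with a midpoint Riemann sum. The toy logistic model, its weights, and the names `integrated_gradients`, `model`, and `model_grad` are hypothetical and chosen only for illustration; with a real network, `grad_fn` would come from the framework's automatic differentiation.

```python
import math


def integrated_gradients(grad_fn, x, baseline, steps=50):
    """Approximate Integrated Gradients with a midpoint Riemann sum.

    grad_fn(point) must return the gradient of the model output with
    respect to each input feature at `point` (a list of floats).
    """
    n = len(x)
    avg_grad = [0.0] * n
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval of [0, 1]
        # Point on the straight line from the baseline to the input.
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        g = grad_fn(point)
        for i in range(n):
            avg_grad[i] += g[i] / steps
    # Scale the averaged gradient by the input's displacement from the baseline.
    return [(xi - b) * gi for xi, b, gi in zip(x, baseline, avg_grad)]


# Hypothetical toy "network": a single logistic unit with fixed weights.
WEIGHTS = [2.0, -1.0, 0.5]


def model(x):
    z = sum(w * xi for w, xi in zip(WEIGHTS, x))
    return 1.0 / (1.0 + math.exp(-z))


def model_grad(x):
    # Analytic gradient of the logistic unit: sigma'(z) * w_i.
    p = model(x)
    return [p * (1.0 - p) * w for w in WEIGHTS]


x = [1.0, 2.0, -1.0]
baseline = [0.0, 0.0, 0.0]  # all-zeros baseline, an assumption of this sketch
attributions = integrated_gradients(model_grad, x, baseline, steps=200)
print(attributions)
print(sum(attributions), model(x) - model(baseline))
```

The second `print` is a sanity check: under this formulation the attributions should sum, up to discretization error, to the difference between the model's output at the input and at the baseline. Increasing `steps` tightens that agreement at the cost of more gradient evaluations.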

hub tools

citation-role summary: background 1

citation-polarity summary: background 1

years: 2026 (10), 2025 (4)

representative citing papers

Neurons Speak in Ranges: Breaking Free from Discrete Neuronal Attribution

cs.LG · 2025-02-04 · unverdicted · novelty 7.0

Neurons exhibit concept-conditioned activation ranges forming Gaussian-like distributions with minimal overlap, and range-based interventions via NeuronLens outperform neuron-level masking in targeted manipulation with reduced collateral effects.

Compared to What? Baselines and Metrics for Counterfactual Prompting

cs.CL · 2026-05-01 · conditional · novelty 6.0

Counterfactual prompting effects on LLMs are often indistinguishable from those caused by meaning-preserving paraphrases, causing most previously reported demographic sensitivities to disappear under proper statistical comparison.

What exactly did the Transformer learn from our physics data?

astro-ph.IM · 2025-05-27 · unverdicted · novelty 5.0

Transformers trained on cosmic ray simulations learn physically plausible features in positional encodings for symmetric air showers and in attention mechanisms for galaxy-origin particles.

ClinQueryAgent: A Conversational Agent for Population Health Management

cs.IR · 2026-04-13 · unverdicted · novelty 4.0

The paper introduces ClinQueryAgent, a conversational agent that converts natural language queries into database queries for population health management while keeping patient data secure, and reports its use by 128 staff across 15 NHS practices covering 148,319 patients.
