AttnTrace is an attention-weight-based context traceback method for LLMs that claims higher accuracy and efficiency than prior art like TracLLM while aiding prompt injection detection.
Axiomatic attribution for deep networks
3 Pith papers cite this work. Polarity classification is still indexing.
verdicts
UNVERDICTED 3representative citing papers
Semantic Bottleneck Networks add interpretable semantic concept layers to deep networks, recovering SOTA segmentation performance with drastic channel reduction and enabling failure interpretation at over 99% accuracy for most outputs.
Methods are introduced to lift static attribution techniques to dynamical models for explaining risk increases in clinical alert systems.
citing papers explorer
-
AttnTrace: Contextual Attribution of Prompt Injection and Knowledge Corruption
AttnTrace is an attention-weight-based context traceback method for LLMs that claims higher accuracy and efficiency than prior art like TracLLM while aiding prompt injection detection.
-
Interpretability Beyond Classification Output: Semantic Bottleneck Networks
Semantic Bottleneck Networks add interpretable semantic concept layers to deep networks, recovering SOTA segmentation performance with drastic channel reduction and enabling failure interpretation at over 99% accuracy for most outputs.
-
Explaining an increase in predicted risk for clinical alerts
Methods are introduced to lift static attribution techniques to dynamical models for explaining risk increases in clinical alert systems.