Captum: A unified and generic model interpretability library for PyTorch
14 Pith papers cite this work.
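Captum collects post-hoc attribution methods for PyTorch models. Integrated Gradients, exposed as `captum.attr.IntegratedGradients`, is one of them. The plain-Python sketch below illustrates what that method computes for a function whose gradient is written by hand. It is not Captum's implementation, and the function name `integrated_gradients` is hypothetical. It uses a simple midpoint Riemann sum, whereas Captum's `attribute` method exposes the number of steps and the approximation rule as parameters.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def integrated_gradients(
    grad_f: Callable[[Vector], Vector],
    x: Sequence[float],
    baseline: Sequence[float],
    steps: int = 50,
) -> Vector:
    """Approximate Integrated Gradients with a midpoint Riemann sum.

    attr_i = (x_i - b_i) * integral_0^1 dF/dx_i(b + a * (x - b)) da
    """
    if steps <= 0:
        raise ValueError("steps must be positive")
    diff = [xi - bi for xi, bi in zip(x, baseline)]
    grad_sums = [0.0] * len(diff)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval
        point = [bi + alpha * di for bi, di in zip(baseline, diff)]
        for i, g in enumerate(grad_f(point)):
            grad_sums[i] += g
    # Average gradient along the straight-line path, scaled by (x - baseline).
    return [di * s / steps for di, s in zip(diff, grad_sums)]


if __name__ == "__main__":
    # Toy model F(x) = x0^2 + 3*x1 with a hand-written gradient.
    f = lambda v: v[0] ** 2 + 3 * v[1]
    grad = lambda v: [2 * v[0], 3.0]
    attr = integrated_gradients(grad, [2.0, 1.0], [0.0, 0.0])
    print(attr, sum(attr), f([2.0, 1.0]) - f([0.0, 0.0]))
```

For this toy model, the attributions should sum to F(x) minus F(baseline). Integrated Gradients is designed to satisfy this completeness property. Captum's `return_convergence_delta=True` option reports how far an approximation deviates from it.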
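The FASS benchmark summary on this page describes checking whether post-hoc attributions stay stable under geometric perturbations, counting only inputs whose prediction is unchanged. The plain-Python sketch below shows one such check. The horizontal flip, the cosine-similarity score, and the names `hflip`, `cosine` and `attribution_stability` are illustrative assumptions, not FASS's actual perturbation set or metrics.

```python
import math
from typing import Callable, List, Sequence, Tuple

Image = List[List[float]]  # row-major 2-D array of values


def hflip(img: Image) -> Image:
    """Horizontal flip, used here as the (assumed) geometric perturbation."""
    return [list(reversed(row)) for row in img]


def cosine(a: Image, b: Image) -> float:
    """Cosine similarity of two flattened attribution maps (0.0 if either is all zeros)."""
    fa = [v for row in a for v in row]
    fb = [v for row in b for v in row]
    dot = sum(x * y for x, y in zip(fa, fb))
    na = math.sqrt(sum(x * x for x in fa))
    nb = math.sqrt(sum(y * y for y in fb))
    return dot / (na * nb) if na and nb else 0.0


def attribution_stability(
    inputs: Sequence[Image],
    predict: Callable[[Image], object],
    attribute: Callable[[Image], Image],
) -> Tuple[float, int]:
    """Mean similarity between each input's attribution and the re-aligned
    attribution of its flipped copy, over inputs whose prediction survives
    the flip. Returns (mean score, number of inputs kept)."""
    scores: List[float] = []
    for img in inputs:
        flipped = hflip(img)
        if predict(img) != predict(flipped):
            continue  # prediction changed under the perturbation: excluded
        original = attribute(img)
        realigned = hflip(attribute(flipped))  # undo the flip so pixels line up
        scores.append(cosine(original, realigned))
    mean = sum(scores) / len(scores) if scores else float("nan")
    return mean, len(scores)
```

The prediction filter matters here: an attribution that changes because the model's decision changed is not evidence of instability. Only the inputs that pass the filter contribute to the score.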
Citing papers
- MetaBackdoor: Exploiting Positional Encoding as a Backdoor Attack Surface in LLMs
MetaBackdoor shows that LLMs can be backdoored using positional triggers like sequence length, enabling stealthy activation on clean inputs to leak system prompts or trigger malicious behavior.
- Modeling Subjective Urban Perception with Human Gaze
Gaze data from eye-tracking carries predictive signals for subjective urban perception and improves accuracy when fused with image-based scene representations.
- Feature Attribution Stability Suite: How Stable Are Post-Hoc Attributions?
FASS benchmark shows post-hoc attributions remain unstable under geometric perturbations even after filtering for unchanged predictions, with Grad-CAM exhibiting the highest stability across ImageNet, COCO, and CIFAR-10.
- MobileMold: A Smartphone-Based Microscopy Dataset for Food Mold Detection
MobileMold provides 4941 smartphone microscopy images and shows deep learning models reach 99.5% accuracy on mold detection and food classification tasks.
- AIMing for Standardised Explainability Evaluation in GNNs: A Framework and Case Study on Graph Kernel Networks
AIM is a new evaluation framework for explainability in GNNs that combines accuracy, instance-level, and model-level measures, applied to graph kernel networks to create an improved model xGKN.
- Instructions Shape Production of Language, not Processing
Instructions trigger a production-centered mechanism in language models, with task-specific information stable in input tokens but varying strongly in output tokens and correlating with behavior.
- Enabling Performant and Flexible Model-Internal Observability for LLM Inference
DMI-Lib delivers 0.4-6.8% overhead for offline batch LLM inference and ~6% for moderate online serving while exposing rich internal signals across backends, cutting latency overhead 2-15x versus prior observability baselines.
- Scaling Vision Models Does Not Consistently Improve Localisation-Based Explanation Quality
Scaling vision models by depth and parameter count does not consistently improve localisation-based explanation quality across architectures, datasets, and post-hoc methods; smaller models often perform comparably or better.
- Local Intrinsic Dimension Unveils Hallucinations in Diffusion Models
Hallucinations in diffusion models are driven by instabilities in the local intrinsic dimension of the data manifold, which Intrinsic Quenching corrects by deflating that dimension.
- X-SYS: A Reference Architecture for Interactive Explanation Systems
X-SYS is a reference architecture for interactive explanation systems organized around STAR quality attributes and five service components, demonstrated via SemanticLens for vision-language models.
- Delta-XAI: A Unified Framework for Explaining Prediction Changes in Online Time Series Monitoring
Delta-XAI wraps existing XAI methods for online time series and introduces SWING to explain prediction changes while accounting for temporal dependencies.
- ExECG: An Explainable AI Framework for ECG models
ExECG is a Python framework providing Wrapper, Explainer, and Visualizer stages to unify XAI methods for ECG models and improve reproducibility.
- Predicting the thermodynamics in the chromosphere from the translation of SDO data into the IRIS² inversion results using a visual transformer model
A visual transformer model trained on IRIS inversions predicts chromospheric temperature and density from SDO data with correlations around 0.8 on 80% of test cases.
- Many-Shot CoT-ICL: Making In-Context Learning Truly Learn