Outcome-fair credit models often exhibit hidden procedural bias through inconsistent reasoning across groups, which the CEC framework mitigates by enforcing consistent feature attributions via counterfactuals.
On the Robustness of Interpretability Methods
Mixed citation behavior. Most common role is background (60%).
abstract
We argue that robustness of explanations (i.e., that similar inputs should give rise to similar explanations) is a key desideratum for interpretability. We introduce metrics to quantify robustness and demonstrate that current methods do not perform well according to these metrics. Finally, we propose ways that robustness can be enforced on existing interpretability approaches.
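The abstract does not spell out its robustness metrics here, so the following is only an illustrative sketch of how "similar inputs should give rise to similar explanations" might be quantified. It assumes a sampled, Lipschitz-style ratio of explanation change to input change within a small box around the input. The names `gradient_times_input` and `local_explanation_sensitivity`, the toy linear explainer, and the parameters `radius` and `n_samples` are all hypothetical and are not taken from the paper.

```python
import numpy as np


def gradient_times_input(weights, x):
    # Hypothetical explainer for a linear model f(x) = w . x:
    # the attribution of feature i is w_i * x_i.
    return weights * x


def local_explanation_sensitivity(explain, x, radius=0.1, n_samples=200, seed=0):
    """Sampled estimate of max ||e(x) - e(x')||_2 / ||x - x'||_2.

    Perturbed points x' are drawn uniformly from the box
    [x - radius, x + radius]. Random sampling only lower-bounds the
    true maximum over that neighborhood.
    """
    rng = np.random.default_rng(seed)
    base = explain(x)
    worst = 0.0
    for _ in range(n_samples):
        delta = rng.uniform(-radius, radius, size=x.shape)
        dist = np.linalg.norm(delta)
        if dist == 0.0:
            continue  # skip the degenerate zero perturbation
        ratio = np.linalg.norm(explain(x + delta) - base) / dist
        worst = max(worst, ratio)
    return worst


if __name__ == "__main__":
    w = np.array([1.0, 2.0, 3.0])
    x = np.array([0.5, -1.0, 2.0])
    score = local_explanation_sensitivity(lambda z: gradient_times_input(w, z), x)
    print(f"estimated local sensitivity: {score:.3f}")
```

For this toy linear explainer the ratio can never exceed the largest absolute weight, which makes the sketch easy to sanity-check. A larger value for a real explainer would indicate that nearby inputs receive noticeably different attributions.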
representative citing papers
In-context symbolic regression methods improve robustness of symbolic formula recovery from KANs, cutting median OFAT test MSE by up to 99.8 percent across hyperparameter sweeps.
A training-free method using Fourier-parameterized star-convex contours optimized via gradients to generate compact, faithful visual attributions for image classifiers on benchmarks like ImageNet.
FASS benchmark shows post-hoc attributions remain unstable under geometric perturbations even after filtering for unchanged predictions, with Grad-CAM exhibiting the highest stability across ImageNet, COCO, and CIFAR-10.
The paper proposes an architecture-aware explanation audit protocol demonstrating that perturbation-based faithfulness is bounded by structural compatibility between explainer and model readout rather than architecture family.
Introduces a unified evaluation framework for XAI using five principled metrics and the PGCA method that fuses grid perturbation with Grad-CAM++, reporting top scores in fidelity, interpretability, and fairness on ResNet-50 models across five image domains.
Develops a method to find minimal input perturbations that flip GBDT predictions by extending random-forest counterfactuals to account for sequential tree dependencies and negative-gradient training.
Interpretability research should be judged by actionability—the degree to which its insights support concrete decisions and interventions—rather than explanatory power alone.
Static malware classifiers learn packing artifacts and dataset composition biases rather than malicious semantics, as diagnosed by TRUSTEE interpretability across controlled dataset variations.
RG-inspired lattice models for piecewise GLMs provide explicit interpretable partitions and a replica-analysis-derived scaling law for regularization that allows increasing complexity without expected rise in generalization loss.
The paper proposes GESD, a procedural fairness metric for group disparities in explanation stability and robustness, and integrates it into the FEU multi-objective optimization framework.
MIRAI is a unified index that combines five responsibility dimensions into one score for tabular models, demonstrating that predictive performance does not ensure high overall integrity.
ProtoSiTex introduces dual-phase prototype learning with hierarchical consistency loss for semi-interpretable multi-label text classification on a new subsentence-annotated hotel review dataset.
NEURON integrates SNOMED CT, ML, and RAG LLM to raise AUC from 0.74-0.77 to 0.84-0.88 and human-aligned explainability scores from 0.50 to 0.85 on MIMIC-IV acute heart failure data.
HETA is a new attribution framework for decoder-only LLMs that combines semantic transition vectors, Hessian-based sensitivity scores, and KL divergence to produce more faithful and human-aligned token attributions than prior methods.
Representation decorrelation regularization in MoE models improves explanation faithfulness on multimodal benchmarks while preserving task performance.
citing papers explorer
- Explaining Predictions from Tree-based Boosting Ensembles
Develops a method to find minimal input perturbations that flip GBDT predictions by extending random-forest counterfactuals to account for sequential tree dependencies and negative-gradient training.