Investigating gender bias in language models using causal mediation analysis
1 Pith paper cites this work. Polarity classification is still being indexed.
Fields: cs.LG (1)
Years: 2025 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
A Multi-Level Causal Intervention Framework for Mechanistic Interpretability in Variational Autoencoders
Introduces a causal intervention framework with new metrics for mechanistic interpretability of VAEs and reports empirical findings from extensive experiments on multiple models and datasets.