A latent mediation framework with sparse autoencoders enables non-additive token-level influence attribution in LLMs by learning orthogonal features and back-propagating attributions.
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
Phoneme embeddings in self-supervised ASR models show both random variance and systematic bias as sources of demographic unfairness, with variance hindering fairness more than bias.
citing papers explorer
- Correcting Influence: Unboxing LLM Outputs with Orthogonal Latent Spaces
  A latent mediation framework with sparse autoencoders enables non-additive token-level influence attribution in LLMs by learning orthogonal features and back-propagating attributions.
- Identifying and typifying demographic unfairness in phoneme-level embeddings of self-supervised speech recognition models
  Phoneme embeddings in self-supervised ASR models show both random variance and systematic bias as sources of demographic unfairness, with variance hindering fairness more than bias.