Representation biases: will we achieve complete understanding by analyzing representations? arXiv preprint arXiv:2507.22216, 2025.
3 Pith papers cite this work. Polarity classification is still being indexed.
Citation summary: years: 2026 (3); verdicts: UNVERDICTED (3); roles: background (1); polarities: background (1).
Citing papers:
- A framework for analyzing concept representations in neural models
  Introduces a framework showing that concept subspaces are not unique, that the choice of estimator affects containment and disentanglement, and that LEACE works well but generalizes poorly. In HuBERT, phone information is contained and disentangled from speaker information, whereas speaker information resists compact containment.
- The Homogenization Problem in LLMs: Towards Meaningful Diversity in AI Safety
  Proposes a value-encoding framework to characterize and counter homogenization in LLMs by formalizing it via normativity from queer theory and introducing xeno-reproduction tasks from feminist theory, illustrated with a gender-bias experiment on Claude 3.5 Haiku.
- Stimulus symmetries can confound representational similarity analyses
  Stimulus symmetries render many neural representations functionally equivalent, yet these equivalent representations can produce qualitatively different representational similarity matrices (RSMs), including RSMs that drift under SGD or regularization in image-encoding networks.