Causal Representation Learning Workshop at NeurIPS 2023

The Linear Representation Hypothesis and the Geometry of Large Language Models · 2023

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

A framework for analyzing concept representations in neural models

cs.CL · 2026-05-02 · unverdicted · novelty 7.0

A new framework shows that concept subspaces are not unique and that estimator choice affects containment and disentanglement. LEACE works well but generalizes poorly. HuBERT encodes phone information as contained and disentangled from speaker information, while speaker information resists compact containment.

Minimal, Local, Causal Explanations for Jailbreak Success in Large Language Models

cs.AI · 2026-04-30 · unverdicted · novelty 7.0

LOCA identifies an average of six minimal interpretable changes in intermediate representations that causally induce refusal on otherwise successful jailbreaks for Gemma and Llama models.

Convergent Evolution: How Different Language Models Learn Similar Number Representations

cs.CL · 2026-04-22 · unverdicted · novelty 6.0

Diverse language models converge on similar periodic number features with a two-tier hierarchy of Fourier sparsity and geometric separability, acquired via language co-occurrences or multi-token arithmetic.

citing papers explorer

A framework for analyzing concept representations in neural models · cs.CL · 2026-05-02 · unverdicted · none · ref 9
Minimal, Local, Causal Explanations for Jailbreak Success in Large Language Models · cs.AI · 2026-04-30 · unverdicted · none · ref 40
Convergent Evolution: How Different Language Models Learn Similar Number Representations · cs.CL · 2026-04-22 · unverdicted · none · ref 43
