Linear probes for Othello board states factor into tensor-product structure with square and color embeddings composed by a binding matrix, from which the linear probes can be directly recovered.
Temporal sparse autoencoders: Leveraging the sequential nature of language for interpretability
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
citation-role summary
background 2
citation-polarity summary
fields
cs.LG 2years
2026 2verdicts
UNVERDICTED 2roles
background 2polarities
background 2representative citing papers
RET learns temporally consistent macrovariables from LLM activations via self-supervised learning to support interpretability, early behavioral prediction, and causal intervention.
citing papers explorer
-
Tensor Product Representation Probes Reveal Shared Structure Across Linear Directions
Linear probes for Othello board states factor into tensor-product structure with square and color embeddings composed by a binding matrix, from which the linear probes can be directly recovered.
-
Towards Effective Theory of LLMs: A Representation Learning Approach
RET learns temporally consistent macrovariables from LLM activations via self-supervised learning to support interpretability, early behavioral prediction, and causal intervention.