SemRF supplies fixed semantic anchors and pseudo-inverse tying to produce stable coordinates for residual dynamics, Voronoi traces, and minimum-action canonical paths that link to parameter efficiency under controlled interface error.
Residual Stream Analysis with Multi-Layer SAE s
4 Pith papers cite this work. Polarity classification is still indexing.
years
2026 4verdicts
UNVERDICTED 4representative citing papers
PRISM is a new activation-conditioned model that recovers full sets of simultaneous instructions from LLM hidden states via judge-guided GRPO training and outperforms prior activation-to-language methods on security-relevant tasks.
Auto-interpretation labels for SAE features generalize poorly across languages and scripts, missing the same semantic content up to 4x more often in Serbian than English and more in Cyrillic than Latin despite deterministic transliteration.
Tuning the depth-width ratio positions models in an efficient neural interaction interval that correlates with better generalization under fixed budgets and remains stable with scale.
citing papers explorer
-
PRISM: Recovering Instruction Sets from Language Model Activations
PRISM is a new activation-conditioned model that recovers full sets of simultaneous instructions from LLM hidden states via judge-guided GRPO training and outperforms prior activation-to-language methods on security-relevant tasks.