pith. sign in

arXiv preprint arXiv:2403.00745 , year=

8 Pith papers cite this work. Polarity classification is still indexing.

8 Pith papers citing it

citation-role summary

background 2 method 2

citation-polarity summary

years

2026 7 2024 1

representative citing papers

WriteSAE: Sparse Autoencoders for Recurrent State

cs.LG · 2026-05-12 · unverdicted · novelty 8.0 · 2 refs

WriteSAE introduces sparse autoencoders with rank-1 matrix atoms for recurrent state updates, allowing replacement tests that outperform deletion on 92.4% of positions and a formula predicting logit changes with R²=0.98.

How LLMs Are Persuaded: A Few Attention Heads, Rerouted

cs.AI · 2026-05-10 · unverdicted · novelty 7.0

Persuasion in LLMs works by redirecting a small set of attention heads to copy the target option token instead of reasoning over evidence, via a rank-one routing feature that can be directly edited or removed.

Cell-Based Representation of Relational Binding in Language Models

cs.CL · 2026-04-21 · unverdicted · novelty 7.0

Large language models encode relational bindings via a cell-based representation: a low-dimensional linear subspace in which each cell corresponds to an entity-relation index pair and attributes are retrieved from the matching cell.

Weight Patching: Toward Source-Level Mechanistic Localization in LLMs

cs.AI · 2026-04-15 · unverdicted · novelty 6.0

Weight Patching localizes capabilities to specific parameter modules in LLMs by replacing weights from a behavior-specialized model into a base model and validating recovery via a vector-anchor interface, revealing a hierarchy of source, routing, and execution components.

citing papers explorer

Showing 8 of 8 citing papers.