pith. sign in

hub

arXiv preprint arXiv:2403.00745 , year=

17 Pith papers cite this work. Polarity classification is still indexing.

17 Pith papers citing it

hub tools

citation-role summary

background 2 method 2

citation-polarity summary

years

2026 16 2024 1

clear filters

representative citing papers

WriteSAE: Sparse Autoencoders for Recurrent State

cs.LG · 2026-05-12 · unverdicted · novelty 8.0 · 2 refs

WriteSAE introduces sparse autoencoders with rank-1 matrix atoms for recurrent state updates, allowing replacement tests that outperform deletion on 92.4% of positions and a formula predicting logit changes with R²=0.98.

Subliminal Learning is a LoRA Artifact

cs.AI · 2026-05-30 · conditional · novelty 7.0

Subliminal learning is a LoRA artifact that disappears with full finetuning, depends on context tokens like system prompts, and localizes to overlapping finetuning-evaluation tokens.

How LLMs Are Persuaded: A Few Attention Heads, Rerouted

cs.AI · 2026-05-10 · unverdicted · novelty 7.0

Persuasion in LLMs works by redirecting a small set of attention heads to copy the target option token instead of reasoning over evidence, via a rank-one routing feature that can be directly edited or removed.

Cell-Based Representation of Relational Binding in Language Models

cs.CL · 2026-04-21 · unverdicted · novelty 7.0

Large language models encode relational bindings via a cell-based representation: a low-dimensional linear subspace in which each cell corresponds to an entity-relation index pair and attributes are retrieved from the matching cell.

Localizing Anchoring Pathways in Language Models

cs.CL · 2026-06-11 · unverdicted · novelty 6.0

Attribution methods localize anchoring signals in Qwen and Llama models; edge-level circuits transfer within a model but show sparse transfer from base to instruction-tuned variants.

Weight Patching: Toward Source-Level Mechanistic Localization in LLMs

cs.AI · 2026-04-15 · unverdicted · novelty 6.0

Weight Patching localizes capabilities to specific parameter modules in LLMs by replacing weights from a behavior-specialized model into a base model and validating recovery via a vector-anchor interface, revealing a hierarchy of source, routing, and execution components.

citing papers explorer

Showing 5 of 5 citing papers after filters.

  • Subliminal Learning is a LoRA Artifact cs.AI · 2026-05-30 · conditional · none · ref 27

    Subliminal learning is a LoRA artifact that disappears with full finetuning, depends on context tokens like system prompts, and localizes to overlapping finetuning-evaluation tokens.

  • How LLMs Are Persuaded: A Few Attention Heads, Rerouted cs.AI · 2026-05-10 · unverdicted · none · ref 32

    Persuasion in LLMs works by redirecting a small set of attention heads to copy the target option token instead of reasoning over evidence, via a rank-one routing feature that can be directly edited or removed.

  • Wiring the 'Why': A Unified Taxonomy and Survey of Abductive Reasoning in LLMs cs.AI · 2026-04-09 · accept · none · ref 52

    The paper delivers the first survey of abductive reasoning in LLMs, a unified two-stage taxonomy, a compact benchmark, and an analysis of gaps relative to deductive and inductive reasoning.

  • Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet cs.AI · 2026-05-28 · unverdicted · none · ref 44

    Sparse autoencoders scaled to 34 million features on Claude 3 Sonnet yield interpretable, steerable representations of concrete and abstract concepts that generalize across languages and modalities.

  • Weight Patching: Toward Source-Level Mechanistic Localization in LLMs cs.AI · 2026-04-15 · unverdicted · none · ref 24

    Weight Patching localizes capabilities to specific parameter modules in LLMs by replacing weights from a behavior-specialized model into a base model and validating recovery via a vector-anchor interface, revealing a hierarchy of source, routing, and execution components.