On relation-specific neurons in large language models

Yihong Liu, Runsheng Chen, Lea Hirlimann, Ahmad Dawar Hakimi, Mingyang Wang, Amir Hossein Kargaran · 2025 · DOI 10.18653/v1/2025.emnlp-main.52

4 Pith papers cite this work. Polarity classification is still being indexed.

representative citing papers

LMs as Task-Specific Knowledge Bases: An Interpretability Analysis

cs.CL · 2026-06-25 · unverdicted · novelty 6.0

LMs store facts in task-specific parameter subsets, shown by inconsistent emergence across tasks during training and distinct localized parameters for the same fact.

Expert-Aware Causal Tracing of Factual Recall in Sparse MoE Language Models

cs.CL · 2026-06-02 · unverdicted · novelty 6.0

Expert-aware causal tracing localizes factual recall to specific experts in some MoE models but requires coalitions in others, using CounterFact interventions on subject embeddings.

Tracing Relational Knowledge Recall in Large Language Models

cs.CL · 2026-04-21 · unverdicted · novelty 5.0

Per-head attention contributions to the residual stream serve as strong linear features for classifying relational knowledge in LLMs, with probe accuracy correlating with relation specificity and signal distribution.

Locate, Steer, and Improve: A Practical Survey of Actionable Mechanistic Interpretability in Large Language Models

cs.CL · 2026-01-20 · unverdicted · novelty 5.0

The survey organizes mechanistic interpretability techniques into a Locate-Steer-Improve framework to enable actionable improvements in LLM alignment, capability, and efficiency.

citing papers explorer

LMs as Task-Specific Knowledge Bases: An Interpretability Analysis · cs.CL · 2026-06-25 · unverdicted · none · ref 68
Expert-Aware Causal Tracing of Factual Recall in Sparse MoE Language Models · cs.CL · 2026-06-02 · unverdicted · none · ref 20
Tracing Relational Knowledge Recall in Large Language Models · cs.CL · 2026-04-21 · unverdicted · none · ref 14
Locate, Steer, and Improve: A Practical Survey of Actionable Mechanistic Interpretability in Large Language Models · cs.CL · 2026-01-20 · unverdicted · none · ref 194
