On the Multilingual Ability of Decoder-based Pre-trained Language Models: Finding and Controlling Language-Specific Neurons

Takeshi Kojima, Itsuki Okimura, Yusuke Iwasawa, Hitomi Yanaka, Yutaka Matsuo · 2024 · DOI 10.18653/v1/2024.naacl-long.384

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

open at publisher browse 4 citing papers

representative citing papers

Copy First, Translate Later: Interpreting Translation Dynamics in Multilingual Pretraining

cs.CL · 2026-04-19 · unverdicted · novelty 7.0

Multilingual pretraining develops translation in two phases: early copying driven by surface similarities, followed by generalizing mechanisms while copying is refined.

Debiasing Reward Models via Causally Motivated Inference-Time Intervention

cs.CL · 2026-04-30 · unverdicted · novelty 6.0

Neuron-level inference-time intervention reduces multiple biases in reward models, enabling 2B and 7B models to match 70B performance on LLM alignment benchmarks without trade-offs.

Training-Induced Escape from Token Clustering in a Mean-Field Formulation of Transformers

cs.LG · 2026-05-08 · unverdicted · novelty 5.0

Training a mean-field Transformer under L2 regularization induces an escape from attention-driven token clustering in later layers after initial clustering.

Locate, Steer, and Improve: A Practical Survey of Actionable Mechanistic Interpretability in Large Language Models

cs.CL · 2026-01-20 · unverdicted · novelty 5.0

The survey organizes mechanistic interpretability techniques into a Locate-Steer-Improve framework to enable actionable improvements in LLM alignment, capability, and efficiency.

citing papers explorer

Showing 4 of 4 citing papers.

Copy First, Translate Later: Interpreting Translation Dynamics in Multilingual Pretraining cs.CL · 2026-04-19 · unverdicted · none · ref 13
Multilingual pretraining develops translation in two phases: early copying driven by surface similarities, followed by generalizing mechanisms while copying is refined.
Debiasing Reward Models via Causally Motivated Inference-Time Intervention cs.CL · 2026-04-30 · unverdicted · none · ref 15
Neuron-level inference-time intervention reduces multiple biases in reward models, enabling 2B and 7B models to match 70B performance on LLM alignment benchmarks without trade-offs.
Training-Induced Escape from Token Clustering in a Mean-Field Formulation of Transformers cs.LG · 2026-05-08 · unverdicted · none · ref 33
Training a mean-field Transformer under L2 regularization induces an escape from attention-driven token clustering in later layers after initial clustering.
Locate, Steer, and Improve: A Practical Survey of Actionable Mechanistic Interpretability in Large Language Models cs.CL · 2026-01-20 · unverdicted · none · ref 157
The survey organizes mechanistic interpretability techniques into a Locate-Steer-Improve framework to enable actionable improvements in LLM alignment, capability, and efficiency.

On the Multilingual Ability of Decoder-based Pre-trained Language Models: Finding and Controlling Language-Specific Neurons

fields

years

verdicts

representative citing papers

citing papers explorer