Transformer feed-forward layers build predictions by promoting concepts in the vocabulary space

Mor Geva, Avi Caciularu, Kevin Wang, Yoav Goldberg · 2022

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

browse 4 citing papers

citation-role summary

method 1

citation-polarity summary

use method 1

representative citing papers

Learning Through Noise: Why Subliminal Learning Works and When It Fails

cs.LG · 2026-05-22 · unverdicted · novelty 7.0

Subliminal learning occurs via compatible auxiliary and class output heads on task-unrelated inputs, even with random hidden layers or architecture changes, with theory and upper bounds on failure.

Event-Grounded Sparse Autoencoders for Vision-Language-Action Policies

cs.RO · 2026-05-17 · conditional · novelty 7.0

Event-grounded SAE analysis in VLA policies produces stronger causal effects on robot behavior than standard methods by anchoring features to clustered end-effector keyframes across simulations and real-robot tests.

The Geometry of Forgetting: Temporal Knowledge Drift as an Independent Axis in LLM Representations

cs.AI · 2026-05-09 · unverdicted · novelty 6.0

Temporal knowledge drift is encoded as a geometrically orthogonal direction in LLM residual streams, independent of correctness and uncertainty.

Locate, Steer, and Improve: A Practical Survey of Actionable Mechanistic Interpretability in Large Language Models

cs.CL · 2026-01-20 · unverdicted · novelty 5.0

The survey organizes mechanistic interpretability techniques into a Locate-Steer-Improve framework to enable actionable improvements in LLM alignment, capability, and efficiency.

citing papers explorer

Showing 4 of 4 citing papers.

Learning Through Noise: Why Subliminal Learning Works and When It Fails cs.LG · 2026-05-22 · unverdicted · none · ref 22
Subliminal learning occurs via compatible auxiliary and class output heads on task-unrelated inputs, even with random hidden layers or architecture changes, with theory and upper bounds on failure.
Event-Grounded Sparse Autoencoders for Vision-Language-Action Policies cs.RO · 2026-05-17 · conditional · none · ref 8
Event-grounded SAE analysis in VLA policies produces stronger causal effects on robot behavior than standard methods by anchoring features to clustered end-effector keyframes across simulations and real-robot tests.
The Geometry of Forgetting: Temporal Knowledge Drift as an Independent Axis in LLM Representations cs.AI · 2026-05-09 · unverdicted · none · ref 15
Temporal knowledge drift is encoded as a geometrically orthogonal direction in LLM residual streams, independent of correctness and uncertainty.
Locate, Steer, and Improve: A Practical Survey of Actionable Mechanistic Interpretability in Large Language Models cs.CL · 2026-01-20 · unverdicted · none · ref 84
The survey organizes mechanistic interpretability techniques into a Locate-Steer-Improve framework to enable actionable improvements in LLM alignment, capability, and efficiency.

Transformer feed-forward layers build predictions by promoting concepts in the vocabulary space

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer