Title resolution pending

Mor Geva, Jasmijn Bastings, Katja Filippova, Amir Globerson · 2023 · DOI 10.18653/v1/2023.emnlp-main.751

7 Pith papers cite this work. Polarity classification for these citations is still being indexed.


Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

How Language Models Process Negation

cs.CL · 2026-05-04 · unverdicted · novelty 7.0

LLMs implement both attention-based suppression and constructive representations for negation, with construction dominant, despite poor accuracy from late-layer attention shortcuts.

Mechanistically Interpretable Neural Encoding Reveals Fine-Grained Functional Selectivity in Human Visual Cortex

cs.CV · 2026-05-15 · unverdicted · novelty 6.0

MINE uses mechanistic interpretability on language-aligned image representations to generate per-voxel feature descriptions, validated via image generation and counterfactual edits that causally shift brain activation.

Loop, Think, & Generalize: Implicit Reasoning in Recurrent-Depth Transformers

cs.CL · 2026-04-09 · unverdicted · novelty 6.0

Recurrent-depth transformers achieve systematic generalization and depth extrapolation on implicit reasoning tasks through iterative layer reuse, a three-stage grokking process, and inference-time scaling, while vanilla transformers fail.

Tracing Relational Knowledge Recall in Large Language Models

cs.CL · 2026-04-21 · unverdicted · novelty 5.0

Per-head attention contributions to the residual stream serve as strong linear features for classifying relational knowledge in LLMs, with probe accuracy correlating to relation specificity and signal distribution.

Analyzing the Effect of Noise in LLM Fine-tuning

cs.LG · 2026-04-14 · unverdicted · novelty 5.0

Label noise hurts fine-tuning performance most while grammatical and typographical noise sometimes act as mild regularizers, with changes concentrated in task-specific layers.

Locate, Steer, and Improve: A Practical Survey of Actionable Mechanistic Interpretability in Large Language Models

cs.CL · 2026-01-20 · unverdicted · novelty 5.0

The survey organizes mechanistic interpretability techniques into a Locate-Steer-Improve framework to enable actionable improvements in LLM alignment, capability, and efficiency.

Language-Switching Triggers Take a Latent Detour Through Language Models

cs.CL · 2026-05-18

citing papers explorer

Showing 7 of 7 citing papers.

How Language Models Process Negation
cs.CL · 2026-05-04 · unverdicted · none · ref 58
LLMs implement both attention-based suppression and constructive representations for negation, with construction dominant, despite poor accuracy from late-layer attention shortcuts.

Mechanistically Interpretable Neural Encoding Reveals Fine-Grained Functional Selectivity in Human Visual Cortex
cs.CV · 2026-05-15 · unverdicted · none · ref 39
MINE uses mechanistic interpretability on language-aligned image representations to generate per-voxel feature descriptions, validated via image generation and counterfactual edits that causally shift brain activation.

Loop, Think, & Generalize: Implicit Reasoning in Recurrent-Depth Transformers
cs.CL · 2026-04-09 · unverdicted · none · ref 1
Recurrent-depth transformers achieve systematic generalization and depth extrapolation on implicit reasoning tasks through iterative layer reuse, a three-stage grokking process, and inference-time scaling, while vanilla transformers fail.

Tracing Relational Knowledge Recall in Large Language Models
cs.CL · 2026-04-21 · unverdicted · none · ref 7
Per-head attention contributions to the residual stream serve as strong linear features for classifying relational knowledge in LLMs, with probe accuracy correlating to relation specificity and signal distribution.

Analyzing the Effect of Noise in LLM Fine-tuning
cs.LG · 2026-04-14 · unverdicted · none · ref 6
Label noise hurts fine-tuning performance most while grammatical and typographical noise sometimes act as mild regularizers, with changes concentrated in task-specific layers.

Locate, Steer, and Improve: A Practical Survey of Actionable Mechanistic Interpretability in Large Language Models
cs.CL · 2026-01-20 · unverdicted · none · ref 85
The survey organizes mechanistic interpretability techniques into a Locate-Steer-Improve framework to enable actionable improvements in LLM alignment, capability, and efficiency.

Language-Switching Triggers Take a Latent Detour Through Language Models
cs.CL · 2026-05-18 · unreviewed · ref 16
