Advances in Neural Information Processing Systems

Kevin Meng, David Bau, Alex Andonian, Yonatan Belinkov

4 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Artificial Aphasias in Lesioned Language Models

cs.CL · 2026-05-15 · unverdicted · novelty 7.0

Lesioning parameters in large language models produces aphasia-like symptoms whose distributions vary by attention versus feed-forward components and by layer depth, but differ qualitatively from human clinical profiles.

Ensemble Monitoring for AI Control: Diverse Signals Outweigh More Compute

cs.AI · 2026-05-14 · unverdicted · novelty 6.0

Diverse ensembles of prompted and fine-tuned GPT-4.1-Mini monitors achieve 2.4x better detection of flawed code solutions than homogeneous ensembles on adversarial inputs.

Single-Position Intervention Fails: Distributed Output Templates Drive In-Context Learning

cs.LG · 2026-04-10 · unverdicted · novelty 6.0

ICL task identity is encoded as distributed output format templates across demonstration tokens rather than localized at single positions.

TIDE: Every Layer Knows the Token Beneath the Context

cs.CL · 2026-05-07 · unverdicted · novelty 5.0

TIDE augments standard transformers with per-layer token embedding injection via an ensemble of memory blocks and a depth-conditioned router to mitigate rare-token undertraining and contextual collapse.

citing papers explorer

Showing 4 of 4 citing papers.

Artificial Aphasias in Lesioned Language Models · cs.CL · 2026-05-15 · unverdicted · none · ref 27
Ensemble Monitoring for AI Control: Diverse Signals Outweigh More Compute · cs.AI · 2026-05-14 · unverdicted · none · ref 14
Single-Position Intervention Fails: Distributed Output Templates Drive In-Context Learning · cs.LG · 2026-04-10 · unverdicted · none · ref 11
TIDE: Every Layer Knows the Token Beneath the Context · cs.CL · 2026-05-07 · unverdicted · none · ref 23
