
Attention retrieves, MLP memorizes: Disentangling trainable components in the transformer

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

years

2026: 2 · 2025: 2

verdicts

UNVERDICTED: 4

representative citing papers

Geometry-Calibrated Conformal Abstention for Language Models

cs.CL · 2026-04-30 · unverdicted · novelty 6.0

Geometry-calibrated conformal abstention lets language models abstain from uncertain queries with finite-sample guarantees on both participation rate and conditional correctness of answers.

Provable Knowledge Acquisition and Extraction in One-Layer Transformers

cs.LG · 2025-07-28 · unverdicted · novelty 6.0

In a stylized one-layer transformer, pre-training encodes factual knowledge via relation-specific feature directions and attention patterns; fine-tuning extracts it through a relation-covering mechanism that succeeds when enough latent templates are triggered, with a failure regime explaining inauds
