arXiv preprint arXiv:2005.00719 , year=

Probing the probing paradigm: · 2005 · arXiv 2005.00719

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

representative citing papers

Eliciting Latent Predictions from Transformers with the Tuned Lens

cs.LG · 2023-03-14 · accept · novelty 7.0

Training per-layer affine probes on frozen transformers yields more reliable latent predictions than the logit lens and enables detection of malicious inputs from prediction trajectories.

Probing for Representation Manifolds in Superposition

cs.LG · 2026-05-18 · unverdicted · novelty 5.0

Introduces the Manifold Probe to discover representation manifolds in superposition and demonstrates causal steering on time concepts in Llama 2-7b.

citing papers explorer

Showing 2 of 2 citing papers.

Eliciting Latent Predictions from Transformers with the Tuned Lens cs.LG · 2023-03-14 · accept · none · ref 75
Training per-layer affine probes on frozen transformers yields more reliable latent predictions than the logit lens and enables detection of malicious inputs from prediction trajectories.
Probing for Representation Manifolds in Superposition cs.LG · 2026-05-18 · unverdicted · none · ref 67
Introduces the Manifold Probe to discover representation manifolds in superposition and demonstrates causal steering on time concepts in Llama 2-7b.

arXiv preprint arXiv:2005.00719 , year=

fields

years

verdicts

representative citing papers

citing papers explorer