Inference-time intervention: Eliciting truthful answers from a language model. NeurIPS

Li, K · 2024

1 Pith paper cites this work. Polarity classification is still indexing.

1 Pith paper citing it

citation-role summary

background 1

cs.LG · 2026-04-03 · accept · novelty 8.0

Function vectors steer LLMs successfully where the logit lens fails to decode the target answer, showing the two properties come apart.

Showing 1 of 1 citing paper.

Steerable but Not Decodable: Function Vectors Operate Beyond the Logit Lens cs.LG · 2026-04-03 · accept · none · ref 13