Estimating Knowledge in Large Language Models Without Generating a Single Token

Daniela Gottesman, Mor Geva · 2024 · DOI 10.18653/v1/2024.emnlp-main.232

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Instructions Shape Production of Language, not Processing

cs.CL · 2026-05-11 · unverdicted · novelty 6.0 · 2 refs

Instructions trigger a production-centered mechanism in language models, with task-specific information stable in input tokens but varying strongly in output tokens and correlating with behavior.

Interpretability Can Be Actionable

cs.LG · 2026-05-11 · conditional · novelty 6.0

Interpretability research should be judged by actionability—the degree to which its insights support concrete decisions and interventions—rather than explanatory power alone.

citing papers explorer

Showing 2 of 2 citing papers.

Instructions Shape Production of Language, not Processing

cs.CL · 2026-05-11 · unverdicted · none · ref 20 · 2 links

Interpretability Can Be Actionable

cs.LG · 2026-05-11 · conditional · none · ref 164
