The model’s performance was 77.4%, as determined by having GPT-4o-mini mark the questions.
[Figure panel (B): Distribution of Gemma’s confidence responses across the 10 classes]

[…] were used, since we were focussed on understanding the generation of Gemma’s raw verbal confidence signals. · 2017

1 Pith paper cites this work. Polarity classification is still being indexed.

representative citing papers

cs.CL · 2026-03-18 · unverdicted · novelty 6.0

Mechanistic experiments on Gemma 3 27B, Qwen 2.5 7B and Magistral Small 24B show verbal confidence is cached at post-answer positions from answer tokens and captures richer answer-quality information beyond token log-probabilities.

citing papers explorer

How do LLMs Compute Verbal Confidence · cs.CL · 2026-03-18 · unverdicted · none · ref 24