arXiv preprint arXiv:2510.04013

URL https://transformer-circuits · 2025 · arXiv 2510.04013

3 Pith papers cite this work.

representative citing papers

How Much Do Circuits Tell Us? Measuring the Consistency and Specificity of Language Model Circuits

cs.CL · 2026-05-08 · unverdicted · novelty 7.0

Language model circuits show high within-task consistency and necessity but substantial overlap across tasks, making them less specific than assumed.

How LLMs Detect and Correct Their Own Errors: The Role of Internal Confidence Signals

cs.LG · 2026-04-24 · unverdicted · novelty 6.0

LLMs implement a second-order confidence architecture where the PANL activation encodes both error likelihood and the ability to correct it, beyond verbal confidence or log-probabilities.

How do LLMs Compute Verbal Confidence

cs.CL · 2026-03-18 · unverdicted · novelty 6.0

Mechanistic experiments on Gemma 3 27B, Qwen 2.5 7B and Magistral Small 24B show verbal confidence is cached at post-answer positions from answer tokens and captures richer answer-quality information beyond token log-probabilities.

citing papers explorer

How Much Do Circuits Tell Us? Measuring the Consistency and Specificity of Language Model Circuits cs.CL · 2026-05-08 · unverdicted · none · ref 35
How LLMs Detect and Correct Their Own Errors: The Role of Internal Confidence Signals cs.LG · 2026-04-24 · unverdicted · none · ref 14
How do LLMs Compute Verbal Confidence cs.CL · 2026-03-18 · unverdicted · none · ref 11
