A unified representation underlying the judgment of large language models. arXiv preprint arXiv:2510.27328

URL https://arxiv · arXiv 2510.27328

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Probing Persona-Dependent Preferences in Language Models

cs.CL · 2026-05-13 · unverdicted · novelty 6.0

Linear probes on residual-stream activations identify a shared preference vector in LLMs that tracks choices across prompts and causally steers decisions even for anti-correlated personas.

How LLMs Detect and Correct Their Own Errors: The Role of Internal Confidence Signals

cs.LG · 2026-04-24 · unverdicted · novelty 6.0

LLMs implement a second-order confidence architecture where the PANL activation encodes both error likelihood and the ability to correct it, beyond verbal confidence or log-probabilities.

citing papers explorer

Probing Persona-Dependent Preferences in Language Models · cs.CL · 2026-05-13 · unverdicted · none · ref 22
How LLMs Detect and Correct Their Own Errors: The Role of Internal Confidence Signals · cs.LG · 2026-04-24 · unverdicted · none · ref 15
