Do Androids Know They’re Only Dreaming of Electric Sheep?

Sky CH-Wang, Benjamin Van Durme, Jason Eisner, Chris Kedzie · 2024 · Findings of the Association for Computational Linguistics ACL 2024 · DOI 10.18653/v1/2024.findings-acl.260

3 Pith papers cite this work, alongside 3 external citations (Crossref).

representative citing papers

Manifold Steering Reveals the Shared Geometry of Neural Network Representation and Behavior

cs.LG · 2026-05-06 · unverdicted · novelty 7.0

Manifold steering along activation geometry induces behavioral trajectories that match the natural manifold of outputs, while linear steering produces unnatural, off-manifold behaviors.

Process Supervision of Confidence Margin for Calibrated LLM Reasoning

cs.LG · 2026-04-25 · unverdicted · novelty 6.0

RLCM trains LLMs with a margin-enhanced process reward that widens the gap between correct and incorrect reasoning steps, improving calibration on math, code, logic, and science tasks without hurting accuracy.

Detecting Hallucinations in Large Language Models via Internal Attention Divergence Signals

cs.CL · 2026-05-06 · unverdicted · novelty 5.0

The KL divergence of attention heads from a uniform distribution predicts LLM answer correctness across datasets and model families.

citing papers explorer

Manifold Steering Reveals the Shared Geometry of Neural Network Representation and Behavior cs.LG · 2026-05-06 · unverdicted · none · ref 191
Process Supervision of Confidence Margin for Calibrated LLM Reasoning cs.LG · 2026-04-25 · unverdicted · none · ref 8
Detecting Hallucinations in Large Language Models via Internal Attention Divergence Signals cs.CL · 2026-05-06 · unverdicted · none · ref 63
