Do Androids Know They’re Only Dreaming of Electric Sheep?

Sky CH-Wang, Benjamin Van Durme, Jason Eisner, Chris Kedzie · 2024 · Findings of the Association for Computational Linguistics ACL 2024 · DOI 10.18653/v1/2024.findings-acl.260

8 Pith papers cite this work, alongside 3 external citations (Crossref). Polarity classification is still indexing.

representative citing papers

Manifold Steering Reveals the Shared Geometry of Neural Network Representation and Behavior

cs.LG · 2026-05-06 · unverdicted · novelty 7.0

Manifold steering along activation geometry induces behavioral trajectories matching the natural manifold of outputs, while linear steering produces off-manifold unnatural behaviors.

LLM Self-Recognition: Steering and Retrieving Activation Signatures

cs.AI · 2026-06-04 · unverdicted · novelty 6.0

Steering LLM residual streams with random sparse vectors creates detectable self-recognition fingerprints that enable over 98% accurate attribution of generated text to specific models without degrading output quality.

Boosting Self-Consistency with Ranking

cs.CL · 2026-06-03 · unverdicted · novelty 6.0

RISC reformulates self-consistency answer selection as a ranking task solved by a lightweight LambdaRank model with five hand-designed features, yielding better accuracy-efficiency trade-offs than majority voting on QA benchmarks.

TriLens: Per-Layer Logit-Lens Entropy for White-Box Hallucination Detection

cs.AI · 2026-05-31 · unverdicted · novelty 6.0

TriLens detects hallucinations via per-layer entropy trajectories of logit-lens readouts from three internal modules across LLMs and QA benchmarks.

Process Supervision of Confidence Margin for Calibrated LLM Reasoning

cs.LG · 2026-04-25 · unverdicted · novelty 6.0

RLCM trains LLMs with a margin-enhanced process reward that widens the gap between correct and incorrect reasoning steps, improving calibration on math, code, logic, and science tasks without hurting accuracy.

From Signals to Transfer: A Factorised Study of Probe-Based Uncertainty Estimation in Large Language Models

cs.CL · 2026-06-26 · conditional · novelty 5.0

A factorized study finds raw hidden states and attention features hard to beat in-domain for LLM uncertainty probes, but structured compressed features are more robust under distribution shift, with pretrained probes transferring to open-ended generation.

The Role of Ambiguity in Error Prediction via Uncertainty Quantification

cs.CL · 2026-06-01 · unverdicted · novelty 5.0

Disentangling input ambiguity from uncertainty quantification improves error prediction for LLMs on QA tasks, yielding over 10 PRR point gains across models and datasets.

Detecting Hallucinations in Large Language Models via Internal Attention Divergence Signals

cs.CL · 2026-05-06 · unverdicted · novelty 5.0

KL divergence of attention heads from uniform distribution predicts LLM answer correctness across datasets and model families.

citing papers explorer

Manifold Steering Reveals the Shared Geometry of Neural Network Representation and Behavior · cs.LG · 2026-05-06 · unverdicted · none · ref 191
LLM Self-Recognition: Steering and Retrieving Activation Signatures · cs.AI · 2026-06-04 · unverdicted · none · ref 28
Boosting Self-Consistency with Ranking · cs.CL · 2026-06-03 · unverdicted · none · ref 137
TriLens: Per-Layer Logit-Lens Entropy for White-Box Hallucination Detection · cs.AI · 2026-05-31 · unverdicted · none · ref 6
Process Supervision of Confidence Margin for Calibrated LLM Reasoning · cs.LG · 2026-04-25 · unverdicted · none · ref 8
From Signals to Transfer: A Factorised Study of Probe-Based Uncertainty Estimation in Large Language Models · cs.CL · 2026-06-26 · conditional · none · ref 19
The Role of Ambiguity in Error Prediction via Uncertainty Quantification · cs.CL · 2026-06-01 · unverdicted · none · ref 26
Detecting Hallucinations in Large Language Models via Internal Attention Divergence Signals · cs.CL · 2026-05-06 · unverdicted · none · ref 63
