Do Androids Know They’re Only Dreaming of Electric Sheep?
3 Pith papers cite this work, alongside 3 external citations. Polarity classification is still indexing.
Years: 2026 (3)
Verdicts: UNVERDICTED (3)
Citing papers
- Manifold Steering Reveals the Shared Geometry of Neural Network Representation and Behavior
  Manifold steering along activation geometry induces behavioral trajectories matching the natural manifold of outputs, while linear steering produces off-manifold unnatural behaviors.
- Process Supervision of Confidence Margin for Calibrated LLM Reasoning
  RLCM trains LLMs with a margin-enhanced process reward that widens the gap between correct and incorrect reasoning steps, improving calibration on math, code, logic, and science tasks without hurting accuracy.
- Detecting Hallucinations in Large Language Models via Internal Attention Divergence Signals
  KL divergence of attention heads from uniform distribution predicts LLM answer correctness across datasets and model families.
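
For readers unfamiliar with the quantity named in the last entry, the following is a minimal sketch of a KL divergence between one attention row and the uniform distribution. It assumes the direction KL(p || u), measured in nats, over a single row of attention weights; the function name `kl_from_uniform` is hypothetical, and the cited paper may use a different direction, aggregation across heads, or normalization.

```python
import math


def kl_from_uniform(attn):
    """KL(p || u) in nats for one attention row p against uniform u.

    Assumption: `attn` is a sequence of non-negative weights over n
    positions. It is renormalized here so the sketch also accepts
    unnormalized scores.
    """
    n = len(attn)
    total = sum(attn)
    if n == 0 or total <= 0:
        raise ValueError("attn must be non-empty with positive total mass")
    p = [w / total for w in attn]
    # With u_i = 1/n, each term is p_i * log(p_i / (1/n)) = p_i * log(p_i * n).
    # Terms with p_i == 0 contribute 0 by the usual convention.
    return sum(pi * math.log(pi * n) for pi in p if pi > 0)


if __name__ == "__main__":
    print(kl_from_uniform([0.25, 0.25, 0.25, 0.25]))  # uniform row
    print(kl_from_uniform([1.0, 0.0, 0.0, 0.0]))      # fully peaked row
```

Under this definition the value equals log(n) minus the entropy of the row, so it is 0 for perfectly uniform attention and reaches log(n) when all weight sits on one position.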