Chain-of-Thought Unfaithfulness as Disguised Accuracy
2 Pith papers cite this work. Polarity classification is still indexing.
Citation summary
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Roles: background (1)
Polarities: background (1)
Representative citing papers
System 1 intuition in edge SLMs delivers 100% adversarial robustness and low latency for DAO consensus while System 2 reasoning causes 26.7% cognitive collapse and 17x slowdown.
Citing papers explorer
- Decoding Hidden Deception in Reasoning LLMs: Activation Explainers for Deception Auditing
  STATEWITNESS is a decoder-based activation explainer that audits deception in LLMs by interpreting hidden states, reaching 0.916 mean AUROC on seven datasets with inspectable evidence.
- The Cognitive Penalty: Ablating System 1 and System 2 Reasoning in Edge-Native SLMs for Decentralized Consensus
  System 1 intuition in edge SLMs delivers 100% adversarial robustness and low latency for DAO consensus while System 2 reasoning causes 26.7% cognitive collapse and 17x slowdown.