Pith Number

pith:3QNU6ZTU

pith:2026:3QNU6ZTUTOAN5YGL32KFYKSXF6
not attested · not anchored · not stored · refs pending

Mechanisms of Introspective Awareness

Atticus Wang, Emmanuel Ameisen, Jack Lindsey, Li Yang, Peter Wallich, Uzay Macar

Large language models detect injected steering vectors through a two-stage circuit that emerges after preference optimization.

arxiv:2603.21396 v4 · 2026-03-22 · cs.LG

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{3QNU6ZTUTOAN5YGL32KFYKSXF6}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

We trace the detection mechanism to a two-stage circuit in which 'evidence carrier' features in early post-injection layers detect perturbations monotonically along diverse directions, suppressing downstream 'gate' features that implement a default negative response.

C2 weakest assumption

The observed changes in activation patterns after steering-vector injection are assumed to cause the behavioral detection rather than merely correlate with it. This assumption rests on the validity of the ablation and patching experiments used to identify the evidence-carrier and gate features.

C3 one-line summary

DPO training induces a two-stage detection circuit in LLMs, built from early evidence-carrier features and downstream gate features. The circuit is absent in base models and distinct from later-layer identification mechanisms.

Formal links

2 machine-checked theorem links

Cited by

1 paper in Pith

Receipt and verification
First computed: 2026-05-20T00:01:40.442286Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05) · public key
Schema: pith-number/v1.0

Canonical hash

dc1b4f66749b80dee0cbde945c2a572f98634240808e8e8f21d7ac153019fddf

Aliases

arxiv: 2603.21396 · arxiv_version: 2603.21396v4 · doi: 10.48550/arxiv.2603.21396 · pith_short_12: 3QNU6ZTUTOAN · pith_short_16: 3QNU6ZTUTOAN5YGL · pith_short_8: 3QNU6ZTU
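
The identifiers on this page appear to be related in a simple way: the 26-character Pith Number matches the first 26 characters of the RFC 4648 base32 encoding of the canonical hash, and each pith_short_N alias is the first N characters of the Pith Number. This relationship is inferred from the values shown here, not taken from Pith documentation, so treat the following Python sketch as an observation rather than a specification. The function name `base32_prefix` is hypothetical.

```python
import base64

# Values copied from this page.
CANONICAL_HASH = "dc1b4f66749b80dee0cbde945c2a572f98634240808e8e8f21d7ac153019fddf"
PITH_NUMBER = "3QNU6ZTUTOAN5YGL32KFYKSXF6"


def base32_prefix(hex_digest: str, length: int) -> str:
    """Base32-encode a hex digest (RFC 4648 alphabet) and keep the first `length` characters."""
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


# Assumption: the Pith Number is a 26-character base32 prefix of the hash.
derived = base32_prefix(CANONICAL_HASH, 26)

# Assumption: short aliases are plain prefixes of the Pith Number.
aliases = {n: PITH_NUMBER[:n] for n in (8, 12, 16)}

print(derived == PITH_NUMBER)  # → True
print(aliases[8])              # → 3QNU6ZTU
```

If this holds in general, a client could sanity-check a Pith Number against a canonical hash offline, before making any network request.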
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/3QNU6ZTUTOAN5YGL32KFYKSXF6 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: dc1b4f66749b80dee0cbde945c2a572f98634240808e8e8f21d7ac153019fddf
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "0faeb7e14e32bcd80119d5aaf9260561081c70e8281f62eef5936815f21d0569",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-03-22T20:45:34Z",
    "title_canon_sha256": "3fda1e1ac9d141c93af47e8f354aad7d3f19a82fc23eb61f50a7bcde85c043ce"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2603.21396",
    "kind": "arxiv",
    "version": 4
  }
}
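
The shell pipeline above serializes the record with sorted keys and compact separators, then hashes the UTF-8 bytes. The same steps can be sketched offline in Python, with the record copied from this page instead of fetched. The helper name `canonical_sha256` is hypothetical, and the sketch assumes the JSON shown here is the complete canonical record. If that assumption holds, the printed digest should equal the canonical hash listed above.

```python
import hashlib
import json

# Canonical record copied from this page (assumed to be complete).
record = {
    "metadata": {
        "abstract_canon_sha256": "0faeb7e14e32bcd80119d5aaf9260561081c70e8281f62eef5936815f21d0569",
        "cross_cats_sorted": [],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2026-03-22T20:45:34Z",
        "title_canon_sha256": "3fda1e1ac9d141c93af47e8f354aad7d3f19a82fc23eb61f50a7bcde85c043ce",
    },
    "schema_version": "1.0",
    "source": {"id": "2603.21396", "kind": "arxiv", "version": 4},
}


def canonical_sha256(obj) -> str:
    """Serialize with sorted keys and no whitespace, then SHA-256 the UTF-8 bytes."""
    canonical = json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


digest = canonical_sha256(record)
print(digest)
```

Sorting keys and fixing the separators makes the digest independent of key order and whitespace in the input, which is what lets a mirror recompute the same value from its own copy of the record.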