Pith Number

pith:5PZ6WIBN

pith:2026:5PZ6WIBNBZKZOWU2PJG2K2RK4X
not attested · not anchored · not stored · refs resolved

Fair outputs, Biased Internals: Causal Potency and Asymmetry of Latent Bias in LLMs for High-Stakes Decisions

Jagdish Tripathy, Marcus Buckmann

Instruction-tuned LLMs output fair high-stakes decisions while retaining asymmetric latent demographic biases that can reverse those decisions when reactivated.

arxiv:2605.15217 v1 · 2026-05-12 · cs.AI · cs.CY · cs.LG · econ.GN · q-fin.EC

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{5PZ6WIBNBZKZOWU2PJG2K2RK4X}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
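The order-independence that lets any mirror recompute the same state can be illustrated with a minimal sketch. This is not Pith's actual merge algorithm, which is not described on this page. The event fields `at`, `field` and `value`, the last-writer-wins rule, and the function names are all hypothetical, chosen only to show how a merge becomes deterministic once events are deduplicated and sorted by a content-derived key.

```python
import hashlib
import json


def event_id(event: dict) -> str:
    """Content-derived ID: SHA-256 of the event's sorted-key compact JSON."""
    blob = json.dumps(event, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


def merge(record: dict, events: list) -> dict:
    """Fold events into a state that does not depend on arrival order.

    Hypothetical event shape: {"at": ISO timestamp, "field": str, "value": ...}.
    """
    # Deduplicate by content hash so replayed events have no effect.
    unique = {event_id(e): e for e in events}
    # Total order: timestamp first, content hash as the tie-breaker.
    ordered = sorted(unique.items(), key=lambda kv: (kv[1]["at"], kv[0]))
    state = {"record": record, "fields": {}}
    for _, event in ordered:
        state["fields"][event["field"]] = event["value"]  # last writer wins
    return state


events = [
    {"at": "2026-05-20T00:00:00Z", "field": "citations", "value": "open"},
    {"at": "2026-05-21T00:00:00Z", "field": "citations", "value": "closed"},
]
# Two mirrors that received the events in different orders agree on the state.
print(merge({}, events) == merge({}, list(reversed(events))))
```

Because the sort key depends only on event content, shuffling or duplicating the input list leaves the merged state unchanged.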

Claims

C1 strongest claim

When reinjected at critical layers, suppressed demographic representations produce near-complete decision reversals, and this latent bias is asymmetric: steering shifts decisions in one demographic direction while producing minimal effects in the reverse direction.

C2 weakest assumption

That the activation steering and cross-layer interventions isolate the causal effect of latent demographic representations without introducing confounding changes to model behavior, or that the matched application pairs differ only in racially associated names, with no other correlated signals.

C3 one-line summary

Open-weight LLMs show no output bias on matched mortgage applications differing only by racially-associated names, yet retain and amplify demographic representations that steering interventions can causally activate to produce near-complete asymmetric decision reversals.

References

7 extracted · 7 resolved · 0 Pith anchors

[1] Interpreting LLMs as Credit Risk Classifiers: Do Their Feature Explanations Align with Classical ML? 2025
[2] Alignment Reduces Expressed but Not Encoded Gender Bias: A Unified Framework and Study 2026
[3] More Women, Same Stereotypes: Unpacking the Gender Bias Paradox in Large Language Models 2025
[4] Marked personas: Using natural language prompts to measure stereotypes in language models. arXiv:2305.18189 2023
[5] AI generates covertly racist decisions about people based on their dialect 2024
Receipt and verification
First computed 2026-05-20T00:00:46.731307Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

ebf3eb202d0e55975a9a7a4da56a2ae5d46d3705084c246b00f4330c49b19dc2

Aliases

arxiv: 2605.15217 · arxiv_version: 2605.15217v1 · doi: 10.48550/arxiv.2605.15217 · pith_short_12: 5PZ6WIBNBZKZ · pith_short_16: 5PZ6WIBNBZKZOWU2 · pith_short_8: 5PZ6WIBN
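The aliases above appear to be derived from the canonical hash: Base32-encoding the raw digest bytes (RFC 4648 alphabet) and keeping the first 26 characters reproduces the Pith Number shown on this page, and the `pith_short_*` aliases are its 8-, 12- and 16-character prefixes. This relationship is inferred from the values listed here, not from a documented specification, so treat the sketch below as an observation. The function name `pith_number` is hypothetical.

```python
import base64

# Canonical hash as listed on this page.
CANONICAL_HASH = "ebf3eb202d0e55975a9a7a4da56a2ae5d46d3705084c246b00f4330c49b19dc2"


def pith_number(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the digest bytes and keep the first `length` characters.

    Assumption: the 26-character length is read off the identifier on this
    page, not taken from a published schema.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


print(pith_number(CANONICAL_HASH))  # 5PZ6WIBNBZKZOWU2PJG2K2RK4X
for n in (8, 12, 16):
    # Short aliases are plain prefixes of the full identifier.
    print(f"pith_short_{n}:", pith_number(CANONICAL_HASH, n))
```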
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/5PZ6WIBNBZKZOWU2PJG2K2RK4X \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: ebf3eb202d0e55975a9a7a4da56a2ae5d46d3705084c246b00f4330c49b19dc2
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "64f7c12da9eaaf1b649c9a4e677aeb11e2f13ab3e6a365b0bfa1083c3b1c0004",
    "cross_cats_sorted": [
      "cs.CY",
      "cs.LG",
      "econ.GN",
      "q-fin.EC"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.AI",
    "submitted_at": "2026-05-12T12:14:58Z",
    "title_canon_sha256": "7f9bcdbbe306c826d7e7a956a4164c3c4f6247d706991595aca87e47dd75b797"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.15217",
    "kind": "arxiv",
    "version": 1
  }
}
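For offline checks, the canonicalization step of the shell one-liner above can be written as a small Python function. The sketch applies the same `json.dumps` settings (sorted keys, compact separators, `ensure_ascii=False`, UTF-8 encoding) to the record shown above. The names `CANONICAL_RECORD` and `canonical_sha256` are illustrative, and the printed digest should be compared by hand against the canonical hash listed on this page.

```python
import hashlib
import json

# Canonical record copied from the JSON shown above.
CANONICAL_RECORD = {
    "metadata": {
        "abstract_canon_sha256": "64f7c12da9eaaf1b649c9a4e677aeb11e2f13ab3e6a365b0bfa1083c3b1c0004",
        "cross_cats_sorted": ["cs.CY", "cs.LG", "econ.GN", "q-fin.EC"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.AI",
        "submitted_at": "2026-05-12T12:14:58Z",
        "title_canon_sha256": "7f9bcdbbe306c826d7e7a956a4164c3c4f6247d706991595aca87e47dd75b797",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.15217", "kind": "arxiv", "version": 1},
}


def canonical_sha256(record: dict) -> str:
    """SHA-256 over sorted-key, compact, UTF-8 JSON (same settings as the one-liner)."""
    blob = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


digest = canonical_sha256(CANONICAL_RECORD)
print(digest)  # compare with the canonical hash listed on this page
```

Sorting keys and fixing the separators makes the digest independent of how the JSON was pretty-printed or in what order the keys arrived, which is what allows a third party to recompute it.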