Pith Number

pith:QEGTRZSK

pith:2026:QEGTRZSKM7TPLSA53H77I25ZDT

not attested · not anchored · not stored · refs pending

From Program Slices to Causal Clarity: Evaluating Faithful, Actionable LLM-Generated Failure Explanations via Context Partitioning and LLM-as-a-Judge

Christian Medeiros Adriano, Julius Porbeck, Holger Giese (Hasso Plattner Institute, University of Potsdam, Germany)

Varying the composition of debugging context causally changes the quality of LLM-generated failure explanations, with targeted artifacts yielding better causal and actionable insights than large undifferentiated contexts.

arxiv:2604.18309 v2 · 2026-04-20 · cs.SE


Add to your LaTeX paper

\usepackage{pith}
\pithnumber{QEGTRZSKM7TPLSA53H77I25ZDT}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
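The property described here, that any mirror replaying the same events arrives at the same state, can be illustrated with a small Python sketch. This is not the Pith merge algorithm, which this page does not specify. The event fields (`at`, `field`, `value`), the content-hash tie-break, and the function names are all hypothetical, and signature checks are omitted.

```python
import hashlib
import json


def event_key(event: dict) -> str:
    """Content hash of an event, used for de-duplication and tie-breaking."""
    blob = json.dumps(event, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(blob).hexdigest()


def merge_state(record: dict, events: list) -> dict:
    """Fold events into a state in an order independent of arrival order."""
    # Verbatim duplicates collapse onto the same content hash.
    unique = {event_key(e): e for e in events}
    # Sort by the (hypothetical) timestamp, then by content hash to break ties.
    ordered = sorted(unique.items(), key=lambda kv: (kv[1]["at"], kv[0]))
    state = {"record": record, "fields": {}}
    for _, event in ordered:
        # Later events overwrite earlier ones for the same field.
        state["fields"][event["field"]] = event["value"]
    return state
```

Because the events are de-duplicated and totally ordered before folding, two mirrors that received them in different orders, or received some twice, compute identical states.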

Claims

C1 · strongest claim

Our results indicate that explanation quality is causally affected by context composition. Evidence-rich, failure-specific artifacts improve causal and action-oriented quality, whereas overly large contexts tend to yield vague explanations. Higher explanation-score quartiles are associated with higher downstream repair pass rates and, for some models, with fixes that are closer to the reference minimal fixes.

C2 · weakest assumption

That the six evaluation criteria and LLM-as-a-judge scores faithfully reflect true causal and actionable quality, and that the 93 context configurations plus the chosen real bugs are representative enough to support general claims about context effects.

C3 · one line summary

Focused, failure-specific contexts such as program slices produce more causal and actionable LLM bug explanations than large undifferentiated contexts, and higher-quality explanations correlate with better downstream repair success rates.

Receipt and verification

First computed	2026-05-21T01:05:19.330590Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

810d38e64a67e6f5c81dd9fff46bb91cf94f839b823c465c08b967516ccff1b5

Aliases

arxiv: 2604.18309 · arxiv_version: 2604.18309v2 · doi: 10.48550/arxiv.2604.18309 · pith_short_12: QEGTRZSKM7TP · pith_short_16: QEGTRZSKM7TPLSA5 · pith_short_8: QEGTRZSK
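The identifiers on this page appear to be derived from the canonical hash. For this record, the 26-character Pith Number matches the first 26 characters of the RFC 4648 base32 encoding of the SHA-256 digest, and each `pith_short_N` alias is its N-character prefix. The Python sketch below checks that relationship. The derivation rule is an inference from this one record, not a documented specification, and the function names are hypothetical.

```python
import base64

CANONICAL_HASH = "810d38e64a67e6f5c81dd9fff46bb91cf94f839b823c465c08b967516ccff1b5"
PITH_NUMBER = "QEGTRZSKM7TPLSA53H77I25ZDT"


def pith_from_hash(hex_digest: str, length: int = 26) -> str:
    """Assumed rule: base32-encode the raw digest, keep the first `length` chars."""
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


def short_aliases(pith: str) -> dict:
    """Build the pith_short_N aliases as plain prefixes of the full identifier."""
    return {f"pith_short_{n}": pith[:n] for n in (8, 12, 16)}


if __name__ == "__main__":
    print(pith_from_hash(CANONICAL_HASH) == PITH_NUMBER)
    print(short_aliases(PITH_NUMBER))
```

If the rule holds in general, an agent holding only the canonical hash could recompute every alias locally without calling the resolver.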

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/QEGTRZSKM7TPLSA53H77I25ZDT \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 810d38e64a67e6f5c81dd9fff46bb91cf94f839b823c465c08b967516ccff1b5

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "896a0b51accb7ad6369fa5b7d6290a7ebfc7890c62a0e1bb4461c5c0e9330fb3",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.SE",
    "submitted_at": "2026-04-20T14:16:39Z",
    "title_canon_sha256": "cd7a2731dfd4788c8ba1214696a1ffbeff5d0e1a99fda2a2c02ccd7b0e4b98d8"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2604.18309",
    "kind": "arxiv",
    "version": 2
  }
}
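The verification command above can also be run offline against the record as displayed. The Python sketch below applies the same canonicalization as the one-liner (sorted keys, compact separators, no ASCII escaping, UTF-8) and prints the digest next to the expected hash. Whether the record shown on this page is byte-for-byte what the resolver returns is an assumption, so the comparison is printed rather than asserted. The name `canonical_sha256` is hypothetical.

```python
import hashlib
import json

RECORD_TEXT = """
{
  "metadata": {
    "abstract_canon_sha256": "896a0b51accb7ad6369fa5b7d6290a7ebfc7890c62a0e1bb4461c5c0e9330fb3",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.SE",
    "submitted_at": "2026-04-20T14:16:39Z",
    "title_canon_sha256": "cd7a2731dfd4788c8ba1214696a1ffbeff5d0e1a99fda2a2c02ccd7b0e4b98d8"
  },
  "schema_version": "1.0",
  "source": {"id": "2604.18309", "kind": "arxiv", "version": 2}
}
"""
EXPECTED = "810d38e64a67e6f5c81dd9fff46bb91cf94f839b823c465c08b967516ccff1b5"


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and compact separators, then hash the UTF-8 bytes."""
    blob = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode()
    return hashlib.sha256(blob).hexdigest()


record = json.loads(RECORD_TEXT)
digest = canonical_sha256(record)
print(digest)
print("matches expected:", digest == EXPECTED)
```

Sorting keys and fixing the separators means whitespace and key order in the stored JSON do not affect the digest, which is what lets independent mirrors agree on the hash.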