pith:ENWG2XKP
LPDS: Evaluating LLM Robustness Through Logic-Preserving Difficulty Scaling
Logic-preserving difficulty scaling finds problem variations that cause language models to fail up to five times more often than random tests.
arxiv:2605.15393 v1 · 2026-05-14 · cs.LG
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{ENWG2XKPMCBSPT5H7Q5XXEX7HU}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
We show that LPDS efficiently finds difficult problem variations for a model, resulting in performance drops up to 5 times larger than those from random sampling.
The framework assumes that the difficulty of a logic-preserving variation can be quantified in a model-agnostic, or at least transferable, way that reliably predicts where failures will occur, and that the search procedure finds variations that are truly harder rather than merely different.
LPDS quantifies difficulty of logic-preserving problem variations and searches for the hardest ones, producing up to 5x larger performance drops than random sampling and better robustness gains from fine-tuning on difficult examples.
Receipt and verification
| First computed | 2026-05-20T00:00:56.318662Z |
|---|---|
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
236c6d5d4f608327cfa7fc3b7b92ff3d3f3550026e8e6c3cc78bcff3c97c5f66
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/ENWG2XKPMCBSPT5H7Q5XXEX7HU \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 236c6d5d4f608327cfa7fc3b7b92ff3d3f3550026e8e6c3cc78bcff3c97c5f66
Canonical record JSON
{
"metadata": {
"abstract_canon_sha256": "1cef0907d0f498f145adbc8860fb3d834bef77895a396b62279f7af2a33f5c26",
"cross_cats_sorted": [],
"license": "http://creativecommons.org/licenses/by-sa/4.0/",
"primary_cat": "cs.LG",
"submitted_at": "2026-05-14T20:26:59Z",
"title_canon_sha256": "3084e86520bad1b69c41ebd76253a81e4cba3f8551ce75ec8cb122d6bad18f0c"
},
"schema_version": "1.0",
"source": {
"id": "2605.15393",
"kind": "arxiv",
"version": 1
}
}
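
The verification one-liner above can also be written as a small Python function, which may be easier to reuse in a script. This is a minimal sketch, assuming the canonical form is exactly what the one-liner produces: keys sorted, separators `,` and `:` with no whitespace, non-ASCII characters left unescaped, and UTF-8 encoding. The names `canonical_sha256` and `verify` are hypothetical helpers for illustration, not part of any Pith API.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Serialize a record the same way the shell one-liner does, then hash it.

    Assumption: canonical form = sorted keys, compact separators,
    ensure_ascii=False, UTF-8 bytes.
    """
    canonical = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    ).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def verify(record: dict, expected_hex: str) -> bool:
    """Return True if the record's canonical hash matches the published hash."""
    return canonical_sha256(record) == expected_hex.strip().lower()


if __name__ == "__main__":
    # Record copied from the "Canonical record JSON" section of this page.
    record = {
        "metadata": {
            "abstract_canon_sha256": "1cef0907d0f498f145adbc8860fb3d834bef77895a396b62279f7af2a33f5c26",
            "cross_cats_sorted": [],
            "license": "http://creativecommons.org/licenses/by-sa/4.0/",
            "primary_cat": "cs.LG",
            "submitted_at": "2026-05-14T20:26:59Z",
            "title_canon_sha256": "3084e86520bad1b69c41ebd76253a81e4cba3f8551ce75ec8cb122d6bad18f0c",
        },
        "schema_version": "1.0",
        "source": {"id": "2605.15393", "kind": "arxiv", "version": 1},
    }
    # Compare the printed value against the canonical hash listed on this page.
    print(canonical_sha256(record))
```

Because the keys are sorted before hashing, the order in which fields appear in the fetched JSON does not affect the result; only the field values and the serialization settings do.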