Pith Number

pith:2FLK2MWJ

pith:2023:2FLK2MWJLATJAVAZTWKFIVMEEF

not attested · not anchored · not stored · refs resolved

Quantifying Language Models' Sensitivity to Spurious Features in Prompt Design or: How I learned to start worrying about prompt formatting

Alane Suhr, Melanie Sclar, Yejin Choi, Yulia Tsvetkov

Several open-source LLMs vary in accuracy by up to 76 points on the same few-shot task due to minor prompt formatting differences.

arxiv:2310.11324 v2 · 2023-10-17 · cs.CL · cs.AI · cs.LG

Add to your LaTeX paper

\usepackage{pith}
\pithnumber{2FLK2MWJLATJAVAZTWKFIVMEEF}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

several widely used open-source LLMs are extremely sensitive to subtle changes in prompt formatting in few-shot settings, with performance differences of up to 76 accuracy points when evaluated using LLaMA-2-13B

C2 · weakest assumption

that the set of tested formatting variations and the sampled formats in FormatSpread adequately represent the space of plausible, meaning-preserving prompt designs that users might actually employ

C3 · one-line summary

LLMs are highly sensitive to prompt formatting in few-shot settings, with accuracy varying by up to 76 points across formats. FormatSpread samples plausible prompt formats and reports the resulting performance interval without requiring access to model weights.
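The 76-point figure in these claims is a spread: the gap between the best and worst accuracy a model reaches across prompt formats that carry the same meaning. A minimal Python sketch of that summary statistic follows. The function name `format_spread`, the format strings, and the accuracy values are illustrative placeholders, not results from the paper, and the sketch leaves out how FormatSpread decides which formats to evaluate.

```python
from typing import Dict, Tuple


def format_spread(accuracy_by_format: Dict[str, float]) -> Tuple[float, float, float]:
    """Return (lowest, highest, spread) accuracy over the evaluated formats."""
    if not accuracy_by_format:
        raise ValueError("need at least one evaluated format")
    lowest = min(accuracy_by_format.values())
    highest = max(accuracy_by_format.values())
    return lowest, highest, highest - lowest


# Placeholder accuracies (in points) for three hypothetical formats of one task.
# These numbers are made up for illustration only.
example = {
    "Question: {q}\nAnswer: {a}": 62.0,
    "QUESTION: {q} ANSWER: {a}": 48.5,
    "question - {q}\nanswer - {a}": 71.0,
}

low, high, spread = format_spread(example)
print(f"accuracy ranges from {low} to {high}; spread = {spread} points")
```

Reporting the interval rather than a single accuracy makes it visible when a benchmark number depends on an arbitrary formatting choice.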

References

64 extracted · 64 resolved · 5 Pith anchors

[1] Tweet: Susan & I found MMLU performance jump 6-10 points in the 40s by formatting multiple choice as (A) not A in MMLU (for internal model) 2023

[2] Falcon-40B: an open large language model with state-of-the-art performance 2023

[3] An empirical evaluation of Thompson sampling 2011

[5] Better hypothesis testing for statistical machine translation: Controlling for optimizer instability 2011

[7] GPT3.int8(): 8-bit matrix multiplication for transformers at scale 2022

Cited by

49 papers in Pith

Efficient Safety Alignment of Language Models via Latent Personality Traits

How Consistent Are LLM Agents? Measuring Behavioral Reproducibility in Multi-Step Tool-Calling Pipelines

To Isolate or to Score? Model-Adaptive Assessment for Cost-Efficient Multi-Agent RAG

Instruction Bleed: Cross-Module Interference in Prompt-Composed Agentic Systems

A Deterministic Control Plane for LLM Coding Agents

Receipt and verification

First computed	2026-05-17T23:38:45.901734Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

d156ad32c958269054199d945455842140bc05d2fc21f7eb865c67bde5b35e2a

Aliases

arxiv: 2310.11324 · arxiv_version: 2310.11324v2 · doi: 10.48550/arxiv.2310.11324 · pith_short_12: 2FLK2MWJLATJ · pith_short_16: 2FLK2MWJLATJAVAZ · pith_short_8: 2FLK2MWJ
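The short aliases listed above are prefixes of the full 26-character identifier. The identifier itself also matches the first 26 characters of the RFC 4648 base32 encoding of the canonical hash shown on this page. That second relationship is an observation from the values printed here, not a documented rule of the Pith schema, so the sketch below is a consistency check under that assumption. The helper names `short_alias` and `base32_prefix` are hypothetical.

```python
import base64

# Values copied from this record.
CANONICAL_HASH = "d156ad32c958269054199d945455842140bc05d2fc21f7eb865c67bde5b35e2a"
PITH_NUMBER = "2FLK2MWJLATJAVAZTWKFIVMEEF"


def short_alias(pith_number: str, length: int) -> str:
    """Return the first `length` characters of the identifier."""
    return pith_number[:length]


def base32_prefix(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the raw digest bytes and keep the first `length` characters.

    Assumption: the identifier is derived this way; the page does not say so.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


aliases = {f"pith_short_{n}": short_alias(PITH_NUMBER, n) for n in (8, 12, 16)}
print(aliases)
print("identifier matches base32 prefix of hash:",
      base32_prefix(CANONICAL_HASH) == PITH_NUMBER)
```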

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/2FLK2MWJLATJAVAZTWKFIVMEEF \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: d156ad32c958269054199d945455842140bc05d2fc21f7eb865c67bde5b35e2a

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "1deda2aad4398016853ac581c2d4b4c56736fa4608e0b35ff62f6f41987e0bb3",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.LG"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2023-10-17T15:03:30Z",
    "title_canon_sha256": "36118606ae417a37c6d02143c21547b584d75bfdb5e7343bb753ba88f911ecd5"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2310.11324",
    "kind": "arxiv",
    "version": 2
  }
}
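The verification command above canonicalizes the record by serializing it with sorted keys and compact separators before hashing. The same step can be sketched offline in Python against the record as printed on this page. The function name `canonical_sha256` is hypothetical, and the sketch does not assert that the record as rendered here reproduces the published hash; compare the printed digest with the canonical hash yourself.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Hash a record using the serialization from the verification command:
    sorted keys, compact separators, non-ASCII kept as-is, UTF-8 bytes."""
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


# Canonical record copied from this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "1deda2aad4398016853ac581c2d4b4c56736fa4608e0b35ff62f6f41987e0bb3",
        "cross_cats_sorted": ["cs.AI", "cs.LG"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.CL",
        "submitted_at": "2023-10-17T15:03:30Z",
        "title_canon_sha256": "36118606ae417a37c6d02143c21547b584d75bfdb5e7343bb753ba88f911ecd5",
    },
    "schema_version": "1.0",
    "source": {"id": "2310.11324", "kind": "arxiv", "version": 2},
}

digest = canonical_sha256(record)
print(digest)  # compare against the canonical hash listed on this page
```

Because keys are sorted before hashing, two mirrors that hold the same fields in a different order compute the same digest.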