Pith Number

pith:TRYCRSRP

pith:2026:TRYCRSRP3Y6B265MQZX2DCOGLH
not attested · not anchored · not stored · refs pending

A Methodological Guide on Using Large Language Models for Reproducible Text Annotation in the Social Sciences and Humanities with Python and R

Erik-Jan van Kesteren, Javier Garcia Bernardo, Qixiang Fang

A structured workflow lets researchers use large language models to annotate text for social science and humanities projects while adjusting for errors in later analyses.

arxiv:2604.09638 v2 · 2026-03-21 · cs.CY

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{TRYCRSRP3Y6B265MQZX2DCOGLH}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

The paper provides a comprehensive, step-by-step methodological guide for using LLMs for text annotation in SSH research, covering how LLMs work, project identification, prompt design, quality evaluation without overfitting, integration into statistical analyses accounting for annotation error, and management of cost, efficiency, and reproducibility.

C2 weakest assumption

That the recommended practices for iterative prompt refinement and accounting for annotation error in downstream analyses will reliably prevent bias in typical SSH statistical applications without additional validation steps.

C3 one-line summary

A practical guide for SSH researchers on applying LLMs to text annotation, covering project suitability, prompt design, quality evaluation, error-aware statistical integration, and scaling considerations.

Receipt and verification
First computed: 2026-05-28T02:04:47.520021Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05)
Schema: pith-number/v1.0

Canonical hash

9c7028ca2fde3c1d7bac866fa189c659f4d1ac93139b3410f4889969c9673c70

Aliases

arxiv: 2604.09638 · arxiv_version: 2604.09638v2 · doi: 10.48550/arxiv.2604.09638 · pith_short_12: TRYCRSRP3Y6B · pith_short_16: TRYCRSRP3Y6B265M · pith_short_8: TRYCRSRP
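
For this record, the 26-character identifier matches the first 26 characters of the RFC 4648 Base32 encoding of the canonical hash, and each pith_short_N alias is the first N characters of that identifier. The sketch below checks both relationships using only the values shown on this page. Whether the builder defines identifiers this way for every record is an assumption, and the function names `base32_prefix` and `short_aliases` are hypothetical.

```python
import base64

# Values copied from this record.
CANONICAL_HASH = "9c7028ca2fde3c1d7bac866fa189c659f4d1ac93139b3410f4889969c9673c70"
PITH_NUMBER = "TRYCRSRP3Y6B265MQZX2DCOGLH"


def base32_prefix(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the raw digest bytes and keep the first `length` characters.

    Assumption: the identifier is a truncated Base32 rendering of the hash.
    This holds for the record above but is not confirmed as the general rule.
    """
    raw = bytes.fromhex(hex_digest)
    return base64.b32encode(raw).decode("ascii")[:length]


def short_aliases(pith_number: str, lengths=(8, 12, 16)) -> dict:
    """Build the pith_short_N aliases as plain prefixes of the identifier."""
    return {f"pith_short_{n}": pith_number[:n] for n in lengths}


print(base32_prefix(CANONICAL_HASH) == PITH_NUMBER)  # True
print(short_aliases(PITH_NUMBER))
```

Because the identifier is a prefix of the encoded hash, a short alias narrows the candidates but does not replace the full hash check when verifying a record.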
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/TRYCRSRP3Y6B265MQZX2DCOGLH \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 9c7028ca2fde3c1d7bac866fa189c659f4d1ac93139b3410f4889969c9673c70
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "69229ca4a45ddcc687022d7519bbd9d85a10f75f951232358b3e41402a5e0584",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CY",
    "submitted_at": "2026-03-21T00:09:50Z",
    "title_canon_sha256": "72efcce7889aeb36417c86e9c3bc7d4d449ae342c02e42ab81cc42011807e3e7"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2604.09638",
    "kind": "arxiv",
    "version": 2
  }
}
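
The canonicalization step from the shell command above can also be sketched as a standalone Python function: serialize the record with sorted keys and compact separators, encode it as UTF-8, and hash it with SHA-256. The record literal below is copied from this page, and the function name `canonical_sha256` is hypothetical. Compare the printed digest with the canonical hash listed above. A match assumes the served record is identical to the JSON shown here.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Hash a record using the same JSON settings as the shell one-liner."""
    canonical = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters as UTF-8
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Canonical record as shown on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "69229ca4a45ddcc687022d7519bbd9d85a10f75f951232358b3e41402a5e0584",
        "cross_cats_sorted": [],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.CY",
        "submitted_at": "2026-03-21T00:09:50Z",
        "title_canon_sha256": "72efcce7889aeb36417c86e9c3bc7d4d449ae342c02e42ab81cc42011807e3e7",
    },
    "schema_version": "1.0",
    "source": {"id": "2604.09638", "kind": "arxiv", "version": 2},
}

digest = canonical_sha256(record)
print(digest)
```

Sorting keys and fixing the separators makes the serialization independent of how the JSON was originally formatted, so any mirror that holds the same record should compute the same digest.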