{"state_type":"pith_open_graph_state","state_version":"1.0","pith_number":"pith:2026:TRYCRSRP3Y6B265MQZX2DCOGLH","merge_version":"pith-open-graph-merge-v1","event_count":2,"valid_event_count":2,"invalid_event_count":0,"equivocation_count":0,"current":{"canonical_record":{"metadata":{"abstract_canon_sha256":"69229ca4a45ddcc687022d7519bbd9d85a10f75f951232358b3e41402a5e0584","cross_cats_sorted":[],"license":"http://creativecommons.org/licenses/by/4.0/","primary_cat":"cs.CY","submitted_at":"2026-03-21T00:09:50Z","title_canon_sha256":"72efcce7889aeb36417c86e9c3bc7d4d449ae342c02e42ab81cc42011807e3e7"},"schema_version":"1.0","source":{"id":"2604.09638","kind":"arxiv","version":2}},"source_aliases":[{"alias_kind":"arxiv","alias_value":"2604.09638","created_at":"2026-05-28T02:04:47Z"},{"alias_kind":"arxiv_version","alias_value":"2604.09638v2","created_at":"2026-05-28T02:04:47Z"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.2604.09638","created_at":"2026-05-28T02:04:47Z"},{"alias_kind":"pith_short_12","alias_value":"TRYCRSRP3Y6B","created_at":"2026-05-28T02:04:47Z"},{"alias_kind":"pith_short_16","alias_value":"TRYCRSRP3Y6B265M","created_at":"2026-05-28T02:04:47Z"},{"alias_kind":"pith_short_8","alias_value":"TRYCRSRP","created_at":"2026-05-28T02:04:47Z"}],"graph_snapshots":[{"event_id":"sha256:a2d3e6534cccc76f84c2cb42b31fbd4d93849a99072c22c0493a05d21fac0c05","target":"graph","created_at":"2026-05-28T02:04:47Z","signer":{"key_id":"pith-v1-2026-05","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54","signer_id":"pith.science","signer_type":"pith_registry"},"payload":{"graph_snapshot":{"author_claims":{"count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57","strong_count":0},"builder_version":"pith-number-builder-2026-05-17-v1","claims":{"count":4,"items":[{"attestation":"unclaimed","claim_id":"C1","kind":"strongest_claim","source":"verdict.strongest_claim","status":"machine_extracted","text":"The paper provides a comprehensive, step-by-step methodological guide for using LLMs for text annotation in SSH research, covering how LLMs work, project identification, prompt design, quality evaluation without overfitting, integration into statistical analyses accounting for annotation error, and management of cost, efficiency, and reproducibility."},{"attestation":"unclaimed","claim_id":"C2","kind":"weakest_assumption","source":"verdict.weakest_assumption","status":"machine_extracted","text":"That the recommended practices for iterative prompt refinement and accounting for annotation error in downstream analyses will reliably prevent bias in typical SSH statistical applications without additional validation steps."},{"attestation":"unclaimed","claim_id":"C3","kind":"one_line_summary","source":"verdict.one_line_summary","status":"machine_extracted","text":"A practical guide for SSH researchers on applying LLMs to text annotation, covering project suitability, prompt design, quality evaluation, error-aware statistical integration, and scaling considerations."},{"attestation":"unclaimed","claim_id":"C4","kind":"headline","source":"verdict.pith_extraction.headline","status":"machine_extracted","text":"A structured workflow lets researchers use large language models to annotate text for social science and humanities projects while adjusting for errors in later analyses."}],"snapshot_sha256":"f3e993f8be428f4bf998b676b49419658a7783c9e7ed99c3c49871bdb66978fd"},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"integrity":{"available":true,"clean":true,"detectors_run":[],"endpoint":"/pith/2604.09638/integrity.json","findings":[],"snapshot_sha256":"c28c3603d3b5d939e8dc4c7e95fa8dfce3d595e45f758748cecf8e644a296938","summary":{"advisory":0,"by_detector":{},"critical":0,"informational":0}},"paper":{"abstract_excerpt":"Large language models (LLMs) are increasingly used by researchers in the social sciences and humanities (SSH) for text analysis, particularly to automate text annotation. However, many researchers still face challenges in adopting LLMs, addressing their limitations, and producing reproducible workflows and results. For example, annotation errors can bias downstream statistical analyses even when apparent accuracy is high. This paper provides a step-by-step methodological guide to using LLMs for text annotation in SSH research, with practical Python and R examples. We explain how LLMs work, how","authors_text":"Erik-Jan van Kesteren, Javier Garcia Bernardo, Qixiang Fang","cross_cats":[],"headline":"A structured workflow lets researchers use large language models to annotate text for social science and humanities projects while adjusting for errors in later analyses.","license":"http://creativecommons.org/licenses/by/4.0/","primary_cat":"cs.CY","submitted_at":"2026-03-21T00:09:50Z","title":"A Methodological Guide on Using Large Language Models for Reproducible Text Annotation in the Social Sciences and Humanities with Python and R"},"references":{"count":0,"internal_anchors":0,"resolved_work":0,"sample":[],"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"source":{"id":"2604.09638","kind":"arxiv","version":2},"verdict":{"created_at":"2026-05-15T07:51:23.160746Z","id":"ebf1a254-e9c6-426b-9046-59307b010ba1","model_set":{"reader":"grok-4.3"},"one_line_summary":"A practical guide for SSH researchers on applying LLMs to text annotation, covering project suitability, prompt design, quality evaluation, error-aware statistical integration, and scaling considerations.","pipeline_version":"pith-pipeline@v0.9.0","pith_extraction_headline":"A structured workflow lets researchers use large language models to annotate text for social science and humanities projects while adjusting for errors in later analyses.","strongest_claim":"The paper provides a comprehensive, step-by-step methodological guide for using LLMs for text annotation in SSH research, covering how LLMs work, project identification, prompt design, quality evaluation without overfitting, integration into statistical analyses accounting for annotation error, and management of cost, efficiency, and reproducibility.","weakest_assumption":"That the recommended practices for iterative prompt refinement and accounting for annotation error in downstream analyses will reliably prevent bias in typical SSH statistical applications without additional validation steps."}},"verdict_id":"ebf1a254-e9c6-426b-9046-59307b010ba1"}}],"author_attestations":[],"timestamp_anchors":[],"storage_attestations":[],"citation_signatures":[],"replication_records":[],"corrections":[],"mirror_hints":[],"record_created":{"event_id":"sha256:89d4e33d43d9fb52a34bb1047bbc1f5de0df4555822c4673e05b43cf3eabeb3a","target":"record","created_at":"2026-05-28T02:04:47Z","signer":{"key_id":"pith-v1-2026-05","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54","signer_id":"pith.science","signer_type":"pith_registry"},"payload":{"attestation_state":"computed","canonical_record":{"metadata":{"abstract_canon_sha256":"69229ca4a45ddcc687022d7519bbd9d85a10f75f951232358b3e41402a5e0584","cross_cats_sorted":[],"license":"http://creativecommons.org/licenses/by/4.0/","primary_cat":"cs.CY","submitted_at":"2026-03-21T00:09:50Z","title_canon_sha256":"72efcce7889aeb36417c86e9c3bc7d4d449ae342c02e42ab81cc42011807e3e7"},"schema_version":"1.0","source":{"id":"2604.09638","kind":"arxiv","version":2}},"canonical_sha256":"9c7028ca2fde3c1d7bac866fa189c659f4d1ac93139b3410f4889969c9673c70","receipt":{"algorithm":"ed25519","builder_version":"pith-number-builder-2026-05-17-v1","canonical_sha256":"9c7028ca2fde3c1d7bac866fa189c659f4d1ac93139b3410f4889969c9673c70","first_computed_at":"2026-05-28T02:04:47.520021Z","key_id":"pith-v1-2026-05","kind":"pith_receipt","last_reissued_at":"2026-05-28T02:04:47.520021Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54","receipt_version":"0.3","signature_b64":"/ReI5byopuejSoaEp0QMN5IjCFsDucdpon6kBLcBEag4rFOWVGzpeTMUM445LNsdSPFhegUzE2C/DjLSqe4NBw==","signature_status":"signed_v1","signed_at":"2026-05-28T02:04:47.520767Z","signed_message":"canonical_sha256_bytes"},"source_id":"2604.09638","source_kind":"arxiv","source_version":2}}},"equivocations":[],"invalid_events":[],"applied_event_ids":["sha256:89d4e33d43d9fb52a34bb1047bbc1f5de0df4555822c4673e05b43cf3eabeb3a","sha256:a2d3e6534cccc76f84c2cb42b31fbd4d93849a99072c22c0493a05d21fac0c05"],"state_sha256":"dc2485789ff48d54dd3f9302fbb03a43219171651300eef915ece3baaa62956e"}