CAPC-CG: A Large-Scale, Expert-Directed LLM-Annotated Corpus of Adaptive Policy Communication in China
Pith reviewed 2026-05-21 20:47 UTC · model grok-4.3
The pith
CAPC-CG offers the first open annotated corpus of Chinese central government policies using a five-color taxonomy for clear and ambiguous language.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper establishes CAPC-CG as a reliable, expert-annotated resource that applies a five-color taxonomy to categorize the language in national laws, administrative regulations, and ministerial rules, enabling quantitative study of how Chinese authorities communicate policy intent across decades.
What carries the argument
The five-color taxonomy of clear and ambiguous language categories, applied through a two-round expert labeling process to paragraph-level segments of policy documents.
If this is right
- Inter-annotator agreement of Fleiss's kappa = 0.86 on the gold-standard set supports supervised machine learning on policy language classification.
- Researchers can analyze historical shifts in the use of clear versus ambiguous directives from 1949 onward.
- Baseline LLM performances provide starting points for improving automated annotation of similar government texts.
- The released metadata and codebook support replication and extension to other policy domains.
Where Pith is reading between the lines
- The dataset could help identify whether ambiguous language in policies leads to varied local implementation outcomes.
- Comparing annotation patterns before and after key historical events might show changes in central government communication strategies.
- Adapting this taxonomy to non-Chinese policy documents could reveal cross-national differences in how governments balance clarity and flexibility.
Load-bearing premise
The five-color taxonomy developed for adaptive policy communication applies reliably to Chinese central government documents without major adaptation or loss of validity.
What would settle it
A new round of annotations by independent experts on a held-out sample of documents yielding substantially lower agreement scores or frequent category mismatches would indicate the taxonomy does not transfer reliably.
Original abstract
We introduce CAPC-CG, the Chinese Adaptive Policy Communication (Central Government) Corpus, the first open dataset of Chinese policy directives annotated with a five-color taxonomy of clear and ambiguous language categories, building on Ang's theory of adaptive policy communication. Spanning 1949-2023, this corpus includes national laws, administrative regulations, and ministerial rules issued by China's top authorities. Each document is segmented into paragraphs, producing a total of 3.3 million units. Alongside the corpus, we release comprehensive metadata, a two-round labeling framework, and a gold-standard annotation set developed by expert and trained coders. Inter-annotator agreement achieves a Fleiss's kappa of K = 0.86 on directive labels, indicating high reliability for supervised modeling. We provide baseline classification results with several large language models (LLMs), together with our annotation codebook, and describe patterns from the dataset. This release aims to support downstream tasks and multilingual NLP research in policy communication.
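For readers who want to check what a Fleiss's kappa of 0.86 measures, the statistic can be sketched in a few lines. This is a minimal illustration, not the authors' evaluation code: the function name `fleiss_kappa`, the toy ratings, and the use of the Level 1 labels W / R / N as categories are assumptions made for the example.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Fleiss's kappa for items each labeled by the same number of raters.

    ratings: list of label lists, one inner list per item (hypothetical layout).
    categories: the full set of labels raters could choose from.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    counts = [Counter(item) for item in ratings]

    # Observed agreement: share of agreeing rater pairs, averaged over items.
    per_item = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_bar = sum(per_item) / n_items

    # Chance agreement from the overall share of each category.
    shares = [sum(c[cat] for c in counts) / (n_items * n_raters) for cat in categories]
    p_e = sum(s ** 2 for s in shares)
    if p_e == 1:
        raise ValueError("kappa is undefined when only one category is ever used")

    return (p_bar - p_e) / (1 - p_e)


# Toy data: four paragraphs, three coders each (illustrative only).
toy_ratings = [
    ["W", "W", "R"],
    ["R", "R", "R"],
    ["N", "N", "N"],
    ["W", "R", "R"],
]
toy_kappa = fleiss_kappa(toy_ratings, ["W", "R", "N"])
```

The value depends on both raw agreement and the label distribution, so a kappa reported for human coders on the gold set says nothing by itself about labels produced by a different annotator, such as an LLM.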
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces CAPC-CG, the first open large-scale corpus of Chinese policy directives from 1949-2023, consisting of 3.3 million paragraph units annotated using a five-color taxonomy for clear and ambiguous language based on Ang's adaptive policy communication theory. It includes a gold-standard set with expert annotations achieving Fleiss's kappa of 0.86, releases the LLM-annotated full corpus, metadata, annotation framework, codebook, and baseline LLM classification results.
Significance. This work provides a valuable new resource for research in policy communication, NLP, and Chinese politics by offering an unprecedented scale of annotated data spanning decades. The open release of the corpus, gold-standard annotations, and baselines promotes reproducibility and enables supervised modeling tasks. The high inter-annotator agreement on the gold set is a strength, supporting potential use in downstream applications if the LLM annotations maintain similar quality.
major comments (1)
- Abstract: The Fleiss's kappa of 0.86 is reported for the gold-standard annotation set by expert and trained coders. However, the primary released corpus uses LLM annotations on the full 3.3 million paragraphs, and no agreement metric (such as accuracy or kappa) between the LLM labels and the expert gold standard is provided. This extrapolation undermines the claim of high reliability for the released dataset.
Simulated Author's Rebuttal
We thank the referee for their constructive review and recommendation for minor revision. Their comment correctly identifies a point of clarification needed regarding the reliability claims for the LLM-annotated portion of the corpus, which we address directly below.
- Referee: Abstract: The Fleiss's kappa of 0.86 is reported for the gold-standard annotation set by expert and trained coders. However, the primary released corpus uses LLM annotations on the full 3.3 million paragraphs, and no agreement metric (such as accuracy or kappa) between the LLM labels and the expert gold standard is provided. This extrapolation undermines the claim of high reliability for the released dataset.
  Authors: We appreciate this observation, which highlights a useful distinction. The Fleiss's kappa of 0.86 measures inter-annotator agreement among human experts on the gold-standard set and supports the validity of our five-color taxonomy and codebook. The full 3.3 million paragraphs were annotated via LLMs, with baseline classification results provided to illustrate model performance on held-out data derived from the gold standard. To directly address the concern, the revised manuscript will include explicit agreement metrics (accuracy, F1, and Cohen's kappa) comparing LLM-generated labels to expert annotations on a validation subset. This addition will strengthen transparency without changing the dataset release or core findings.
  Revision: yes
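The comparison named in the rebuttal (accuracy, F1, and Cohen's kappa between LLM labels and expert labels) could look roughly like the sketch below. It is an illustration under stated assumptions, not the authors' code: `agreement_report` is a hypothetical helper, the labels are toy data, and macro-averaging of F1 is an assumption, since the rebuttal does not say which averaging would be used.

```python
from collections import Counter


def agreement_report(gold, pred):
    """Compare predicted labels against gold labels for the same paragraphs."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and equally long")

    n = len(gold)
    labels = sorted(set(gold) | set(pred))
    pairs = list(zip(gold, pred))

    accuracy = sum(g == p for g, p in pairs) / n

    # Macro F1: per-label F1, then an unweighted mean (averaging is an assumption).
    f1_scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in pairs)
        fp = sum(g != label and p == label for g, p in pairs)
        fn = sum(g == label and p != label for g, p in pairs)
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    macro_f1 = sum(f1_scores) / len(f1_scores)

    # Cohen's kappa: observed agreement corrected for chance agreement,
    # where chance comes from each annotator's own label distribution.
    gold_counts, pred_counts = Counter(gold), Counter(pred)
    p_e = sum(gold_counts[label] * pred_counts[label] for label in labels) / n ** 2
    kappa = (accuracy - p_e) / (1 - p_e) if p_e < 1 else 1.0

    return {"accuracy": accuracy, "macro_f1": macro_f1, "cohen_kappa": kappa}


# Toy validation subset (illustrative labels only).
gold_labels = ["W", "W", "R", "R", "N", "N"]
llm_labels = ["W", "R", "R", "R", "N", "W"]
report = agreement_report(gold_labels, llm_labels)
```

Reporting kappa next to accuracy matters here because a skewed label distribution can produce high raw accuracy with little agreement beyond chance.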
Circularity Check
Data release with minor self-citation on taxonomy; no derivation reduces to inputs
The paper is a corpus release that segments 3.3M paragraphs from 1949-2023 Chinese policy documents and annotates them with a five-color taxonomy drawn from Ang's prior theory of adaptive policy communication. One co-author (Yuen Yuen Ang) overlaps with that theory, producing a minor self-citation, but the central claim is the existence and release of the annotated dataset itself together with a gold-standard subset whose Fleiss's kappa is reported as 0.86. No equations, fitted parameters, predictions, or uniqueness theorems appear; the work contains no derivation chain that reduces by construction to its own inputs or to an unverified self-citation. The contribution is therefore self-contained as an empirical resource.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Ang's theory of adaptive policy communication supplies a valid five-color taxonomy for distinguishing clear and ambiguous language in Chinese policy directives.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction [unclear]
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: "We introduce CAPC-CG, the first open dataset of Chinese policy directives annotated with a five-color taxonomy of clear and ambiguous language categories, building on Ang's theory of adaptive policy communication."
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel [unclear]
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: "Inter-annotator agreement achieves a Fleiss's kappa of K = 0.86 on directive labels"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Carion, N. et al. SAM 3: Segment anything with concepts (2025)
Inter-annotator agreement is not the ceiling of machine learning performance: Evidence from a comprehensive set of simulations. In Proceedings of the 21st Workshop on Biomedical Language Processing, pages 275–284. Marina Sokolova and Guy Lapalme. 2009. A systematic analysis of performance measures for classification tasks. Information Processing & Man...
-
[2]
Recipient principle: All directives should be read from the perspective of state actors. Private laws and regulations directed solely at individuals, enterprises, or social groups without assigning bureaucratic tasks are not considered directives
-
[3]
Clarity vs. specificity: Code for clarity of intent, not specificity of detail. A directive may express a clear purpose while remaining vague in its implementation, or contain extensive details yet lack a clear underlying intent
-
[4]
Avoid keyword-only tagging: Keywords like must, flexible, or forbid can serve as helpful cues, but they should never be used mechanically or treated as the deciding criteria.
A.4 Step-by-Step Workflow
This section provides a practical workflow for applying the two-round annotation process.
Step 1: Assign Level 1 Label (W / R / N). Determine whether the p...
-
[5]
API Call: Send the entire raw text of document.txt to the LLM
-
[6]
LLM Task: The LLM internally processes the text and embeds XML tags (e.g., <L1>...) directly into the content
-
[7]
LLM Output: The LLM returns the complete, modified text as its response.
Cost analysis: At the project's pricing of $0.20/1M input and $0.80/1M output tokens, this model is prohibitively expensive. The output token count is nearly identical to the input count, and output tokens are four times more costly. The total cost for a single document would be approx...
-
[8]
Local Pre-processing: A Python script first adds <line #> markers to the document text. This step happens locally and incurs no API cost
-
[9]
API Call: Send the numbered text to the GPT-4.1-mini-2025-04-14 model via the OpenAI Batch API
-
[10]
LLM Task: The LLM’s only task is to identify structural elements and output a compact JSON file listing labels and their corresponding line numbers
-
[11]
Local Reconstruction: A local script merges the LLM's JSON output with the original text file to create the XML-tagged document.
Cost analysis: This architecture transforms the cost equation. The output is now a very small JSON object, typically only 5-10% of the input token size. The total cost is: Cost_optimized ≈ (Tokens_input × $0.20) + (Tokens_JSON × $0.80...
-
[12]
Metadata Generation: Heuristic Content Analysis. Before segmentation, the script first analyzes the tagged document to determine its most substantively important structural layer. This is achieved by scoring the content within each hierarchical layer (e.g., all L1 tags, all L2 tags) based on a set of heuristic parameters. The goal of this step is to pr...
-
[13]
Final Segmentation: Deterministic Rule Application
Parameter | Value | Rationale
COLOR_DIVERSITY_WEIGHT | 3.0 | Emphasizes the presence of action-oriented keywords.
LENGTH_WEIGHT | 0.5 | Favors longer, more substantive text blocks.
LAYER_PENALTY_FACTOR | 1.5 | Penalizes deeper layers (L3, L4), creating a bias for higher-level structure.
MIN_LENGTH_THRESHOLD | 15.0 | Minimum...
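The optimized tagging workflow quoted in entries [8] to [11] (number lines locally, ask the model only for a compact JSON of labels, merge locally) can be sketched as below. This is a hedged illustration, not the paper's script: the exact marker format, the JSON schema (`line` and `label` keys), the per-line wrapping of tags, and all function names are assumptions. Only the per-million-token prices ($0.20 input, $0.80 output) come from the quoted text, and the API call itself is omitted.

```python
import json


def add_line_markers(text):
    """Prefix each line with a marker; the '<line N>' format is an assumption."""
    lines = text.splitlines()
    return "\n".join(f"<line {i}> {line}" for i, line in enumerate(lines, start=1))


def reconstruct(text, labels_json):
    """Merge model output into the original text as XML-style tags.

    labels_json is assumed to look like:
    [{"line": 1, "label": "L1"}, {"line": 3, "label": "L2"}]
    """
    labels = {item["line"]: item["label"] for item in json.loads(labels_json)}
    tagged = []
    for i, line in enumerate(text.splitlines(), start=1):
        tag = labels.get(i)
        tagged.append(f"<{tag}>{line}</{tag}>" if tag else line)
    return "\n".join(tagged)


def estimate_cost(input_tokens, output_tokens, in_price=0.20, out_price=0.80):
    """Cost in USD, with prices given per one million tokens."""
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


# Toy document and a hand-written stand-in for the model's JSON response.
document = "Chapter I General Provisions\nThis law is enacted in order to...\nArticle 1"
numbered = add_line_markers(document)
model_json = '[{"line": 1, "label": "L1"}, {"line": 3, "label": "L2"}]'
tagged_document = reconstruct(document, model_json)

# Full-text return (output about equal to input) versus JSON-only output
# at 10% of the input size, for one million input tokens.
cost_full_text = estimate_cost(1_000_000, 1_000_000)
cost_json_only = estimate_cost(1_000_000, 100_000)
```

Under these assumptions, one million input tokens cost about $1.00 when the model returns the full tagged text and about $0.28 when it returns only JSON at 10% of the input size, which is the direction of saving the quoted cost analysis describes.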