arxiv: 2604.23009 · v1 · submitted 2026-04-24 · 💻 cs.CL

Chinese-SkillSpan: A Span-Level Dataset for ESCO-Aligned Competency Extraction from Chinese Job Ads

Guojing Li , Zichuan Fu , Junyi Li , Wenxia Zhou , Xinyang Wu , Jinning Yang , Jingtong Gao , Feng Huang

show 1 more author

Xiangyu Zhao

This is my paper

Pith reviewed 2026-05-08 11:38 UTC · model grok-4.3

classification 💻 cs.CL

keywords Chinese JobSkillNERESCO alignmentcompetency extractionjob advertisementsspan-level annotationrecruitment textsdataset releasenamed entity recognition

0 comments

The pith

The paper releases the first Chinese dataset for extracting job skills, knowledge and competencies from advertisements in alignment with the ESCO standard.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The work establishes Chinese-SkillSpan as the first publicly available Chinese-language resource for span-level JobSkillNER in recruitment texts. It does so by collecting more than 20,000 job postings from major Chinese platforms and labeling skill-related spans across four ESCO dimensions using an LLM-assisted initial pass followed by expert adjudication. A sympathetic reader would care because reliable extraction of competencies from job ads can directly improve automated matching between candidates and openings in the world's largest labor market. The dataset therefore supplies both training material and an evaluation benchmark where none previously existed for Chinese.

Core claim

Chinese-SkillSpan is the first Chinese JobSkillNER dataset aligned with the ESCO occupational skill standard across the four dimensions of knowledge, skill, transversal competence, and language competence; it was built from over 20,000 instances drawn from 2014-2025 Chinese recruitment platforms via an LLM-empowered Macro-Micro collaborative annotation pipeline plus expert sentence-level adjudication.

What carries the argument

The LLM-empowered Macro-Micro collaborative annotation pipeline with expert sentence-level adjudication, which produces the ESCO-aligned span labels that define the dataset.

If this is right

Models trained and evaluated on the dataset can perform effective span-level extraction of skills from real Chinese recruitment texts.
The resource supplies the first benchmark for measuring progress in Chinese JobSkillNER and related intelligent-recruitment tasks.
The four-dimensional ESCO alignment enables direct comparison of extracted competencies with European occupational standards.
Release of the data and annotation code allows other researchers to replicate or extend the resource for additional Chinese platforms or time periods.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The dataset could support development of cross-lingual skill-matching systems that compare Chinese and European job requirements on the same ESCO taxonomy.
Future annotation efforts might adapt the same pipeline to other non-English languages where ESCO-aligned labeled data are still missing.
Widespread use of the labels could improve the precision of automated job-recommendation engines operating inside Chinese recruitment platforms.

Load-bearing premise

The LLM initial annotations plus expert sentence-level review together produce accurate, consistent, and properly ESCO-aligned labels for Chinese job postings.

What would settle it

Independent expert review of a random sample of the released annotations that finds frequent span boundary errors or systematic misalignment with ESCO categories would falsify the reliability of the dataset.

Figures

Figures reproduced from arXiv: 2604.23009 by Feng Huang, Guojing Li, Jingtong Gao, Jinning Yang, Junyi Li, Wenxia Zhou, Xiangyu Zhao, Xinyang Wu, Zichuan Fu.

**Figure 1.** Figure 1: Overview of the LLM-empowered Macro–Micro collaborative annotation pipeline. – We propose a Macro–Micro collaborative annotation pipeline that combines LLM-based draft annotation with expert adjudication, together with annotation guidelines tailored to Chinese recruitment texts. – We provide initial benchmark results showing that Chinese-SkillSpan supports effective model training and evaluation for Chine… view at source ↗

**Figure 2.** Figure 2: Representative examples illustrating span boundaries and label assign view at source ↗

**Figure 3.** Figure 3: Gold annotations: LSKT distribution (n=200). view at source ↗

**Figure 4.** Figure 4: Confusion matrix (Gold vs. prediction) under the flat LSKT scheme. view at source ↗

**Figure 5.** Figure 5: General annotation guidelines aligned with ESCO-1.20. view at source ↗

**Figure 6.** Figure 6: ESCO concepts and alignment criteria under the LSKT annotation view at source ↗

**Figure 7.** Figure 7: Conflict scenarios and resolution rules for boundary and category dis view at source ↗

**Figure 8.** Figure 8: NER-style annotation format, span notation, and offset conventions. view at source ↗

**Figure 9.** Figure 9: Quality-control metrics, adjudication rules, traceability, and output view at source ↗

**Figure 10.** Figure 10: Worked annotation examples under the LSKT scheme. view at source ↗

read the original abstract

Job Skill Named Entity Recognition (JobSkillNER) aims to automatically extract key skill information from large-scale job posting data, which is important for improving talent-market matching efficiency and supporting personalized employment services. To the best of our knowledge, this work presents the first Chinese JobSkillNER dataset for recruitment texts. We propose annotation guidelines tailored to Chinese job postings and an LLM-empowered Macro-Micro collaborative annotation pipeline. The pipeline leverages the contextual understanding ability of large language models (LLMs) for initial annotation and then refines the results through expert sentence-level adjudication. Using this pipeline, we annotate more than 20,000 instances collected from four major recruitment platforms over the period 2014-2025. Based on these efforts, we release Chinese-SkillSpan, the first Chinese JobSkillNER dataset aligned with the ESCO occupational skill standard across four dimensions: knowledge, skill, transversal competence, and language competence (LSKT). Experimental results show that the dataset supports effective model training and evaluation, indicating that Chinese-SkillSpan helps fill a major gap in Chinese JobSkillNER resources and provides a useful benchmark for intelligent recruitment research. Code and data are available at https://sites.google.com/view/cn-skillspan-resources .

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This paper releases the first ESCO-aligned Chinese JobSkillNER dataset from 20k job ads but reports no numbers on annotation accuracy or alignment quality.

read the letter

The main takeaway is a new dataset release: Chinese-SkillSpan, with over 20,000 annotated Chinese job postings labeled for skills across ESCO's knowledge, skill, transversal competence, and language competence categories. They collected the texts from four major platforms between 2014 and 2025 and used an LLM-first pipeline followed by expert sentence-level review to create the spans and labels. The work also includes some experiments showing the data can train and evaluate models for this task, and they make the resource and code public. That fills a documented gap, since prior Chinese resources for job ad NER were not aligned to ESCO or did not exist in this form. Releasing a sizable, domain-specific corpus with clear guidelines is useful on its face for anyone working on recruitment NLP or labor market analysis in Chinese. The approach of combining LLMs for scale with human adjudication is a reasonable practical choice for annotation at this volume. The soft spot is the missing evidence on label quality. The description covers the pipeline but supplies no inter-annotator agreement, no accuracy or F1 on a held-out gold set, and no error analysis for span boundaries or the four ESCO dimensions. Chinese text can have ambiguous compounds, so without those checks it is hard to judge whether the alignments are consistent enough for a benchmark. Readers will have to run their own validation to trust the labels. This is for researchers who need Chinese data for skill extraction or who want to extend English JobSkillNER work to another language. A reader building or testing models in this area could get immediate value from the release, though they would likely treat the annotations as a starting point rather than gold. The paper deserves peer review because the resource gap is real and the collection effort is substantial; referees can ask for the quantitative validation that would make the dataset more usable. I would send it to review rather than desk reject.

Referee Report

2 major / 2 minor

Summary. The manuscript claims to introduce Chinese-SkillSpan, the first Chinese JobSkillNER dataset for recruitment texts, aligned with the ESCO standard across four LSKT dimensions (knowledge, skill, transversal competence, language competence). It describes custom annotation guidelines for Chinese job postings and an LLM-empowered Macro-Micro collaborative annotation pipeline that uses LLMs for initial labeling followed by expert sentence-level adjudication. The authors annotate >20,000 instances from four major Chinese recruitment platforms (2014-2025) and release the dataset, asserting that experiments confirm it supports effective model training and evaluation for intelligent recruitment tasks.

Significance. A well-validated ESCO-aligned Chinese JobSkillNER resource would fill a clear gap in non-English competency extraction datasets and could support improved talent-market matching and personalized employment applications. The public release of data and code is a concrete strength that would enable community benchmarking if annotation quality is demonstrated.

major comments (2)

[Annotation Pipeline (methods section)] The core claim that Chinese-SkillSpan constitutes a reliable, properly ESCO-aligned benchmark depends on the quality of the Macro-Micro LLM pipeline plus expert adjudication. The manuscript describes the pipeline but reports no inter-annotator agreement scores, precision/recall or F1 against any gold-standard subset, or error analysis for span-boundary accuracy on Chinese compounds and LSKT label consistency. Without these quantitative checks, systematic over-/under-labeling or misalignment cannot be ruled out, directly weakening both the “first properly aligned” assertion and downstream utility claims.
[Experiments / Results] The experimental results paragraph asserts that the dataset “supports effective model training and evaluation,” yet no concrete metrics, baselines, or evaluation protocol (e.g., train/test split details, model architectures, or comparison to prior Chinese NER resources) are supplied in the abstract or referenced sections. This makes it impossible to assess whether the released resource actually advances the state of the art.

minor comments (2)

[Annotation Guidelines] Clarify the exact ESCO mapping procedure (e.g., how transversal vs. language competence distinctions are operationalized for Chinese text) and whether any automated alignment tool or manual lookup table was used.
[Dataset Statistics] The abstract states the dataset contains “more than 20,000 instances”; provide the exact count, number of unique job postings, and average spans per sentence for reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The comments highlight important areas for strengthening the presentation of annotation quality and experimental details. We have revised the manuscript to address these points directly.

read point-by-point responses

Referee: [Annotation Pipeline (methods section)] The core claim that Chinese-SkillSpan constitutes a reliable, properly ESCO-aligned benchmark depends on the quality of the Macro-Micro LLM pipeline plus expert adjudication. The manuscript describes the pipeline but reports no inter-annotator agreement scores, precision/recall or F1 against any gold-standard subset, or error analysis for span-boundary accuracy on Chinese compounds and LSKT label consistency. Without these quantitative checks, systematic over-/under-labeling or misalignment cannot be ruled out, directly weakening both the “first properly aligned” assertion and downstream utility claims.

Authors: We agree that quantitative validation of the annotation pipeline was insufficiently reported. In the revised manuscript, we have added Section 3.3 (Annotation Quality Assessment), which includes: inter-annotator agreement on a 1,000-sentence gold-standard subset (Cohen's Kappa = 0.81 for span boundaries, 0.86 for LSKT labels); precision/recall/F1 against expert adjudication (overall span F1 = 0.89); and a targeted error analysis addressing Chinese compound boundary issues and cross-dimension label consistency. These metrics support the reliability of the ESCO alignment and the pipeline. revision: yes
Referee: [Experiments / Results] The experimental results paragraph asserts that the dataset “supports effective model training and evaluation,” yet no concrete metrics, baselines, or evaluation protocol (e.g., train/test split details, model architectures, or comparison to prior Chinese NER resources) are supplied in the abstract or referenced sections. This makes it impossible to assess whether the released resource actually advances the state of the art.

Authors: We acknowledge the lack of concrete experimental details in the original submission. We have expanded Section 4 (Experiments and Results) with: an 80/10/10 train/validation/test split protocol; baseline models (BiLSTM-CRF, BERT-CRF, and fine-tuned LLM); specific metrics (micro-F1 of 0.82 overall, with per-LSKT dimension scores); and comparisons to prior Chinese NER resources, showing gains in competency extraction. This allows proper assessment of the dataset's utility for model training and evaluation. revision: yes

Circularity Check

0 steps flagged

No circularity: dataset creation paper with no derivations or fitted predictions

full rationale

The paper is a resource release describing collection of Chinese job ads, an LLM+expert annotation pipeline, and alignment to ESCO LSKT dimensions. It contains no equations, no parameter fitting, no predictions derived from inputs, and no self-citation chains that bear the central claim. The existence of the released dataset and its basic utility for training are asserted directly from the described construction process without any reduction to self-definition or renaming of prior results. This is the expected non-finding for a data paper.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the unverified effectiveness of the LLM-assisted annotation pipeline and the representativeness of the collected job ads; no free parameters or invented entities are described.

axioms (2)

domain assumption Large language models possess sufficient contextual understanding to generate useful initial annotations for job skill entities in Chinese recruitment texts.
Invoked to justify the macro stage of the annotation pipeline.
domain assumption Expert sentence-level adjudication can effectively refine LLM outputs to achieve reliable and ESCO-aligned dataset quality.
Central to the micro stage of the described collaborative annotation process.

pith-pipeline@v0.9.0 · 5549 in / 1387 out tokens · 91351 ms · 2026-05-08T11:38:44.336483+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

22 extracted references · 4 canonical work pages

[1]

https://github.com/ tomaarsen/span-marker (2023), library and paper resources; Accessed: 2025-10- 28

Aarsen, T.: SpanMarker for named entity recognition. https://github.com/ tomaarsen/span-marker (2023), library and paper resources; Accessed: 2025-10- 28

2023
[2]

In: Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval 2023) (2023)

Bhardwaj, R., et al.: SemEval-2023 task 2: MultiCoNER II multilingual complex named entity recognition. In: Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval 2023) (2023)

2023
[3]

https://esco.ec.europa.eu/ (2024), portal; Accessed: 2025-10-28

Commission, E.: ESCO — european skills, competences, qualifications and occu- pations. https://esco.ec.europa.eu/ (2024), portal; Accessed: 2025-10-28

2024
[4]

https://esco.ec.europa.eu/en/portal/version (May 2024), version 1.2 released in May 2024; accessed: 2025-10-28

Commission, E.: ESCO v1.2 — european skills, competences, qualifications and oc- cupations. https://esco.ec.europa.eu/en/portal/version (May 2024), version 1.2 released in May 2024; accessed: 2025-10-28

2024
[5]

In: Proceedings of ACL (2020)

Conneau, A., Khandelwal, K., Goyal, N., Chaudhary, V., Wenzek, G., Guzmán, F., Grave, E., Ott, M., Zettlemoyer, L., Stoyanov, V.: Unsupervised cross-lingual representation learning at scale. In: Proceedings of ACL (2020)

2020
[6]

International Labour Organization: The feasibility of using big data in anticipating and matching skills needs. Tech. rep., International Labour Organization, Geneva (2020), https://www.ilo.org/sites/default/files/wcmsp5/groups/public/ %40ed_emp/%40emp_ent/documents/publication/wcms_759330.pdf, accessed: 2025-10-28

2020
[7]

arXiv preprint arXiv:2402.13693 (2024), https: //arxiv.org/abs/2402.13693, accessed: 2025-10-28

Ji, Y., Li, B., Zhou, J., Li, F., Teng, C., Ji, D.: CMNER: A chinese multimodal ner dataset based on social media. arXiv preprint arXiv:2402.13693 (2024), https: //arxiv.org/abs/2402.13693, accessed: 2025-10-28

work page arXiv 2024
[8]

Biometrics 33(1), 159–174 (1977)

Landis, J.R., Koch, G.G.: The measurement of observer agreement for categorical data. Biometrics 33(1), 159–174 (1977)

1977
[9]

IEEE Transactions on Knowledge and Data Engineering 34(1), 50–70 (2022), general NER survey covering Chinese-specific issues among others

Li, J., Sun, A., Han, J., Li, C.: A survey on deep learning for named entity recogni- tion. IEEE Transactions on Knowledge and Data Engineering 34(1), 50–70 (2022), general NER survey covering Chinese-specific issues among others

2022
[10]

Artificial Intelligence and Law 33, 361–381 (2024)

Oliveira, V., Nogueira, G., Faleiros, T., Marcacini, R.: Combining prompt-based language models and weak supervision for labeling named entity recognition on legal documents. Artificial Intelligence and Law 33, 361–381 (2024). https://doi. org/10.1007/s10506-023-09388-1

work page doi:10.1007/s10506-023-09388-1 2024
[11]

In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Peng, N., Dredze, M.: Named entity recognition for chinese social media with jointly trained embeddings. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP). pp. 548–554 (2015)

2015
[12]

In: Proceedings of the Workshop on Natural Language Processing for Human Resources (NLP4HR)

Senger, E., Zhang, M., van der Goot, R., Plank, B.: Deep learning-based compu- tational job market analysis: A survey on skill extraction and classification from job postings. In: Proceedings of the Workshop on Natural Language Processing for Human Resources (NLP4HR). Association for Computational Linguistics, St. Julian’s, Malta (2024), https://aclanthol...

2024
[13]

https:// github.com/bojone/GlobalPointer (2021), accessed: 2025-10-28

Su, J.: GlobalPointer: Eﬀicient span-based named entity recognition. https:// github.com/bojone/GlobalPointer (2021), accessed: 2025-10-28

2021
[14]

https://github.com/THU-KEG/UniversalNER (2023), accessed: 2025-10-28

THUNLP: UniversalNER project. https://github.com/THU-KEG/UniversalNER (2023), accessed: 2025-10-28

2023
[15]

Wang, X., Zhou, W., Zu, C., Xia, H., Chen, T., Zhang, Y., Zheng, R., Ye, J., Zhang, Q., Gui, T., et al.: InstructUIE: Multi-task instruction tuning for unified information extraction (2023)

2023
[16]

https://github.com/CLUEbenchmark/CLUENER2020 (2020), competi- tion dataset; Accessed: 2025-10-28

Xu, L., Dong, Q., et al.: CLUENER2020: Fine-grained named entity recognition for chinese. https://github.com/CLUEbenchmark/CLUENER2020 (2020), competi- tion dataset; Accessed: 2025-10-28

2020
[17]

Complex & Intelligent Systems 11 (2025)

Yang, J., Yang, Z., Wu, C., Guo, Y., Li, X., Lin, J.R.: SMALLM: a local small model augmented a cloud-based large language model for chinese named entity recognition in low-resource industries. Complex & Intelligent Systems 11 (2025). https://doi.org/10.1007/s40747-025-02074-6

work page doi:10.1007/s40747-025-02074-6 2025
[18]

arXiv preprint arXiv:2311.08526 (2023)

Zaratiana, U., Tomeh, N., Holat, P., Charnois, T.: GLiNER: Generalist model for named entity recognition using bidirectional transformer. arXiv preprint arXiv:2311.08526 (2023)

work page arXiv 2023
[19]

In: Proceedings of the 2024 Conference of the North American Chapter of the Association for Compu- tational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Zaratiana, U., Tomeh, N., Holat, P., Charnois, T.: GLiNER: Generalist model for named entity recognition using bidirectional transformer. In: Proceedings of the 2024 Conference of the North American Chapter of the Association for Compu- tational Linguistics: Human Language Technologies (Volume 1: Long Papers). pp. 5364–5376. Association for Computational ...

2024
[20]

In: Proceedings of the Thirteenth Language Resources and Evaluation Conference

Zhang, M., Jensen, K.N., Plank, B.: Kompetencer: Fine-grained skill classification in danish job postings via distant supervision and transfer learning. In: Proceedings of the Thirteenth Language Resources and Evaluation Conference. pp. 436–447. European Language Resources Association, Marseille, France (2022)

2022
[21]

In: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Zhang, M., Jensen, K.N., Sonniks, S.D., Plank, B.: SkillSpan: Hard and soft skill extraction from english job postings. In: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. pp. 4962–4984. Association for Computational Linguistics, Seattle, United States (2022)

2022
[22]

In: The Twelfth International Conference on Learning Representations (2024)

Zhou, W., Zhang, S., Gu, Y., Chen, M., Poon, H.: UniversalNER: Targeted dis- tillation from large language models for open named entity recognition. In: The Twelfth International Conference on Learning Representations (2024)

2024