Large Language Models Are Effective Human Annotation Assistants, But Not Good Independent Annotators

Benjamin Evans; Carlos Rafael Colon; Feng Gu; Ishani Mondal; Jordan Lee Boyd-Graber; Zongxia Li

arXiv: 2503.06778 · v3 · submitted 2025-03-09 · cs.CL · cs.AI

keywords: event, annotation, annotators, human, llms, although, assist, automated

Event annotation is important for identifying market changes, monitoring breaking news, and understanding sociological trends. Although expert annotators set the gold standard, human coding is expensive and inefficient. Unlike information extraction experiments that focus on single contexts, we evaluate a holistic workflow that removes irrelevant documents, merges documents about the same event, and annotates the events. Although LLM-based automated annotations are better than traditional TF-IDF-based methods or Event Set Curation, LLMs are still not reliable annotators compared to human experts. However, adding LLMs to assist experts with Event Set Curation can reduce the time and mental effort required for Variable Annotation. When LLMs are used to extract event variables to assist expert annotators, the experts agree more with the extracted variables than with fully automated LLM annotations.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

SMARTER: A Data-efficient Framework to Improve Toxicity Detection with Explanation via Self-augmenting Large Language Models
cs.CL · 2025-09 · unverdicted · novelty 5.0

SMARTER boosts LLM toxicity detection and explanation performance by up to 13% macro-F1 on three hate-speech benchmarks through self-generated synthetic data and minimal-supervision preference optimization.