ChatGPT-4 outperforms experts and crowd workers for annotating political Twitter messages with zero-shot learning

"ChatGPT-4 Outperforms Experts, Crowd Workers in Annotating Political Twitter Messages with Zero-Shot Learning" · 2023 · arXiv 2304.06588

11 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Using AI Agents to Automate Black-Box Audits of Personalization Algorithms at Scale

cs.CL · 2026-06-29 · unverdicted · novelty 7.0

Introduces a GenAI agent framework for auditing personalization algorithms via synthetic accounts with fixed personas; applied to X after the 2024 election, it shows amplification of toxic and right-leaning content that varies by ideology.

SPAGBias: Uncovering and Tracing Structured Spatial Gender Bias in Large Language Models

cs.CL · 2026-04-16 · unverdicted · novelty 7.0

SPAGBias reveals that LLMs form nuanced gender associations with specific urban micro-spaces that exceed real-world distributions and produce failures in planning and descriptive tasks.

The Shrinking Lifespan of LLMs in Science

cs.DL · 2026-04-08 · unverdicted · novelty 7.0

LLM adoption in science follows a compressing inverted-U trajectory where release year predicts time-to-peak and lifespan better than model attributes.

What Prediction Markets Can See: Market Formation, Settlement Legibility, and the Geography of Tradable Uncertainty in Africa and Latin America

econ.GN · 2026-06-13 · unverdicted · novelty 6.0

Prediction market inventories for topics concerning Africa and Latin America are shaped more by settlement legibility than by public salience, with sports and elections favored over conflicts.

Interpretable Discriminative Text Representations via Agreement and Label Disentanglement

cs.CL · 2026-05-20 · unverdicted · novelty 6.0

LFD discovers predictive text features via LLM contrastive proposals, cross-LLM Cohen's kappa screening, and residual held-out gain selection, matching baseline accuracy while achieving higher human agreement and lower label leakage on ten tasks.

Assessing Capabilities of Large Language Models in Social Media Analytics: A Multi-task Quest

cs.CL · 2026-04-21 · unverdicted · novelty 6.0

LLMs show mixed results on authorship verification, post generation, and attribute inference from Twitter data, with new frameworks and user studies establishing benchmarks for these analytics tasks.

Evaluating LLMs as Human Surrogates in Controlled Experiments

cs.HC · 2026-03-08 · unverdicted · novelty 6.0

LLMs reproduce several directional effects from a human accuracy perception experiment but show inconsistent effect magnitudes and moderation patterns across models.

Characterizing initial human-AI proof formalization workflows

cs.AI · 2026-06-02 · unverdicted · novelty 5.0

A controlled user study and qualitative survey find that AI assistance raises formalization accuracy for math proofs, with users flexibly combining multiple tools while retaining oversight.

Large Language Models as Virtual Survey Respondents: Evaluating Sociodemographic Response Generation

cs.AI · 2025-09-08 · conditional · novelty 5.0

Introduces PAS and FAS task abstractions plus the LLM-S^3 benchmark to evaluate LLMs on generating sociodemographic survey responses across 11 real datasets and multiple models.

VIDEE: Visual and Interactive Decomposition, Execution, and Evaluation of Text Analytics with Intelligent Agents

cs.CL · 2025-06-17 · unverdicted · novelty 5.0

VIDEE introduces a human-in-the-loop system using Monte-Carlo Tree Search for task decomposition, executable pipeline generation, and LLM-based evaluation with visualizations to support non-expert text analytics.

LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods

cs.CL · 2024-12-07 · accept · novelty 3.0

A survey that organizes LLMs-as-judges research into functionality, methodology, applications, meta-evaluation, and limitations.
