arXiv preprint arXiv:2304.06588

Petter Törnberg · 2023 · arXiv 2304.06588

8 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

SPAGBias: Uncovering and Tracing Structured Spatial Gender Bias in Large Language Models

cs.CL · 2026-04-16 · unverdicted · novelty 7.0

SPAGBias reveals that LLMs form nuanced gender associations with specific urban micro-spaces that exceed real-world distributions and produce failures in planning and descriptive tasks.

The Shrinking Lifespan of LLMs in Science

cs.DL · 2026-04-08 · unverdicted · novelty 7.0

LLM adoption in science follows a compressing inverted-U trajectory where release year predicts time-to-peak and lifespan better than model attributes.

Interpretable Discriminative Text Representations via Agreement and Label Disentanglement

cs.CL · 2026-05-20 · unverdicted · novelty 6.0

LFD discovers predictive text features via LLM contrastive proposals, cross-LLM Cohen's kappa screening, and residual held-out gain selection, matching baseline accuracy while achieving higher human agreement and lower label leakage on ten tasks.

Assessing Capabilities of Large Language Models in Social Media Analytics: A Multi-task Quest

cs.CL · 2026-04-21 · unverdicted · novelty 6.0

LLMs show mixed results on authorship verification, post generation, and attribute inference from Twitter data, with new frameworks and user studies establishing benchmarks for these analytics tasks.

Evaluating LLMs as Human Surrogates in Controlled Experiments

cs.HC · 2026-03-08 · unverdicted · novelty 6.0

LLMs reproduce several directional effects from a human accuracy perception experiment but show inconsistent effect magnitudes and moderation patterns across models.

Large Language Models as Virtual Survey Respondents: Evaluating Sociodemographic Response Generation

cs.AI · 2025-09-08 · conditional · novelty 5.0

Introduces PAS and FAS task abstractions plus the LLM-S^3 benchmark to evaluate LLMs on generating sociodemographic survey responses across 11 real datasets and multiple models.

VIDEE: Visual and Interactive Decomposition, Execution, and Evaluation of Text Analytics with Intelligent Agents

cs.CL · 2025-06-17 · unverdicted · novelty 5.0

VIDEE introduces a human-in-the-loop system using Monte-Carlo Tree Search for task decomposition, executable pipeline generation, and LLM-based evaluation with visualizations to support non-expert text analytics.

LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods

cs.CL · 2024-12-07 · accept · novelty 3.0

A survey that organizes LLMs-as-judges research into functionality, methodology, applications, meta-evaluation, and limitations.

citing papers explorer

SPAGBias: Uncovering and Tracing Structured Spatial Gender Bias in Large Language Models · cs.CL · 2026-04-16 · unverdicted · none · ref 59
The Shrinking Lifespan of LLMs in Science · cs.DL · 2026-04-08 · unverdicted · none · ref 14
Interpretable Discriminative Text Representations via Agreement and Label Disentanglement · cs.CL · 2026-05-20 · unverdicted · none · ref 33
Assessing Capabilities of Large Language Models in Social Media Analytics: A Multi-task Quest · cs.CL · 2026-04-21 · unverdicted · none · ref 34
Evaluating LLMs as Human Surrogates in Controlled Experiments · cs.HC · 2026-03-08 · unverdicted · none · ref 31
Large Language Models as Virtual Survey Respondents: Evaluating Sociodemographic Response Generation · cs.AI · 2025-09-08 · conditional · none · ref 34
VIDEE: Visual and Interactive Decomposition, Execution, and Evaluation of Text Analytics with Intelligent Agents · cs.CL · 2025-06-17 · unverdicted · none · ref 55
LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods · cs.CL · 2024-12-07 · accept · none · ref 225
